KohaCon26

Name: KohaCon26
Start: 2026-10-05T08:00:00+02:00
End: 2026-10-09T19:00:00+02:00
Location: Karlsruhe Institute of Technology (KIT)

Oct 5 – 9, 2026

Europe/Berlin timezone

AI and OCR Powered Koha Cataloguing

Oct 6, 2026, 2:35 PM

15m

Audimax (Karlsruhe Institute of Technology (KIT))

Str. am Forum 1, 76131 Karlsruhe

Presentation

Mr Muhittin Enes Kale (Directorate General for Information Technologies / Ministry of Culture and Tourism); Erdem Acır (Ministry of Culture and Tourism, Republic of Türkiye)

Manual cataloging in the Koha Integrated Library System remains labor-intensive, slowed by repetitive data entry and the persistent risk of human error. This paper introduces a native Koha tool that automates the extraction of MARC21 metadata directly from images of book title pages. A librarian uploads a photo, and the system identifies and populates essential fields such as Author (100), Title (245), Publication Information (260/264), and ISBN (020).
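As an illustration of the field mapping named above, the following minimal sketch turns a flat metadata dictionary into MARC21-style fields. The dictionary keys, the `to_marc_fields` helper, and the output layout are assumptions made for this example and are not taken from the tool itself; the indicator values assume a personal name entered surname first and a title with no leading article.

```python
def to_marc_fields(meta: dict) -> list[dict]:
    """Map a flat metadata dict (hypothetical keys) to MARC21-style field dicts."""
    fields = []
    if meta.get("isbn"):
        # 020 $a: ISBN, recorded here without hyphens
        fields.append({"tag": "020", "ind1": " ", "ind2": " ",
                       "subfields": [("a", meta["isbn"].replace("-", ""))]})
    if meta.get("author"):
        # 100 $a: personal name; ind1 "1" assumes "Surname, Forename" order
        fields.append({"tag": "100", "ind1": "1", "ind2": " ",
                       "subfields": [("a", meta["author"])]})
    if meta.get("title"):
        # 245 ind1 is "1" when a main entry (1XX) is present, else "0";
        # ind2 "0" assumes there are no nonfiling characters to skip
        ind1 = "1" if meta.get("author") else "0"
        fields.append({"tag": "245", "ind1": ind1, "ind2": "0",
                       "subfields": [("a", meta["title"])]})
    # 264 ind2 "1" marks a publication statement: $a place, $b publisher, $c date
    imprint = [(code, meta[key])
               for code, key in (("a", "place"), ("b", "publisher"), ("c", "year"))
               if meta.get(key)]
    if imprint:
        fields.append({"tag": "264", "ind1": " ", "ind2": "1", "subfields": imprint})
    return fields


if __name__ == "__main__":
    # Placeholder values, not real bibliographic data
    sample = {"isbn": "978-0-00-000000-2", "author": "Doe, Jane",
              "title": "Example Title", "place": "Ankara", "year": "2024"}
    for field in to_marc_fields(sample):
        print(field)
```

Fields whose source value is missing are simply omitted, which leaves them for the cataloger to fill in during verification.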
To extract data reliably from real-world photos, the system runs each image through a preprocessing pipeline before OCR. ImageMagick corrects common problems such as improper orientation, shadows, and uneven lighting, the latter two through local adaptive thresholding. The cleaned image is then passed to Tesseract OCR.
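The preprocessing and OCR steps can be sketched as two command lines driven from Python. This is a minimal sketch, not the tool's actual configuration: it assumes ImageMagick 7 (the `magick` command) and the `tesseract` CLI are installed, and the window size, offset, deskew threshold, and page segmentation mode are untuned illustrative values. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_preprocess_cmd(src: str, dst: str, window: int = 25, offset_pct: int = 10) -> list[str]:
    """Return an ImageMagick argument list that cleans up a title-page photo."""
    return [
        "magick", src,
        "-auto-orient",          # apply the EXIF orientation tag
        "-colorspace", "Gray",   # drop colour before thresholding
        "-deskew", "40%",        # straighten slightly rotated pages
        # Local adaptive threshold: negate so the dark text becomes the
        # bright foreground, threshold per neighbourhood, then negate back.
        "-negate",
        "-lat", f"{window}x{window}+{offset_pct}%",
        "-negate",
        dst,
    ]


def build_ocr_cmd(image: str, lang: str = "eng") -> list[str]:
    """Return a Tesseract argument list that prints recognised text to stdout."""
    return ["tesseract", image, "stdout", "-l", lang, "--psm", "6"]


def ocr_title_page(src: str, workdir: str = ".", lang: str = "eng") -> str:
    """Preprocess the photo, run OCR on the result, and return the raw text."""
    if shutil.which("magick") is None or shutil.which("tesseract") is None:
        raise RuntimeError("ImageMagick 7 ('magick') and Tesseract must be installed")
    cleaned = str(Path(workdir) / "cleaned.png")
    subprocess.run(build_preprocess_cmd(src, cleaned), check=True)
    result = subprocess.run(build_ocr_cmd(cleaned, lang), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Keeping command construction separate from execution makes the parameters easy to inspect and adjust per collection, since lighting and paper quality vary.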
A key innovation of this research is the transition from traditional regex-based parsing and external API dependencies toward the use of locally deployed Large Language Models (LLMs), such as Qwen. By processing raw OCR text through a local LLM, the system can "read between the lines" to reconstruct fragmented titles and organize bibliographic data into structured, MARC-compatible JSON. This context-aware approach significantly improves accuracy when dealing with noisy data while maintaining data privacy by keeping the entire workflow on local hardware.
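The step from raw OCR text to structured JSON might look like the sketch below. The key names, the prompt wording, and the `generate` callable are assumptions for illustration: `generate` stands in for whatever local inference interface is used (for example a locally served Qwen model), which the abstract does not specify.

```python
import json
import re
from typing import Callable

# Hypothetical output schema; the real tool's keys are not stated in the abstract.
FIELDS = ("author", "title", "publisher", "place", "year", "isbn")


def build_prompt(ocr_text: str) -> str:
    """Build an instruction asking the model for a single JSON object."""
    keys = ", ".join(FIELDS)
    return (
        "Extract bibliographic data from this OCR text of a book title page.\n"
        f"Reply with one JSON object using exactly these keys: {keys}.\n"
        "Use null for anything that is not present.\n\n"
        f"OCR text:\n{ocr_text}"
    )


def parse_reply(reply: str) -> dict:
    """Pull the JSON object out of a model reply and keep only the known keys."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # tolerate chatter around the JSON
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))
    return {key: data.get(key) for key in FIELDS}


def extract_metadata(ocr_text: str, generate: Callable[[str], str]) -> dict:
    """Run OCR text through a local model via the supplied generate() callable."""
    return parse_reply(generate(build_prompt(ocr_text)))
```

Restricting the result to a fixed set of keys means unexpected or hallucinated keys are dropped before the data reaches the cataloging form, where the librarian still verifies every value.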
Ultimately, this tool transforms cataloging from a manual "type-everything" chore into a streamlined "photo-to-verification" model. The result is a faster, more efficient workflow that paves the way for a truly AI-augmented library environment.

