Oct 5 – 9, 2026
Karlsruhe Institute of Technology (KIT)
Europe/Berlin timezone

AI and OCR Powered Koha Cataloguing

Oct 6, 2026, 2:35 PM
15m
Audimax (Karlsruhe Institute of Technology (KIT))

Str. am Forum 1, 76131 Karlsruhe
Presentation

Speakers

Mr Muhittin Enes Kale (Directorate General for Information Technologies / Ministry of Culture and Tourism)
Erdem Acır (Ministry of Culture and Tourism, Republic of Türkiye)

Description

Manual cataloging in the Koha Integrated Library System remains labor-intensive, slowed by repetitive data entry and the persistent risk of human error. This paper introduces a native Koha tool that automates the extraction of MARC21 metadata directly from images of book title pages. A librarian uploads a photo, and the system identifies and populates essential fields such as Author (100), Title (245), Publication Information (260/264), and ISBN (020).
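As an illustration of what populating those fields could look like, here is a minimal Python sketch that groups extracted values under MARC21 tags. It is not the tool's actual code: the key names, the `FIELD_MAP` table, the `to_marc_fields` helper, and the sample values are all assumptions made for this example. The subfield codes follow common MARC21 usage (100 $a, 245 $a, 264 $a/$b/$c, 020 $a).

```python
import json

# Hypothetical mapping from extracted keys to (MARC21 tag, subfield code).
# The tags come from the abstract; the key names are invented for this sketch.
FIELD_MAP = {
    "author": ("100", "a"),
    "title": ("245", "a"),
    "place": ("264", "a"),
    "publisher": ("264", "b"),
    "year": ("264", "c"),
    "isbn": ("020", "a"),
}


def to_marc_fields(extracted):
    """Group extracted values into MARC-like field dicts, ordered by tag."""
    fields = {}
    for key, (tag, code) in FIELD_MAP.items():
        value = str(extracted.get(key) or "").strip()
        if not value:
            continue  # leave missing fields for the cataloguer to fill in
        fields.setdefault(tag, []).append({code: value})
    return [{"tag": tag, "subfields": subs} for tag, subs in sorted(fields.items())]


if __name__ == "__main__":
    # Made-up sample values, not real bibliographic data.
    sample = {
        "author": "Example, Author",
        "title": "An Example Title",
        "place": "Ankara",
        "publisher": "Example Press",
        "year": 2020,
        "isbn": "978-0-00-000000-2",
    }
    print(json.dumps(to_marc_fields(sample), indent=2))
```

Empty values are skipped rather than guessed, which fits a workflow where the librarian verifies and completes the record afterwards.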
To extract data reliably from real-world photos, the system runs each image through a preprocessing and OCR pipeline. ImageMagick preprocesses the image, addressing common issues such as improper orientation, shadows, and uneven lighting with techniques including local adaptive thresholding, before the image is passed to Tesseract OCR.
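The preprocessing and OCR steps can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' pipeline: it assumes the ImageMagick 7 `magick` binary and the `tesseract` CLI are on the PATH, and the window size, offset, page segmentation mode, and function names are placeholder choices. `-auto-orient` and `-lat` (local adaptive threshold) are standard ImageMagick options.

```python
import subprocess


def build_preprocess_cmd(src, dst, window=25, offset_pct=10):
    """ImageMagick command: fix orientation, go grayscale, apply local adaptive thresholding."""
    return [
        "magick", src,
        "-auto-orient",            # rotate according to EXIF orientation
        "-colorspace", "Gray",     # thresholding works on intensity only
        "-lat", f"{window}x{window}+{offset_pct}%",  # assumed window and offset
        dst,
    ]


def build_ocr_cmd(image, lang="eng", psm=3):
    """Tesseract command that writes recognized text to stdout."""
    return ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_title_page(src, cleaned="cleaned.png", lang="eng"):
    """Run both steps and return the raw OCR text (requires both tools installed)."""
    subprocess.run(build_preprocess_cmd(src, cleaned), check=True)
    result = subprocess.run(
        build_ocr_cmd(cleaned, lang), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Keeping the command builders separate from the `subprocess` calls makes the parameters easy to tune per collection, for example a larger window for strongly shadowed pages or `lang="tur"` for Turkish title pages.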
A key innovation of this research is the transition from traditional regex-based parsing and external API dependencies toward locally deployed Large Language Models (LLMs), such as Qwen. By processing raw OCR text through a local LLM, the system can use context to reconstruct fragmented titles and organize bibliographic data into structured, MARC-compatible JSON. This context-aware approach significantly improves accuracy on noisy data while preserving data privacy, since the entire workflow stays on local hardware.
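A minimal sketch of the LLM step, under stated assumptions, might look like this. The prompt wording, the key names, and the `generate` callable (a stand-in for whatever client talks to the locally hosted model) are hypothetical; the abstract does not describe the actual prompt or JSON schema.

```python
import json

# Hypothetical output keys; the abstract does not specify the real schema.
EXPECTED_KEYS = ("author", "title", "place", "publisher", "year", "isbn")


def build_prompt(ocr_text):
    """Ask the model for a single JSON object describing the title page."""
    return (
        "Extract bibliographic data from the OCR text of a book title page. "
        "Reply with one JSON object with the keys "
        + ", ".join(EXPECTED_KEYS)
        + ". Use null for anything not present.\n\nOCR text:\n"
        + ocr_text
    )


def parse_llm_reply(reply):
    """Pull the JSON object out of a model reply and keep only the expected keys."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in model reply")
    data = json.loads(reply[start:end + 1])
    return {key: data.get(key) for key in EXPECTED_KEYS}


def extract_metadata(ocr_text, generate):
    """`generate` is any callable that sends a prompt to the local model and returns text."""
    return parse_llm_reply(generate(build_prompt(ocr_text)))
```

Passing the model call in as a function keeps the sketch independent of any particular inference server, and dropping unexpected keys means that extra chatter in the reply does not leak into the record.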
Ultimately, this tool transforms cataloging from a manual "type-everything" chore into a streamlined "photo-to-verification" model. The result is a faster, more efficient workflow that paves the way for a truly AI-augmented library environment.

Duration of your presentation (in minutes): 15

Author

Mr Muhittin Enes Kale (Directorate General for Information Technologies / Ministry of Culture and Tourism)

Co-author

Erdem Acır (Ministry of Culture and Tourism, Republic of Türkiye)
