Introduction
HOCR is a Hebrew optical character recognition library.
About: LibHocr is a GNU Hebrew optical character recognition library. It scans document images, improves them, analyses the page layout, recognises the characters and outputs the text. The output is editable text, ready for your blog, word processor or any other use.
Image processing module: LibHocr can read the image file formats supported by its GTK image module (tiff, jpeg, pnm, bmp ...). It can automatically convert yellow-stained colour images to the one-bit black and white images used for character recognition, and can automatically scale and rotate images as needed by the page layout and recognition modules.
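To make the colour to one-bit conversion step more concrete, here is a minimal sketch in plain Python. It is not LibHocr's own algorithm or API: the function name `binarize`, the Rec. 601 luma weights and the midpoint threshold are all assumptions chosen for illustration. It only shows the general idea of reducing a stained colour page to ink/background bits.

```python
def binarize(pixels, threshold=None):
    """Convert rows of (r, g, b) tuples into rows of 0/1 values (1 = ink).

    Hypothetical helper for illustration; not part of LibHocr.
    """
    # Reduce each pixel to a grey level using integer Rec. 601 luma weights.
    gray = [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in pixels]

    # Assumed default: split halfway between the darkest and lightest pixel,
    # so a yellowish background still lands on the "paper" side.
    if threshold is None:
        flat = [value for row in gray for value in row]
        threshold = (min(flat) + max(flat)) // 2

    # Dark pixels become ink (1), everything else becomes background (0).
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny yellow-stained "page" with a dark vertical stroke in the middle.
page = [
    [(250, 240, 180), (40, 35, 20), (250, 240, 180)],
    [(245, 235, 175), (30, 30, 25), (248, 238, 178)],
]
bitmap = binarize(page)
print(bitmap)  # [[0, 1, 0], [0, 1, 0]]
```

A single global threshold is the simplest choice; unevenly stained scans usually need a locally adaptive threshold instead.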
Page layout analysis module: LibHocr can automatically recognise columns and paragraphs in the page, distinguish between text and graphics, and ignore noise areas.
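One common way to find text columns in a one-bit page is a vertical projection profile: count the ink pixels in every pixel column and treat wide empty runs as gutters. The sketch below illustrates that idea only. Whether LibHocr uses this technique is not stated here, and the name `find_columns` and the `min_gap` parameter are hypothetical.

```python
def find_columns(bitmap, min_gap=2):
    """Return inclusive (start, end) x-ranges of text columns.

    `bitmap` is a non-empty list of equal-length rows of 0/1 values (1 = ink).
    Hypothetical helper for illustration; not part of LibHocr.
    """
    width = len(bitmap[0])
    # Vertical projection profile: amount of ink in each pixel column.
    profile = [sum(row[x] for row in bitmap) for x in range(width)]

    columns = []
    start = None  # x where the current text column began, if any
    gap = 0       # length of the current run of empty pixel columns
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            # A run of at least min_gap empty pixel columns ends the text column.
            if gap >= min_gap:
                columns.append((start, x - gap))
                start = None
                gap = 0
    if start is not None:
        # Close the last column, trimming any short trailing blank run.
        columns.append((start, width - 1 - gap))
    return columns


page = [
    [1, 1, 0, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 1, 1, 1],
]
columns = find_columns(page)
print(columns)  # [(0, 1), (5, 8)]
```

The one-pixel blank at x = 7 is narrower than `min_gap`, so it is treated as spacing inside a column rather than as a gutter.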
Character recognition module: LibHocr can recognise Hebrew characters with nikud. It is designed to scan old poetry and Bible texts rich with nikud and teamim. LibHocr can recognise all nikud signs, including shin/sin dots and dagesh.
Scripting: LibHocr has Python language bindings to ease its use in scripts. Complex Python scripts can be built using HOCR's image processing, layout analysis and character recognition.
Demo applications: The LibHocr package includes two demo applications.
- hocr - command line Hebrew OCR tool.
- hocr-gtk - GTK based graphical interface.
LibHocr is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. See: http://www.gnu.org/licenses
Thanks
- Ido Kanner: Reported a buffer overflow in the command line example.
- Diego Iastrubni: Help with Qt4 example.
- Yuval Tanny: Help in code cleaning and GUI design.
- Dan Kenigsberg: Fix memory leak when opening a new picture.
- Tal: Command line example enhancements and memory leak fixes.
- Mikael Ylikoski: Fix pbm loading bug.
- Oron Peled: Fix install automation.
- Daniel Nylander: Swedish translation.
- Shlomi Israel: Artwork.
- Dovix and Shlomi Fish: Mandriva rpm spec files and packages.
- Oron Peled: Fedora rpm spec files.
- Debian-hebrew team: Debian and Ubuntu deb packages.
Bindings:
- Yaacov Zamir: Python.
Translations:
- Daniel Nylander <po_AT_danielnylander.se> - Swedish.
- Yaacov Zamir - Hebrew.
Author:
- Yaacov Zamir <kzamir_AT_walla.co.il>
Other projects by Yaacov Zamir:
