Tesseract Open Source OCR Engine


Tesseract is a free optical character recognition (OCR) engine originally developed at Hewlett-Packard and now developed by Google. It is a raw OCR engine: it has no document layout analysis, no output formatting, and no graphical user interface. It only processes a TIFF or BMP image of a single column and produces text from it. It can distinguish fixed-pitch from proportional text. The engine ranked among the top three for character accuracy in the 1995 UNLV OCR accuracy test. It reads a binary, grey or color image and outputs text.

Tesseract can process English, French, Italian, German, Spanish, Brazilian Portuguese and Dutch, and can be trained to work with other languages as well.
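
As a rough illustration of the image-in, text-out workflow described above, the following Python sketch wraps the `tesseract` command-line program. It assumes the executable is on the PATH and that the trained data for the requested language is installed. The helper names `build_command` and `ocr_to_text` and all file names are hypothetical, chosen only for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(image, output_base, lang="eng"):
    """Return the argument list for a single tesseract run.

    The CLI form is: tesseract IMAGE OUTPUTBASE -l LANG
    LANG is a three-letter code such as "eng", "fra" or "deu".
    """
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_to_text(image, output_base, lang="eng"):
    """Run tesseract on one image and return the recognized text.

    For its default plain-text output, tesseract appends ".txt"
    to OUTPUTBASE, so the result is read back from that file.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if tesseract exits non-zero.
    subprocess.run(build_command(image, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

A call such as `ocr_to_text("scan.tif", "scan", lang="fra")` would, under these assumptions, write `scan.txt` and return its contents. Keeping the command construction in its own function makes it possible to inspect the arguments without running the engine.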

Source Files
Filename                    Size (bytes)  Size
tesseract-ocr-4.1.0.tar.gz  1965053       1.87 MB
tesseract-ocr.changes       8360          8.16 KB
tesseract-ocr.spec          4071          3.98 KB
Revision 7 (latest revision is 16)
Dominique Leuenberger (dimstar_suse) accepted request 756765 from Martin Pluskal (pluskalm) (revision 7)
- Packaging cleanups
- Update dependencies and enable OpenCL

- Update to 4.1.0
  * Added a new output option formatted in the ALTO standard
  * SIMD optimization
  * Bug fixes
- Update to 4.0.0
  * New OCR engine based on LSTMs
  * Removed Cube OCR engine
  * Updated build system
  * Cleanups and fixes