BitmapOCR applies a mimetype- and filesize-based post-filter to the crawl query result and copies matching documents, together with their metadata, to the interface folder for ZySCAN. ZySCAN picks these files up from that folder, attempts to OCR them, and initializes the metadata of the OCR version of each document with the metadata of the original bitmap file.
Only recognized files whose mimetype denotes bitmap content pass the mimetype-based filter.
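The mimetype-based part of the filter can be pictured with a minimal Python sketch. The set of mimetypes and the function name `is_bitmap_mimetype` are assumptions made for illustration; the actual list of mimetypes that BitmapOCR treats as bitmap content is not given in this article.

```python
# Hypothetical set of mimetypes treated as bitmap content.
# The real list used by BitmapOCR is not documented here.
BITMAP_MIMETYPES = {
    "image/tiff",
    "image/png",
    "image/jpeg",
    "image/bmp",
    "image/gif",
}


def is_bitmap_mimetype(mimetype):
    """Return True if the mimetype is recognized and denotes bitmap content."""
    # Files without a recognized mimetype (None or empty) never pass.
    if not mimetype:
        return False
    return mimetype.strip().lower() in BITMAP_MIMETYPES
```

For example, `is_bitmap_mimetype("image/tiff")` passes, while `is_bitmap_mimetype("application/pdf")` and an unrecognized file (`None`) do not.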
Configurable items in the Analytics Bundle BitmapOCR configuration:
- The maximum number of threads the running task may use.
- The ZySCAN import directory: Foldername
  The folder where a ZySCAN configuration expects its image files.
- Minimal filesize: Scalar 20 kb
  The minimal file size an image file must have to be transferred to ZySCAN.
  Very small images, such as email banners, are unlikely to yield any textual content when processed with OCR, so they are skipped.
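How the three settings above interact can be sketched as follows. This is only an illustration under stated assumptions, not the BitmapOCR implementation: the names `MIN_FILESIZE_BYTES`, `MAX_THREADS`, and `copy_to_import_folder` are hypothetical, "20 kb" is assumed to mean 20 × 1024 bytes, and the threshold is assumed to be inclusive. Transfer of the document metadata is not covered; `shutil.copy2` preserves file-system attributes only.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MIN_FILESIZE_BYTES = 20 * 1024  # assumption: "20 kb" means 20 KiB
MAX_THREADS = 4                 # hypothetical value for the thread limit


def meets_minimum_size(path, min_bytes=MIN_FILESIZE_BYTES):
    """Return True if the file is at least min_bytes large (inclusive)."""
    return Path(path).stat().st_size >= min_bytes


def copy_to_import_folder(paths, import_dir,
                          max_threads=MAX_THREADS,
                          min_bytes=MIN_FILESIZE_BYTES):
    """Copy files that pass the size filter into the import folder.

    Returns the names of the copied files, in input order.
    """
    import_dir = Path(import_dir)
    import_dir.mkdir(parents=True, exist_ok=True)

    # Apply the file-size filter before any copying starts.
    selected = [Path(p) for p in paths if meets_minimum_size(p, min_bytes)]

    def copy_one(src):
        # copy2 keeps timestamps and permissions, not document metadata.
        shutil.copy2(src, import_dir / src.name)
        return src.name

    # The pool never uses more than max_threads worker threads.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(copy_one, selected))
```

With these assumed defaults, a 100-byte email banner is skipped, while a scanned page of 20 KiB or more is copied to the import folder.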