SYMPTOMS
When using one of the ZySCAN PDF import filters, one or more PDF files are not imported. In the import directory, the file extension of each such PDF file is changed to .P#F.
CAUSE
This can be caused by unsupported PDF types.
RESOLUTION
This is an indication that the PDF is not supported by the PDF import filter.
Possible workarounds:
1. If possible, use or convert to a different format; TIFF is the best option. Saving the file as a different kind of PDF may also work in some cases.
2. If the PDFs are "searchable" PDFs, they can be indexed directly by ZyINDEX. Copy these files directly to the "electronic" folder of your index data folder.
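To find the files that were rejected, you can look for the .P#F extension in the import directory. The following Python sketch lists them and, for PDFs you have already confirmed are searchable, copies them to the "electronic" folder with the .pdf extension restored. IMPORT_DIR and ELECTRONIC_DIR are hypothetical placeholder paths, and the script does not itself check whether a PDF is searchable.

```python
from pathlib import Path
import shutil

# Hypothetical paths: replace them with your own import directory and the
# "electronic" folder of your index data folder.
IMPORT_DIR = Path(r"C:\ZySCAN\Import")
ELECTRONIC_DIR = Path(r"C:\ZyINDEX\Data\electronic")


def find_rejected_pdfs(import_dir: Path) -> list[Path]:
    """Return the files whose extension the import filter changed to .P#F."""
    return sorted(
        p for p in import_dir.iterdir()
        if p.is_file() and p.suffix.upper() == ".P#F"
    )


def copy_as_pdf(rejected: Path, electronic_dir: Path) -> Path:
    """Copy a rejected file to the electronic folder, restoring the .pdf extension.

    Only call this for PDFs you know are searchable; this is not checked here.
    """
    electronic_dir.mkdir(parents=True, exist_ok=True)
    target = electronic_dir / rejected.with_suffix(".pdf").name
    shutil.copy2(rejected, target)
    return target


if __name__ == "__main__":
    # Only report; copying is left as a deliberate, manual decision.
    if IMPORT_DIR.is_dir():
        for path in find_rejected_pdfs(IMPORT_DIR):
            print("Not imported:", path.name)
```

Listing and copying are kept separate so that nothing is moved into the index until you have checked each file.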
Supported PDF features in PDF image import
This applies to the PDF Images import filter, to the Graphics Import filter when it is importing a PDF file, and to the ZyLAB Data XML and PDF XML import filters when they are importing a PDF file as an image.
Supported color spaces
Color space | PDF name | Supported
grayscale, including black-and-white | /DeviceGray | yes
RGB | /DeviceRGB | yes
CMYK | /DeviceCMYK | no
device-specific | /DeviceN | no
palette | /Indexed | yes
ICC profile | /ICCBased | partially (not CMYK); treated as its alternate color space (RGB or grayscale); the profile is ignored
calibrated grayscale | /CalGray | treated as uncalibrated grayscale
calibrated RGB | /CalRGB | treated as uncalibrated RGB
L*a*b* | /Lab | no
separations | /Separation | no
pattern | /Pattern | no
image mask (1) | /ImageMask | partially; black-and-white image masks can appear inverted, and image masks that are part of color images can appear in the wrong color, or disappear if that color is white

(1) Not really a color space, but an alternative way in PDF to process two-color images.
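As a rough pre-check before importing, the color space table can be encoded as a lookup and matched against the names that appear in a PDF file. This Python sketch is a crude heuristic, not how the import filter works: it scans the raw file bytes, so it misses names inside compressed object streams (common in PDF 1.5 and later), it cannot tell a CMYK ICC profile from an RGB one, and it may report /Pattern resources that no image uses. COLOR_SPACE_SUPPORT, scan_color_spaces and unsupported_color_spaces are hypothetical names.

```python
import re

# Support levels transcribed from the color space table (hypothetical helper).
COLOR_SPACE_SUPPORT = {
    "DeviceGray": "yes",
    "DeviceRGB": "yes",
    "DeviceCMYK": "no",
    "DeviceN": "no",
    "Indexed": "yes",
    "ICCBased": "partially (not CMYK)",
    "CalGray": "treated as uncalibrated",
    "CalRGB": "treated as uncalibrated",
    "Lab": "no",
    "Separation": "no",
    "Pattern": "no",
}

# Matches a PDF name such as /DeviceRGB, but not a longer name such as /Label.
_NAME_RE = re.compile(
    rb"/(" + b"|".join(name.encode() for name in COLOR_SPACE_SUPPORT) + rb")\b"
)


def scan_color_spaces(pdf_bytes: bytes) -> dict[str, str]:
    """Map each known color space name found in the raw bytes to its support level."""
    found = {m.group(1).decode() for m in _NAME_RE.finditer(pdf_bytes)}
    return {name: COLOR_SPACE_SUPPORT[name] for name in sorted(found)}


def unsupported_color_spaces(pdf_bytes: bytes) -> list[str]:
    """Return the names found in the file that the table marks as not supported."""
    return [n for n, level in scan_color_spaces(pdf_bytes).items() if level == "no"]
```

An empty result does not prove that a file is supported; a non-empty result from unsupported_color_spaces is only a hint that the file may end up renamed to .P#F.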
Supported compression methods and filters
"Mostly" means that the import filter lacks support for some rarely used options, but it works with all PDFs that we have encountered so far. "Sometimes" means there are many common cases in which it does not work.
Compression method | PDF name | Supported
uncompressed | (none) | yes, written as CCITT Group 4 or LZW to save space
CCITT Group 3, 4 and "Modified Huffman RLE" | /CCITTFaxDecode | partially (1)
LZW | /LZWDecode | mostly (2)
deflate (ZIP) | /FlateDecode | mostly (2), written as LZW because many TIFF readers don't support deflate
JPEG | /DCTDecode | yes (SP5c, update 2)
run-length | /RunLengthDecode | yes
JBIG2 | /JBIG2Decode | yes (SP5c, update 2)
JPEG2000 | /JPXDecode | no
base-85 filter | /ASCII85Decode | yes
hexadecimal filter | /ASCIIHexDecode | no
encryption filter | /Crypt | no

(1) The PDF options /EndOfLine, /EndOfBlock and /DamagedRowsBeforeError are not supported, nor is uncompressed data inside Group 4 streams. Some Group 3 compressed images are known to cause problems.
(2) Predictors (the /Predictor entry in /DecodeParms) are not supported.
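The compression table can be turned into a similar pre-check. This Python sketch scans raw bytes for the filter names listed above and for a /Predictor value greater than 1 (a value of 1, the default, means no prediction). It is a heuristic with the same limitations as a raw byte scan: it misses compressed object streams and abbreviated inline-image filter names such as /Fl, and cross-reference streams in PDF 1.5 and later commonly use /Predictor 12 themselves, so a predictor hit is a hint rather than proof that an image uses one. FILTER_SUPPORT, scan_filters and uses_predictor are hypothetical names.

```python
import re

# Support levels transcribed from the compression table (hypothetical helper).
FILTER_SUPPORT = {
    "CCITTFaxDecode": "partially",
    "LZWDecode": "mostly (no predictors)",
    "FlateDecode": "mostly (no predictors)",
    "DCTDecode": "yes (SP5c, update 2)",
    "RunLengthDecode": "yes",
    "JBIG2Decode": "yes (SP5c, update 2)",
    "JPXDecode": "no",
    "ASCII85Decode": "yes",
    "ASCIIHexDecode": "no",
    "Crypt": "no",
}

_FILTER_RE = re.compile(
    rb"/(" + b"|".join(name.encode() for name in FILTER_SUPPORT) + rb")\b"
)
# /Predictor 1 (the default) means "no prediction"; 2 and 10-15 enable prediction.
_PREDICTOR_RE = re.compile(rb"/Predictor\s+(\d+)")


def scan_filters(pdf_bytes: bytes) -> dict[str, str]:
    """Map each known filter name found in the raw bytes to its support level."""
    found = {m.group(1).decode() for m in _FILTER_RE.finditer(pdf_bytes)}
    return {name: FILTER_SUPPORT[name] for name in sorted(found)}


def uses_predictor(pdf_bytes: bytes) -> bool:
    """True if any /Predictor entry requests actual prediction."""
    return any(int(m.group(1)) > 1 for m in _PREDICTOR_RE.finditer(pdf_bytes))
```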
Composite versus non-composite images
The PDF Images import filter operates internally in two modes. The mode is chosen automatically, based on how the images are laid out in the PDF being processed, and it can even differ between pages of the same PDF (that is, per generated TIFF file). Normally you will not notice the difference between the two modes, except perhaps in speed. In a few cases, however, they exhibit different problems, and knowing about them may help you work around these problems.
For a PDF page to be recognized as the more efficient non-composite type, the images on the page must be organized as horizontal strips. Each strip should have the same width as the page itself, and all strips together should have the same height as the page. Also, all strips except the bottommost one should have identical heights, and a number of other properties of the strips such as compression method and color space must be the same. A page may consist of one strip covering it entirely. Such non-composite images are processed efficiently by copying the image data directly to the output TIFF file. No image quality is lost.
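The non-composite conditions can be expressed as a single check. The following Python sketch is a hypothetical model, not the import filter's code: Strip and is_non_composite are illustrative names, positions and sizes are in page pixels measured from the top, it assumes the strips must be stacked without gaps or overlaps, and it compares only compression method and color space, whereas the real filter also compares further properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strip:
    """One image placed on the page (hypothetical model, in page pixels)."""
    y: int            # top edge, measured from the top of the page
    width: int
    height: int
    compression: str  # e.g. "CCITTFaxDecode"
    color_space: str  # e.g. "DeviceGray"


def is_non_composite(strips: list[Strip], page_width: int, page_height: int) -> bool:
    """Apply the strip rules to decide whether a page can be copied directly."""
    if not strips:
        return False
    ordered = sorted(strips, key=lambda s: s.y)
    # Every strip must span the full page width.
    if any(s.width != page_width for s in ordered):
        return False
    # Assumption: strips are stacked without gaps or overlaps ...
    expected_y = 0
    for s in ordered:
        if s.y != expected_y:
            return False
        expected_y += s.height
    # ... and together they cover exactly the page height.
    if expected_y != page_height:
        return False
    # All strips except the bottommost one must have identical heights.
    if len({s.height for s in ordered[:-1]}) > 1:
        return False
    # Compression method and color space must be the same for all strips.
    return (len({s.compression for s in ordered}) == 1
            and len({s.color_space for s in ordered}) == 1)
```

A page made of one full-page image passes trivially, which matches the statement that a page may consist of a single strip.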
For more complicated PDF pages with composite images, the import filter decompresses the image data, composes a full-page image from it, then re-compresses the result. This is somewhat less efficient. Because JPEG is a lossy compression method, this will inevitably introduce some quality degradation ("generation loss") for JPEG-compressed images, although this will usually not be noticeable by the human eye.
When the Graphics Import filter reads a PDF, the PDF is always processed as a composite image, because that import filter always builds an image internally. The PDF Images, PDF XML and ZyLAB Data XML filters can use both modes.
No differences in output between the two modes are currently known, although subtle ones may exist. However, when importing an image results in an error, the problem can sometimes be resolved by using the Graphics Import filter instead of the PDF Images import filter, or vice versa.
Additional notes:
- Only rotations of 90, 180 and 270 degrees are supported. Other rotation angles result in a distorted image.
- When a page contains different types of images, some images are converted to a higher color space if needed (for example, grayscale to RGB). If all images on the page use JPEG compression, the resulting page is written as JPEG, which saves space but causes some quality degradation. Otherwise, it is written using either LZW compression or CCITT Group 4.
- Encrypted PDFs are not supported, not even when they have an empty password and Acrobat Reader opens them without asking for one.
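The additional notes can be illustrated with three small helpers. In this Python sketch, is_axis_aligned assumes that the rotation note refers to the transformation matrix with which an image is placed on the page (a page's own /Rotate entry must already be a multiple of 90 in a valid PDF); output_compression assumes CCITT Group 4 is chosen for black-and-white pages and LZW otherwise, which the note does not state; looks_encrypted only checks whether the raw bytes contain an /Encrypt entry. All names are hypothetical.

```python
import re

Matrix = tuple[float, float, float, float, float, float]


def is_axis_aligned(matrix: Matrix, tol: float = 1e-9) -> bool:
    """True if the PDF matrix [a b c d e f] rotates by a multiple of 90 degrees.

    0 and 180 degree placements have b == c == 0; 90 and 270 degree placements
    have a == d == 0. Any other angle (for example 45 degrees) would be distorted.
    """
    a, b, c, d, _e, _f = matrix
    return (abs(b) < tol and abs(c) < tol) or (abs(a) < tol and abs(d) < tol)


def output_compression(image_filters: list[str], black_and_white: bool) -> str:
    """Pick the output compression for a composite page.

    JPEG only when every image on the page is JPEG (/DCTDecode).
    Assumption: Group 4 for black-and-white pages, LZW for everything else.
    """
    if image_filters and all(f == "DCTDecode" for f in image_filters):
        return "JPEG"
    return "CCITT Group 4" if black_and_white else "LZW"


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Crude check for an /Encrypt entry in the trailer or cross-reference stream.

    Such files are rejected even if the password is empty.
    """
    return re.search(rb"/Encrypt\b", pdf_bytes) is not None
```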
APPLIES TO
6.2 Service Pack 2