eZpedia : The Free eZ Publish CMS Documentation Encyclopedia

Solution: indexing/searching pdf files with accented chars

eZ Publish needs an external tool to index PDF files. The path to that tool is set in binaryfile.ini.

Originally I was using the pstotext tool, but it does not work well and often produces garbage output. A better tool is pdftotext from the xpdf project, as suggested in this article.

In my case the above was not enough: the text extracted from the PDF files is apparently in ISO-8859-1, but eZ Publish expects UTF-8 as input, so accented characters were not indexed correctly.

So the wrapper script ezpdftotext should contain:

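#!/bin/sh
# Write the text of the PDF given as $1 to stdout, then convert it from Latin-1 to UTF-8.
# "$1" is quoted so that file names containing spaces still work.
pdftotext "$1" - | iconv -f ISO-8859-1 -t UTF-8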
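
The conversion stage of the script can be checked on its own, without a PDF file. The following is only a minimal sketch: the function name latin1_to_utf8 is made up for illustration, and it assumes that iconv and od are available on the system. It feeds the word "café" encoded in ISO-8859-1 through the same iconv call and dumps the resulting bytes.

```shell
#!/bin/sh
# Hypothetical helper (not part of eZ Publish): convert ISO-8859-1 text
# on stdin to UTF-8 on stdout, using the same iconv call as ezpdftotext.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# "café" in ISO-8859-1: the é is the single byte 0xE9 (octal 351).
# After conversion it should appear as the two UTF-8 bytes c3 a9.
printf 'caf\351\n' | latin1_to_utf8 | od -An -tx1
```

If the hex dump shows c3 a9 where the accented letter was, the iconv stage works as intended. If indexed text still looks wrong, the extracted text is probably not ISO-8859-1 in the first place.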

Article provided by eZpedia

All text is available under the terms of the GNU Free Documentation License
