Sun Sep 20 18:42:54 EEST 2009

Enable Festival (the Linux text-to-speech system) to read/speak PDF and DOC files (speaking PDF and DOC with Festival scripts)

Today I wondered whether festival supports reading PDF files. As far as my research showed, the answer is NO!
That is not a big deal, though, since it's easy to convert PDF files into plain text on Linux with the pdftotext command.
pdftotext is part of poppler-utils, a nice package which also contains
pdfimages (extracts images from PDFs), pdftohtml (a PDF to HTML converter) and pdffonts (a PDF font analyzer).
The normal way to read PDF files via festival is:
First use pdftotext to convert your PDF to a text file:
$ pdftotext filename.pdf outputfile.txt
and then pipe that text file to festival:

$ cat outputfile.txt | festival --tts

For convenience I've created a small shell script I called festival-read-pdf.sh which does this directly.
Please download the festival-read-pdf.sh shell script here
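
In case the link is not at hand, here is a minimal sketch of what such a wrapper could look like. It is not necessarily identical to my festival-read-pdf.sh; the function name read_pdf is just an illustrative choice. It relies on pdftotext writing to standard output when the output file is given as "-", so no temporary text file is needed.

```shell
#!/bin/sh
# Sketch of a PDF-to-speech wrapper (illustrative, read_pdf is a made-up name).
# Assumes pdftotext (poppler-utils) and festival are installed.

read_pdf() {
    pdf=$1
    if [ ! -r "$pdf" ]; then
        echo "Cannot read file: $pdf" >&2
        return 1
    fi
    # "-" as the output file makes pdftotext print the text to stdout,
    # which is piped straight into festival's text-to-speech mode.
    pdftotext "$pdf" - | festival --tts
}

# Speak the file only when one was given on the command line.
if [ "$#" -gt 0 ]; then
    read_pdf "$1"
fi
```

Saved as an executable file, it would be run like: $ ./festival-read-pdf.sh filename.pdf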

Furthermore, I wondered how to get Microsoft Office .doc files played through festival.
For that, something was needed to convert the .doc file to plain text as well.
I came across antiword, which I've blogged about in my previous post.
Thus, to speak a .doc file via festival you need to run:
$ antiword filename.doc | festival --tts

I've quickly scripted it for convenience.
Download the festival-doc-read.sh script here

I've also created a third bash script which lets you choose whether to play a DOC or a PDF file in Festival. Here is a link to the festival-read-doc-en-pdf.sh PDF, DOC speaker script.
Talking about festival, it might be interesting to mention fala, a simple text reader. If you're a Debian user you'll be glad to know there is already a package containing fala.
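
Coming back to the combined DOC/PDF script: one possible way to write such a thing is sketched below. Note the assumption here: this sketch picks the converter from the file extension, which may differ from how festival-read-doc-en-pdf.sh actually asks you to select the file type. The function name speak_file is made up for the example.

```shell
#!/bin/sh
# Sketch of a combined PDF/DOC speaker (illustrative, speak_file is a made-up name).
# Assumption: the file type is decided by the file extension.
# Assumes pdftotext (poppler-utils), antiword and festival are installed.

speak_file() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "Cannot read file: $file" >&2
        return 1
    fi
    case "$file" in
        *.pdf|*.PDF)
            # pdftotext prints to stdout when the output file is "-"
            pdftotext "$file" - | festival --tts
            ;;
        *.doc|*.DOC)
            # antiword prints the document text to stdout by default
            antiword "$file" | festival --tts
            ;;
        *)
            echo "Unsupported file type: $file" >&2
            return 2
            ;;
    esac
}

# Speak the file only when one was given on the command line.
if [ "$#" -gt 0 ]; then
    speak_file "$1"
fi
```

Used as a script it would be called with either kind of file, for example: $ ./speak.sh filename.doc (speak.sh being whatever name you save it under).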
Well, I hope you'll find the PDF, DOC festival speech scripts useful. Enjoy!