Enable Festival (the Linux text-to-speech system) to
read/speak PDF and DOC files (PDF and DOC speech scripts for
Festival)
Today I wondered whether festival supports reading PDF files. As
far as my research showed, the answer is no.
That's not a big deal, though, since it's easy to convert PDF
files into plain text files on Linux with the pdftotext command.
pdftotext is part of poppler-utils, a nice package which also
contains pdfimages (extracts images from PDFs), pdftohtml (a PDF
to HTML converter) and pdffonts (a PDF font analyzer).
The normal way to read a PDF file via festival is to first use
pdftotext to convert the PDF to a text file:
$ pdftotext filename.pdf outputfile.txt
and then feed that text file to festival:
$ cat outputfile.txt | festival --tts
For convenience I've created a small shell script called
festival-read-pdf.sh which does this directly.
Please download the festival-read-pdf.sh shell script here
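The linked script is not reproduced in this post, so the following is only a rough sketch of what such a wrapper might look like, not the actual festival-read-pdf.sh. The function name speak_pdf is made up for illustration. It assumes pdftotext and festival are installed, and it relies on pdftotext printing to standard output when the output file is given as -, which avoids the temporary text file:

```shell
#!/bin/sh
# Hypothetical sketch of a PDF reader wrapper; NOT the original festival-read-pdf.sh.

# speak_pdf FILE: convert FILE to plain text and pipe it straight into festival.
speak_pdf() {
    if [ ! -r "$1" ]; then
        echo "cannot read file: $1" >&2
        return 1
    fi
    # "-" as the output file makes pdftotext write the text to stdout.
    pdftotext "$1" - | festival --tts
}

# Speak the file given as the first argument, or print a usage hint.
if [ "$#" -gt 0 ]; then
    speak_pdf "$1"
else
    echo "usage: $0 file.pdf" >&2
fi
```

Saved under a name of your choice, for example read-pdf.sh, it would be run as: sh read-pdf.sh filename.pdf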
Furthermore, I wondered how to make Microsoft Office .doc files
play through festival.
Again, something was needed to convert the .doc file to plain
text first.
I came across antiword, which I blogged about in my previous
post.
Thus, to read a .doc file through festival you need to run:
$ antiword filename.doc | festival --tts
I've quickly scripted it for convenience.
Download the festival-doc-read.sh script here
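As with the PDF case, the linked script is not shown here, so this is just a minimal sketch of how the antiword pipeline above might be wrapped, not the actual festival-doc-read.sh. The function name speak_doc is an illustrative assumption, and antiword and festival are assumed to be installed:

```shell
#!/bin/sh
# Hypothetical sketch of a DOC reader wrapper; NOT the original festival-doc-read.sh.

# speak_doc FILE: extract the text of a .doc file and pipe it into festival.
speak_doc() {
    if [ ! -r "$1" ]; then
        echo "cannot read file: $1" >&2
        return 1
    fi
    # antiword prints the document's plain text to stdout.
    antiword "$1" | festival --tts
}

# Speak the file given as the first argument, or print a usage hint.
if [ "$#" -gt 0 ]; then
    speak_doc "$1"
else
    echo "usage: $0 file.doc" >&2
fi
```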
I've also created a third bash script which lets you choose
whether to play a DOC or a PDF file in Festival. Here is a link
to the festival-read-doc-en-pdf.sh PDF, DOC speaker script.
Speaking of festival, it might be interesting to mention
fala - a simple text reader. If you're a Debian user you'll be
glad to know there is already a package containing fala.
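Coming back to the combined PDF/DOC script for a moment: one way to handle both formats could be sketched as below. This is not the actual festival-read-doc-en-pdf.sh. I'm assuming here that the converter is picked by file extension, which may well differ from how the linked script lets you select, and the function name speak_file is made up for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of a combined PDF/DOC reader; NOT the original
# festival-read-doc-en-pdf.sh. Assumption: the format is chosen by extension.

# speak_file FILE: pick a text converter by extension and pipe into festival.
speak_file() {
    case "$1" in
        *.pdf|*.PDF)
            # "-" makes pdftotext write the text to stdout.
            pdftotext "$1" - | festival --tts
            ;;
        *.doc|*.DOC)
            # antiword prints the document text to stdout.
            antiword "$1" | festival --tts
            ;;
        *)
            echo "unsupported file type: $1" >&2
            return 1
            ;;
    esac
}

# Speak the file given as the first argument, or print a usage hint.
if [ "$#" -gt 0 ]; then
    speak_file "$1"
else
    echo "usage: $0 file.pdf|file.doc" >&2
fi
```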
Well, I hope you'll find the PDF and DOC festival speech scripts
useful. Enjoy!