Sun Sep 20 17:15:37 EEST 2009

Convert doc files to plain text (txt) in terminal / console (tty) on GNU / Linux

I was looking for a way to convert Microsoft .doc files to plain text (txt) on Linux, directly from the terminal.
After some searching in Google Groups I found antiword!
Luckily, Debian even ships a package containing this nifty little program.
Here is the package description of antiword: "Converts MS Word files to text, PS and PDF".
Fun description, eh? :)
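
For the record, antiword prints the converted text to standard output, so the basic usage is antiword file.doc > file.txt. Below is a minimal sketch of how a whole directory of .doc files could be converted in one go. The function name convert_docs and the CONVERTER variable are my own inventions for illustration (they are not part of antiword), and the sketch assumes antiword is installed and in your PATH.

```shell
#!/bin/sh
# Sketch: convert every .doc file in a directory to a .txt file next to it.
# convert_docs and CONVERTER are hypothetical names, not part of antiword.
# CONVERTER defaults to antiword; any tool that takes a .doc path and
# writes text to standard output (catdoc, for instance) should also work.
convert_docs() {
    dir="${1:-.}"
    converter="${CONVERTER:-antiword}"
    for doc in "$dir"/*.doc; do
        # Skip the literal pattern when the directory has no .doc files
        [ -e "$doc" ] || continue
        txt="${doc%.doc}.txt"
        # The converter writes to stdout, so redirect it into the .txt file
        "$converter" "$doc" > "$txt" || echo "failed: $doc" >&2
    done
}
```

After sourcing this in your shell, calling convert_docs ~/documents would leave a .txt file beside each .doc file in that directory (the directory name here is only an example).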
There are other ways to convert doc files to plain text. For instance, you could use the catdoc command: to convert a simple .doc file to a .txt file, run catdoc -a whitepaper.doc > whitepaper.txt (catdoc prints the text to standard output, so it has to be redirected to a file).
Another way to convert .doc files to .txt, mostly used by developers, is the wvWare utility (nothing to do with VMware! :)).
wvWare can also convert the file directly to HTML. For example:
wvWare file.doc > file.html
or
wvText file.doc file.txt
A lot of the things I've skipped here are well explained in the article Viewing Word files at the command line.