Convert .doc to .pdf on Linux and BSD using console / Conversion of DOC to PDF inside scripts

How to convert .doc to .PDF using console / terminal on Linux and FreeBSD

On Linux there are plenty of ways nowadays to convert Microsoft Word or OpenOffice .DOC documents to Adobe's PDF. However, most of them require a graphical environment. I'm mainly interested in how the conversion is done from the console, to suit shell scripts and PHP code that have to routinely convert a bunch of .DOC files to .PDF, so today I checked how DOC to PDF conversion is possible on Debian, Ubuntu, Arch Linux and FreeBSD.

There are a few tools one can use from the console that don't require a running Xorg on the conversion host. The quality of the converted document may vary, and with some Microsoft Office .doc files there might be some garbage. But generally, for simple and well-written macro-free documents, the quality of the PDF is satisfactory with a few of the tools.

Here are the few tools one can use for the conversion:

  • abiword – you probably know the AbiWord GUI program, which is a good substitute for people who don't want the huge OpenOffice on the host. Interestingly, abiword can also convert documents with no need for a GUI.
     
  • wvPDF – you have to install the wv package, and this converter usually works well only with very old .DOC files (MS Office 97). I was not impressed with its conversion results.
     
  • oowriter / swriter (if OpenOffice is installed) or writer (on LibreOffice); on some Ubuntus and derivatives the equivalent command is lowriter
     
  • unoconv – this tool produces really good DOC to PDF conversions. It is a Python script using OpenOffice / LibreOffice as its backend conversion engine, so the produced PDFs will be identical to the ones produced with oowriter. The advantage of the tool is its very user-friendly syntax, and along with DOC to PDF it supports converting to a bunch of other file formats. Actually unoconv supports the same conversions as OpenOffice.org, with the advantage that you can use it from the console and even have the conversion processed by a remote host.
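
Since which of these tools is installed differs from host to host, a script may first want to check what is available. Below is a minimal sketch of such a check. The function name pick_converter and the preference order are my own assumptions (the order simply follows the results described in this post), not something any of the tools provide.

```sh
# Print the name of the first DOC to PDF converter found in PATH.
# The preference order is an assumption based on the tests in this post.
pick_converter() {
    for tool in unoconv lowriter oowriter swriter abiword wvPDF; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    echo "no DOC to PDF converter found" >&2
    return 1
}
```

A script could then call converter=$(pick_converter) and stop early if the call fails.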

 

1. Conversion of DOC to PDF with abiword

abiword --to=pdf doc_file_to_convert.doc
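
Because the converted output may be garbage or missing, a script should probably not trust the exit code alone. The sketch below wraps the command above and checks that a non-empty PDF actually appeared. It assumes abiword writes the PDF next to the input file with the same base name; expected_pdf and abiword_to_pdf are hypothetical helper names.

```sh
# Name of the PDF that abiword is assumed to write next to the input
# (same directory, same base name, .pdf extension).
expected_pdf() {
    printf '%s\n' "${1%.*}.pdf"
}

# Convert one file and fail if no non-empty PDF shows up afterwards.
abiword_to_pdf() {
    src=$1
    out=$(expected_pdf "$src")
    abiword --to=pdf "$src" || return 1
    if [ ! -s "$out" ]; then
        echo "no usable PDF produced for $src" >&2
        return 1
    fi
}
```

Usage would be something like abiword_to_pdf report.doc || echo "conversion failed".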

2. Convert DOC to PDF with wvPDF

apt-get install --yes wv texlive-base texlive-latex-base ghostscript

wvPDF doc-file-to-convert-to-pdf.doc converted-to-pdf.pdf

wvPDF doc-file-to-convert-to-pdf.doc convert-to-pdf.pdf

Current directory: /home/hipo/Desktop
"doc-file-to-convert-to-pdf.eps" exists - skipping...
Some problem running latex.
Check for Errors in steinway.log
Continuing... 

 

The produced .pdf was not useful: most of the text was completely missing, and some weird characters, probably from the PostScript conversion, ended up in the PDF. Judging by this output, I would say that as of the time of writing wvPDF (Debian's version 1.2.4) is crap.


3. Convert DOC to PDF with oowriter / swriter / lowriter

a) convert with oowriter and swriter

I saw posts online claiming that DOC to PDF conversion is possible directly with oowriter or swriter, using the commands:

oowriter -convert-to pdf:writer_pdf_Export input-doc-file-to-convert.doc

or

swriter -convert-to pdf:writer_pdf_Export steinway.doc - as named on some Linux-es
 

As far as I tested on my Debian Squeeze, neither of the two works.

I also saw some suggestions that a PDF can be generated by installing and using the cups-pdf Debian package:

apt-get install cups-pdf

oowriter -pt pdf your_word_file.doc

b) convert DOC to PDF with lowriter

I've seen Ubuntu documentation and Ubuntu forum users reporting good results with lowriter, which is the command that starts LibreOffice Writer (it is not related to ImageMagick's convert). I never tested it myself, so I can't vouch for the quality of the results. Anyway, you can try it with:

lowriter --convert-to pdf *.doc
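
For use on a host without Xorg, the same command can be wrapped as in the sketch below. The --headless and --outdir switches are LibreOffice command-line options that the post does not mention, so treat their availability in your version as an assumption. OFFICE_BIN and convert_with_office are illustrative names of my own.

```sh
# Binary to call; override it if your system names it differently.
OFFICE_BIN=${OFFICE_BIN:-lowriter}

# Usage: convert_with_office OUTPUT_DIR file1.doc [file2.doc ...]
# Runs the office binary without a GUI and writes the PDFs to OUTPUT_DIR.
convert_with_office() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    "$OFFICE_BIN" --headless --convert-to pdf --outdir "$out_dir" "$@"
}
```

For example: convert_with_office ./pdf *.doc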

 

4. Converting DOC to PDF with unoconv

As of the time of writing, unoconv seems to be the best Linux console tool for converting .doc to .pdf.

It produces good readable text, and pictures and other elements look exactly as they do in OpenOffice.

To install it I ran:

# apt-get install --yes unoconv
....

To use it:

$ unoconv -fpdf any-file-to-convert.doc

If you don't get errors and it doesn't crash, a .pdf file with the same base name, any-file-to-convert.pdf, is created.

What unoconv does is precisely the same as using the OpenOffice GUI to export to PDF:

 

  • Open -> Open Office (3.2 in my case)
  • Open Document to export
  • File->Export as PDF
  • Click: Export
  • Choose file name for output PDF


An interesting feature of unoconv is its ability to run as a server listening on a port and convert documents for other unoconv instances. I never used this, but noticed it mentioned in the manual's EXAMPLES section:

 

EXAMPLES
       You can use unoconv in standalone mode, this means that in absence of an OpenOffice listener, it will starts its own:

       unoconv -f pdf some-document.odt
       One can use unoconv as a listener (by default localhost:2002) to let other unoconv instances connect to it:

       unoconv --listener &
       unoconv -f pdf some-document.odt
       unoconv -f doc other-document.odt
       unoconv -f jpg some-image.png
       unoconv -f xsl some-spreadsheet.csv
       kill -15 %-
       This also works on a remote host:

       unoconv --listener --server 1.2.3.4 --port 4567
       and then connect another system to convert documents:

       unoconv --server 1.2.3.4 --port 4567
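
The local listener example from the manual excerpt above could be wrapped in a script roughly as follows. This is an untested sketch: I have not used the listener mode myself, the fixed sleep is a crude guess at how long the listener needs to start, and LISTENER_WAIT and convert_via_listener are names made up for illustration.

```sh
# Seconds to give the listener to start (a guess, adjust as needed).
LISTENER_WAIT=${LISTENER_WAIT:-5}

# Usage: convert_via_listener file1.doc [file2.doc ...]
# Starts one listener, converts every file through it, then stops it.
convert_via_listener() {
    unoconv --listener &
    listener_pid=$!
    sleep "$LISTENER_WAIT"
    status=0
    for doc in "$@"; do
        unoconv -f pdf "$doc" || status=1
    done
    # Stop the listener with SIGTERM, as the manual does with kill -15.
    kill -15 "$listener_pid" 2>/dev/null || true
    wait "$listener_pid" 2>/dev/null || true
    return "$status"
}
```

The point of keeping one listener running is that the office backend is started only once for the whole batch.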

unoconv does not recognize wildcards like ' * ', so in order to convert multiple DOC files to PDF one has to use the usual shell loop:

for i in *.doc; do unoconv -fpdf "$i"; done
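
For unattended use in a script, the loop can be extended a little so that file names with spaces survive and failed conversions are reported instead of silently skipped. A minimal sketch follows, where convert_all_docs is a hypothetical function name and the only unoconv call is the -f pdf form shown above.

```sh
# Usage: convert_all_docs [DIRECTORY]
# Converts every .doc file in DIRECTORY (default: current directory)
# and returns non-zero if at least one conversion failed.
convert_all_docs() {
    dir=${1:-.}
    failed=0
    for doc in "$dir"/*.doc; do
        # If the glob matched nothing, the pattern itself is returned.
        [ -e "$doc" ] || continue
        if ! unoconv -f pdf "$doc"; then
            echo "failed: $doc" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

From cron or PHP one could call it as convert_all_docs /path/to/docs and check the exit status.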

From all my tests, I think unoconv is the preferred tool for Linux and BSD users (a good time to mention that unoconv is available on FreeBSD too; BSD users can install it from the port /usr/ports/textproc/unoconv).
