Posts Tagged ‘DoneIf’

How to convert any internet Webpage to PDF from command line on GNU/Linux

Friday, September 30th, 2011

Linux webpage html to pdf command line convertor wkhtmltopdf

If you're looking for a command line utility to generate PDF file out of any webpage located online you are looking for Wkhtmltopdf
The conversion of webpages to PDF by the tool is done using Apple's Webkit open source render.
wkhtmltopdf is something very useful for web developers, as some webpages has a requirement to produce dynamically pdfs from a remote website locations.
wkhtmltopdf is shipped with Debian Squeeze 6 and latest Ubuntu Linux versions and still not entered in Fedora and CentOS repositories.

To use wkhtmltopdf on Debian / Ubuntu distros install it via apt;

linux:~# apt-get install wkhtmltodpf
...

Next to convert a webpage of choice use cmd:

linux:~$ wkhtmltopdf www.pc-freak.net pc-freak.net_website.pdf
Loading page (1/2)
Printing pages (2/2)
Done

If the web page to be snapshotted in long few pages a few pages PDF will be generated by wkhtmltopdf
wkhtmltopdf also supports to create the website snapshot with a specified orientation Landscape / Portrait

-O Portrait options to it, like so:

linux:~$ wkhtmltopdf -O Portrait www.pc-freak.net pc-freak.net_website.pdf

wkhtmltopdf has many useful options, here are some of them:
 

  • Javascript disabling – Disable support for javascript for a website
  • Grayscale pdf generation – Generates PDf in Grayscale
  • Low quality pdf generation – Useful to shrink the output size of generated pdf size
  • Set PDF page size – (A4, Letter etc.)
  • Add zoom to the generated pdf content
  • Support for password HTTP authentication
  • Support to use the tool over a proxy
  • Generation of Table of Content based on titles (only in static version)
  • Adding of Header and Footers (only in static version)

To generate an A4 page with wkhtmltopdf:

wkhtmltopdf -s A4 www.pc-freak.net/blog/ pc-freak.net_blog.pdf

wkhtmltopdf looks promising but seems a bit buggy still, here is what happened when I tried to create a pdf without setting an A4 page formatting:

linux:$ wkhtmltopdf www.pc-freak.net/blog/ pc-freak.net_blog.pdf
Loading page (1/2)
OpenOffice path before fixup is '/usr/lib/openoffice' ] 71%
OpenOffice path is '/usr/lib/openoffice'
OpenOffice path before fixup is '/usr/lib/openoffice'
OpenOffice path is '/usr/lib/openoffice'
** (:12057): DEBUG: NP_Initialize
** (:12057): DEBUG: NP_Initialize succeeded
** (:12057): DEBUG: NP_Initialize
** (:12057): DEBUG: NP_Initialize succeeded
** (:12057): DEBUG: NP_Initialize
** (:12057): DEBUG: NP_Initialize succeeded
** (:12057): DEBUG: NP_Initialize
** (:12057): DEBUG: NP_Initialize succeeded
Printing pages (2/2)
Done
Printing pages (2/2)
Segmentation fault

Debian and Ubuntu version of wkhtmltopdf does not support TOC generation and Adding headers and footers, to support it one has to download and install the static version of wkhtmltopdf
Using the static version of the tool is also the only option for anyone on Fedora or any other RPM based Linux distro.