Posts Tagged ‘interesting stuff’

Merge (convert) multiple PDF files into one single PDF – Generate one pdf from many on Linux / Windows and Mac

Wednesday, August 6th, 2014

I was looking for an English Orthodox translation of the Old Testament (Septuagint version) and found one split across many PDF files. I wanted to build a single PDF from all the separate Old Testament book files and put it online, since English speakers might find it convenient to download the Orthodox version of the Old Testament and read it offline on their computers.

Before I explain how I did it, I will take a short detour to explain a few things about the Septuagint, as this is probably interesting stuff you might not know.

The Septuagint (also referred to as the LXX or the Alexandrian Canon) is a translation of the Hebrew Bible and some related texts into Koine Greek, made according to tradition by 70 Jewish scholars and begun as early as the 3rd century BC. For those interested in Christianity, it is a curious fact that the number of Old Testament books differs among Protestant, Roman Catholic and Orthodox Christians, whereas the number of New Testament books is the same for all three.

So how many books are in the Roman Catholic, Protestant and Orthodox Old Testament?

The Orthodox Old Testament has 50 books (Slavonic versions of the Bible include 2 more, the Esdras books), whereas the Protestant Bible includes only 39 Old Testament books and Roman Catholic Bibles have 46. The reason Protestants have fewer books (only 39) is that some of the books in the Roman Catholic and Orthodox canons are regarded as Apocryphal, also referred to as Deuterocanonical. This doesn't mean that the extra books in Orthodox Bibles are not God-inspired; it means they don't have the same historical authenticity as the books the early Church accepted as canonical.

The Orthodox Church accepted the Septuagint (LXX) as divinely inspired and uses it in church.

Now back to how I managed to merge (convert) multiple PDF files into single PDF on my Debian Linux home router.

My first attempt was with ImageMagick's convert (in the same manner as I used to generate PDF files from pictures earlier), e.g.:

convert intro.pdf genesis.pdf exodus.pdf leviticus.pdf numbers.pdf deuteronomy.pdf … SINGLE-FILE.PDF

I waited quite a while for the conversion to complete, but it seemed to be looping, so after 7 minutes I stopped it and decided to try something else. After a quick search I found pdftk.

pdftk has plenty of functions and is great for anyone who needs to merge, split, update, encrypt or repair corrupted PDFs on Linux:

 apt-cache show pdftk |grep -i desc -A 17
Description: tool for manipulating PDF documents
 If PDF is electronic paper, then pdftk is an electronic stapler-remover,
 hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a
 simple tool for doing everyday things with PDF documents. Keep one in the
 top drawer of your desktop and use it to:
  – Merge PDF documents
  – Split PDF pages into a new document
  – Decrypt input as necessary (password required)
  – Encrypt output as desired
  – Fill PDF Forms with FDF Data and/or Flatten Forms
  – Apply a Background Watermark
  – Report PDF on metrics, including metadata and bookmarks
  – Update PDF Metadata
  – Attach Files to PDF Pages or the PDF Document
  – Unpack PDF Attachments
  – Burst a PDF document into single pages
  – Uncompress and re-compress page streams
  – Repair corrupted PDF (where possible)

To install pdftk on Debian Linux Lenny / Wheezy:

apt-get install --yes pdftk

Once it is installed, merge a number of separate PDF files into a single PDF file like this:

pdftk file1.pdf file2.pdf file3.pdf cat output single-merged-pdf-file.pdf



pdftk intro.pdf genesis.pdf exodus.pdf leviticus.pdf numbers.pdf deuteronomy.pdf joshua.pdf judges.pdf ruth.pdf kingdoms_1.pdf kingdoms_2.pdf kingdoms_3.pdf kingdoms_4.pdf paraleipomenon_1.pdf paraleipomenon_2.pdf esdras_1.pdf esdras_2.pdf nehemiah.pdf tobit.pdf judith.pdf esther.pdf maccabees_1.pdf maccabees_2.pdf maccabees_3.pdf psalms.pdf job.pdf proverbs_of_solomon.pdf ecclesiastes.pdf song_of_songs.pdf wisdom_of_solomon.pdf wisdom_of_sirach.pdf hosea.pdf amos.pdf micah.pdf joel.pdf obadiah.pdf jonah.pdf nahum.pdf habbakuk.pdf zephaniah.pdf malachi.pdf isaiah.pdf jeremiah.pdf baruch.pdf lamentations_of_jeremiah.pdf an_epistle_of_jeremiah.pdf ezekiel.pdf daniel.pdf maccabees_4.pdf slavonic_appendix.pdf cat output Orthodox-English-translation-of-Old-Testament-Septuagint.pdf

And hooray, it worked! The resulting Old Testament (Orthodox) English translation from the Septuagint PDF is here
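Typing some fifty file names on one command line is easy to get wrong. As a rough sketch only, the book order could instead be kept in a text file, one PDF name per line, and fed to pdftk from there. The function name merge_in_order, the list file order.txt and the DRY_RUN switch are my own illustrative assumptions, not part of pdftk; only the "cat output" invocation comes from the commands above.

```shell
# Merge PDFs in the order listed in a text file (one file name per line).
# merge_in_order, order.txt and DRY_RUN are hypothetical names for this sketch.
merge_in_order() {
    list=$1
    out=$2
    set --                          # reuse the positional parameters as the input list
    while IFS= read -r f || [ -n "$f" ]; do
        [ -n "$f" ] || continue     # skip empty lines
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
        set -- "$@" "$f"
    done < "$list"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command that would be run
        echo pdftk "$@" cat output "$out"
    else
        pdftk "$@" cat output "$out"
    fi
}

# Usage (after creating order.txt by hand):
#   merge_in_order order.txt Old-Testament.pdf
```

Setting DRY_RUN=1 first prints the assembled pdftk command instead of running it, which makes it easy to check the order before a long merge.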

pdftk is also packaged for Fedora / CentOS / RHEL etc. (RPM distros), so to install it there:

yum -y install pdftk

Or, if it is missing from the repositories, grab the respective RPM package and

rpm -ivh pdftk-*yourarch.rpm

pdftk also has Windows and Mac OS versions, in case you need to script merging multiple PDFs into a single one there. For more, check out the PDFtk Server homepage here

12 Lessons Steve Jobs Taught Guy Kawasaki – SEO Summit Guy Kawasaki speech

Monday, July 30th, 2012

I'm not a big fan of Steve Jobs, nor do I like the cult of personality that surrounds him nowadays. After his recent death the cult of Jobs and his works has bloomed once again. From a philosophical point of view I don't like Jobs's idea that there is no good or bad and only success matters. However, I must admit that, speaking as an SEO and as a business entrepreneur, his achievements are significant. Hence I decided to share with you a video from SEO Summit of his ex-employee Guy Kawasaki, who "worked for Jobs twice and survived". Jobs is famous for not being loved too much by his employees. It is also no secret that he screwed over Steve Wozniak and a number of other people who were either employed by him or worked with him in some way.

The points his ex-employee Kawasaki shares in this SEO Summit presentation are quite interesting, and beginning business entrepreneurs like me could learn tremendously from them. One key point that is strongly emphasized in the presentation is the importance of simplicity.

* Simplicity in everything is essential for success. I found it quite curious that Steve Jobs's presentations often consisted of slides with just one word. Obviously this means Jobs was a simplicity freak.

Some of Jobs's other concepts were:

Either it works or it doesn't work.

You see again his tendency to simplify things. In business we all know KISS (Keep It Simple, Stupid). It seems Jobs's KISS was only SAS (Simple and Stupid) 🙂 ….

Some other things Kawasaki learned from Jobs were:

Never believe Experts

– If someone tells you he is an expert in something he is definitely not ….

Another belief of Jobs, and probably of many other successful entrepreneurs, is that DESIGN COUNTS. Design is one of the most crucial points in any product, so one has to be extra careful here. A failure in design is a failure in the product line ….

There is plenty of other interesting stuff in the video but the key point is SIMPLICITY. Enjoy Kawasaki's speech …

12 Lessons Steve Jobs Taught Guy Kawasaki


How to convert html pages to text in console / terminal on GNU / Linux and FreeBSD

Thursday, December 8th, 2011

HTML to Plain Text Conversion on GNU / Linux and FreeBSD

I’m realizing that the more I turn into a full-time GUI user, the less coding or other interesting stuff I do…
I remembered the old glorious times when I was a full-time console user, and recalled a nifty trick I was so used to back in the day.
Back then I quite often wrote shell scripts which fetched (HTML) web pages and converted the HTML content into plain text (TXT) files.

To fetch a page in those days I used lynx (a very simple UNIX text browser, which by the way lacks support for any CSS or JavaScript) in combination with html2text (an advanced HTML-to-text converter).

Let’s say I wanted to fetch my personal home page; I did that via the command:

$ lynx -source | html2text > pcfreak_page.txt

The page content gets spit out by lynx as HTML source and is passed to html2text, whose output is saved in the plain text file pcfreak_page.txt
The somewhat more advanced elinks (a lynx-like alternative character-mode WWW browser) provides better support for HTML and even some CSS and JavaScript, so to properly save the content of many pages in a plain text file it’s better to use it instead of lynx. The way to produce .txt files using elinks is identical, e.g.:

$ elinks -source | html2text > pcfreak_blog_page.txt

By the way, back in the day I was more used to links than to the superior elinks. Nowadays I have both text browsers installed, and when I tested fetching an HTML page as in the example above and piping it to html2text, the output came out garbled.

This is the time to say that it’s not even necessary to have a text browser installed in order to fetch a web page and convert it to plain text! The wget file downloading tool supports dumping the source as well. For all those who have not tried it yet and want to test it:

$ wget -qO- | html2text

Of course, for some pages the text inside the HTML would not get saved properly with either lynx or elinks, because some text might be embedded in CSS or JavaScript that elinks and lynx do not support. In those cases a GUI browser is useful. You can use the embedded File -> Save As (Text Files) functionality of any browser such as Firefox, Epiphany or Opera. Below is a screenshot showing an HTML page which I’m about to save as a plain text file in Mozilla Firefox:

Firefox iceWeasel Opera etc. save html webpage as plain text on GNU / Linux, FreeBSD

Besides being handy in conjunction with text browsers, html2text is also handy for converting .html pages already on the computer’s hard drive to plain text (.TXT) format.
One might wonder why anyone would ever want to do that. Well, I personally prefer reading plain text documents instead of HTML 😉
Converting an HTML file already on the hard drive with html2text is done with the command:

$ html2text index.html >index.txt

To convert a whole directory full of .html (documentation or whatever) files to plain text .TXT, cd into the directory with the HTML files and issue this one-liner bash loop:

$ cd html/
html$ for i in *.html; do html2text "$i" > "${i%.html}.txt"; done
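The loop above only covers a single directory. For documentation spread over subdirectories, a sketch along the following lines might do. The function name convert_tree and the HTML2TEXT override variable are hypothetical names of mine, and the sketch assumes file names contain no newline characters. It strips only the trailing .html suffix, so a name such as my.html.notes.html becomes my.html.notes.txt.

```shell
# Convert every .html file under a directory tree to a .txt file next to it.
# convert_tree and HTML2TEXT are hypothetical names for this sketch;
# the converter defaults to html2text.
convert_tree() {
    dir=${1:-.}
    find "$dir" -type f -name '*.html' | while IFS= read -r f; do
        # ${f%.html} removes only the trailing .html suffix
        "${HTML2TEXT:-html2text}" "$f" > "${f%.html}.txt"
    done
}

# Usage:
#   convert_tree html/
```

Quoting "$f" keeps file names with spaces intact, which an unquoted loop variable would split into several words.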

Now lean back and enjoy reading the dox like in the good old hacker days when .TXT files were fashionable 😉