Posts Tagged ‘Convert’

Convert single PDF pages to multiple SVG files on Debian Linux with pdf2svg

Sunday, February 26th, 2012

In my last article, I explained how to create PNG, JPG and GIF pictures from one single PDF document.
Conversion of PDF to images is useful, however as PNG and JPEG are raster graphics formats the image quality gets crappy if the picture is zoomed to, let's say, 300%.
This means conversion to PNG / GIF etc. is not a good practice, especially if image quality is targeted.

I myself am not a quality freak, but it was interesting to find out whether it is possible to convert the PDF pages to the SVG (Scalable Vector Graphics) format.

Converting PDF to SVG is very easy, as for GNU / Linux there is a command line tool called pdf2svg.
pdf2svg's official page is here.

The traditional compile-and-install from source is described on the homepage. For Debian users there is already an existing pdf2svg deb package.

To install pdf2svg on Debian use:

debian:~# apt-get install --yes pdf2svg
...

Once installed, usage of pdf2svg to convert a PDF to multiple SVG files is analogous to imagemagick's convert.
To convert the 44 page Projects.pdf to multiple SVG files (each PDF page to a separate SVG file) issue:

debian:~/project-pdf-to-images$ for i in $(seq 1 44); do \
pdf2svg Projects.pdf Projects-$i.svg $i; \
done

This little loop stores each of the 44 pages of the PDF document in a separate SVG vector graphics file:

debian:~/project-pdf-to-images$ ls -1 *.svg|wc -l
44
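
If you don't want to hardcode the 44, the page count can be pulled out automatically with pdfinfo (I assume here it is available; on Debian it comes with the poppler-utils package). A small sketch of the same loop:

debian:~/project-pdf-to-images$ pages=$(pdfinfo Projects.pdf | awk '/^Pages:/ {print $2}'); \
for i in $(seq 1 $pages); do \
pdf2svg Projects.pdf Projects-$i.svg $i; \
done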

For BSD users, and in particular FreeBSD ones, pdf2svg has a BSD port in:

/usr/ports/graphics/pdf2svg

Installing on BSD is possible directly via the port, and conversion of PDF to SVG on FreeBSD should work in the same manner. The only requirement is that the bash shell is used for the above little loop, as by default FreeBSD runs csh.
On FreeBSD launch /usr/local/bin/bash before following the Linux instructions, if you're not already in bash.
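
Building from the port is the usual routine; a quick sketch, assuming the standard FreeBSD ports workflow:

freebsd# cd /usr/ports/graphics/pdf2svg
freebsd# make install clean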

Now the output SVG files are perfect for editing with Inkscape or Scribus, and the picture quality is way superior to the old rasterized (JPEG, PNG) images.

Create PNG, JPG, GIF pictures / images from PDF on Linux

Saturday, February 25th, 2012

I received a PDF file with a plan for development of a bundle of projects. My task was to evaluate this plan and give feedback on the 44 page PDF document.

Since I don't know of a program able to directly edit PDF files on GNU / Linux, my initial idea was to open and convert the PDF to ODT / DOC with OpenOffice (LibreOffice) and then edit the ODT file.
Unfortunately the OpenOffice oowriter program was unable to open / visualize the PDF file. My assumption is OO's failure to open the PDF is because the PDF was generated on Microsoft Windows with Adobe Illustrator or something similar.

The alternative idea that came to my mind for editing the PDF file was to convert it to pictures, edit those, and then convert the pictures back to PDF.
In other words, to follow these 3 steps:
1. Convert the PDF document to multiple images
2. Edit each of the images with GIMP or Inkscape
3. Convert back all images to a single PDF file

Some time ago, I wrote an article on how to create a PDF file from many image files in JPEG, PNG or GIF on Linux. That prior article describes exactly how to complete Step 3, therefore all that was left was to find a way to convert the PDF file to multiple JPEG / PNG / GIF images.

The convert command from my earlier article, which turns multiple pictures into a single PDF document, is:

$ convert *.jpg outputpdffile.pdf
Actually, in Step 1 I was aiming to do the opposite of what I had previously done.

Hence, in order to convert the single Project.pdf file to multiple PNG images, I just switched convert's IN / OUT argument order:

hipo@noah:~/project-pdf-to-images$ convert Project.pdf Project.png
...
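
By default convert rasterizes the PDF pages at a fairly low resolution, which is part of why zoomed output looks rough. If better quality PNGs are needed, ImageMagick's -density option (in DPI, given before the input file) can be raised, e.g. something like:

hipo@noah:~/project-pdf-to-images$ convert -density 150 Project.pdf Project.png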

I did the PDF to pictures conversion on my notebook running Debian Squeeze (6.0.2) GNU / Linux. Conversion of the PDF file to 44 images took 25 seconds on my dual core 1.8 GHz / 2GB RAM Thinkpad R61.
Afterwards, I had 44 generated PNG files at hand, e.g.:

hipo@noah:~/project-pdf-to-images$ ls -al Project-*.png |wc -l
44

convert was also smart enough to produce correct file naming. The output file names were:
Project-1.png
Project-2.png
etc.

Nicely, each number (-1.png) corresponded to the respective PDF page. For instance Project-10.png corresponded to page 10 of the Project.pdf file.

Rather ironically, after conversion of the PDF to pictures, while opening Project-1.png I noticed The GIMP (The GNU Image Manipulation Program) is capable of directly reading PDF files. GIMP has the option to open the pages either as layers or as separate images 😉
Anyways, even if GIMP is used to modify the different PDF pages as layers, GIMP doesn't have the ability to save the result back as PDF, so once the layers are merged and the file is saved the resulting picture becomes ONE BIG MESS.
Therefore it seems my 3 step way, e.g.:

1. convert the PDF to pictures
2. edit the pictures with GIMP or Inkscape
3. convert the pictures back to PDF (see the ordering note below)

is still the only way to "modify a PDF" on Linux or the BSDs. I will be glad to hear if someone has come up with a better solution.
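
One small gotcha with step 3: the shell glob sorts Project-10.png before Project-2.png, so stitching the pages back naively can scramble their order. A minimal sketch, assuming the Project-N.png names convert produced above, that feeds convert the files in numeric page order:

hipo@noah:~/project-pdf-to-images$ convert $(ls Project-*.png | sort -t- -k2 -n) Project-edited.pdf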

 

Convert Windows / MS-DOS end of line characters (CR/LF) to UNIX (LF) with sed

Tuesday, November 29th, 2011

I guess everyone has ended up with problems with script files written under Windows using some text editor, which incorrectly placed Windows (CR/LF) end of lines instead of the UNIX (LF) ones.
Those who have, have probably already taken advantage of the nice tiny utility dos2unix, which is capable of converting Windows end of lines to UNIX ones. However some older UNIXes, like SunOS or HP-UX, do not have the dos2unix utility in the list of packages one can install, or even if it is possible to install dos2unix it takes quite a hassle.
In those cases it is good to know that conversion of end of lines can be done without using external programs, by simply using UNIX sed.
The way to remove the incorrect Windows ^M (as seen in UNIX text editors) is with this sed one liner:

server# sed 's/.$//' file-with-wrong-windows-eol.txt > file-with-fixed-unix-eol.txt
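
Note the one liner above simply chops the last character of every line, so it assumes every line really ends with a CR. A more targeted variant, assuming GNU sed (which understands \r and the -i in-place flag), only strips a trailing carriage return:

server# sed -i 's/\r$//' file-with-wrong-windows-eol.txt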

How to convert UTF-8 encoding files to Windows CP1251 on GNU / Linux

Friday, October 21st, 2011

I needed to convert a file with Bulgarian text written in UTF-8 encoding to Windows CP1251, in order to fix website encoding problems after a move of the website from one physical server to another.

I first tried with enca (a tool which detects and converts the encoding of text files from one encoding to another).

The exact way I tried to convert was:

linux:~# enca -L bg /home/site/www/includes/utf8_encoded_file.php
...
Unfortunately this conversion attempt was unsuccessful, and the second logical guess was to use iconv (convert encoding of given files from one encoding to another) to do the UTF-8 to CP1251 conversion.
I reached for some help in irc.freenode.net, #varnalab channel and Alex Kuklin helped me, giving me an example command line to do the conversion.
The iconv UTF-8 to CP1251 conversion line he pointed me to was:

linux:~# iconv -f utf8 -t cp1251 < in > out

Further on I adapted Alex's example to convert my utf8_encoded_file.php with Bulgarian characters to CP1251, and used the following commands to convert it and keep a backup of my original UTF-8 file:

linux:~# cd /home/site/www/includes
linux:/home/site/www/includes# iconv -f utf8 -t cp1251 < utf8_encoded_file.php > utf8_encoded_file.php.cp1251
linux:/home/site/www/includes# mv utf8_encoded_file.php utf8_encoded_file.php.bak
linux:/home/site/www/includes# mv utf8_encoded_file.php.cp1251 utf8_encoded_file.php
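
Since the encoding problem affected a whole site, the same recipe can be wrapped in a loop. A rough sketch, reusing the file naming from above and keeping a .bak copy of each original just in case:

linux:/home/site/www/includes# for f in *.php; do \
iconv -f utf8 -t cp1251 < "$f" > "$f.cp1251" && \
mv "$f" "$f.bak" && mv "$f.cp1251" "$f"; \
done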

Convert Picture to ASCII

Tuesday, August 11th, 2009

A friend of mine recently recommended a web service which is able to convert a normal picture in JPG or PNG to an ASCII equivalent. You might try it out on text-image.com.
The converted picture of my face.jpg can be seen in ASCII here.

Convert all text in a file from upper to lower case with VIM

Monday, December 21st, 2009

I'm playing a bit with some old crappy HTML files these days,
I’m trying to make them w3c compliant. One of the files was
filled with HTML tags in upper case, therefore I needed a quick way
to convert them to lower case.
Here is how I did it:
:%s/[A-Z]/\L&/g
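Another way, using vim's built in lowercase operator instead of a regex (gg jumps to the first line, gu lowercases over a motion, G extends that motion to the last line):
ggguG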

Howto detect file encoding and convert the default encoding of given files from one encoding to another on GNU/Linux and FreeBSD

Sunday, February 7th, 2010

I wanted to convert an HTML document's character encoding to UTF-8. To achieve that, of
course, it was first needed to determine what kind of character encoding was used when
the file was created.
First thing I tried was:

hipo@noah:~/Desktop/test$ file File-Whole.htm
File-Whole.htm: HTML document text

as you can see that's shit, cause for some reason the mime encoding is not printed by the file
command.
Next what I tried was:
hipo@noah:~/Desktop/test$ file --mime File-Whole.htm
File-Whole.htm: text/html; charset=unknown-8bit

Here you see that the character encoding is reported as charset=unknown-8bit, which
ain't cool at all, is of no use, and prompts an error if I try it in iconv.
This is why I needed to concretely determine what kind of character set my file uses, to later
be able to convert it using iconv.
To achieve my goal, after consulting with Mr. Google, I found
out about enca (detect and convert encoding of text files).
It's obviously my lucky day, because the good guys from Debian have packaged enca, so everything came down to
apt-getting it.
# apt-get install enca

On FreeBSD an enca port is available, so installing it simply comes down to installing it from the ports tree.
Here is how:
pcfreak# cd /usr/ports/converters/enca
pcfreak# make install clean

Now I tried launching enca directly without any program parameters, but I was unlucky:

hipo@noah:~/Desktop/test$ enca file-Whole.htm
enca: Cannot determine (or understand) your language preferences.
Please use `-L language', or `-L none' if your language is not supported
(only a few multibyte encodings can be recognized then).
Run `enca --list languages' to get a list of supported languages.

I gave it another try, following the prescribed usage parameters, though I first checked
which languages I can pass to enca's -L parameter.
Knowing beforehand that my text is in the Bulgarian language, it wasn't such a big deal
for me to determine the required language:

hipo@noah:~/Desktop/test$ enca -L bulgarian File-Whole.htm
transformation format 8 bits; CP1251

Knowing my character set, all that was left for me was to do the conversion to UTF-8 to make the text
much more accessible.

hipo@noah:~/Desktop/test$ iconv --from-code=CP1251 --to-code=UTF-8 File-Whole.htm > File-Whole.htm.new
hipo@noah:~/Desktop/test$ mv File-Whole.htm.new File-Whole.htm
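
If this has to be scripted for many files, enca's guess can be handed straight to iconv. The sketch below assumes enca's -i switch, which prints the detected charset under its iconv-compatible name:

hipo@noah:~/Desktop/test$ charset=$(enca -L bulgarian -i File-Whole.htm); \
iconv -f "$charset" -t UTF-8 File-Whole.htm > File-Whole.htm.new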

Well here we are conversion mission accomplished 🙂

Convert doc files to plain text (txt) in terminal / console (tty) on GNU / Linux

Sunday, September 20th, 2009

I was looking for a way to convert Microsoft .doc files to plain text (txt) on Linux directly through the terminal.
After some lookup in Google Groups I found ANTIWORD!
Luckily Debian even comes with a package containing this nice nifty program.
Here is the description of antiword – Converts MS Word files to text, PS and PDF.
Fun description, eh? 🙂 Ain't it?
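antiword prints the extracted text to standard output, so minimal usage (the .doc name is only an example) is just a redirect:
antiword whitepaper.doc > whitepaper.txt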
There are some other ways to convert doc files to plain text; for instance you could use the command catdoc, for example to convert a simple .doc to a .txt file use catdoc -a whitepaper.doc.
Another way to convert .doc files to .txt, mostly used by developers, is via the wvware (nothing to do with vmware! :)) utility.
wvWare can directly convert the document to HTML, for example:
wvWare file.doc > file.html
or
wvText file.doc file.txt
A lot of things I'll skip here are well explained in the article Viewing Word files at the command line.