Posts Tagged ‘TXT’

How to convert html pages to text in console / terminal on GNU / Linux and FreeBSD

Thursday, December 8th, 2011

HTML to Plain Text Convertion on GNU / Linux and FreeBSD

I’m realizing the more I’m converting to a fully functional GUI user, the less I’m doing coding or any interesting stuff…
I remembered of the old glorious times, when I was full time console user and got a memory on a nifty trick I was so used to back in the day.
Back then I was quite often writing shell scripts which were fetching (html) webpages and converting the html content into a plain TEXT (TXT) files

In order to fetch a page back in the days I used lynx(a very simple UNIX text browser, which by the way lacks support for any CSS or Javascipt) in combination with html2text – (an advanced HTML-to-text converter).

Let’s say I wanted to fetch a my personal home page http://www.pc-freak.net/, I did that via the command:

$ lynx -source http://www.pc-freak.net/ | html2text > pcfreak_page.txt

The content from www.pc-freak.net got spit by lynx as an html source and passed html2pdf wchich saves it in plain text file pcfreak_page.txt
The bit more advanced elinks – (lynx-like alternative character mode WWW browser) provides better support for HTML and even some CSS and Javascript so to properly save the content of many pages in plain html file its better to use it instead of lynx, the way to produce .txt using elinks files is identical, e.g.:

$ elinks -source http://www.pc-freak.net/blog/ | html2text > pcfreak_blog_page.txt

By the way back in the days I was used more to links , than the superior elinks , nowdays I have both of the text browsers installed and testing to fetch an html like in the upper example and pipe to html2text produced garbaged output.

Here is the time to tell its not even necessery to have a text browser installed in order to fetch a webpage and convert it to a plain text TXT!. wget file downloading tools supports source dump as well, for all those who did not (yet) tried it and want to test it:

$ wget -qO- http://www.pc-freak.net | html2text Anyways of course, some pages convertion of text inside HTML tags would not properly get saved with neither lynx or elinks cause some texts might be embedded in some elinks or lynx unsupported CSS or JavaScript. In those cases the GUI browser is useful. You can use any browser like Firefox, Epiphany or Opera ‘s File -> Save As (Text Files) embedded functionality, below is a screenshot showing an html page which I’m about to save as a plain Text File in Mozilla Firefox:

Firefox iceWeasel Opera etc. save html webpage as plain text on GNU / Linux, FreeBSD

Besides being handy in conjunction with text browsers, html2text is also handy for converting .html pages already existing on the computer’s hard drive to a plain (.TXT) text format.
One might wonder, why would ever one would like to do that?? Well I personally prefer reading plain text documents instead of htmls 😉
Converting an html files already existing on hard drive with html2text is done with cmd:

$ html2text index.html >index.txt

To convert a whole directory full of .html (documentation) or whatever files to plain text .TXT , cd the directory with HTMLs and issue the one liner bash loop command:

$ cd html/
html$ for i in $(echo *.html); do html2text $i > $(echo $i | sed -e 's#.html#.txt#g'); done

Now lay off your back and enjoy reading the dox like in the good old hacker days when .TXT files were fashionable 😉

Share this on

Convert Windows / MS-DOS end of line characters (CR/LF) to UNIX (LF) with sed

Tuesday, November 29th, 2011

I guess everyone has ended up with problems into a script files written under Windows using some text editor which incorrectly placed into the end of lines Windows (rn) end of lines instead of the UNIX (r).
Those who have have already take advantage of the nice tiny utility dos2unix which is capable of convert the Windows end of lines to UNIX. However some older UNIXes, like SunOS or HP-UX does not have the dos2unix utility into the list of packages one can install or even if its possible to install dos2unix it takes quite a hassle.
In that cases its good to say convertion of end of lines can be done without using external end programs by simply using UNIX sed .
The way to remove the incorrect Windows ^M (as seen in unix text editors) is by using the sed one liner:

server# sed 's/.$//' file-with-wrong-windows-eol.txt > file-with-fixed-unix-eol.txt

Share this on

How to add multiple email accounts in qmail’s vpopmail with vpasswd via ssh (console) / Little shell script to add multiple email addresses

Sunday, June 12th, 2011

I’ve been assigned the task to add on one of the qmail powered servers I administrate about 50 email addresses via command line.

Each email addresses was required to be configured to have the same mail password.
Adding the email addresses via an interface would be a killing time consuming task and will probably require at least 1 hour of time to add the emails with qmailwebmin, qadmin, qubit or the other vpopmail qmail web administration interfaces available nowdays.

To solve the task, I’ve used a line oner bash shell script which reads all my 80 emails from a file and adds them with vpopmail’s command line tool vpasswd on the mail server.

Here is the one liner shell script I’ve written to solve the task:

debian:~# while read line; do vadduser $line Email_Pass_Phrase; done < email_list_file.txt

In above’s code I’ve used the email_list_file.txt file is a text file on the server and contains list of all my 50 email addresses, where each line in the file contains one email. The Email_Pass_Phrase is actually the password I’ve set for all the new email addresses being created with vpasswd

That’s all now the 50 email addresses on the server are created and I’ve saved at least one hour of boring repeating actions in the browser 😉

Share this on