Posts Tagged ‘content’

Linux convert and read .mht (Microsoft html) file format. MHT format explained

Thursday, June 5th, 2014

If you're using Linux as a Desktop system sooner or later you will receive an email with instructions or an html page stored in .mht file format.
So what is mht? MHT is an webpage archive format (short for MIME HTML document). MHTML saves the Web page content and incorporates external resources, such as images, applets, Flash animations and so on, into HTML documents. Usually those .mht files were produced with Microsoft Internet Explorer – saving pages through:

File -> Save As (Save WebPage) dialog saves pages in .MHT.

To open those .mht files on Linux, where Firefox is available add the UNMHT FF Extension to browser. Besides allowing you to view MHT on Linux, whether some customer is requiring a copy of an HTML page in MHT, UNMHT allows you to also save complete web pages, including text and graphics, into a MHT file.
There is also support for Google Chrome browser for MHT opening and saving via a plugin called IETAB. But unfortunately IETAB is not supported in Linux.
Anyways IETAB is worthy to mention here as if your'e a Windows users and you want to browse pages compatible only with Internet Explorer, IETAB will emulates exactly IE by using IE rendering engine in Chrome  and supports Active X Controls. IETAB is a great extension for QA (web testers) using Windows for desktop who prefer to not use IE for security reasons. IETab supports IE6, IE7, IE8 and IE9.

Another way to convert .MHT content file into HTML is to use Linux KDE's mhttohtml tool.


Another approach to open .MHT files in Linux is to use Opera browser for Linux which has support for .MHT

Note that because MHT files could be storing potentially malicious content (like embedded Malware) it is always wise when opening MHT on Windows to assure you have scanned the file with Antivirus program. Often mails containing .MHT from unknown recipients are containing viruses or malware. Also links embedded into MHT file could easily expose you to spoof attacks. MHT files are encoded in combination of plain text MIMEs and BASE64 encoding scheme, MHT's mimetype is:

MIME type: message/rfc822

Share this on

Check and Restart Apache if it is malfunctioning (not returning HTML content) shell script

Monday, March 19th, 2012

Check and Restart Apache Webserver on Malfunction, Apache feather logo

One of the company Debian Lenny 5.0 Webservers, where I'm working as sys admin sometimes stops to properly server HTTP requests.
Whenever this oddity happens, the Apache server seems to be running okay but it is not failing to return requested content

I can see the webserver listens on port 80 and establishing connections to remote hosts – the apache processes show normally as I can see in netstat …:

apache:~# netstat -enp 80
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 SYN_RECV 0 0 -
tcp 0 0 SYN_RECV 0 0 -

Also the apache forked child processes show normally in process list:

apache:~# ps axuwwf|grep -i apache
root 46748 0.0 0.0 112300 872 pts/1 S+ 18:07 0:00 \_ grep -i apache
root 42530 0.0 0.1 217392 6636 ? Ss Mar14 0:39 /usr/sbin/apache2 -k start
www-data 42535 0.0 0.0 147876 1488 ? S Mar14 0:01 \_ /usr/sbin/apache2 -k start
root 28747 0.0 0.1 218180 4792 ? Sl Mar14 0:00 \_ /usr/sbin/apache2 -k start
www-data 31787 0.0 0.1 219156 5832 ? S Mar14 0:00 | \_ /usr/sbin/apache2 -k start

In spite of that, in any client browser to any of the Apache (Virtual hosts) websites, there is no HTML content returned…
This weird problem continues until the Apache webserver is retarted.
Once webserver is restarted everything is back to normal.
I use Apache Check Apache shell script set on few remote hosts to regularly check with nmap if port 80 (www) of my server is open and responding, anyways this script just checks if the open and reachable and thus using it was unable to detect Apache wasn't able to return back HTML content.
To work around the malfunctions I wrote tiny script –

The scripts idea is very simple;
A request is made a remote defined host with lynx text browser, then the output of lines is counted, if the output returned by lynx -dump is less than the number returned whether normally invoked, then the script triggers an apache init script restart.

I've set the script to periodically run in a cron job, every 5 minutes each hour.
# check if apache returns empty content with lynx and if yes restart and log it
*/5 * * * * /usr/sbin/ >/dev/null 2>&1

This is not perfect as sometimes still, there will be few minutes downtime, but at least the downside will not be few hours until I am informed ssh to the server and restart Apache manually

A quick way to download and set from cron execution my script every 5 minutes use:

apache:~# cd /usr/sbin
apache:/usr/sbin# wget -q
apache:/usr/sbin# chmod +x
apache:/usr/sbin# crontab -l > /tmp/file; echo '*/5 * * * *' /usr/sbin/ 2>&1 >/dev/null


Share this on

How to disable tidy HTML corrector and validator to output error and warning messages

Sunday, March 18th, 2012

I've noticed in /var/log/apache2/error.log on one of the Debian servers I manage a lot of warnings and errors produced by tidy HTML syntax checker and reformatter program.

There were actually quite plenty frequently appearing messages in the the log like:

To learn more about HTML Tidy see
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to
HTML and CSS specifications are available from
Lobby your company to join W3C, see
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!

I did a quick investigation on where from this messages are logged in error.log, and discovered few .php scripts in one of the websites containing the tidy string.
I used Linux find + grep cmds find in all php files the "tidy "string, like so:

server:~# find . -iname '*.php'-exec grep -rli 'tidy' '{}' \;
find . -iname '*.php' -exec grep -rli 'tidy' '{}' \; ./new_design/modules/index.mod.php

Opening the files, with vim to check about how tidy is invoked, revealed tidy calls like:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

As you see the PHP programmers who wrote this website, made a bigtidy mess. Instead of using php5's tidy module, they hard coded tidy external command to be invoked via php's exec(); external tidy command invocation.
This is extremely bad practice, since it spawns the command via a pseudo limited apache shell.
I've notified about the issue, but I don't know when, the external tidy calls will be rewritten.

Until the external tidy invocations are rewritten to use the php tidy module, I decided to at least remove the tidy warnings and errors output.

To remove the warning and error messages I've changed:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

exec('/usr/bin/tidy --show-warnings no --show-errors no -q -e -ashtml -utf8 '.$tmp_name,$rett);

The extra switches meaning is like so:

q – instructs tidy to produce quiet output
-e – show only errors and warnings
–show warnings no && –show errors no, completely disable warnings and error output

Onwards tidy no longer logs junk messages in error.log Not logging all this useless warnings and errors has positive effect on overall server performance especially, when the scripts, running /usr/bin/tidy are called as frequently as 1000 times per sec. or more

Share this on

Why and How facebook profits because of you? – People, The real facebook investors

Tuesday, March 6th, 2012

Facebook people real facebook investors, facebook profits because of you / facebook greedy money logo

Facebook is usually praised and very seldom criticized. I've seen already on a couple of occasions on the TV channel news on earthquake occasions or some kind of other calamities, where facebook was said to help the rescuing teams etc. We constantly hear how facebook has helped people point their location in disastrous situations or just helping people organize a protest against a harmful company activity. Whilst this might be true, the harms it does are quite big as well. A primary harm it does is to economy as we know it. As people are engaged in filling in Mark Zuckerberg and facebook investors pockets, they rarely think about how actually facebook gets their money?

Let me explain:

Basicly facebook makes money out of its constantly increased social network data content. This could have not been possible without the 800 000 000+ million of people who constantly post updates on facebook, create groups, post pictures, add likes, comment and post links to other facebook pages. If people had not all this volunteers (facebook users) to post all this bunch of mostly junky information, facebook inc. would not have a penny. Therefore what makes facebook grow is the people itself who willingly choose to be part of this money making machine. One would think with regular company the investors are the owners of the company shares. This classical business model is not facebook model, there it is rather different as the real investors in facebook are not the capital shareholders but the regular social network user base – this means (you and me)!.

For all those who still don't get what I'm talking about I will shortly explain.
Everyone who has a basic idea on how internet advertising works is aware that the primary origin for facebook todays profit is the left pane sky scraper field with ever changing advertisements.

Various advertisers pays facebook all the time big money for displaying those stupid advertisement. As many peole are viewing and clicking the advertisements, facebook makes billions out if its advertisers.

So far so good, facebook generates its profits out of peoples free time and delibarately information sharing you would say and you might argue me that facebook steals people (time / money). This would have been true if you don't put in the picture for a contrast, a regular blogger, who makes its daily living out of blogging.
What a regular blogger does is frequent blogging on various kind of topics of his interest. Various bloggers blog at various titles, but most of them has a few major topics which they're following.

The more articles a blogger collects and the higher the uniqueness of this information is the bigger the probability this blog to have a good users base and the more interesting content it will have for search engine robots like Google Bot Crawlers or Yahoo Bot etc. etc.
With all priod said, the higher the probability this blog to have more traffic drawn from web searches to the blog. As the blogger content increases with time when it gets 10000 or more unique articles (pages), consequently it can be used as an advertising place. A 10000 pages blog could earn a person a few hundred of euros (200, 300 EUR) per month.

Well the business scheme behind facebook is exactly the same, except they store and physically own the data of the facebook registered persons. The user posts content on his facebook wall, makes pages or does various activities which generate pages, the content gets indexed in Google and with time the overall facebook website content grows. As new users joins facebook with the increased popularity of website. The website is growing exponentially like in a atoms chain reaction.

Because of this steady content growth, it becomes an interesting place not only for advertisers but for all kind of people that use the internet.
And there you have the monetarization facebook makes billions of dollars every second because of you.  This is the shocking truth, they get their money because people click or view advertisement on each others profile, so there you're YOU make the little people who develop facebook and the original investors richer and richer with every day, where you make yourself poorer and poorer by investing your personal time in facebook instead of using it to work on something that will potentially generate you some dividents in short or long future.

Actually social network is nothing more than just a multiple blogging platform, but some marketing person come with this marketing hype work "social network".
The social network buzz word is in my view just another big marketing "white lie"!. Correct me if i'm wrong but what in fact is a "Social network?". I don't see facebook neither as social, network as network. I don't know about you but I have never made a long lasting friend or relationship using facebook so far. I think the poor Facebook creator Zuckerberg made facebook with a viral mindset. He intended it to be like a social virus and so far he succeed pretty much. I just wait and eager to see who will start the anti-virus for Zuckerberg's (facebook) – people time eating virus. 

Share this on

How to set up Qmail auto reply (Out of the Office), vacation message manually using .qmail message processing file

Tuesday, February 14th, 2012

Qmail Logo Auto reply message / how to setup qmail auto reply out of the office vacation message

I had to setup a QMAIL auto reply (Out of the Office) message on 5 email addresses and since I haven't done it for a long time it took me a couple 20 minutes to consult Qmail (Life With Qmail (great website!) documentation and read a couple of online forum threads until I finally remembered, how I used to be setting up a vacation message manually via qmail's .qmail file.

Of course Setting qmail auto reply can always be done via QmailAdmin or VQadmin ..Qmail Vpopmail web frontends however on many Qmail mail servers Qmailadmin or/and VQadmin is absent due to some reason or even on a big mail servers the server doesn't run Apache at all. Hence it is good to know how to set qmail vacation message directly via plain SSH terminal connection and this is why how this article got born.

So here is how I enable qmail auto reply "manually", through .qmail for my email address

1. Set a /var/vpopmail/domains/ file with the following content:

| /usr/bin/autorespond 86400 3 /home/vpopmail/domains/ /home/vpopmail/domains/

2. Create /home/vpopmail/domains/ directory

linux:~# mkdir -p /home/vpopmail/domains/

3. Create /home/vpopmail/domains/ file with auto reply message

First create the message file with touch command:

linux:~# touch /home/vpopmail/domains/

Then put with vim or mcedit etc. an auto-reply vacation message similar to the sample below:

Subject: We have received your message. Thank you!

Dear Customer, we thank you for the interest in our services.
A member of our team will reply promptly to your enquiry shortly.

4. Set proper permissions for vacation/message and .qmail files

/home/vpopmail/domains/ and /home/vpopmail/domains/ files has to be owned by user/group vpopmail:vchkpw, e.g.:

linux:~# chown -R vpopmail:vchkpw /home/vpopmail/domains/
linux:~# chown vpopmail:vchkpw /home/vpopmail/domains/

If you are a qmail administration with the requirement to create auto reply message for employees going on a holiday often (in a middle sized company office), setting up the out of the office auto reply manually one by one is a time consuming, annoying task and "crazy" task. Therefore some time ago while still I was employed in a Bulgarian mid-sized company called Design.BG, I've written a tiny shell script which creates qmail email users vacation messages by passing few arguments.

Here is my shell script
Note that this script might have a lot of bugs and is not much tested, so read it carefully and test it before you put it for daily use 😉
Happy Hacking! 😉

Share this on

How to make wordpress Update Plugins prompt to permanently store password / Get rid of annoying updates wordpress prompt

Thursday, February 2nd, 2012

I'm managing few wordpress installations which requires me to type in:
Hostname , FTP Username and FTP Password , every single time a plugin update is issued and I want to upgrade to the new version.
Below is a screenshot of this annoying behaviour:

How to get rid of update plugins wordpress username password prompt

As you can see in the above screenshot, there is no way through Update Plugins web interface to store the password permanently. Hence the only option to store it permanently is to manually edit wp-config.php (file located in wordpress docroot, e.g. /path/to/wordpress/wp-config.php , inside the file find the line:

define ('WPLANG', '');

Right after it put a code similar to:

define('FS_METHOD', 'ftpsockets');
define('FTP_BASE', '/path/to/wordpress/');
define('FTP_CONTENT_DIR', '/path/to/wordpress/wp-content/');
define('FTP_PLUGIN_DIR ', '/path/to/wordpress/wp-content/plugins/');
define('FTP_USER', 'Username');
define('FTP_PASS', 'Password');
define('FTP_HOST', 'localhost');

Change the above defines:
path/to/wordpress/ – with your wordpress location directory.
Username and Password – with your respective FTP username and password. The localhost

That's all, from now onwards the User/Password prompt will not appear anymore. Consider there is a security downside of storing the FTP User/Pass in wp-config.php , if someone is able to intrude the wordpress install and access the documentroot of the wordpress install he we'll be able to obtain the ftp user/pass and log in the server directly via FTP protocol.

Share this on

How to convert html pages to text in console / terminal on GNU / Linux and FreeBSD

Thursday, December 8th, 2011

HTML to Plain Text Convertion on GNU / Linux and FreeBSD

I’m realizing the more I’m converting to a fully functional GUI user, the less I’m doing coding or any interesting stuff…
I remembered of the old glorious times, when I was full time console user and got a memory on a nifty trick I was so used to back in the day.
Back then I was quite often writing shell scripts which were fetching (html) webpages and converting the html content into a plain TEXT (TXT) files

In order to fetch a page back in the days I used lynx(a very simple UNIX text browser, which by the way lacks support for any CSS or Javascipt) in combination with html2text – (an advanced HTML-to-text converter).

Let’s say I wanted to fetch a my personal home page, I did that via the command:

$ lynx -source | html2text > pcfreak_page.txt

The content from got spit by lynx as an html source and passed html2pdf wchich saves it in plain text file pcfreak_page.txt
The bit more advanced elinks – (lynx-like alternative character mode WWW browser) provides better support for HTML and even some CSS and Javascript so to properly save the content of many pages in plain html file its better to use it instead of lynx, the way to produce .txt using elinks files is identical, e.g.:

$ elinks -source | html2text > pcfreak_blog_page.txt

By the way back in the days I was used more to links , than the superior elinks , nowdays I have both of the text browsers installed and testing to fetch an html like in the upper example and pipe to html2text produced garbaged output.

Here is the time to tell its not even necessery to have a text browser installed in order to fetch a webpage and convert it to a plain text TXT!. wget file downloading tools supports source dump as well, for all those who did not (yet) tried it and want to test it:

$ wget -qO- | html2text Anyways of course, some pages convertion of text inside HTML tags would not properly get saved with neither lynx or elinks cause some texts might be embedded in some elinks or lynx unsupported CSS or JavaScript. In those cases the GUI browser is useful. You can use any browser like Firefox, Epiphany or Opera ‘s File -> Save As (Text Files) embedded functionality, below is a screenshot showing an html page which I’m about to save as a plain Text File in Mozilla Firefox:

Firefox iceWeasel Opera etc. save html webpage as plain text on GNU / Linux, FreeBSD

Besides being handy in conjunction with text browsers, html2text is also handy for converting .html pages already existing on the computer’s hard drive to a plain (.TXT) text format.
One might wonder, why would ever one would like to do that?? Well I personally prefer reading plain text documents instead of htmls 😉
Converting an html files already existing on hard drive with html2text is done with cmd:

$ html2text index.html >index.txt

To convert a whole directory full of .html (documentation) or whatever files to plain text .TXT , cd the directory with HTMLs and issue the one liner bash loop command:

$ cd html/
html$ for i in $(echo *.html); do html2text $i > $(echo $i | sed -e 's#.html#.txt#g'); done

Now lay off your back and enjoy reading the dox like in the good old hacker days when .TXT files were fashionable 😉

Share this on

How to test if imap and pop mail server service is working with Telnet cmd

Monday, August 8th, 2011

I’ve recently built new mail qmail server with vpopmail to serve pop3 connectins and courierimap and courierimaps to take care for IMAP IMAPS.

I further used telnet to test if the Linux server pop3 service on (110) and imap on (143) worked fine, straight after the completed qmail install.
Here is how to test mail server with vpopmail listening for connections on pop3 port :

debian:~# telnet 110
Trying 111.222.333.444...
Connected to
Escape character is '^]'.
+OK <>
PASS here_goes_my_secret_pass
1 309783
2 64053
3 2119
4 64357
5 317893
My first mail content retrieved with RETR commandgoes here ....
Connection closed by foreign host.

You see I have 5 messages in my mailbox, as you can see I used RETR command to check the content of my mail, this is handy as I can read my mails straight with telnet (if the mail is in plain text), of course it’s a bit more complicated if I have to read encrypted or html mail, though still its easy to write a tiny parser and pipe the content produced by telnet command to lynx or some other text based browser.

Now another sys admin handy tip is the use of telnet to check my mail servers IMAP servers is correctly operating.
Here is how:

debian:~# telnet 143
Trying 111.222.333.444...
Connected to localhost.
Escape character is '^]'.
01 LOGIN here_goes_my_secret_pass
02 LIST "" *
* LIST (Unmarked HasNoChildren) "." "INBOX"
02 OK LIST completed
* FLAGS (Draft Answered Flagged Deleted Seen Recent)
* OK [PERMANENTFLAGS (* Draft Answered Flagged Deleted Seen)] Limited
* OK [UIDVALIDITY 1312746907] Ok
* OK [MYRIGHTS "acdilrsw"] ACL
04 OK STATUS Completed.
As you can see according to standard to send commands to IMAP server from console after a telnet connection you will have to always include a command line number like 01, 02, 03 .. etc.

Using such a line numbering is not obligitory and also letters like A, B, C could be use still line numbering with numbers is generally a good idea since it’s easier for reading on the screen.

Now line 02 shows you available mailboxes, line 03 SELECT INBOX selects the imap Inbox to be further operated with, 04 STATUS INBOX cmd displays status about current mailboxes in folder.
FETCH 1 ALL instructs the imap server to get list of all IMAP message headers. Next command in line 05 FETCH 1 BODY will display the message body of the first message in list.
The 07 FETCH 1 ENVELOPE will display the mail headers for the 1 message.

Few other IMAP commands which might be helpfun on connection are:


First one would fetch complete content of a message numbered one from the imap server and the second one 09 FETCH * FULL will get all the mail content for all messages located on the remote IMAP server.

The STATUS command aforementioned earlier could take the following list of arguments:


These commands are a gold mine for me as a sysadmin as it helps quickly solve problems, hope they would help to somebody out there as well 😉
This way is a way shorter than bothering each time to check, if some customer e-mail account is improperly configured by creating setting up a new account in Thunderbird.

Share this on

How to configure networking in CentOS, Fedora and other Redhat based distros

Wednesday, June 1st, 2011

On Debian Linux I’m used to configure the networking via /etc/network/interfaces , however on Redhat based distributions to do a manual configuration of network interfaces is a bit different.

In order to configure networking in CentOS there is a special file for each interface and some values one needs to fill in to enable networking.

These network adapters configuration files for Redhat based distributions are located in the files:


Just to give you and idea on the content of this network configuration file, here is how it looks like:

[root@centos:~ ]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Broadcom Corporation NetLink BCM57780 Gigabit Ethernet PCIe

This configuration is of course just for eth0 for other network card names and devices, one needs to look up for the proper file name which corresponds to the network interface visible with the ifconfig command.
For instance to list all network interfaces via ifconfig use:

[root@centos:~ ]# /sbin/ifconfig |grep -i 'Link encap'|awk '{ print $1 }'

In this case there are only two network cards on my host.
The configuration files for the ethernet network devices eth0 and eth1 from below example are located in files /etc/sysconfig/network-scripts/ifcfg-eth{1,2}

/etc/sysconfig/network-scripts/ directory contains plenty of shell scripts related to Fedora networking.
This directory contains actually the networking boot time load up rules for fedora and CentOS hosts.

The complete list of options available which can be used in /etc/sysconfig/network-scripts/ifcfg-ethx is located in:

, to quickly observe the documentation:

[root@centos:~ ]# less /usr/share/doc/initscripts-*/sysconfig.txt

One typical example of configuring a CentOS based host to possess a static IP address ( and a gateway (, which will be assigned in boot time during the /etc/init.d/network is loaded is:

[root@centos:~ ]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Broadcom Corporation NetLink BCM57780 Gigabit Ethernet PCIe

After some changes to the network configuration files are made, to load up the new rules a /etc/init.d/network script restart is necessery with the command:

[root@centos:~ ]# /etc/init.d/network restart

Of course one can always use /etc/rc.local script as universal way to configure network rules on a Redhat based host, however using methods like rc.local to load up, ifconfig or route rules in a Fedora would break the distribution logic and therefore is not recommended.

There is also a serious additional reason against using /etc/rc.local post init commands load up script.
If one uses rc.local to load up and configure the networing, the network will get initialized only after all the other scripts in /etc/init.d/ gets started.

Therefore using /etc/rc.local might also be DANGEROUS!, if used remotely via (ssh), supposedly it might completely fail to load the networking, if all bringing the server interfaces relies on it.

Here is an example, imagine that some of the script set in to load up during a CentOS boot up hangs and does continue to load forever (for example after some crucial software upate), as a consequence the /etc/rc.local script will never get executed as it only starts up after all the rest init scripts had succesfully completed execution.

A network eth1 interface configuration for a Fedora host which has to fetch it’s network settings automatically via DHCP is as follows:

[root@fedora:/etc/network:]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Intel Corporation 82557/8/9 [Ethernet Pro 100]DEVICE=eth1

To sum it up I think Fedora’s /etc/sysconfig/network-scripts methodology to configure ethernet devices is a way inferior if compared to Debian.

In GNU/Debian Linux configuration of all networking is (simpler)!, everything related to networking is in one single file ( /etc/network/interfaces ), moreover getting all the thorough documentation for the network configurations options for the interfaces is available as a system wide manual (e.g. man interfaces).

Partially Debian interfaces configuration is a bit more complicated in terms of syntax if matched against Redhat’s network-scripts/ifcfg-*, lest that generally I still find Debian’s manual network configuration interface to be easier to configure networking manually vicommand line.

Share this on