Posts Tagged ‘charset’

How to convert file content encoded in the Windows-1251 (CP1251) charset to UTF-8 with iconv, so it is delivered properly encoded to browsers

Wednesday, May 16th, 2012


I have a bunch of old HTML files, all encoded in the historically obsolete Windows-1251 (also known as CP1251). Windows-1251 was in common use about 7 years ago, so a large share of the Bulgarian / Russian Cyrillic web content is still delivered to end users in this encoding.

This was just before the "UTF-8 revolution", when people started using UTF-8 en masse.
It was already clear that the country-specific text encoding standards would quickly be replaced by UTF-8, the universal encoding whose abbreviation stands for Unicode Transformation Format (8-bit).

Though it was clear that UTF-8 was "the future", many web developers continued to use windows-1251 in their HTML, mostly out of incompetence or because they learned to write HTML from outdated sources. I'm even convinced there are still developers out there who are writing websites for Bulgarian / Russian / Macedonian customers using obsolete encodings …

The smarter developers among those accustomed to windows-1251, KOI8-R etc. used the meta tag to specify the charset of the web page content with:

<meta http-equiv="content-type" content="text/html;charset=windows-1251">

or

<meta http-equiv="content-type" content="text/html;charset=koi8-r">

Still, many devs didn't even place the windows-1251 charset declaration in the head of the HTML …

The result for the system administrator is always a mess: a lot of web pages showing unreadable signs and tons of unhappy customers.
As always, the system administrator is considered responsible for the programmers' mistakes :). So instead of the programmers fixing their bad cooking, the admin has to fix it all!

One quick workaround I have applied as an admin for Cyrillic pages in the Windows-1251 character encoding that fail to display was to force windows-1251 as the default encoding for the whole VirtualHost or for an Apache directory, with directives like:

<VirtualHost *:80>
ServerAdmin some_user@some_host.com
DocumentRoot /var/www/html
AddDefaultCharset windows-1251
ServerName the_host_name.com
ServerAlias www.the_host_name.com
....
....
<Directory /var/www/html>
AddDefaultCharset windows-1251
</Directory>
</VirtualHost>

Though this mostly works, there are occasions where only a few particular HTML files out of all the content served by Apache are encoded in windows-1251. If most of the content is already written in UTF-8, this becomes a big issue, as you cannot just switch the default from UTF-8 to windows-1251 globally because a few pages are written in an archaic encoding.
In that case Apache displays most of the content to the client just fine, and only particular HTML files, let's say single1.html, single2.html etc., are displayed with question marks or unreadable "hieroglyphs".
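To see which charset Apache actually declares for a given page, it helps to look at the Content-Type response header. Below is a minimal sketch of how that could be done; the function name charset_from_headers is my own (hypothetical) and the usage line assumes curl is installed.

```shell
#!/bin/sh
# Print the charset declared in the Content-Type header of an HTTP response.
# Reads raw response headers on stdin; prints nothing if no charset is declared.
# charset_from_headers is a hypothetical helper name, not an existing tool.
charset_from_headers() {
    tr -d '\r' |
    sed -n 's/^[Cc]ontent-[Tt]ype:.*charset=\([A-Za-z0-9_-]*\).*/\1/p' |
    head -n 1
}

# Usage (www.the_host_name.com is the placeholder host from the config above):
#   curl -sI http://www.the_host_name.com/single1.html | charset_from_headers
```

If the printed charset is UTF-8 while the file on disk is still in windows-1251, the browser will show exactly the kind of garbage described above.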

Below are screenshots of two pages returned to my browser with a wrongly set HTML charset:

[Screenshot: Windows CP1251 page served by Apache as UTF-8, Cyrillic text rendered as question marks]

[Screenshot: CP1251 page delivered by Apache in a wrong non-UTF-8 encoding, Cyrillic text rendered as unreadable characters]

When this kind of issue occurs, the simplest solution is to log in to the server and use the iconv command to convert all files returning unreadable content from whatever the non-UTF-8 encoding is (in my case the Bulgarian cp1251) to UTF-8.

Here is how to use the iconv command to convert two sample files, named single1.html and single2.html, from windows-1251 to UTF-8:

server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single1.html > single1.html.utf8
server:/web# mv single1.html single1.html.bak;
server:/web# mv single1.html.utf8 single1.html
server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single2.html > single2.html.utf8
server:/web# mv single2.html single2.html.bak;
server:/web# mv single2.html.utf8 single2.html

I always make copies of the original cp1251 encoded files (as you see with mv single1.html single1.html.bak), because if something goes wrong with the conversion I can easily revert.

If there are 10 files named with consecutive numbers, they can be converted using a short for loop, like so:

server:/web# for i in $(seq 1 10); do
/usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single$i.html > single$i.html.utf8
mv single$i.html single$i.html.bak
mv single$i.html.utf8 single$i.html
done
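The loop above assumes sequentially numbered files that are all still in cp1251. A slightly more defensive variant could look like the sketch below. The function names is_valid_utf8 and convert_cp1251_file are my own (hypothetical), and the check relies on the assumption that iconv exits with an error when its input is not valid in the source encoding. Running the conversion twice on the same file would garble it, so files that already decode as UTF-8 are skipped.

```shell
#!/bin/sh
# Succeeds if the file decodes cleanly as UTF-8 (so it was probably
# converted already, or contains only plain ASCII).
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Convert one file from windows-1251 to UTF-8 in place, keeping FILE.bak.
# Files that already are valid UTF-8 are left untouched.
convert_cp1251_file() {
    if is_valid_utf8 "$1"; then
        echo "skip (already valid UTF-8): $1"
        return 0
    fi
    # Write to a temporary name first; remove it if the conversion fails.
    iconv -f WINDOWS-1251 -t UTF-8 "$1" > "$1.utf8" || { rm -f "$1.utf8"; return 1; }
    mv "$1" "$1.bak" && mv "$1.utf8" "$1"
}

# Usage: convert every .html file in the current directory
#   for f in *.html; do convert_cp1251_file "$f"; done
```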

As mentioned earlier, single1.html, single2.html … may have this line in the HTML <head>:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

If so, you should open each of the files in question and wipe out the line by hand, or use sed to wipe it in one loop if it has to be done for, let's say, 10 files named single{1..10}.html:

server:/web# for i in $(seq 1 10); do
# use a separate backup suffix so the earlier cp1251 .bak copy is not overwritten
sed '/<meta http-equiv="Content-Type" content="text\/html; charset=windows-1251">/d' single$i.html > single$i.html.new
mv single$i.html single$i.html.nometa.bak
mv single$i.html.new single$i.html
done
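Deleting the line is what I did, but an alternative is to keep the meta tag and only rewrite the declared charset, so each page still describes its own encoding. The following is a rough sketch of that idea, not the method used above; fix_meta_charset and the .meta.bak suffix are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Rewrite a windows-1251 charset declaration to utf-8 in one HTML file.
# The previous version is kept as FILE.meta.bak (hypothetical suffix, chosen
# so that an earlier FILE.bak backup of the cp1251 original is left alone).
fix_meta_charset() {
    # Bracket expressions match "windows" in any letter case without
    # depending on sed extensions.
    sed 's/charset=[Ww][Ii][Nn][Dd][Oo][Ww][Ss]-1251/charset=utf-8/' "$1" > "$1.new" || return 1
    mv "$1" "$1.meta.bak" && mv "$1.new" "$1"
}

# Usage:
#   for i in $(seq 1 10); do fix_meta_charset "single$i.html"; done
```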

Well now,

For the School-examination

Thursday, January 31st, 2008

Tell me, which idiotic government would create a site based on PHP and run the server under Windows?

Just guess: ours. The Bulgarian Ministry of Science and Knowledge has started a new site dedicated to helping graduating school pupils with the upcoming school examination they have to take.

It's pretty easy to see, just observe:

jericho% telnet zamaturite.bg 80
Trying 212.122.183.208...
Connected to zamaturite.bg.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Connection: close
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Date: Wed, 30 Jan 2008 19:10:18 GMT
Content-Type: text/html; charset=UTF-8
Server: Apache/2.2.6 (Win32) PHP/5.2.5
X-Powered-By: PHP/5.2.5
Set-Cookie: PHPSESSID=fn5jtjbet7clrapi0a5e5kgvt7; path=/
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Keep-Alive: timeout=5, max=100
Connection closed by foreign host.
jericho%

Just great: our Bulgarian government spends money on buying a proprietary OS to run a Free Software based solution.

This is a pretty good example of what our country looks like. Sad …


How to change users quota to NO QUOTA on Qmail with Vpopmail Mail server install / Qmail mail over quota issue

Monday, February 20th, 2012

 


A couple of mailboxes on one of the qmail powered mail servers I administer have already run into an over quota problem.

Filling up the mailbox quota is not nice, as mails start getting bounced back to the sender with a QUOTA FULL or EXCEEDED message. If a crucial mail with important data was expected, the data is never received.
Below is a copy of the mail quota warning notification message:

Delivered-To: email_user@my-mail-domain.net
Date: Wed, 15 Feb 2012 17:40:36 +0000
X-Comment: Rename/Copy this file to ~vpopmail/domains/.quotawarn.msg, and make appropriate changes
X-Comment: See README.quotas for more information
From: Mail Delivery System <Mailer-Daemon@different.bg>
Reply-To: email@www.pc-freak.net
To: Valued Customer:;
Subject: Mail quota warning
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
>
Your mailbox on the server is now more than 90% full. So that you can continue
to receive mail you need to remove some messages from your mailbox.

As you can read from the copy of the mail message above, the message content sent to the mail owner whose quota is getting full is read from /var/vpopmail/domains/.quotawarn.msg

The quota-reached problem is most likely to appear when a low mailbox quota is set, but it sometimes also occurs due to bugs in vpopmail's quota handling.

Various interesting configuration settings for mail quotas etc. are in the /home/vpopmail/etc/vlimits.default file (assuming vpopmail is installed in /home).

In my specific case, the default vpopmail mailbox quota size was set to only 40 megabytes.
40MB is too low compared to today's mailbox size standards, which in the Gmail and Yahoo mail services are already a couple of gigabytes.
Hence, to get around the quota troubles, I removed the quota for the mailbox.
To remove the quota set in vpopmail for the address email_user@my-mail-domain.net, I used the command:

qmail-server:~# vmoduser -q NOQUOTA email_user@my-mail-domain.net

To save myself from future quota issues, I decided to apply a permanent fix for all those over quota vpopmail mailbox problems by completely removing the quota restriction for all mailboxes in my existing vpopmail mail domain.

To do so, I wrote a quick simple bash loop one-liner script:

qmail-server:~# cd /home/vpopmail/domains
qmail-server:~/vpopmail/domains# cd my-mail-domain.net
qmail-server:~/vpopmail/domains/my-mail-domain.net# for i in */; do \
vmoduser -q NOQUOTA "${i%/}@my-mail-domain.net"; \
done
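Instead of relying on the directory listing, the list of mailboxes could also be taken from the domain's vpasswd file. The sketch below assumes vpasswd is a passwd-style text file whose first colon-separated field is the mailbox user name; please verify that on your own install before using it. The function name list_vpasswd_addresses is hypothetical.

```shell
#!/bin/sh
# Print user@domain for every account listed in a vpasswd file.
# Assumption: passwd-style lines, first colon-separated field is the user name.
# $1 = path to the vpasswd file, $2 = mail domain name
list_vpasswd_addresses() {
    awk -F: -v domain="$2" '$1 != "" { print $1 "@" domain }' "$1"
}

# Usage: review the list first, then feed it to vmoduser
#   list_vpasswd_addresses /home/vpopmail/domains/my-mail-domain.net/vpasswd my-mail-domain.net
```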

This works only on vpopmail installations configured to store the mailboxes directly on the filesystem. The approach will not work as-is for people who configured vpopmail at install time to store mailboxes in MySQL or another SQL database engine.

For vpopmail installed with an SQL backend, the script can be changed to read a list of all the mailboxes obtained from the database (with an SQL query) and then loop over the mail addresses, applying vmoduser -q NOQUOTA mail@samplemaildomain.net to each one.
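A minimal sketch of such a loop is below. It reads one address per line on standard input, so it does not care where the list comes from. The function name remove_quota_for_addresses and the DRY_RUN variable are hypothetical names of my own, and the SQL query that produces the list is left out because table and column names depend on how vpopmail was built.

```shell
#!/bin/sh
# Read one mailbox address per line on stdin and remove its quota.
# With DRY_RUN=1 the vmoduser commands are only printed, not executed.
remove_quota_for_addresses() {
    while IFS= read -r addr; do
        # Skip blank lines and anything that does not look like user@domain.
        case "$addr" in
            ?*@?*) ;;
            *) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "vmoduser -q NOQUOTA $addr"
        else
            # </dev/null keeps vmoduser from consuming the address list.
            vmoduser -q NOQUOTA "$addr" < /dev/null
        fi
    done
}

# Usage (addresses.txt is a hypothetical file exported from the database):
#   DRY_RUN=1 remove_quota_for_addresses < addresses.txt
#   remove_quota_for_addresses < addresses.txt
```

Doing a dry run first makes it easy to check the address list before any quota is actually changed.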

I've also written a short shell script (remove_vpopmail_emails_domain_quota.sh). It accepts one argument, the vpopmail domain for which the admin would like to reset all applied mailbox quotas. The script is useful if you often have to remove all quotas for vpopmail domains, or have to wipe out quotas simultaneously for multiple email domains located on different servers.