Posts Tagged ‘htmls’

How to convert file content encoded in windows-cp1251 charset to UTF-8 (with iconv) to be delivered properly encoded to browsing end clients

Wednesday, May 16th, 2012

windows-cp1251 bulgarian to UTF-8 / Encoding Communication Decoding Communication Funny Picture

I have a bunch of old html files all encoded in the historically obsolete Windows-cp1251. Windows-CP1251 used to be common used 7 years ago and therefore still big portions of the web content in Bulgarian / Russian Cyrillic is still transferred to the end users in this encoding.

This was just before the "UTF-8 revolution", where massively people started using UTF-8,
Well it was clear the specific national country text encoding standards will quickly be moved by to UTF-8 – Universal Encoding format which abbreviation stands for (Unicode Transformation Format).

Though UTF-8 was clear to be "the future", many web developers mostly because of their incompetency or using an old sources of learning how to writen in HTML continued to use windows-cp1251 in HTMLs. I'm even convinced, there are still developers out there who are writting websites for Bulgarian / Russian / Macedonian customers using obsolete encodings …

The smarter developers of those accustomed to windows-cp1251, KOI-8R etc. etc., were using the meta tag to specify the type of charset of the web page content with:

<meta http-equiv="content-type" content="text/html;charset=windows-cp1251">

or

<meta http-equiv="content-type" content="text/html;charset=koi-8r">

Anyhow, still many devs even didn't placed the windows-cp1251 in the head of the HTML …

The result for the system administrator is always a mess – a lot of webpages that are showing like unreadable signs and tons of unhappy customers.
As always the system administrator is considered responsible, for the programmer mistakes :). So instead of programmers fix their bad cooking, the admin has to fix it all!

One quick work around me as admin has applied to failing to display pages in Cyrillic using the Windows-cp1251 character encoding was to force windows-cp1251 as a default encoding for the whole virtualhost or Apache directory with Apache directives like:

<VirtualHost *:80>
ServerAdmin some_user@some_host.com
DocumentRoot /var/www/html
AddDefaultCharset windows-cp1251
ServerName the_host_name.com
ServerAlias www.the_host_name.com
....
....
<Directory>
AddDefaultCharset windows-cp1251
>/Directory>
</VirtualHost>

Though this mostly would, work there are some occasions, where only a particular html files from all the content served by Apache is encoded in windows-cp1251, if most of the content is already written in UTF-8, this could be a big issues as you cannot just change the UTF-8 globally to windows-cp1251, just because few pages are written in archaic encoding….
Since most of the content is displayed to the client by Apache (as prior explained) just fine, only particular htmls lets's ay single.html, single2.html etc. etc. are displayed with some question marks or some non-human readable "hieroglyphs".

Below is a screenshot from two pages returned to my browser in wrongly set htmls charset:

Improper Windows CP1251 encoding with Apache set to serve UTF-8 encoding questiomarks

Improper Windows CP1251 delivered page in UTF-8 browser view

Apache returns cp1251 in some non-UTF8 wrong encoding (webserver improperly served cyrillic encoding)

Improperly served encoding CP1251 delivered by Apache in non-utf-8 encoding

When this kind of issues occur, the only solution is to simply login to the server and use iconv command to convert all files returning unreadable content from whatever the non UTF-8 encoding is lets say in my case Bulgarian typeset of cp1251 to UTF-8

Here is how the iconv command to convert between windows-cp1251 to utf-8 the two sample files named single1.html and single2.html

server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single1.html > single1.html.utf8
server:/web# mv single1.html single1.html.bak;
server:/web# mv single1.html.utf8 single1.html
server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single2.html > single2.html.utf8
server:/web# mv single2.html single2.html.bak;
server:/web# mv single2.html.utf8 single2.html

I always, make copies of the original cp1251 encoded files (as you see mv single1.html single1.html.bak), because if something goes wrong with convertion I can easily revert back.

If there are 10 files with consequential numbers naming they can be converted using a short for loop, like so:

server:/web# for i $(seq 1 10); do
/usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single$i.html > single$i.html.utf8;mv single$i.html single$i.html.bak
mv single$i.html.utf8 single$i.html
done

Just as earlier mentioned if single1.html, single2.html … has in the html <head>:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

You should open, each of the files in question and wipe out the line either by hand or use sed to wipe it in one loop if it has to be done for lets say 10 files named (single{1..10})

server:/web# for i in $(seq 1 10); do
sed '/<meta http-equiv="Content-Type" content="text\/html; charset=windows-1251>/d' single$i.txt > single$i.txt.new;
mv single$i.txt single$i.txt.bak;
mv single$i.txt.new single$i.txt

Well now,

Install jwchat web chat jabber interface to work with Debian ejabberd jabber server

Wednesday, January 4th, 2012

JWChat ejabber jabber Ajax / HTML based client logo
 

I have recently blogged how I've installed & configured ejabberd (jabber server) on Debian .
Today I decided to further extend, my previous jabberd installation by installing JWChat a web chat interface frontend to ejabberd (a good substitute for a desktop app like pidgin which allows you to access a jabber server from anywhere)

Anyways for a base of installing JWChat , I used the previously installed debian deb version of ejabberd from the repositories.

I had a lot of troubles until I actually make it work because of some very minor mistakes in following the official described tutorial ejabberd website jwchat install tutorual

The only way I can make jwchat work was by using the install jwchat with ejabberd's HTTP-Bind and file server method

Actually for quite a long time I was not realizing that, there are two ways to install JWChat , so by mistake I was trying to mix up some install instructions from both jwchat HTTP-Bind file server method and JWchat Apache install method

I've seen many people complaining on the page of Install JWChat using Apache method which seemed to be experiencing a lot of strangle troubles just like the mines when I mixed up the jwchat php scripts install using instructions from both install methods. Therefore my guess is people who had troubles in installing using the Apache method and got the blank page issues while accessing http://jabber.servername.com:5280/http-poll/ as well as various XML Parsing Error: no element found errors on – http://ejabberd.oac.com:5280/http-poll/ is most probably caused by the same install instructions trap I was diluted in.

The steps to make JWChat install using the HTTP-Bind and file server method, if followed should be followed absolutely precisely or otherwise THEY WILL NOT WORK!!!

This are the exact steps I followed to make ejabberd work using the HTTP-Bind file server method :

1. Create directory to store the jwchat Ajax / htmls

debian:~# mkdir /var/lib/ejabberd/www
debian:~# chmod +x /var/lib/ejabberd
debian:~# chmod +x /var/lib/ejabberd/www

2. Modify /etc/ejabberd/ejabberd.cfg and include the following configs

While editting the conf find the section:

{listen,
[


Scrolling down you will fine some commented code marked with %% that will read:

{5269, ejabberd_s2s_in, [
{shaper, s2s_shaper},
{max_stanza_size, 131072}
]},

Right after it leave one new line and place the code:

{5280, ejabberd_http, [
{request_handlers, [
{["web"], mod_http_fileserver}
]},

http_bind,
http_poll,
web_admin
]}
]}.

Scrolling a bit down the file, there is a section which says:

%%% =======
%%% MODULES

%%
%% Modules enabled in all ejabberd virtual hosts.
%%

The section below the comments will look like so:

{modules, [ {mod_adhoc, []},
{mod_announce, [{access, announce}]}, % requires mod_adhoc
{mod_caps, []},
{mod_configure,[]}, % requires mod_adhoc
{mod_ctlextra, []},
{mod_disco, []},
%%{mod_echo, [{host, "echo.localhost"}]},
{mod_irc, []},
{mod_last, []},

After the {mod_last, … the following lines should be added:

{mod_http_bind, []},
{mod_http_fileserver, [
{docroot, "/var/lib/ejabberd/www"},
{accesslog, "/var/log/ejabberd/webaccess.log"}
]},

3. Download and extract latest version of jwchat

Of the time of writting the latest version of jwchat is jwchat-1.0 I have mirrored it on pc-freak for convenience:

debian:~# wget http://www.pc-freak.net/files/jwchat-1.0.tar.gz
….

debian:~# cd /var/lib/ejabberd/www
debian:/var/lib/ejabberd/www# tar -xzvf jwchat-1.0.tar.gz
...
debian:/var/lib/ejabberd/www# mv jwchat-1.0 jwchat
debian:/var/lib/ejabberd/www# cd jwchat

4. Choose the language in which you will prefer jwchat web interface to appear

I prefer english as most people would I suppose:

debian:/var/lib/ejabberd/www/jwchat# for a in $(ls *.en); do b=${a%.en}; cp $a $b; done

For other languages change in the small one liner shell script b=${a%.en} (en) to whatever language you will prefer to make primary.After selecting the correct langauge a rm cmd should be issued to get rid of the .js.* and .html.* in other language files which are no longer needed:

debian:/var/lib/ejabberd/www/jwchat# rm *.html.* *.js.*

5. Configure JWChat config.js

Edit /var/lib/ejabberd/www/jwchat/config.js , its necessery to have inside code definitions like:

/* If your Jabber server is jabber.example.org, set this: */
var SITENAME = "jabber.example.org";

/* If HTTP-Bind works correctly, you may want do remove HTTP-Poll here */
var BACKENDS =
[
{
name:"Native Binding",
description:"Ejabberd's native HTTP Binding backend",
httpbase:"/http-bind/",
type:"binding",
servers_allowed:[SITENAME]
}
];

6. Restart EJabberd server to load the new config settings

debian:~# /etc/init.d/ejabberd restart
Restarting jabber server: ejabberd..

7. Test JWChat HTTP-Bind and file server backend

I used elinksand my beloved Epiphany (default gnome browser) which by the way is the browser I use daily to test that the JWChat works fine with the ejabberd.
To test the newly installed HTTP-Bind ejabberd server backend on port 5280 I used URL:

http://jabber.mydomain.com:5280/web/jwchat/I had quite a struggles with 404 not found errors, which I couldn't explain for half an hour. After a thorough examination, I've figured out the reasons for the 404 errors was my stupidity …
The URL http://jabber.mydomain.com:5280/web/jwchat/ was incorrect because I fogrot to move jwchat-1.0 to jwchat e.g. (mv jwchat-1.0 jwchat) earlier explained in that article was a step I missed. Hence to access the web interface of the ejabberd without the 404 error I had to access it via:

http://jabber.mydomain.com:5280/web/jwchat-1.0

JWChat Ejabber webchat Epiphany Linux screenshot

Finally it is handy to add a small index.php redirect to redirect to http://jabber.mydomain.com:5280/web/jwchat-1.0/

The php should like so:


<?
php
header( 'Location: http://jabber.mydomain.com:5280/web/jwchat-1.0' ) ;
?>