Have you ever wondered how you could check your website for broken links? Because I did!
You might wonder why you should care so much about broken links.
Well, it’s simple: broken links on your pages influence how Google indexes your website and how often it bothers to crawl it. In other words, keeping an eagle eye on your broken links is vital for every self-respecting web developer, as well as for system administrators.
From a web development perspective, it’s important that your website has as few 404 error pages as possible, since that matters for the W3C compliance of your pages.
On the other hand, if you’re an SEO specialist, having as few broken links as possible on your domains is vital for Google PageRank, Yahoo, Live.com, Altavista, Yandex etc., as well as for good search engine indexing in general.
Having said all that, you should already feel the topic is really interesting. I believe not many people have written about it online.
That’s why I decided to share with you a possible way to track down broken pages on your web domain on Linux and possibly other Unix compatible systems.
There are plenty of tools available for finding the broken links on your website using the Linux operating system.
I used apt-cache to look for link checker software for Linux:
noah:/home/hipo/Desktop# apt-cache search 'link check'
htcheck-php - Simple php interface to database generated by ht://Check
htcheck - Utility for checking web site for dead/external links
linkchecker - check websites and HTML documents for broken links
linklint - A fast link checker and web site maintenance tool
The 2 broken link reporting tools I used were:
1. linkchecker
and
2. htcheck
I’ll evaluate both and share my impressions of these two really valuable broken link checking tools for Linux.
Let me begin with a short introduction to what you can expect from linkchecker as a broken link (error 404) checker.
Here are the linkchecker features:
recursive and multithreaded checking
output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
restriction of link checking with regular expression filters for URLs
proxy support
username/password authorization for HTTP and FTP and Telnet
honors robots.txt exclusion protocol
Cookie support
HTML and CSS syntax check
Antivirus check
a command line interface
a GUI client interface
a (Fast)CGI web interface (requires HTTP server)
Luckily linkchecker has a Debian package, so installing it is as easy as executing:
root@noah:~# apt-get install linkchecker linkchecker-gui
However, at the present moment on Debian Sid (Testing/Unstable) linkchecker-gui is missing some libqt and python dependencies,
so I was not able to install and test the graphical user interface for linkchecker. Anyway, here is a screenshot of the linkchecker GUI to give you a glimpse of what to expect if you succeed in installing it on Mac OS X or some other operating system.

Using linkchecker’s command line interface is really straightforward: you just have to invoke the linkchecker command and pipe its output to the tee shell command.
Here is how:
root@noah:~# linkchecker http://www.pc-freak.net/ | tee -a pc-freak.net-broken-links-linkchecker.log
Simple as it looks at first, the linkchecker manual reveals quite a few interesting usage parameters, so be sure to take a look at it as well.
Of course it might be wise to combine linkchecker with some bash scripting in order to periodically review your website or websites for broken links.
I intend to do that in the coming days, so if I write a script that uses linkchecker and makes the search for broken links easier I’ll post it on the blog.
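Until then, a periodic check along those lines could be sketched roughly as follows. This is only an illustration and not a finished tool: SITES, LOG_DIR and the function names are made up for the example, and it relies on linkchecker returning a non-zero exit status when it finds invalid links.

```shell
#!/bin/sh
# Sketch: run linkchecker against several sites and keep one dated log per site.
# SITES and LOG_DIR are hypothetical settings; adjust them to your own setup.
SITES=${SITES:-"http://www.pc-freak.net/"}
LOG_DIR=${LOG_DIR:-"${HOME:-/tmp}/linkchecker-logs"}

# Reduce a URL to its host part, so it can be used in a file name.
url_to_host() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|/.*$||'
}

# Build the log file path for a URL ($1) and a date string ($2).
log_path_for() {
    printf '%s/%s-broken-links-%s.log\n' "$LOG_DIR" "$(url_to_host "$1")" "$2"
}

# Check every site in $SITES and report those where linkchecker
# exits with a non-zero status.
check_sites() {
    today=$(date +%Y-%m-%d)
    mkdir -p "$LOG_DIR"
    for url in $SITES; do
        log=$(log_path_for "$url" "$today")
        if ! linkchecker "$url" > "$log" 2>&1; then
            echo "problems found on $url, see $log"
        fi
    done
}
```

To use it, add a final line that calls check_sites, save the file somewhere like /usr/local/bin/check-links.sh (a hypothetical path) and run it from cron as often as you like.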
Having said all those good things about linkchecker, let me move on to htcheck.
HtCheck is really wonderful and in a certain sense better than linkchecker, because it offers some extra possibilities, for instance generating reports which are stored in MySQL and can be viewed at any time through a web browser.
Here is a description taken from HtCheck’s website:
ht://Check is more than a link checker. It is a console application written for Linux systems in C++ and derived from ht://Dig.
It can retrieve information through HTTP/1.1 and store the information in a MySQL database, and it is particularly suitable for
small Internet domains or Intranet.
Its purpose is to help a webmaster manage one or more related sites: after a “crawl”, ht://Check gives back very useful summaries
and reports, including broken links, anchors not found, content-types and HTTP status codes summaries, etc.
From version 1.2.3, ht://Check also performs accessibility checks in accordance with the principles of the University of Toronto’s
Open Accessibility Checks (OAC) project, allowing users to discover site-wide barriers like images without proper alternatives,
missing titles, etc.
ht://Check can also be used for Web structure analysis, as it stores information regarding links between HTML documents.
I have to admit htcheck-php is really handy! To use the extra PHP web interface to htcheck you’ll need the htcheck-php package installed.
To install both htcheck and its web interface on Debian you’ll need to issue the command:
root@noah:~# apt-get install htcheck htcheck-php
Now there are a few more things to do before you can start using htcheck.
You’ll need to edit /etc/htcheck/htcheck.conf
There you will need to change at least the start_url variable.
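If you would rather script that change than open an editor, something like the sketch below could do it. It assumes start_url sits on its own line in the “name: value” layout inherited from ht://Dig, so check your own htcheck.conf first. The name set_start_url is made up for this example.

```shell
# Sketch: replace the start_url line of an htcheck config file.
# Assumption: the file uses "start_url: <url>" on a single line.
# set_start_url is a hypothetical helper; arguments: config file, new URL.
# Note: the URL must not contain "|" or "&", since both are special to sed here.
set_start_url() {
    conf=$1
    url=$2
    tmp="$conf.tmp.$$"
    # Write the edited copy first, then overwrite the original in place
    # so its owner and permissions stay as they were.
    sed "s|^start_url:.*|start_url: $url|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" &&
        rm -f "$tmp"
}
```

For example: set_start_url /etc/htcheck/htcheck.conf http://www.pc-freak.net/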
Another necessary step is to use phpMyAdmin or the console mysql client to create the htcheck MySQL user and password and grant that user the relevant permissions.
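With the console client, creating that user might look like the sketch below. The account name htcheck and the database name htcheck are the ones used in this post. Allowing only 'localhost' as the host is an assumption about your setup, and htcheck_grant_sql is a hypothetical helper name. It only prints the SQL, so you can read it before feeding it to mysql.

```shell
# Sketch: print the SQL that creates a MySQL account for htcheck.
# htcheck_grant_sql is a hypothetical helper; arguments: user, password.
# The password must not contain single quotes, they would break the SQL.
htcheck_grant_sql() {
    cat <<EOF
CREATE USER '$1'@'localhost' IDENTIFIED BY '$2';
GRANT ALL PRIVILEGES ON htcheck.* TO '$1'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

Once you are happy with the output, you can run it like this: htcheck_grant_sql htcheck yoursqlpassword | mysql -u root -p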
Yet if you try to execute the htcheck binary (which, by the way, is written in C++) to generate the reports, you will run into a problem connecting to the MySQL database and will most likely get this error message:
noah:/home/hipo# htcheck
Error (1045): Access denied for user 'root'@'localhost' (using password: NO)
! htcheck: Database error
That really pissed me off, but anyway, you’re lucky that I figured it out for you.
This whole issue is well documented in htcheck’s installation notes.
If you’re too lazy to read the whole document, just skip to the part titled The Htcheck MySQL Connection Settings.
The above problem, where htcheck cannot connect to the database, is easily solved by creating a .my.cnf file in your home directory (~/).
Let’s say you’re running htcheck as the root user: all you need to do is edit /root/.my.cnf and place in it:
[client]
host=127.0.0.1
user=htcheck
password=yoursqlpassword
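Since this file keeps the password in plain text, it is probably wise to make it readable only by its owner. Here is a minimal sketch of writing it that way. The name write_my_cnf is a hypothetical helper and the values are the same placeholders as above.

```shell
# Sketch: write the [client] section to a file only its owner can read.
# write_my_cnf is a hypothetical helper; arguments: file, user, password.
write_my_cnf() {
    (
        # Restrictive umask so a newly created file is never world-readable.
        umask 077
        printf '[client]\nhost=127.0.0.1\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
    # Also tighten the permissions in case the file already existed.
    chmod 600 "$1"
}
```

For the root example: write_my_cnf /root/.my.cnf htcheck yoursqlpassword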
That’s it. Now issue the htcheck command again, so that it can create the “htcheck” database (the default name), crawl your website for broken links, and generate and store the reports in your MySQL server.
root@noah:~# htcheck -i
In the above example, the “-i” option passed to htcheck makes sure the “htcheck” database is rebuilt. That’s necessary especially if you made changes in /etc/htcheck/htcheck.conf after the first time you invoked htcheck and you’d like the new configuration to be reflected in the reports generated in MySQL.
However, if you run htcheck for the first time, you can start it without the “-i” flag.
To make htcheck’s web reporting interface show the crawling statistics properly, you’ll also have to edit /etc/htcheck/global.inc.php and set the username and password variables to the ones you chose earlier when creating htcheck’s MySQL user.
The last step before you can use htcheck’s web interface through your browser is to either configure a virtual host for htcheck in Apache or simply add an Apache Alias. On Debian, you’ll have to edit /etc/apache2/apache2.conf
Place the following Apache Alias there to be able to access your htcheck statistics under the /htcheck/ path of your default Apache domain name.
Alias /htcheck/ /usr/share/htcheck/php/
To load the new Apache configuration you’ll need, as usual, an Apache web server restart:
root@noah:~# /etc/init.d/apache2 restart
Here you can take a quick look at what to expect from htcheck’s PHP web interface for error 404 reporting on a Debian GNU/Linux system:

Now enjoy htcheck’s neat broken page discovery tool and its web statistics interface!



There is only one thing I find to be a problem with htcheck: it doesn’t include native support for SSL (the https protocol)! I’ll have to look further online for a web crawler that generates nice web reports and supports it. The good news is that if you need a quick way to crawl your website, you can still use linkchecker, which I’ve just tested and which plays nicely with both the http and https (SSL) protocols.