Checking your website for broken Links on Linux with linkchecker and htcheck / How to find broken links on your website

Have you wondered how could you check your website for broken links? Cause I did!
You might wonder why should I care so hard about the broken links?

Well it’s simple broken links on your webpage will have an influence on how Google indexesyour website, and how often does it bother to crawl on your website in other words, havinga eagly eye on your broken links interface is a vital for every self respecting web developer,as well as for system administrators.
From a web development perspective, it’s important that your website have as less 404 error pages possible since that is important for Webpages W3C Compliancy.
On other hand if you’re a SEO Specialist, having as less broken links on your domains is vital for Google Pageranking,Yahoo, Live.com, Altavista, Yandex etc, as well as for general Good Search Engine Indexing.
Having said all that you should already feel the topic is really interesting. I believe not many people has wrote stuff aboutit online.
That’s why I decided to share with you a possible way on how to track your broken web domain for broken pages on the Linux and possibly other Unix compatible architectures.

There are plenty of tools available that could be used for finding out the broken linkson your website using Linux operating system.

I used apt-get in order to look for a link checker software for Linux

noah:/home/hipo/Desktop# apt-cache search 'link check'
htcheck-php - Simple php interface to database generated by ht://Check
htcheck - Utility for checking web site for dead/external links
linkchecker - check websites and HTML documents for broken links
linklint - A fast link checker and web site maintenance tool

The 2 error link reporting tools I used were:

1. linkchecker

and

2. Htcheck

I’ll evaluate both of the tools and will share with you my impressions of the two really valuable,broken link checking tools for Linux.

Let me begin with a short introduction on what you could expect from the linkchecker broken links (error 404) links.

Here are the linkchecker Featuresrecursive and multithreaded checking
output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
restriction of link checking with regular expression filters for URLs
proxy support
username/password authorization for HTTP and FTP and Telnet
honors robots.txt exclusion protocol
Cookie support
HTML and CSS syntax check
Antivirus check
a command line interface
a GUI client interface
a (Fast)CGI web interface (requires HTTP server)

Luckily linchecker has a Debian package port so installing it comes as easily as executing:

root@noah:~# apt-get install linkchecker linkchecker-gui

However at the present moment on Debian Sid (Testing/Unstable) linkchecker-gui is missing some dependencies with libqt and python
so I was not able to install and test the Graphic User Interface for Linkchecker .Anyways here is a screenshot of the linkchecker GUI interface in order to give you a glipmse on what to expect if you succeed in installing it on Mac OS X or some other operating system.

Linkchecker GUI, Link Checker Graphic User Interface

Using linkchecker’s command line interface is really straight forward you just have too invoke the linkchecker command and pipe it too the tee shell command

Here is how:

root@noah:~# linkcheker http://www.pc-freak.net/ | tee -a pc-freak.net-broken-links-linkchecker.log

Though it’s simplicity to use from a first look checking the manual of linkchecker reveals quite many interesting usage parameters, so be sure also to take a look at the manual.
Of course it might be wise to combine linkchecker with some bash scripting in order to pereodically review your website or websites for broken links.

I intend to do that in the coming days so if I write some script that uses linkchecker and facilitates the search for a broken links I’ll post it on the blog.

Having said all that linkchecker goody, let me proceed further to Htcheck

HtCheck is really wondeful and it in a certain sense better than linkchecker, because it offers some extra possibilities like for instance generation of reports which could be stored in MySQL and could be visualized any time via a web browser.

Here is a descrition extracted from HtCheck’s website:

ht://Check is more than a link checker. It is a console application written for Linux systems in C++ and derived from ht://Dig.

It can retrieve information through HTTP/1.1 and store the information in a MySQL database, and it is particularly suitable for
small Internet domains or Intranet.

Its purpose is to help a webmaster manage one or more related sites: after a “crawl”, ht://Check gives back very useful summaries
and reports, including broken links, anchors not found, content-types and HTTP status codes summaries, etc.

From version 1.2.3, ht://Check also performs accessibility checks in accordance with the principles of the University of Toronto’s
Open Accessibility Checks (OAC) project, allowing users to discover site-wide barriers like images without proper alternatives,
missing titles, etc.

ht://Check can also be used for Web structure analysis, as it stores information regarding links between HTML documents.

I have to admit this htcheck-php is really handy! To use the extra php web interface to htcheck you’ll need the htcheck-php package installed.

To install both htcheck and it’s web interface on Debian you’ll need to issue the command:

root@noah:~# apt-get install htcheck htcheck-php

Now there are few more things to do before you could start using htcheck.
You’ll need to edit /etc/htcheck/htcheck.conf

There you will need to change at least the start_url variable.

Another necessery thing will be to use phpmyadmin or the console mysql client in order to create the required htcheck username and password and grant some relevant permissions to the htcheck user in MySQL.

Yet if you try to execute the htcheck binary (which by the way is written in C) to generate you will experience a problem with connecting to mysql’s database and you will most likely get the error message.

noah:/home/hipo# htcheck
Error (1045): Access denied for user 'root'@'localhost' (using password: NO)
! htcheck: Database error

That really pissed me off but anyways you’re lucky that I got it for you.
This whole issue is well documented in htcheck’s installation notes which you can read here

If you’re lazy reading the whole document just skip and read The Htcheck MySQL Connection Settings part

The solutions to the above pointed htcheck problem, where htcheck could not connect to the database is easily solvable, by creating a .my.cnf file in your home directory e.g. ~/ .

Let’s say you’re running with a root user the htcheck, all you need to do is edit /root/.my.cnf and place in it:

[client]
host=127.0.0.1
user=htcheck
password=yoursqlpassword

That’s it now issue again the htcheck command again, so that it could create the proper “htcheck” database (created by default) and store crawl your website for broken links and generate and store the reports in your MySQL server.

root@noah:~# htcheck -i

In the above example the “-i” option passed to htcheck will take care for “htcheck”‘s database to be rebuilt, that’s necessery especially if you made any changes in /etc/htcheck/htcheck.conf after the first time you have invoked on htcheck and you’d like the new configuration changes to reflected in the generated reports in MySQL.

However if you run the htcheck tool for a first time, you can start it without the “-i” flag.

In order to configure htcheck’s web reporting interface to be properly show website crawling statistics you’ll have also to edit /etc/htcheck/global.inc.php and set the username and password variables according to the ones you have previously choose while creating the htcheck’s MySQL username and password.

As a last step before you could use the htcheck’s Web gui interface through your browser is to either configure a virtualhost for htcheck in your Apache configuration or simply make an Apache Alias from your Apache configuration, on Debian, you’ll have to edit /etc/apache2/apache2.conf
Place the following Apache Alias in order to be able to access your htcheck’s statistics from your default configured Apache domain name.

Alias /usr/share/htcheck/php/ /htcheck/

In order to load the new Apache configurations as usual you’ll need an Apache WebServer restart

root@noah:~# /etc/init.d/apache2 restart

Here you can take a quick look what to expect from Htcheck’s PHP Web Error 404 reporting interface on a Debian GNU/Linux System:

htcheck php graphic user interface, htcheck gui interface

Now Enjoy htcheck neat Error page discover tool and it’s web statistics interface!

Related Posts

  • No Related Post

Tags:

33 Responses to “Checking your website for broken Links on Linux with linkchecker and htcheck / How to find broken links on your website”

  1. Insanity Workout says:

    sorry for the odd post, but I new at this and wanted to say I really like your site.

  2. admin says:

    There is only one thing I find as a problem with htcheck. It doesn’t include a native support for SSL (the https protocol)! I’ll have to look further online for a web crawler that generates nice web logs and supports this. The good news here is that if you need a quick way to crawl your website, you can still use linkchecker which I’ve just tested and it plays nicely with both http and the https (SSL) protocols.

  3. admin says:

    Hey man, I’m glad you like it. It’s nothing special though I’m trying to share a valuable stuff and that way probably be helpful to somebody out there.

  4. Kimberlie Whirlow says:

    Very Good Blog! I also like your clean and smooth design.

  5. Amateure Frei says:

    Guter Web Log, aber der Feed funktioniert nicht mit Opera.

  6. Leif Alire says:

    Thanks for the good post but do you optimize for Firefox. I was having a little trouble with my browser.

    • admin says:

      Hey I’m happy you like my post.
      I personally review my blog with Iceweasel (The Debian Linux variant of Firefox).
      What kind of troubles did you experienced with Firefox?

      Best

  7. Kyle Miotke says:

    Appears like tons of xbox enthusiasts here, I am a fan too and love to play games… my gf says I play way too much, but man it’s so amusing. I’ve been playing cod: modern warfare 2 and halo for months and can’t quit! What would you guys recommend? Anyhow, looks like a nice website, is this wordpress? I’ve created a few pages myself and it’s not easy. Thanks for taking time to post.

  8. Sergio Haluska says:

    Awesome images! I love the post so much! ;)

  9. Derek says:

    Just wanted to say thanks for sharing!

  10. computer technology information says:

    Thanks I found just the info I already searched everywhere and just couldn’t find. What a perfect site.

  11. Alexandria Gartenmayer says:

    Wow… Really informative post!
    Have a good day!

  12. Dominik says:

    wuhuu.. Cool danke.

  13. remortgage loan says:

    this article is exactly what i have been looking for! found your page bookmarked by a friend of mine. I’ll also share it. Thanks again!

  14. ชุดราตรี says:

    You made some good quality points there. I did a look foron the issueand found most people will will be in agreementwith your blog.

  15. hacking says:

    Purple monkey dishwasher

  16. SmallQueen says:

    I hate giving away my secrets but I have been using a service called Directory Maximizer for all my submissions for years.

  17. Backlink building software says:

    Hi! I really love reading your shares. Thanks for your cool shares.

  18. template joomla says:

    Excellent post

  19. pewter goblet says:

    I have been on your site a few times and I always leave with a gem. I only thought it fair to deposit my comment as a way of showing appreciation. I honestly think you know what you are talking about. Out of your passion, you have become an authority on blogging. Congratulations.

  20. Hank Villescas says:

    I was doing a search and came across this blog site. I must admit that this article is on point! Keep writing more. I will be reading your posts

  21. Robert John says:

    Wow, brilliant post, I was thinking how to do that. and found across your web page from yahoo, tons of brilliant stuff here, now that I’ve got some idea. I’ve bookmarked your blog and also added rss. Please keep us updated….

  22. Albany Optimization says:

    Pretty insightfull post. Never thought that it was this easy after all. I have spent a a lot of my time looking for someone to explain this matter clearly and you’re the only person that ever did that. I really appreciate it! Keep it up!

  23. Johnie Emigh says:

    this was a very entertaining read. i enjoyed it very much!

  24. Demetrius Berretti says:

    Thank you for the smart critique. Me and my neighbor had been just getting ready to do some analysis about this. We got a grab a ebook from our local library but I think I discovered more from this publish. I’m very glad to view this kind of great info being shared freely out there.

  25. LCD Fernseher Günstig kaufen says:

    Wow that is an very crazy article for me. I like your style of writing. Maybe you should write more articles of these type. By the way, sorry for my bad english ;)

  26. Shanta Caravalho says:

    What I don’t realize is how you’re not even more well-liked than that you are now. You are just so intelligent. You know so significantly about this subject, produced me consider about it from numerous distinct angles. Its like people today arent interested unless it has a thing to complete with Lady Gaga! Your stuffs good. Preserve it up!

  27. Kristy Niles says:

    Really good post.. thanks.. :)

  28. Wills says:

    I enjoy the opinions on this site, it definitely gives it that community feel!

  29. Steve Wegley says:

    I have really learned some new things through your weblog. One other thing I would really like to say is that often newer computer system operating systems tend to allow additional memory to be utilized, but they also demand more ram simply to run. If people’s computer could not handle additional memory plus the newest application requires that memory space increase, it usually is the time to shop for a new Laptop or computer. Thanks

Leave a Reply

Yandex Mail.ru Google LiveJournal myOpenId Flickr claimId Blogger Wordpress OpenID Yahoo Technorati Vidoop Verisign AOL