 
For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive - rather, enjoying the times that we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index here.
robots.txt - a clue to hidden pages?

The robots.txt file is designed to provide spiders and crawlers with a list of places they should NOT go - it's the file behind the "Robots Exclusion Standard", and its intent is to allow the webmaster to divide the site into indexable and non-indexable areas.

But because it lists the directories to be excluded, robots.txt is often an excellent source of links that people don't want found. I have seen numerous examples (which I will NOT reproduce here!) where directories that are not for public consumption are listed. And - in theory - I'm perfectly at liberty to read a site's robots.txt with a regular browser, then step manually through the places from which robots are excluded to see what's there.
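Reading the exclusion list needs no special tools; anyone can fetch /robots.txt and read the paths. As a minimal Python sketch (the function name `disallowed_paths` and the sample file are made up for illustration, not taken from any real site), this pulls every `Disallow:` path out of a robots.txt body, which is exactly the list a curious visitor would step through by hand:

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:                         # an empty "Disallow:" excludes nothing
                paths.append(value)
    return paths


# Hypothetical robots.txt content, for illustration only
sample = """User-agent: *
Disallow: /private-reports/
Disallow: /staging/   # not ready yet
Disallow:
"""
print(disallowed_paths(sample))
```

Nothing here bypasses any protection; it simply reads what the file publishes, which is the point of the warning that follows.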

If you want to protect areas of your site from prying eyes / accidental discovery, do NOT rely on robots.txt - use a passwording system or some other form of authentication.
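As a minimal sketch of "some other form of authentication", assuming HTTP Basic Authentication served over HTTPS and a Python WSGI application, a protected area can refuse any request that doesn't carry the right credentials. The credentials, the realm name and the function names `is_authorised` and `protected_app` are hypothetical, for illustration only; in practice the web server's own access control (such as Apache's password files) is often the simpler choice.

```python
import base64
import binascii
import hmac
from typing import Optional

# Hypothetical credentials, for illustration only; real ones belong in a
# protected configuration file or password store, never in source code.
EXPECTED_USER = "staff"
EXPECTED_PASSWORD = "change-me"


def is_authorised(authorization_header: Optional[str]) -> bool:
    """Check an HTTP Basic 'Authorization' header against the expected credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False                      # malformed header: treat as not logged in
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and password_ok


def protected_app(environ, start_response):
    """WSGI application that only serves its content to authenticated users."""
    if not is_authorised(environ.get("HTTP_AUTHORIZATION")):
        start_response("401 Unauthorized",
                       [("WWW-Authenticate", 'Basic realm="internal"'),
                        ("Content-Type", "text/plain")])
        return [b"Authentication required\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Internal report\n"]
```

Unlike a robots.txt entry, this stops a human visitor with a browser as well as a robot. For local experiments the application could be served with the standard library's `wsgiref.simple_server`.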

Our robots.txt file - which I'll happily reproduce here - lists URLs that I don't mind people finding - I just don't want them indexed. So even if they're looking with malicious intent - which I doubt - they won't "get" anywhere.

#
# robots.txt file for www.wellho.net and www.wellho.co.uk
#
# we encourage robots to visit and index almost ALL documents
# but not any executable scripts.
#
User-agent: *
Disallow: /cgi-bin/
Disallow: /net/unique.html


So all robots are allowed anywhere EXCEPT into the cgi scripts (and one page, described below), which we don't want indexed. On our site, all such scripts change their reports regularly and depending on the information entered, so it would be misleading to encourage indexers to list them.
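To check what a well-behaved crawler makes of the file above, Python's standard `urllib.robotparser` module can parse the rules locally instead of fetching them. This is a sketch: the crawler name `ExampleBot` and the sample paths other than the two disallowed entries are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt file quoted above, parsed locally
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /net/unique.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs a crawler might consider visiting
for path in ["/index.html", "/cgi-bin/search.pl", "/net/unique.html", "/horse/"]:
    url = "http://www.wellho.net" + path
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "disallowed"
    print(path, verdict)
```

Because the file only has a `User-agent: *` group, every crawler name falls back to those same rules. Note that `can_fetch` only reports what the file asks for; it is up to each robot whether to honour it.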

The /net/unique.html page is sort of internal. It's generated by one of our site scripts and lists words that occur only once on the rest of the site. Purpose? To help us find spelling mistakes! I don't mind anyone seeing the page - and indeed I've just provided you with a link to it in this article - but people REALLY won't want to land there when they do a search!
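The script that builds /net/unique.html isn't shown here. As a rough sketch of the idea only, assuming the pages' text is already available as strings (the `pages` dictionary, its contents and the word-matching pattern are hypothetical choices), counting words across the whole site and keeping those seen exactly once could look like this:

```python
import re
from collections import Counter


def words_used_once(pages: dict[str, str]) -> list[str]:
    """Return the words that appear exactly once across all pages, sorted."""
    counts = Counter()
    for text in pages.values():
        # lower-case first so "Python" and "python" count as the same word
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return sorted(word for word, n in counts.items() if n == 1)


# Hypothetical page text; "corses" is a deliberate typo
pages = {
    "/index.html": "Python and Perl training courses in Melksham",
    "/courses.html": "Python courses and Perl corses every month",
}
print(words_used_once(pages))
```

Genuine but rare words show up too, so the list still needs a human eye; a misspelling is simply very likely to be one of them.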
(written 2007-01-13, updated 2007-01-17)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
W603 - Web and Intranet - Server Side Technologies
  [642] How similar are two words - (2006-03-11)
  [653] Easy feed! - (2006-03-21)
  [732] Where is a web site visitor browsing from - (2006-05-24)
  [1020] Parallel processing in PHP - (2007-01-03)
  [1355] .php or .html extension? Morally Static Pages - (2007-09-17)
  [1365] Korn Shell scripts on the web - (2007-09-25)
  [1554] Online hotel reservations - Melksham, Wiltshire (near Bath) - (2008-02-24)
  [1615] PHP training courses every month - (2008-04-18)
  [1749] Using server side and client side programming together - (2008-08-11)
  [2055] Effect on server when memory runs out and swapping starts - (2009-02-26)
  [2282] Checking robots.txt from Python - (2009-07-12)
  [3705] Django Training Courses - UK - (2012-04-23)
  [3915] How does PHP work? - (2012-11-07)
  [4277] Sending a message to the server and changing text on a page when a button is pressed - (2014-05-23)

W501 - Introduction to Web Site Structure
  [332] Looking up IP addresses - (2005-06-01)
  [528] Getting favicon to work - avoiding common pitfalls - (2005-12-14)
  [1024] Web site - a refresh to improve navigation - (2007-01-07)
  [1168] Moving out some of the web site bloat - (2007-04-29)
  [1176] A pu that got me into trouble - (2007-05-04)
  [1198] From Web to Web 2 - (2007-05-21)
  [1431] Getting the community on line - some basics - (2007-11-13)
  [1636] What to do if the Home Page is missing - (2008-05-08)
  [1686] FTP - how not to corrupt data (binary v ascii) - (2008-06-24)
  [1969] Search Engines. Getting the right pages seen. - (2009-01-01)
  [2094] If you have a spelling mistake in your URL / page name - (2009-03-21)
  [2214] Global Index to help you find resources - (2009-06-01)
  [2552] Web site traffic - real users, or just noise? - (2009-12-26)

P608 - Perl - Robots, Crawlers and Spiders
  [2045] Does robots.txt actually work? - (2009-02-16)
  [2229] Do not re-invent the wheel - use a Perl module - (2009-06-11)
  [2402] Automated Browsing in Perl - (2009-09-11)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2024: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.info/mouth/1031_rob ... ages-.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb