OK, I've been using the Google Sitemap generator to read my logs and generate a sitemap for Radke Land. There are some unfortunate problems with this process. One of the biggest is that when a page name is changed, the record of visits to the old name lives on in the logs for a while before it 'ages away'. So there are two lingering problems. First, stale links are included in the sitemap and must be manually excluded. Second, the pile of exclusions in the configuration file controlling the sitemap generation just lives on forever (even after the stale links are gone from the logs).
Over the next week, I intend to write a Perl program to handle both of these problems. First, the program will read the current sitemap and try to wget all of the pages. Pages that fail will be reported. Second, the program will compare each of the exclusions in the sitemap configuration file against all of the logs; any exclusion that no longer matches a log entry is a 'stale exclusion' and will be reported.
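To make the two checks concrete, here is a rough sketch of the idea. It is written in Python rather than Perl, purely as an illustration, and it is not the finished program. Several things in it are assumptions: the function names are made up for this example, the logs are assumed to be Apache-style access logs with the request in quotes, the exclusions are assumed to be shell-style wildcard patterns matched against the request path, and urllib stands in for wget.

```python
import fnmatch
import re
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Assumed log format: Apache-style lines containing "GET /path HTTP/1.x".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def sitemap_urls(sitemap_xml):
    """Return the text of every <loc> element, whatever namespace is used."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local_name == "loc" and elem.text:
            urls.append(elem.text.strip())
    return urls


def http_status(url, timeout=10):
    """Fetch a URL; return its HTTP status code, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def failed_pages(urls, fetch=http_status):
    """Return (url, status) for every page that did not come back as 200."""
    failures = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures.append((url, status))
    return failures


def requested_paths(log_lines):
    """Collect the distinct request paths that appear in the log lines."""
    paths = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            paths.add(match.group(1))
    return paths


def stale_exclusions(patterns, log_lines):
    """Return the exclusion patterns that match nothing left in the logs."""
    paths = requested_paths(log_lines)
    return [
        pattern
        for pattern in patterns
        if not any(fnmatch.fnmatchcase(path, pattern) for path in paths)
    ]
```

In use, the sitemap file's contents would go to sitemap_urls, the result to failed_pages, and the exclusion patterns plus the log lines to stale_exclusions. The fetch argument is there so the page check can be tried out with a fake lookup before it is pointed at the live site.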
This is fairly simple to do, and I hope it cuts down on Googlebot's failed visits to various pages on my site. Stay tuned for details.