WHITE HOUSE’S SEARCH ENGINE PRACTICES CAUSE CONCERN
Posted 28 Oct 2003 04:59:54 UTC
Senator Hiram Johnson famously quipped that “the first casualty when war comes is the truth.” As the war in Iraq continues, is the White House intentionally preventing search engines from preserving a record of its statements on the conflict? Or did its staff simply make a technical mistake?
When search engines “spider” the web in search of documents for their indices, web site owners sometimes put a file called robots.txt on their sites, which instructs the “spiders” not to index certain files. This can be for policy reasons, if an author does not want his or her pages to appear in search listings, or for technical reasons, for example if a web site is dynamically generated and cannot or should not be downloaded in its entirety.
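For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved spider might consult robots.txt before indexing a page, using Python's standard urllib.robotparser module. The rules and paths shown (/iraq-template/ and the two page names) are hypothetical and chosen only for illustration; they are not the contents of the White House file, and the helper name is_indexable is our own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the actual White House file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /iraq-template/
"""


def is_indexable(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if the rules in robots_txt let `agent` fetch `path`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The same (hypothetical) press release under two different templates.
    for path in ("/iraq-template/release.html", "/news/release.html"):
        print(path, is_indexable(EXAMPLE_ROBOTS_TXT, path))
```

Note that robots.txt is only a request. Compliant spiders honor it, but nothing in the file itself prevents a visitor or a non-compliant program from downloading the listed pages.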
According to reports, though, the White House is requesting that search engines not index certain pages related to Iraq. In addition to stopping searches, this prevents archives like Google’s cache and the Internet Archive from storing copies of pages that may later change. 2600 called the White House to investigate the matter.
According to White House spokesman Jimmy Orr, the blocking of search engines is not an attempt to ensure future revisions will remain undetected. Rather, he explained, they “have an Iraq section [of the website] with a different template than the main site.” Thus, for example, a press release on a meeting between President Bush and “Special Envoy” Bremer is available both in the Iraq template (blocked from being indexed by search engines) and in the normal White House template (available for indexing by search engines). The intent, Mr. Orr said, was that people who search should not get multiple copies of the same information. Most of the “suspicious” entries in the robots.txt file do, indeed, appear to have only this effect.
According to the robots.txt of October 24, though, the In Focus: Iraq section of the site was blocked from search engines. Some of the information there does not appear to be available anywhere else on the White House site. However, it seems that, in response to inquiries from 2600 and other sources, the White House web team has recently changed their robots.txt so that these files are no longer blocked. (The current Last-Modified date on the robots.txt is 23:22 GMT, October 27th, after work on this article had already begun.)
Whether the original blocking of the content in question was malicious or an honest mistake is, of course, open to speculation. Certainly anyone who maintains a large website has made some sort of technical mistake at least once, and the promptness with which the error was fixed after it was pointed out suggests that the White House had no interest in keeping it in place. The White House, as an entity that is responsible to the citizenry and that has drawn a lot of criticism over its handling of the situation in Iraq, ought to take special care to avoid similar mistakes in the future. Nonetheless, we are pleased to learn that, at least this time, the issue seems to have been resolved promptly.