Once you have put a page on the Web, you need to keep it there indefinitely:
- Other sites may link to your page, so removing it will cause linkrot and lost business opportunities as you turn away new users
- Users may have bookmarked the page because they want to go directly to a relevant part of your site instead of starting at the home page every time
- Search engines are slow in updating their databases, so they too will lead users astray if you remove pages
- Old content adds value to your site: some users will benefit from the old pages, so why not keep serving these customers?
The first three reasons are really arguments why URLs must stay active forever: any URL that has ever been exposed to the outside world must continue to bring up something reasonable when people go to it. Because they will. Webmasters commonly find that they keep getting hits on URLs that were taken out of service several years ago.
Even if you believe that the old page has zero value, the old URL should be supported and made into a redirect to the closest related page on the site.
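One way to support retired URLs is a table that maps each of them to its closest related live page and answers requests with a permanent redirect. The sketch below assumes a hand-maintained Python dictionary. The paths, `REDIRECTS`, `LIVE_PAGES` and `resolve` are all hypothetical, and a real site would more likely keep the same mapping in its web server's configuration.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical example data: retired URLs mapped to their closest related live pages.
REDIRECTS = {
    "/products/widget-1998.html": "/products/widgets/",
    "/events/conference-1999.html": "/events/conference/",
}

# Hypothetical set of URLs that are still served directly.
LIVE_PAGES = {"/", "/products/widgets/", "/events/conference/"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK, None
    if path in REDIRECTS:
        # 301 Moved Permanently: browsers follow it automatically, and it tells
        # search engines that the old URL has a permanent new home.
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    # Only URLs that were never published should end up here.
    return HTTPStatus.NOT_FOUND, None
```

For example, `resolve("/events/conference-1999.html")` yields status 301 with `/events/conference/` as the redirect target, so an old bookmark still lands somewhere useful.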
Value of Old Content
Most old pages do have value for users, so I recommend keeping the pages themselves alive forever. Sure, new content is probably more valuable than old content, but there is more old content to choose from. As an example, consider a site that publishes new content on a weekly basis. After a year, this site will consist of 51 old editions and one new edition. Assuming that new content is ten times as valuable as old content, 84% of the site's value comes from old content (51 units of value out of a total of 51 + 10 = 61).
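The arithmetic behind the 84% figure can be checked with a few lines of code. This sketch assumes, as the example does, that each old edition is worth one unit of value and the new edition is worth ten; the function name is illustrative.

```python
def old_content_share(old_editions: int, new_editions: int, new_to_old_value: float) -> float:
    """Fraction of total site value contributed by old editions.

    Each old edition is worth 1 unit; each new edition is worth new_to_old_value units.
    """
    old_value = float(old_editions)
    new_value = new_editions * new_to_old_value
    return old_value / (old_value + new_value)


share = old_content_share(old_editions=51, new_editions=1, new_to_old_value=10)
print(f"{share:.0%}")  # 51 / (51 + 10) is about 0.836, printed as 84%
```

Changing `new_to_old_value` shows how robust the conclusion is: even if new content were worth 50 times as much as old content, old content would still account for about half the site's value after a year.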
I still get about 50 visitors per week who follow the link to my site from an article about usability that The New York Times published four months ago (warning: access to the Times requires free registration). Since only a small fraction of readers click any given link, this means that the newspaper is still providing value to many more readers simply by leaving this old article on its server. It is a great way to establish a reputation as a substantial online service of record.
Another example is my 1996 article on the top-ten mistakes of Web design. As shown in the table, it is getting more readers every year:
Admittedly, the "top-ten mistakes" Alertbox is something of a Web classic, but more typical Alertbox columns also get most of their readers when they are "old." A typical Alertbox accumulates about 80,000 page views over time, only 20,000 of which are received while it is the "current" column; the remaining three-quarters arrive after it has been replaced by a newer one.
Users benefit from old content because:
- it may be intrinsically interesting and worth reading even when it's not news (say, a well-written essay)
- it can experience renewed interest due to later events (what did the new CEO of your main competitor do two jobs ago?)
- it can have historical interest (how did reviewers view Gone With the Wind when it opened?)
- it helps with old products (your neighbor has an HP printer from 1995 for sale: will it satisfy your needs?)
- it provides background information and a richer texture for a website: the true killer app for the Web is diversity (Amazon.com gets many sales from listing a huge number of old books that each sell only a few copies per year; listing out-of-print books that they don't sell adds to the value of the service and makes users more likely to come back)
Cost of Old Content
From a site management perspective, the cost of keeping old content is trivial: the cost of hard disk space is close to zero, and the cost of maintaining old files can be very low if they are developed according to the HTML standards or kept in a publishing database.
To enhance the value of the old content, I recommend investing a small amount of resources in content gardening:
- have new articles link to old content for background or supplementary information: since the new content may be written by people who don't know the old stuff, it is often an editorial function to add these links
- maintain the links in the old files: kill or replace outdated ones
- add forward links to the old pages so that they point to newer pages: otherwise users will never discover follow-on products or more recent developments on the same topic
- remove obsolete or misleading information and replace with current data or a current link (for example, the announcement of a conference or product launch may be replaced with the proceedings or a report from the event; also add a forward link to this year's conference)
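Part of the link maintenance can be automated by periodically checking every outbound link on an old page and flagging the ones that no longer respond. The sketch below uses only Python's standard library; `http_status` and `find_dead_links` are hypothetical helper names, and the checker accepts any status-fetching function so it can be tested without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of url (after redirects), or 0 if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")  # headers only, no body
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is still useful.
        return err.code
    except OSError:
        # DNS failures, refused connections and timeouts (URLError is an OSError).
        return 0


def find_dead_links(links: Iterable[str],
                    fetch: Callable[[str], int] = http_status) -> List[str]:
    """List links that do not end in a successful response (redirects count as alive)."""
    return [url for url in links if not 200 <= fetch(url) < 400]
```

Some servers answer `HEAD` requests with 405 Method Not Allowed, so a production checker would probably retry those with `GET`. Deciding whether to delete a dead link or replace it with a better target remains an editorial task for the content gardener.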
The cost of maintaining old content may be about 10% of the original cost of developing this content, but since doing so more than doubles the value of the website, it is a good investment.
Make Time Explicit
It can be confusing for users to stumble across old content if they are looking for current information. Confusion can be minimized by:
- explicitly mentioning the date (including the year) the page was originally written
- adding a prominent disclaimer to point out ways in which the page no longer applies (e.g., "This product is no longer being manufactured")
- adding forward-pointing links to the most recent pages about the same topic
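If pages are generated from a publishing database, all three measures can be produced by one template helper. The sketch below is a hypothetical example: the function name, the CSS class names and the wording of the banner are assumptions, not a standard.

```python
from datetime import date
from html import escape
from typing import Optional


def dated_banner(written: date,
                 disclaimer: Optional[str] = None,
                 newer_url: Optional[str] = None,
                 newer_title: Optional[str] = None) -> str:
    """Build an HTML fragment stating when a page was written, plus optional notes."""
    # Month and year are always shown so readers can judge how current the page is.
    lines = [f'<p class="page-date">Originally written {written:%B %Y}</p>']
    if disclaimer:
        lines.append(f'<p class="page-disclaimer"><strong>Note:</strong> {escape(disclaimer)}</p>')
    if newer_url and newer_title:
        lines.append(
            f'<p class="page-forward">More recent: '
            f'<a href="{escape(newer_url)}">{escape(newer_title)}</a></p>'
        )
    return "\n".join(lines)


print(dated_banner(date(1998, 6, 8),
                   disclaimer="This product is no longer being manufactured.",
                   newer_url="/products/widgets/",
                   newer_title="Current widgets"))
```

Keeping the creation date as a separate field from the last-modified date also makes it available for ranking old pages in search results.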
Downplay Old Content in Search Listings
After a few years of accumulating old content, search results listings can be dominated by pointers to old stuff unless steps are taken to increase the priority of new content.
The simplest solution is to have the search engine give a lower weight to old pages. Note that the weight should be computed relative to the creation date and not to the latest modification date (which will often be very recent if the old content has been maintained properly).
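A minimal sketch of this idea multiplies each page's keyword score by a weight that shrinks with the page's age, measured from its creation date. The exponential decay, the one-year half-life and the minimum weight (so old pages are demoted but never vanish) are assumptions chosen for illustration; `Hit`, `age_weight` and `rank` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Hit:
    url: str
    keyword_score: float  # relevance from the ordinary keyword match
    created: date         # original creation date, NOT last-modified date


def age_weight(created: date, today: date,
               half_life_days: float = 365.0, floor: float = 0.1) -> float:
    """Weight that halves every half_life_days after creation, never dropping below floor."""
    age_days = max((today - created).days, 0)
    return max(floor, 0.5 ** (age_days / half_life_days))


def rank(hits: List[Hit], today: date) -> List[Hit]:
    """Order hits by keyword score scaled down according to page age."""
    return sorted(hits, key=lambda h: h.keyword_score * age_weight(h.created, today),
                  reverse=True)
```

With `today = date(2000, 1, 1)`, a page created on 1 January 1999 gets weight 0.5, so a slightly weaker keyword match on a page from December 1999 will outrank a strong match on a page from 1996.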
A more advanced solution is to treat search less as a simple keyword-counting operation and more as an index to the site. In this model, the search weight of old content changes along with its value as a resource for each query. Doing this fully is a difficult research challenge, but a manual approximation would be to have the content gardener adjust the search weights for each meta-keyword based on its current relevance.
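The manual approximation can be as simple as a lookup table, maintained by the content gardener, that scales the ordinary keyword score for particular page and keyword pairs. The page URL, keywords and weights below are hypothetical, and unlisted pairs fall back to a neutral weight of 1.0.

```python
from typing import Dict, Tuple

# Hypothetical table kept up to date by the content gardener:
# (page URL, meta-keyword) -> current relevance multiplier for that keyword.
KEYWORD_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("/articles/1996-design-mistakes.html", "web design"): 1.5,       # still a strong answer
    ("/articles/1996-design-mistakes.html", "browser support"): 0.3,  # largely outdated for this query
}


def gardened_score(url: str, keyword: str, base_score: float) -> float:
    """Scale a keyword match by the gardener's weight for that page and keyword (default 1.0)."""
    return base_score * KEYWORD_WEIGHTS.get((url, keyword.lower()), 1.0)
```

Because the weights are per keyword rather than per page, the same old article can be promoted for queries where it is still the best resource and demoted for queries where newer material has overtaken it.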