Advice for Designing Vast Web Archives

by Jakob Nielsen on January 10, 1999

Here is some advice received from Alertbox readers regarding the underlying problem of how to design a Web interface to a large archive of historical manuscripts.

Bertrand Denoix writes:

When I hear "information retrieval" and "web" in the same sentence, it hits a hot usability button of mine: more and more often, the results aren't bookmarkable, either because the interface is heavily frame-based, or because the result URLs aren't stable (they are time-dependent, or depend on cookies or something else).

Designers of such systems should bear in mind:

  • That users aren't searching for the fun of searching. They are there to get results, and be able to retrieve the results later. This means that the URL to the result should be usable later.
  • The results should be shareable with others, for instance by cut/paste to some e-mail. The recipient should be able to use that URL, which therefore shouldn't depend on cookies on the first computer.
  • The search process should be restartable from intermediate points, either by using the browser history or by re-using explicitly set bookmarks. These bookmarks should ideally be usable to prefill search pages, for instance, so that I could have a bookmark set for general search and other ones set for Computer Science, Biology, or whatever makes sense for the database/search.
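One way to meet all three requirements is to carry the whole search state in the URL's query string rather than in cookies or a server-side session. The following is a minimal sketch under that assumption; the host `archive.example.org`, the parameter names `q`, `subject`, and `page`, and both function names are hypothetical and not taken from any real system.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical search endpoint, used for illustration only.
BASE_URL = "https://archive.example.org/search"


def build_search_url(query, subject=None, page=1):
    """Encode the entire search state in the URL, not in cookies or a session."""
    params = {"q": query, "page": page}
    if subject:
        params["subject"] = subject
    # Sort the parameters so the same search always yields the same URL.
    return BASE_URL + "?" + urlencode(sorted(params.items()))


def parse_search_url(url):
    """Recover the search state from a bookmarked or e-mailed URL."""
    fields = parse_qs(urlsplit(url).query)
    return {
        "query": fields.get("q", [""])[0],
        "subject": fields.get("subject", [None])[0],
        "page": int(fields.get("page", ["1"])[0]),
    }


if __name__ == "__main__":
    # A result page that can be bookmarked or pasted into an e-mail.
    print(build_search_url("social security", subject="Biology", page=2))
    # A bookmark that only prefills the subject on the search form.
    print(build_search_url("", subject="Computer Science"))
```

Because nothing outside the URL is needed to reproduce the page, the same link works later, on another computer, and from any intermediate step in the browser history.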

John Hatchett writes:

I recently made use of a library collection, University of Texas Architectural Archives, and found myself lost in the process. While they have a website, it tends to focus on presentations on the organization of the staff, the history of the library, the operations of the library, and a brief listing of the collection.

A lesson can be gleaned from Amazon's use of online reviews and rankings. A library is not too different from Amazon in some of its goals. You want to guide the user to the appropriate title and lower the barrier to buying, which often means providing enough information to make the potential buyer/user not hesitate to make the purchase, or trip to the library in this case.

The plan to use web logs to determine Phase II activity is flawed. The problem with determining demand based solely on usage through the internet is that until you have the collection itself online, you won't have true usage, and even then poor site design can lead to poor inferences. I would suggest that the number of times a resource has been requested historically is a better indicator of demand and thus yields a better prioritized list of what should be digitized.
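As a rough illustration of ranking by historical requests, the sketch below counts how often each item appears in a reading-room request log and orders the items by that count. The log format and the identifiers such as `box-12` are invented for the example; a real archive would draw them from its own request records.

```python
from collections import Counter


def digitization_priority(request_log):
    """Rank collection items by how often they were requested historically.

    request_log is assumed to be a list with one item identifier per request.
    """
    counts = Counter(request_log)
    # most_common() returns (item, count) pairs, highest count first.
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical request history, one entry per reading-room request.
    log = ["box-12", "box-03", "box-12", "box-40", "box-12", "box-03"]
    for item, requests in digitization_priority(log):
        print(item, requests)
```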

Linda Wessel writes:

If I had this project the first thing I would do is cry. Then I would build a site following Jakob Nielsen's sage advice: Keep it simple. The beauty of this site is once it is finished there will be little maintenance.

Every large site I have ever visited has the same problems. They are over-burdened with resource-hogging graphics and advertising. Additionally, the code is written using the latest technology tricks rather than HTML. The end result is often a mess that leaves users frustrated and pages unread. Very often these sites are placed on slow servers that cannot handle the traffic or manage a page-call quickly and properly.

Speed will not be as much of an issue as five hundred users trying to see the same page at the same time. Thus the first step would be to determine the audience you wish to reach. If it is too technical and humorless, viewers will leave. If it is too simply written, academics will be insulted. There is a very thin line between success and failure when trying to please too many or too few. And there is no easy answer.

To keep it simple I would place a picture of the Senator on the opening page with a brief history and index with live links. Additionally I would devote a page with links to every year and include sub-links to thumbnail size images of important photographs. Be sure to include a link to enlarge the photo (include size) for better viewing.

Each year would index sub-links to important categorized documents for that year. It's a lot of work and it's hard to ignore the temptation of grouping several years in one link, but the reward will be in ease of navigation. Users can go directly to June 1947 and locate document SB:xxxx.xx, providing they know what they are looking for, and also find the accompanying links to photographs, correspondence, etc. Let the users decide the type and how much information they want.
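The year-by-year index described above could be generated from a flat list of catalog records. The sketch below assumes each record is a `(year, month, document ID, category)` tuple; the record layout and the placeholder IDs such as `DOC-A` are hypothetical and do not reflect the collection's actual numbering.

```python
from collections import defaultdict


def build_index(documents):
    """Group (year, month, doc_id, category) records into year -> month -> entries."""
    index = defaultdict(lambda: defaultdict(list))
    for year, month, doc_id, category in documents:
        index[year][month].append((doc_id, category))
    # Return plain dicts with years and months in chronological order,
    # and the entries within each month sorted by document ID.
    return {
        year: {month: sorted(index[year][month]) for month in sorted(index[year])}
        for year in sorted(index)
    }


if __name__ == "__main__":
    # Hypothetical records; the IDs are placeholders.
    records = [
        (1947, 6, "DOC-B", "correspondence"),
        (1947, 6, "DOC-A", "photograph"),
        (1946, 11, "DOC-C", "speech"),
    ]
    for year, months in build_index(records).items():
        for month, entries in months.items():
            print(year, month, entries)
```

Keeping one entry per year and month, rather than grouping several years under one link, is what lets a user jump straight to a given month and see the related photographs and correspondence side by side.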

You will meet users that insist the site needs flash and pizazz that only new technology can provide. They are correct about new technology, but a great designer can also add 'pizazz' with HTML using colors, fonts, borders and backgrounds.

Be wary of using very dark background with red/blue/neon text. They may be 'hot' colors, but users with low-vision problems literally cannot see these color combinations.
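One way to check a color pair before using it is the contrast ratio defined in the W3C's Web Content Accessibility Guidelines. This is a later formula brought in here as an assumption, not something the letter refers to, and a single ratio is only a rough proxy for what low-vision users can actually read. The sketch below computes it for colors given as 0-255 RGB tuples; the function names are illustrative.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, following the WCAG definition."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    black, white, blue = (0, 0, 0), (255, 255, 255), (0, 0, 255)
    print("black on white:", round(contrast_ratio(black, white), 2))
    print("blue on black:", round(contrast_ratio(blue, black), 2))
```

WCAG suggests a ratio of at least 4.5 for body text. Pure blue on black falls well below that, while black on white reaches the maximum.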

In short, keep the site easy to use: Insist on code that the oldest browser will recognize; use light/white background and black text; write briefly and concisely for fast loading; include an index with live links; use thumbnails that can be enlarged; and let users determine their own needs.

Dean DeBolt from the University of West Florida Library writes:

Let me say at the beginning that, like Burt, I am an archivist at a different Florida university. I don't have Burt's immediate problem but I do expect to face similar questions about digitizing and placing online research collections similar to, though smaller than, the Pepper Archives. For the moment, let me approach this problem as a researcher since I have done substantial research in archives for books and journal articles and the like.

As a researcher, I often do not know exactly what I'm looking for. What I'm looking for are clues, key words, and the like that pertain to what my topic is. When I do research in a group of papers (a pile, box, folder, etc.), I read or skim every one. If I find something, at that point I have the option of (1) writing the information down in a notebook, on a card, or whatever; (2) asking for a photocopy of the item; or (3) making a note to come back to that later. In this way, I gradually glean clues and information from a wide variety of sources. These culled nuggets of information may lead me into other avenues of research as well.

Unfortunately for me, I don't know where important nuggets of information will occur. It's easy to guess at letters, memoranda, notebooks, diaries, and the like, but it is also surprising how a casual scratch on an invoice, sales slip, corner of an envelope, etc. may help me.

Translating these research habits to the web is intriguing. I would expect an internet/web site that has these same documents in electronic form to correspond roughly to the paper equivalent. That is, some type of hierarchical arrangement in categories, dates, types, and the like. That's not going to change my method of research very much, because I tend to look at everything, knowing that the original office clerk or later archivist could have unknowingly filed an important piece of paper in a wrong category, type, date. Since I am researching by reading and looking at the evidence, I don't need every item to be OCR'd or transcribed. I might also need to know what is on the BACK of a document if there is something on the reverse.

I would expect from a digital archive the same thing I would from a pile of papers. That I have access to ALL of them, no matter the inconsequentiality. That way I can be sure that I have not missed anything. If all the materials are not digitally scanned, then there should be clear SECTIONS so I would know what to look at later....Here I'm thinking of groups of materials, not that items 111, 118, 212, and 418 were not scanned, sorry. Trying to keep track of what was scanned and what I've seen, and what was not scanned would be a nightmare for me as a researcher UNLESS....clearly on each screen is some type of identifier for that particular document or image.

To use a publication as an example, I've seen lots of references to Washington Irving's History of New York with his description of St. Nicholas. If only those 2-3 pages were on a website, I would entirely miss the other 24 references to St. Nicholas in the book, including one about "laying a-finger aside of his nose" that was picked up by fellow Knickerbocker Clement Clarke Moore for his poem "Twas the Night Before Christmas." In short, as a researcher, I dislike the idea that someone else has judiciously decided what materials should be digitized and what should not. If the purpose of digitization and posting is to promote access, then only posting select examples does both me and other users a disservice.

Emery Jeffreys from Modis Government Systems Division writes:

Claude Pepper was an interesting man. Most people think of him as an advocate of senior citizens. Few people outside of archivists may realize that his influence dated all the way back to FDR.

I think categorizing Pepper's archive based on which material received the most requests is very risky. What if the material sought most isn't that critical to an understanding of Pepper? The collection might be better served by looking at what kinds of people made the requests.

I think adding the archives to the web based on requests from the collection is pretty useless unless you are doing some sort of study comparing web accesses with requests for access to the archive's stacks.

If it were my project, I would put the entire collection online in this order:

  1. The basic web site explaining the collection. The outline of your plans for the initial site is good. However, if you have any means for site users to send feedback or ask questions, you had best be prepared to answer it or acknowledge it in some way.
  2. Put the archives up in stages. A good place to start would be with the material that is accessed most. Then add the less important or less frequently accessed material.
  3. Give users an opportunity to both search and browse. It might also be helpful if there were a category for the material ranked by experts.
