Summary: Chapter 8 from Jakob Nielsen's book, Multimedia and Hypertext: The Internet and Beyond, explores a variety of information retrieval strategies for dealing with the ever-increasing volume of information on the internet.
This is chapter 8 from Jakob Nielsen's book Multimedia and Hypertext: The Internet and Beyond, Morgan Kaufmann Publishers, 1995. (For full literature references, please see the bibliography in the book.)
Michael Lesk once wrote a paper called "What To Do When There's Too Much Information" [Lesk 1989]. Lesk was dealing with a hypertext system with 800,000 objects, which is certainly larger than most current systems, but future systems will have to deal with at least that many objects and possibly more. Consider that the number of objects available over the WWW was probably at least two million in the beginning of 1995 and that the Library of Congress holds more than 100 million publications. On the WWW, the millions of objects are not registered in any single place, so no single user interface has to deal with that many objects, but in return, the user has no way of truly taking advantage of the full amount of information because it is not being managed and presented in any way.
Figure 8.1. Chart showing the growth in number of entries per year in the HCIbib database of the human-computer interaction literature.
The Internet is about doubling every year. The amount of data transmitted over the Usenet netnews is growing by about 181% per year according to statistics from UUnet (a major netnews hub). The actual number of articles transmitted is "only" growing by 132%, and the discrepancy between these two numbers can probably be explained by the growing popularity of transmitting long messages with executable programs and digitized images.
At the end of 1994, about 100,000 netnews articles were posted per day, and 48% of the total number of bytes transmitted were due to articles that were 16 kB or more. In the beginning of 1994, only 40,000 articles were posted per day, and only 33% of the transmitted data were due to articles that were 16 kB or more.
Also, the number of newsgroups to which these messages get posted is growing by 52% per year, meaning that individual newsgroups do not see quite as rapid growth. Even so, individual newsgroups do grow. For example, alt.hypertext had an annual growth rate of 77% in number of messages posted from 1992 to 1994 and comp.human-factors had an annual growth rate of 92% in number of messages posted. These growth rates are largely due to the increase in readership that follows from the annual doubling of the Internet.
I subscribe to an email mailing list of Danes living abroad, and the statistics from this group are a good example of the increasing information overload on the Internet. The number of words of email traffic sent to the group has been growing by 170% per year from about 16,000 words in 1990 to almost 900,000 words in 1994. The number of distribution list members grew by 118% per year over the same period. The much faster growth in traffic than in the number of people is probably related to the fact that the number of potential interactions (person A commenting on person B's postings) grows by the square of the number of members.
Rapidly growing amounts of information can also be found outside electronic systems. According to the January 1990 issue of the journal Mathematical Review, the number of mathematical papers published annually has grown from 840 a year in 1870 to about 50,000 in 1989, with an accelerating growth that has doubled the number of papers every ten years since World War 2. In general, the number of scientific papers across all fields has been doubling every 10-15 years for the last two centuries [Price 1956, as cited in Odlyzko 1995]. In fact, the number of paper research publications has grown so large that many scientists have given up keeping up with all of the literature, even in their own highly specialized fields, and many journals now have the reputation of being "write-only," meaning that they are not being read very much. The Science Citation Index has found that more than half of the papers published in research journals are never cited by anybody else, and even though that does not prove that nobody read those papers that were not cited, it certainly means that nobody found them of particular value.
Gary Perlman maintains the "HCIbib" online bibliography of the human-computer interaction literature at the Ohio State University. The number of new articles in the HCIbib database has grown by 30% per year as shown in Figure 8.1. A 30% annual growth rate corresponds to 1,300% growth every ten years, which is much more than the doubling of the research literature per decade seen in more established fields like mathematics.
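The comparison between a 30% annual rate and a doubling per decade is a matter of compound growth. The following is a minimal sketch of that arithmetic; the function names are illustrative and not from the book. It gives roughly 1,280% growth over ten years, which the text rounds to 1,300%, and shows that a doubling per decade corresponds to only about 7% per year.

```python
def growth_over_years(annual_rate: float, years: int) -> float:
    """Total growth (as a fraction) after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years - 1.0


def annual_rate_for_doubling(years: float) -> float:
    """Annual growth rate (as a fraction) that doubles a quantity in the given years."""
    return 2.0 ** (1.0 / years) - 1.0


# HCIbib: 30% per year compounded over a decade.
hcibib_decade = growth_over_years(0.30, 10)

# Established fields such as mathematics: doubling every ten years.
doubling_annual = annual_rate_for_doubling(10)

print(f"30% per year for 10 years: {hcibib_decade:.0%} total growth")
print(f"Doubling per decade: {doubling_annual:.1%} per year")
```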
Even though the growth rates differ among disciplines there is no doubt that the research literature is growing fast enough to present scientists with a major information overload problem. Research publications are edited with a view toward keeping down the number of publications by weeding out less interesting submissions. Unedited publications like the netnews or the Danish email list grow much faster and quickly reach the point where many people have to stop reading them.
There are many different ways of calculating the economic value of information: one can consider the cost needed to produce the information, or one can consider what it can sell for. In a world with information overload, one also needs to consider the negative value of information in terms of the resources spent reading or pondering it. If somebody sends an email message to all the employees of a company with a staff of 10,000 then the cost to the company of the time spent on the message can be anywhere from $1,000 (if everybody immediately discards the message) to $15,000 (if everybody reads it). A steady increase in the amount of information risks acting as a time sink that can prevent people from ever getting any real work done.
Fortunately, it is possible to deal with large amounts of information. As an example, Table 8.1 shows an estimate of the amount of information in a Sunday issue of The New York Times. (Printing the daily and Sunday editions of The New York Times required 301,000 metric tons of newsprint in 1993.) The Sunday paper does include huge amounts of information, and it is sometimes said that a single Sunday Times has more information than the average villager would get in a lifetime during the Middle Ages. In fact, I am sure that old-time villagers encountered lots of information when farming the fields since it takes many megabytes to accurately represent data about weather and growth patterns. But if we only consider official "news" in the form of words or images reporting on world events, edicts from the King or Pope, and similar types of newspaper-like information, the comparison may in fact be correct.
The estimates of data content in Table 8.1 were made under the following assumptions: Each full page of text is about 31 kB. Each page is about 262 square inches (0.17 m²). Each page of images is about 1.6 MB of uncompressed data, given that about 10% of the images are in color and that the print resolution is approximately equivalent to 72 pixels per inch, in 8-bit grayscale or 24-bit color. Each page of display ads is about 30% empty space, 40% images, and 30% text, corresponding to 0.6 MB of image data and 9 kB of text.
In total, the sample Sunday New York Times contained 7.5 MB of text data and 177 MB of image data. (For sake of comparison, this entire book contains about 1.0 MB of text and 13 MB of image data.) People can get this much information in the door every week and still have time for other activities on Sundays. Admittedly, it takes a long time to read every word and study every image in the Sunday Times, but then people don't do that. Instead, every reader selects some parts of the paper that are of interest to that individual and skips the rest. It is feasible to get many times more information delivered than one wants because of the fairly cheap distribution mechanism. And it is feasible to skip most of the paper because it has been designed to make it easy for readers to find information of interest to them.
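The per-page figures follow from the stated assumptions. The sketch below reproduces them, assuming that 1 MB means one million bytes and that grayscale and color pixels take 1 and 3 bytes respectively; the variable names are illustrative. The display-ad image figure comes out slightly above 0.6 MB unless the per-page value is rounded to 1.6 MB first.

```python
PAGE_AREA_SQ_IN = 262      # area of one full page, from the text
PIXELS_PER_INCH = 72       # approximate print resolution, from the text
COLOR_SHARE = 0.10         # share of images printed in color, from the text
BYTES_GRAY = 1             # 8-bit grayscale
BYTES_COLOR = 3            # 24-bit color
TEXT_PAGE_KB = 31          # one full page of text, from the text

# Pixels on a page that is covered entirely by images.
pixels_per_page = PAGE_AREA_SQ_IN * PIXELS_PER_INCH ** 2

# Average storage per pixel for a 90/10 mix of grayscale and color.
avg_bytes_per_pixel = (1 - COLOR_SHARE) * BYTES_GRAY + COLOR_SHARE * BYTES_COLOR

# Assumption: 1 MB = 1,000,000 bytes.
image_page_mb = pixels_per_page * avg_bytes_per_pixel / 1e6

# A display-ad page is 40% images and 30% text.
ad_image_mb = 0.40 * image_page_mb
ad_text_kb = 0.30 * TEXT_PAGE_KB

print(f"Full page of images: {image_page_mb:.2f} MB")
print(f"Display-ad page: {ad_image_mb:.2f} MB of images, {ad_text_kb:.1f} kB of text")
```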
In the future, one of the most promising approaches to hypertext journalism is the delivery of individualized electronic newspapers. Since all components of a modern newspaper are edited online, it is possible to replace the delivery of a huge printout with online access to exactly those stories that interest the individual reader. An online newspaper would also deliver the latest version of all stories as of the exact time the reader asked for them.
| Section Number and Title | Pages | Editorial Text | Editorial Illustrations | Display Ads | Text Ads |
|---|---|---|---|---|---|
| 2. Arts & Leisure | 40 | 24% | 15% | 60% | 1% |
| 4. The Week in Review | 22 | 24% | 14% | 20% | 42% |
| 4A. Education Life (actually 52 half-size pages) | 26 | 23% | 15% | 58% | 3% |
| 6. The New York Times Magazine (actually 68 half-size pages) | 34 | 32% | 26% | 33% | 9% |
| 7. Book Review (actually 32 half-size pages) | 16 | 54% | 9% | 38% | 1% |
| 9. Styles of the Times | 10 | 35% | 40% | 27% | 1% |
| 10. Real Estate | 42 | 7% | 8% | 34% | 51% |
| 11. Help Wanted | 42 | 0% | 0% | 13% | 87% |
| 12. Television listings (actually 56 quarter-size pages) | 14 | 75% | 7% | 18% | 0% |
| 13. New Jersey Weekly (distributed to suburban subscribers as a replacement for the City Weekly that was distributed in New York City) | 24 | 26% | 11% | 62% | 2% |
| Total for the entire Sunday paper | 406 | 26% | 12% | 40% | >22% |
| Equivalent number of full pages | 406 | 105 | 50 | 161 | 91 |
| Information in Megabytes | | 3.2 | 77 | 102 | >2.8 |
Figures 8.2 and 8.3 show an example of an individualized electronic newspaper developed at GMD in Germany [A. Haake et al. 1994]. The newspaper interface, designed by Klaus Reichenberger, can automatically lay out the current stories that match the user's stated interests, resulting in interesting and appealing displays that invite further exploration and reading. As can be seen from comparing Figures 8.2 and 8.3, different sets of stories can be assembled for readers with different interests.
Figure 8.2. The experimental individualized electronic newspaper (IEN) from GMD in Germany showing a page customized for a reader with an interest in science. Compare with Figure 8.3 showing a page from the same newspaper customized for a reader with an interest in sport. Copyright © 1992 by Klaus Reichenberger and GMD-IPSI, reprinted by permission.
Utilizing hypermedia linking, each part of the electronic newspaper can follow the well-established principle from printed newspapers, with front pages and cover stories for each of the main sections of the newspaper.
Current attempts at putting newspaper stories online on services like America Online use much more boring menu interfaces where the user gets very little information about the stories before having to decide which ones to read. There is no doubt that better systems for automated layout (like the ones shown in the figures here) will be necessary for online newspapers to have a chance of competing with printed ones that are based on hundreds of years of typographical and editorial experience.
Figure 8.3. The experimental individualized electronic newspaper (IEN) from GMD in Germany showing a page customized for a reader with an interest in sport. Copyright © 1992 by Klaus Reichenberger and GMD-IPSI, reprinted by permission.
There are three main approaches to addressing information overload. The first (and often the most successful) is good user interface design and good editorial preparation of the data, resulting in an ability for the user to rapidly skim the information and pick out the exact pieces that interest him or her. Paper newspapers like The New York Times exemplify this solution to the information overload problem. If I am really busy one day, I can just scan the front page of the newspaper and know that I have not missed being informed about any really important event.
The two other solutions are information retrieval and information filtering [Belkin and Croft 1992]. The difference between the two is that retrieval is normally done actively by the user in specific cases where the user is looking for a certain piece of information, whereas filtering is done continuously in cases where the user wants to be kept informed about certain events. For example, a typical retrieval task would be to find the name of the president of IBM and a typical filtering task would be to be informed every time IBM announces a new workstation but not when it announces a new mainframe or PC.
A search for information in a hypertext might be performed purely by navigation, but it should also be possible for the user to have the computer find things through various query mechanisms. Navigation is best for information spaces that are small enough to be covered exhaustively and familiar enough to the users to let them find their way around. Many information spaces in real life are unfortunately large and unfamiliar and require the use of queries to find information.
The simplest query principle is the full text search, which finds the occurrences of words specified by the user. Some hypertext systems simply take the user to the first occurrence of the search term, but it is much better to display a menu of the hits first as shown in the example from Intermedia in Figure 8.4. The problem with jumping directly to the first term occurrence is that the user has no way of knowing how many other hits are in the hypertext. The general usability principle of letting the user know what is going on leads to a requirement for an overview, even in the case of query results. Figure 8.5 shows the search method from Storyspace, which provides a list of all the nodes with hits without indicating the number of times the search terms were found in each node. Storyspace has a preview facility which the user can activate by clicking "View current text" to quickly see the beginning of the various nodes before deciding where to jump.
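To make the idea of a hit menu concrete, here is a minimal sketch of a full text search that returns an overview of all matching nodes with their hit counts instead of jumping to the first occurrence. It is not how Intermedia or Storyspace are implemented; the function name `search_hits` and the tiny sample hypertext are assumptions made for illustration.

```python
import re
from collections import Counter


def search_hits(nodes: dict[str, str], term: str) -> list[tuple[str, int]]:
    """Return (node name, number of occurrences) for every node containing term.

    Nodes with the most occurrences come first, so the result can be shown
    directly as a menu of hits.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts: Counter[str] = Counter()
    for name, text in nodes.items():
        occurrences = len(pattern.findall(text))
        if occurrences:
            counts[name] = occurrences
    return counts.most_common()


# Hypothetical three-node hypertext used only for this example.
nodes = {
    "Intro": "Hypertext lets readers follow links between nodes.",
    "Links": "Links connect nodes. Typed links carry meaning. Links can be filtered.",
    "History": "Early systems had no search facility.",
}

for name, occurrences in search_hits(nodes, "links"):
    print(f"{name}: {occurrences} hit(s)")
```

Showing the count next to each node name lets the user see both where the hits are and how many there are before deciding where to jump.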
Figure 8.4. Intermedia's full-text interface allows users to search the entire Intermedia database to find every occurrence of the specified text in all documents, regardless of type. The list of retrieved documents can be sorted according to five different criteria. Clicking on the document name in the list will allow the user to view information about the document. Double-clicking on the document name will open the document. Copyright © 1989 by Brown University, reprinted with permission.
Normally search is done in stages where the user first specifies the query and then has to wait for the system to return the set of found objects. With faster computers it is becoming possible to perform dynamic queries where the users manipulate sliders or other controls to specify desired search values and get immediate feedback from the system as they do so. In one study the subjects were able to find information in a database 119% faster (i.e., in less than half the time) when they were given dynamic feedback as they constructed their query than when they did not get any feedback until after they had submitted a complete query to the system [Ahlberg et al. 1992].
Figure 8.6 shows the use of dynamic queries in the FilmFinder from the University of Maryland [Ahlberg and Shneiderman 1994]. The user can specify that only films of a certain running length are of interest by moving the range selector slider, and the display will update in real time while the user moves the slider, making it very clear whether reasonable values are being specified. The overview diagram in the FilmFinder is a so-called starfield display where each of the retrieved objects is shown as a "star" in a two-dimensional scatterplot. The two dimensions of the scatterplot can be chosen by the user to represent particularly meaningful object attributes, and a third dimension can be used to color-code the dots (in the figure, genre like Sci-Fi or Western was the attribute used to color-code the films). Note in Figure 8.6 how zooming and panning the scatterplot in effect is the same as specifying query intervals for the attributes represented by the diagram axes.
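The core of a dynamic query can be sketched as a filter that is re-evaluated every time a control changes. The example below is an illustration under stated assumptions rather than the FilmFinder's actual code: the `Film` record, the `dynamic_query` function, and the catalog entries are all hypothetical. The axis zoom is modeled as just another pair of intervals, which mirrors the observation that zooming and panning amount to specifying query intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Film:
    title: str
    year: int
    length_min: int
    popularity: float
    actors: tuple[str, ...]


def dynamic_query(films, length_range, year_range, min_popularity=0.0, actor=None):
    """Return the films that fall inside every currently selected interval."""
    lo_len, hi_len = length_range
    lo_year, hi_year = year_range
    return [
        f for f in films
        if lo_len <= f.length_min <= hi_len
        and lo_year <= f.year <= hi_year
        and f.popularity >= min_popularity
        and (actor is None or actor in f.actors)
    ]


# Hypothetical catalog; titles, years, lengths, and ratings are made up.
FILMS = [
    Film("Film A", 1958, 95, 6.1, ("Actor X",)),
    Film("Film B", 1974, 128, 7.3, ("Actor X", "Actor Y")),
    Film("Film C", 1983, 131, 5.0, ("Actor Y",)),
    Film("Film D", 1990, 300, 8.2, ("Actor X",)),
]

# Each slider movement would trigger a call like this one, followed by a redraw.
visible = dynamic_query(
    FILMS,
    length_range=(60, 269),
    year_range=(1960, 1995),
    min_popularity=4,
    actor="Actor X",
)
print([f.title for f in visible])
```

In an interactive system the filter has to complete quickly enough to keep up with the slider, which is why the technique depends on fast computers.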
Figure 8.5. Searching for the name "Beer" in the Storyspace version of the Dickens Web [Landow and Kahn 1992]. Copyright © 1992-94 by Paul Kahn, George P. Landow, and Brown University, reprinted by permission.
Figure 8.6 also illustrates the output-as-input interaction technique. When the user has found an interesting film (here Murder on the Orient Express), the user can click on the dot representing that object and link to a box with more detailed information about the film. The user can then link further by taking this query output as input for the next query: in our example, the user has chosen Sean Connery's name as a search term and transferred it from the initial search result to a new query specification to see only films starring Sean Connery.
Figure 8.6. The Maryland FilmFinder uses dynamic queries to allow users to search for films with various attributes. Here, the user has specified a search for films starring Sean Connery with a running length between 60 and 269 minutes. The user has furthermore used the zoom control sliders for the x and y axes to display only the part of the diagram with films after 1960 that are rated more than 4 in popularity. Copyright © 1993 by University of Maryland Human-Computer Interaction Lab., reprinted by permission.
Even though most query systems perform text searches or select objects based on numeric attribute values, it is also possible to search on other types of media. Since humans are very visually oriented, they often rely on images to remember things, and image-based searches might well be a very useful supplement to text and attribute-based search. Unfortunately, current computer capabilities in the pattern recognition area are very limited, and computers cannot really understand pictures well enough to deal with them as well as with text. Therefore, the traditional way to search image databases has been the one shown in Figure 8.7, where each picture has to be annotated with a text caption or keywords for search purposes.
Figure 8.7. Screendump from a session where the user has connected to a WAIS (wide area information service) server with photographs from the Smithsonian Institution to search for images of the Greek goddess Athena. The system uses a full text search in the caption to retrieve the image. Copyright © 1992 by the Smithsonian Institution, reprinted by permission.
Searching the captions is a much better way to find a picture than flipping through thousands of photos but it does not work in all cases. In the example in Figure 8.7 it might have been the case that the user wanted a coin with the picture of a woman or that the user remembered approximately how the coin looked but not exactly what it represented.
Figure 8.8. Hypermedia navigation by image retrieval. This experimental system shows a tourist guide to Paris where the user has asked the system to show all other images that look somewhat like the photo of the Eiffel Tower in the upper right window. In the middle of the screen, the system displays miniatures of its pictures of tall thin things, and the user has selected some of these images for full-scale display. The images again have hypertext links to the maps and to textual descriptions of the sights of Paris. Copyright © 1993 by NEC Corporation, reprinted by permission.
Some experimental systems have been developed that allow computers to deal with image understanding in a rudimentary manner. For example, Figure 8.8 shows a system that understands the general shape of the major objects depicted in an image [Hirata et al. 1993]. In order to find pictures, the user can either sketch the approximate composition of the image or select an existing picture and use it to link to more of the same.
Figure 8.9. Screen from Joel Tesler's FSN file system navigator [Fairchild 1993]. Here, the user has performed a search on the file system to find files that are larger than one million bytes and older than one hundred days. Copyright © 1994 by Silicon Graphics, Inc., reprinted by permission.
Figure 8.10. Revised view of the file system from Figure 8.9. Here, the user has used FSN's three-dimensional navigation system to move in to a close-up of one of the directories in order to get a better view of the individual files. Copyright © 1994 by Silicon Graphics, Inc., reprinted by permission.
A very promising way of showing search results is to integrate them with the overview diagram by highlighting those nodes that contain "hits." Hits indicate the number of the user's search terms that can be found in the node. It is possible to use more advanced query facilities and also add to the hit score if words are found which are synonyms or otherwise related to the search terms.
Figures 8.9 and 8.10 show how the FSN system highlights search hits in a three-dimensional overview of an information space. Movie aficionados will be interested in knowing that FSN (pronounced "fusion") was the system used in the film Jurassic Park in the memorable example of product placement where a child sees a workstation and happily declares "This is Unix; I can use that" (and saves the day by rapid navigation of the FSN interface). Most of the time, though, the goal of FSN is not to fight dinosaurs but to manage large file systems.
SuperBook [Egan et al. 1989] annotates the names of nodes with the number of hits to allow users to see not just where there is something of interest but also how much there is. It would be possible to use this type of search result to construct fisheye views since the number of hits in a given region of the information space would indicate how interesting that region must be to the user.
One can also use more sophisticated methods from the field of information retrieval. This brief section cannot do justice to that field, which is an active research area in its own right, so the interested reader should read a good textbook like Salton's Automatic Text Processing or at least a full survey article like [Bärtschi 1985].
Information retrieval can be integrated with hypertext navigation to deliver powerful means of finding information. Figure 8.11 shows a hypertext system [Andersen et al. 1989] for reading the Usenet network news, which is a world-wide bulletin board system with a huge number of messages about various computer-related topics. Since there are far too many nodes in the system to rely on manually constructed links, we use a full text similarity rating calculated by counting the overlap in vocabulary between any two nodes. A list of the articles that are rated as the most similar to the current article is displayed when the user clicks on the "similarity" button.
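A similarity rating based on vocabulary overlap can be sketched in a few lines. The version below is an assumption about one reasonable way to do it, not the HyperNews implementation: it normalizes the shared vocabulary by the combined vocabulary (the Jaccard coefficient), and the function names and sample articles are hypothetical.

```python
import re


def vocabulary(text: str) -> set[str]:
    """Return the set of distinct lowercase words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_similarity(a: str, b: str) -> float:
    """Shared vocabulary divided by combined vocabulary (Jaccard coefficient)."""
    vocab_a, vocab_b = vocabulary(a), vocabulary(b)
    if not vocab_a or not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


def most_similar(current: str, articles: dict[str, str], top_n: int = 3):
    """Rank all other articles by their vocabulary overlap with the current one."""
    ratings = [
        (name, overlap_similarity(articles[current], text))
        for name, text in articles.items()
        if name != current
    ]
    ratings.sort(key=lambda pair: pair[1], reverse=True)
    return ratings[:top_n]


# Hypothetical articles used only for this example.
articles = {
    "a1": "hypertext links and navigation",
    "a2": "navigation in hypertext systems",
    "a3": "recipes for rye bread",
}

for name, rating in most_similar("a1", articles):
    print(f"{name}: {rating:.2f}")
```

A practical system would likely also discard very common words, since they inflate the overlap between unrelated articles.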
In a case where we have a hypertext available in which the links have already been constructed, we should be able to utilize the information inherent in the linking structure to perform more semantically meaningful searches than just plain full text searches. This step is possible because a hypertext can be considered as a "belief network" to the extent that if two nodes are linked, then we "believe" that their contents are related in some way.
Figure 8.11. A screen from the HyperNews system showing pie icons rating the links to other articles.
Thus if a node matches a search, then we should also assign a higher score for the other nodes it is linked to since our "belief" that the connected nodes are related justifies the propagation of scores among them. One way of calculating this score is by assigning the final search result for a node as the sum of the number of hits in the node itself (called the intrinsic score) and some weighted average of the scores for the nodes it is linked to (called the extrinsic score). As a simple example, we could assign the final query score as the intrinsic score plus half the extrinsic score.
In the example in Figure 8.12, we see that the central node ends up getting the highest query score even though it does not contain any of the search terms (as can be seen from the fact that it has an intrinsic score of zero). This is because the central node sits in the middle of a lot of information related to the user's query and is therefore probably also highly relevant.
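The scoring rule can be sketched as follows, assuming that the extrinsic score of a node is the sum of the intrinsic scores of the nodes it is linked to and that the weight is one half. The graph and its scores are hypothetical and are not the values from Figure 8.12.

```python
def query_scores(
    intrinsic: dict[str, float],
    links: dict[str, list[str]],
    weight: float = 0.5,
) -> dict[str, float]:
    """Combine each node's own score with the scores of its linked nodes.

    final score = intrinsic score + weight * extrinsic score, where the
    extrinsic score is the sum of the intrinsic scores of the linked nodes.
    """
    scores = {}
    for node, own_score in intrinsic.items():
        extrinsic = sum(intrinsic[neighbor] for neighbor in links.get(node, []))
        scores[node] = own_score + weight * extrinsic
    return scores


# Hypothetical hypertext: a hub with no hits of its own, linked to four
# nodes that each contain some of the search terms.
intrinsic = {"hub": 0, "a": 2, "b": 3, "c": 1, "d": 2}
links = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}

scores = query_scores(intrinsic, links)
for node, score in sorted(scores.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{node}: {score}")
```

Here the hub ends up with the highest score even though its intrinsic score is zero, which is the same effect described for the central node.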
In addition to just finding information, query mechanisms can also be used to filter the hypertext so that only relevant links are made active and only relevant nodes are shown in overview diagrams. Even though the "raw" hypertext may be large and confusing, the filtered hypertext can still be easy to navigate. Such a combination of query methods to select a subset of the hypertext and traditional navigation to look at the information might be the best of both worlds if done right.
Figure 8.12. An example of a calculation of query scores as a combination of intrinsic scores (how well the individual node itself matches the user's query) and extrinsic scores (how well the nodes it is linked to match the query). Here we have used a rule that gives a node a search score equal to its intrinsic score plus half its extrinsic score (the sum of the scores of the nodes it is linked to).
Despite much work on automated ways of reducing information overload, the most promising approaches will probably be the ones that rely on human judgment to some extent. Some authorities on the human factors of information believe that it is impossible to achieve sufficiently usable information filtering without having a human in the loop somewhere to make individual judgments as to the quality and relevance of each information object. As long as computers are not intelligent enough to be able to actually understand the content of the information they are processing, they will never be able to provide true quality ratings. In fact, perfect information filtering is likely to be an "AI complete problem" in the sense that solving it will be equivalent to solving the complete set of intelligent computing problems. (Maybe some day we will achieve sufficiently good artificial intelligence to allow the computer to understand the content of information objects, but quality judgments seem to be beyond the scope of AI for the next many years. Whether it is impossible or just currently infeasible to get computers to rate quality is a philosophical discussion that is fairly irrelevant for anybody wanting to ship product during the next ten or twenty years since the two positions are equivalent within this time horizon.) An alternative approach to reducing information overload is the time-honored approach of an editor with a firm hand.
Figure 8.13. Edited column on the Interchange service with hypertext links to other articles that are available on the service. Copyright © 1994 by Interchange Network Company, reprinted by permission.
Figure 8.13 shows an example of an edited column from the Interchange online service [Perkins 1995]. The human editor has selected a number of articles which she believes will be of relevance to readers with an interest in the stated topic of the column (shopping tips for computer buyers). She annotates the links with her own comments and then provides a link to the source material. Note how the personality of the editor is emphasized by showing her picture and by the style of her writing. In fact, it is likely that the information explosion will increase users' desire to feel that they are in touch with real humans and not just with some mass of Internet data. In his book Megatrends, John Naisbitt talked about a trend he called High Tech-High Touch, and the edited column in Figure 8.13 is an example of this trend. In a series of user tests I conducted of various WWW sites, people were invariably thrilled about a page with a picture of Microsoft's webmaster standing in front of their server: users enjoyed seeing the guy who was bringing them the information as opposed to having it come from a faceless bureaucracy.
Note how the "Deals & Steals" column in Figure 8.13 has a hypertext link to the "Buyer's Advisor Discussion" where the users can add their comments to the topics discussed in the column. By offering this facility, Interchange makes it easier for users to find comments by other users on topics they are interested in, since these comments will often be linked to specific columns or articles. A Bellcore research project called ephemeral interest groups had very similar goals [Brothers et al. 1992]. In the ephemeral interest group system, people could indicate their interest in a topic by "joining" postings to a bulletin board system in order to get sent follow-up messages to the postings. Every single message posted was a potential seed for an ephemeral interest group, and the groups only lived for as long as members posted additional follow-ups. This scheme was very successful in increasing the value of the messages seen by any individual participant: on a 1-5 scale (where 1 indicated completely irrelevant material and 5 indicated very relevant material), users rated messages sent to them by the ephemeral interest group system as 3.9 on the average, whereas the same messages only received a 2.7 rating when we tried sending them to a control group.
Editing can also be done collaboratively where a group of users build up information structures to help each other. Figure 8.14 shows a sample information digest constructed at Lotus Development Corporation [Maltz and Ehrlich 1995]. Because people who work in the same organization know and trust each other, they can assume that information recommended by their colleagues will be of much higher-than-average relevance to them. Information digests can exist on the corporate net for a variety of topics, and employees who find interesting information on the Internet or elsewhere can add the information with an annotation and a hypertext link.
Figure 8.14. An Information Digest written in Lotus Notes by Kate Ehrlich at Lotus Development Corp. Copyright © 1994 by Kate Ehrlich, reprinted by permission.
Maltz and Ehrlich refer to the system shown in Figure 8.14 as "active filtering" in contrast to the passive filtering discussed in the following section where there is no direct connection between a person casting a vote for some information and the readers who come later and filter the documents based on these votes. In the "active" filtering approach, there is an intent on the part of the person who finds some information to share it with his or her colleagues. The person finding the information may even recommend the information for particular colleagues who are known to have an interest in a particular area, and the Lotus information digest system has a feature for sending announcements of new and interesting information to specific individuals in addition to adding the information to a digest for public consumption.
Interest Voting and Readwear
Even though human editing is ideal, there are many cases where it is infeasible to rely solely on individual editors. The two main problems with human editors are that nobody can cover the full extent of information available on the Internet and other rich sources and that any individual reader only has a partial degree of agreement with the judgment of any individual editor.
Figure 8.15. Reading a netnews article with a modified front end in GroupLens. The user can click on one of the five ratings buttons with the mouse, or type a number from 1 to 5 on the keyboard. Copyright © 1994 by Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl, reprinted by permission.
An alternative approach is to rely on aggregate judgments by a larger group of editors. Figures 8.15 and 8.16 show an approach to reducing information overload in netnews suggested by Paul Resnick and colleagues from MIT and the University of Minnesota. Their GroupLens system [Resnick et al. 1994] collects quality ratings from anybody who happens to read a netnews article. These ratings are forwarded to special servers called Better Bit Bureaus from which they are made available to future readers. As Figure 8.16 shows, having ratings available can help people decide what articles to choose from a menu.
So far, quality ratings are only used in research projects like GroupLens. If they become more widely used, one could imagine several different ways of using them. The simplest approach would be to collect ratings from anybody who happens to read an article and just compute its quality as the mean of the individual ratings. Given that netnews was estimated to have seven million readers in 1994, sufficient ratings to form a reliable mean should be available in a few minutes no matter when an article is posted, especially considering that the readers are scattered in all timezones around the world. A major problem with this approach is that it is not clear that you share interests with a group of geeks in New Zealand or whoever happens to be the first to rate an article. It is likely that the early ratings will come to dominate the quality score for an article since very few other users will bother reading an article once it has gathered a string of poor ratings.
Figure 8.16. Quality rating scores integrated into a netnews reading interface. The bars represent quality grades as retrieved from the Better Bit Bureau server. Copyright © 1994 by Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl, reprinted by permission.
A second potential problem with world-wide ratings is the potential for "ratings wars" where people would stuff the ballot box, so to speak, by seeking out postings from their friends or enemies and rating articles without really having read the content. Indeed, netnews articles can be processed automatically as proven, e.g., by the Norwegian hacker who released a "cancelbot" on the Internet to hunt down and erase messages from a certain pair of lawyers who had "spammed" inappropriate ads for their office to thousands of unrelated newsgroups (and who do not deserve further publicity by having their name mentioned here).
An alternative approach to quality voting would be to collect only votes from people within your own organization. The downside of doing so would be that more people would have to read irrelevant articles before it was determined that they were irrelevant, but in return the ratings would be more directed toward the specific needs of the individual organization. It would also be possible to collect votes world-wide but only count the ones from people who in some way had been deemed responsible or who were known to have the same taste as the user who wanted to review the quality ratings.
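The difference between counting everybody's votes and counting only votes from a trusted group can be illustrated with a minimal sketch. The rater names, the votes, and the `mean_rating` helper are all hypothetical and are not taken from GroupLens; the 1-5 scale is the one shown in Figure 8.15.

```python
from statistics import mean

# Hypothetical votes on one article: (rater id, rating on a 1-5 scale).
votes = [("nz1", 1), ("nz2", 2), ("alice", 5), ("bob", 4), ("nz3", 1)]

def mean_rating(votes, trusted=None):
    """Mean rating; if `trusted` is given, count only raters in that set."""
    counted = [rating for rater, rating in votes
               if trusted is None or rater in trusted]
    # With no usable votes there is no score to report.
    return mean(counted) if counted else None

# Simplest approach: everybody who happened to read the article counts.
overall = mean_rating(votes)

# Alternative: count only votes from people in the reader's own organization.
local = mean_rating(votes, trusted={"alice", "bob"})
```

With these made-up votes the two scores disagree sharply, which is exactly the situation where the choice of voting population matters.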
Hill et al. experimented with interest voting where the users actively had to indicate their interest in each of 500 films. The system collected votes from 291 people who provided more than 55,000 ratings and then constructed a statistical model of which users had similar interests to which other users. In other words, instead of assuming that everybody had the same taste, the system recognized that people are different and that the same film may be a favorite of some people while being hated by others. The trick is to find a group of other people who share taste with the user for whom the system is trying to find relevant information. Luckily, if the system can aggregate information across a large enough population (e.g., the users of the Internet), there will always be others who have similar tastes, no matter how eclectic you may think you are. Hill et al.'s system was fairly successful in generating recommendations of new films for the users, with a correlation of 0.62 between the system's prediction of how well people would like a film and the actual rating given by the users. In comparison, the correlation between ratings from nationally-known movie critics and the users' own ratings was only 0.22. In other words, one can find much better information objects for users if one knows their taste and ratings of other objects than if one just relies on ratings of the intrinsic quality of the objects.
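The general idea of predicting a rating from the votes of like-minded users can be sketched as follows. This is not Hill et al.'s actual statistical model, which is not described here; it is a generic correlation-weighted scheme, and the users, films, and ratings are invented for illustration.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {film: rating}.
ratings = {
    "ann":  {"A": 5, "B": 1, "C": 4, "D": 2},
    "ben":  {"A": 4, "B": 2, "C": 5, "D": 1, "E": 5},
    "cara": {"A": 1, "B": 5, "C": 2, "D": 4, "E": 1},
}

def pearson(u, v):
    """Pearson correlation over films both users rated (0.0 if undefined)."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][f] for f in common]
    ys = [ratings[v][f] for f in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def predict(user, film):
    """User's own mean plus similarity-weighted deviations of other raters."""
    own = ratings[user].values()
    user_mean = sum(own) / len(own)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or film not in theirs:
            continue
        sim = pearson(user, other)
        other_mean = sum(theirs.values()) / len(theirs)
        # A negative correlation flips the sign of the other user's opinion.
        num += sim * (theirs[film] - other_mean)
        den += abs(sim)
    return user_mean if den == 0 else user_mean + num / den

prediction = predict("ann", "E")
```

In this toy data set, "ann" agrees with "ben" and disagrees with "cara", so both the enthusiasm of the first and the dislike of the second push the predicted rating for film E upward.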
No matter how quality votes are gathered and distributed, they have the distinct disadvantage that they require active decisions on the part of the users. In Figure 8.15, the user has to make up his or her mind as to what rating on a five-point scale to assign to the article, after which the user has to move the mouse to the chosen button to enter the vote. People might be motivated to vote on things that particularly upset them as being bad or that they find exceptionally good, but it will be difficult to get people to spend the time to rate everything they read. Just consider the many publications that have reader reply cards where you are asked to rate all articles in a given issue. How many of these cards have you sent back? And how often did you provide ratings for all the articles?
Figure 8.17. The link dialog box from HyperTED. The system keeps track of how often each link has been followed and displays this count when the user selects a link. Copyright © 1994 by Adrian Vanzyl, Monash University Medical Informatics, reprinted by permission.
An alternative to voting is to rely on information that is gathered unobtrusively by the computer in the background as the user goes about his or her normal activities. This approach is called readwear [Hill et al. 1992] because the notion is that reading things on the computer will cause them to be "worn" in the same way as a physical book is worn by repeated reading. In fact, a book that has been used a lot will often open by itself to the pages that have been read the most even if the user has not left a bookmark. Figure 8.17 shows an example of readwear in the HyperTED system: the system keeps track of how many times each link has been followed, and users can use this information to help decide which of several links they want to follow themselves.
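Link-level readwear of the kind shown in Figure 8.17 amounts to little more than a counter per link. The following is a minimal sketch under that assumption; the `LinkWear` class, its method names, and the link identifiers are hypothetical and do not reflect HyperTED's implementation.

```python
from collections import Counter

class LinkWear:
    """Counts how often each link has been followed (hypothetical sketch)."""

    def __init__(self):
        self.counts = Counter()

    def follow(self, link_id):
        """Record one traversal of the link and return the new count."""
        self.counts[link_id] += 1
        return self.counts[link_id]

    def describe(self, link_id):
        """Text a link dialog might show next to the link."""
        return f"{link_id} (followed {self.counts[link_id]} times)"

wear = LinkWear()
for link in ["anatomy", "anatomy", "dosage", "anatomy"]:
    wear.follow(link)

label = wear.describe("anatomy")
unused = wear.describe("history")
```

The point of the design is that the count is collected as a side effect of normal navigation, so no user ever has to make an explicit rating decision.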
In the case of netnews, readwear might be collected by an instrumented system that recorded how long each user spent reading each article. The assumption might be that articles that people spent a long time on would be those that had some inherent quality. Of course there is no guarantee that this is true every time. For example, a user might spend a long time looking at something that was upsetting or potentially false—or the user may just have left the window open while taking a phone call. On the average, though, it does seem plausible that people would invest their time wisely and spend the most time on the best information, and a study of eight users reading 8,000 netnews messages found a strong correlation between the time the users spent reading each message and their subjective rating of the message [Morita and Shinoda 1994].
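A reading-time score could be computed along the following lines. This is only a sketch: the log format, the names, and in particular the ten-minute cutoff for discarding implausibly long readings (the abandoned-window case) are assumptions of mine and do not come from the Morita and Shinoda study. The sketch also ignores article length, which a real system would probably have to correct for.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cutoff: longer readings are treated as an abandoned window.
MAX_PLAUSIBLE_SECONDS = 600

def implicit_scores(log, cap=MAX_PLAUSIBLE_SECONDS):
    """Mean reading time per article, ignoring readings above the cap."""
    by_article = defaultdict(list)
    for user, article, seconds in log:
        if 0 < seconds <= cap:
            by_article[article].append(seconds)
    return {article: mean(times) for article, times in by_article.items()}

# Hypothetical log entries: (user, article, seconds the article was open).
log = [
    ("u1", "a1", 40), ("u2", "a1", 60),
    ("u1", "a2", 5), ("u2", "a2", 3600), ("u3", "a2", 7),
]

scores = implicit_scores(log)
# Articles that held readers' attention longest come first.
ranked = sorted(scores, key=scores.get, reverse=True)
```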
The n of 2 n Approach
It is possible to combine automated and human methods for information filtering. Susan Dumais and I have experimented with doing so for assigning submitted conference papers to members of the review committee [Dumais and Nielsen 1992]. Our method is called " n of 2 n " and involves having the computer pick about twice as many papers for each reviewer as that person actually is asked to review. The reviewer thus uses his or her individual judgment to pick n information objects from a selection of 2 n objects presented by the system. By adding this last element of human judgment, any wildly wrong guesses by the computer are removed from the final set of information objects actually read by the user.
The " n of 2 n " method can be used for many other applications than conference paper reviews. It requires just one additional attribute of the information objects: it should be possible for the human user to pick the most relevant n objects in substantially less time than it would take to read all 2 n information objects. In the case of conference submissions, it is usually possible for a reviewer to read the abstract in much less time than it would take to read the full paper and it is usually possible to assess the topic of a paper from its abstract, so conference papers are ideal for the " n of 2 n " method.
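The two steps of the method can be sketched as follows. The relevance scores are assumed to come from some automated matching of papers to the reviewer's interests, which is not shown; the paper identifiers, scores, and function names are hypothetical.

```python
def shortlist(scores, n):
    """Computer's step: offer the 2n papers with the highest match scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:2 * n]

def final_set(candidates, human_picks, n):
    """Human's step: keep exactly n picks, all taken from the shortlist."""
    chosen = [paper for paper in human_picks if paper in candidates]
    if len(chosen) != n:
        raise ValueError(f"expected {n} picks from the shortlist")
    return chosen

# Hypothetical match scores between one reviewer and six submissions.
scores = {"p1": 0.91, "p2": 0.15, "p3": 0.78, "p4": 0.66, "p5": 0.40, "p6": 0.83}

offered = shortlist(scores, n=2)

# After skimming the four abstracts, the reviewer drops the computer's
# top guess (p1) and keeps two of the others.
reviewed = final_set(offered, human_picks=["p6", "p4"], n=2)
```

Note that the reviewer is free to reject even the highest-scoring paper, which is how a wildly wrong guess by the computer gets filtered out.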
The assignment of submitted manuscripts to reviewers is a common task in the scientific community and is an important part of the duties of journal editors, conference program chairs, and research councils. Finding reviewers for journal submissions and some types of grant proposals can normally be done for a small number of submissions at a time and at a more or less leisurely pace. For conference submissions and other forms of grant proposals, however, the reviews and review assignments must be completed under severe time pressure, with a very large number of submissions arriving near the announced deadline, making it difficult to plan the review assignments much in advance.
These dual problems of large volume and limited time make the assignment of submitted manuscripts to reviewers a complicated job that has traditionally been handled by a single person (or at most a few people) under quite stressful conditions. Also, manual review assignment is only possible if the person doing the assignments (typically the program chair for the conference) knows all the members of the review committee and their respective areas of expertise. As some conferences grow in scope with respect to number of submissions and reviewers as well as the number of sub-domains of their fields, it would be desirable to develop automated means of assigning the submitted manuscripts to appropriate members of the review committee.
The actual application of assigning manuscripts to reviewers involves two further considerations in addition to the matching of manuscripts and reviewers. First, one needs to guard against conflicts of interest by not assigning any reviewers their own papers or those of close colleagues. Second, there is a need to balance the review assignments to ensure that no single reviewer is overworked just because that person happens to be an appropriate choice for many papers, and that each paper gets assigned a certain minimum number of reviewers. All these constraints can be expressed as linear inequalities that can be handled by a linear programming package, so the entire review assignment can be done automatically.
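One way such a formulation might look is sketched below. The notation is my own assumption for illustration and is not taken from [Dumais and Nielsen 1992]: $x_{ij}$ indicates that paper $j$ is sent to reviewer $i$, $s_{ij}$ is the computed relevance of paper $j$ to reviewer $i$, $C$ is the set of reviewer-paper pairs with a conflict of interest, $L$ is the maximum load per reviewer, and $m$ is the minimum number of reviewers per paper.

```latex
\begin{align*}
\text{maximize}\quad   & \sum_{i}\sum_{j} s_{ij}\, x_{ij} \\
\text{subject to}\quad & \sum_{j} x_{ij} \le L  && \text{for every reviewer } i \quad \text{(nobody is overworked)} \\
                       & \sum_{i} x_{ij} \ge m  && \text{for every paper } j \quad \text{(minimum number of reviewers)} \\
                       & x_{ij} = 0             && \text{for every } (i,j) \in C \quad \text{(no conflicts of interest)} \\
                       & 0 \le x_{ij} \le 1     && \text{for all } i, j
\end{align*}
```

Every constraint is linear in the $x_{ij}$, which is what makes the problem suitable for a linear programming package.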
Dumais and I tried automatic assignment of manuscripts to reviewers for the Hypertext'91 conference as a pilot project where a human program chair made the final assignments. We also did actual automated assignments for the INTERCHI'93 and CHI'94 conferences where we were papers co-chairs ourselves. Hypertext'91 was a fairly small conference with 117 manuscripts submitted and 25 members of the review committee. INTERCHI'93 and CHI'94 were larger conferences with 330 and 263 submitted papers, respectively, and 307 and 276 members of the review committee, respectively.
For Hypertext'91 the members of the review committee had been asked to review an average of 26 papers each by the program chair. The program chair had received help from our automated method and thus the actual review assignments could not be seen as representative of the work of an unaided human. We simulated the result of purely human review assignments by asking three other hypertext experts to manually assign papers to reviewers. The human experts assigned an average of 28 papers to each reviewer, achieving a mean rated relevance of 3.6 on a 1-5 scale where 5 was best. (Relevance ratings were gathered by asking each reviewer to rate how closely each manuscript matched his or her expertise as a reviewer: The rating scale was 1=how did I get this one?; 2=I'm following it, sort of; 3=somewhat relevant; 4=good match; and 5=right up my alley.) In comparison, our automated n of 2 n method achieved a mean rated relevance of 3.8 on the same scale, thus doing slightly better.
For INTERCHI'93 each reviewer was sent ten papers and was asked to review five of them. In order to make sure that each paper was read by at least some reviewers, we had pre-assigned three papers to each reviewer, meaning that the reviewer had the freedom to pick two additional manuscripts from the remaining seven that had not been pre-assigned. On a 1-5 scale (with 5 best), the reviewers rated the relevance of the papers they ended up reviewing as 4.1 on the average. For CHI'94 each reviewer was sent eleven papers and actually reviewed seven, and the mean relevance rating was 3.9 on the 1-5 scale. (Note that even though we refer to our approach as " n of 2 n " it also works when people can choose some other proportion of the selected information objects (e.g., 7 of 11).)
For the INTERCHI'93 and CHI'94 conferences we do not have data from simulated human manuscript assignments because the job was too large to be done when it was not absolutely necessary. However, data exist from the very similar CHI'92 conference where a human program chair made the review assignments with help from several experts in different subfields. For the 1992 conference, the mean rated relevance of the manuscripts sent to the reviewers was 4.1 on the 1-5 scale. This is exactly the same as the relevance achieved by our automated method for INTERCHI'93 and slightly better than the result of the automated method for CHI'94.
An interesting aspect of the automated review assignment compared with manual assignment is that some of the more famous reviewers expressed satisfaction with getting papers that were more in line with their current interests than they were used to. Normally, such famous people continue getting papers in areas for which they are famous for many years after they have stopped working in those areas, simply because human committee chairs think, "Oh, a paper on XX, that must be just the thing for Dr. YY." With automated assignment, all the computer knows is what the reviewers told it themselves when defining their interest profile, so they mainly get papers in the areas they specifically indicated as their current interests.