Originally published as: Nielsen, J. (1996). The impending demise of file systems. IEEE Software 13, 2 (March).
Relax, oh Nerdy Reader: I am not going to take away your beloved file-system APIs. Here I am talking about what the user experiences, not how we provide that experience. The file system has been a trusted part of most computers for many years, and will likely continue as such in operating systems for many more. However, several emerging trends in user interfaces indicate that the basic file-system model is inadequate to fully satisfy the needs of new users, despite the flexibility of the underlying code and data structures.
There is no need for users to know how their information is stored inside the guts of the computer. Indeed, the notion of a continuous file is itself an abstraction: It masks the fact that the information is normally stored on noncontiguous sectors of the hard disk. From a user perspective, current file systems are based on three assumptions:
- Information is partitioned into coherent and disjoint units, each of which is treated as a separate object (a file). Users manipulate information through files and are typically restricted to being "in" one file at a time.
- Information objects are classified according to a single hierarchy: the subdirectory structure.
- Each information object is given a single, semiunique name, which is fixed. This file name is the main way users access information inside the object.
Window systems have made these assumptions less intolerable, but they still exist. Modern computing, particularly the Internet, is further undermining these assumptions in several ways.
Before the Internet, printed output supported a canonical representation of most information objects; the goal of computers was to deliver a WYSIWYG identity mapping between content and presentation. In modern user interfaces, information objects often have multiple presentations and units are combined in multiple ways for different users and tasks. For example, nomadic users might want to retrieve e-mail on PDAs or even by voice synthesis over the telephone. To allow this, the presentation must be significantly briefer than that for a large workstation screen.
On the Internet, a typical Web page consists of a text file and one or more image files that are not combined until the page is displayed by the browser. Even the atomic component objects may not always map to individual files. For example, an image may exist as both a GIF and a JPEG file; the version the server ships to the browser is determined by content negotiation.
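Content negotiation can be sketched roughly as follows. This is a minimal, simplified illustration, not how any particular server does it: the functions `parse_accept` and `choose_variant` and the file names are hypothetical, and the sketch honours only the `q` parameter of the HTTP `Accept` header, letting an exact media type take precedence over `image/*`, which in turn takes precedence over `*/*`.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into {media_range: q} (simplified: only q is read)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0]
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media] = q
    return prefs


def choose_variant(accept_header, variants):
    """Return the stored file (from media type -> file name) the client prefers, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media, filename in variants.items():
        major = media.split("/")[0]
        # Most specific match wins: exact type, then "image/*", then "*/*".
        q = prefs.get(media, prefs.get(major + "/*", prefs.get("*/*", 0.0)))
        if q > best_q:  # strict comparison: on a tie, the first listed variant is kept
            best, best_q = filename, q
    return best
```

With `variants = {"image/gif": "logo.gif", "image/jpeg": "logo.jpg"}`, a browser sending `image/jpeg;q=0.9, image/gif;q=0.5` would get `logo.jpg`, while one that accepts only `image/gif` would get `logo.gif`. The user sees one "image" either way; which file stands behind it is decided per request.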
Some ways to improve the Web user experience will complicate the issue further: the GIF and JPEG image "files" may not exist as such in the file system, but might be generated on demand from an underlying image representation, with parameters such as compression, lossiness, and color-map depth determined dynamically by available bandwidth and other considerations. For example, if you download a Web page in your office using a direct Internet link, the pictures may arrive as large, beautiful, 24-bit color images. Should you download the same page at home using a slow modem, it would arrive with small, coarse, black-and-white images.
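One way to picture on-demand generation is a function that maps measured bandwidth to rendering parameters, which an image encoder would then apply to the single underlying representation. The sketch below is hypothetical throughout: the `ImageParams` fields, the tier thresholds, and the values are illustrative assumptions, and the actual encoding step is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageParams:
    max_width: int     # pixels
    color_bits: int    # bits per pixel (1 = black and white, 24 = full color)
    jpeg_quality: int  # 1-100; lower means lossier compression


# Hypothetical tiers, checked from fastest to slowest: (minimum kbit/s, parameters).
TIERS = [
    (1000, ImageParams(max_width=1024, color_bits=24, jpeg_quality=90)),
    (100, ImageParams(max_width=640, color_bits=8, jpeg_quality=70)),
    (0, ImageParams(max_width=320, color_bits=1, jpeg_quality=40)),
]


def params_for_bandwidth(kbps: float) -> ImageParams:
    """Choose rendering parameters for a connection of the given speed."""
    for threshold, params in TIERS:
        if kbps >= threshold:
            return params
    return TIERS[-1][1]  # fall back to the most conservative tier
```

Under these assumed tiers, a direct office link measured at 10,000 kbit/s would receive 24-bit images 1024 pixels wide, whereas a 28.8 kbit/s modem would receive 1-bit images 320 pixels wide, both derived from the same stored picture.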
Even on stand-alone PCs, the file model is falling apart. Consider, for example, the task of installing a new application. A simple, file-based user interface would have you drag the application icon from the place it is stored to the place where it will be needed. These days, however, an application is rarely satisfied with a single file: typically, installation litters the system with numerous subsidiary files, preference and configuration files, initialization files, and so on down the laundry list -- all stored in obscure directories of little relevance to the average user.
Using the file system as a user interface to install and copy applications has caused users many painful hours. In response, vendors have provided special installer and uninstaller utilities with their software. This led to a profusion of installers and uninstallers -- and to inconsistency and extra work for the user trying to keep track of all the additional utilities.
Unit To Units
Not only does a single information unit often map to multiple files; a single file can also contain multiple information units that should be treated differently in the user interface. The e-mail inbox, for example, should definitely be treated as a multiplicity of message objects. On a Web page, it is sometimes useful for the HTML file to contain both the visible information shown to the user and hidden information used for other purposes. Our server, for example, has more than 20,000 Web pages, so we decided to add metainformation to each file: the e-mail address of the person responsible for it. For performance reasons, this information is stored as a comment field in the file itself, even though it is a separate information object. If I changed my e-mail address because I moved to a different domain within the company, the user interface should allow me to update the e-mail address associated with all my Web pages in a single operation.
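Treating the embedded owner address as its own object makes the single-operation update straightforward. The sketch below assumes a hypothetical comment convention, `<!-- owner: address -->`, since the exact format is not given here, and holds pages in memory as a path-to-HTML dictionary rather than reading a real server's files.

```python
import re

# Hypothetical convention: each page carries one comment such as <!-- owner: jane@example.com -->
OWNER_RE = re.compile(r"<!--\s*owner:\s*(\S+?)\s*-->")


def owner_of(html):
    """Return the owner address embedded in a page, or None if there is none."""
    match = OWNER_RE.search(html)
    return match.group(1) if match else None


def reassign_owner(pages, old, new):
    """Rewrite the owner comment on every page owned by `old`; return (pages, count)."""
    updated, changed = {}, 0
    for path, html in pages.items():
        if owner_of(html) == old:
            # A function replacement avoids re.sub interpreting backslashes in `new`.
            html = OWNER_RE.sub(lambda _m: f"<!-- owner: {new} -->", html, count=1)
            changed += 1
        updated[path] = html
    return updated, changed
```

From the user's point of view, `reassign_owner(pages, old_address, new_address)` is the one operation; the fact that the address lives inside thousands of separate files is hidden.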
File systems are structured as strict hierarchies of directories and subdirectories. For users, however, the same information unit often has multiple classifications. A corporate logo might appear on several Web pages, and thus would belong to several page objects across the server hierarchy. I often produce presentations (implemented as a small set of files per presentation) with slides that include screen shots, Web pages, or other designs I work on. A specific image might thus be classified as part of the "AnswerBook" development project as well as the "Singapore keynote" presentation project. In either case, if I change the image, I want every occurrence of it to change. This should be true even if the occurrences look different (expanded to 400 percent on a slide in one case, reduced in another): it's still the same information, even if presented with a different look. Remember, WYSIWYG is tired; enriched representation is wired.
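A minimal sketch of multiple classification, under the assumption that content is stored exactly once under an ID and that each collection holds only references plus its own presentation settings. The class `InfoStore`, its methods, and the ID `screenshot-17` are hypothetical illustrations, not a description of any existing object store.

```python
class InfoStore:
    """Objects stored once; collections reference them, each with its own display scale."""

    def __init__(self):
        self._objects = {}      # object ID -> content (the single stored copy)
        self._collections = {}  # collection name -> {object ID: display scale}

    def put(self, obj_id, content):
        """Create or update an object; every collection sees the change."""
        self._objects[obj_id] = content

    def place(self, obj_id, collection, scale=1.0):
        """Classify an object into a collection with a collection-specific presentation."""
        self._collections.setdefault(collection, {})[obj_id] = scale

    def view(self, collection):
        """Return (content, scale) pairs for everything classified into a collection."""
        members = self._collections.get(collection, {})
        return [(self._objects[obj_id], scale) for obj_id, scale in members.items()]


store = InfoStore()
store.put("screenshot-17", "v1")
store.place("screenshot-17", "AnswerBook", scale=0.5)
store.place("screenshot-17", "Singapore keynote", scale=4.0)
store.put("screenshot-17", "v2")  # one edit...
print(store.view("AnswerBook"), store.view("Singapore keynote"))
```

The single edit shows up in both projects, each still at its own scale, because neither collection owns a copy of the image.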
Hypertext links are the classic case of breaking up file hierarchies. Indeed, the very name comes from the fact that they form an n-dimensional hyperspace. Users are notoriously incapable of understanding large hierarchies, which is why cross-references and other hypertext links are so useful on the Web. Anybody who has tried to find something in the Yellow Pages knows how difficult it is to navigate somebody else's classification structure. If you want to buy a steak, do you look under B for butchers? No. How about S for steaks? Not quite. Try M for meat, retail. Butchers' supplies, however, are under B! (Your Yellow Pages directory may use a different classification. Indeed, the fact that there is no single classification scheme in the real world is a further example of the problem.)
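Cross-references repair exactly this failure: instead of forcing users to guess the one heading the classifier chose, several entry terms can lead to the same place. The table below is a hypothetical illustration built from the Yellow Pages example, not a real directory's index.

```python
# Hypothetical "see also" index: many entry terms, each pointing at one heading.
SEE_ALSO = {
    "butchers": "Meat, Retail",
    "steaks": "Meat, Retail",
    "meat": "Meat, Retail",
    "butchers' supplies": "Butchers' Supplies",
}


def find_heading(term):
    """Map whatever word the user tries onto the heading actually used, if known."""
    return SEE_ALSO.get(term.strip().lower())
```

A reader who looks up "Steaks" or "butchers" still lands on "Meat, Retail", which is the role a hypertext link plays when it cuts across someone else's hierarchy.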
Currently, files are represented in the user interface by their name and a few additional attributes (mainly data types illustrated by icons). File names are problematic user-interface primitives for several reasons. First, users rarely generate good file names, even in systems that allow long ones. In general, users don't like to type, have limited creativity in thinking up good names, and are hit by what I call the "premature classification problem": the name normally has to be generated long before the content is created, and thus users may not fully understand what they're naming. Second, users often have difficulty recognizing a name and remembering what it stands for, especially when many similar names are in use. Finally, when numerous information objects are in use, users sometimes have to type a name rather than simply recognizing it from a list. Not only is name-typing error prone, but users often don't remember the exact name and directory path they need to retrieve a certain information object.
In addition to these practical problems, named references are fundamentally unsuitable for accessing information in a system laden with it. Users often don't know exactly what they are looking for. They have no way of peeking inside a file without opening it up and paying the ensuing penalty in terms of performance and screen space. Hypertext links are fundamentally content-driven and context-sensitive: if well designed, they provide a preview of the content and get users to it without revealing where or how it is stored. URLs, on the other hand, are poorly designed file names.
The technology needed to create more flexible information interfaces will certainly include object-storage mechanisms and compound-document architectures, although exactly how they will be structured is not yet clear. What is clear is that we should stop presenting computer and network information storage as one icon per file and start visualizing the logical structure of the information.