Voodoo Usability

by Jakob Nielsen on December 12, 1999

The good news is that usability has been recognized as an important element of Internet success: the average speaker at industry conferences now promotes good user experience in preference to "cool sites."

The bad news is that most sites employ horribly misguided methodologies that do not assess real usability. Sometimes the methods are simply worthless; other times they are directly misleading.

Studying Opinion Instead of Use

Traditional market research methods don't work for the Web. The basic problem is that one cannot ask users what they want and expect the answer to have any relation to their actual behavior when they go online.

Focus groups can often be directly misleading. When people sit around a table and discuss what they might like to see on a site, they will often focus on superficial aspects and praise fancy features like animation and Flash effects. But if these same users were ever asked to actually use the site to accomplish a task, they would usually ignore the animations and would find that the Flash effects hurt them more than they helped.

Self-reported data is extremely weak and three levels removed from the truth:

  1. Users tell you what they think you want to hear or what they think is a socially preferred answer (especially when they are part of a group)
  2. Users tell you what they remember believing that they did (but memory is highly fallible, especially regarding the specifics of interaction behavior)
  3. Users can only report what they believe they did, not what they actually did: people always rationalize their behavior when thinking about it after the fact, and they don't even notice many of the things they do

Surveys are just opinion polls: a weak method, even though survey results are reported all too frequently in the trade press. Just as with focus groups, you get results three levels removed from the truth. For example, an often-quoted survey by Zona Research found that 28% of respondents reported finding it somewhat or extremely difficult to locate products on the Web. I have even quoted this survey myself, even though I should know better than to quote the outcome of an opinion poll. The truth is that observations of actual, real-life user behavior show that people find the product they are looking for less than half of the time. Several usability studies have independently confirmed the same result: on average, users cannot find what they are looking for on today's Web.

Why this paradox? More than 50% of the time, people can't find what they are looking for, and yet only 28% of respondents report problems. In all likelihood, close to 100% of the people who were polled had encountered a case where they could not find a product on a website that did sell it. But they may have assumed that the site didn't carry the product or they may have blamed themselves for not searching well enough or thoroughly enough. Or they may have found the product on another site (causing the first site to lose the sale), after which they thought of their Web experience as having been successful, even though they actually failed the first time they tried to find the product. But all they remember is that they did find it in the end.

In a particularly misleading type of survey, a panel of users is asked to check out a website and fill in a questionnaire with their opinions of the site. The three methodological hazards in this method are:

  • the users are members of a panel of people who have signed up to be professional opinion-givers in return for money; they are not representative of your customers who almost certainly would not have time to spend on such activities (unless your site is targeting students or the unemployed)
  • being asked to check something out is completely different from having to use it to accomplish a real task; this is equally true whether the task is work-related (book an airline ticket for my boss to attend a meeting in London) or leisure-related (buy a cheap vacation in a city I like in Europe)
  • self-reported behavior and opinions have very little relation to real behavior and real usability problems

This said, short surveys are still good for simple questions like "why are you visiting our site" that relate to users' opinions instead of assessing the design.

Automated Methods Cannot Work

Another category of voodoo services sics a computer program on a site and produces an automated report that is claimed to measure the site's usability.

Having a computer follow links and count the number of clicks is a very poor substitute for observing whether users can actually find what they are looking for. Real usability depends on which link you click on and how fast you discover the error of your ways if you clicked on the wrong one. This cannot be assessed by computer. A program can count the time needed to follow the optimal path to the solution, but that's not how the average user behaves. One wrong word in a menu, and the user is lost for five minutes - or forever.

Simple things like counting clicks to solutions are misleading. For example, I recently advised on an ecommerce site where people had to find certain products. The original design provided product pages in 3 clicks from the home page, and the revised design required one more click. Yet, shopping success was 7 times higher in the revised design because each of the new steps was completely intuitive. Even with one more click, the revised design was faster because users didn't have to spend as much time thinking about where to click. More importantly, it made people find the right product much more frequently, whereas the original design was very error prone. Despite this usability finding, automated assessment would have given a higher rating to the original design. Whether or not the choices make sense is the one thing a program can't check.
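To make concrete what a click-counting tool actually computes, here is a minimal sketch in Python. The two link graphs are hypothetical, invented only to mirror the 3-click and 4-click designs described above; they are not data from the site in question. The function finds the optimal path with a breadth-first search, and nothing in it can tell whether a real user would recognize the right link at each step.

```python
from collections import deque


def min_clicks(links, start, target):
    """Return the fewest clicks from start to target, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        page, clicks = queue.popleft()
        if page == target:
            return clicks
        for next_page in links.get(page, []):
            if next_page not in seen:
                seen.add(next_page)
                queue.append((next_page, clicks + 1))
    return None


# Hypothetical link graphs (assumed for illustration only).
original_design = {
    "home": ["catalog"],
    "catalog": ["category"],
    "category": ["product"],
}
revised_design = {
    "home": ["department"],
    "department": ["category"],
    "category": ["subcategory"],
    "subcategory": ["product"],
}

original_clicks = min_clicks(original_design, "home", "product")
revised_clicks = min_clicks(revised_design, "home", "product")

# The tool would rank the original design higher on click count alone.
print("original:", original_clicks, "clicks; revised:", revised_clicks, "clicks")
```

The output of such a program is a property of the link structure, not of user behavior, which is why it can favor the design in which people fail more often.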

Another measure typically computed by automated "usability" services is "freshness" as defined by the percentage of pages that are new. But you simply cannot tell whether a website is up-to-date by looking at the time stamps on the files. A site can be extremely fresh even if 90% of its content is more than a year old. That just means that it keeps good archives to supplement the current content. By now, probably fewer than 1% of the pages on nytimes.com are "current" even though it is a daily newspaper and one of the freshest sites in the world. Conversely, a site can be stale even if most of the pages have been edited recently (if the changes are not the appropriate ones to bring the content up to date in ways that matter to users).

How do you distinguish between two types of old files:

  1. good content that should be archived because it is still of value
  2. outdated content that should be removed or updated

The answer is that you can't tell without understanding the content and the way it will be used. Even full natural language understanding would not be sufficient to allow a computer to make this judgment.

Automated usability is downright dangerous because it will cause site managers to

  • make the wrong choices since it often gives the wrong advice or causes them to pursue pseudo-important directions
  • think that they are covered and don't need to spend resources on real usability activities

What Can Be Automated?

A few aspects of usability can be assessed automatically by a computer program:

  • Response times: it is not necessary to see or understand a page in order to measure how long it takes to download it. So a computer can provide a perfect estimate of response times. At the same time, most sites are so incredibly slow these days that it is not really necessary to track their download times to the millisecond. Instead of spending big bucks on a response time measurement service, simply ask the CEO to download the home page from his or her hotel room while logged in on a laptop on the next business trip. Anybody trying this simple exercise will know that the site is too slow and approximately how much the design needs to slim down.
  • HTML validation: a computer can easily flag illegal HTML code and identify all deviations from the official Web Consortium standard. Unfortunately, many sites resort to illegal code in an attempt to code around bugs in the browsers, so it is still necessary to have a human decide whether a given instance of illegal HTML was included deliberately or whether it was a mistake.
  • Linkrot can be measured to a first approximation: can the computer follow the link and get a page returned from the remote site? Unfortunately, the computer cannot measure whether the page that is returned is the one the author intended to link to. Some sites give articles a different URL when they move into archives and reuse the old URL for new articles. Big mistake since this makes it harder for other sites to link (and incoming links are the most powerful Web marketing method). If a site does change URLs around like this, then the linkrot program may report that the link does work, even if it now links to something completely irrelevant. Until we get natural language understanding (in 50 years?), there is no way that a computer can find out whether a destination page complies with the linking author's intentions.
  • Accessibility for users with disabilities can only be partly measured. Sure, it's possible to have a computer check for things like use of ALT text for all images, but without natural language comprehension, the computer cannot determine whether the ALT text will be meaningful to a blind user and help him or her understand the site. Also, sometimes a page gets to be more usable by avoiding ALT text on certain images. Thus, automated measures of accessibility should only be used as a checklist and not as a final judgment.
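As an illustration of the checklist role described in the last item, the following is a minimal sketch of an ALT text check using Python's standard html.parser module. The class name, function name, and sample markup are assumptions made for this example. The program separates images with no ALT attribute from images whose ALT text is empty, since an empty ALT may be deliberate; in both cases it only produces a list for a human to review and makes no claim about whether the text that is present is meaningful.

```python
from html.parser import HTMLParser


class AltTextChecker(HTMLParser):
    """Collect <img> tags that lack ALT text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.missing = []  # images with no alt attribute at all
        self.empty = []    # images whose alt attribute is blank

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty.append(src)


def check_alt_text(html):
    """Return (missing, empty) lists of image sources for human review."""
    checker = AltTextChecker()
    checker.feed(html)
    checker.close()
    return checker.missing, checker.empty


# Sample markup, invented for illustration.
sample = (
    '<p><img src="logo.gif" alt="Company logo">'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.gif"></p>'
)
missing, empty = check_alt_text(sample)
print("No ALT attribute:", missing)
print("Empty ALT text (may be deliberate):", empty)
```

Note that "Company logo" passes the check even though the program has no idea whether it helps a blind user understand the page; that judgment still has to come from a person.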

How to Gather Usability Data

There is only one valid way to gather usability data: observe real users as they use your site to accomplish real tasks. This is actually the simplest of all the methods: just see what happens!

Maybe it's because the method is so simple that it is not used more often. Anyway, it really is easy to get real usability insights. It's also very cheap since you only need to test a small number of users to find the main usability problems.
