Guesses vs. Data as Basis for Design Recommendations

by Jakob Nielsen on June 8, 2009

Summary: Even the smallest amount of empirical data (say, from observing 2 users) vastly improves the probability of making correct UI design decisions.


Should you offer users help to adjust font sizes or can you simply rely on the built-in browser commands? This question was recently posted to an interaction designers' discussion group (which will remain unnamed to preserve the anonymity of the individuals dissected below).

12 people responded to this question. Most simply offered a personal opinion as to what they would prefer. Fair enough: All people are experts on their own preferences. But there were 6 postings that commented on what would be best for other people.

Of these 6 postings, 2/3 were pure guesses, whereas 1/3 were based on data from some form of empirical user observation.

Guesses:

  • "In this day and age, [...] most people who need to increase their font sizes in their web browser already know how to do it." WRONG
  • "People who do need to resize type will do so via the browser; it's not hard to do so." WRONG
  • "It's not 1995; not all 50+ people are such newbies that they don't know, or wouldn't want to know, how to resize text in a browser." WRONG
  • "The people who most need to increase font size are people 65+, which is the group least-likely to be skilled enough to have adjusted settings." RIGHT

Data:

  • "I had to set it manually for my parents, and while the percentage of people over 65 becoming more and more savvy is increasing at an amazing rate — hidden functions like adjusting text size is something that escapes them." RIGHT
  • "I've observed usability studies on sites that included text resize widgets [...] most, if not all, the participants [...] had no idea what it was." RIGHT

Data Beats Guesses

The general guideline is to use relative font sizes that let users resize (if they know how), but to display big and legible text as the default. This conclusion is based on numerous observations that show that many older users don't have the skills to resize fonts.

In our discussion group example,

  • 100% of the designers who provided external data were right, whereas
  • 25% of the designers who relied on their personal opinion were right.

Most strikingly, 75% of guessers were wrong. You'd be better off tossing a coin than asking advice of these people.

In this simple example, basing design advice on the smallest amount of empirical observation of real users quadrupled the probability of being right.
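The percentages above follow directly from the RIGHT/WRONG verdicts on the six quoted postings. As a minimal sketch of that tally (the `verdicts` structure and the `share_right` helper are illustrative names, not part of the original analysis):

```python
# Verdicts copied from the six quoted postings (True = RIGHT, False = WRONG).
verdicts = {
    "guesses": [False, False, False, True],
    "data": [True, True],
}


def share_right(outcomes):
    """Return the fraction of postings whose advice was right."""
    return sum(outcomes) / len(outcomes)


guess_rate = share_right(verdicts["guesses"])  # 1 of 4 guesses was right
data_rate = share_right(verdicts["data"])      # 2 of 2 data-based postings were right

print(f"guesses right: {guess_rate:.0%}, data right: {data_rate:.0%}")
print(f"improvement factor: {data_rate / guess_rate:.0f}x")
```

With only six postings, these rates are illustrative rather than precise estimates, which is why the next example uses a larger sample.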

A word of caution: Although data from your parents is better than no data, I don't recommend that you base design decisions on your family members because they're likely to be smarter than average users. (Because you're smarter, being somebody who understands usability.) We know from our studies of children and teenagers that average kids and teens have much greater difficulties using websites than one would think after listening to Internet executives proudly tell stories about their offspring's online skills.

Testing 2 Users Beats Guessing

While striking, our text-size example is based only on a small set of responses. Another example provides a similar conclusion with a bigger sample.

We tested two different ways of displaying bank account information with 76 users each, for a total of 152 test participants in a between-subjects benchmark test. We asked users to perform tasks such as checking their account balances and finding out what interest rate the bank was currently offering. The results were as follows:

Usability Metric                              Design A   Design B
Success Rate (across four tasks)              56%        76%
Time to Complete Four Tasks (min:sec)         5:15       5:03
Subjective Satisfaction (1–5 scale, 5 best)   2.8        3.0

On all three usability attributes, design B scored better, though only the difference in success rates was big enough to be statistically significant. Overall, there is no doubt that B was better.
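The article does not say which statistical test was used. As one plausible check, here is a sketch of a two-proportion z-test on the reported success rates. It assumes each rate can be treated as a proportion of 76 independent pass/fail outcomes per design; since the reported rate is pooled across four tasks, the real unit of analysis may have differed. The function name `two_proportion_z_test` is made up for this illustration.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    # Pooled proportion under the null hypothesis that both designs perform equally.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Success rates from the table: design A = 56%, design B = 76%, 76 users each.
z, p_value = two_proportion_z_test(0.56, 76, 0.76, 76)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Under these assumptions the p-value falls below the conventional 0.05 threshold, which is consistent with the statement that the difference in success rates was statistically significant.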

(In contrast to this study, sometimes each design wins on a different usability attribute. For example, one design might make people more successful, while the other helps them accomplish the task faster. In such cases, you might have to make tradeoffs or, when possible, create a third design that combines the best aspects of both alternatives.)

I showed designs A and B to 21 people who were taking an interaction design course and asked them which one they would recommend to the bank. Going purely on their personal guesses as to which design was better, the probability of recommending the better design was 50%. That is, no better than flipping a coin. (Asking your trusty coin is an easy way to save on consulting fees.)

I then asked another group of 38 people taking the same course to test the two designs with 2 users for each design. Now, going on empirical observations of 2 users' behavior for each alternative, the probability of recommending the best design was 76%.

Another way of looking at this outcome is that testing just 2 users per design reduced the probability of being wrong from 50% to 24%, cutting it roughly in half. Of course, a 24% probability of picking the wrong design is not good enough if you're talking about a high-ROI design decision, so we'd obviously want to test more than 2 users per design in such cases. (I usually recommend 5 users.)

Still, even though it's an extremely scaled-back study, testing 2 users per design hugely improved the recommendation over the flipping-a-coin performance from guessing.
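The size of that improvement can be worked out from the two reported probabilities. A small sketch of the arithmetic (the variable names are illustrative only):

```python
# Probability of recommending the better bank design, as reported in the text.
p_right_guessing = 0.50  # recommendations based on personal guesses
p_right_testing = 0.76   # recommendations after testing 2 users per design

p_wrong_guessing = 1 - p_right_guessing
p_wrong_testing = 1 - p_right_testing

# Relative reduction in the probability of recommending the worse design.
relative_reduction = 1 - p_wrong_testing / p_wrong_guessing

print(f"wrong when guessing: {p_wrong_guessing:.0%}")
print(f"wrong after testing: {p_wrong_testing:.0%}")
print(f"relative reduction in error: {relative_reduction:.0%}")
```

The error rate falls by slightly more than half, from 50% to 24%.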

(In this study, the two versions looked equally good, which is important for measurement studies. If you compare a rough-looking prototype with a fully refined graphic design, you will bias the scores.)

When Guesses Go Horribly Wrong

Comparing our two case studies, the guessing camp from the text-size example had by far the worst performance. A person who based a design decision on these guesses would be wrong 3/4 of the time. In the bank example, they'd be wrong only 1/2 the time.

So, why the miserable discussion-group guesses? The answer lies in the following two statements:

  • "In this day and age..."
  • "It's not 1995..."

Sadly, too many Web designers refuse to believe in the durability of usability findings. Thinking that "things that were difficult in the past must surely be easy now" has led many websites to their doom.

When we actually study real users, we see how slowly they learn about technology and how little their ability to use fancy websites has improved. And, most important, we see how little users care about learning fancy Web techniques. People just want to get in, get their stuff done, and get out. They don't want to learn.

Guesses go wrong because many designers desperately want to believe in the potential of advanced design. They simply can't fathom how little most people know about their pet technologies.

(Yes, in recent testing, we did find a few small advances in users' skills, but it's slow progress; you'd better believe that simplicity will continue to win the day for decades to come.)

A Little Data Goes a Long Way

In my two examples, the probability of making the right design decision was vastly improved by even the tiniest amount of empirical data: observing your own parents, or testing 2 users per design.

Of course, a bigger study would be better, but any data is better than no data. How many design decisions do you make without any empirical observation of your customers' behavior?

