How Many Test Users in a Usability Study?

by Jakob Nielsen on June 4, 2012

Summary: The answer is 5, except when it's not. Most arguments for using more test participants are wrong, but some tests should be bigger and some smaller.


If you want a single number, the answer is simple: test 5 users in a usability study. This lets you find almost as many usability problems as you'd find using many more test participants.

This answer has been the same since I started promoting "discount usability engineering" in 1989. Doesn't matter whether you test websites, intranets, PC applications, or mobile apps. With 5 users, you almost always get close to user testing's maximum benefit-cost ratio.

As with any human factors issue, however, there are exceptions:

  • Quantitative studies (aiming at statistics, not insights): Test at least 20 users to get statistically significant numbers; tight confidence intervals require even more users.
  • Card sorting: Test at least 15 users.
  • Eyetracking: Test 39 users if you want stable heatmaps.
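
To see why 20 users is only a starting point for quantitative work, the sketch below computes a 95% Wilson score interval for a task success rate at several sample sizes. It is illustrative only: the function name, the choice of the Wilson method, and the 70% observed success rate are assumptions, not figures from the studies discussed in this article.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a success proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical studies that all observe a 70% task success rate.
for successes, n in [(14, 20), (56, 80), (224, 320)]:
    low, high = wilson_interval(successes, n)
    print(f"n={n:>3}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Under these assumptions, the interval at 20 users is more than 30 percentage points wide, and quadrupling the sample only roughly halves that width.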

However, these exceptions shouldn't worry you much: the vast majority of your user research should be qualitative — that is, aimed at collecting insights to drive your design, not numbers to impress people in PowerPoint.

The main argument for small tests is simply return on investment: testing costs increase with each additional study participant, yet the number of findings quickly reaches the point of diminishing returns. There's little additional benefit to running more than 5 people through the same study; ROI drops like a stone with a bigger N.
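
The diminishing returns can be sketched with a simple problem-discovery model in which each user independently reveals a fixed share of the usability problems. Both the model and the 31% per-user rate are assumptions brought in for illustration (the rate is a commonly cited average, not a figure given in this article), so treat the output as a rough shape rather than a prediction.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Share of problems seen at least once after testing n_users.

    Assumes every user independently uncovers `per_user_rate` of the
    problems; 0.31 is an illustrative value, not data from this article.
    """
    return 1.0 - (1.0 - per_user_rate) ** n_users

previous = 0.0
for n in range(1, 11):
    found = share_of_problems_found(n)
    # "gain" is the extra share contributed by the nth user alone.
    print(f"{n:>2} users: {found:6.1%} found (gain {found - previous:5.1%})")
    previous = found
```

In this model, each additional user adds less than the one before, while the cost of recruiting and running that user stays about the same.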

And if you have a big budget? Yea! Spend it on additional studies, not more users in each study.

Sadly, most companies insist on running bigger tests. During the Usability Week conference, I surveyed 217 participants about the practices at their companies. The average response was that they used 11 test participants per round of user testing — more than twice the recommended size. Clearly, I need to better explain the benefits of small-N usability testing.

(Weak) Arguments for More Test Participants

"A big website has millions of users." Doesn't matter for the sample size, even if you were doing statistics. An opinion poll needs the same number of respondents to find out who will be elected mayor of Pittsburgh or president of France. The variance in statistical sampling is determined by the sample size, not the size of the full population from which the sample was drawn. In user testing, we focus on a website's functionality to see which design elements are easy or difficult to use. The evaluation of a design element's quality is independent of how many people use it. (Conversely, the decision about whether to fix a design flaw should certainly consider how much use it'll get: it might not be worth the effort to improve a feature that has few users; better to spend the effort recoding something with millions of users.)

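The polling point can be checked with the textbook margin-of-error formula for a proportion, including the finite population correction. This is a sketch under stated assumptions: the function name, the sample size of 1,000, and both population sizes are hypothetical values chosen for illustration.

```python
import math

def margin_of_error(sample_size: int, population_size: int,
                    p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion from a simple random sample."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # Finite population correction: close to 1 unless the sample is a
    # large fraction of the whole population.
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction

# Hypothetical electorates: a mid-sized city and a large country.
small_city = margin_of_error(1_000, 300_000)
large_country = margin_of_error(1_000, 50_000_000)
print(f"city:    +/- {small_city:.4f}")
print(f"country: +/- {large_country:.4f}")
```

The two margins are nearly identical because the correction term barely moves; only a larger sample narrows them.
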
"A big website has hundreds of features." This is an argument for running several different tests — each focusing on a smaller set of features — not for having more users in each test. You can't ask any individual to test more than a handful of tasks before the poor user is tired out. Yes, you'll need more users overall for a feature-rich design, but you need to spread these users across many studies, each focusing on a subset of your research agenda.

"We have several different target audiences." This can actually be a legitimate reason for testing a larger user set because you'll need representatives of each target group. However, this argument holds only if the different users are actually going to behave in completely different ways. Some examples from our projects include

  • a medical site targeting both doctors and patients, and
  • an auction site where you can either sell stuff or buy stuff.

When the users and their tasks are this different, you're essentially running a new test for each target audience, and you'll need close to 5 users per group. Typically, you can get away with 3–4 users per group because the user experience will overlap somewhat between the two groups. With, say, a financial site that targets novice, intermediate, and experienced investors, you might test 3 of each, for a total of 9 users — you won't need 15 users total to assess the site's usability.

"The site makes so much money that even the smallest usability problem is unacceptable." Rich companies certainly have an ROI case to spend more on usability. Even if they spend "too much" on each quality improvement, they'll make even more back because of the vast amounts of money flowing through the user interface. However, even the highest-value design projects will still optimize their ROI by keeping each study small and conducting many more studies than a lower-value project could afford.

The basic point is that it's okay to leave usability problems behind in any one version of the design as long as you're employing an iterative design process where you'll design and test additional versions. Anything not fixed now will be fixed next time. If you have many things to fix, simply plan for a lot of iterations. The additional iterations will give you higher quality (and thus higher business value) than you'd get from testing more users each time.

83 Case Studies

The following chart summarizes 83 of Nielsen Norman Group's recent usability consulting projects. Each dot is one usability study and shows how many users we tested and how many usability findings we reported to the client. (The chart includes only "normal" qualitative studies; we also run competitive studies and benchmark measurements, and conduct other types of research not shown here.)

 

Figure: Scatterplot of 83 usability-testing case studies, showing the number of users tested in each study as well as the number of usability findings reported.

There's a correlation, but it's tiny. Across these many projects, testing more users didn't result in appreciably more insights.

Why did we run more users in the first place, given that I certainly believe my own research results showing the superiority of small-N testing? Three reasons:

  • Some clients wanted bigger studies for internal credibility. When a study's sponsor presents findings to executives who don't understand usability, the recommendations are easier to swallow when more users were tested. (If management trusted its own employees, much money could be saved.)
  • Some design projects had multiple target audiences and the differences in expected (or at least suspected) behaviors were large enough to justify the expense of sampling additional users.
  • Finally, the very fact that these were consulting projects justified including a few more users, which is why we often run studies with around 8 users. ROI is the ratio between benefits and expense. When hiring a consultant, the true expense is higher than just the fee because the client must also spend time finding the consultant and negotiating the project. With higher investment, you want a larger benefit.

The last point also explains why the true answer to "how many users" can sometimes be much smaller than 5. If you have an Agile-style UX process with very low overhead, your investment in each study is so trivial that the cost–benefit ratio is optimized by a smaller benefit. (It might seem counterintuitive to end up with more money by making less money from each study, but this occurs because the smaller overhead lets you run so many more studies that the sum of numerous small benefits becomes a big number.)
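
The trade-off between overhead and study size can be sketched with a toy model: findings follow a simple diminishing-returns curve, and each study costs a fixed overhead plus a per-user cost. Everything here is an assumption made for illustration, including the 31% per-user discovery rate, the cost units, and the function names; none of it is data from the case studies.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Assumed discovery curve: each user independently reveals 31% of problems."""
    return 1.0 - (1.0 - per_user_rate) ** n_users

def findings_per_unit_cost(n_users: int, overhead: float,
                           cost_per_user: float = 1.0) -> float:
    """Benefit-cost ratio of one study: findings divided by total study cost."""
    return share_of_problems_found(n_users) / (overhead + cost_per_user * n_users)

def best_study_size(overhead: float, max_users: int = 30) -> int:
    """Number of users that maximizes findings per unit cost in this toy model."""
    return max(range(1, max_users + 1),
               key=lambda n: findings_per_unit_cost(n, overhead))

# Overhead is expressed in multiples of the (hypothetical) cost of one user.
for overhead in (1, 3, 10, 20):
    print(f"overhead {overhead:>2}: best size = {best_study_size(overhead)} users")
```

In this toy model, the best number of users per study rises as the fixed overhead grows, which matches the direction of the argument above; the specific numbers depend entirely on the assumed inputs.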

For really low-overhead projects, it's often optimal to test as few as 2 users per study. For some other projects, 8 users, or sometimes even more, might be better. For most projects, however, you should stay with the tried-and-true: 5 users per usability test.

