Multiple-User Simultaneous Testing (MUST)

by Jakob Nielsen on October 15, 2007

Summary: Testing 5-10 users at once lets you conduct large-scale usability testing and still meet your deadlines.

Sometimes you need to test a large number of users. One option, of course, is to apply the standard user-testing methodology, and just do more of it. Keep testing until you're blue in the face. Unfortunately, this often gets you into serious trouble with project deadlines.

Alternatively, you can use multiple-user simultaneous testing, or MUST (a term I picked up from Dennis Wixon). As the name indicates, with MUST, you test multiple users at the same time so you get done sooner. In most MUST studies, we test 5-10 users at once (but, as I describe below, you can also set up labs with many more test stations). Theoretically, there's no upper limit to the number of users you can test in each session.

When to Use MUST

Most usability studies should be simple and small-scale, but in some scenarios, it's useful to conduct MUST:

  • For quantitative studies and benchmarking, you typically need to test at least 20 users per condition in order to get statistical significance.
  • For long-duration tasks, you need to test each user for days or weeks to observe valid behaviors. Examples here include:
    • Developer tools. You can't test a system to support professional programmers by having users develop and debug a 20-line "hello world" program; users must work through an industrial-scale problem. The same is true for other high-end problem-solving applications, such as CAD.
    • E-learning. You can't test lesson 39 unless the learners have first made it through lessons 1-38. For a sufficiently advanced e-course, each test could take a week or more.
  • Usability focus groups. To alleviate the problems with traditional focus groups, each participant should start with a one-on-one session testing the live user interface. Following the test sessions, participants can then congregate to discuss the experience and how it relates to their everyday needs. This method definitely requires MUST because all participants should test the interface just before the focus group meets.
  • Games design. I describe this case in detail below.
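
To put rough numbers on the quantitative case, here is a minimal planning sketch in Python. It is an illustration under stated assumptions, not a prescribed method: the function name `sessions_needed` is hypothetical, the two-condition study is an invented example, and the sketch assumes every recruited user shows up and every test station is filled in each session.

```python
import math


def sessions_needed(users_per_condition: int, conditions: int, stations: int) -> int:
    """Return how many test sessions are needed when `stations` users
    are tested at the same time.

    Assumes (hypothetically) no no-shows and full use of every station.
    """
    if users_per_condition < 1 or conditions < 1 or stations < 1:
        raise ValueError("all arguments must be positive integers")
    total_users = users_per_condition * conditions
    # Round up: a partly filled final session is still a session.
    return math.ceil(total_users / stations)


if __name__ == "__main__":
    # Hypothetical benchmark: 20 users per condition, 2 conditions.
    for stations in (1, 5, 10):
        count = sessions_needed(20, 2, stations)
        print(f"{stations:>2} station(s): {count} sessions")
```

With these example figures, one-at-a-time testing takes 40 sessions, while 5 or 10 simultaneous stations cut that to 8 or 4 sessions. In practice you would recruit a few extra users to cover no-shows.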

How to Run Many Users Simultaneously

When testing many users, you usually need many test facilitators. The main exception is for the long-duration tests, where a few facilitators can circulate among users and/or review video recordings of critical incidents.

If you're one of the lucky few companies with many usability specialists, they can facilitate MUST sessions. This is expensive, but efficient: All you need to do is turn the experts loose, since they already know how to run a study.

Most companies, however, don't have enough usability pros to assign one to each test user. Happily, non-usability staff can run user test sessions, especially if a seasoned usability expert has prepared the test plan and written the tasks.

For our latest MUST study, we hired cognitive science students from Don Norman's former department at the University of California, San Diego; they were excellent facilitators. In other studies, we've used developers and marketers from the project team. Being responsible for a MUST session is a great way for such team members to get intense exposure to customers.

Training Facilitators

Ideally, your new facilitators would go through a full usability training workshop, but this is rarely possible in practice. Still, it's best if you spend at least a few hours training facilitators before giving them a go with real users:

  • First, of course, you should explain the theory and best practices of user testing, including steps such as "keep quiet and let users do the talking," which I've discussed many times before.
  • Second, newbie facilitators should watch an experienced usability expert run a sample session with a pilot user. Doing so
    • shows newbies how to facilitate a study, and
    • makes the test plan and test tasks more concrete than simply discussing them or going through them on paper.
  • Third, conduct a role-playing exercise in which the usability expert plays the user and simulates difficult situations that facilitators can encounter, such as users who don't talk or users who ask if they can use certain features. (In the latter case, we typically say: "You can do anything you would normally do at home/in the office.")

Preparing the Users

There's very little special preparation to do for MUST study participants. Just follow standard procedure for recruiting test users, welcome them to the session, give them consent forms and instructions, and so on.

However, the actual MUST sessions differ from traditional sessions in two key ways:

  • Thinking aloud doesn't work when people are tightly packed into small cubicles, so you shouldn't include the usual instruction for users to vocalize their thoughts as they move through your design.
  • You can minimize the distraction of having multiple users by telling participants that they'll likely be working on different tasks. This reduces their natural inclination to look at other users' screens and also prevents people from feeling stupid if other people finish before them and leave the room.

Why Not Use Automated Testing?

Why is MUST worth the trouble when you can just outsource your testing to one of several services that promise to run a panel of users through your site and give you elaborate charts of the outcome? Because usability is strategically vital to the success of any website or interactive product — and outsourced panels just aren't up to the job.

The #1 rule of all user testing is to test with representative customers. Panels rarely meet this requirement; they're composed of people who get paid a pittance to sit like drones and complete online tests. If you're targeting a very low-level audience, then this might be worth a gamble. But not if you're a B2B site selling to construction engineers or hospital pharmacists. Not even if you're a normal B2C e-commerce site.

For fun, some of my colleagues once signed up with a panel operator. Despite being fully truthful in their responses to the initial questionnaire (which many people aren't when they register for panels), they were assigned to several studies for which they were not even remotely in the target audience. These "studies" are often a form of voodoo usability that generates misleading results.

Even if a panel operator could get you representative customers, automated studies are still a shadow of real usability research because you can't sit next to the user. Direct observation is invaluable, both for seeing the details that would never get reported in a chart, and for gaining a deep understanding of each user's individual behavior.

Based on user responses to questions, a screener assigns users to various segments. It's common to find that such assignments are a bad fit, however. Using typical persona names, for example, you might say, "no way is this user a Susan, but he's a pretty close Patrick," and then reclassify the user. Other times, you might have to throw out the data because someone is simply not in your target audience; still other times, you can use that person's insights qualitatively — as a representative of a corner case — even though you won't include them in the core sample. When you're in the room with users, you can identify these situations and act accordingly. When all you get is a chart, you won't even know that some of the test participants were border cases or outside your audience entirely.

Finally, having members of your project team serve as facilitators and get live exposure to real customers is immensely motivating. A chart from 500 anonymous panel members doesn't have a fraction of the emotional impact of sitting next to a few fellow human beings as they suffer through your design.

High-End MUST Lab: Microsoft Games Studios

There are many different ways to run MUST. The most impressive setup I know of is at Microsoft Games Studios. Here's one of its playtest labs:

Photo: Playtest lab at Microsoft Games Studios, showing a row of test cubicles with game players.
Headsets are helpful when many people in the same room are playing audio-intensive games.

Figure: Floor plan for three labs at Microsoft Games Studios, showing rows of cubicles in each lab.

According to Dennis Wixon, Microsoft Games Studios' User Research Manager, his group runs 8,000 gamers through its labs every year. This is a huge amount of user testing — far beyond the amount of research conducted by the average company. No wonder Microsoft needs a high-end lab devoted exclusively to gameplay testing.

(Microsoft tests its software and websites in other usability labs. These labs don't have Xbox 360s at each seat as they might distract users from making their pie charts in Excel. :-)

Why does Microsoft do so much usability testing for its games? A huge amount of money is at stake. A game like Halo 3 is considered the "drive title" for the Xbox 360. If Halo 3 is great, gamers will buy the 360. If Halo 3 sucks, they'll stay with their old Xbox or get a PlayStation.

I don't know the budget for Wixon's group, but I'd bet it's a tiny fraction of the $300M Halo 3 sold in its first week alone, and an even tinier fraction of those Xbox sales that are dependent on the title's success. Obviously, the usability group works on many other games besides the Halo series, but for the latest version of that game alone, they analyzed more than 3,000 hours of gameplay across 600 test participants. This detailed user research made Halo 3 much more fun to play and also more approachable for new players, who are essential to boosting sales beyond those of Halo 2.

The second reason to test multiple users for computer games is that it's a more difficult field of user interface design than the domains we usually test. Every game is a new world, whereas all websites follow much the same rules.

When we test websites, we can rely on thousands of documented guidelines that explain user behavior with this style of interaction design. So, when we observe a behavior, we can typically conclude: "oh, this is an instance of guideline #728," which we have already seen hundreds of times with other users on other sites. Recognizing a documented behavior means that we don't need to test that many users, because a few observations suffice to build confidence that we're on the right track. Also, data analysis is simplified because we can build on published research.

Games require a more delicately balanced user experience than functional interfaces. Say that you're designing a weapon's targeting system. If you work for the Army, you want to make the system's user interface as fast and accurate as possible. Shooting the bad guys before they get you is obviously the way to go. But, if you're doing the equivalent design for Halo 3, the answer isn't as obvious. Make targeting too fast and easy, and the game soon stops presenting a challenge. Sure, you could blast all the bad guys without breaking a sweat, but the game's purpose is to make you feel like you're on the edge and living dangerously. (A danger real soldiers typically want to avoid.)

It's easy to test a UI to eliminate difficulties. It's hard to determine whether one presents just the right amount of difficulty. That's why Microsoft Games Studios runs so many users through its playtest labs.

Simpler MUST Labs

You can run MUST studies without the fancy lab. The following photo shows a study we ran recently, where we tested 5 users in each session.

Photo: Impromptu cubicles installed in a meeting room for multiple-user simultaneous testing.
(The photo shows 3 of the lab's 5 test stations.)

In our lab, we simulated cubicles by taping cardboard partitions to each desk. Of course, it's better to use real cubes, but our discount cubes worked swimmingly. We used slave monitors in each cubicle so facilitators could get a good view of the action on their user's screen without having to lean in. However, for most studies, it's perfectly fine to use a single monitor.

In addition to cubicles, we've used three other setups for previous MUST studies:

  • Single-person offices for each user. (For an intranet study, we used participants' actual offices.) Individual offices can be in different parts of the building (or even in different buildings) so long as each user is assigned a facilitator who stays for the session's duration.
  • Large-scale usability labs with a row of test labs. In such cases, we tested each user in a separate lab. This setup is particularly suited for studies in which users work on large-scale problems for days at a time, and a few facilitators move around among users. Because the facilitators can enter and leave each lab's observation room without the user's knowledge, it's possible to watch many users in a day without disrupting their concentration. Also, you can leave users who are working on a part of the task you don't care about without communicating this fact and thereby biasing their behavior.
  • Offices turned into a networked lab. This case mixes the previous two setups: many single-user offices are wired to a single observation room. You can do all this over the local-area network, which can carry the streaming screendumps for a remote slave monitor as well as webcam views of the users.

Ultimately, the vast majority of usability studies should be qualitative and test 5 users. Still, there are situations in which you need more, and that's when it's nice to have MUST in your toolkit so that you can get the study done before your deadline.

