Authentic Behavior in User Testing

by Jakob Nielsen on February 14, 2005

Summary: Despite being an artificial situation, user testing generates realistic findings because people engage strongly with the tasks and suspend their disbelief.

It's a miracle that user testing works: You bring people into a room where they're surrounded by strangers, monitored by cameras, and asked to perform difficult tasks with miserable websites. Often, there's even an intimidating one-way mirror dominating one wall. Under these conditions, how can anybody accomplish anything?

In fact, all experience shows that people can and do use websites and other user interfaces during test sessions, and that many valuable usability findings emerge from such studies. Why?

Two main reasons: the power of engagement and the suspension of disbelief.

User Engagement

When test participants are asked to perform tasks, they usually get so engaged in using the interface that the usability lab's distractions recede. Users know that there's a camera, and maybe even a one-way mirror hiding additional observers, but their attention is focused on the screen.

It's a basic human desire to want to perform well on a test. We can say, "we're not testing you, we're testing the system" all we want. People still feel like they're taking a test, and they want to pass. They don't want to be defeated by a computer.

Because they want to be successful, users allocate their mental resources to what's happening on the screen, not what's happening in the room. Of course, this concentration can easily be broken, which is why it's a cardinal rule of user testing to have observers remain absolutely quiet if they're in the room with users. Also, in fancy usability labs, I've seen users distracted by the noise of cameras moving along ceiling tracks, so that type of thing is best avoided. But, generally, as long as observers stay quiet and out of view (behind the user or behind the mirror), participants will remain engaged in their tasks.

One downside of users' tendency to engage strongly is that they sometimes work harder on tasks in a test session than they would at home. If the user says, "I would stop here," you can bet that they'd probably stop a few screens earlier if they weren't being tested.

Suspension of Disbelief

In user testing, we pull people away from their offices or homes and ask them to pretend to perform business or personal tasks with our design. This is obviously an artificial scenario.

As part of the full usability life cycle, there are good reasons to conduct field studies and observe users' behavior in their natural habitats. Unfortunately, field studies are much more expensive than lab studies, and it's typically difficult to get permission to conduct research inside other companies. During a design process, most usability sessions involve user testing; we're therefore lucky that users can typically overcome a lab's artificial nature and pretend to be at home.

The tendency to suspend disbelief is deeply rooted in the human condition, and may have developed to help prehistoric humans bond around the camp fire in support of storytelling and magic ceremonies. In the modern world, TV shows like Star Trek only work because of our propensity to suspend disbelief. Consider the number of untruths involved in watching Star Trek:

  • You're not looking at people, you're looking at pictures of people, in the form of glowing dots on a video screen.
  • You're not looking at pictures of real people, you're looking at pictures of actors pretending to be characters, like Mr. Spock and Captain Picard, that don't exist.
  • You're not looking at pictures of actors using transporters, shooting phasers, and flying faster-than-light starships. All such activities are simulated with special effects.

You know all of this, and yet you engage in the story when watching the show.

Similarly, in usability studies, participants easily pretend that the scenario is real and that they're really using the design. For this to happen, you obviously need realistic test tasks and representative users who might actually perform such tasks in the real world. Assuming both, most usability participants will suspend disbelief and simply attempt the task at hand.

Suspension of disbelief goes so far that users engage strongly with paper prototypes where the user interface is purely a piece of paper. As long as you can move through screens in pursuit of your goal, you will behave as if the system were real and not simulated.

In fact, sometimes users go too far in suspending disbelief and try to role play other users' potential performance. You must put a stop to this as soon as you hear users speculate about what "some people" might do or might not know. Politely tell such users that you're testing many other people as well, but that you invited them to the test because their personal experiences are very important to the project. You can tell such users to "be yourself" and that "you know what you know," then ask them to use their own job situation as the usage scenario's background legend.

When Engagement and Suspension of Disbelief Fail

User testing typically works, but there are exceptions. Occasionally, test participants are so lazy and difficult to engage that they never suspend disbelief or work on the tasks for real. For example, if you ask such users to select a product to solve a problem, they'll typically stop at the first remotely related product, even if it's basically unsuitable and wouldn't be bought by anybody who really had the target problem.

In rare cases, such nonrealistic usage is serious enough that you must simply excuse the participant and discard the session's data. If users haven't suspended disbelief and performed conscientiously, you can't trust that anything they've done represents real use.

More commonly, you can rescue the situation by employing a few facilitation tricks to encourage authentic behavior.

The easiest and most common approach is to ask the user whether this is what he or she would do at the office (or at home, for a consumer project). This small reminder is often enough to get users engaged. Variants of this technique include:

  • Ask users if they have enough information to make a decision (when they stop after finding the first nugget of information about the problem).
  • Ask users if they are sure that they've selected the best product (if they stop after finding the first likely purchase, without evaluating alternative options).

If users fail to take the tasks seriously in your first few test sessions, you can usually rescue the study by modifying the test instructions or task descriptions. For example, it's often enough to add an introductory remark such as, "Pretend that you must not only find the best solution, but justify the choice to your boss."

Including an absent boss in the test scenario encourages suspension of disbelief and usually works wonders. I've also found it effective to ask users to "write five bullet points for your boss, explaining the main pros and cons of this product." (You can also ask for three bullets of pros and three bullets of cons.)

In our tests of the investor relations areas of corporate websites, we used a variant of this technique, simply asking the financial analysts and individual investors to decide whether they thought the company would do better or worse than the stock market and state the main reasons why.

Other times, you can impose constraints to better engage users with the task. For example, we once tested an e-commerce site that sold office supplies. In pilot testing, a test task for administrative assistants was to stock the office for a newly hired employee. Unfortunately, the pilot participant wanted to be really nice to this hypothetical new colleague, and bought all the most expensive pens, office chairs, and so on. Although we could have asked users to pretend that they had to answer to the boss, in this case we chose instead to give them a specific budget. This small change was enough to make users consider each item carefully, which let us discover how they assessed product quality and value.

Despite the artificial nature of user testing, we typically see enough authentic behavior to easily identify design flaws that would diminish an interface's value for real customers. Getting to actually use something is so powerful an experience that most users provide good data. When they don't, you can typically prod users lightly with verbal reminders, or alter the task a bit to get them fully engaged.
