Technology Transfer of Heuristic Evaluation and Usability Inspection

by Jakob Nielsen on June 27, 1995

Summary: Participants in a course on usability inspection methods were surveyed 7-8 months after the course to find out what methods they were in fact using, and why they used or did not use the methods they had been taught. The major factor in method usage was the quality of the usability information gained from the method, with a very strong correlation between the rated benefit of using a method and the number of times the method had been used. Even though the respondents came from companies with above-average usability budgets (7% of development budgets were devoted to usability), the cost of using the methods was also a very strong factor in determining use. Other observations were that technology transfer was most successful when methods were taught at the time when people had a specific need for them in their project, and that methods need to have active evangelists to succeed.


Originally presented as a keynote at the IFIP INTERACT'95 International Conference on Human-Computer Interaction, Lillehammer, Norway, June 27, 1995.

The Need for More Usable Usability

User interface professionals ought to take their own medicine more often. How often have we heard UI folks complain that "we get no respect" (from development managers)? At the same time, we have nothing but scorn for any programmer who has the attitude that if users have problems with his or her program then it must be the users' fault.

If we consider usability engineering as a system, a design, or a set of interfaces with which development managers have to interact, then it obviously becomes the usability professionals' responsibility to design that system to maximize its communication with its users. My claim is that the problems in getting usability results used more in development are due more to the lack of usability of the usability methods and results than to evil development managers who deliberately want to torment their users.

In order to get usability methods used more in real development projects, we must make the usability methods easier to use and more attractive. One way of doing so is to consider the way current usability methods are being used and what causes some methods to be used and others to remain "a good idea which we might try on the next project." As an example of such studies I will report on a study of what causes usability inspection methods to be used.

Usability Inspection Methods

Usability inspection (Nielsen and Mack, 1994) is the generic name for a set of methods based on having evaluators inspect or examine usability-related aspects of a user interface. Some evaluators can be usability specialists, but they can also be software development consultants with special expertise (e.g., knowledge of a particular interface style for graphical user interfaces), end users with content or task knowledge, or other types of professionals. The different inspection methods have slightly different goals, but normally usability inspection is intended as a way of evaluating user interface designs to find usability problems. In usability inspection, the evaluation of the user interface is based on the considered judgment of the inspector(s). The individual inspection methods vary as to how this judgment is derived and on what evaluative criteria inspectors are expected to base their judgments. In general, the defining characteristic of usability inspection is the reliance on judgment as a source of evaluative feedback on specific elements of a user interface. See the appendix for a short summary of the individual usability inspection methods discussed in this paper.

Usability inspection methods were first described in formal presentations in 1990 at the CHI'90 conference where papers were published on heuristic evaluation (Nielsen and Molich, 1990) and cognitive walkthroughs (Lewis et al., 1990). Now, only four to five years later, usability inspection methods have become some of the most widely used methods in the industry. As an example, in his closing plenary address at the Usability Professionals' Association's annual meeting in 1994 (UPA'94), Ken Dye, usability manager at Microsoft, listed the four major recent changes in Microsoft's approach to usability as:

Many other companies and usability consultants are also known to have embraced heuristic evaluation and other inspection methods in recent years. Here is an example of an email message I received from one consultant in August 1994:

"I am working [...] with an airline client. We have performed so far, 2 iterations of usability [...], the first being a heuristic evaluation. It provided us with tremendous information, and we were able to convince the client of its utility [...]. We saved them a lot of money, and are now ready to do a full lab usability test in 2 weeks. Once we're through that, we may still do more heuristic evaluation for some of the finer points."

Work on the various usability inspection methods obviously started several years before the first formal conference presentations. Even so, current use of heuristic evaluation and other usability inspection methods is still a remarkable example of rapid technology transfer from research to practice over a period of very few years.

Technology Transfer

There are many characteristics of usability inspection methods that would seem to help them achieve rapid penetration in the "marketplace of ideas" in software development organizations:

  • Many companies have just recently realized the urgent need for increased usability activities to improve their user interfaces. Since usability inspection methods are cheap to use and do not require special equipment or lab facilities, they may be among the first methods tried.
  • The knowledge and experience of interface designers and usability specialists need to be broadly applied; inspections represent an efficient way to do this. Thus, inspections serve a similar function to style guides by spreading the expertise and knowledge of a few to a broader audience, meaning that they are well suited for use in the many companies that have a much smaller number of usability specialists than needed to provide full service to all projects.
  • Usability inspection methods present a fairly low hurdle to practitioners who want to use them. In general, it is possible to start using simple usability inspection after a few hours of training. Also, inspection methods can be used in many different stages of the system development lifecycle.
  • Usability inspection can be integrated easily into many established system development practices; it is not necessary to change the fundamental way projects are planned or managed in order to derive substantial benefits from usability inspection.
  • Usability inspection provides instant gratification to those who use it; lists of usability problems are available immediately after the inspection and thus provide concrete evidence of aspects of the interface that need to be improved.

To further study the uptake of new usability methods, I conducted a survey of the technology transfer of usability inspection methods.

Method

The data reported below were gathered by surveying the participants in a course on usability inspection taught in April 1993. A questionnaire was mailed to all 85 regular attendees of the tutorial taught by the author at the INTERCHI'93 conference in Amsterdam. Surveys were not sent to students, on the assumption that they would often not be working on real projects and therefore could not provide representative replies to a technology transfer survey. Similarly, no questionnaires were sent to instructors from other INTERCHI'93 tutorials who were sitting in on the author's tutorial, since they were deemed to be less representative of the community at large.

Of the 85 mailed questionnaires, 4 were returned by the post office as undeliverable, meaning that 81 course attendees actually received the questionnaire. 42 completed questionnaires were received, representing a response rate of 52%.
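The response-rate arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch; the helper name `response_rate` is illustrative and not part of the original study. It computes the rate both against the 81 delivered questionnaires and against all 85 mailed.

```python
def response_rate(completed: int, denominator: int) -> float:
    """Return the response rate as a percentage of the given denominator."""
    return 100.0 * completed / denominator

MAILED = 85          # questionnaires sent to regular tutorial attendees
UNDELIVERABLE = 4    # returned by the post office
COMPLETED = 42       # completed questionnaires received

delivered = MAILED - UNDELIVERABLE
rate_delivered = response_rate(COMPLETED, delivered)
rate_mailed = response_rate(COMPLETED, MAILED)

print(f"Delivered questionnaires: {delivered}")
print(f"Rate vs. delivered: {rate_delivered:.0f}%")
print(f"Rate vs. mailed:    {rate_mailed:.0f}%")
```

Using the delivered questionnaires as the denominator gives the 52% reported above; counting all 85 mailed questionnaires would give about 49% instead.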

The questionnaire was mailed in mid-November 1993 (6.5 months after the tutorial) with a reminder mailed in late December 1993 (8 months after the tutorial). 21 replies were received after the first mailing, and another 21 replies were received after the second mailing. The replies thus reflect the respondents' state approximately seven or eight months after the tutorial.

With a response rate of 52%, it is impossible to know for sure what the other half of the course participants would have replied if they had returned the questionnaire. However, data from the two response rounds allow us to speculate on possible differences, on the assumption that the non-respondents would be more like the second-round respondents than the first-round respondents. Table 1 compares these two groups on some relevant parameters. The first conclusion is that none of the differences between the groups are statistically significant, meaning that the respondents are likely to be fairly representative of the full population. Even so, there might be a slight tendency for the respondents to be associated with larger projects than the non-respondents, and the respondents were probably more experienced with usability methods than the non-respondents. Thus, the true picture for the full group of tutorial participants might reflect slightly less usage of the usability inspection methods than reported here, but probably not much less.

 

Table 1
Comparison of respondents from the first questionnaire round with the respondents from the second round. None of the differences between groups are statistically significant.
| Question | First-round respondents | Second-round respondents | p |
| --- | --- | --- | --- |
| Usability effort on project (staff-years) | 3.1 | 1.3 | .2 |
| Had used user testing before the course | 89% | 70% | .1 |
| Had used heuristic evaluation after the course | 65% | 59% | .7 |
| Number of different inspection methods used after course | 2.2 | 1.8 | .5 |

The median ratio between the usability effort on the respondents' latest project and the project's size in staff-years was 7%. Given the sample sizes, this is essentially the same as the 6% of development budgets that a survey conducted in January 1993 found to be devoted to usability in 31 projects with usability engineering efforts (Nielsen, 1993). This result lends further support to the assumption that our respondents are reasonably representative.

Questionnaire Results

Respondents were asked which of the inspection methods covered in the course they had used in the (approximately 7-8 month) period after the course. They were also asked whether they had conducted user testing after the course. The results from these questions are shown in Table 2. Usage frequency in a specific period may be the best measure of the fit between the methods and project needs, since it is independent of the methods' history. User testing and heuristic evaluation were clearly used much more than the other methods.

 

Table 2
Proportion of the respondents who had used each of the inspection methods and user testing in the 7-8 month period after the course, the number of times respondents had used the methods, and their mean rating of the usefulness of the methods on a 1-5 scale (5 best). Methods are sorted by frequency of use after the course.
| Method | Respondents using method after INTERCHI | Times respondents had used the method (before or after the course) | Mean rating of benefits from using method (1-5) |
| --- | --- | --- | --- |
| User testing | 55% | 9.3 | 4.8 |
| Heuristic evaluation | 50% | 9.1 | 4.5 |
| Feature inspection | 31% | 3.8 | 4.3 |
| Heuristic estimation | 26% | 8.3 | 4.4 |
| Consistency inspection | 26% | 7.0 | 4.2 |
| Standards inspection | 26% | 6.2 | 3.9 |
| Pluralistic walkthrough | 21% | 3.9 | 4.0 |
| Cognitive walkthrough | 19% | 6.1 | 4.1 |

Respondents were also asked how many times they had used the methods so far, whether before or after the course. Table 2 shows the mean number of times each method had been used by those respondents who had used it at all. This result is probably a less interesting indicator of method usefulness than is the proportion of respondents who had used the methods in the fixed time interval after the course, since it depends on the time at which the method was invented: older methods have had time to be used more than newer methods.

Finally, respondents were asked to judge the benefits of the various methods for their project(s), using the following 1-5 scale:

1 = completely useless
2 = mostly useless
3 = neutral
4 = somewhat useful
5 = very useful

The results from this question are also shown in Table 2. Respondents only rated those methods with which they had experience, so not all methods were rated by the same number of people. The immediate conclusion from this question is that all the methods were judged useful, receiving ratings of at least 3.9 on a scale where 3 was neutral.

 

Figure 1
Regression chart showing the relation between the rated usefulness of each method and the number of times those respondents who had tried a method had used it. Usefulness ratings were only given by respondents who had tried a method.

The statistics for the proportion of respondents having used a method, their average usefulness rating of the method, and the average number of times they had used the method were all highly correlated. This is only to be expected, as people would presumably tend to use the most useful methods the most. Figure 1 shows the relation between usefulness and the number of times a method was used (r = .71, p < .05), and Figure 2 shows the relation between usefulness and the proportion of respondents who had tried a method, whether before or after the course (r = .85, p < .01). Two outliers were identified. Feature inspection had a usefulness rating of 4.3, which on the regression line would correspond to being used 6.7 times, though in fact it had only been used 3.8 times on average by those respondents who had used it. Also, heuristic estimation had a usefulness rating which on the regression line would correspond to having been tried by 56% of respondents, even though it had in fact only been tried by 38%. These two outliers can be explained by the fact that these two methods are the newest and least well documented of the inspection methods covered in the course.
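Table 2 contains enough data to approximate the Figure 1 regression. The following is a minimal Python sketch, assuming that the figure's eight points are the eight rows of Table 2 (usefulness rating on the x axis, mean times used on the y axis) and that the line is an ordinary least-squares fit of times used on rating. The helper `pearson_and_fit` is illustrative, not taken from the original analysis.

```python
from statistics import mean

# (mean usefulness rating, mean times used) for each method, from Table 2.
TABLE_2 = {
    "User testing": (4.8, 9.3),
    "Heuristic evaluation": (4.5, 9.1),
    "Feature inspection": (4.3, 3.8),
    "Heuristic estimation": (4.4, 8.3),
    "Consistency inspection": (4.2, 7.0),
    "Standards inspection": (3.9, 6.2),
    "Pluralistic walkthrough": (4.0, 3.9),
    "Cognitive walkthrough": (4.1, 6.1),
}

def pearson_and_fit(points):
    """Return (r, slope, intercept) for a least-squares fit of y on x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept

r, slope, intercept = pearson_and_fit(list(TABLE_2.values()))
# Number of uses the regression line predicts for feature inspection's rating.
predicted_fi = slope * 4.3 + intercept

print(f"r = {r:.2f}")
print(f"Predicted uses at rating 4.3: {predicted_fi:.1f} (observed: 3.8)")
```

Because the table values are rounded, this gives roughly r = .70 and about 6.8 predicted uses for feature inspection. These figures are close to, but not exactly, the published r = .71 and 6.7. The Figure 2 correlation cannot be reproduced the same way, because Table 2 lists only the proportions of respondents who used each method after the course, not before or after it.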

 

Figure 2
Regression chart showing the relation between the rated usefulness of each method and the proportion of respondents who had tried the method. Usefulness ratings were only given by those respondents who had tried a method.

 

The figures are drawn to suggest that usage of methods follows from their usefulness to projects. One could in fact imagine that, in order to avoid cognitive dissonance, the respondents gave the highest ratings to the methods they had personally used the most. In that case causality would work in the opposite direction from that implicitly shown in the figures. However, the correlation between the individual respondents' ratings of the usefulness of a method and the number of times they had used the method themselves is very low (r = .05), indicating that the respondents judged the usefulness of the methods independently of how much they had used them personally. The correlation is high only in the aggregate, between the mean values for each method. Thus, we conclude that the reason for this high correlation is likely to be that usability methods are used more if they are judged to be of benefit to the project. This is not a surprising conclusion, but it does imply that inventors of new usability methods will need to convince usability specialists that their methods will be of benefit to concrete development projects.

 

Table 3
Proportion of respondents who used the methods the way they were taught. For each method, the proportion is computed relative to those respondents who had used the method at least once.
| Method | Respondents using the method as it was taught |
| --- | --- |
| Pluralistic walkthrough | 27% |
| Heuristic estimation | 25% |
| Heuristic evaluation | 24% |
| Standards inspection | 22% |
| Cognitive walkthrough | 15% |
| Feature inspection | 12% |
| Consistency inspection | 0% |

The survey showed that only 18% of respondents used the methods the way they were taught. 68% used the methods with minor modifications, and 15% used the methods with major modifications (numbers averaged across methods). In general, as shown in Table 3, the simpler methods seemed to have the largest proportion of respondents using them as they were taught. Of course, it is perfectly acceptable for people to modify the methods according to their specific project needs and the circumstances in their organization. The high degree of method modification does raise one issue with respect to research on usability methodology, in that one cannot be sure that different projects use the "same" methods the same way, meaning that one will have to be careful when comparing reported results.
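The text does not say how the figures were "averaged across methods." One plausible reading, assumed here, is an unweighted mean of the per-method percentages in Table 3. The following short Python check shows that this reading is consistent with the 18% figure:

```python
from statistics import mean

# Percentage of each method's users who used it exactly as taught (Table 3).
AS_TAUGHT = {
    "Pluralistic walkthrough": 27,
    "Heuristic estimation": 25,
    "Heuristic evaluation": 24,
    "Standards inspection": 22,
    "Cognitive walkthrough": 15,
    "Feature inspection": 12,
    "Consistency inspection": 0,
}

# Unweighted mean: every method counts equally, regardless of how many
# respondents had used it.
avg = mean(AS_TAUGHT.values())
print(f"Unweighted mean across methods: {avg:.1f}%")
```

The result, about 17.9%, rounds to the reported 18%. A mean weighted by the number of users of each method could differ somewhat, and those counts are not given here.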

The normal recommendation for heuristic evaluation is to use 3-5 evaluators. Only 35% of the respondents who used heuristic evaluation did so, however. 38% used two evaluators and 15% only used a single evaluator. The histogram in Figure 3 shows the distribution of number of evaluators used for heuristic evaluation.

With respect to user testing, even though 35% did use 3-6 test participants (which would normally be referred to as discount usability testing), fully 50% of the respondents used 10 participants or more. Thus, "deluxe usability testing" is still being used to a great extent. The histogram in Figure 4 shows the distribution of number of test participants used for a test.

 

Figure 3
Histogram of the number of evaluators normally used by the respondents for heuristic evaluations.

Figure 4
Histogram of the number of test users normally used by the respondents for user testing.

As one might have expected, the participants' motivation for taking the course had a major impact on the degree to which they actually used the inspection methods taught in the course. People who expected to need the methods for their current project did indeed use the methods more than people who expected to need them for their next project, who in turn used more methods than people who did not anticipate any immediate need for the methods. Table 4 shows the number of different inspection methods used in the (7-8 month) period after the course for participants with different motivations. The table also shows the number of inspection methods planned for use during the next six months. Here, the participants with pure academic or intellectual interests have the most ambitious plans, but we still see that people who had the most immediate needs when they originally took the course plan to use more methods than people who had less immediate needs.

 

Table 4
Relation between the main reason people took the course and the number of different methods they have used.
| Motivation for taking the course | Proportion of the respondents | Number of different inspection methods used since the course | Number of different inspection methods planned for use during the next six months |
| --- | --- | --- | --- |
| Specific need to know for current project | 31% | 3.0 | 2.2 |
| Expect to need to know for next project | 21% | 1.4 | 1.7 |
| Expect the topic to be important in future, but don't anticipate any immediate need | 14% | 1.2 | 1.3 |
| Pure academic or intellectual interest | 12% | 2.0 | 3.4 |

In addition to the reasons listed in Table 4, 22% of the respondents indicated other reasons for taking the course. 5% of the respondents wanted to see how the instructor presented the materials in order to get material for use in their own classes and 5% wanted to validate their own experience with usability inspection and/or were developing new inspection methods. The remaining 12% of the respondents were distributed over a variety of other reasons for taking the course, each of which was only given by a single respondent.

Free-Form Comments

At the end of the questionnaire, respondents were asked to state their reasons for using or not using the various methods. A total of 186 comments were collected, comprising 119 reasons why methods were used and 67 reasons why methods were not used.

 

Table 5
Classification of the 186 free-form comments made by respondents when asked to explain why they used (or did not use) a method. In each cell, the first number indicates reasons given for using a method and the second number (after the slash) indicates reasons given for not using a method (empty cells indicate that nobody made a comment about a method in that category).

| Category | Cognitive walkthrough | Consistency inspection | Feature inspection | Heuristic evaluation | Heuristic estimation | Pluralistic walkthrough | Standards inspection | User testing | Proportion of all comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method generates good/bad information | 9 / 1 | 5 / 0 | 5 / 0 | 3 / 1 | 4 / 2 | 5 / 0 | 6 / 0 | 20 / 0 | 33% |
| Resource and/or time requirements | 1 / 3 | 1 / 3 | 4 / 1 | 8 / 1 | 1 / 2 | 0 / 11 | 1 / 0 | 0 / 2 | 21% |
| Expertise and/or skills required | 1 / 8 | 1 / 3 | 0 / 4 | 5 / 1 | 0 / 3 | | 1 / 4 | | 17% |
| Specific characteristics of individual project | 2 / 0 | 2 / 4 | 1 / 2 | | 2 / 1 | | 0 / 6 | 1 / 0 | 11% |
| Communication, team-building, propaganda | | 2 / 0 | 1 / 0 | | 3 / 0 | 5 / 0 | | 4 / 0 | 8% |
| Method mandated by management | | 1 / 0 | 1 / 0 | 1 / 0 | 1 / 0 | | 1 / 0 | 2 / 0 | 4% |
| Interaction between multiple methods | | | | 3 / 0 | 1 / 0 | 1 / 0 | 0 / 1 | | 3% |
| Other reasons | 0 / 2 | | | 2 / 0 | | | | | 2% |
| Proportion of comments that were positive | 48% | 55% | 63% | 88% | 60% | 50% | 45% | 93% | |

Table 5 summarizes the free-form comments according to the following categories:

  • Method generates good/bad information: reasons referring to the extent to which the results of using a method are generally useful.
  • Resource and/or time requirements: reasons related to the expense and time needed to use a method.
  • Expertise and/or skills required: reasons based on how easy or difficult it is to use a method. Mostly, positive comments praise methods for being easy and approachable and negative comments criticize methods for being too difficult to learn. One exception was a comment that gave as a reason for using heuristic evaluation the fact that it allowed usability specialists to apply their expertise.
  • Specific characteristics of individual project: reasons referring to why individual circumstances made a method attractive or problematic for a specific project. For example, one comment mentioned that there was no need for consistency inspection in a project because it was the first GUI in the company and thus did not have to be consistent with anything.
  • Communication, team-building, propaganda: reasons referring to the ways in which use of a method helps evangelize usability, generate buy-in, or simply placate various interest groups.
  • Method mandated by management: reasons mentioning that something was done because it was a requirement in that organization.
  • Interaction between multiple methods: reasons referring to the way the specific method interacts with or supplements other usability methods.
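The bottom row of Table 5 can be derived from the cells above it. For each method, it gives the positive comments as a share of all comments about that method. The following minimal Python sketch shows that calculation. It assumes the cell values as laid out in the table above, omits empty cells, and drops the category labels because only per-method totals matter. The name `share_positive` is illustrative.

```python
# Table 5 cells per method as (reasons for using, reasons for not using).
TABLE_5 = {
    "Cognitive walkthrough": [(9, 1), (1, 3), (1, 8), (2, 0), (0, 2)],
    "Consistency inspection": [(5, 0), (1, 3), (1, 3), (2, 4), (2, 0), (1, 0)],
    "Feature inspection": [(5, 0), (4, 1), (0, 4), (1, 2), (1, 0), (1, 0)],
    "Heuristic evaluation": [(3, 1), (8, 1), (5, 1), (1, 0), (3, 0), (2, 0)],
    "Heuristic estimation": [(4, 2), (1, 2), (0, 3), (2, 1), (3, 0), (1, 0), (1, 0)],
    "Pluralistic walkthrough": [(5, 0), (0, 11), (5, 0), (1, 0)],
    "Standards inspection": [(6, 0), (1, 0), (1, 4), (0, 6), (1, 0), (0, 1)],
    "User testing": [(20, 0), (0, 2), (1, 0), (4, 0), (2, 0)],
}

def share_positive(cells):
    """Percentage of a method's comments that gave a reason for using it."""
    positive = sum(p for p, _ in cells)
    total = sum(p + n for p, n in cells)
    return 100.0 * positive / total

shares = {method: share_positive(cells) for method, cells in TABLE_5.items()}
for method, share in shares.items():
    print(f"{method}: {share:.0f}% positive")
```

Rounded to whole percentages, these values match the table's last row, from 45% for standards inspection to 93% for user testing.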

It can be seen from Table 5 that the most important attribute of a usability method is the quality of the data it generates and that user testing is seen as superior in that respect. In other words, for a new usability method to be successful, it should first of all be able to generate useful information.

The next two criteria in the table both relate to how easy the methods are to use: the resources and time needed, and the expertise and skill needed. The respondents view heuristic evaluation as superior in this regard and express reservations about cognitive walkthroughs and pluralistic walkthroughs. Remember that the survey respondents came from projects that had already decided to use usability engineering and that had invested in sending staff to an international conference. In many other organizations, cost and expertise are likely to be even more important issues.

Conclusions

In planning for technology transfer of new usability methods, we have seen that the first requirement is to make sure that the method provides information that is useful in making user interfaces better. Equally important, however, is to make the method cheap and fast to use and to make it easy to learn. In fact, method proponents should make sure to cultivate the impression that their method is easy to learn, since decisions about which methods to use are frequently made based on a method's reputation and not by assessing actual experience from pilot usage. It is likely that cognitive walkthrough suffers from an image problem due to the early, complicated version of the method (Lewis et al., 1990), even though recent work has made it easier to use (Wharton et al., 1994). The need for methods to be cheap is likely to be even stronger in the average development project than in those represented in this survey, given that the respondents' projects were found to have above-average usability budgets.

Furthermore, methods should be flexible and able to adapt to changing circumstances and the specific needs of individual projects. The free-form comments analyzed in Table 5 show project needs as accounting for 11% of the reasons listed for use or non-use of a method, but a stronger indication of the need for adaptability is the statistic that only 18% of respondents used the methods the way they were taught, whereas 68% required minor modifications and 15% required major modifications.

A good example of flexibility is the way heuristic evaluation can be used with varying numbers of evaluators. The way the method is usually taught (Nielsen, 1994a) requires the use of 3-5 evaluators who should preferably be usability specialists. Yet, as shown in Figure 3, many projects were able to use heuristic evaluation with a smaller number of evaluators. Of course, the results will not be quite as good, but the method exhibits "graceful degradation" in the sense that small deviations from the recommended practice only result in slightly reduced benefits.
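A simple model helps explain this graceful degradation. The model is an assumption made here for illustration and was not derived from this survey. Suppose each evaluator independently finds any given usability problem with the same probability `lam`. Then the expected share of problems found by `i` evaluators is 1 - (1 - lam)^i. The value `lam = 0.35` below is a hypothetical per-evaluator detection rate, not a result reported in this paper, and `share_found` is an illustrative name.

```python
def share_found(evaluators: int, lam: float) -> float:
    """Expected share of problems found, assuming each evaluator
    independently detects any given problem with probability lam."""
    if evaluators < 0 or not 0.0 <= lam <= 1.0:
        raise ValueError("need evaluators >= 0 and 0 <= lam <= 1")
    # A problem is missed only if every evaluator misses it.
    return 1.0 - (1.0 - lam) ** evaluators

LAM = 0.35  # hypothetical per-evaluator detection rate (assumption)

for n in range(1, 6):
    print(f"{n} evaluator(s): {share_found(n, LAM):.0%} of problems")
```

Under these assumptions, two evaluators find somewhat under 60% of the problems and five find close to 90%. Using fewer evaluators than recommended therefore lowers the expected yield one step at a time rather than making the method fail outright.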

The survey very clearly showed that the way to get people to use usability methods is to reach them at the time when they have specific needs for the methods on their current project (Table 4). This finding also favors methods that have wide applicability across a variety of stages of the usability lifecycle, since such methods are more likely to match whatever stage a project has reached when its staff are trained. Heuristic evaluation is a good example of such a method, since it can be applied to early paper mock-ups or written specifications as well as later prototypes, ready-to-ship software, and even the clean-up of legacy mainframe screens that need to be used for a few more years without available funding for major redesign.

A final issue in technology transfer is the need for aggressive advocacy. Figure 1 shows that heuristic evaluation is used somewhat more than its rated utility would justify and that feature inspection is used much less than it should be. The most likely reason for this difference is that heuristic evaluation has been the topic of many talks, panels, seminars, books, and even satellite TV shows (Shneiderman, 1993) over the last few years, whereas feature inspection has had no vocal champions in the user interface community.

Acknowledgments

I thank Michael Muller for help in developing the survey and the many anonymous respondents for taking the time to reply. I thank Robin Jeffries and Michael Muller for helpful comments on an earlier version of this manuscript.

References

  • Bell, B. (1992). Using programming walkthroughs to design a visual language. Technical Report CU-CS-581-92 (Ph.D. Thesis), University of Colorado, Boulder, CO.
  • Bias, R. G. (1994). The pluralistic usability walkthrough: Coordinated empathies. In Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods, John Wiley & Sons, New York, 65-78.
  • Kahn, M. J., and Prail, A. (1994). Formal usability inspections. In Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods, John Wiley & Sons, New York, 141-172.
  • Lewis, C., Polson, P., Wharton, C., and Rieman, J. (1990). Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces. Proceedings ACM CHI'90 Conference (Seattle, WA, April 1-5), 235-242.
  • Nielsen, J. (1993). Usability Engineering (revised paperback edition 1994). Academic Press, Boston.
  • Nielsen, J. (1994a). Heuristic evaluation. In Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods, John Wiley & Sons, New York, 25-62.
  • Nielsen, J. (1994b). Enhancing the explanatory power of usability heuristics. Proceedings ACM CHI'94 Conference (Boston, MA, April 24-28), 152-158.
  • Nielsen, J., and Mack, R. L. (Eds.) (1994). Usability Inspection Methods. John Wiley & Sons, New York.
  • Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces. Proceedings ACM CHI'90 Conference (Seattle, WA, April 1-5), 249-256.
  • Nielsen, J., and Phillips, V. L. (1993). Estimating the relative usability of two interfaces: Heuristic, formal, and empirical methods compared. Proceedings ACM/IFIP INTERCHI'93 Conference (Amsterdam, The Netherlands, April 24-29), 214-221.
  • Shneiderman, B. (Host) (1993). User Interface Strategies '94. Satellite TV show and subsequent videotapes produced by the University of Maryland's Instructional Television System, College Park, MD.
  • Wharton, C., Rieman, J., Lewis, C., and Polson, P. (1994). The cognitive walkthrough method: A practitioner's guide. In Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods, John Wiley & Sons, New York, 105-140.
  • Wixon, D., Jones, S., Tse, L., and Casaday, G. (1994). Inspections and design reviews: Framework, history, and reflection. In Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods, John Wiley & Sons, New York, 79-104.
