Summary: Testing two bank account statement designs with just 2 users each increased the likelihood of picking the better design from 50% to 76%.
In one early study, I compared two different designs of bank account information. With the first design, bank customers had a 56% success rate; with the second, they had a 76% success rate. The second design was also slightly faster to use and scored higher on a subjective satisfaction survey. These usability metrics, which we collected from a test of 152 users, clearly showed the second design's superiority.
In a second experiment, I gave both designs to two groups of students in a user interface course, and asked them which one they'd recommend to the bank. (Obviously, I didn't tell them how the designs had scored.)
Members of one group made their recommendation without conducting any user testing. They had only a 50% probability (that is, pure chance) of recommending the better design. In fact, the two designs looked almost the same and differed only in a few subtle points that, as our tests showed, caused problems when bank customers checked their account information.
Members of the other group tested two users with each of the two designs. Basically, this is the smallest test possible if you want data from more than one person. Despite their minimal usability data, members of this group had a 76% probability of recommending the better design.
Even though the people in the second group had done almost no usability work, they still performed vastly better than the people who relied purely on their own judgment. If you want better than a 76% probability of picking the better design, I usually recommend testing with five users per design, rather than two.
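To get a rough feel for why more users per design help, here is a minimal sketch of a simplified model. It is not the method used in the experiment, and the 76% figure above was observed, not derived this way. The model assumes each test user independently succeeds with the measured success rates (56% and 76%), that the team simply recommends whichever design had more successful users, and that a tie is settled by a coin flip. The function names are hypothetical. Real test teams also draw on qualitative observations, which this count-only model ignores, so its numbers will not match the observed ones.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes among n independent users."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_pick_better(n_users, p_worse=0.56, p_better=0.76):
    """Chance of recommending the better design under the simplified model.

    Assumptions (illustrative only): n_users test each design, the design
    with more successful users is recommended, and ties are broken by a
    fair coin flip.
    """
    win = 0.0
    tie = 0.0
    for a in range(n_users + 1):          # successes with the worse design
        pa = binomial_pmf(a, n_users, p_worse)
        for b in range(n_users + 1):      # successes with the better design
            pb = binomial_pmf(b, n_users, p_better)
            if b > a:
                win += pa * pb
            elif b == a:
                tie += pa * pb
    return win + 0.5 * tie


if __name__ == "__main__":
    for n in (2, 5):
        print(f"{n} users per design: {prob_pick_better(n):.2f}")
```

Under these assumptions, the computed probability rises as the number of users per design goes from two to five, which is consistent with the recommendation to test with five.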