Friday, April 2, 2010

5 Mistakes People Make Analyzing Qualitative Data

My last blog post was about common mistakes that people make when analyzing quantitative data, such as you might get from multivariate testing or business metrics. Today I’d like to talk about the mistakes people make when analyzing and using qualitative data.

I’m a big proponent of using both qualitative and quantitative data, but I have to admit that qualitative feedback can be a challenge. Unlike a product funnel or a revenue graph, qualitative data can be messy and open-ended, which makes it particularly tough to interpret.

For the purposes of this post, qualitative information is generated by the following types of activities:
  • Usability tests
  • Contextual inquiries
  • Customer interviews
  • Open-ended survey questions (e.g., What do you like most/least about the product?)

Insisting on Too Large a Sample

With almost every new client, somebody questions how many people we need for a usability test “to get significant results.” Now, if you read my last post, you may be surprised to hear me say that you shouldn’t be going for statistical significance here. I prefer to run usability tests and contextual inquiries with around five participants. Of course, I prefer running tests iteratively, but that’s another blog post.

Analyzing the data from a qualitative test or even just reading through essay-type answers in surveys takes a lot longer per customer than running experiments in a funnel or looking at analytics and revenue graphs. You get severely diminishing returns from each extra hour you spend asking people the same questions and listening to their answers.

Here’s an example from a test I ran. The client wanted to know all the different pain points in their product so that they could make one big sweep toward the end of the development cycle and fix everything at once. Against my better judgment, we spent a full two weeks running sessions, complete with a moderator, observers, a lab, and all the other attendant costs of a big test. The trouble was that the very first session revealed a major problem that prevented the vast majority of participants from ever finding an entire section of the interface. Since that problem couldn’t be fixed before moving on to the rest of the sessions, we couldn’t actually test a huge portion of the product and had to come back to it later anyway.

The Fix: Run small, iterative tests to generate a manageable amount of data. If you’re working on improving a particular part of your product or considering adding a new feature, do a quick batch of interviews with four or five people. Then, immediately address the biggest problems you find. Once you’re done, run another test to find the problems that were being masked by the larger ones. Keep doing this until your product is perfect (i.e., forever). It’s faster, cheaper, and more immediately actionable than a giant, “statistically significant” qualitative test, and you’ll eventually find more issues in the same amount of testing time.

It’s also MUCH easier to pick out a few major problems from five hours of testing than it is to tease dozens of different problems out of dozens of hours of testing.
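If you want a rough sense of why the returns diminish so quickly, the widely cited problem-discovery model from the usability literature makes the point: the share of problems found after n participants is roughly 1 − (1 − p)^n, where p is the chance that any single participant hits a given problem. The sketch below is just an illustration of that model, not data from my tests; the 0.31 detection rate is the figure usually quoted from Nielsen and Landauer’s research, and the model assumes problems are discovered independently.

```python
# Expected share of usability problems found after n participants,
# using the commonly cited discovery model 1 - (1 - p)^n.
# p is the probability that one participant encounters a given problem;
# 0.31 is the Nielsen & Landauer estimate (an assumption, not data from this post).
def share_found(n, p=0.31):
    return 1 - (1 - p) ** n

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} participants -> ~{share_found(n):.0%} of problems found")

# Five participants already surface roughly 85% of what a single round can
# surface, which is why each additional session in the same round pays off
# less and less -- and why small, iterative rounds win.
```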


Extrapolating From Too Small a Sample

I always do this, don’t I? Say one thing, and then immediately warn you not to go too far in the opposite direction. The thing is, I get really tired of running five-person tests and having a product owner show up for only one session and then go off and address whatever problems s/he saw during that one hour. One or two participants aren’t enough to give you a real sense of the pattern of problems in your product.

Besides, I have this little rule of thumb I’ve developed for studies. No matter how great your screener or recruiter, on average for every 10 participants you schedule, one will be a no-show, one will be some sort of statistical outlier (intelligence, computer savvy…something), and one will be completely insane. If the product owner happens to show up only for one of the last two types, their perception of the product’s problems will be totally skewed.

I had one product where we interviewed ten people over the course of two tests. Nine of the ten people were wildly confused by the product, but one, who I swear was a ringer, nailed all the tasks in record time. Guess which session the product manager showed up for? Yeah.

The Fix: As the person making the decisions about what changes you should make in your product, you should be attending all or at least most of your user interview sessions, even if you’re not running them yourself. You should also be looking directly at all of your survey data, not just skimming it or reading a high level report. Honestly, if you’re the one making decisions about product direction, then you are the one who most benefits from listening to your users. If you’re not paying attention to the results, then the testing is really just a waste of time.

Look at all your data before drawing conclusions. I mean it.

Trying to Answer Specific Questions

Qualitative data is very bad at answering specific questions like “Which landing page will people like better?” or “How much will people pay for this?” What it’s great for is generating hypotheses that can then be tested with quantitative means.

In more than one test, I’ve had clients ask me to test several different images for a landing page to see which one was most appealing. I always explain that they’re better off just running a split test to see which one performs best, but sometimes they insist. Unfortunately, these sorts of preference differences are often very subtle. Since people aren’t making the decisions consciously, it’s very hard for them to explain why they prefer one thing over another. We always end up with a lot of people rationalizing their preferences after the fact, and I rarely trust the results.

The Fix: Use qualitative data to generate hypotheses that you then test quantitatively OR to find major problems in your interface. Don’t try to use qualitative data to get a definitive answer to questions about expected user preferences.
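For what it’s worth, the quantitative half of that loop doesn’t have to be fancy. Here’s a minimal sketch of how you might check whether one landing page really outperformed another in a split test; it uses a standard two-proportion z-test, and the visitor and signup numbers are made up purely for illustration.

```python
import math

def split_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rates
    between two landing-page variants (two-proportion z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal CDF via erf; doubling the tail area gives the two-sided p-value.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical numbers: variant A converted 120 of 2,400 visitors,
# variant B converted 156 of 2,380.
p = split_test_p_value(120, 2400, 156, 2380)
print(f"p-value: {p:.3f}")  # a small p-value means the difference is unlikely to be noise
```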

Ignoring Inconvenient Results

Because qualitative testing doesn’t generate hard numbers, it’s easy to let confirmation bias sneak into the analysis. While it might be tough to argue with “this registration flow generated 12% more paying customers than the other one,” it’s pretty easy to discount problems observed in user sessions.

I dealt with a particularly resistant product owner who had an excuse for every single participant’s struggles with the product. One was unusually stupid. Another just didn’t understand the task. Another actually understood it but was, for some reason, actively screwing with us. This went on and on while every single participant had the same problems over and over. Also, the discussion guide, which the product owner and everyone on the team had originally thought was perfectly fair, suddenly became wildly biased and the tasks were judged to be impossible. The problem couldn’t possibly have been with the product!

The Fix: If you are finding fault with all of the participants or the moderator or the questions or the survey, it’s time to get somebody neutral into the room to help determine what is biasing the results. Hint: it’s almost certainly you.

Remember, your customers, moderator, and test participants don’t have a stake in making your product seem worse than it is. You, however, may have an emotional stake in believing it’s better than it actually is. Make sure you’re not ignoring results just because they’re not what you want to hear.

Not Acting on the Data

Why would you even bother to run a test if you’re not going to pay attention to the results? I mean, tests aren’t free. Even running surveys has an associated cost, since you’re interrupting what your user is doing to ask them to help you out. And yet, so many clients do exactly this.

One client I worked with wanted to set up a system where they ran tests every week. They wanted a constant stream of current and potential users coming through so that they could stay in close contact with their customer base. I thought this was a fantastic idea, so I started bringing people in for them. Unfortunately, after a few months, people began to complain that they were hearing the same problems over and over again.

I explained that they were going to keep hearing the same problems until they fixed them. I gave them a list of the major issues that their current and new users were facing. Every once in a while, if I complained loudly enough, they would fix one of the easier problems, and, unsurprisingly, these changes always improved their metrics. And yet, it was always a struggle to get the results from the tests incorporated into the product. I eventually stopped running tests and told them I’d be happy to come back and start again as soon as they had addressed some of the major problems.

The Fix: This one should be simple. If you’re going to spend all that time and money generating data, you should act on the results.

I want your feedback!

Have you had problems interpreting or using your qualitative data, or do you have stories about people in your company who have? Please, share them in the comments section!

Want more? Follow me on Twitter!

Also, if your company is currently working on getting feedback from users, I’d love to hear more about what you are doing and what you’d like to be doing better. Please take this short survey!