
Wednesday, April 24, 2013

You Can't Make Good Decisions with Bad Data

I think a critical lesson of the Lean Startup movement is that you have to learn quickly.

The “quickly” part of that lesson can lead to a culture of “good enough.” Your features should be good enough to attract some early adopters. Your design should be good enough to be usable. Your code should be good enough to make your product functional.

While this might drive a lot of perfectionists nuts, I’m all for it. Good enough means that you can spend your time perfecting and polishing only the parts of your product that people care about, and that means a much better eventual experience for your users. It may also mean that you stay in business long enough to deliver that experience.

I think though that there’s one part of your product where the standard for “good enough” is a whole lot higher: Data. Data are different.

You Can’t Make Good Decisions With Bad Data

The most important reason to do good research is that it can keep you from destroying your startup. I’m not being hyperbolic here. Bad data can ruin your product.

Imagine for a moment an a/b testing system that randomly returned the wrong test winner 30% of the time. It would be tough to make decisions based on that information, wouldn’t it? How would you know if you were choosing the right experiment branch?

Qualitative research can be just as bad. I can’t tell you how many founders have spent time and money talking to potential customers and then wondered why nobody used their product. Nine times out of ten, they were talking to the wrong people, asking the wrong questions, or using terrible interview techniques.

I had one person tell me, “bad data are better than no data,” but I strongly disagree here. After all, if I know I don’t have any data, I can go do some research and learn something.

But if I have some bad data, I think I already know the answers. Confirmation bias will make it even harder for me to unlearn that bad information. I’m going to stop looking and start acting on that information, and that may influence all of my product decisions.

If I “know” that all of my users are left-handed, I can spend an awful lot of time building and throwing out features for left-handed people before realizing that what I got wrong was the original premise. And, of course, that problem is made even worse if I’m not getting good information about how the features are actually performing.

You Have To Keep Doing It

Unlike any given feature or piece of code, collecting data is guaranteed to be part of your process for the life of your startup.

One of the best arguments for building minimum viable products and features is that you might just throw them out once you’ve learned something from them (like that nobody wants what you built).

This isn’t true of collecting data. Obviously you may change the way you collect data or the types of data you collect, but you’re going to keep doing it, because there’s simply no other way to make informed decisions.

Because this is something that you know is absolutely vital to your company, it’s worth getting it right early.

Data Collection Is Not a Mystery

Most of your product development is going to be a mystery. That’s the nature of startups.

You’ve got a new product in a new market, possibly with new technology. You have to do a lot of digging in order to figure out what you should be building. There’s no guide book telling you exactly what features your revolutionary new product should have.

That’s not true of gathering data. There is a ton of useful, pertinent information about the right way to do both qualitative and quantitative research. There are workshops and courses you can take on how to not screw up user interviews. There are coaches you can hire to get you trained in gathering all sorts of data. There are tools you can drop in to help you do a/b testing and funnel tracking. There are blogs you can read written by people who have already made mistakes so that you don’t have to make the same ones. There is a book called Lean Analytics that pretty much lays it out for you.

You don’t have to take advantage of all of these things, but you also don’t have to start from scratch. Taking a little time to learn about the tools and methods already available to you gives you a huge head start.

Good Data Take Less Time Than Bad Data

Here’s the good news: good data actually take less time to collect than bad data. Sure, you may have to do a little bit of upfront research on the right tools and methods, but once you’ve got those down, you’re going to move a hell of a lot faster.

For example, customer development interviews go much more quickly when you’re asking the right questions of the right people. You don’t have to talk to nearly as many users when you know how to not lead them and to interpret their answers well. Observational and usability research becomes much simpler when you know what you’re looking for.

The same is true for quantitative data collection. Your a/b tests won’t seem nearly so random when you’re sure that the information in the system is correct. You won’t have to spend as much time figuring out what’s going on with your experiments if you trust your graphs.

Good Data Do Not Mean Complete Data

I do want to make one thing perfectly clear: the quest for good data should be more about avoiding bad data than it is about making sure you have every scrap of information available.

If you don’t have all the data, and you know you don’t have all the data, that’s fine. You can always go out and do more research and testing later. You just don’t want to put yourself into the situation where you have to unlearn things later.

You don’t have to have all the answers. You just have to make sure you don’t have any wrong answers. And you do that by setting the bar for “good enough” pretty damn high on your data collection skills.


Like the post? Please share it!

Want more information like this? 


My new book, UX for Lean Startups, will help you learn how to do better qualitative and quantitative research. It also includes tons of tips and tricks for better, faster design. 

Wednesday, February 20, 2013

Combining Qualitative & Quantitative Research


Designers are infallible. At least, that’s the only conclusion that I can draw, considering how many of them flat out refuse to do any sort of qualitative or quantitative testing on their product. I have spoken with designers, founders, and product owners at companies of all sizes, and it always amazes me how many of them are so convinced that their product vision is perfect that they will come up with the most inventive excuses for not doing any sort of customer research or testing. 

Before I share some of these excuses with you, let’s take a look at the types of research I would expect these folks to be doing on their products and ideas.

Quantitative Research

When I say quantitative research in this context, I’m talking about a/b testing, product analytics, and metrics - things that tell you what is happening when users interact with your product. These are methods of finding out, after you’ve shipped a new product, feature, or change, exactly what your users are doing with it. 

Are people using the new feature once and then abandoning it? Are they not finding the new feature at all? Are they spending more money than users who don’t see the change? Are they more likely to sign up for a subscription or buy a premium offering? These are the types of questions that quantitative research can answer. 

For a simple example, if you were to design a new version of a landing page, you might run an a/b test of the new design against the old design. Half of your users would see each version, and you’d measure to see which design got you more registered users or qualified leads or sales or any other metric you cared about.
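
If you want to see roughly what “measuring which design won” looks like in practice, here is a minimal sketch of a two-proportion z-test, a standard way to check whether the difference is more than noise. The visitor and signup numbers are invented for illustration, and this isn’t a specific tool from the post.

```python
from math import sqrt

# Hypothetical counts for each landing page variant.
control = {"visitors": 5000, "signups": 400}   # old design
variant = {"visitors": 5000, "signups": 460}   # new design

p_a = control["signups"] / control["visitors"]
p_b = variant["signups"] / variant["visitors"]

# Pooled conversion rate and the standard error of the difference.
pooled = (control["signups"] + variant["signups"]) / (
    control["visitors"] + variant["visitors"]
)
se = sqrt(pooled * (1 - pooled) * (1 / control["visitors"] + 1 / variant["visitors"]))

z = (p_b - p_a) / se
print(f"control: {p_a:.1%}, variant: {p_b:.1%}, z = {z:.2f}")
# |z| above roughly 1.96 corresponds to ~95% confidence that the difference
# isn't just noise; below that, keep the test running or call it a wash.
```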

Qualitative Research

By qualitative testing, I mean the act of watching people use your product and talking to them about it. I don’t mean asking users what you should build. I just mean observing and listening to your users in order to better understand their behavior. 

You might do qualitative testing before building a new feature or product so that you can learn more about your potential users’ behaviors. What is their current workflow? What is their level of technical expertise? What products are they already using? You might also do it once your product is in the hands of users in order to understand why they’re behaving the way they are. Do they find something confusing? Are they getting lost or stuck at a particular point? Does the product not solve a critical problem for them? 

For example, you might find a few of your regular users and watch them with your product in order to understand why they’re spending less money since you shipped a new feature. You might give them a task in order to see if they could complete it or if they got stuck. You might interview them about their usage of the new feature in order to understand how they feel about it. 


Excuses, Excuses

While it may seem perfectly reasonable to want to know what your users are really doing and why they are doing it, a huge number of designers seem really resistant to performing these simple types of research or even listening to the results. I don’t know why they refuse to pay any attention to their users, but I can share some of the terrible excuses they’ve given me. 


A/B Testing is Only Good for Small Changes

I hear this one a lot. There seems to be a misconception that a/b testing is only useful for things like button color and that by doing a/b testing you’re only ever going to get small changes. The argument goes something like, “Well, we can only test very small things and so we will test our way to a local maximum without ever being able to really make an important change to our user experience.”

This is simply untrue.

You can a/b test anything. You can show two groups of users entirely different experiences and measure how each group behaves. You can hide whole features from users. You can change the entire checkout flow for half the people buying things from you. You can test a brand new registration or onboarding system. And, of course, you can test different button colors, if that is something that you are inclined to do.

The important thing to remember here is that a/b testing is a tool. Itʼs agnostic about what youʼre testing. If youʼre just testing small changes, youʼll only get small changes in your product. If, on the other hand, you test big things - major navigation changes, new features, new purchasing flows, completely different products - then youʼll get big changes. And, more importantly, you’ll know how they affected your users. 
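
To make the “it’s just a tool” point concrete, here is a rough sketch of how an experiment might bucket users into arms, whether those arms differ by a button color or an entirely different checkout flow. The hashing scheme and names are my own illustration, not any particular product’s API.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, arms=("control", "treatment")) -> str:
    """Deterministically assign a user to one arm of an experiment.

    Hashing user_id together with the experiment name means a user always
    sees the same arm, and different experiments bucket independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

# The arm can gate anything from a button color to a whole new flow.
if assign_variant("user-42", "new-checkout-flow") == "treatment":
    page = "completely redesigned checkout"
else:
    page = "existing checkout"
```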


Quantitative Testing Leads to a Confused Mess of an Interface

This is one of those arguments that has a grain of truth in it. It goes something like, “If we always just take the thing that converts best, we will end up with a confusing mess of an interface.”

Anybody who has looked at Amazon’s product pages knows the sort of thing that a/b testing can lead to. They have a huge amount of information on each screen, and none of it seems particularly attractive. On the other hand, they rake in money.

Itʼs true that when youʼre doing lots of a/b testing on various features, you can wind up with a weird mishmash of things in your product that donʼt necessarily create a harmonious overall design. You can even wind up with features that, while they improve conversion on their own, end up hurting conversion when they’re combined. 

As an example, letʼs say youʼre testing a product detail page. You decide to run several a/b tests simultaneously for the following new features:
  • customer photos
  • comments
  • ratings
  • extended product details
  • shipping information
  • sale price
  • return info

Now, let’s imagine that each one of those items, in its own a/b test, increases conversion by some small, but statistically significant margin. That means you keep all of them. Now you’ve got a product detail page with a huge number of things on it. You might, rightly, worry that the page is becoming so overwhelming that you’ll start to lose conversions.

Again, this is not the fault of a/b testing – or in this case, a/b/c/d/e testing. This is the fault of a bad test. You see, itʼs not enough that you run an a/b test. You have to run a good a/b test. In this case, just because the addition of a particular feature to your product page improved conversions doesn’t mean that adding a dozen new features to your product page will increase your conversion. 

In this instance, you might be better off running several a/b tests serially. In other words, add a feature, test it, and then add another and test. This way you’ll be sure that every additional feature is actually improving your conversion. Alternatively, you could test a few different versions of the page with different combinations of features to see which converts best. 
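
As a sketch of that second option, you might define a handful of whole-page variants, each bundling a different combination of features, and compare them against the current page. The feature bundles and names below are hypothetical.

```python
import random

# Deliberately chosen bundles of features, plus the current page as control.
PAGE_VARIANTS = {
    "control": [],
    "social": ["customer_photos", "comments", "ratings"],
    "logistics": ["shipping_info", "return_info"],
    "everything": ["customer_photos", "comments", "ratings",
                   "extended_details", "shipping_info", "sale_price", "return_info"],
}

def pick_variant(user_id: str) -> str:
    # Seeding with the user id keeps the assignment stable per visitor.
    return random.Random(user_id).choice(list(PAGE_VARIANTS))

variant = pick_variant("user-42")
features_to_render = PAGE_VARIANTS[variant]
# Log (user_id, variant) alongside conversions so you compare whole pages,
# not isolated features that may fight each other when combined.
```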


A/B Testing Takes Away the Need For Design

For some reason, people think that a/b testing means that you just randomly test whatever crazy shit pops into your head. They envision a world where engineers algorithmically generate feature ideas, build them all, and then just measure which one does best.

This is just absolute nonsense.

A/B testing only specifies that you need to test new designs against each other or against some sort of a control. It says absolutely zero about how you come up with those design ideas.

The best way to come up with great products is to go out and observe users and find problems that you can solve and then use good design processes to solve them. When you start doing testing, youʼre not changing anything at all about that process. Youʼre just making sure that you get metrics on how those changes affect real user behavior.

Letʼs imagine that youʼre building an online site to buy pet food. You come up with a fabulous landing page idea that involves some sort of talking sock puppet. You decide to create this puppet character based on your intimate knowledge of your user base and your sincere belief that what they are missing in their lives is a talking sock puppet. Itʼs a reasonable assumption.

Instead of just launching your wholly re-imagined landing page, complete with talking sock puppet video, you create your landing page and show it to only half of your users, while the rest of your users are stuck with their sad, sock puppet-less version of the site. Then you look to see which group of users bought more pet food. At no point did the testing process have anything to do with the design process. 

Itʼs really that simple. Nothing about a/b testing determines what youʼre going to test. A/B testing has literally nothing to do with the initial design and research process. 

Whatever youʼre testing, you still need somebody who is good at creating the experiences youʼre planning on testing against one another. A/B testing two crappy experiences does, in fact, lead to a final crappy experience. After all, if youʼre looking at two options that both suck, a/b testing is only going to determine which one sucks less.

Design is still incredibly important. It just becomes possible to measure designʼs impact with a/b testing.


There’s No Time to Usability Test

When I ask people whether they’ve done usability testing on prototypes of major changes to their products, I frequently get told that there simply wasn’t time. It often sounds something like, “Oh, we had this really tight deadline, and we couldn’t fit in a round of usability testing on a prototype because that would have added at least a week, and then we wouldn’t have been able to ship on time.” 

The fact is you don't have time NOT to usability test. As your development cycle gets farther along, major changes get more and more expensive to implement. If you're in an agile development environment, you can make updates based on user feedback quickly after a release, but in a more traditional environment, it can be a long time before you can correct a big mistake, and that spells slippage, higher costs, and angry development teams. Even in agile environments, it’s still faster to fix things before you write a lot of code than after you have pissed off customers who are wondering why you ruined an important feature that they were using. 

I know you have a deadline. I know it's probably slipped already. It's still a bad excuse for not getting customer feedback during the development process. You're just costing yourself time later. I’ve never known good usability testing to do anything other than save time in the long run on big projects.


Qualitative Research Doesn’t Work Because Users Don’t Know What They Want

This is possibly the most common argument against qualitative research, and it’s particularly frustrating, because part of the statement is quite true. Users aren’t particularly good at coming up with brilliant new ideas for what to build next. Fortunately, that doesn’t matter. 

Let’s make this perfectly clear. Qualitative research is NOT about asking people what they want. At no point do we say, “What should we build next?” and then relinquish control over our interfaces to our users. People who do this are NOT doing qualitative research. 

Qualitative research isn’t about asking people what they want and giving it to them. Qualitative research is about understanding the needs and behaviors of your users. It’s about really knowing what problem you’re solving and for whom.

Once you understand what your users are like and what they want to do with your product, it’s your job to come up with ways to make that happen. That’s the design part. That’s the part that’s your job.


It’s My Vision - Users Will Screw it Up

This can also be called the "But Steve Jobs doesn't listen to users..." excuse. 

The fact is, understanding what your users like and don't like about your product doesn't mean giving up on your vision. You don't need to make every single change suggested by your users. You don't need to sacrifice a coherent design to the whims of a user test. You don’t even need to keep a design just because it converts better in an a/b test. 

What you do need to do is understand exactly what is happening with your product and why. And you can only do that by gathering data. The data can help you make better decisions, but they don’t force you to do anything at all.


Design Isn’t About Metrics

This is the argument that infuriates me the most. I have literally heard people say things like, “Design can’t be measured, because design isnʼt about the bottom line. Itʼs all about the customer experience.”

Nope.

Wouldnʼt it be a better experience if everything on Amazon were free? Be honest! It totally would. 

Unfortunately, it would be a somewhat traumatic experience for the Amazon stockholders. You see, we donʼt always optimize for the absolute best user experience. We make tradeoffs. We aim for a fabulous user experience that also delivers fabulous profits.

While itʼs true that we donʼt want to just turn our user experience design over to short term revenue metrics, we can vastly improve user experience by seeing which improvements and features are most beneficial for both users and the company.

Design is not art. If you think that thereʼs some ideal design that is completely divorced from the effect itʼs having on your companyʼs bottom line, then youʼre an artist, not a designer. Design has a purpose and a goal, and those things can be measured.


So, What’s the Right Answer?

If you’re all out of excuses, there is something that you can do to vastly improve your product. You can use quantitative and qualitative data together. 

Use quantitative metrics to understand exactly what your users are doing. What features do they use? How much do they spend? Does changing something big have a big impact on real user behavior?

Use qualitative research to understand why your users do what they do. What problems are they trying to solve? Why are they dropping out of a particular task flow when they do? Why do they leave and never come back?

Let’s look at an example of how you might do this effectively. First, imagine that you have a payment flow in your product. Now, imagine that 80% of your users are not getting through that payment flow once they’ve started. Of course, you wouldn’t know that at all if you weren’t looking at your metrics. You also wouldn’t know that the majority of people are dropping out in one particular place in the flow.

Next, imagine that you want to know why so many people are getting stuck at that one place. You could do a very simple observational test where you watch four or five real users going through the payment flow in order to see if they get stuck in the same place. When they do, you could discuss with them what stopped them there. Did they need more information? Was there a bug? Did they get confused?

Once you have a hypothesis about what’s not working for people, you can make a change to your payment flow that you think will fix the problem. Neither qualitative nor quantitative research tells you what this change is. They just alert you that there’s a problem and give you some ideas about why that problem is happening. 

After you’ve made your change, you can run an a/b test of the old version against the new version. This will let you know whether your change was effective or if the problem still exists. This creates a fantastic feedback loop of information so that you can confirm whether your design instincts are functioning correctly and you’re actually solving user problems. 
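
A minimal sketch of the quantitative half of that loop might look like this; the step names and counts are invented, and the point is just to show where the drop-off calculation lives.

```python
# Hypothetical counts of users reaching each step of the payment flow.
funnel = [
    ("viewed_cart", 10_000),
    ("entered_shipping", 6_500),
    ("entered_payment", 5_900),
    ("confirmed_order", 2_000),  # the suspicious drop lives here
]

for (step, count), (next_step, next_count) in zip(funnel, funnel[1:]):
    rate = next_count / count
    print(f"{step} -> {next_step}: {rate:.0%} continue, {1 - rate:.0%} drop off")

# The worst step-to-step drop tells you WHERE to point your qualitative
# research; watching a handful of users at that step tells you WHY.
# After you ship a fix, the same numbers (split by a/b arm) tell you
# whether the change actually worked.
```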

As you can hopefully see from the example, nobody is saying that you have to be a slave to your data. Nobody is saying that you have to turn your product vision or development process over to an algorithm or a focus group. Nobody is saying that you can only make small changes. All I’m saying is that using quantitative and qualitative research correctly gives you insight into what your users are doing and why they are doing it. And that will be good for your designs, your product, and your business.


Like the post? 

Friday, March 2, 2012

Fucking Ship It Already: Just Not to Everyone At Once

There is a pretty common fear that people have. They’re concerned that if they ship something that isn’t ready, they’ll get hammered and lose all their customers. Startups who have spent many painstaking months acquiring a small group of loyal customers are hesitant to lose those customers by shipping something bad.

I get it. It’s scary. Sorry, cupcake. Do it anyway.

First, your early adopters tend to be much more forgiving of a few misfires. They’re used to it. They’re early adopters. Yours is likely not the first product they’ve adopted early. If you’re feeling uncomfortable, go to the Wayback Machine and look at some first versions of products you use every day. When your eyes stop bleeding, come back and finish this post. I’ll wait.

Still nervous? That’s ok. The lucky thing is that you don’t have to ship your ridiculous first draft of a feature to absolutely everybody at once. Let’s look at a few strategies you can use to reduce the risk.

The Interactive Mockup

A prototype is the lowest-risk way you can get your big change, new feature, or possible pivot in front of real users without ruining your existing product. And you’d be surprised at how often it helps you find easy-to-fix problems before you ever write a line of “real code.”

If you don’t want to build an entire interactive prototype, try showing mockups, sketches, or wireframes of what you’re considering. The trick is that you have to show it to your real, current users.

Get on a screen share with some users and let them poke around the prototype. Whatever you do, never tell them why you made the changes or what the feature is supposed to be for or how awesome it is. You want the experience to be as close as possible to what it would be if you just released the feature into the wild and let the users discover it for themselves.

If your product involves any sort of user generated content, taking the time to include some of the tester’s own content can be extremely helpful. For example, if it’s a marketplace where you can buy and sell handmade stuff, having the user’s own items can make a mockup seem more familiar and orient the user more quickly.

Of course, if there’s sensitive financial data or anything private, make sure to get the user’s permission BEFORE you include that info in their interactive prototype. Otherwise, it’s just creepy.

The Opt In

Another method that works well is the Opt In. While early adopters tend to be somewhat forgiving of changes or new features, people who opt in to those changes are even more so.

Allowing people to opt in to new features means that you have a whole group of people who are not only accepting of change but actively seeking it out. That’s great for getting very early feedback while avoiding the occasional freakout from the small percentage of people who just enjoy screaming, “Things were better before!”

Here’s a fun thing you can learn from your opt in group: If people who explicitly ask to see your new feature hate your new feature, your new feature probably sucks.

The Opt Out

Of course, you don’t only want to test your new features or changes with people who are excited about change. You also want to test them with people who hate change, since they’re the ones who are going to scream loudest.

Once you’re pretty sure that your feature doesn’t suck, you can share it with more people. Just make sure to let them go back to the old way, and then measure the number of people who actually do switch back.

Is it a very vocal 1% that is voting with their opt out? You’re probably ok. Is half of your user base switching back in disgust? You may not have nailed that whole “making it not suck” thing.

The n% Rollout

Even with an opt out, if you’ve got a big enough user base, you can still limit the percentage of users who see the change. In fact, you really should be split testing this thing 50/50, but if you want to start with just 10% to make sure that you don’t have any major surprises, that’s a totally reasonable thing to do.

When you roll a new feature out to a small percentage of your users, just make sure that you know what sorts of things you’re looking for. This is a great strategy for seeing if your servers are going to keel over, for example.

It’s also nice for seeing if that small, randomly selected cohort behaves any differently from the group that doesn’t have the new feature. Is that cohort more likely to make a purchase? Are they more likely to set fire to their computers and swear never to use your terrible product ever again? These are both good things to know.

Do remember, however, that people on the internet talk about things. Kind of a lot. If you have any way at all for your users to be in contact with one another, people will find out that their friends are seeing something different. This can work for or against you. Just figure out who’s whining the loudest about being jealous of the other group, and you’ll know whether to continue the rollout. What you want to hear is, “Why don’t I have New New New New New Thing, yet?” and not “Just be thankful that they haven’t forced the hideous abomination on you. Then you will have to set your computer on fire.”
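
If it helps to picture how an n% rollout gate usually works under the hood, here is a sketch. The hashing approach and names are my own, not a particular feature-flag product.

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hashing keeps the decision stable per user, so the same people stay
    in (or out of) the cohort as you ramp from 10% toward a 50/50 split.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Start cautious, then raise the percentage once servers and metrics look healthy.
show_new_feature = in_rollout("user-42", "new-dashboard", percent=10)
```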

The New User Rollout

Of course, if you’re absolutely terrified of your current user base (and you’d be surprised at how many startups seem to be), you can always release the change only to new users.

This is nice, because you get two completely fresh cohorts where the only difference is whether or not they’ve seen the change. It’s a great way to do A/B testing.

On the other hand, if it’s something that’s supposed to improve things for retained users or users with lots of data, it can take a really long time to get enough information from this. After all, you need those new cohorts to turn into retained users before seeing any actual results, and that can take months.

Also, whether or not new users love your changes doesn’t always predict whether your old users will complain. Your power users may have invested a lot of time and energy into setting up your product just the way they want it, and making major changes that are better for new folks doesn’t always make them very happy.

In the end, you need to make the decision whether you’ll have enough happy new users to offset the possibly angry old ones. But you’ll probably need to make that decision about a million more times in the life of your startup, so get used to it.

So, are you ready to fucking ship it, already? Yes. Yes, you are. Just don't ship it to everybody all at once.

Now, follow me on Twitter.

Wednesday, September 21, 2011

How Metrics Can Make You a Better Designer

I have another new article in Smashing Magazine's UX section: How Metrics Can Make You a Better Designer.

Here's a little sample:

Metrics can be a touchy subject in design. When I say things like, “Designers should embrace A/B testing” or “Metrics can improve design,” I often hear concerns.

Many designers tell me they feel that metrics displace creativity or create a paint-by-numbers scenario. They don’t want their training and intuition to be overruled by what a chart says a link color should be.

These are valid concerns, if your company thinks it can replace design with metrics. But if you use them correctly, metrics can vastly improve design and make you an even better designer.


Read the rest here >

Friday, September 2, 2011

Why Your Test Results Don't Add Up and What To Do About It

Check out my guest blog post for KISSmetrics: Why Website Test Results Don’t Always Add Up & What To Do About It!

Here's a little sample:

If you do enough A/B testing, I promise that you will eventually have some variation of this problem:

You run a test. You see a 10% increase in conversion. You run a different, unrelated test. You see a 20% increase in conversion. You roll both winning branches out to 100% of your customers. You donʼt see a 30% increase in conversion.

Why? In every world Iʼve ever inhabited, 10 plus 20 equals 30, right? Youʼve proven that both changes youʼve made are improvements. Why arenʼt you seeing the expected overall increase in conversions when you roll them both out?


Read the Rest at KISSmetrics.


Thursday, August 18, 2011

Breaking the Rules: A UX Case Study

Recently, I was lucky enough to be featured in Smashing Magazine's brand new UX section! Smashing is already a fabulous resource for web design and coding, and I think it's going to be a great place to learn about user experience.

You should read my first article, Breaking the Rules: A UX Case Study.

Here's a little something to get you started:

I read a lot of design articles about best practices for improving the flow of sign-up forms. Most of these articles offer great advice, such as minimizing the number of steps, asking for as little information up front as possible, and providing clear feedback on the status of the user’s data.

If you’re creating a sign-up form, you could do worse than to follow all of these guidelines. On the other hand, you could do a lot better.

Design guidelines aren’t one size fits all. Sometimes you can improve a process by breaking a few rules. The trick is knowing which rules to break for a particular project.


Read the rest of the article!

Monday, August 1, 2011

Hypothesis Generation vs. Validation

A lot of people ask me what sort of research they should be doing on their products. There are a lot of factors that go into deciding which sort of information you should be getting from users, but it pretty much boils down to a question of “what do you want to learn.”

Today, I’m going to explore one of the many ways you can go about looking at this: Hypothesis Generation vs. Hypothesis Validation. Don’t worry, it’s not as complicated as I’ve made it sound.

What is Hypothesis Generation

In a nutshell, hypothesis generation is what helps you come up with new ideas for what you need to change. Sure, you can do this by sitting around in a room and brainstorming new features, but reaching out and learning from your users is a much faster way of getting the right data.

Imagine you were building a product to help people buy shoes online. Hypothesis generation might include things like:

  • Talking to people who buy shoes online to explore what their problems are
  • Talking to people who don’t buy shoes online to understand why
  • Watching people attempt to buy shoes both online and offline in order to understand what their problems really are rather than what they tell you they are
  • Watching people use your product to figure out if you’ve done anything particularly confusing that is keeping them from buying shoes from you

As you can see, you can do hypothesis generation at any point in the development of your product. For example, before you have any product at all, you need to do research to learn about your potential users’ habits and problems. Once you have a product, you need to do hypothesis generation to understand how people are using your product and what problems you’ve caused.

To be clear, the research itself does not generate hypotheses. YOU do that. The goal is not to just go out and have people tell you exactly what they want and then build it. The goal is to gain an understanding of your users or your product to help you think up clever ideas for what to build next.

Good hypothesis generation almost always involves qualitative research. At some point, you need to observe people or talk to people in order to understand them better.

However, you can sometimes use data mining or other metrics analysis to begin to generate a hypothesis. For example, you might look at your registration flow and notice a severe drop-off halfway through. This might give you a clue that you have some sort of user problem halfway through your registration process that you might want to look into with some qualitative research.

What is Hypothesis Validation

Hypothesis validation is different. In this case, you already have an idea of what is wrong, and you have an idea of how you might possibly fix it. You now have to go out and do some research to figure out if your assumptions and decisions were correct.

For our fictional shoe-buying product, hypothesis validation might look something like:

  • Standard usability testing on a proposed new purchase flow to see if it goes more smoothly than the old one
  • Showing mockups to people in a particular persona group to see if a proposed new feature appeals to that specific group of people
  • A/B testing of changes to see if a new feature improves purchase conversion

Hypothesis validation also almost always involves some sort of tangible thing that is getting tested. That thing could be anything from a wireframe to a prototype to an actual feature, but there’s something that you’re testing and getting concrete data about.

You can use both quantitative and qualitative data to validate a hypothesis, but you have to choose carefully to make sure you’re testing the right thing. In fact, sometimes a combination of the two is most effective. I’ve got some information on choosing the right type of test in my post Qual vs. Quant: When to Listen and When to Measure.

Types of Research

Why is this distinction between generation and validation important? Because figuring out whether you’re generating hypotheses or validating them is necessary for deciding which type of research you want to do.

Want to understand why nobody is registering for your site? Generate some hypotheses with observational testing of new users. Want to see if the mockups for your new registration flow are likely to improve matters? Validate your hypothesis with straight usability testing of a prototype.

These aren’t the only factors that go into determining the type of research necessary for your stage of product development, but they’re an important part of deciding how to learn from your users.

Like the post? Follow me on Twitter!

Wednesday, May 25, 2011

Designers Need to A/B Test Their Designs

The other day, I posted something I strongly believe on Twitter. A few people disagreed. I’d like to address the arguments, and I’d love to hear feedback and counter-arguments in the comments where you have more than 140 characters to tell me I’m wrong.

My original tweet was, “I don't trust designers who don't want their designs a/b tested. They're not interested in knowing if they were wrong.”

Here are some of the real responses that I got on Twitter with my longer form response.

“There’s a difference between A/B testing (public) and internally deciding. Design is also a matter of taste.”

I agree. There is a big difference between A/B testing in public and internally deciding. That’s why I’m such a huge fan of A/B testing. You can debate this stuff for weeks, and often it’s a huge waste of time.

When you’re debating design internally, what you should be asking is “which of these designs will be better for the business and users?” A/B testing tells you conclusively which side is right. Debate over!

Ok, there’s the small exception of short term vs. long term effects, which is addressed later, but in general, it’s more definitive than the opinion of the people in the room.

With regard to the “matter of taste,” that’s both true and false. Sure, different people like different designs. What you’re saying by refusing to A/B test your designs is that your taste as a designer should always trump that of the majority of your users. As long as you like your design, you don’t care whether users agree with you.

If you want your design aesthetic to override that of your users, you should be an artist. I love art. I even, very occasionally, buy some of it.

But I pay for products all the time, and I tend to buy products that I think are well designed, not necessarily ones where the designer thought they were well designed.

“If Apple had done A/B tests for the iPod in 2001 with a user-replaceable battery, that version would’ve likely won—initially.”

Honestly, it still might win. Is taking your iPod to the Apple store when the battery dies really a feature? No! It’s a design tradeoff. They couldn’t create something with the other design elements they wanted that still had a replaceable battery. That’s fine. 


But all other things about the iPod being totally equal, wouldn’t you buy the one where you could replace the battery yourself? I would. The key there is the phrase “totally equal.”

“Seeing far into the future of technology is not something consumers are particularly great at.”

I feel like the guy who made this argument was confusing A/B testing with bad qualitative testing or just asking users what they would like to see in a product.

This isn’t what A/B testing does. A/B testing measures actual user behavior right now. If I make this change, will they give me more money? It has literally nothing to do with asking users to figure out the future of technology.

“A/B testing has value but shouldn't be litmus test for designer or a design”

Really? What should be the litmus test for a designer or a design if not, “does this change or set of changes actually improve the key metrics of my company”?

In the end, isn’t that the litmus test for everybody in a company? Are you contributing to the profitability of the business in some way?

If you have some better way of figuring out if your design changes are actually improving real metrics, I’d love to hear about it. We can make THAT the litmus test for design.

“Data is valuable but must be interpreted. Doesn't "prove" wrongness or rightness. Designer still has judgment.”

I agree with the first sentence. Data certainly must be interpreted. I even agree that certain design changes may hurt certain metrics, and that can be ok if they’re improving other metrics or are shown to improve things in the long run.

But the only way to know if your overall design is actually making things better for your users is by scientifically testing it against a control.

If your overall design changes aren’t improving key metrics, where’s the judgment there? If you release something that is meant to increase the number of signups and it decreases the number of signups, I think that pretty effectively “proves wrongness.”

The great thing about A/B testing is that you know when this happens.

“Is it the designers fault, surely more appropriate to an IA? After all the IA should dictate the feel/flow.”

First off, I don’t work for companies that are big enough to draw a distinction between the two, but I’m sure there’s enough blame to go around.

Secondly, I think that everybody in an organization has the responsibility to improve key metrics. If you think that your work shouldn’t increase revenue, retention, or other numbers you want higher, why should you be employed?

Design of all kinds is important and can have a huge impact on company profitability. That impact can and should be measured. You don’t get a pass just because you’re not changing flow.

“A/B tests are a snapshot of current variables. They don’t embody nor convey a bigger strategy or long-term vision.”

Also, “That’s only an absolute truth you can rely on if you A/B test for the entire lifespan of the product, which defeats the point.”

These are excellent points, and they are a drawback of A/B testing. It’s sometimes tough to tell what the long term effects of a particular design change are going to be from A/B testing. Also, A/B testing doesn’t easily account for design changes that are a part of a larger design strategy.

In other words, sometimes you’re going to make changes that cause problems with your metrics in the short term, because you strongly believe that it’s going to improve things long term.

However, I believe that you address this by recognizing the potential for problems and designing a better test, not by refusing to A/B test at all.

Just because this particular tool isn’t perfect doesn’t mean we get to fall back on “trust the designers implicitly and never make them check their work.” That doesn’t work out so well sometimes either.

An Argument I Didn’t Hear

There’s one really good argument that I didn’t get, although some of the above tweets touched on it. Sometimes changes that individually test well don’t test well as a whole.

This is a really serious problem with A/B testing because you can wind up with Frankenstein-style interfaces. Each individual decision wins, but the combination is a giant mess.

Again, you don’t address this by not A/B testing. You address it by designing better tests and making sure that all of your combined decisions are still improving things.

How I Really Feel

Look, if I’m hiring for a company that wants to make money (and most of them do), I want my designers to understand how their changes actually affect my bottom line.

No matter how great a designer thinks his or her design is, if it hurts my revenue and retention or other key metrics, it’s a bad design for my company and my users.

Saying you’re against having your designs A/B tested sounds like you’re saying that you just don’t care whether what you’re changing works for users and the company. As a designer, you’re welcome to do that, but I’m not going to work with you.

Like the post? Follow me on Twitter!

Monday, February 28, 2011

Qual vs. Quant: When to Listen and When to Measure

I have written about qualitative vs quantitative research before, but I still get a lot of questions about it. To answer some of those questions, I want to do a bit of a deeper dive here and give a few examples to help startups answer the key question.

To be clear, that key question is “when should I use qualitative research, and when should I use quantitative research for the best results?” Another way of looking at this is, “when should I be listening to users, and when should I just be shipping code and looking at the metrics?”

The real answer is that you should do both constantly, but there are times when one is significantly more helpful than the other.

I will continue to repeat my cardinal rule: Quantitative research tells you WHAT your problem is. Qualitative research tells you WHY you have that problem.

Now, let’s look at what that actually means to you when you’re making product decisions.

A One Variable Change

When you’re trying to decide between qualitative and quantitative testing for any given change or feature, you need to figure out how many variables you’re changing.

Here’s a simple example: You have a product page with a buy button on it. You want to see if the buy button performs better if it’s higher on the page without really changing anything else. Which do you do? Qualitative or quantitative?

That’s right, I said this one was simple. There’s absolutely no reason to qualitatively test this before shipping it. Just get this in front of users and measure their actual rate of clicking on the button.

The fact is, with a change this small, users in a testing session or discussion aren’t going to be able to give you any decent information. Hell, they probably won’t even notice the difference. Qualitative feedback here is not going to be worth the time and money it takes to set up interviews, talk to users, and analyze the data.

More importantly, since you are only changing one variable, if user behavior changes, you already have a really good idea WHY it changed. It changed because the CTA button was in a better place. There’s nothing mysterious going on here.
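
One practical question is how long to run a test like this. A rough back-of-the-envelope sample-size estimate (standard two-proportion formula; the baseline rate and target lift below are made up) looks like this:

```python
from math import ceil

def sample_size_per_arm(base_rate: float, lift: float,
                        z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate visitors needed per arm for a two-proportion test.

    base_rate: current click rate on the buy button.
    lift: the absolute improvement you care about detecting.
    The default z values correspond to ~95% confidence and ~80% power.
    """
    p1, p2 = base_rate, base_rate + lift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)

# Hypothetical: 5% click the button today, and you want to detect a 1-point gain.
print(sample_size_per_arm(0.05, 0.01))  # roughly 8,000+ visitors per arm
```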

There’s an exception! In a few cases, you are going to ship a change that seems incredibly simple, and you are going to see an enormous and surprising change in your metrics (either positive or negative). If this happens, it’s worth running some observational tests with something like UserTesting.com where you just watch people using the feature both before and after the change to see if anything weird is happening. For example, you may have introduced a bug, or you may have made it so that the button is no longer visible to certain users.

Tuesday, January 18, 2011

Lean UX - A Case Study

For those very, very few (ok, none) of you who read my blog but don't read Eric Ries's blog, Startup Lessons Learned, I have some exciting news for you. But first, why the hell aren't you reading Eric's blog? You really should. It's great.

I've written a guest post that now appears on the Startup Lessons Learned blog. It's a case study of a UX project I did with the lean startup Food on the Table.

If you're wondering whether design works well with lean startups, I answer that question in the post. Spoiler alert: The answer is 'yes'.

Thursday, January 6, 2011

Testing Whether Your Users Will Buy

As you all know by now, I’m a huge proponent of qualitative user testing. I think it’s wonderful for learning about your users and product.

But it’s not a panacea. The fact is, there are many questions that qualitative testing either doesn’t answer well or for which qualitative testing isn’t the most efficient solution. I cover some of them in my A Faster Horse post.

The trick is knowing which questions you can answer by listening to your users and which questions need a different methodology.

Unfortunately, one of the most important questions people want answered isn’t particularly well suited to qualitative testing.

If I Build It, Will They Buy?

I get asked a lot whether users will buy a product if the team adds a specific feature. Sadly, I always have to answer, “I have no idea.”

The problem is, people are terrible at predicting their future behavior. Imagine if somebody were to ask you if you were going to buy a car this year. Now, for some of you, that answer is almost certainly yes, and for others it’s almost certainly no. But for most of us, the answer is, “it depends on the circumstances.”

For some, the addition of a new feature - say, an electric motor - might be the deciding factor, but for many the decision to buy a car depends on a lot of factors, most of which aren’t controlled by the car manufacturer: the economy, whether a current car breaks down, whether we win the lottery or land that job at Goldman Sachs, etc. There are other factors that are under the control of the car company but aren't related to the feature: maybe the new electric car is not the right size or isn't in our price range or isn't our style.

This is true for smaller purchases too. Can you absolutely answer whether or not you will eat a cookie this week? Unless you never eat cookies (I'm told these people exist), it’s probably not something you give a lot of thought to. If somebody were to ask you in a user study, your answer would be no better than a guess and would possibly even be biased by the simple act of having the question asked.

Admit it, a cookie sounds kind of good right now, doesn’t it?

There are other reasons why qualitative testing isn't great at predicting future behavior, but I'm not going to bore you with them. The fact is, it's just not the most efficient or effective method for answering the question, "If I build it, will they come?"

What Questions Can Qualitative Research Answer Well?

Qualitative research is phenomenal for telling you whether your users can do x. It tells you whether the feature makes sense to them and whether they can complete a given task successfully.

Tuesday, October 26, 2010

The Dangers of Metrics (Only) Driven Product Development

When I first started designing, it was a lot harder to know what I got right. Sure, we ran usability tests, and we looked generally at things like page counts and revenue before and after big redesigns, but it was still tough to know exactly what design changes were making the biggest difference. Everything changed once I started working with companies that made small, iterative design changes and a/b tested the results against specific metrics.

To be clear, not all the designers I know like working in this manner. After all, it's no fun being told that your big change was a failure because it didn't result in a statistically significant increase in revenue or retention. In fact, if you're a designer or a product owner and are required to improve certain metrics, it can sometimes be tempting to cheat a little.

This leads to a problem that I don't think we talk about enough: Metrics (Only) Driven Product Development.

What Is Metrics (Only) Driven Product Development?

Imagine that you work at a store, and your manager has noticed that when the store is busy, the store makes more money. The manager then tells you that your job is to make the store busier - that's your metric that you need to improve.

You have several options for improving your metric. You could:
  • Improve the quality of the shopping experience so that people who are already in the store want to stay longer
  • Offer more merchandise so that people find more things they want to buy
  • Advertise widely to try to attract more people into the store
  • Sell everything at half off
  • Remove several cash registers in order to make checking out take longer, which should increase the number of people in the store at a time, since it will take people longer to get out
  • Hire people to come hang out in the store

As you can see, all of the above would very likely improve the metric you were supposed to improve. They would all ensure that, for a while at least, the store was quite busy. However, some are significantly better for the overall health of the store than others.

Friday, September 24, 2010

Please Stop Annoying Your Users

Once upon a time, I worked with a company that was addicted to interstitials. Interstitials, for those of you who don’t know the term, are web pages or advertisements that show up before an expected content page. For example, the user clicks a link or button and expects to be taken to a news article or to take some action, and instead she is shown a web page selling her something.

Like many damaging addictions, this one started out innocently enough. You see, the company had a freemium product, so they were constantly looking for ways to share the benefits of upgrading to the premium version in a way that flowed naturally within the product.

They had good luck with one interstitial that informed users of a useful new feature that required the user to upgrade. They had more good luck with another that asked the user to consider inviting some friends before continuing on with the product.

Then things got ugly.

Customers could no longer use the product for more than a few minutes without getting asked for money or to invite a friend or to view a video to earn points. Brand new users who didn’t even understand the value proposition of the free version were getting hassled to sign up for a monthly subscription.

Every time I tried to explain that this was driving users away, management explained, “But people buy things from these interstitials! They make us money! Besides, if people don’t want to see them, they can dismiss them.”

How This Affects Metrics

Of course, you know how this goes. Just looking at the metrics from each individual interstitial, it was pretty clear that people did buy things or invite friends or watch videos. Each interstitial did, in fact, make us some money. The problem was that overall the interstitials lost us customers and potential customers by driving away people who became annoyed.

The fact that the users could simply skip the interstitials didn’t seem to matter much. Sure people could click the cleverly hidden “skip” button – provided they could find it – but they had already been annoyed. Maybe just a little. Maybe only momentarily. But it was there. The product had annoyed them, and now they had a slightly more negative view of the company.

Here’s the important thing that the company had to learn: a mildly annoyed user does not necessarily leave immediately. She doesn’t typically call customer service to complain. She doesn’t write a nasty email. She just gets a little bit unhappy with the service. And the next time you do something to annoy her, she gets a little more unhappy with the service. And if you annoy her enough, THEN she leaves.

The real problem is that this problem is often tricky to identify with metrics. It’s a combination of a lot of little things, not one big thing, that makes the user move on, so it doesn’t show up as a giant drop off in a particular place. It’s just a slow, gradual attrition of formerly happy customers as they get more and more pissed off and decide to go elsewhere.

If you fix each annoyance and A/B test it individually, you might not see a very impressive lift, because, of course, you still have dozens of other things that are annoying the user. But over time, when you’ve identified and fixed most of the annoyances, what you will see is higher retention and better word of mouth as your product stops vaguely irritating your users.

Some Key Offenders

I can’t tell you exactly what you’re doing that is slightly annoying your customers, but here are a few things that I’ve seen irritate people pretty consistently over the years:
  • Slowness
  • Too many interstitials
  • Not remembering information - for example, not maintaining items in a shopping cart or deleting the information that a user typed into a form if there is an error
  • Confusing or constantly changing navigation
  • Inconsistent look and feel, which can make it harder for users to quickly identify similar items on different screens
  • Hard to find or inappropriately placed call to action buttons
  • Bad or unresponsive customer service

It’s frankly not easy to fix all of these things, and it can be a leap of faith for companies who want every single change to show a measurable improvement in key metrics. But by making your product less annoying overall, you will end up with happier customers who stick around.

Like the post? Follow me on Twitter!

Also, come hear me speak on Wednesday, Sept. 29th, at Web 2.0 Expo New York. I’ll be talking about how to effectively combine qualitative research, quantitative analytics, and design vision in order to improve your products.

Friday, April 2, 2010

5 Mistakes People Make Analyzing Qualitative Data

My last blog post was about common mistakes that people make when analyzing quantitative data, such as you might get from multivariate testing or business metrics. Today I’d like to talk about the mistakes people make when analyzing and using qualitative data.

I’m a big proponent of using both qualitative and quantitative data, but I have to admit that qualitative feedback can be a challenge. Unlike a product funnel or a revenue graph, qualitative data can be messy and open ended, which makes it particularly tough to interpret.

For the purposes of this post, qualitative information is generated by the following types of activities:
  • Usability tests
  • Contextual Inquiries
  • Customer interviews
  • Open-ended survey questions (e.g., What do you like most/least about the product?)

Insisting on Too Large a Sample

With almost every new client, somebody questions how many people we need for a usability test “to get significant results.” Now, if you read my last post, you may be surprised to hear me say that you shouldn’t be going for statistical significance here. I prefer to run usability tests and contextual inquiries with around five participants. Of course, I prefer running tests iteratively, but that’s another blog post.

Analyzing the data from a qualitative test or even just reading through essay-type answers in surveys takes a lot longer per customer than running experiments in a funnel or looking at analytics and revenue graphs. You get severely diminishing returns from each extra hour you spend asking people the same questions and listening to their answers.

Here’s an example from a test I ran. The client wanted to know all the different pain points in their product so that they could make one big sweep toward the end of the development cycle and fix everything at once. Against my better judgment, we spent a full two weeks running sessions, complete with a moderator, observers, a lab, and all the other attendant costs of a big test. The trouble was that we found a major problem in the very first session – one that prevented the vast majority of participants from ever finding an entire section of the interface. Since it couldn’t be fixed before moving on to the rest of the sessions, we couldn’t actually test a huge portion of the product and had to come back to it later anyway.

The Fix: Run small, iterative tests to generate a manageable amount of data. If you’re working on improving a particular part of your product or considering adding a new feature, do a quick batch of interviews with four or five people. Then, immediately address the biggest problems that you find. Once you’re done, run another test to find the problems that were being masked by the larger problems. Keep doing this until your product is perfect (i.e., forever). It’s faster, cheaper, and more immediately actionable than giant, statistically significant qualitative tests, and you will eventually find more issues with the same amount of testing time.

It’s also MUCH easier to pick out a few major problems from five hours of testing than it is to tease dozens of different problems out of dozens of hours of testing.
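
If you’re wondering why five-ish participants per round tends to be enough, here’s a sketch using the commonly cited problem-discovery model (often attributed to Nielsen and Landauer). This isn’t from my own data, and the 31% per-participant discovery rate is an assumed, textbook figure:

```python
# Problem-discovery model: share of usability problems found by n
# participants = 1 - (1 - L)**n, where L is the (assumed) probability
# that any single participant runs into a given problem. L = 0.31 is
# the classic textbook figure; your real rate will vary.
L = 0.31
for n in (1, 3, 5, 10, 15):
    found = 1 - (1 - L) ** n
    print(f"{n:>2} participants: ~{found:.0%} of findable problems surfaced")
# 5 participants already surface roughly 84%; the next ten add little.
```

The exact percentages will vary by product and task, but the shape of the curve is the point: the first handful of participants do most of the work, which is exactly why iterating beats one giant test.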


Tuesday, March 30, 2010

5 Big Mistakes People Make When Analyzing User Data

I was trying to write a blog post the other day about gathering various types of user feedback, when I realized that something important was missing. It doesn’t do any good for me to go on and on about all the ways you can gather critical data if people don’t know how to analyze that data once they have it.

I would have thought that a lot of this stuff was obvious, but, judging from my experience working with many different companies, it’s not. All of the examples here are real mistakes I’ve seen made by smart, reasonable, employed people. A few identifying characteristics have been changed to protect the innocent, but in general they were product owners, managers, or director-level folks.

This post only covers mistakes made in analyzing quantitative data. At some point in the future, I’ll put together a similar list of mistakes people make when analyzing their qualitative data.

For the purposes of this post, the quantitative data to which I’m referring is typically generated by the following types of activities:
  • Multivariate or A/B testing
  • Site analytics
  • Business metrics reports (sales, revenue, registration, etc.)
  • Large scale surveys

Ignoring Statistical Significance

I see this one all the time. It generally involves somebody saying something like, “We tested two different landing pages against each other. Out of six hundred views, one of them had three conversions and one had six. That means the second one is TWICE AS GOOD! We should switch to it immediately!”

Ok, I may be exaggerating a bit on the actual numbers, but too many people I’ve worked with just ignored the statistical significance of their data. They didn’t realize that even differences that look dramatic can be statistically insignificant if the sample size is too small.

The problem here is that statistically insignificant metrics can completely reverse themselves, so it’s important not to make changes based on results until you are reasonably certain that those results are predictable and repeatable.

The Fix: I was going to go into a long description of statistical significance and how to calculate it, but then I realized that, if you don’t know what it is, you shouldn’t be trying to make decisions based on quantitative data. There are online calculators that will help you figure out if any particular test result is statistically significant, but make sure that whoever is looking at your data understands basic statistical concepts before accepting their interpretation of data.

Also, a word of warning: testing several branches of changes can require a MUCH larger sample size than a simple A/B test. If you’re running an A/B/C/D/E test, make sure you understand the mathematical implications – more variants mean less traffic per variant and more chances of a false positive.
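
If you’re curious what those online calculators are actually doing with the landing-page example above, here’s a minimal sketch of a standard pooled two-proportion z-test. I’m assuming the six hundred views split evenly, 300 per page, since the example doesn’t say:

```python
from math import sqrt, erf

# Pooled two-proportion z-test -- essentially what online significance
# calculators compute. Assumes the 600 views split evenly, 300 per page.
def two_sided_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - normal_cdf)

print(two_sided_p_value(3, 300, 6, 300))  # ~0.31, nowhere near significant
```

A p-value around 0.31 means a difference this size would show up by chance roughly a third of the time, so “twice as good” is, at this sample size, indistinguishable from noise.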

Short Term vs. Long Term Effects

Again, this seems so obvious that I feel weird stating it, but I’ve seen people get so excited over short-term changes that they totally ignore what those changes do to their metrics a week or a month or a year later. The best, but not only, example of this is when people try to judge the effect of certain types of sales promotions on revenue.

For example, I've often heard something along these lines, “When we ran the 50% off sale, our revenue SKYROCKETED!” Sure it did. What happened to your revenue after the sale ended? My guess is that it plummeted, since people had already stocked up on your product at 50% off.

The Fix: Does this mean you should never run a short term promotion of any sort? Of course not. What it does mean is that, when you are looking at the results of any sort of experiment or change, you should look at how it affects your metrics over time.
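
As a toy illustration of why the measurement window matters (the numbers below are made up purely to show the shape of the effect, not real revenue data):

```python
# Hypothetical weekly revenue (in $k) around a 50%-off sale -- the numbers
# are invented to show the shape of the effect, not real data.
normal_month = [100, 100, 100, 100]   # four ordinary weeks
sale_month   = [100, 180,  60,  70]   # sale in week 2, slump afterwards

print(sum(normal_month), sum(sale_month))  # 400 vs 410 -- barely any net gain
```

Judged on the sale week alone, revenue is up 80%. Judged over the month, the promotion is roughly a wash, and that’s before accounting for the margin you gave away at 50% off.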

Monday, March 15, 2010

Why Your Customer Feedback is Useless

Here’s the scenario: You have a minimum viable product. You’re talking to your users about it. You’re asking them questions, and they’re answering. But for some reason, it’s just not turning into usable information.

You wonder what’s going on. You imagine that perhaps your users suck at giving good feedback or else they don’t have anything useful to say. Maybe, you think in a moment of hopeful delusion, your product is so perfect that it can’t be improved by customer feedback.

While these are all possibilities, the reality is that it’s probably not your customers’ fault. So, if you don’t seem to be getting any good data, what IS the problem? Probably one of the following things:

You’re asking the wrong questions

I’ve written before about asking customers the wrong questions, but in summary, customers are very good at giving you certain kinds of information and very bad at other kinds. For example, users are great at telling you about their problems. They can very easily tell you when something isn’t working or interesting or fun to use. What they suck at is telling you how to fix it.

Customers are great at:
  • Complaining about problems
  • Describing how they currently perform tasks
  • Saying whether or not they like a product
  • Showing you parts of a product that are particularly confusing
  • Comparing one product to another similar product
  • Explaining why they chose a particular method of doing something
Customers are bad at:
  • Predicting their future behavior
  • Predicting what other people will like
  • Predicting whether they’ll pay for something
  • Coming up with innovative solutions to their own or other people’s problems
  • Coming up with brand new ideas for what would make a product more appealing
To take advantage of users’ strengths, prompt them with things like, “Tell me about your most recent experience using the product and how that went for you.” Or ask them questions like, “What about [competitor product] do you particularly enjoy? What do you hate?” You can even ask questions like, “Of the following 5 features, which would you prefer?” You’ll want to be a lot more careful about listening to their answers to overly broad questions like, “What brand new feature would you like to see implemented?” or “What would make this product more fun to use?”

Thursday, May 21, 2009

A/B and Qualitative User Testing

Recently, I worked with a company devoted to A/B testing. For those of you who aren't familiar with the practice, A/B testing (sometimes called bucket testing or multivariate testing) is the practice of creating multiple versions of a screen or feature and showing each version to a different set of users in production in order to find out which version produces better metrics. These metrics may include things like "which version of a new feature makes the company more money" or "which landing screen positively affects conversion." Overall, the goal of A/B testing is to allow you to make better product decisions based on the things that are important to your business by using statistically significant data.
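
If you’ve never seen one of these systems, the core mechanic is simpler than it sounds: each user is deterministically assigned to a variant, and you compare metrics between the groups. Here’s a bare-bones sketch of the assignment step – a generic illustration, not that company’s actual system, and the variant names are made up:

```python
import hashlib

# Bare-bones deterministic bucket assignment for an A/B test. Hashing the
# user id (instead of picking randomly on every request) keeps each user
# in the same variant across visits. Variant names are hypothetical.
VARIANTS = ["control", "new_checkout_flow"]

def assign_variant(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

print(assign_variant("user-42"))  # the same user always lands in the same bucket
```

Keeping a given user in the same bucket across visits is what makes the per-variant metrics meaningful in the first place.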

Qualitative user testing, on the other hand, involves showing a product or prototype to a small number of people while observing and interviewing them. It produces a different sort of information, but the goal is still to help you make better product decisions based on user feedback.

Now, a big part of my job involves talking to users about products in qualitative tests, so you might imagine that I would hate A/B testing. After all, wouldn't something like that put somebody like me out of a job? Absolutely not! I love A/B testing. It's a phenomenal tool for making decisions about products. It is not the only tool, however. In fact, qualitative user research combined with A/B testing creates the most powerful system for informing design that I have ever seen. If you're not doing it yet, you probably should be.

A/B Testing

What It Does Well

A/B testing on its own is fantastic for certain things. It can help you:
  • Get statistically significant data on whether a proposed new feature or change significantly increases metrics that matter - numbers like revenue, retention, and customer acquisition
  • Understand more about what your customers are actually doing on your site
  • Make decisions about which features to cut and which to improve
  • Validate design decisions
  • See which small changes have surprisingly large effects on metrics
  • Get user feedback without actually interacting with users