K#16. Why your A/B Test is doomed without a solid hypothesis
A senior data scientist taught me how to write hypotheses. Here's what I learned
Hey there 👋
Welcome back to K's DataLadder ✨! I’m Khouloud and I regularly share a story from my life as a Spotify Data Scientist to help you level up in your career (and sometimes in life too).
We’re now 2661 strong! Thank you for being part of this journey 💜
I recently attended a concert by one of my favorite bands: Kalandra. They play some kind of Nordic folk/rock; I'm not sure how to describe them, but they're incredible. Here's a playlist I made if you're curious.
Last week, I shared two challenges that can be encountered in A/B testing, and this week, we’re diving into the fundamentals: how to create a solid hypothesis.
Believe it or not, this is the most important step in any A/B test.
PS: I’ve been doing a whole series on A/B testing because it’s one of the most crucial skills you need to develop if you ever want to work in big tech. Make sure to check out part 1 and part 2 as well.
Why the hypothesis matters
A/B tests are simply applications of hypothesis testing, and the hypothesis is the backbone of your test. I hope you took some divination classes in college because a good hypothesis involves looking into the future 🔮.
You’re saying, “I believe this change will lead to this effect.” But in practice, you don’t know for sure; it’s just your best bet, and that’s why it’s called a hypothesis.
It’s basically your best prediction of the future.
Your hypothesis isn’t just the starting point—it’s the north star of your experiment:
It defines the metrics: The hypothesis determines what metrics you’ll measure, as they directly tie back to what you’re predicting.
It clarifies the results: If the hypothesis is weak or vague, interpreting the results becomes difficult, which makes it harder to take action.
Why you must learn this skill
The thing is, it’s a Product Manager’s job to write the hypothesis, but it’s not always done right. Hypotheses often get rushed because PMs can become more focused on getting the test launched ASAP.
And that’s where things can go wrong.
If the hypothesis is off, the metrics will be too, which will make your analysis challenging or impossible. So it’s critical to make sure the test is asking the right questions from the start.
Last week, my colleague, a senior data scientist, urged me to learn how to write those myself.
Here’s what he told me:
“Write the hypothesis yourself and compare it with the PM’s version. This way, you’ll make sure it’s scientifically sound and aligned with what you’ll need for analysis.”
How to write a clear hypothesis?
I’m lucky to be working mainly with senior data scientists because I get to learn the best practices firsthand (and then I share them with you). So here’s a clear process for crafting hypotheses that I learned from my colleague:
Start with the hypothesis (H):
“I have a hypothesis H that suggests the user impact should be U.”
Identify the expected user impact (U):
“The user impact U should show up in the data D.”
Create a measurable metric (M):
“I can create a metric M that captures shifts in D.”
This structure flows logically from hypothesis to measurable data: H → U → D → M. In other words:
Hypothesis leads to User Impact
Which is reflected in the Data
And captured by specific Metrics
By following this flow, you make sure that your hypothesis is tied directly to:
What you expect to happen
How you expect to see it in the data
What specific metrics will confirm it
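If it helps to make this concrete, here's a tiny Python sketch of the same flow. It's purely illustrative (the class and field names are my own invention, not a real template from any tool), but writing the four pieces down like this makes it obvious when one of them is missing:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One record per experiment, following H -> U -> D -> M."""
    statement: str    # H: the change we believe will have an effect, and why
    user_impact: str  # U: how users should behave differently if H is true
    data_signal: str  # D: where that behavior should show up in the data
    metric: str       # M: the metric that captures shifts in D

# Illustrative example (I'll reuse this feature idea below)
dark_mode_after_8pm = Hypothesis(
    statement="Auto-enabling dark mode after 8 PM reduces evening eye strain",
    user_impact="Users stay in the app longer during evening sessions",
    data_signal="Sessions starting after 8 PM get longer",
    metric="Average session duration after 8 PM",
)

print(dark_mode_after_8pm.metric)
```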
Here’s a checklist for writing a clear, actionable hypothesis:
Make it a statement, not a question: Clearly state what you expect, rather than asking what might happen.
Clarify the outcomes: Be explicit about the changes that would support or weaken your hypothesis.
Be specific about variables: Define what exactly you’re measuring.
Ground it in past research: Don’t guess. Base it on real insights from previous experiments or research.
Keep it simple: A simple, straightforward hypothesis is easier to test and leads to useful results.
💡 At work, our standard hypothesis format looks like this:
We believe that doing this/building this feature/creating this experience for these people/personas will result in a change in their behavior, as measured by success metric(s).
We will know this is true when we see a change in success metric(s) by a minimum detectable effect size (MDE).
Example in action:
Let’s say we’re testing a feature that automatically enables dark mode after a certain time of day to reduce eye strain and improve late-night usage.
We believe that automatically enabling dark mode for users after 8 PM will result in users spending more time on the app in the evening, as measured by the average session duration after 8 PM.
We will know this is true when we see an increase in the average session duration after 8 PM of at least 5%, with no negative impact on crash rate or user satisfaction, as measured by in-app feedback.
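A quick aside: once you've settled on an MDE like that 5%, you can sanity-check how many users you'd need to detect it. Here's a rough back-of-the-envelope sketch in Python; the baseline mean and standard deviation below are made-up numbers, so plug in your own estimates from historical data:

```python
# Rough sample-size check for a 5% MDE on a mean metric (session duration).
# Assumed baseline: 20-minute evening sessions with a 15-minute standard deviation.
from statsmodels.stats.power import TTestIndPower

baseline_mean = 20.0   # minutes (assumed)
baseline_std = 15.0    # minutes (assumed)
mde_relative = 0.05    # the 5% minimum detectable effect

# Cohen's d: the absolute lift we want to detect, in units of standard deviation
effect_size = (baseline_mean * mde_relative) / baseline_std

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=effect_size,
    alpha=0.05,            # significance level
    power=0.8,             # 80% chance of detecting the effect if it's real
    alternative="two-sided",
)

print(f"~{n_per_group:,.0f} users per group")  # roughly 3,500 per group with these numbers
```

The point isn't the exact number; it's that a vague hypothesis with no MDE gives you nothing to plug into a calculation like this.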
Of course, writing a strong hypothesis assumes you’ve already identified your success and guardrail metrics. If not, the hypothesis will be incomplete.
But don’t worry! In the next editions, I’ll dive deeper into:
What success and guardrail metrics are, and how to define them properly
The roles of the MDE (Minimum Detectable Effect) and NIM (Non-Inferiority Margin)
Again, make sure to review the previous parts if you haven't already.
That’s it for today! I hope this breakdown helps you craft better hypotheses for your experiments. If you enjoyed this edition, please leave a ❤️ or drop a comment—I’d love to know I’m not just talking to a wall 🤓
My socials: YouTube, Instagram, LinkedIn & Medium
Take care & see you soon 👋🏼