Heyy 👋
Welcome back to K’s DataLadder ✨! Each week, I share a story from my life as a Tech Data Scientist to help you level up your career.
We’ve grown to a wonderful community of 1,055 curious and driven data lovers. Thank you for being part of this journey ❤️
This week, I’m diving into part 2 of my introduction to product experimentation (part 1 here), focusing on the differences between A/B tests and Feature Rollouts, and when to use each method.
I’m not sure these technical deep dives are the best fit for my newsletter. I’ll save them for my Medium page, so make sure to follow me there, because this will be the last one here.
I don’t want to turn this into a technical data science newsletter; there are already hundreds of those. I prefer to focus on sharing the behind-the-scenes experiences of being a data scientist in big tech. Moving forward, I'll stick to that.
Reading Time: 6 minutes
Agenda
Life checkpoint
This week’s story
Why you should care
Two types of experiments
Deciding between A/B Tests & Feature Rollouts
Life Checkpoint: Keeping it real 🌿
Hey friends, I'll be honest – life's been kinda shit lately despite all the good stuff.
I've been battling a chronic illness for the last two years, and this month, it decided to stage a huge comeback. This disease wreaks havoc on my gut and throws my entire system out of whack.
An altered microbiome can be a big driver of anxiety and depression, so I've been feeling pretty anxious and emotionally unstable lately.
And this Thursday, while I was ruminating and panicking over my whole life at 2 AM, something good still emerged. A thought struck me: what if we connected?
I hope you're flattered that I think about you before I sleep lol.
Last week I asked for a sign of life, and your responses made my day. They gave me an idea: I want to set aside some time every now and then to chat with two of you, on a 30-minute call each.
Whether you need career advice or data science tips, I'm here for you.
Want to join in? Here's how:
Comment or email me with what you'd like to discuss.
I’ll pick someone semi-randomly. Bonus points for leaving a ❤️ on this week’s edition!
This week’s story
I’ve been working for weeks now on preparing for our upcoming A/B tests in Spotify’s TV app.
Over the last 2 weeks, I had to collaborate back and forth with Product Managers (PMs) & Engineers to:
Check whether each feature is ready for testing
Make sure we had enough users for each experiment without overlaps
Schedule launch dates based on the completion of ongoing tests to free up more users to test on
Set up each test in our internal experimentation platform
Launch the tests that were ready to go
For each test, I needed to make a crucial decision: should the experiment be conducted as an A/B test or a Feature Rollout?
If you're unsure about the difference, don't worry—it's a common challenge even for Engineers and PMs. I'm here to clarify it for you.
Why you should care
(feel free to skip this section if you’ve already read the edition where I introduced the foundations of Product Experimentation)
If you’re dreaming of joining a big tech company like one of the MAANGs or Spotify, then know you’re venturing into the world of product-focused companies. These companies are constantly launching new features for many reasons, like keeping users engaged.
But we can't just roll new features out and hope for the best! What if they flop or break the user experience?
That's where experimentation comes in! A/B tests and Feature Rollouts help us validate ideas, gather insights, and make data-driven decisions about what to launch and how.
Two Types of Experiments
This is a bit complex, so I’ll be straightforward. You need to understand that in tech companies, we use two main types of experiments to test our hypotheses and validate our changes: A/B tests & Feature Rollouts.
A/B Tests
An A/B test compares two or more variations of a feature to determine which one performs better. For example, if I want to add a new shortcut on Spotify's Home Page, I would:
Calculate the required sample size based on the success metrics for the test (see the sketch below).
Randomly assign users to control and treatment groups.
Expose each group to a different version of the feature, isolating the variations to measure their impact accurately.
The goal is to compare the "treatment" group (with the new feature) to the "control" group (without the feature) to determine the effect of the change.
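To make that first step concrete, here’s a minimal sketch of a sample-size calculation for a proportion metric using Python’s statsmodels. The baseline rate, MDE, alpha, and power below are made-up numbers for illustration, not values from Spotify’s platform:

```python
# Minimal sketch: sample size for a two-group A/B test on a proportion
# metric (e.g. click-through rate on a new Home Page shortcut).
# Baseline rate, MDE, alpha and power are illustrative numbers only.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10   # assumed current click-through rate
mde = 0.01             # smallest lift we care to detect (+1 pp)

# Convert the two proportions into a standardized effect size (Cohen's h).
effect_size = proportion_effectsize(baseline_rate + mde, baseline_rate)

n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,              # significance level
    power=0.80,              # probability of detecting the MDE if it's real
    ratio=1.0,               # equally sized control and treatment groups
    alternative="two-sided",
)
print(f"~{int(n_per_group):,} users needed per group")
```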
What are the key characteristics?
Multiple treatments: You can test several variations to see which performs best against the control group.
Set ratios: You can choose how traffic is split between control and treatment groups (50/50, 70/30, or whatever fits the context; see the toy example below).
Custom metrics: Choose metrics that align with your goals, using the Minimum Detectable Effect (MDE) to determine success (more on that in the future).
User behavior insights: Helps you understand how users interact with each variation through different types of metrics.
Segment analysis: Zoom in on specific user segments to see differentiated impacts, like by region or age groups.
Exploratory analysis: Allows for a broader exploration of potential impacts.
Exclusivity groups: Helps prevent overlap with other tests by isolating the impact of each feature change. This makes sure that results remain unbiased and that different tests do not interfere with each other.
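Curious how set ratios and exclusivity can work under the hood? One common pattern is deterministic, hash-based bucketing. This is a toy sketch, not Spotify’s internal platform; the experiment-specific salt is what keeps assignments independent across tests:

```python
# Toy sketch of hash-based assignment with a configurable split.
# Hashing the user ID with an experiment-specific salt makes assignment
# deterministic (stable per user) and independent across experiments.
import hashlib

def assign_variant(user_id: str, experiment_salt: str,
                   treatment_share: float = 0.5) -> str:
    """Return 'treatment' or 'control' for this user in this experiment."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# A 70/30 split for a hypothetical "tv_shortcut_v1" experiment:
print(assign_variant("user_42", "tv_shortcut_v1", treatment_share=0.7))
```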
What are the advantages?
Precise measurement of different variations.
Ability to detect specific impacts on user behavior and metrics.
Control over sample size and treatment ratios.
However, sometimes, our primary concern isn’t about measuring the exact impact of a feature change.
Feature Rollouts
Instead, we just want to align a feature with other platforms or make sure we don’t break anything. In such cases, we prefer Feature Rollouts.
For example, after releasing Lyrics on Spotify's Mobile app, I might roll it out to other platforms without needing a detailed impact analysis for the TV app, since I already understand the feature's impact from the mobile launch.
A feature rollout involves gradually introducing a new feature to users while closely monitoring its impact.
This ensures stability and allows for quick adjustments based on real-time data. If metrics are negatively affected due to an engineering issue, engineers can quickly address and correct the problem.
What are the key characteristics?
Single treatment: Tests one new feature against the existing setup.
Gradual exposure: Starts with a small percentage of users and gradually increases exposure.
Automatic alerts: Doesn’t use success metrics; instead, it monitors important metrics for any negative impact using a Non-Inferiority Margin (NIM) for evaluation (more on that later, and a simplified sketch follows below).
No exclusivity groups: Cannot use exclusivity groups, so it’s essential to manage overlapping tests carefully.
What are the advantages?
Allows for careful monitoring and adjustment during the rollout.
Suitable for incremental updates and platform alignment.
Minimizes the risk of negative impact by gradually increasing user exposure to the feature. Each day or week, we roll out the feature to more people. If something breaks, we don’t want it to happen to everyone.
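To make gradual exposure and the NIM idea tangible, here’s a simplified sketch of a ramp schedule with a non-inferiority guardrail check. The percentages, metric values, and margin are invented for illustration; in practice, the experimentation platform automates this monitoring:

```python
# Simplified sketch: staged rollout with a non-inferiority guardrail.
# Ramp schedule, metric values and the margin are illustrative only.
RAMP_SCHEDULE = [0.01, 0.05, 0.25, 0.50, 1.00]  # share of users per stage

def guardrail_ok(control_value: float, rollout_value: float,
                 nim: float = 0.01) -> bool:
    """Pass if the rollout metric hasn't dropped more than the
    Non-Inferiority Margin (NIM) below the control value."""
    return rollout_value >= control_value * (1 - nim)

for stage, share in enumerate(RAMP_SCHEDULE, start=1):
    # In reality these numbers would come from a metrics pipeline each day/week.
    control_playback, rollout_playback = 0.950, 0.947
    if not guardrail_ok(control_playback, rollout_playback):
        print(f"Stage {stage}: guardrail breached at {share:.0%}, halt and fix")
        break
    print(f"Stage {stage}: metrics look fine, ramping to {share:.0%}")
```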
Deciding between A/B Tests & Feature Rollouts
All the tests I’m currently launching are A/B tests. They’re running simultaneously in the same product space (TV), so it’s important to make sure the tests don’t affect each other.
When to use A/B Tests?
When there’s a significant change expected to impact key metrics.
When PMs and stakeholders need detailed impact analysis.
When the test needs to be isolated from other ongoing experiments that could potentially impact our metrics and thus analysis.
When it’s essential to have clear and measurable results to make data-driven decisions.
When to use Feature Rollouts?
When aligning with other platforms, e.g. launching on TV a feature that already exists on Mobile.
When changes are not expected to significantly impact top-line metrics.
When there’s less need for detailed analysis of the feature’s impact on user behavior.
When launching features that need to be introduced gradually to limit the blast radius of any adverse effects.
4 questions to ask yourself
#1. Experiment isolation: Does this experiment's impact need to be isolated from the effects of other ongoing tests in the same product space?
If Yes: Conduct an A/B Test. It allows for the setup of exclusivity groups to ensure that the impact of the experiment is isolated from other ongoing tests, providing clear and precise results.
If No: Proceed with a Feature Rollout.
#2. Feature impact: Does the feature have the potential to move key metrics?
If Yes: Conduct an A/B Test. It lets you set up and evaluate success metrics, and from there conduct a detailed impact analysis.
If No: Proceed with a Feature Rollout. It’s better for changes that are not expected to significantly impact top-line metrics, and where the focus is instead on stability and performance.
#3. Newness of feature: Is it a completely new feature or an iteration to align with other platforms/devices?
If it is a completely new feature: Conduct an A/B Test. It’s ideal for new features where precise measurement is necessary to understand the effects.
If it’s an iteration to align with other platforms/devices: Proceed with a Feature Rollout, except if you’re being asked to conduct an impact analysis.
#4. Stakeholder requirements: Does the PM and other stakeholders need a detailed impact analysis?
If Yes: Conduct an A/B Test. It’s the only method that allows you to test the effects on success metrics.
If No: Proceed with a Feature Rollout. This approach is more appropriate when detailed impact analysis is not required, and the focus is on ensuring stability.
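And if it helps, here’s the whole checklist condensed into a tiny, hypothetical helper function. It’s a thinking aid that mirrors the four questions above, not an official framework:

```python
# The four questions above, condensed into a toy decision helper.
def choose_experiment_type(needs_isolation: bool,
                           can_move_key_metrics: bool,
                           is_completely_new: bool,
                           needs_impact_analysis: bool) -> str:
    """If any question points to an A/B Test, pick it; otherwise roll out."""
    if any([needs_isolation, can_move_key_metrics,
            is_completely_new, needs_impact_analysis]):
        return "A/B Test"
    return "Feature Rollout"

# Example: aligning TV with an existing Mobile feature, no analysis requested.
print(choose_experiment_type(False, False, False, False))  # Feature Rollout
```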
That's all for today.
In the future, I'll dive deeper on Medium into topics such as computing sample size for A/B tests, understanding MDE/NIM, choosing the right power and alpha levels, and more.
As you can see, it's a complex subject!
Please leave a ❤️ or a comment to let me know that you read me. Until then, see you next week for more data stories 🫶
If you’re enjoying these insights, don’t forget to subscribe and follow along on YouTube, Instagram & LinkedIn for more updates and stories.