How to A/B Test Your Emails Without Wasting Months Getting Nowhere

Let's be honest about something. Most of the A/B tests running inside ecommerce email programs right now are statistical theater. Pretty dashboards, official-looking winners, zero impact on revenue.

You change a subject line, Klaviyo declares Variant B the winner by 1.2%, you take a victory lap, and then... your monthly email revenue looks exactly the same as it did three months ago.

Sound familiar?

The good news: A/B testing absolutely works. It's how the brands doing $500K+ a month in email keep climbing. The bad news: 90% of operators are testing the wrong stuff, with the wrong sample sizes, drawing the wrong conclusions. Let's fix that.

Why most email A/B tests are useless

Three reasons, in order of how often I see them:

1. Your sample size is too small to mean anything

If you're sending a campaign to 8,000 people and splitting 50/50, each variant gets 4,000 sends. With a 25% open rate and a 2% click rate, you're looking at maybe 80 clicks per variant. A 10% lift on 80 clicks is 8 clicks. That's noise. That's somebody's dog walking past their phone.
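
A quick back-of-the-envelope check, using the hypothetical numbers above (plain Python, not pulled from any real send), shows how wide that noise band actually is:

```python
import math

# Hypothetical campaign from the example above:
# 4,000 sends per variant, ~2% click rate, a "10% lift" = ~8 extra clicks.
sends_per_variant = 4_000
click_rate = 0.02
clicks = sends_per_variant * click_rate  # ~80 clicks

# 95% confidence interval on that click rate (normal approximation)
z = 1.96
margin = z * math.sqrt(click_rate * (1 - click_rate) / sends_per_variant)

print(f"observed clicks: {clicks:.0f}")
print(f"95% CI on click rate: {click_rate - margin:.2%} to {click_rate + margin:.2%}")
print(f"that's roughly +/- {margin * sends_per_variant:.0f} clicks of pure noise")
# Prints roughly 1.57% to 2.43%, i.e. about +/- 17 clicks.
# An 8-click "win" sits comfortably inside that band.
```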

You need real numbers to draw real conclusions. More on that in a sec.

2. You're testing things that don't move the needle

Emoji in the subject line. Send time at 10am vs 2pm. Button color. These are not zero, but they're rounding errors compared to what actually matters: the offer, the audience, the angle.

3. You're calling winners way too early

Klaviyo's built-in winner declaration is helpful, but it's not gospel. If you let a test run for 4 hours on a small list, you're declaring winners based on early-opener behavior, not your full audience.

The sample size rule nobody wants to hear

For a clean campaign A/B test, you generally want around 1,000 conversions per variant to feel confident in the result. For most ecommerce brands, that's wildly out of reach for a single send.

So here's the realistic version:

  • Subject line tests (open rate): aim for at least 5,000 recipients per variant. Smaller and you're guessing.

  • Content tests (click rate): minimum 10,000 per variant for anything you'd bet money on.

  • Revenue tests: honestly, you need 20,000+ per variant or you need to repeat the test multiple times.

If you don't have the volume? Run the same test concept across multiple campaigns over 3-4 weeks and look at the pattern. One test is an anecdote. Three tests pointing in the same direction is a signal.
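
If you want to put rough numbers on your own list, here's a quick sketch using the standard two-proportion sample size formula (plain Python, rounded z-values; the baseline rates are placeholders, swap in yours):

```python
import math

def recipients_per_variant(baseline_rate: float, relative_lift: float) -> int:
    """Rough recipients needed per variant to detect `relative_lift` over
    `baseline_rate` at 95% confidence with 80% power (two-proportion test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha, z_beta = 1.96, 0.84  # rounded z-values for alpha = 0.05, power = 0.80
    n = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
    return math.ceil(n)

# Placeholder baselines, swap in your own:
print(recipients_per_variant(0.25, 0.10))   # 10% lift on a 25% open rate      -> ~4,900
print(recipients_per_variant(0.02, 0.20))   # 20% lift on a 2% click rate      -> ~21,000
print(recipients_per_variant(0.005, 0.25))  # 25% lift on a 0.5% purchase rate -> ~56,000
```

Those rough numbers line up with the rules of thumb above: opens are cheap to test, clicks cost more, and purchase-rate lifts need serious volume.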

What to test, in order of impact

If you only have bandwidth for a few tests a month, here's where to spend it. Top of the list moves the most revenue.

1. The offer itself

This is the king. "Free shipping over $75" vs "15% off your first order" vs "Free gift with $100 purchase." The offer drives more variance than anything else, often a 2-3x difference in conversion. Yet most brands set their offer once and never touch it.

Test this in your welcome flow. The first email is the highest-stakes email you'll ever send to a subscriber. A 20% lift in welcome conversion compounds across every single new subscriber for the rest of the year.

2. The angle / hook

Same product, different reason to care. Story-driven vs benefit-driven vs social proof vs scarcity. I've seen the same product email do 3x revenue just by leading with a customer review instead of the product itself.

3. Plain text vs designed

This one surprises people. A founder-style plain text email from "Sarah at [Brand]" often outperforms the gorgeous designed template, especially for flows. Test it. The results will annoy your designer.

4. Subject line

Yes, test these, but don't make this your whole testing program. A great subject line gets you 10-15% more opens. A great offer gets you 100% more revenue. Spend your reps accordingly.

5. Send time

Honestly? Stop testing this unless you have a really specific reason to. The variance is small, the noise is huge, and your audience checks email throughout the day anyway.

How to test flows (where the real money is)

Campaigns get all the attention, but flows are where A/B testing pays you forever. A 15% lift in your welcome flow keeps paying out on every new subscriber, every day, with zero additional work.

Here's the framework I use:

  1. Pick one flow at a time. Welcome, abandoned cart, browse abandonment, post-purchase. Don't test everything at once.

  2. Run the test for at least 4 weeks. Flows see less daily traffic than campaigns, so you need time. If you have a healthy flow, 30 days usually gives you enough data.

  3. Test one variable at a time. If you change the subject line AND the email content AND the offer, you have no idea what won.

  4. Look at revenue per recipient, not open rate. Open rate is vanity. RPR is rent.
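
If you've never calculated it by hand, revenue per recipient is just attributed flow revenue divided by the number of people who received that variant. A minimal sketch with made-up numbers (the figures are hypothetical, pull the real ones from your ESP's flow analytics):

```python
# Made-up numbers: replace with the real export from your ESP.
variants = {
    "Variant A": {"recipients": 6_200, "attributed_revenue": 11_160.00},
    "Variant B": {"recipients": 6_150, "attributed_revenue": 12_300.00},
}

for name, v in variants.items():
    rpr = v["attributed_revenue"] / v["recipients"]
    print(f"{name}: ${rpr:.2f} per recipient")
# Variant A: $1.80 per recipient, Variant B: $2.00 per recipient.
# B wins even if A happened to have the better open rate.
```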

My favorite welcome flow test: put the offer in email 1 vs holding it until email 2. The "hold it" version often wins on total flow revenue because email 1 builds trust first. But not always. That's why you test.

The 90-day testing roadmap

Here's the schedule that actually moves numbers, for a brand sending 4-8 campaigns per week:

Month 1: Welcome flow

  • Week 1-2: Test the offer in email 1 (discount % vs free shipping vs free gift)

  • Week 3-4: Test plain text vs designed for the founder email

Month 2: Abandoned cart

  • Week 1-2: Test the timing of email 1 (1 hour vs 4 hours after abandonment)

  • Week 3-4: Test discount-in-email-2 vs no-discount-in-email-2

Month 3: Campaigns

  • Test angle on your top 2 product launches or promos

  • Test plain text founder send vs designed for one major campaign

That's 6 meaningful tests in 90 days. If even half win, you've materially shifted your email program. Compound that over a year and email goes from "yeah we do email" to a real revenue channel.

How to actually call a winner

Three rules:

Rule 1: Wait for the full data. For campaigns, give it at least 24 hours. For flows, at least 30 days or 1,000+ recipients per variant, whichever comes later.

Rule 2: Look at revenue, not opens. An email that gets 5% more opens but 10% less revenue is not the winner. I don't care what the dashboard says.

Rule 3: If the difference is under 10%, run the test again. Statistical significance is real, but for the volumes most brands are working with, anything under 10% is probably noise. Run it a second time before you commit.
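
If you want a quick gut-check before re-running, a simple two-proportion z-test is enough. A sketch in plain Python (the numbers are hypothetical); at typical ecommerce volumes, a sub-10% difference usually doesn't clear the bar:

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-score for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical campaign: 9,000 recipients per variant, ~2% conversion,
# Variant B "wins" by about 8% relative.
z = two_proportion_z(conv_a=180, n_a=9_000, conv_b=194, n_b=9_000)
print(f"z = {z:.2f}")  # ~0.73, well short of the ~1.96 needed for 95% confidence
```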

The mistake I see every single week

Brands run a test, see Variant B won, switch to Variant B, and never document anything. Six months later they have 40 "winners" and zero institutional knowledge.

Keep a simple Google Sheet. Date, what you tested, hypothesis, sample size, result, and what you learned. Takes 90 seconds per test. After a year, you'll have a playbook that's worth more than any course you could buy.
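
If you'd rather keep the log as a file than a sheet, the same six columns translate directly. A minimal sketch (the filename and the entry itself are made up):

```python
import csv
import os
from datetime import date

LOG_PATH = "email_test_log.csv"  # hypothetical filename

# Same columns as the sheet described above; this entry is made up.
row = {
    "date": date.today().isoformat(),
    "test": "Welcome email 1: 15% off vs free shipping over $75",
    "hypothesis": "Free shipping converts better for higher-AOV buyers",
    "sample_size": "6,200 recipients per variant",
    "result": "Free shipping won on revenue per recipient",
    "learned": "Offer framing mattered more than discount depth here",
}

write_header = not os.path.exists(LOG_PATH) or os.path.getsize(LOG_PATH) == 0
with open(LOG_PATH, "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    if write_header:
        writer.writeheader()
    writer.writerow(row)
```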

The bottom line

A/B testing isn't broken. The way most ecommerce brands are doing it is. Test the things that actually matter (offer, angle, format), give it real sample sizes, wait for real data, and document everything. Six clean tests will teach you more than sixty sloppy ones.

Stop testing button colors. Start testing offers. Watch what happens.

Want to know which tests would move the needle for YOUR brand?

Get a free email program audit. We'll look at your flows, segments, and recent campaigns and tell you exactly where the hidden revenue is and what to test first to unlock it.

Get Your Free Audit →