A/B Testing

for Fun and Profit

Ben Tilly

Pictage


Sample Programs

This is for after the presentation

What is A/B Testing?

Why A/B test?

What can you A/B test?

A/B tests do not substitute for

What is chi-square?

What to measure

Arrange your measurements

       Yes       No        Total
A      $a_yes    $a_no     $a
B      $b_yes    $b_no     $b
Total  $yes      $no       $total
  • $a_yes = number in A who answered yes
  • $a_no = number in A who answered no
  • $b_yes = number in B who answered yes
  • $b_no = number in B who answered no
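A minimal sketch of filling in the four cells from raw data. It assumes a hypothetical list of [group, converted] records, one per visitor; the records below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical raw data: one [group, converted] pair per visitor.
# group is 'A' or 'B'; converted is 1 for yes, 0 for no.
my @observations = (
    ['A', 1], ['A', 0], ['A', 0],
    ['B', 1], ['B', 1], ['B', 0],
);

# Tally each visitor into one of the four cells.
my %count = (A => {yes => 0, no => 0}, B => {yes => 0, no => 0});
for my $obs (@observations) {
    my ($group, $converted) = @$obs;
    $count{$group}{ $converted ? 'yes' : 'no' }++;
}

my $a_yes = $count{A}{yes};
my $a_no  = $count{A}{no};
my $b_yes = $count{B}{yes};
my $b_no  = $count{B}{no};
print "A: $a_yes yes, $a_no no\n";
print "B: $b_yes yes, $b_no no\n";
```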

Scary Math Part 1 - Addition

       Yes       No        Total
A      $a_yes    $a_no     $a
B      $b_yes    $b_no     $b
Total  $yes      $no       $total
  • $a = $a_yes + $a_no
  • $b = $b_yes + $b_no
  • $yes = $a_yes + $b_yes
  • $no = $a_no + $b_no
  • $total = $a + $b (or $yes + $no)
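The additions above, written out in Perl with made-up counts for illustration. The check near the end confirms that $a + $b and $yes + $no give the same grand total.

```perl
use strict;
use warnings;

# Cell counts (made-up numbers for illustration).
my ($a_yes, $a_no, $b_yes, $b_no) = (30, 70, 45, 55);

# Row totals: visitors in each group.
my $a = $a_yes + $a_no;
my $b = $b_yes + $b_no;

# Column totals: yes and no answers across both groups.
my $yes = $a_yes + $b_yes;
my $no  = $a_no  + $b_no;

# Grand total; both ways of computing it must agree.
my $total = $a + $b;
die "totals disagree" unless $total == $yes + $no;

printf "a=%d b=%d yes=%d no=%d total=%d\n", $a, $b, $yes, $no, $total;
```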

Scary Math Part 2 - Expectations

       Yes       No        Total
A      $e_a_yes  $e_a_no   $a
B      $e_b_yes  $e_b_no   $b
Total  $yes      $no       $total
  • $e_a_yes = $a * $yes / $total
  • $e_a_no = $a * $no / $total
  • $e_b_yes = $b * $yes / $total
  • $e_b_no = $b * $no / $total
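The expected counts in Perl, again with made-up margins. The expected table keeps the same row and column totals as the observed one; only the split inside the table changes.

```perl
use strict;
use warnings;

# Margins of the 2x2 table (made-up numbers for illustration).
my ($a, $b, $yes, $no) = (100, 100, 75, 125);
my $total = $a + $b;

# If A and B perform the same, each cell's expected count is
# (row total) * (column total) / (grand total).
my $e_a_yes = $a * $yes / $total;
my $e_a_no  = $a * $no  / $total;
my $e_b_yes = $b * $yes / $total;
my $e_b_no  = $b * $no  / $total;

printf "%.1f %.1f\n%.1f %.1f\n", $e_a_yes, $e_a_no, $e_b_yes, $e_b_no;
```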

Scary Math Part 3 - Chi-square
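Each cell contributes one chi-square term: the squared gap between the observed count O and the expected count E, divided by E. The statistic is the sum of these terms over all cells; for the top-left cell, O = $a_yes and E = $e_a_yes.

```latex
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}
```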

Scary Math Part 4 - Calculation

We have 4 measurements and 4 expectations. So we have 4 chi-square terms. We add them:
my $chi_square =
    ($a_yes - $e_a_yes)**2 / $e_a_yes
  + ($a_no  - $e_a_no )**2 / $e_a_no
  + ($b_yes - $e_b_yes)**2 / $e_b_yes
  + ($b_no  - $e_b_no )**2 / $e_b_no;
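The same calculation wrapped in a self-contained sub that starts from the four raw counts. The name chi_square_2x2 is hypothetical, and the counts in the usage line are made up.

```perl
use strict;
use warnings;

# Chi-square statistic for a 2x2 table of yes/no counts.
sub chi_square_2x2 {
    my ($a_yes, $a_no, $b_yes, $b_no) = @_;

    # Margins
    my $a     = $a_yes + $a_no;
    my $b     = $b_yes + $b_no;
    my $yes   = $a_yes + $b_yes;
    my $no    = $a_no  + $b_no;
    my $total = $a + $b;

    # Expected counts if A and B perform the same
    my $e_a_yes = $a * $yes / $total;
    my $e_a_no  = $a * $no  / $total;
    my $e_b_yes = $b * $yes / $total;
    my $e_b_no  = $b * $no  / $total;

    # Sum of (observed - expected)**2 / expected over the 4 cells
    my $chi =
          ($a_yes - $e_a_yes)**2 / $e_a_yes
        + ($a_no  - $e_a_no )**2 / $e_a_no
        + ($b_yes - $e_b_yes)**2 / $e_b_yes
        + ($b_no  - $e_b_no )**2 / $e_b_no;
    return $chi;
}

my $chi_square = chi_square_2x2(30, 70, 45, 55);
printf "chi-square = %.4f\n", $chi_square;
```

When A and B convert at exactly the same rate, every observed count equals its expectation and the statistic is 0.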

Scary Math Part 5 - Interpretation

use Statistics::Distributions qw(chisqrprob);
my $p = chisqrprob(1, $chi_square);
  1. If the samples are all independent...
  2. and the expected counts are all at least 10...
  3. and the real performance of A and B is the same...
  4. then $p ≈ probability of a chi-square value greater than $chi_square
  5. If $p is "small", conclude that #3 is likely wrong: A and B really differ
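A 2x2 table has (2-1)(2-1) = 1 degree of freedom, which is why the slide passes 1 to chisqrprob from the CPAN module Statistics::Distributions. For one degree of freedom the tail probability can also be computed without that module, since P(chi-square > x) = erfc(sqrt(x/2)). A minimal sketch, assuming the Abramowitz and Stegun 7.1.26 approximation to erfc is accurate enough; the sub name chisqr_prob_1df and the 95% threshold are illustrative choices.

```perl
use strict;
use warnings;

# Upper-tail probability of the chi-square distribution with 1 degree
# of freedom: P(X > $x) = erfc(sqrt($x / 2)).
# erfc is approximated with Abramowitz & Stegun formula 7.1.26
# (absolute error below 1.5e-7), so no CPAN module is needed.
sub chisqr_prob_1df {
    my ($x) = @_;
    return 1 if $x <= 0;
    my $z = sqrt($x / 2);
    my $t = 1 / (1 + 0.3275911 * $z);
    my $poly = $t * (0.254829592
             + $t * (-0.284496736
             + $t * (1.421413741
             + $t * (-1.453152027
             + $t * 1.061405429))));
    return $poly * exp(-$z * $z);
}

my $chi_square = 4.8;    # example statistic
my $p = chisqr_prob_1df($chi_square);
my $confidence = 0.95;   # hypothetical choice of threshold
my $verdict = $p < 1 - $confidence ? 'significant' : 'not significant';
printf "p = %.4f (%s)\n", $p, $verdict;
```

With the 95% threshold, a chi-square of 4.8 gives a p of about 0.028 and counts as significant.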

How Small Is "Small"?

Recap of A/B setup

Recap of chi-square evaluation

A/B test simulation

Best Case Example

A's true conversion: 50%, B's true conversion: 55%

Error Rate and Final Sample Size
                      Final sample size, by percentile
Confidence  P(wrong)      25%      50%      75%      90%
95%             4.6%      146      570    1,500    2,750
98%             1.9%      383    1,170    2,430    3,920
99%             0.9%      670    1,680    3,120    4,770
99.5%           0.4%    1,020    2,200    3,770    5,500

Low Conversion Example

A's true conversion: 10%, B's true conversion: 11%

Error Rate and Final Sample Size
                      Final sample size, by percentile
Confidence  P(wrong)      25%      50%      75%      90%
95%             7.2%      790    4,150   12,700   24,500
98%             3.3%    2,740    9,980   21,700   35,900
99%             1.5%    5,620   15,100   28,500   45,500
99.5%           0.8%    8,900   20,000   34,900   51,500

Low Lift Example

A's true conversion: 50%, B's true conversion: 51%

Error Rate and Final Sample Size
                      Final sample size, by percentile
Confidence  P(wrong)      25%      50%      75%      90%
95%            16.2%      257    3,680   23,700   55,100
98%             8.0%    2,300   20,000   50,600   90,800
99%             4.5%    9,540   34,300   72,000  116,000
99.5%           2.4%   18,400   50,000   90,100  132,000

A/B test Scaling Principles

A/B test scaling tips

More A/B testing tips

Compare apples to apples

A/B ratio need not be 50/50

Beware of hidden correlations

Tip: Use rand()
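A minimal sketch of assigning visitors to groups with rand(), assuming each visitor is assigned once when first seen and the result is then remembered (persistence not shown). The sub assign_group and its $fraction_a parameter are hypothetical; the parameter allows splits other than 50/50.

```perl
use strict;
use warnings;

# Hypothetical helper: assign a new visitor to a test group at random.
# $fraction_a is the share of traffic sent to A (0.5 for a 50/50 split).
sub assign_group {
    my ($fraction_a) = @_;
    return rand() < $fraction_a ? 'A' : 'B';
}

srand(42);    # fixed seed so this demo is repeatable
my %seen = (A => 0, B => 0);
$seen{ assign_group(0.5) }++ for 1 .. 10_000;
printf "A: %d, B: %d\n", $seen{A}, $seen{B};
```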

A/B/C... testing the wrong way

Why is this wrong?

Extreme example

A/B/C... testing the right way

Questions?

(or we can continue on for advanced material)

What is the chi-square distribution?

Comparison with G-test

Testing non-yes/no questions

Some basic terms

Basic properties

Estimating E(X) and Var(X)

Central Limit Theorem

A/B Testing Setup

Variance calculation

The difference of the averages is...?

We can test that!

Final technical note