ESBPCS for VCL
Writing a module to test for statistical significance for A-B split ad testing
Thread Starter: Joe Hendricks Started: 10/9/2007 9:18 PM UTC
Replies: 5
Which parts of the ESBPCS stats lib should I read up on? I am planning to automate analysis of our Google ad campaigns and want to use ESBPCS to check for statistical significance between two versions of the same ad.

The raw data is in our NexusDB database (easily turned into sample sizes, std dev, etc.), so the only guidance I need is which ESBPCS statistical functions to read up on and apply to the data to find significance (or not) at the 95% confidence level.

(I'd be happy to make this available as an ESBPCS sample/demo application when finished)

Thanks!

Joe
RE: Writing a module to test for statistical significance for A-B split ad testing
Joe,

So you have two collections of data that you wish to compare. For that it
sounds like you need to do Hypothesis Analysis of the Difference of the
Means at a 95% Confidence Level.

We supply data aware components for the Hypothesis Analysis of the Mean and
Hypothesis Analysis of the Variance - but they are both for single data
sources.

We have used ESBPCS to do Hypothesis Analysis of the Difference of the Means
in our ESBStats package - so I can give you the steps that are involved - if
that is what you are after - I can also look at adding a component for doing
it for the next release but that would be a few weeks away...


Glenn Crouch, mailto:glenn@esbconsult.com
ESB Consultancy, http://www.esbconsult.com
Home of ESBPCS, ESB Calculators, ESBStats and ESBPDF Analysis
Kalgoorlie-Boulder, Western Australia
Re: Writing a module to test for statistical significance for A-B split ad testing
Glenn Crouch wrote:
We have used ESBPCS to do Hypothesis Analysis of the Difference of the Means
in our ESBStats package - so I can give you the steps that are involved - if
that is what you are after  

yes, that would be very helpful.
JoeH
RE: Re: Writing a module to test for statistical significance for A-B split ad testing
JoeH wrote:
yes, that would be very helpful.

Okay for Inference about the Difference of the Means of two lists of Data,
there are three different scenarios - which are you after:

a) Unpaired with unequal variance

b) Unpaired with equal variance

c) Paired.

I think from what you said that your data is paired - ie X[i] and Y[i] for
each i are related - eg same time value - is that right? As that is the
easiest.

I would move the data from your data source into TESBFloatVectors, X and Y -
then

XY := SubtractVectors (X, Y); XY is a TESBFloatVector giving us the
difference of the vectors. X, Y and XY are all the same length.

SampleVarianceAndMean (XY, XYVar, XYMean); XYVar and XYMean are Extended,
giving us the Variance and Mean of the difference of the Vectors.

XYStdDev := Sqrt (XYVar); XYStdDev is a Float, giving us the Standard
Deviation of the difference of the Vectors.

DF := Length (XY) - 1; DF is Integer, and represents the Degrees of
Freedom.

TestStat := (XYMean - MeanDiff) / (XYStdDev / Sqrt (DF + 1)); TestStat
and MeanDiff are Extended, where MeanDiff is the Hypothesised difference of
the Means and DF + 1 is the number of pairs. For your case MeanDiff would
probably be 0: if that Null Hypothesis is rejected, it is evidence (not
proof) that one ad performs better than the other.
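
As a library-independent illustration of the steps so far, here is a minimal Python sketch using only the standard library. It is not ESBPCS code: the function name paired_t_statistic and the sample numbers are made up for the example, and it assumes the variance uses the n - 1 (sample) divisor, as the name SampleVarianceAndMean suggests.

```python
import math
import statistics


def paired_t_statistic(x, y, mean_diff=0.0):
    """Return (test_stat, df, mean, std_dev) for the paired differences x[i] - y[i]."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]   # plays the role of XY
    d_mean = statistics.mean(diffs)         # XYMean
    d_std = statistics.stdev(diffs)         # XYStdDev, n - 1 divisor (assumed)
    if d_std == 0.0:
        raise ValueError("all differences are identical; test statistic undefined")
    df = len(diffs) - 1                     # DF
    std_err = d_std / math.sqrt(df + 1)     # standard error of the mean difference
    test_stat = (d_mean - mean_diff) / std_err
    return test_stat, df, d_mean, d_std


# Hypothetical figures for two ad versions measured on the same five days.
x = [12.0, 15.0, 11.0, 14.0, 13.0]
y = [10.0, 14.0, 10.0, 11.0, 12.0]
test_stat, df, d_mean, d_std = paired_t_statistic(x, y)
print(f"mean diff={d_mean:.3f} sd={d_std:.3f} df={df} t={test_stat:.3f}")
```

With mean_diff left at 0 the null hypothesis is "no difference between the two versions", which matches the case described above.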

Alpha := 1 - ConfidenceLevel; Alpha and ConfidenceLevel are Extended, and
you've stated ConfidenceLevel of 0.95 I believe.

TestAlpha := InvStudentTGreater (Alpha, DF);
TestAlphaOn2 := InvStudentTGreater (Alpha / 2, DF); Use the Inverse
Student T to get the Test Values we are after.

Then it is estimated that the difference of the Means will lie between:

(XMean - YMean) - TestAlphaOn2 * XYStdDev / Sqrt (DF + 1) Where XMean and
YMean are the Means of the original X and Y, so XMean - YMean equals XYMean

And

(XMean - YMean) + TestAlphaOn2 * XYStdDev / Sqrt (DF + 1)
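
To make the critical value and the interval concrete, here is a rough Python sketch, again standard library only and not ESBPCS code. The helper names t_cdf, inv_t_upper and mean_diff_interval are hypothetical; inv_t_upper stands in for what InvStudentTGreater appears to do above (return the value with a given upper-tail probability). The t distribution is approximated numerically (Simpson's rule plus bisection), and the margin is the critical value times the standard error of the mean difference.

```python
import math


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t by Simpson's rule on the density."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3               # integral of the density from 0 to |t|
    return 0.5 + math.copysign(area, t)


def inv_t_upper(p, df):
    """Return t with P(T > t) = p, for 0 < p < 0.5, found by bisection."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must be in (0, 0.5)")
    lo, hi = 0.0, 1.0
    while 1.0 - t_cdf(hi, df) > p:     # widen the bracket until it contains t
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_diff_interval(d_mean, d_std, n, confidence=0.95):
    """Two-sided confidence interval for the mean of n paired differences."""
    alpha = 1.0 - confidence
    t_two_sided = inv_t_upper(alpha / 2, n - 1)   # plays the role of TestAlphaOn2
    margin = t_two_sided * d_std / math.sqrt(n)   # critical value * standard error
    return d_mean - margin, d_mean + margin


# Hypothetical summary figures: mean difference 1.6, variance 0.8, 5 pairs.
low, high = mean_diff_interval(d_mean=1.6, d_std=math.sqrt(0.8), n=5)
print(f"95% CI for the mean difference: [{low:.3f}, {high:.3f}]")
```

If the interval excludes 0, the two-sided test at the same confidence level rejects a hypothesised difference of 0.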

Now depending on your Alternate Hypothesis:

1. Less Than: Rejected := TestStat < -TestAlpha;

2. Greater Than: Rejected := TestStat > TestAlpha;

3. Not Equal: Rejected := abs (TestStat) > TestAlphaOn2;

Similarly the p-value depends on the Alternate Hypothesis:

1. Less Than:  PValue := 1 - StudentTLess (-1 * TestStat, DF);

2. Greater Than: PValue := 1 - StudentTLess (TestStat, DF);

3. Not Equal: PValue := 2 * (1 - StudentTLess (abs (TestStat), DF));

if Rejected then "Since the Test Statistic is in the Rejection Region, it is
suggested that the  Null Hypothesis be Rejected"
else "Since the Test Statistic is not in the Rejection Region, it is
suggested that the Null Hypothesis not be Rejected"
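
The decision step can also be sketched in Python (standard library only, not ESBPCS code). The names t_cdf and p_value are hypothetical, with t_cdf standing in for what StudentTLess appears to compute, and the test statistic of 4.0 with 4 degrees of freedom is a made-up input. Rejecting when the p-value is below Alpha is equivalent to the critical-value comparison above, provided the two-sided case uses the absolute value of the test statistic.

```python
import math


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t by Simpson's rule on the density."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, t)


def p_value(test_stat, df, alternative):
    """p-value of a t statistic under one of the three alternative hypotheses."""
    if alternative == "less":
        return t_cdf(test_stat, df)
    if alternative == "greater":
        return 1.0 - t_cdf(test_stat, df)
    if alternative == "not equal":
        return 2.0 * (1.0 - t_cdf(abs(test_stat), df))
    raise ValueError("alternative must be 'less', 'greater' or 'not equal'")


alpha = 1.0 - 0.95                     # ConfidenceLevel of 0.95
test_stat, df = 4.0, 4                 # hypothetical inputs
for alternative in ("less", "greater", "not equal"):
    p = p_value(test_stat, df, alternative)
    verdict = "reject" if p < alpha else "do not reject"
    print(f"{alternative:>9}: p = {p:.4f} -> {verdict} the Null Hypothesis")
```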

hth

Glenn Crouch, mailto:glenn@esbconsult.com
Re: Writing a module to test for statistical significance for A-B split ad testing
Wow!

Thanks Glenn - give me a week to play with this guideline and my datasets, then I'll get back to this thread :-)

JoeH
RE: Re: Writing a module to test for statistical significance for A-B split ad testing
Joe,

The best book I have that covers basic Statistics and the Hypothesis
Analysis and Inference that is in ESBPCS would be:

Statistics for Management and Economics by Keller and Warrack.

I've used it as my text many times when teaching Business Statistics to
Bachelor of Commerce and MBA students. This is a good book for explaining
why we are doing things :)

Another good book is:  

Weighing the Odds: A Course in Probability and Statistics by David Williams.

Hth


Glenn Crouch, mailto:glenn@esbconsult.com