Understanding Statistical Significance
Learn when your A/B test results are reliable and how Keak calculates statistical significance.
What is Statistical Significance?
Statistical significance tells you whether the difference between your test variations is real or just random chance. It's the key to knowing when you can trust your results and make confident decisions.
Think of it like flipping a coin: If you flip 10 times and get 7 heads, that could easily happen by chance. But if you flip 1,000 times and get 700 heads, now you have strong evidence the coin is biased.
A/B testing works the same way. Small samples can be misleading, but larger samples give you confidence in the winner.
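To put numbers on the coin analogy, the short Python sketch below computes the exact probability that a fair coin produces at least that many heads. It only illustrates the intuition. It is a one-off fixed-sample calculation, not the sequential method Keak uses, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(heads: int, flips: int) -> float:
    """Chance of getting `heads` or more heads in `flips` tosses of a fair coin."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2**flips


# 7+ heads out of 10 happens about 17% of the time with a fair coin.
print(f"{prob_at_least(7, 10):.3f}")  # prints 0.172
# 700+ heads out of 1,000 is essentially impossible for a fair coin.
print(f"{prob_at_least(700, 1000):.1e}")
```

The 10-flip result is common enough to shrug off. The 1,000-flip result is so small that "the coin is biased" becomes the only reasonable explanation.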
How Keak Calculates Significance
Keak uses the Sequential Probability Ratio Test (SPRT), a sequential testing method introduced by Abraham Wald in the 1940s and now widely used in online experimentation.
This allows Keak to make real-time decisions on whether your test has a true winner, needs more data, or shows no meaningful difference.
What SPRT Means for You
Traditional A/B testing waits for a fixed number of visitors before deciding.
SPRT continuously analyzes data as it arrives — allowing Keak to end tests early when there's enough evidence.
Benefits:
- Faster insights - Reach a decision up to 70% sooner than with a fixed-sample test
- Controlled accuracy - Keeps error rates at their target levels even if you check results frequently
- Real-time confidence - See updated significance calculations as data arrives
- Detects "no difference" - Stops wasting traffic on inconclusive tests
How Keak Determines a Winner
Behind the scenes, Keak compares two hypotheses:
- H₀ (null): Both versions perform the same
- H₁ (alternative): One version performs at least δ better (your minimum detectable effect)
Each new conversion or impression updates the likelihood ratio, which tells Keak how strongly the data supports one hypothesis over the other.
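Keak's exact likelihood is not spelled out here. As a minimal sketch, assume a simplified one-sample model: each visitor to the variant either converts or not, H₀ says the variant converts at a known baseline rate p0, and H₁ says it converts at p0 + δ. Under that assumption, every visitor adds a fixed amount to the log-likelihood ratio (LLR). The function and the visitor stream below are hypothetical.

```python
import math


def llr_increment(converted: bool, p0: float, delta: float) -> float:
    """Change in the log-likelihood ratio contributed by one visitor.

    Simplified assumption: H0 says the rate is p0, H1 says it is p0 + delta.
    """
    p1 = p0 + delta
    if converted:
        return math.log(p1 / p0)  # positive: a conversion favors H1
    return math.log((1 - p1) / (1 - p0))  # negative: a non-conversion favors H0


# Hypothetical stream of visitors: 1 = converted, 0 = did not convert.
visitors = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
llr = 0.0
for outcome in visitors:
    llr += llr_increment(bool(outcome), p0=0.10, delta=0.02)
print(f"LLR after {len(visitors)} visitors: {llr:.3f}")
```

Conversions push the LLR up and non-conversions nudge it down, so the running total tracks which hypothesis the data favor.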
Decision Thresholds
Keak uses mathematical boundaries to make decisions:
Upper Boundary = ln((1 - β) / α)
Lower Boundary = ln(β / (1 - α))
Where:
- α (Alpha) = 0.05 → 95% confidence level
- β (Beta) = 0.2 → 80% statistical power
- δ (Delta) = 0.02 → detects differences of 2 percentage points or more
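Plugging the default α and β into the two boundary formulas gives the numeric thresholds. A minimal Python sketch (the constant and function names are illustrative, not part of Keak):

```python
import math

ALPHA = 0.05  # false-positive rate -> 95% confidence
BETA = 0.20   # false-negative rate -> 80% power


def sprt_boundaries(alpha: float, beta: float) -> tuple[float, float]:
    """Return Wald's (upper, lower) SPRT boundaries on the log-likelihood scale."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    return upper, lower


upper, lower = sprt_boundaries(ALPHA, BETA)
print(f"upper = {upper:.3f}, lower = {lower:.3f}")  # upper = 2.773, lower = -1.558
```

δ does not appear in the boundaries; it only affects how much each observation moves the log-likelihood ratio.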
When Decisions Are Made
If the log-likelihood ratio (LLR) crosses a boundary:
- Crosses upper boundary → Winner declared (significant difference found)
- Crosses lower boundary → No meaningful difference (futility detected)
- Between boundaries → Continue collecting data
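The three-way rule above can be sketched as a small function that maps the current LLR to the dashboard labels used later in this article. It is an illustration with a hypothetical name, not Keak's code. Treating a value exactly on a boundary as having crossed it is a choice made for the sketch.

```python
import math


def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.20) -> str:
    """Map the current log-likelihood ratio to a dashboard-style status."""
    upper = math.log((1 - beta) / alpha)  # about 2.773 with the defaults
    lower = math.log(beta / (1 - alpha))  # about -1.558 with the defaults
    if llr >= upper:
        return "Significance"  # winner declared
    if llr <= lower:
        return "Futility"      # no meaningful difference
    return "Continue"          # keep collecting data


for value in (3.1, 0.4, -1.7):
    print(value, sprt_decision(value))
# prints:
# 3.1 Significance
# 0.4 Continue
# -1.7 Futility
```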
Understanding Distance to Significance
In your Keak dashboard, you'll see metrics like Distance to Significance and Time Needed.
These estimates tell you how far your test is from reaching a conclusive result.
What "Distance to Significance: 2.4" Means
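Your test's current log-likelihood ratio (LLR) is 2.4 units below the upper (significance) boundary.
If the data keep favoring a real difference, this distance shrinks as more data arrives, and when it reaches 0, Keak declares a winner. If the LLR instead falls to the lower boundary, Keak concludes there is no meaningful difference.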
Time Needed Estimates
Keak calculates estimated time based on:
- Current sample size
- Observed conversion rates
- Test velocity (samples per day)
- Minimum detectable effect (δ)
Example: "Time Needed: 3 days" means Keak estimates 3 more days at current traffic levels to reach significance or futility.
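Keak does not publish its exact estimator here. One common way to turn these inputs into a day count is Wald's expected-drift approximation: estimate how much the LLR moves per visitor if the observed conversion rate is the true rate, then divide the remaining distance to a boundary by that drift times daily traffic. The sketch below assumes a simplified model (baseline p0 under H₀, p0 + δ under H₁, visitors counted for the variant only), and the function name is hypothetical.

```python
import math


def estimate_days_remaining(
    llr: float,
    observed_rate: float,
    p0: float,
    delta: float,
    visitors_per_day: float,
    alpha: float = 0.05,
    beta: float = 0.20,
) -> float:
    """Rough days until the LLR reaches a boundary at current traffic.

    Assumes H0: rate = p0 vs H1: rate = p0 + delta, and that the observed
    conversion rate stays the true rate going forward.
    """
    p1 = p0 + delta
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    # Expected LLR change per visitor if visitors convert at observed_rate.
    drift = (observed_rate * math.log(p1 / p0)
             + (1 - observed_rate) * math.log((1 - p1) / (1 - p0)))
    if drift == 0:
        return math.inf  # no expected movement toward either boundary
    # Positive drift heads toward significance, negative drift toward futility.
    remaining = (upper - llr) if drift > 0 else (llr - lower)
    if remaining <= 0:
        return 0.0  # already at or past the boundary
    return remaining / (abs(drift) * visitors_per_day)


# Hypothetical test: LLR 0.4, 11.5% observed rate, 10% baseline, 500 visitors/day.
days = estimate_days_remaining(0.4, observed_rate=0.115, p0=0.10,
                               delta=0.02, visitors_per_day=500)
print(f"Time Needed: about {days:.1f} days")
```

Because real traffic and conversion rates fluctuate, an estimate like this is best read as a planning guide rather than a deadline.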
Dashboard Terminology
Significance
Your test found a statistically valid winner. The difference between variations is real, not random chance.
What to do: Review the winner and consider implementing it.
Futility
There's likely no meaningful difference between variations worth acting on.
What to do: End the test and try a more impactful change.
Continue
Not enough data yet to make a decision.
What to do: Keep the test running and check back later.
Leading Conclusion
The outcome the test is currently trending toward (significance or futility).
This helps you anticipate the likely outcome before the test completes.
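This article does not define exactly how the leading conclusion is chosen. One plausible reading, offered only as an assumption, is the sign of the current LLR: positive values mean the data so far favor a real difference (H₁), and negative values favor no difference (H₀). The hypothetical function below also reports what fraction of the way the LLR has traveled toward that boundary.

```python
import math


def leading_conclusion(llr: float, alpha: float = 0.05,
                       beta: float = 0.20) -> tuple[str, float]:
    """Return (direction, fraction of the way to that boundary).

    Hypothetical heuristic: the sign of the LLR picks the direction.
    """
    upper = math.log((1 - beta) / alpha)  # significance boundary, > 0
    lower = math.log(beta / (1 - alpha))  # futility boundary, < 0
    if llr > 0:
        return "Significance", min(llr / upper, 1.0)
    if llr < 0:
        return "Futility", min(llr / lower, 1.0)
    return "Undecided", 0.0


print(leading_conclusion(1.2))   # leaning toward significance, about 43% of the way
print(leading_conclusion(-0.9))  # leaning toward futility
```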
Why SPRT is Better
SPRT is well suited to A/B testing because it:
Reduces Sample Size
Needs up to 70% fewer visitors than a traditional fixed-sample test.
Provides Early Stopping
Detect winners faster without inflating error rates.
Maintains Accuracy
Confidence and power stay at their target levels even if you peek at results continuously.
Identifies Equivalence
Determines when two variants are effectively equal, saving you from running inconclusive tests indefinitely.
Gives You Timelines
Real-time estimates of when you'll reach a decision help with planning.
The Math (Simplified)
For those interested in the technical details:
Log-Likelihood Ratio (LLR)
Keak computes how strongly the data supports one variant over another:
LLR = log(Probability of data under H₁ / Probability of data under H₀)
As LLR increases, evidence for a true difference strengthens.
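For illustration, assume a simplified model: a variant with n visitors and x conversions, a baseline conversion rate p₀ under H₀, and p₁ = p₀ + δ under H₁. The binomial coefficients then cancel, and the LLR reduces to the expression below. Keak's full binomial calculation, described in its technical documentation, may differ, for example in how the baseline rate is estimated.

```latex
\mathrm{LLR}
  = \log\frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}
  = x \,\log\frac{p_1}{p_0} + (n - x)\,\log\frac{1 - p_1}{1 - p_0},
  \qquad p_1 = p_0 + \delta
```

Each conversion adds the positive term log(p₁/p₀), and each non-conversion adds the negative term log((1 − p₁)/(1 − p₀)). This is why the LLR rises when the observed rate sits closer to p₁ than to p₀.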
Decision Boundaries
With default parameters (α = 0.05, β = 0.2):
- Upper boundary: ln(0.8 / 0.05) = ln 16 ≈ 2.773 → Declare winner at 95% confidence
- Lower boundary: ln(0.2 / 0.95) ≈ -1.558 → Declare no difference
Distance Calculation
Distance to Significance = Upper Boundary - Current LLR
The smaller this number, the closer you are to declaring a winner.
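The formula translates directly into code. In this minimal sketch, the function name is hypothetical and the upper boundary is computed from the default α and β with Wald's formula.

```python
import math


def distance_to_significance(llr: float, alpha: float = 0.05,
                             beta: float = 0.20) -> float:
    """Upper boundary minus the current log-likelihood ratio."""
    upper = math.log((1 - beta) / alpha)  # about 2.773 with the defaults
    return upper - llr


# A hypothetical test whose LLR is currently 0.37:
print(f"Distance to Significance: {distance_to_significance(0.37):.1f}")
# prints "Distance to Significance: 2.4"
```

A negative result would mean the LLR has already passed the upper boundary.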
Sample Size Recommendations
While SPRT adapts to your data, these minimums help ensure reliable results:
By Conversion Rate
High Conversion (>10%)
- Minimum: 1,000 visitors per variation
- Recommended: 2,500 visitors per variation
Medium Conversion (2-10%)
- Minimum: 2,500 visitors per variation
- Recommended: 5,000 visitors per variation
Low Conversion (<2%)
- Minimum: 5,000 visitors per variation
- Recommended: 10,000+ visitors per variation
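The table above can be encoded as a simple lookup. This is a sketch with a hypothetical function name. Treating exactly 2% and exactly 10% as "medium" is one reading of the "2-10%" range.

```python
def recommended_sample_size(conversion_rate: float) -> tuple[int, int]:
    """Return (minimum, recommended) visitors per variation.

    conversion_rate is a fraction, e.g. 0.05 for 5%.
    """
    if conversion_rate > 0.10:
        return 1_000, 2_500   # high conversion (>10%)
    if conversion_rate >= 0.02:
        return 2_500, 5_000   # medium conversion (2-10%)
    return 5_000, 10_000      # low conversion (<2%); 10,000+ recommended


print(recommended_sample_size(0.035))  # prints (2500, 5000)
```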
Best Practices
Run Full Weeks
Always test for at least one complete week to account for:
- Weekday vs weekend behavior
- Traffic pattern variations
- Different user segments by day
Don't Stop Early
Even if you see promising results:
- Wait for Keak to declare significance
- Ensure minimum sample size is reached
- Cover full business cycles
Consider External Factors
Be aware of factors that might affect results:
- Marketing campaigns
- Seasonal trends
- Site speed changes
- Major news or events
Document Your Tests
Keep records of:
- Test hypothesis and reasoning
- Launch date and duration
- When significance was reached
- Final decision and implementation
When to Trust Your Results
Trust your test results when:
✅ Keak declares statistical significance
✅ Minimum sample size for your conversion rate reached (at least 1,000 visitors per variation)
✅ Test ran for at least one full week
✅ Both weekdays and weekends included
✅ Results align with your hypothesis
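If you track tests in a script or spreadsheet export, the checklist can be expressed as a single check. This sketch uses hypothetical parameter names. Pass the minimum for your conversion rate from the sample size recommendations. Any 7 consecutive days include both weekdays and a weekend, so the week check covers both of those items.

```python
from datetime import date


def results_look_trustworthy(
    status: str,
    visitors_per_variation: list[int],
    minimum_per_variation: int,
    start: date,
    end: date,
    matches_hypothesis: bool,
) -> bool:
    """Apply the trust checklist: every condition must hold."""
    ran_full_week = (end - start).days >= 7  # also guarantees a weekend
    return (
        status == "Significance"
        and min(visitors_per_variation) >= minimum_per_variation
        and ran_full_week
        and matches_hypothesis
    )


print(results_look_trustworthy("Significance", [1200, 1350], 1000,
                               date(2024, 3, 4), date(2024, 3, 11), True))  # True
```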
When to Be Cautious
Be skeptical of results when:
❌ Very low sample size (<500 visitors total)
❌ Test ran less than one week
❌ Major external events occurred during test
❌ Results contradict strong prior evidence
Advanced: Technical Details
For a deeper mathematical explanation of SPRT methodology, see our Technical SPRT Documentation.
This includes:
- Complete mathematical formulas
- Binomial log-likelihood calculations
- Expected sample size derivations
- Baseline rate calculations
- Warmup period methodology
Remember: Good tests take time. Don't rush to conclusions—let the data and SPRT guide you to the right decision.