Understanding Statistical Significance

Learn when your A/B test results are reliable and how Keak calculates statistical significance.

What is Statistical Significance?

Statistical significance tells you whether the difference between your test variations is large enough that random chance alone is an unlikely explanation. It's the key to knowing when you can trust your results and make confident decisions.

Think of it like flipping a coin: If you flip 10 times and get 7 heads, that could easily happen by chance. But if you flip 1,000 times and get 700 heads, now you have strong evidence the coin is biased.

A/B testing works the same way. Small samples can be misleading, but larger samples give you confidence in the winner.
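To put numbers on the coin example, the short sketch below computes how often a perfectly fair coin would produce each result by chance. It uses only Python's standard library, and the function name `prob_at_least` is chosen here for illustration.

```python
from math import comb

def prob_at_least(heads, flips):
    """Probability that a fair coin shows at least `heads` heads in `flips` flips."""
    # Count every outcome with `heads` or more heads, then divide by all outcomes.
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

print(prob_at_least(7, 10))      # 0.171875, roughly 1 in 6
print(prob_at_least(700, 1000))  # vanishingly small
```

Seven heads in ten flips happens about 17% of the time with a fair coin, so it proves little. Seven hundred heads in a thousand flips is so improbable under a fair coin that chance stops being a believable explanation.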

How Keak Calculates Significance

Keak uses Sequential Probability Ratio Testing (SPRT) — the same statistical framework used by modern experimentation teams at leading tech companies.

This allows Keak to make real-time decisions on whether your test has a true winner, needs more data, or shows no meaningful difference.

What SPRT Means for You

Traditional A/B testing waits for a fixed number of visitors before deciding.

SPRT continuously analyzes data as it arrives — allowing Keak to end tests early when there's enough evidence.

Benefits:

  • Faster insights - Get results up to 70% sooner than with a fixed-sample test
  • Controlled accuracy - Keeps error rates at their configured levels even if you check results frequently
  • Real-time confidence - See updated significance calculations as data arrives
  • Detects "no difference" - Stops wasting traffic on inconclusive tests

How Keak Determines a Winner

Behind the scenes, Keak compares two hypotheses:

  • H₀ (null): Both versions perform the same
  • H₁ (alternative): One version performs at least δ better, where δ is your minimum detectable effect

Each new conversion or impression updates the likelihood ratio, which tells Keak how strongly the data supports one hypothesis over the other.

Decision Thresholds

Keak uses mathematical boundaries to make decisions:

Upper Boundary = ln((1 - β) / α)

Lower Boundary = ln(β / (1 - α))

Where:

  • α (Alpha) = 0.05 → 5% false-positive rate (95% confidence level)
  • β (Beta) = 0.2 → 20% false-negative rate (80% statistical power)
  • δ (Delta) = 0.02 → detects differences of 2 percentage points or more
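As a quick check of the two boundary formulas, the sketch below plugs in the default α and β. The function name `sprt_boundaries` is illustrative and is not part of any Keak API.

```python
import math

def sprt_boundaries(alpha=0.05, beta=0.2):
    """Return (upper, lower) decision boundaries on the log-likelihood-ratio scale."""
    upper = math.log((1 - beta) / alpha)   # crossing this declares a winner
    lower = math.log(beta / (1 - alpha))   # crossing this signals futility
    return upper, lower

upper, lower = sprt_boundaries()
print(round(upper, 3), round(lower, 3))  # 2.773 -1.558
```

A smaller α pushes the upper boundary higher, so more evidence is required before a winner is declared. A smaller β pushes the lower boundary further down.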

When Decisions Are Made

Keak compares the log-likelihood ratio against the two boundaries:

  • Crosses upper boundary → Winner declared (significant difference found)
  • Crosses lower boundary → No meaningful difference (futility detected)
  • Between boundaries → Continue collecting data
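The three outcomes above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Keak's implementation: it treats the control's conversion rate `p0` as a known baseline, tests the variant against exactly `p0 + delta`, and the names `sprt_decision` and `run_sprt` are made up for this example.

```python
import math

def sprt_decision(llr, alpha=0.05, beta=0.2):
    """Map a log-likelihood ratio to one of the three dashboard states."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    if llr >= upper:
        return "significance"
    if llr <= lower:
        return "futility"
    return "continue"

def run_sprt(outcomes, p0, delta, alpha=0.05, beta=0.2):
    """Feed variant outcomes (1 = converted, 0 = did not) one visitor at a time."""
    p1 = p0 + delta  # assumed rate under H1
    llr = 0.0
    for n, converted in enumerate(outcomes, start=1):
        # A conversion pushes the ratio up; a non-conversion nudges it down.
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        decision = sprt_decision(llr, alpha, beta)
        if decision != "continue":
            return decision, n, llr
    return "continue", len(outcomes), llr

# An unrealistic streak of conversions, just to show the upper boundary being hit.
print(run_sprt([1] * 20, p0=0.10, delta=0.02)[:2])  # ('significance', 16)
```

Because every visitor moves the ratio a little, the test can stop as soon as either boundary is reached instead of waiting for a preset sample size.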

Understanding Distance to Significance

In your Keak dashboard, you'll see metrics like Distance to Significance and Time Needed.

These estimates tell you how far your test is from reaching a conclusive result.

What "Distance to Significance: 2.4" Means

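Your test's log-likelihood ratio is 2.4 units below the upper decision boundary, the point at which Keak declares a winner.

This distance changes as data arrives: it shrinks when evidence for a winner accumulates and grows when the data points toward no difference. When it reaches 0, Keak declares a winner. If the log-likelihood ratio instead falls to the lower boundary, Keak reports that no meaningful difference exists.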

Time Needed Estimates

Keak calculates estimated time based on:

  • Current sample size
  • Observed conversion rates
  • Test velocity (samples per day)
  • Minimum detectable effect (δ)

Example: "Time Needed: 3 days" means Keak estimates that 3 more days at current traffic levels will be enough to reach significance or futility.
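Keak's exact estimator is not described in this article. One plausible way to combine the inputs listed above is sketched below: it assumes the observed conversion rate persists, works out the average change in the log-likelihood ratio per visitor, and divides the remaining distance by that drift. Every name here (`estimate_days_remaining`, `p0`, `observed_rate`, `visitors_per_day`) is hypothetical.

```python
import math

def estimate_days_remaining(llr, p0, delta, observed_rate, visitors_per_day,
                            alpha=0.05, beta=0.2):
    """Rough days until the LLR reaches a boundary (illustrative heuristic only)."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    p1 = p0 + delta
    # Average LLR change per variant visitor if the observed rate holds.
    drift = (observed_rate * math.log(p1 / p0)
             + (1 - observed_rate) * math.log((1 - p1) / (1 - p0)))
    if drift == 0:
        return math.inf  # no trend, so no boundary is expected to be reached
    # Positive drift heads toward significance, negative drift toward futility.
    target = upper if drift > 0 else lower
    visitors_needed = max(0.0, (target - llr) / drift)
    return visitors_needed / visitors_per_day

days = estimate_days_remaining(llr=0.4, p0=0.10, delta=0.02,
                               observed_rate=0.125, visitors_per_day=250)
print(round(days, 1))  # about 3 days with these made-up inputs
```

An estimate like this is only as stable as the observed rate, which is why the figure in the dashboard can move from day to day early in a test.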

Dashboard Terminology

Significance

Your test found a statistically valid winner. The difference between variations is very unlikely to be the result of random chance.

What to do: Review the winner and consider implementing it.

Futility

There's likely no meaningful difference between variations worth acting on.

What to do: End the test and try a more impactful change.

Continue

Not enough data yet to make a decision.

What to do: Keep the test running and check back later.

Leading Conclusion

The outcome the test is currently trending toward (significance or futility).

This helps you anticipate the likely outcome before the test completes.

Why SPRT is Better

SPRT is optimal for A/B testing because it:

Reduces Sample Size

Uses up to 70% fewer visitors than a traditional fixed-sample test.

Provides Early Stopping

Detect winners faster without inflating error rates.

Maintains Accuracy

Keeps confidence and power at their configured levels even if you peek at results continuously.

Identifies Equivalence

Determines when two variants are effectively equal, saving you from running inconclusive tests indefinitely.

Gives You Timelines

Real-time estimates of when you'll reach a decision help with planning.

The Math (Simplified)

For those interested in the technical details:

Log-Likelihood Ratio (LLR)

Keak computes how strongly the data supports one variant over another:

LLR = ln(Probability of data under H₁ / Probability of data under H₀)

As LLR increases, evidence for a true difference strengthens.
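For conversion data, the ratio can be computed from two counts. The sketch below is one common binomial form and rests on assumptions not confirmed in this article: the control's rate `p0` is treated as a fixed baseline and H₁ is taken to be exactly `p0 + delta`. The function name `binomial_llr` is illustrative.

```python
import math

def binomial_llr(conversions, visitors, p0, delta):
    """Log-likelihood ratio of H1 (rate p0 + delta) versus H0 (rate p0)."""
    p1 = p0 + delta
    misses = visitors - conversions
    # Conversions are more likely under H1, non-conversions more likely under H0.
    return (conversions * math.log(p1 / p0)
            + misses * math.log((1 - p1) / (1 - p0)))

# Hypothetical variant: 125 conversions from 1,000 visitors, 10% baseline, delta = 0.02.
print(round(binomial_llr(125, 1000, p0=0.10, delta=0.02), 2))  # 3.13
```

A variant converting at the baseline rate drives the value down instead: 100 conversions from 1,000 visitors with the same settings gives roughly -1.99.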

Decision Boundaries

With default parameters (α = 0.05, β = 0.2):

  • Upper boundary: ln(0.8 / 0.05) ≈ 2.773 → Declare winner at 95% confidence
  • Lower boundary: ln(0.2 / 0.95) ≈ -1.558 → Declare no difference

Distance Calculation

Distance to Significance = Upper Boundary - Current LLR

The smaller this number, the closer you are to declaring a winner.
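The distance formula is a single subtraction, shown here with the default α and β. The companion function for the lower boundary is an addition made for illustration and is not a metric named in this article.

```python
import math

ALPHA, BETA = 0.05, 0.2
UPPER = math.log((1 - BETA) / ALPHA)   # about 2.773
LOWER = math.log(BETA / (1 - ALPHA))   # about -1.558

def distance_to_significance(llr):
    """How far the current LLR sits below the upper boundary."""
    return UPPER - llr

def distance_to_futility(llr):
    """How far the current LLR sits above the lower boundary (illustrative)."""
    return llr - LOWER

# With these defaults, a dashboard reading of 2.4 corresponds to an LLR near 0.37.
print(round(distance_to_significance(0.37), 1))  # 2.4
```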

Sample Size Recommendations

While SPRT adapts to your data, these minimums help ensure reliable results:

By Conversion Rate

High Conversion (>10%)

  • Minimum: 1,000 visitors per variation
  • Recommended: 2,500 visitors per variation

Medium Conversion (2-10%)

  • Minimum: 2,500 visitors per variation
  • Recommended: 5,000 visitors per variation

Low Conversion (<2%)

  • Minimum: 5,000 visitors per variation
  • Recommended: 10,000+ visitors per variation

Best Practices

Run Full Weeks

Always test for at least one complete week to account for:

  • Weekday vs weekend behavior
  • Traffic pattern variations
  • Different user segments by day

Don't Stop Early

Even if you see promising results, let Keak make the stopping decision instead of ending the test by hand:

  • Wait for Keak to declare significance
  • Ensure minimum sample size is reached
  • Cover full business cycles

Consider External Factors

Be aware of factors that might affect results:

  • Marketing campaigns
  • Seasonal trends
  • Site speed changes
  • Major news or events

Document Your Tests

Keep records of:

  • Test hypothesis and reasoning
  • Launch date and duration
  • When significance was reached
  • Final decision and implementation

When to Trust Your Results

Trust your test results when:

✅ Keak declares statistical significance

✅ Minimum sample size reached (at least 1,000 visitors per variation, more for lower conversion rates)

✅ Test ran for at least one full week

✅ Both weekdays and weekends included

✅ Results align with your hypothesis

When to Be Cautious

Be skeptical of results when:

❌ Very low sample size (<500 visitors total)

❌ Test ran less than one week

❌ Major external events occurred during test

❌ Results contradict strong prior evidence

Advanced: Technical Details

For a deeper mathematical explanation of SPRT methodology, see our Technical SPRT Documentation.

This includes:

  • Complete mathematical formulas
  • Binomial log-likelihood calculations
  • Expected sample size derivations
  • Baseline rate calculations
  • Warmup period methodology

Remember: Good tests take time. Don't rush to conclusions—let the data and SPRT guide you to the right decision.