1. Why the usual interval breaks here
The textbook 95% confidence interval for a mean is
x̄ ± t(n−1, .975) · s / √n
where x̄ is the sample mean, s the sample standard deviation, n the sample size, and t the Student-t critical value. The interval assumes the sample mean is (at least approximately) normally distributed. The Central Limit Theorem (CLT) delivers that only when n is large; at small n it holds only if the data themselves are close to normal. With strong right-skew and n < 30 the CLT has not arrived: the interval is forced symmetric, can dip below zero for a positive quantity, and typically covers the true mean less often than the nominal 95%. It is a 95% interval in name only.
The multiplier, precisely. Because the interval is two-sided, α = 0.05 is split into 0.025 per tail, so t(n−1, .975) is the 0.975 quantile — the value leaving 2.5% in the upper tail — of the t-distribution on n−1 degrees of freedom. We use t rather than z precisely because s is an estimate of σ; that added uncertainty is what gives t its heavier tails, and t → z as n grows. Here s is the sample SD computed with the (n−1) denominator (Bessel's correction) and the standard error is s/√n.
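As a minimal sketch of the textbook formula above, the following computes the naive interval for the Example A data used later in this document. The function name naive_t_interval is hypothetical, and the critical value is passed in from a t-table rather than computed, because the Python standard library has no t-quantile function.

```python
import math
import statistics

def naive_t_interval(x, t_crit):
    """Textbook interval: mean +/- t_crit * s / sqrt(n).

    t_crit is the 0.975 quantile of t on n-1 degrees of freedom,
    supplied by the caller (for example from a printed table).
    """
    n = len(x)
    mean = statistics.mean(x)
    s = statistics.stdev(x)            # sample SD with the (n-1) denominator
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]      # Example A data
lo, hi = naive_t_interval(x, t_crit=2.447)     # t(6, .975) as quoted in Example A
print(f"naive 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The result is symmetric about the mean by construction, which is exactly the limitation this section describes.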
If you have seen the same interval written with z and s/√(N−1), it is not in conflict. The z form is the large-sample approximation to the t form, and the √(N−1) denominator is simply the algebraic equivalent that arises when s is defined with N in the denominator instead of N−1. The two standard errors are mathematically identical:
s_N / √(N−1) = s_(N−1) / √N   (the subscript marks the denominator used inside s)
So the two notations agree; the more general t form is the one that matters at small n — exactly the regime this calculator is built for.
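The equivalence of the two standard-error forms can be checked numerically. This is a small sketch using the Example A data from later in the document; statistics.pstdev uses the N denominator and statistics.stdev uses N−1, and the variable names are illustrative only.

```python
import math
import statistics

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]   # Example A data
N = len(x)

# SD defined with N in the denominator, divided by sqrt(N-1)
se_n_form = statistics.pstdev(x) / math.sqrt(N - 1)

# SD defined with N-1 in the denominator (Bessel), divided by sqrt(N)
se_bessel_form = statistics.stdev(x) / math.sqrt(N)

print(se_n_form, se_bessel_form)
print(math.isclose(se_n_form, se_bessel_form, rel_tol=1e-12))
```

Both expressions reduce to the square root of the sum of squared deviations divided by N(N−1), so they differ only by floating-point rounding.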
2. The fix: change the scale
Many positive, skewed quantities (concentrations, costs, durations, biomarkers) are log-normal: their logarithm is normal. If y = ln(x) is exactly normal, a t-interval on the y-values is valid at any sample size — normality is now exact, not approximate. Four steps:
1) yi = ln(xi)
2) compute ȳ, sy
3) ȳ ± t(n−1,.975) · sy/√n
4) apply exp( · )
The back-transformed interval is asymmetric in the original units — the upper arm reaches further, as skew demands.
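The four steps can be sketched as follows. This is an illustration under the log-normal assumption, not the tool's own code: geometric_mean_interval is a hypothetical name and the critical value is again supplied by the caller.

```python
import math
import statistics

def geometric_mean_interval(x, t_crit):
    """Log-scale t-interval, back-transformed to original units.

    Returns (geometric mean, lower limit, upper limit).
    Requires every value in x to be strictly positive.
    """
    y = [math.log(v) for v in x]               # step 1: take logs
    n = len(y)
    y_bar = statistics.mean(y)                 # step 2: log-scale mean and SD
    s_y = statistics.stdev(y)
    half_width = t_crit * s_y / math.sqrt(n)   # step 3: t-interval on the logs
    return (math.exp(y_bar),                   # step 4: back-transform
            math.exp(y_bar - half_width),
            math.exp(y_bar + half_width))

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]      # Example A data
gm, lo, hi = geometric_mean_interval(x, t_crit=2.447)
print(f"geometric mean {gm:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
print(f"lower arm {gm - lo:.2f}, upper arm {hi - gm:.2f}")
```

The interval is symmetric on the log scale, so after exponentiation the upper arm is the longer one.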
3. Which "average" are you bounding? (the key subtlety)
Back-transforming the log-mean does not give the arithmetic mean. exp(ȳ) is the geometric mean, which for a log-normal equals the median. Two different targets:
Median / geometric mean: exp( ȳ ± t · sy/√n )
Arithmetic mean (Cox): exp( ȳ + sy²/2 ± t · √( sy²/n + sy⁴ / [2(n−1)] ) )
The arithmetic mean of a log-normal is exp(μ + σ²/2), so it needs the extra sy²/2 term (Land's method is the exact version). Reporting the back-transformed median as the mean is the most common error — this tool shows both, labelled.
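A minimal sketch of the Cox-style interval for the arithmetic mean, following the formula above term by term. The name cox_mean_interval is hypothetical, and the t critical value is supplied by the caller; the result agrees with the worked example in section 7 up to rounding of that critical value.

```python
import math
import statistics

def cox_mean_interval(x, t_crit):
    """Interval for the arithmetic mean of log-normal data (Cox form with t).

    Centre:  y_bar + s_y^2 / 2
    SE:      sqrt( s_y^2 / n + s_y^4 / (2 (n - 1)) )
    Returns (point estimate, lower limit, upper limit) in original units.
    """
    y = [math.log(v) for v in x]
    n = len(y)
    y_bar = statistics.mean(y)
    var_y = statistics.variance(y)             # s_y^2 with the (n-1) denominator
    centre = y_bar + var_y / 2.0
    se = math.sqrt(var_y / n + var_y ** 2 / (2.0 * (n - 1)))
    return (math.exp(centre),
            math.exp(centre - t_crit * se),
            math.exp(centre + t_crit * se))

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]      # Example A data
est, lo, hi = cox_mean_interval(x, t_crit=2.447)
print(f"arithmetic mean estimate {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The second term under the square root accounts for the uncertainty in sy² itself, which the median interval does not need.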
4. Where the coefficient of variation comes in (the screen)
CV = s / x̄ is a unit-free measure of relative spread. For a log-normal there is an exact bridge to the log-scale variance:
σ² = ln( 1 + CV² )
A large CV ⇒ large log-scale spread ⇒ strong skew, which is why the CV is the right trigger. It also gives a free honesty check: estimate the log variance directly as sy² and from the CV as ln(1+CV²), and compare. A big gap warns the data are not log-normal and the back-transformed interval should not be trusted.
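The honesty check can be sketched in a few lines. The function name lognormal_check is hypothetical, and the sketch only reports the two estimates; how large a gap counts as "big" is a judgement this document does not quantify.

```python
import math
import statistics

def lognormal_check(x):
    """Return (CV, log variance estimated directly, log variance implied by CV)."""
    cv = statistics.stdev(x) / statistics.mean(x)
    var_direct = statistics.variance([math.log(v) for v in x])   # s_y^2
    var_from_cv = math.log(1.0 + cv ** 2)                        # ln(1 + CV^2)
    return cv, var_direct, var_from_cv

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]      # Example A data
cv, var_direct, var_from_cv = lognormal_check(x)
print(f"CV = {cv:.2f}")
print(f"s_y^2 = {var_direct:.3f}  vs  ln(1 + CV^2) = {var_from_cv:.3f}")
```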
A gamma companion (the three-way check). The same CV implies a second reference value under the gamma model: ψ′(1/CV²), the trigamma function evaluated at the gamma shape 1/CV². The two references straddle CV²: log-normal ln(1+CV²) sits just below it and gamma ψ′(1/CV²) just above (they agree to order CV² and split at order CV⁴). Comparing sy² to both makes the check three-way:
sy² nearer ln(1+CV²) → log-normal (log-scale t)
sy² nearer ψ′(1/CV²) → gamma (a gamma GLM with a log link, no back-transform bias)
sy² outside both → heavier or lighter tail, use the bootstrap
The eq-7 line reports the verdict. (See the Polygamma Bridge derivation.)
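The two reference values can be computed without external libraries. The sketch below is an assumption-laden illustration: trigamma is a hand-rolled approximation (upward recurrence followed by the standard asymptotic series), three_way_references is a hypothetical name, and the sketch only reports which reference is nearer. It does not implement an "outside both" verdict, because the tolerance the tool uses for that is not stated here.

```python
import math
import statistics

def trigamma(x):
    """Approximate psi'(x) for x > 0."""
    total = 0.0
    while x < 6.0:                     # recurrence: psi'(x) = psi'(x + 1) + 1/x^2
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # asymptotic series: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    tail = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return total + inv + inv2 / 2.0 + tail

def three_way_references(x):
    """Return s_y^2, CV^2, both reference values, and the nearer model."""
    cv2 = (statistics.stdev(x) / statistics.mean(x)) ** 2
    s_y2 = statistics.variance([math.log(v) for v in x])
    ln_ref = math.log1p(cv2)               # log-normal reference, ln(1 + CV^2)
    gamma_ref = trigamma(1.0 / cv2)        # gamma reference, psi'(1 / CV^2)
    nearer = "log-normal" if abs(s_y2 - ln_ref) <= abs(s_y2 - gamma_ref) else "gamma"
    return {"s_y2": s_y2, "cv2": cv2, "ln_ref": ln_ref,
            "gamma_ref": gamma_ref, "nearer": nearer}

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]  # Example A data
result = three_way_references(x)
for key, value in result.items():
    print(key, value)
```

For any CV the log-normal reference falls below CV² and the gamma reference above it, which is the straddling property described above.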
5. Heteroscedasticity and comparing two groups
A constant CV means the SD grows with the mean (SD ∝ mean) — heteroscedasticity. The log transform stabilises exactly that. For two groups, compare log-scale variances with
ρ = ln(1 + CV₂²) / ln(1 + CV₁²)
ρ ≈ 1 → equal log spread → pooled (Student) t on the logs; ρ far from 1 with unequal group sizes → Welch t on the logs. The result is a ratio of geometric means, exp( (ȳ₂ − ȳ₁) ± t · SE ) — for skewed data a ratio is meaningful where a difference is not.
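A sketch of the two-group comparison on the log scale. All function names are hypothetical. The second group is made-up illustration data (Example A doubled), chosen so the expected answers are obvious: a geometric-mean ratio of 2 and ρ = 1. The critical value is supplied by the caller and must match the degrees of freedom of the chosen test.

```python
import math
import statistics

def rho_from_cv(x1, x2):
    """Ratio of CV-implied log variances, ln(1 + CV2^2) / ln(1 + CV1^2)."""
    cv1 = statistics.stdev(x1) / statistics.mean(x1)
    cv2 = statistics.stdev(x2) / statistics.mean(x2)
    return math.log1p(cv2 ** 2) / math.log1p(cv1 ** 2)

def log_scale_contrast(x1, x2):
    """Difference of log-means with pooled and Welch standard errors."""
    y1 = [math.log(v) for v in x1]
    y2 = [math.log(v) for v in x2]
    n1, n2 = len(y1), len(y2)
    v1, v2 = statistics.variance(y1), statistics.variance(y2)
    diff = statistics.mean(y2) - statistics.mean(y1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    se_welch = math.sqrt(v1 / n1 + v2 / n2)
    # Welch-Satterthwaite degrees of freedom
    df_welch = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return {"diff": diff, "se_pooled": se_pooled, "df_pooled": n1 + n2 - 2,
            "se_welch": se_welch, "df_welch": df_welch}

def ratio_interval(diff, se, t_crit):
    """Back-transform diff +/- t_crit * se into a ratio of geometric means."""
    return math.exp(diff), math.exp(diff - t_crit * se), math.exp(diff + t_crit * se)

group1 = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]   # Example A data
group2 = [2.0 * v for v in group1]               # hypothetical second group
rho = rho_from_cv(group1, group2)
c = log_scale_contrast(group1, group2)
# rho is 1 here, so use the pooled form; t(12, .975) = 2.179 from a table
ratio, lo, hi = ratio_interval(c["diff"], c["se_pooled"], t_crit=2.179)
print(f"rho = {rho:.3f}, ratio = {ratio:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

When ρ is far from 1, the same ratio_interval call would take se_welch and a critical value for df_welch instead.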
5b. The gamma rescue, and reading either rescue in the natural domain
If the eq-7 check points to gamma rather than log-normal, the logarithm no longer normalizes the data, so the matched rescue is not the log-scale t but a gamma generalized linear model with a log link. It models the mean directly, so the contrast is a ratio of arithmetic means, exp(β), with no back-transformation bias. Its variance, Var(x) = φμ² (variance function V(μ) = μ² scaled by the dispersion φ), is exactly the mean-proportional spread the CV screen detected. Inference is by analysis of deviance (an F-test).
Carrying either rescue back to the natural domain. Each rescue is computed where the model is honest, then returned to original units:
p-value: tested on the log / link scale, but H₀ “ratio of means = 1” ⇔ “difference of log-means = 0” — so the rescued p is the natural-domain p (no transform).
interval: formed on the log / link scale, then exp( · ) → an asymmetric natural-domain ratio. Log-normal → ratio of geometric means; gamma → ratio of arithmetic means.
Example G (real data — serum bilirubin by ascites) shows both side by side: the naive raw-scale t gives p ≈ 5×10⁻⁴; the log-normal rescue returns a geometric-mean ratio 3.45 (95% CI 2.11–5.64, p ≈ 2×10⁻⁵), and the gamma rescue an arithmetic-mean ratio 3.33 (95% CI 1.90–5.81, p ≈ 9×10⁻⁷). Same data, two honest natural-domain effects — a median ratio and a mean ratio — each sharper than the naive interval.
6. When NOT to use the log route
If any value is zero or negative, ln() is undefined and the log route is unavailable. If the data are skewed but fit neither the log-normal nor the gamma reference (sy² matches neither ln(1+CV²) nor ψ′(1/CV²), or the tail is heavier than log-normal), use a shape-free method: a bias-corrected and accelerated (BCa) bootstrap, or a distribution-free interval, which is typically wider. You trade precision for honesty.
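For the fallback, a BCa bootstrap for the mean can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not the tool's code: bca_interval is a hypothetical name, the resample count and seed are arbitrary choices, the acceleration uses the usual jackknife formula, and percentiles are read off by simple index truncation.

```python
import random
import statistics
from statistics import NormalDist

def bca_interval(x, stat=statistics.mean, n_boot=5000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap interval for stat(x)."""
    rng = random.Random(seed)              # fixed seed so the sketch is repeatable
    x = list(x)
    n = len(x)
    theta_hat = stat(x)
    boot = sorted(stat(rng.choices(x, k=n)) for _ in range(n_boot))
    nd = NormalDist()

    # Bias correction z0: how far the bootstrap distribution sits from theta_hat
    prop = sum(b < theta_hat for b in boot) / n_boot
    prop = min(max(prop, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1.0))
    z0 = nd.inv_cdf(prop)

    # Acceleration a from leave-one-out (jackknife) estimates
    jack = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    jack_mean = statistics.mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_level(p):
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    def pick(p):
        return boot[min(n_boot - 1, max(0, int(p * n_boot)))]

    return pick(adjusted_level(alpha / 2.0)), pick(adjusted_level(1.0 - alpha / 2.0))

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]  # Example A data
lo, hi = bca_interval(x)
print(f"BCa 95% CI for the mean: [{lo:.2f}, {hi:.2f}]")
```

With only seven observations the bootstrap distribution is coarse, so the interval should be read as a rough guide rather than a precise result.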
7. Worked example (Example A, n = 7)
Data: 4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0. t(6,.975) = 2.447.
Natural: mean 11.09, SD 9.41, CV 0.85, skew 2.0 → naive CI 11.09 ± 2.447×9.41/√7 = [2.38, 19.79] (symmetric).
Logs: ȳ = 2.173, sy = 0.690. Geometric mean exp(2.173)=8.79 → CI [4.64, 16.63] (asymmetric).
Arithmetic mean (Cox): exp(2.173+0.690²/2)=11.15 → CI [5.42, 22.91].
Check: sy² = 0.475 vs ln(1+0.85²) = 0.543 — close, so log-normal is reasonable.
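The natural-scale summary line of the worked example can be reproduced as below. One assumption is flagged: the document does not say which skewness estimator it uses, and this sketch uses the bias-adjusted sample skewness, which matches the quoted value of 2.0 (the unadjusted moment ratio is smaller, about 1.55). The name natural_summary is hypothetical.

```python
import math
import statistics

def natural_summary(x):
    """Return (mean, SD, CV, adjusted sample skewness) on the original scale."""
    n = len(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)
    cv = sd / mean
    m2 = sum((v - mean) ** 2 for v in x) / n       # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n       # third central moment
    g1 = m3 / m2 ** 1.5                            # unadjusted moment ratio
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)   # assumed small-sample adjustment
    return mean, sd, cv, skew

x = [4.2, 5.1, 6.0, 7.3, 9.8, 14.2, 31.0]          # Example A data
mean, sd, cv, skew = natural_summary(x)
print(f"mean {mean:.2f}, SD {sd:.2f}, CV {cv:.2f}, skew {skew:.1f}")
```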
In one sentence: small-N accuracy for skewed data is bought from a distributional assumption (log-normality) instead of from sample size (the CLT) — so the tool checks that assumption and tells you to fall back to a bootstrap when it fails.