last updated Mar 19, 2025
AlphaMart (aka ALPHA) is a risk-measuring function that adapts to the sample as it is drawn: before the jth card is drawn, it estimates the reported winner's share of the vote from the j−1 cards already in the sample. The estimator can be any measurable function of the first j−1 draws, for example the simple truncated shrinkage estimate described below. ALPHA generalizes BRAVO to situations where the population {xj} is not necessarily binary, but merely nonnegative and bounded. ALPHA works for sampling with or without replacement, with or without weights, while BRAVO is specifically for IID sampling with replacement.
θ true population mean
Xk the kth random sample drawn from the population.
X^j (X1, ..., Xj), the sequence of the first j samples.
µj E(Xj | X^{j−1}), computed under the null hypothesis that θ = 1/2:
"the expected value of the next sample's assorter value Xj under the null hypothesis".
With replacement, it is always 1/2.
Without replacement, it is the value that brings the mean of the remaining population to 1/2.
η0 an estimate of the true mean, made before any sampling.
ηj an estimate of the true mean, computed from X^{j−1} (not using Xj); it plays the role of the alternative-hypothesis value for draw j.
This is the "estimator function".
Let ηj = ηj (X^j−1 ), j = 1, . . ., be a "predictable sequence": ηj may depend on X^j−1, but not on Xk for k ≥ j.
Tj the ALPHA nonnegative supermartingale (Tj)_{j∈N}, starting at 1:
E(Tj | X^{j−1}) = T_{j−1}, under the null hypothesis that θj = µj (7)
E(Tj | X^{j−1}) < T_{j−1}, if θj < µj (8)
P{∃j : Tj ≥ 1/α} ≤ α, if θ < µ (9) (follows from Ville's inequality)
BRAVO is based on Wald’s sequential probability ratio test (SPRT) of the simple hypothesis θ = µ against a simple alternative θ = η from IID Bernoulli(θ) observations.
ALPHA is a simple adaptive extension of BRAVO. It is motivated by the SPRT for the Bernoulli and its optimality when the simple alternative is true.
BRAVO is ALPHA with the following restrictions: the sample is drawn with replacement (IID Bernoulli), the data are binary (each card is a vote for the reported winner or the reported loser), and ηj is fixed at the reported vote share rather than estimated from the data as the audit proceeds.
BRAVO: the probability of drawing y if θ = n, over the probability of drawing y if θ = m:
y*(n/m) + (1-y)*((1-n)/(1-m))
This makes sense because y is 0 or 1, so one term or the other vanishes.
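As a sanity check, the BRAVO factor is trivial to compute; a minimal Kotlin sketch (the function name is mine, not from any library):

```kotlin
// One BRAVO factor: P(y | theta = n) / P(y | theta = m) for a Bernoulli draw.
// Since y is 0 or 1, exactly one of the two terms survives.
fun bravoTerm(y: Int, n: Double, m: Double): Double =
    y * (n / m) + (1 - y) * ((1 - n) / (1 - m))
```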
Alpha:
Step 1: Replace discrete y with continuous x:
(1a) x*(n/m) + (1-x)*(1-n)/(1-m)
It's not obvious what this expression is now. Is it still a probability ratio? Should you instead use
(1b) (x*n + (1-x)*(1-n)) / (x*m + (1-x)*(1-m))
? It turns out these are the same when m = 1/2, since then m = (1-m) = 1/2 and
(1a) x*(n/m) + (1-x)*(1-n)/(1-m) = (x*n + (1-x)*(1-n)) / (1/2)
(1b denominator) x*m + (1-x)*(1-m) = (x + (1-x))/2 = 1/2
so both reduce to
(1c) (x*n + (1-x)*(1-n)) / (1/2)
(I think they are not identical in the general case where m != 1/2 and m != n.)
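A quick numeric check of that claim (a throwaway sketch; names and values are mine):

```kotlin
fun term1a(x: Double, n: Double, m: Double): Double =
    x * (n / m) + (1 - x) * (1 - n) / (1 - m)

fun term1b(x: Double, n: Double, m: Double): Double =
    (x * n + (1 - x) * (1 - n)) / (x * m + (1 - x) * (1 - m))

fun main() {
    println(term1a(0.7, 0.55, 0.5)) // 1.04
    println(term1b(0.7, 0.55, 0.5)) // 1.04 -- identical when m = 1/2
    println(term1a(0.7, 0.55, 0.4)) // ~1.1875
    println(term1b(0.7, 0.55, 0.4)) // ~1.1304 -- different when m != 1/2
}
```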
Step 2: Generalize range [0,1] to [0,upper]
I think you need to replace x with x/u, (1-x) with (u-x)/u, n with n/u, etc
So then (1a) becomes
(2a) (x/u)*(n/m) + ((u-x)/u)*((u-n)/(u-m))
= (x*(n/m) + (u-x)*(u-n)/(u-m)) / u
But (1b) becomes
(2b) ((x/u)*(n/u) + ((u-x)/u)*((u-n)/u)) / ((x/u)*(m/u) + ((u-x)/u)*((u-m)/u))
= (x*n + (u-x)*(u-n)) / (x*m + (u-x)*(u-m))
Note the lack of division by u: every term has a u*u denominator, so they all cancel. When m = u/2 (the image of the old null m = 1/2 under this scaling), the denominator is x*(u/2) + (u-x)*(u/2) = u^2/2, so (2b) becomes
(2c) (x*n + (u-x)*(u-n)) / (u^2/2)
which agrees with (2a) at m = u/2.
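The same kind of numeric check for the [0,u] versions (sketch; names mine). With u = 1 these reduce to (1a) and (1b), and at the null midpoint m = u/2 the two forms agree:

```kotlin
fun term2a(x: Double, n: Double, m: Double, u: Double): Double =
    (x * (n / m) + (u - x) * (u - n) / (u - m)) / u

fun term2b(x: Double, n: Double, m: Double, u: Double): Double =
    (x * n + (u - x) * (u - n)) / (x * m + (u - x) * (u - m))

fun main() {
    val u = 2.0
    println(term2a(1.4, 1.1, u / 2, u)) // 1.04 -- agree at the null midpoint m = u/2 ...
    println(term2b(1.4, 1.1, u / 2, u)) // 1.04
    println(term2a(1.4, 1.1, 0.8, u))   // ~1.1875 -- ... but differ for general m
    println(term2b(1.4, 1.1, 0.8, u))   // ~1.1304
}
```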
Step 3: Use estimated nj instead of fixed n, and mj instead of fixed m = 1/2, when sampling without replacement:
(3a) (xj*(nj/mj) + (u-xj)*(u-nj)/(u-mj)) / u
(3b) (xj*nj + (u-xj)*(u-nj)) / (xj*mj + (u-xj)*(u-mj))
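(3a) is the per-draw factor of the ALPHA martingale; the test statistic T_j is the running product of these factors. A direct transcription (sketch, names mine):

```kotlin
// One ALPHA factor (3a): the running product of these is the supermartingale T_j.
fun alphaTerm(xj: Double, etaj: Double, muj: Double, u: Double): Double =
    (xj * (etaj / muj) + (u - xj) * (u - etaj) / (u - muj)) / u
```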
We need E(Xj | X^{j−1}), computed under the null hypothesis that θ = µ = 1/2.
Sampling with replacement means this value is always µ = 1/2.
For sampling without replacement from a population with mean µ, after draw j−1 the mean of the remaining numbers is
(N*µ − Sum(X^{j−1})) / (N − j + 1).
If this ever becomes less than zero, the null hypothesis is certainly false. If the true mean is greater than 1/2 and we are allowed to sample all N values without replacement, this value eventually becomes negative.
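Putting the pieces together, a minimal sketch of the without-replacement loop, using alphaTerm from above. estimateEta stands in for whatever estimator function is chosen, and edge cases (muj = 0, muj ≥ u) are ignored:

```kotlin
// Returns true if the null (population mean <= 1/2) is rejected at risk limit alpha.
fun alphaMart(x: List<Double>, N: Int, u: Double, alpha: Double,
              estimateEta: (sampleSum: Double, j: Int) -> Double): Boolean {
    var t = 1.0    // the supermartingale T_j, starting at 1
    var sum = 0.0  // Sum(X^{j-1})
    for ((idx, xj) in x.withIndex()) {
        val j = idx + 1
        val muj = (N * 0.5 - sum) / (N - j + 1)  // conditional null mean after j-1 draws
        if (muj < 0.0) return true               // null is certainly false
        t *= alphaTerm(xj, estimateEta(sum, j), muj, u)
        if (t >= 1.0 / alpha) return true        // reject via Ville's inequality (9)
        sum += xj
    }
    return false
}
```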
The only settable parameter of the TruncShrink estimator function is d, which sets the weighting between the initial guess at the population mean (eta0) and the running mean of the sampled data:
estTheta_i = (d*eta0 + sampleSum_i) / (d + sampleSize_i)
This trades off smaller sample sizes when theta = eta0 (large d) against adapting quickly when theta < eta0 (small d).
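A sketch of just this d-weighted estimate (truncation omitted; names mine):

```kotlin
// Shrinkage estimate after sampleSize draws: weighted average of the prior guess
// eta0 (weight d, in units of observations) and the running sample mean.
fun estTheta(eta0: Double, d: Int, sampleSum: Double, sampleSize: Int): Double =
    (d * eta0 + sampleSum) / (d + sampleSize)
```

For example, with eta0 = 0.55 and d = 100, the first few draws barely move the estimate; with d = 10, the sample dominates after a handful of draws.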
See section 2.5.2 of ALPHA for a function with parameters eta0, c and d.
See SHANGRLA shrink_trunc() in NonnegMean.py for an updated version with additional parameter f.
The sample mean is shrunk towards eta, with relative weight d compared to a single observation;
then that combination is shrunk towards u, with relative weight f/stdev(x).
The result is truncated above at u*(1-eps) and below at m_j + e_j(c, j).
Shrinking towards eta stabilizes the sample mean as an estimate of the population mean.
Shrinking towards u takes advantage of low-variance samples to grow the test statistic more rapidly.
// Choosing eps_i. To allow the estimated winner's share η_i to approach µ_i as the sample grows
// (if the sample mean approaches µ_i or less), we shall take eps_i := c/sqrt(d + i − 1) for a nonnegative constant c,
// for instance c = (η0 − µ)/2.
// The estimate η_i is thus the sample mean, shrunk towards η0 and truncated to the interval [µ_i + eps_i, 1), where eps_i → 0 as the sample size grows.
val weighted = ((d * eta0 + sampleSum) / (d + lastj - 1) + u * f / sdj3) / (1 + f / sdj3) // shrink towards eta0, then towards u
val npmax = max(weighted, mean2 + c / sqrt((d + lastj - 1).toDouble())) // truncate below at mu_j + c/sqrt(d + j − 1); 2.5.2 "choosing eps_i"
return min(u * (1 - eps), npmax) // truncate above at u*(1 - eps)
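For reference, the same computation as a self-contained function: a sketch that just packages the snippet above. Parameter names follow the snippet; sdj3 (the standard deviation estimate from the first lastj−1 draws) is passed in rather than computed:

```kotlin
import kotlin.math.max
import kotlin.math.min
import kotlin.math.sqrt

// shrink_trunc-style estimate of eta_j for draw lastj, per ALPHA section 2.5.2.
// sampleSum = sum of the first lastj-1 draws; mean2 = mu_j, the conditional null mean.
fun shrinkTrunc(eta0: Double, d: Int, f: Double, c: Double, u: Double, eps: Double,
                sampleSum: Double, lastj: Int, sdj3: Double, mean2: Double): Double {
    // shrink the sample mean towards eta0 (weight d), then towards u (weight f/sdj3)
    val weighted = ((d * eta0 + sampleSum) / (d + lastj - 1) + u * f / sdj3) / (1 + f / sdj3)
    // truncate below at mu_j + c/sqrt(d + j - 1) ...
    val npmax = max(weighted, mean2 + c / sqrt((d + lastj - 1).toDouble()))
    // ... and above at u*(1 - eps)
    return min(u * (1 - eps), npmax)
}
```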
To use BettingMart rather than AlphaMart, we just have to set
λ_i = (estTheta_i/µ_i − 1) / (upper − µ_i)
where upper is the upper bound of the assorter (1 for plurality, 1/(2f) for a supermajority), and µ_i := E(Xi | X^{i−1}) as above.
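A sketch of that conversion and of how λ_i enters the betting martingale T_j = Π(1 + λ_i(X_i − µ_i)) (names mine):

```kotlin
// Bet implied by the ALPHA estimate eta_i.
fun alphaToLambda(eta: Double, mu: Double, upper: Double): Double =
    (eta / mu - 1) / (upper - mu)

// One factor of the betting martingale T_j = prod(1 + lambda_i * (x_i - mu_i)).
fun bettingTerm(x: Double, mu: Double, lambda: Double): Double =
    1 + lambda * (x - mu)
```

For example, with x = 0.7, eta = 0.6, mu = 0.5, upper = 1, both bettingTerm and the ALPHA factor (3a) give 1.08, as expected if the two parameterizations are equivalent.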
See the plots here: 9/4/24 plots
Number of votes sampled vs theta (the winner's true share), by population size N:
N | 0.510 | 0.520 | 0.530 | 0.540 | 0.550 | 0.575 | 0.600 | 0.650 | 0.7 |
---|---|---|---|---|---|---|---|---|---|
1000 | 897 | 726 | 569 | 446 | 340 | 201 | 128 | 60 | 36 |
5000 | 3447 | 1948 | 1223 | 799 | 527 | 256 | 145 | 68 | 38 |
10000 | 5665 | 2737 | 1430 | 871 | 549 | 266 | 152 | 68 | 38 |
20000 | 8456 | 3306 | 1546 | 926 | 590 | 261 | 154 | 65 | 38 |
50000 | 12225 | 3688 | 1686 | 994 | 617 | 263 | 155 | 67 | 37 |
Standard deviation of the number of votes sampled vs theta:
N | 0.510 | 0.520 | 0.530 | 0.540 | 0.550 | 0.575 | 0.600 | 0.650 | 0.700 |
---|---|---|---|---|---|---|---|---|---|
1000 | 119.444 | 176.939 | 195.837 | 176.534 | 153.460 | 110.204 | 78.946 | 40.537 | 24.501 |
5000 | 1008.455 | 893.249 | 669.987 | 478.499 | 347.139 | 176.844 | 101.661 | 52.668 | 28.712 |
10000 | 2056.201 | 1425.911 | 891.215 | 583.694 | 381.797 | 199.165 | 113.188 | 52.029 | 27.933 |
20000 | 3751.976 | 2124.064 | 1051.194 | 656.632 | 449.989 | 190.791 | 123.333 | 47.084 | 28.173 |
50000 | 6873.319 | 2708.147 | 1274.291 | 740.712 | 475.265 | 194.538 | 130.865 | 51.086 | 26.439 |
From ALPHA (p 9)
Choosing d. As d → ∞, the sample size for ALPHA approaches that of BRAVO, for
binary data. The larger d is, the more strongly anchored the estimate is to the reported vote
shares, and the smaller the penalty ALPHA pays when the reported results are exactly correct.
Using a small value of d is particularly helpful when the true population mean is far from the
reported results. The smaller d is, the faster the method adapts to the true population mean,
but the higher the variance is. Whatever d is, the relative weight of the reported vote shares
decreases as the sample size increases.
See output of DiffMeans.kt and PlotDiffMeans.kt. This is done for each value of N and theta.
A few representative plots are at meanDiff plots
Notes: