
mir.stat.distribution.hypergeometric

This module contains algorithms for the Hypergeometric Distribution.
The hypergeometric distribution has several alternative parameterizations. This module models the number of draws k that have a feature of interest, out of n total draws made without replacement from a population of size N in which K members have that feature.
Hypergeometric distribution functions can use different algorithms. The default is HypergeometricAlgo.direct, which can be time-consuming for large parameter values. The additional algorithms let the user trade accuracy for running time.
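Under this parameterization the probability mass function takes the standard textbook form below. It is written here for orientation and is consistent with the examples on this page (k = 0, N = 7, K = 4, n = 3 gives 1/35 ≈ 0.02857143); it is not a statement about how the module evaluates it internally.

```latex
P(X = k) = \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}},
\qquad \max(0,\; n + K - N) \le k \le \min(n,\; K)
```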
License:
Authors:
John Michael Hall
enum HypergeometricAlgo: int;
Algorithms used to calculate the hypergeometric distribution.
HypergeometricAlgo.direct can be time-consuming for large parameter values. The additional algorithms let the user trade accuracy for running time.
direct
Direct calculation, without approximating by another distribution.
approxBinomial
Approximates the hypergeometric distribution with a binomial distribution.
approxPoisson
Approximates the hypergeometric distribution with a Poisson distribution (uses the gamma approximation, except for the inverse CDF).
approxNormal
Approximates the hypergeometric distribution with a normal distribution.
approxNormalContinuityCorrection
Approximates the hypergeometric distribution with a normal distribution (including a continuity correction).
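To illustrate why an approximation such as approxBinomial can be acceptable, the following sketch compares an exact PMF with a binomial PMF using only Phobos. The helper names (logChoose, exactPMF, binomialPMF) are hypothetical and are not part of this module; the code is a rough illustration that assumes k lies in the support and 0 < K < N, not the module's implementation.

```d
import std.math : exp, fabs, log;
import std.mathspecial : logGamma;

/// Log of the binomial coefficient C(a, b), assuming b <= a (hypothetical helper).
double logChoose(size_t a, size_t b)
{
    return cast(double)(logGamma(a + 1.0L) - logGamma(b + 1.0L)
        - logGamma((a - b) + 1.0L));
}

/// Exact hypergeometric PMF, evaluated in log space and then exponentiated.
/// Assumes k <= K and n - k <= N - K.
double exactPMF(size_t k, size_t N, size_t K, size_t n)
{
    return exp(logChoose(K, k) + logChoose(N - K, n - k) - logChoose(N, n));
}

/// Binomial(n, p = K / N) PMF: the same draws, but made with replacement.
/// Assumes 0 < K < N so that both logarithms are finite.
double binomialPMF(size_t k, size_t N, size_t K, size_t n)
{
    immutable double p = cast(double) K / N;
    return exp(logChoose(n, k) + k * log(p) + (n - k) * log(1 - p));
}

unittest
{
    // n = 50 draws from a population of N = 750_000: the two PMFs nearly coincide
    immutable exact = exactPMF(20, 750_000, 250_000, 50);
    immutable approx = binomialPMF(20, 750_000, 250_000, 50);
    assert(fabs(exact - approx) < 1e-3);
}
```

When n is small relative to N, drawing without replacement barely changes the proportion K / N from one draw to the next, which is why the binomial form stays close to the exact value in this case.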
pure nothrow @nogc @safe T hypergeometricPMF(T, HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)(const size_t k, const size_t N, const size_t K, const size_t n)
if (isFloatingPoint!T);

template hypergeometricPMF(HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)

template hypergeometricPMF(T, string hypergeometricAlgo)

template hypergeometricPMF(string hypergeometricAlgo)
Computes the hypergeometric probability mass function (PMF).
Additional algorithms may be provided for calculating PMF that allow trading off time and accuracy. If approxPoisson is provided, PoissonAlgo.gamma is assumed.
Parameters:
size_t k value to evaluate PMF (e.g. number of correct draws of object of interest)
size_t N total population size
size_t K number of relevant objects in population
size_t n number of draws
Examples:
import mir.test: shouldApprox;

0.hypergeometricPMF(7, 4, 3).shouldApprox == 0.02857143;
1.hypergeometricPMF(7, 4, 3).shouldApprox == 0.3428571;
2.hypergeometricPMF(7, 4, 3).shouldApprox == 0.5142857;
3.hypergeometricPMF(7, 4, 3).shouldApprox == 0.1142857;

// can also provide a template argument to change output type
static assert(is(typeof(hypergeometricPMF!float(3, 7, 4, 3)) == float));
Examples:
Alternate algorithms
import mir.test: shouldApprox;
import mir.math.common: exp;

// Can approximate hypergeometric with binomial distribution
20.hypergeometricPMF!"approxBinomial"(750_000, 250_000, 50).shouldApprox == exp(hypergeometricLPMF(20, 750_000, 250_000, 50));
// Can approximate hypergeometric with Poisson distribution
20.hypergeometricPMF!"approxPoisson"(100_000, 100, 5_000).shouldApprox == exp(hypergeometricLPMF(20, 100_000, 100, 5_000));
// Can approximate hypergeometric with normal distribution
3_500.hypergeometricPMF!"approxNormal"(10_000, 7_500, 5_000).shouldApprox == exp(hypergeometricLPMF(3_500, 10_000, 7_500, 5_000));
// Can approximate hypergeometric with normal distribution (with continuity correction)
3_500.hypergeometricPMF!"approxNormalContinuityCorrection"(10_000, 7_500, 5_000).shouldApprox == exp(hypergeometricLPMF(3_500, 10_000, 7_500, 5_000));
pure nothrow @nogc @safe T fp_hypergeometricPMF(T = Fp!128)(const size_t k, const size_t N, const size_t K, const size_t n)
if (is(T == Fp!size, size_t size));
Computes the hypergeometric probability mass function (PMF) with an extended floating point type (e.g. Fp!128), which provides additional accuracy for large values of k, N, K, or n.
Parameters:
size_t k value to evaluate PMF (e.g. number of correct draws of object of interest)
size_t N total population size
size_t K number of relevant objects in population
size_t n number of draws
Examples:
import mir.bignum.fp: Fp, fp_log;
import mir.test: shouldApprox;

enum size_t val = 1_000_000;
size_t N = val + 5;
size_t K = val / 2;
size_t n = val / 100;
0.fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(0, N, K, n);
1.fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(1, N, K, n);
2.fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(2, N, K, n);
5.fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(5, N, K, n);
(n / 2).fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(n / 2, N, K, n);
(n - 5).fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(n - 5, N, K, n);
(n - 2).fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(n - 2, N, K, n);
(n - 1).fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(n - 1, N, K, n);
n.fp_hypergeometricPMF(N, K, n).fp_log!double.shouldApprox == hypergeometricLPMF(n, N, K, n);
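One plausible reason the example above compares logarithms rather than raw probabilities is that, for parameters this large, the PMF is far below the smallest positive double. The sketch below uses only Phobos and two hypothetical helpers (pmfAtZeroDouble, logPMFAtZero) to show the effect for k = 0; it is an illustration under that assumption, not a description of how fp_hypergeometricPMF works.

```d
import std.math : fabs, log;

/// P(X = 0) = prod_{i=0}^{n-1} (N - K - i) / (N - i), accumulated as a plain double.
/// Assumes n <= N - K.
double pmfAtZeroDouble(size_t N, size_t K, size_t n)
{
    double p = 1.0;
    foreach (i; 0 .. n)
        p *= cast(double)(N - K - i) / cast(double)(N - i);
    return p;
}

/// The same product accumulated as a sum of logarithms.
double logPMFAtZero(size_t N, size_t K, size_t n)
{
    double lp = 0.0;
    foreach (i; 0 .. n)
        lp += log(cast(double)(N - K - i) / cast(double)(N - i));
    return lp;
}

unittest
{
    enum size_t val = 1_000_000;
    // The running product of 10_000 factors near 0.5 underflows to zero ...
    assert(pmfAtZeroDouble(val + 5, val / 2, val / 100) == 0.0);
    // ... while the log-probability stays finite, well below log(double.min_normal)
    assert(logPMFAtZero(val + 5, val / 2, val / 100) < -745.0);
}
```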
pure nothrow @nogc @safe T hypergeometricCDF(T, HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)(const size_t k, const size_t N, const size_t K, const size_t n)
if (isFloatingPoint!T);

template hypergeometricCDF(HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)

template hypergeometricCDF(T, string hypergeometricAlgo)

template hypergeometricCDF(string hypergeometricAlgo)
Computes the hypergeometric cumulative distribution function (CDF).
Additional algorithms may be provided for calculating CDF that allow trading off time and accuracy. If approxPoisson is provided, PoissonAlgo.gamma is assumed.
Setting hypergeometricAlgo = HypergeometricAlgo.direct results in direct summation being used, which can result in significant slowdowns for large values of k.
Parameters:
size_t k value to evaluate CDF (e.g. number of correct draws of object of interest)
size_t N total population size
size_t K number of relevant objects in population
size_t n number of draws
Examples:
import mir.test: shouldApprox;

0.hypergeometricCDF(7, 4, 3).shouldApprox == 0.02857143;
1.hypergeometricCDF(7, 4, 3).shouldApprox == 0.3714286;
2.hypergeometricCDF(7, 4, 3).shouldApprox == 0.8857143;
3.hypergeometricCDF(7, 4, 3).shouldApprox == 1.0;

// can also provide a template argument to change output type
static assert(is(typeof(hypergeometricCDF!float(3, 7, 4, 3)) == float));
Examples:
Alternate algorithms
import mir.test: shouldApprox;

// Can approximate hypergeometric with binomial distribution
20.hypergeometricCDF!"approxBinomial"(750_000, 250_000, 50).shouldApprox(1e-2) == 0.8740839;
// Can approximate hypergeometric with Poisson distribution
8.hypergeometricCDF!"approxPoisson"(100_000, 100, 5_000).shouldApprox(1e-2) == 0.9370063;
// Can approximate hypergeometric with normal distribution
3_750.hypergeometricCDF!"approxNormal"(10_000, 7_500, 5_000).shouldApprox(2e-2) == 0.5092122;
// Can approximate hypergeometric with normal distribution (with continuity correction)
3_750.hypergeometricCDF!"approxNormalContinuityCorrection"(10_000, 7_500, 5_000).shouldApprox(1e-2) == 0.5092122;
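As a rough picture of what direct summation involves, the sketch below adds up PMF terms from 0 to k using only Phobos-free D. The names choose and cdfBySummation are hypothetical, and the code is only suitable for small parameters because the binomial coefficients are held in a double; it is not the module's implementation.

```d
import std.math : fabs;

/// Binomial coefficient C(a, b) as a double; returns 0 when b > a (hypothetical helper).
double choose(size_t a, size_t b)
{
    if (b > a)
        return 0.0;
    double c = 1.0;
    foreach (i; 0 .. b)
        c = c * (a - i) / (i + 1);
    return c;
}

/// CDF(k) = sum over i <= k of C(K, i) * C(N - K, n - i) / C(N, n).
/// Terms outside the support vanish because choose returns 0 for them.
/// Assumes K <= N and n <= N.
double cdfBySummation(size_t k, size_t N, size_t K, size_t n)
{
    size_t last = k < n ? k : n;
    double total = 0.0;
    foreach (size_t i; 0 .. last + 1)
        total += choose(K, i) * choose(N - K, n - i) / choose(N, n);
    return total;
}

unittest
{
    // Values taken from the examples above
    assert(fabs(cdfBySummation(1, 7, 4, 3) - 0.3714286) < 1e-6);
    assert(fabs(cdfBySummation(3, 7, 4, 3) - 1.0) < 1e-12);
}
```

The number of terms grows with k, which matches the note above that direct summation can slow down for large values of k.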
pure nothrow @nogc @safe T hypergeometricCCDF(T, HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)(const size_t k, const size_t N, const size_t K, const size_t n)
if (isFloatingPoint!T);

template hypergeometricCCDF(HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)

template hypergeometricCCDF(T, string hypergeometricAlgo)

template hypergeometricCCDF(string hypergeometricAlgo)
Computes the hypergeometric complementary cumulative distribution function (CCDF).
Additional algorithms may be provided for calculating CCDF that allow trading off time and accuracy. If approxPoisson is provided, PoissonAlgo.gamma is assumed.
Setting hypergeometricAlgo = HypergeometricAlgo.direct results in direct summation being used, which can result in significant slowdowns for large values of k.
Parameters:
size_t k value to evaluate CCDF (e.g. number of correct draws of object of interest)
size_t N total population size
size_t K number of relevant objects in population
size_t n number of draws
Examples:
import mir.test: shouldApprox;

0.hypergeometricCCDF(7, 4, 3).shouldApprox == 0.9714286;
1.hypergeometricCCDF(7, 4, 3).shouldApprox == 0.6285714;
2.hypergeometricCCDF(7, 4, 3).shouldApprox == 0.1142857;
3.hypergeometricCCDF(7, 4, 3).shouldApprox == 0.0;

// can also provide a template argument to change output type
static assert(is(typeof(hypergeometricCCDF!float(3, 7, 4, 3)) == float));
Examples:
Alternate algorithms
import mir.test: shouldApprox;

// Can approximate hypergeometric with binomial distribution
20.hypergeometricCCDF!"approxBinomial"(750_000, 250_000, 50).shouldApprox(1e-2) == 0.1259161;
// Can approximate hypergeometric with Poisson distribution
8.hypergeometricCCDF!"approxPoisson"(100_000, 100, 5_000).shouldApprox(1e-1) == 0.0629937;
// Can approximate hypergeometric with normal distribution
3_750.hypergeometricCCDF!"approxNormal"(10_000, 7_500, 5_000).shouldApprox(2e-2) == 0.4907878;
// Can approximate hypergeometric with normal distribution (with continuity correction)
3_750.hypergeometricCCDF!"approxNormalContinuityCorrection"(10_000, 7_500, 5_000).shouldApprox(1e-2) == 0.4907878;
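The example values for the CDF and CCDF on this page sum to one at each k (for instance 0.02857143 + 0.9714286 at k = 0), which suggests the usual convention in which the CCDF is the strict upper tail:

```latex
\mathrm{CCDF}(k) = P(X > k) = 1 - \mathrm{CDF}(k)
= \sum_{i = k + 1}^{\min(n,\, K)} \frac{\binom{K}{i}\,\binom{N-K}{n-i}}{\binom{N}{n}}
```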
pure nothrow @nogc @safe size_t hypergeometricInvCDF(T, HypergeometricAlgo hypergeometricAlgo = HypergeometricAlgo.direct)(const T p, const size_t N, const size_t K, const size_t n)
if (isFloatingPoint!T);

template hypergeometricInvCDF(T, string hypergeometricAlgo)

template hypergeometricInvCDF(string hypergeometricAlgo)
Computes the hypergeometric inverse cumulative distribution function (InvCDF).
Additional algorithms may be provided for calculating InvCDF that allow trading off time and accuracy. If approxPoisson is provided, PoissonAlgo.direct is assumed. This differs from the other functions, which assume PoissonAlgo.gamma, because PoissonAlgo.gamma does not provide the same result in this case.
Setting hypergeometricAlgo = HypergeometricAlgo.direct results in direct summation being used, which can result in significant slowdowns for large values of k.
Parameters:
T p value to evaluate InvCDF
size_t N total population size
size_t K number of relevant objects in population
size_t n number of draws
Examples:
import mir.test: should;

0.0.hypergeometricInvCDF(40, 15, 20).should == 0;
0.1.hypergeometricInvCDF(40, 15, 20).should == 6;
0.2.hypergeometricInvCDF(40, 15, 20).should == 6;
0.3.hypergeometricInvCDF(40, 15, 20).should == 7;
0.4.hypergeometricInvCDF(40, 15, 20).should == 7;
0.5.hypergeometricInvCDF(40, 15, 20).should == 7;
0.6.hypergeometricInvCDF(40, 15, 20).should == 8;
0.7.hypergeometricInvCDF(40, 15, 20).should == 8;
0.8.hypergeometricInvCDF(40, 15, 20).should == 9;
0.9.hypergeometricInvCDF(40, 15, 20).should == 9;
1.0.hypergeometricInvCDF(40, 15, 20).should == 15;
Examples:
Alternate algorithms
import mir.test: shouldApprox;

// Can approximate hypergeometric with binomial distribution
0.5.hypergeometricInvCDF!"approxBinomial"(750_000, 250_000, 50).shouldApprox!double == 17;
// Can approximate hypergeometric with Poisson distribution
0.4.hypergeometricInvCDF!"approxPoisson"(100_000, 100, 5_000).shouldApprox!double == 4;
// Can approximate hypergeometric with normal distribution
0.6.hypergeometricInvCDF!"approxNormal"(10_000, 7_500, 5_000).shouldApprox!double == 3755;
// Can approximate hypergeometric with normal distribution (with continuity correction)
0.6.hypergeometricInvCDF!"approxNormalContinuityCorrection"(10_000, 7_500, 5_000).shouldApprox!double(1) == 3755;
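The direct examples above are consistent with the usual definition of a discrete quantile: the smallest k whose CDF is at least p. The sketch below implements that definition with a hypothetical function quantileBySearch, using the ratio of consecutive PMF terms so that no binomial coefficient is computed. It assumes K <= N and n <= N, and it may differ from the module in edge cases where the CDF equals p exactly.

```d
/// Smallest k in the support with CDF(k) >= p (hypothetical sketch).
size_t quantileBySearch(double p, size_t N, size_t K, size_t n)
{
    // Support of the distribution: lo <= X <= hi
    size_t lo = (n + K > N) ? n + K - N : 0;
    size_t hi = (n < K) ? n : K;

    // Ratio P(X = i + 1) / P(X = i), valid for lo <= i < hi
    double ratio(size_t i)
    {
        return (cast(double)(K - i) * (n - i))
            / (cast(double)(i + 1) * ((N - K) - (n - (i + 1))));
    }

    // Unnormalised weights: w(lo) = 1, w(i + 1) = w(i) * ratio(i).
    // Their sum is the constant that normalises them into probabilities.
    double total = 0.0;
    double w = 1.0;
    foreach (size_t i; lo .. hi + 1)
    {
        total += w;
        if (i < hi)
            w *= ratio(i);
    }

    // Walk the cumulative sum until it first reaches p
    double cum = 0.0;
    w = 1.0;
    foreach (size_t i; lo .. hi)
    {
        cum += w;
        if (cum >= p * total)
            return i;
        w *= ratio(i);
    }
    return hi;
}

unittest
{
    // Values taken from the direct examples above
    assert(quantileBySearch(0.0, 40, 15, 20) == 0);
    assert(quantileBySearch(0.1, 40, 15, 20) == 6);
    assert(quantileBySearch(1.0, 40, 15, 20) == 15);
}
```

Like direct summation for the CDF, this search visits one term per candidate k, so its cost grows with the size of the support.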
pure nothrow @nogc @safe T hypergeometricLPMF(T = double)(const size_t k, const size_t N, const size_t K, const size_t n)
if (isFloatingPoint!T);
Computes the hypergeometric log probability mass function (LPMF).
Parameters:
size_t k value to evaluate LPMF (e.g. number of correct draws of object of interest)
size_t N total population size
size_t K number of relevant objects in population
size_t n number of draws
Examples:
import mir.math.common: log;
import mir.test: shouldApprox;

0.hypergeometricLPMF(7, 4, 3).shouldApprox == log(0.02857143);
1.hypergeometricLPMF(7, 4, 3).shouldApprox == log(0.3428571);
2.hypergeometricLPMF(7, 4, 3).shouldApprox == log(0.5142857);
3.hypergeometricLPMF(7, 4, 3).shouldApprox == log(0.1142857);
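For reference, the log PMF can be written as a sum of log binomial coefficients, each expressible through the log-gamma function. This is the standard identity and is given only as background; whether the module evaluates the LPMF this way is an assumption.

```latex
\log P(X = k) = \log\binom{K}{k} + \log\binom{N-K}{n-k} - \log\binom{N}{n},
\qquad
\log\binom{a}{b} = \ln\Gamma(a+1) - \ln\Gamma(b+1) - \ln\Gamma(a-b+1)
```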