Doc/library/statistics.rst (36 additions, 0 deletions)
@@ -772,6 +772,42 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
>>> quantiles(map(model, X, Y, Z)) # doctest: +SKIP
[1.4591308524824727, 1.8035946855390597, 2.175091447274739]

Normal distributions can be used to approximate `Binomial
distributions <http://mathworld.wolfram.com/BinomialDistribution.html>`_
when the sample size is large and the probability of a successful
trial is near 50%.
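As a rough illustration of why both conditions matter, the sketch below measures the largest gap between the exact binomial CDF and its normal approximation. The helper name ``max_cdf_error`` and the chosen values of ``n`` and ``p`` are assumptions made for this illustration only; they are not part of the module.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_error(n, p):
    """Largest absolute gap between the exact binomial CDF and the
    normal approximation (with a 0.5 continuity correction)."""
    q = 1.0 - p
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
    exact_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        # Accumulate P(X <= k) one probability mass at a time
        exact_cdf += comb(n, k) * p**k * q**(n - k)
        worst = max(worst, abs(exact_cdf - approx.cdf(k + 0.5)))
    return worst

balanced = max_cdf_error(100, 0.5)        # p near 50%
skewed = max_cdf_error(100, 0.05)         # p far from 50%
skewed_large = max_cdf_error(400, 0.05)   # same p, larger sample

print(f"n=100, p=0.50: {balanced:.4f}")
print(f"n=100, p=0.05: {skewed:.4f}")
print(f"n=400, p=0.05: {skewed_large:.4f}")
```

The error should be smallest for the balanced case, and for a skewed ``p`` it should shrink as ``n`` grows.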

For example, an open source conference has 750 attendees and two rooms, each
with a 500-person capacity. There is a talk about Python and another about
Ruby. In previous conferences, 65% of the attendees preferred to listen to
Python talks. Assuming the population preferences haven't changed, what is the
probability that the Python room will stay within its capacity limits?

.. doctest::

    >>> n = 750             # Sample size
    >>> p = 0.65            # Preference for Python
    >>> q = 1.0 - p         # Preference for Ruby
    >>> k = 500             # Room capacity

    >>> # Approximation using the cumulative normal distribution
    >>> from math import sqrt
    >>> round(NormalDist(mu=n*p, sigma=sqrt(n*p*q)).cdf(k + 0.5), 4)
    0.8402

    >>> # Solution using the cumulative binomial distribution
    >>> from math import comb, fsum
    >>> round(fsum(comb(n, r) * p**r * q**(n-r) for r in range(k+1)), 4)
    0.8402

    >>> # Approximation using a simulation
    >>> from random import seed, choices
    >>> seed(8675309)
    >>> def trial():
    ...     return choices(('Python', 'Ruby'), (p, q), k=n).count('Python')
    >>> mean(trial() <= k for i in range(10_000))
    0.8398
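
The ``k + 0.5`` in the normal approximation above is a continuity correction: the binomial count is discrete, so the area up to ``k`` is taken to extend halfway to ``k + 1``. The following sketch, which reuses the values from the example and introduces the names ``corrected`` and ``uncorrected`` purely for illustration, compares the approximation with and without that adjustment against the exact binomial sum.

```python
from math import comb, fsum, sqrt
from statistics import NormalDist

n, p, k = 750, 0.65, 500    # values taken from the conference example
q = 1.0 - p
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

# Exact P(X <= k) from the binomial probability mass function
exact = fsum(comb(n, r) * p**r * q**(n - r) for r in range(k + 1))

corrected = approx.cdf(k + 0.5)   # with continuity correction
uncorrected = approx.cdf(k)       # without continuity correction

print(f"exact:       {exact:.4f}")
print(f"corrected:   {corrected:.4f}  (error {abs(corrected - exact):.4f})")
print(f"uncorrected: {uncorrected:.4f}  (error {abs(uncorrected - exact):.4f})")
```

For these inputs the corrected value should land noticeably closer to the exact result than the uncorrected one.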

Normal distributions commonly arise in machine learning problems.

Wikipedia has a `nice example of a Naive Bayesian Classifier