Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ciphergoth/sansreplace

Open more actions menu

Repository files navigation

Sampling without replacement

The problem of sampling without replacement: given integers n, k such that 0 ≤ kn, return k distinct nonnegative integers < n, choosing at random such that every possible result is equally likely. One way to get a precise understanding of the problem is to look at the simplest algorithm for solving it, via these Python or C++ implementations.

Here, I propose an algorithm I call "Cardchoose" for this problem which as far as I know is novel, and which in C++ outperforms all other solutions to either problem for most (n, k) inputs. The only data structure used is the output vector, so no extra memory is needed; it runs in O(k log k) time. I've implemented this and other algorithms in C++ and Python to compare their performance.

In my C++ tests, cardchoose outperforms rejectionsample, floydf2, and hsel for all values of n and k. quadraticreject is best where k < 100 or so, and select and iterativechoose perform well when k/n is large. cardchoose always performs within an acceptable factor of other algorithms.

def sorted_choose(n, k):
    "k distinct integers 0 <= x < n, sorted"
    t = n - k + 1
    d = [None] * k
    for i in range(k):
        r = random.randrange(t + i)
        if r < t:
            d[i] = r
        else:
            d[i] = d[r - t]
    d.sort()
    for i in range(k):
        d[i] += i
    return d

graph

This is not an officially supported Google product.

About

Sampling without replacement algorithms

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
Morty Proxy This is a proxified and sanitized view of the page, visit original site.