2018年2月8日 星期四

Algorithm :: Reservoir Sampling


Q: For a given size = n array, how to choose k element fairly?
A: This is a rather easy question if n is small.

What if the size is unknown and really big? This kinds of problem comes a lot when we sampling data entity for machine learning
Answer:
Reservoir Sampling
This is a useful trick when we aim to randomly pick a number of sample without knowing the total size. (If the size of pool is given as n, which is a constant, we simply pick sample under 1/n)

(1) Put first k element to output array (for temporary)
(2) For [ k+1: n] element:
            element index = i
            generate a random number [0: i]
            if random number in [0: k]:
                replace corresponding element
                # 1/(i+1) chance to replace first element in output array
                # 1/(i+1) chance to replace second element in output array
                # etc...
(3) Return output array

Here is a trace example which gives a clear explanation. The size of pool is given here just to proof the chance for every element is equal.
(1) Choose 1(k=1) from size 8 array (n=8)





























(2) Choose 2(k=1) from size 8 array (n=8)

[2] CSDN
http://blog.csdn.net/hackbuteer1/article/details/7971328

沒有留言:

張貼留言