Q: For a given size = n array, how to choose k element fairly?
A: This is a rather easy question if n is small.
What if the size is unknown and really big? This kinds of problem comes a lot when we sampling data entity for machine learning
Answer:
Reservoir Sampling
This is a useful trick when we aim to randomly pick a number of sample without knowing the total size. (If the size of pool is given as n, which is a constant, we simply pick sample under 1/n)
(1) Put first k element to output array (for temporary)
(2) For [ k+1: n] element:
element index = i
generate a random number [0: i]
if random number in [0: k]:
replace corresponding element
# 1/(i+1) chance to replace first element in output array
# 1/(i+1) chance to replace second element in output array
# etc...
(3) Return output array
Here is a trace example which gives a clear explanation. The size of pool is given here just to proof the chance for every element is equal.
[2] CSDN
http://blog.csdn.net/hackbuteer1/article/details/7971328
http://blog.csdn.net/hackbuteer1/article/details/7971328
沒有留言:
張貼留言