A weighted version of random.choice
I need to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with:

```python
import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    # list(...) keeps this working on Python 3, where keys() is a view
    for key in sorted(list(space.keys()) + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None
```

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternative ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.
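Since version 1.7.0, NumPy has a choice function that supports probability distributions.

```python
from numpy.random import choice

draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)
```

Note that probability_distribution must be a sequence in the same order as list_of_candidates.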
Since Python 3.6, the random module provides a choices method:

```
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import random

In [2]: random.choices(
   ...:     population=[['a','b'], ['b','a'], ['c','b']],
   ...:     weights=[0.2, 0.2, 0.6],
   ...:     k=10
   ...: )

Out[2]:
[['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b']]
```

People have also mentioned numpy.random.choice, which supports weights.
So, if you have Python 3.6.x, you can basically get what you need from the built-in random.choices.
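Update:
As @roganjosh kindly mentioned, random.choices cannot return values without replacement. As the docs put it:
"Return a k sized list of elements chosen from the population with replacement."
@ronan-paixão's excellent answer points out that numpy.random.choice has a replace argument that controls this behaviour.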
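To make the with-replacement point concrete, here is a minimal stdlib-only sketch. The population and weights are made up for illustration, and random.sample is shown only as the unweighted without-replacement counterpart, not as a weighted alternative.

```python
import random

population = ["a", "b", "c"]

# random.choices draws WITH replacement, so k may exceed the population
# size and repeated elements are expected.
with_replacement = random.choices(population, weights=[1, 1, 8], k=10)
print(with_replacement)

# random.sample draws WITHOUT replacement (and without weights here):
# every element appears at most once.
without_replacement = random.sample(population, k=3)
print(sorted(without_replacement))  # always ['a', 'b', 'c']
```

If you need weighted sampling without replacement, this sketch does not cover it; the replace argument of numpy.random.choice is the option named in the thread.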
```python
import random

def weighted_choice(choices):
    total = sum(w for c, w in choices)
    r = random.uniform(0, total)
    upto = 0
    for c, w in choices:
        if upto + w >= r:
            return c
        upto += w
    assert False, "Shouldn't get here"
```
1) Arrange the weights into a cumulative distribution.
2) Use random.random() to pick a random float 0.0 <= x < total.
3) Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

```python
from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]
```

```
>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'
```
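If you need to make more than one choice, split this into two functions: one to build the cumulative weights and another to bisect to a random point.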
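A minimal sketch of that split might look like the following. The function names build_cum_weights and pick are hypothetical, chosen only for this illustration.

```python
import bisect
import itertools
import random

def build_cum_weights(weights):
    """Run once: turn relative weights into running totals."""
    return list(itertools.accumulate(weights))

def pick(values, cum_weights):
    """Run per draw: bisect to a random point in [0, total)."""
    x = random.random() * cum_weights[-1]
    return values[bisect.bisect(cum_weights, x)]

values = ["WHITE", "RED", "GREEN"]
cum_weights = build_cum_weights([90, 8, 2])   # [90, 98, 100]
draws = [pick(values, cum_weights) for _ in range(1000)]
```

The cumulative list is built once in O(n), and each later draw costs only one O(log n) bisection.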
If you don't mind using numpy, you can use numpy.random.choice.
For example:

```python
import numpy

items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]
print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))
```

If you know in advance how many selections you need to make, you can do it without a loop like this:

```python
numpy.random.choice(elems, trials, p=probs)
```
Crude, but may be sufficient:

```python
import random

weighted_choice = lambda s: random.choice(sum(([v] * wt for v, wt in s), []))
```

Does it work?

```python
# define choices and relative weights
choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]

# initialize tally dict, keyed by the values that weighted_choice returns
tally = dict.fromkeys((v for v, _ in choices), 0)

# tally up 1000 weighted choices
for i in range(1000):
    tally[weighted_choice(choices)] += 1

print(list(tally.items()))
```

Prints something like:

```
[('WHITE', 904), ('GREEN', 22), ('RED', 74)]
```

This assumes that all weights are integers. They don't have to add up to 100; I just did that to make the test results easier to interpret. (If the weights are floating-point numbers, multiply them all by 10 repeatedly until all weights are >= 1.)

```python
weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w * 10 for w in weights]
weights = list(map(int, weights))
```
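If you have a weighted dictionary instead of a list, you can write this:

```python
import random

items = {"a": 10, "b": 5, "c": 1}
random.choice([k for k in items for dummy in range(items[k])])
```

Note that the list comprehension expands to a list in which each key is repeated as many times as its weight, so the weights must be non-negative integers.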
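For a dictionary whose weights are not integers, one possible sketch on Python 3.6+ is to pass the keys and values straight to random.choices. The dictionary contents below are made up for illustration.

```python
import random

# Hypothetical weighted dictionary; the weights need not be integers.
items = {"a": 0.0, "b": 2.5, "c": 0.5}

keys = list(items)
weights = list(items.values())

# k=1 returns a one-element list, so take element 0.
pick = random.choices(keys, weights=weights, k=1)[0]
print(pick)  # "b" or "c"; "a" has zero weight and is never drawn
```

This avoids building the expanded list, at the cost of requiring Python 3.6 or later.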
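As of Python 3.6, random.choices can be used to return a list of elements of a specified size from the given population, with optional weights.
random.choices(population, weights=None, *, cum_weights=None, k=1)
- population: list containing unique observations. (If empty, raises IndexError)
- weights: more precisely, the relative weights required to make selections.
- cum_weights: cumulative weights required to make selections.
- k: size (len) of the list to be output. (Default len() = 1)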
A few caveats:
1) It uses weighted sampling with replacement, so drawn items are put back and can be drawn again. The values in the weights sequence do not matter in themselves, but their relative ratios do.
Unlike numpy.random.choice, which only accepts probabilities that must sum to 1, there is no such restriction here; integer, float and fraction weights all work.

```python
>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']
```
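2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.
Specifying both weights and cum_weights raises a TypeError.

```python
>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']
```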
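3) cum_weights are typically the result of itertools.accumulate, which is really handy in such situations.
From the documentation linked:
"Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work."
So, for our contrived case, supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] produces the same outcome, and the latter saves the conversion step.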
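The relationship between weights and cum_weights, and the TypeError from caveat 2, can be checked with a short sketch. It reuses the contrived colours and weights from the caveats; the seed value is arbitrary.

```python
import itertools
import random

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(itertools.accumulate(weights))  # [12, 24, 28]

# Same seed, same draws: both calls go through the same cumulative weights.
random.seed(0)
from_weights = random.choices(population, weights=weights, k=10)
random.seed(0)
from_cum_weights = random.choices(population, cum_weights=cum_weights, k=10)
print(from_weights == from_cum_weights)  # True

# Passing both at once is rejected.
try:
    random.choices(population, weights=weights, cum_weights=cum_weights)
except TypeError as exc:
    print("TypeError:", exc)
```

Precomputing cum_weights only pays off when the same weights are reused across many calls.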
Here is the version that is included in the standard library for Python 3.6:

```python
import random
import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
    "Show the code included in the Python 3.6 version of the Random class"

    def choices(self, population, weights=None, *, cum_weights=None, k=1):
        """Return a k sized list of population elements chosen with replacement.

        If the relative weights or cumulative weights are not specified,
        the selections are made with equal probability.

        """
        random = self.random
        if cum_weights is None:
            if weights is None:
                _int = int
                total = len(population)
                return [population[_int(random() * total)] for i in range(k)]
            cum_weights = list(_itertools.accumulate(weights))
        elif weights is not None:
            raise TypeError('Cannot specify both weights and cumulative weights')
        if len(cum_weights) != len(population):
            raise ValueError('The number of weights does not match the population')
        bisect = _bisect.bisect
        total = cum_weights[-1]
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]
```

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340
I'd require the sum of the weights to be 1, but this works anyway:

```python
import random

def weightedChoice(choices):
    # Safety check, you can remove it
    for c, w in choices:
        assert w >= 0

    # Sum the weights (the second item of each pair), not the choices
    tmp = random.uniform(0, sum(w for c, w in choices))
    for choice, weight in choices:
        if tmp < weight:
            return choice
        else:
            tmp -= weight
    raise ValueError('Negative values in input')
```
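I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

```python
import random

def choose_index(probabilities):
    cmf = 0.0
    choice = random.random()
    for k, p in enumerate(probabilities):
        cmf += p
        if choice <= cmf:
            return k
    # Guard against floating-point round-off in the running sum
    return len(probabilities) - 1
```

There is no need to sort your probabilities or create a vector with your cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of about N/2.
If you have weights instead, just normalise them first:

```python
import random

def choose_index(weights):
    total = sum(weights)
    probabilities = [w / total for w in weights]
    cmf = 0.0
    choice = random.random()
    for k, p in enumerate(probabilities):
        cmf += p
        if choice <= cmf:
            return k
    # Guard against floating-point round-off in the running sum
    return len(probabilities) - 1
```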
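If your list of weighted choices is relatively static and you want to sample frequently, you can do one O(N) preprocessing step and then do each selection in O(1), using the functions in this related answer.

```python
# run only when `choices` changes.
preprocessed_data = prep(weight for _, weight in choices)

# O(1) selection
value = choices[sample(preprocessed_data)][0]
```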
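The bodies of prep and sample are not reproduced in this thread. As a stand-in, here is a sketch of one well-known technique with the same cost profile, Vose's alias method. This is an assumption about what such functions could look like, not the code from the related answer; only the names prep and sample are borrowed from the snippet above.

```python
import random

def prep(weights):
    """O(N) preprocessing: build the probability and alias tables."""
    weights = list(weights)
    n = len(weights)
    total = sum(weights)
    # Scale so that an "average" item has height exactly 1.0.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # keep item s with this probability...
        alias[s] = l             # ...otherwise fall through to item l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:      # leftovers are full columns
        prob[i] = 1.0
    return prob, alias

def sample(tables):
    """O(1) selection: pick a column, then keep it or take its alias."""
    prob, alias = tables
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
tables = prep(weight for _, weight in choices)
value = choices[sample(tables)][0]
```

The tables only need rebuilding when the weights change, which is what makes the O(N) setup worthwhile for frequent sampling.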
```python
import numpy as np

w = np.array([0.4, 0.8, 1.6, 0.8, 0.4])
np.random.choice(w, p=w / sum(w))
```
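Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0s containing a single 1 that indicates which bin was chosen. The code defaults to making a single draw, but you can pass in the number of draws to make, and the count per bin will be returned.
If the weights vector does not sum to 1, it will be normalized so that it does.

```python
import numpy as np

def weighted_choice(weights, n=1):
    weights = np.asarray(weights, dtype=float)  # accept plain lists too
    if np.sum(weights) != 1:
        weights = weights / np.sum(weights)

    draws = np.random.random_sample(size=n)

    weights = np.cumsum(weights)
    weights = np.insert(weights, 0, 0.0)

    counts = np.histogram(draws, bins=weights)
    return counts[0]
```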
A general solution:

```python
import random

def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
```
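It depends on how many times you want to sample the distribution.
Suppose you want to sample the distribution K times. Then, when n is the number of items in the distribution, calling np.random.choice() separately for each sample repeats O(n) work on every call.
In my case, I needed to sample the same distribution many times, on the order of 10^3 samples, with n on the order of 10^6. I used the code below, which precomputes the cumulative distribution once and then samples it in O(log n) per draw via binary search.

```python
import numpy as np

n, k = 10**6, 10**3

# Create dummy distribution
a = np.array([i + 1 for i in range(n)])
p = np.array([1.0 / n] * n)

# Precompute the cumulative distribution once: O(n)
cfd = p.cumsum()

for _ in range(k):
    x = np.random.uniform()
    # Binary search: O(log n) per sample
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]
```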
I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index of the choice for tallying purposes, but it is simple to return the string instead (see the commented-out return alternative):

```python
import random
import bisect

try:
    range = xrange  # Python 2 compatibility
except NameError:
    pass

def weighted_choice(choices):
    total, cumulative = 0, []
    for c, w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]

tally = [0 for item in choices]

n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

print([t / sum(tally) * 100 for t in tally])
```
Provide random.choice() with a pre-weighted list:
Solution and test:

```python
import random

options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]

weighted_options = [[opt] * wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)

# test

counts = {c: 0 for c in options}
for x in range(10000):
    counts[random.choice(weighted_options)] += 1

for opt, wgt in zip(options, weights):
    wgt_r = counts[opt] / 10000 * sum(weights)
    print(opt, counts[opt], wgt, wgt_r)
```

Output:

```
['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008
```
One way is to randomize over the total of all the weights and then use the cumulative values as the limit points for each item. Here is a crude implementation as a generator.

```python
import random

def rand_weighted(weights):
    """
    Generator which uses the weights to generate
    weighted random values
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.items()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        # int() truncation assumes integer weights
        sel = int(random.uniform(0, 1) * sum_weights)
        for key, value in sorted(cum_weights.items()):
            if sel < value:
                break
        yield key
```
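Using numpy:

```python
import numpy as np

def choice(items, weights):
    # argmin over booleans finds the first cumulative weight >= the random draw
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]
```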
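I needed to do something like this quickly and very simply, and after searching for ideas I finally built this template. The idea is to receive the weighted values as JSON from an API, simulated here by a dict.
The dict is then converted into a list in which each value is repeated in proportion to its weight, and random.choice picks one value from that list.
I tried it with 10, 100 and 1000 iterations. The distribution seems pretty stable.

```python
import random

def weighted_choice(weighted_dict):
    """Input example: dict(apples=60, oranges=30, pineapples=10)"""
    weight_list = []
    for key in weighted_dict.keys():
        weight_list += [key] * weighted_dict[key]
    return random.choice(weight_list)
```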
I didn't love the syntax of any of those. I really wanted to just specify what the items were and what the weight of each one was. I realize I could have used random.choices, but instead I quickly wrote the class below.

```python
import math
import random
from numpy import cumsum

class randomChoiceWithProportions:
    '''
    Accepts a dictionary of choices as keys and weights as values. Example if you want a unfair dice:

    choiceWeightDic = {"1": 0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666,
                       "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
    dice = randomChoiceWithProportions(choiceWeightDic)

    samples = []
    for i in range(100000):
        samples.append(dice.sample())

    # Should be close to .26666
    samples.count("6")/len(samples)

    # Should be close to .16666
    samples.count("1")/len(samples)
    '''
    def __init__(self, choiceWeightDic):
        self.choiceWeightDic = choiceWeightDic
        weightSum = sum(self.choiceWeightDic.values())
        # Compare with a tolerance: float weights rarely sum to exactly 1
        assert math.isclose(weightSum, 1), 'Weights sum to ' + str(weightSum) + ', not 1.'
        self.valWeightDict = self._compute_valWeights()

    def _compute_valWeights(self):
        valWeights = list(cumsum(list(self.choiceWeightDic.values())))
        valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
        return valWeightDict

    def sample(self):
        num = random.uniform(0, 1)
        for key, val in self.valWeightDict.items():
            if val >= num:
                return key
```