How to make a Pareto chart in Python?
The Pareto chart is a very popular diagram in Excel and Tableau. In Excel we can draw a Pareto chart easily, but I have found no simple way to draw one in Python.
I have a pandas DataFrame like this:
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
    df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
    print(df)

             country
    USA        177.0
    Canada       7.0
    Russia       4.0
    UK           2.0
    Belgium      2.0
    Mexico       1.0
    Germany      1.0
    Denmark      1.0
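How can I draw a Pareto chart,
using pandas, seaborn, matplotlib, etc.?
So far I have been able to make a descending bar chart.
But I still need to put the cumulative-sum line plot on top of the bars.
My attempt: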

    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
    df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
    df = df.sort_values(by='country', ascending=False)
    df["cumpercentage"] = df["country"].cumsum() / df["country"].sum() * 100


    fig, ax = plt.subplots()
    ax.bar(df.index, df["country"], color="C0")
    ax2 = ax.twinx()
    ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())

    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")
    plt.show()
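To see what the line in the chart above represents, the cumulative percentage can be computed and inspected on its own before any plotting. The following is only a minimal sketch: the helper name cumulative_percentage is an assumption made for illustration and is not part of pandas or of the code above.

```python
import pandas as pd


def cumulative_percentage(values: pd.Series) -> pd.Series:
    """Return the running share of the total, in percent.

    Hypothetical helper for illustration; not part of pandas.
    The values are sorted in descending order first, as a Pareto chart requires.
    """
    ordered = values.sort_values(ascending=False)
    return ordered.cumsum() / ordered.sum() * 100


counts = pd.Series(
    [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0],
    index=['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark'],
)
cumpct = cumulative_percentage(counts)
print(cumpct.round(1))
```

With the sample data the total is 195, so the first entry (USA) already accounts for about 90.8 % and the last entry reaches 100 %.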
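Another way is to use the secondary_y parameter of DataFrame.plot:

    # Assumes df is already sorted by 'country' in descending order.
    df['pareto'] = 100 * df.country.cumsum() / df.country.sum()
    fig, axes = plt.subplots()
    ax1 = df.plot(use_index=True, y='country', kind='bar', ax=axes)
    # secondary_y=True draws the line against a second y-axis on the right.
    ax2 = df.plot(use_index=True, y='pareto', marker='D', color="C1", kind='line', ax=axes, secondary_y=True)
    ax2.set_ylim([0, 110])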
A more general version of ImportanceOfBeingErnest's code:
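    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    def create_pareto_chart(df, by_variable, quant_variable):
        # Assumes the rows are already sorted by quant_variable, descending.
        df = df.copy()  # leave the caller's DataFrame untouched
        df.index = by_variable
        # Use raw values: a Series still carrying the old index would be
        # realigned against the new one and could produce NaN.
        quantities = np.asarray(quant_variable, dtype=float)
        df["cumpercentage"] = quantities.cumsum() / quantities.sum() * 100

        fig, ax = plt.subplots()
        ax.bar(df.index, quantities, color="C0")
        ax2 = ax.twinx()
        ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
        ax2.yaxis.set_major_formatter(PercentFormatter())

        ax.tick_params(axis="y", colors="C0")
        ax2.tick_params(axis="y", colors="C1")
        plt.show()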
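A function that ends in plt.show() is awkward to reuse in scripts or to check automatically. As a sketch of one possible alternative, the variant below sorts the data itself and returns the figure and both axes instead of showing them. The name pareto_axes and its signature are assumptions made for illustration, not part of the code above or of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter


def pareto_axes(labels, values):
    """Draw a Pareto chart and return (fig, ax, ax2) without showing it.

    Hypothetical variant for illustration: labels and values are plain
    sequences of equal length; sorting is done inside the function.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(-values, kind="stable")  # descending, ties keep input order
    labels = [str(labels[i]) for i in order]
    values = values[order]
    cumpercentage = values.cumsum() / values.sum() * 100

    fig, ax = plt.subplots()
    ax.bar(labels, values, color="C0")
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(labels, cumpercentage, color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")
    return fig, ax, ax2


# The input is deliberately unsorted to show that the function sorts it.
fig, ax, ax2 = pareto_axes(
    ['Canada', 'USA', 'Russia', 'UK'],
    [7.0, 177.0, 4.0, 2.0],
)
```

The caller can then decide whether to call fig.savefig(...) or, with an interactive backend instead of Agg, plt.show().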
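And this one also supports a Pareto chart in which categories are grouped by a threshold value.
For example, if you set the threshold to 70, the small categories beyond the 70 % cumulative mark are merged into a single group called "OTHERS".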
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    def create_pareto_chart(by_variable, quant_variable, threshold):
        # Sort first so the cumulative percentage follows the descending bars.
        df = pd.DataFrame({'by_var': list(by_variable), 'quant_var': list(quant_variable)})
        df = df.sort_values(by='quant_var', ascending=False).reset_index(drop=True)
        total = df['quant_var'].sum()
        df["cumpercentage"] = df['quant_var'].cumsum() / total * 100

        # Keep the categories whose cumulative share is below the threshold;
        # everything after them is merged into a single "OTHERS" bar.
        df_above_threshold = df[df['cumpercentage'] < threshold]
        rest_sum = total - df_above_threshold['quant_var'].sum()
        df = df_above_threshold
        if rest_sum > 0:
            # The merged bar completes the total, so the line ends at 100 %.
            rest = pd.DataFrame({'by_var': ['OTHERS'], 'quant_var': [rest_sum],
                                 'cumpercentage': [100.0]})
            # DataFrame.append was removed in pandas 2.0; use pd.concat instead.
            df = pd.concat([df, rest], ignore_index=True)
        df = df.set_index('by_var')

        fig, ax = plt.subplots()
        ax.bar(df.index, df["quant_var"], color="C0")
        ax2 = ax.twinx()
        ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
        ax2.yaxis.set_major_formatter(PercentFormatter())

        ax.tick_params(axis="x", colors="C0", labelrotation=70)
        ax.tick_params(axis="y", colors="C0")
        ax2.tick_params(axis="y", colors="C1")
        plt.show()
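The grouping step described above can also be tried without any plotting, which makes it easier to see which categories end up in the merged bar. This is a minimal sketch under stated assumptions: the helper name group_beyond_threshold is hypothetical, and it treats "beyond the threshold" as "cumulative percentage not below the threshold".

```python
import pandas as pd


def group_beyond_threshold(labels, values, threshold):
    """Return the leading categories plus one 'OTHERS' row.

    Hypothetical helper for illustration. Categories are sorted in
    descending order; those whose cumulative percentage stays below
    `threshold` are kept, the rest are summed into 'OTHERS'.
    """
    df = pd.DataFrame({"by_var": list(labels), "quant_var": list(values)})
    df = df.sort_values(by="quant_var", ascending=False).reset_index(drop=True)
    total = df["quant_var"].sum()
    df["cumpercentage"] = df["quant_var"].cumsum() / total * 100

    kept = df[df["cumpercentage"] < threshold]
    rest_sum = total - kept["quant_var"].sum()
    if rest_sum > 0:
        # The merged row completes the total, so its cumulative share is 100 %.
        others = pd.DataFrame(
            {"by_var": ["OTHERS"], "quant_var": [rest_sum], "cumpercentage": [100.0]}
        )
        kept = pd.concat([kept, others], ignore_index=True)
    return kept.set_index("by_var")


grouped = group_beyond_threshold(
    ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark'],
    [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0],
    threshold=95,
)
print(grouped)
```

With the sample data and a threshold of 95, USA and Canada are kept and the remaining six countries are merged into an OTHERS row with a value of 11.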