Skip to main content

Pivot table sorting in Pandas [StackOverflow]

The answer to the question about the sorting based on the difference in the items with the same value in one column:

First you need to calculate the differences for each item type. One of the ways, how to do this with pandas would be to use pivot_tables. Here you tell it which dataframe (df), based on which columns to calculate (values="TotalCost"), what function to use to calculate it (aggfunc=np.diff) and how to group them (index=["ItemType"]).

diff = pandas.pivot_table(df, values="TotalCost", index=["ItemType"], aggfunc=np.diff)

You case above only have 2 possible months. If you had more than two, then np.diff would give you values in a list. In this case you have two options. Either you filter the data frame, so there are only two months in it. This can be done like this:

df = df[[a or b for a, b in zip(df["Year_Month"] == "Jul-2017", df["Year_Month"] == "Jun-2017")]]

The other option is that you calculate the mean difference in months. This can be done with the following function, which you would then replace np.diff with:

def mean_diff(l): return np.mean(np.diff(l))

Then you can use this to calculate the difference for each element:

df["Diff"] = [float(diff.loc[d]) for d in df["ItemType"]]

After that, you just sort by the difference (and then by item, in case there are multiple items with the same difference)

df.sort_values(by=["Diff", "ItemType", "Year_Month"]).drop(columns = 'Diff')