Calculating a total for each row of the outer index of a multi-index pandas DataFrame
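I have a DataFrame with the columns seller, item, price, shipping, free shipping minimum, count available, and count needed. My goal is to find the cheapest combination of seller and item based on a total that is calculated later (the calculation code is shown below). Sample data: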
import pandas as pd

item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['seller 1', 'seller 2', 'seller 3', 'seller 4', 'seller 1']
price1 = [1.85, 1.94, 2.00, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]
df1 = pd.DataFrame({'seller': seller1,
                    'item': item1,
                    'price': price1,
                    'shipping': shipping1,
                    'free shipping minimum': freeship1,
                    'count available': countavailable1,
                    'count needed': countneeded1})

# create a column that states whether the seller has the full count needed.
# this will be used for sorting, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['count available'] >= row['count needed']:
        df1.at[index, 'fulfills count needed'] = 'yes'
    else:
        df1.at[index, 'fulfills count needed'] = 'no'

# don't want to calc price based on [count available], so check whether the seller
# has the count I need and calc cost based on [count needed].
# if the seller doesn't have [count needed], calc cost on [count available].
for index, row in df1.iterrows():
    if row['count available'] >= row['count needed']:
        df1.at[index, 'price x count'] = row['count needed'] * row['price']
    else:
        df1.at[index, 'price x count'] = row['count available'] * row['price']
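The two loops above can also be written without iterrows. This is a minimal sketch that assumes the same column names as the sample data; the small frame df is a hypothetical stand-in, not the full df1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame using the question's column names.
df = pd.DataFrame({
    'price': [1.85, 1.94, 2.00],
    'count available': [1, 2, 2],
    'count needed': [2, 1, 2],
})

# Flag the rows where the seller can cover the whole quantity needed.
enough = df['count available'] >= df['count needed']
df['fulfills count needed'] = np.where(enough, 'yes', 'no')

# Charge for the smaller of the two counts: the need, capped by the stock.
units = df[['count available', 'count needed']].min(axis=1)
df['price x count'] = units * df['price']
```

Taking the row-wise minimum of the two count columns covers both branches of the second loop in a single expression.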
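However, any one seller may sell more than one item. I want to minimize the shipping I pay, so I want to group the items together by seller. Following an approach I saw in another thread, I grouped them with the .first() method so that every column is kept in the new grouped DataFrame.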
# don't calc [total] until sellers have been grouped
# use first() method to return all columns and perform no other aggregations
grouped1 = df1.sort_values('price').groupby(['seller', 'item']).first()
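At this point I want to calculate total by seller. I have the following code, but it calculates total for each item rather than for each seller. That means shipping is added multiple times, once per item in each group, or free shipping is not applied when the seller's combined price x count exceeds the free shipping minimum.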
# calc [total]
for index, row in grouped1.iterrows():
    if (row['free shipping minimum'] == 50) & (row['price x count'] > 50):
        grouped1.at[index, 'total'] = row['price x count'] + 0
    elif (row['free shipping minimum'] == 5) & (row['price x count'] > 5):
        grouped1.at[index, 'total'] = row['price x count'] + 0
    else:
        grouped1.at[index, 'total'] = row['price x count'] + row['shipping']
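It actually looks like I may need to sum price x count for each seller when calculating total, but that is essentially the same problem, because I don't know how to compute a column for each row of the outer index. What method can I use to do this?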
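Also, if anyone has suggestions on how to achieve the second half of my goal, please share them. I only want to return as many of each item as I need. For example, say I need 2 of "item 1" and 2 of "item 2". If "seller 1" has 2 of "item 1" and 1 of "item 2", and "seller 2" has 1 of "item 1" and 1 of "item 2", then I want everything from "seller 1" (assuming it is the cheapest) but only 1 of "item 1" from "seller 2". This seems like it would affect the calculation of the total column, but I'm not sure how to implement it.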
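One possible starting point for this second goal is a greedy pass that, for each item, buys from the cheapest offers first until the needed count is covered. The sketch below is only an illustration: the offers frame, its prices, the needed dict, and the allocate helper are all hypothetical, and the approach ignores shipping, so it is not guaranteed to give the cheapest overall order.

```python
import pandas as pd

# Hypothetical offers; the prices are invented for illustration.
offers = pd.DataFrame({
    'seller': ['seller 1', 'seller 1', 'seller 2', 'seller 2'],
    'item': ['item 1', 'item 2', 'item 1', 'item 2'],
    'price': [1.00, 1.00, 1.50, 1.50],
    'count available': [2, 1, 1, 1],
})
needed = {'item 1': 2, 'item 2': 2}  # assumed quantity required per item


def allocate(offers, needed):
    """Greedily take each item from the cheapest offers until the need is met."""
    picks = []
    # groupby keeps the price ordering of the rows inside each group.
    for item, group in offers.sort_values('price').groupby('item'):
        remaining = needed.get(item, 0)
        for _, row in group.iterrows():
            if remaining <= 0:
                break
            take = min(int(row['count available']), remaining)
            picks.append({'seller': row['seller'], 'item': item,
                          'count to buy': take})
            remaining -= take
    return pd.DataFrame(picks)


plan = allocate(offers, needed)
```

The resulting 'count to buy' column could then replace the count used for price x count before the per-seller totals are computed. Because shipping and free-shipping thresholds depend on which sellers are combined, a true minimum-cost plan would need a search over seller combinations rather than this per-item greedy choice.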
Accepted answer
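I ended up grouping by Seller first and summing Price x Count to get the subtotals, converting the result to a DataFrame, and then merging df1 with the new subtotals DataFrame to create the grouped DataFrame. I then created the Total column using the suggested np.where approach (which is far more elegant than my for loop and handles NaN values easily). Finally, grouping by Seller, Total, and Item returned the result I wanted. The final code is below: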
import pandas as pd
import numpy as np
item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
price1 = [1.85, 1.94, 2.69, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]
df1 = pd.DataFrame({'Seller': seller1,
                    'Item': item1,
                    'Price': price1,
                    'Shipping': shipping1,
                    'Free Shipping Minimum': freeship1,
                    'Count Available': countavailable1,
                    'Count Needed': countneeded1})

# create columns that states if seller has all counts needed.
# this will be used to sort by to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Fulfills Count Needed'] = 'Yes'
    else:
        df1.at[index, 'Fulfills Count Needed'] = 'No'

# dont want to calc price based on [count available], so need to check if seller has count I need and calc cost based on [count needed].
# if doesn't have [count needed], then calc cost on [count available].
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
    else:
        df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']
# subtotals by seller, then assign calcs to column called [Subtotal] and merge into dataframe
subtotals = df1.groupby(['Seller'])['Price x Count'].sum().reset_index()
subtotals.rename({'Price x Count':'Subtotal'}, axis=1, inplace=True)
grouped = df1.merge(subtotals[['Subtotal', 'Seller']], on='Seller')
# calc [Total]
grouped['Total'] = np.where(grouped['Subtotal'] > grouped['Free Shipping Minimum'],
grouped['Subtotal'], grouped['Subtotal'] + grouped['Shipping'])
grouped.groupby(['Seller', 'Total', 'Item']).first()
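As a possible alternative to the groupby, rename, and merge steps above, groupby(...).transform('sum') writes each seller's subtotal straight onto that seller's rows. This is a sketch rather than the answer's own method. The frame df is a hypothetical stand-in with the same column names, and it assumes Shipping and Free Shipping Minimum are constant within a seller.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows with the same column names as the answer's df1.
df = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 2'],
    'Shipping': [0.99, 0.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5],
    'Price x Count': [1.85, 1.94, 4.04],
})

# transform('sum') returns one value per original row, aligned to df's index,
# so every row of a seller carries that seller's subtotal.
df['Subtotal'] = df.groupby('Seller')['Price x Count'].transform('sum')

# Same rule as the answer: shipping is waived once the subtotal exceeds the minimum.
df['Total'] = np.where(df['Subtotal'] > df['Free Shipping Minimum'],
                       df['Subtotal'],
                       df['Subtotal'] + df['Shipping'])

# Seller as the outer index level, Item as the inner level.
result = df.set_index(['Seller', 'Item']).sort_index()
```

Because transform keeps the original row count, no intermediate subtotals frame or merge is needed, and the Total value repeats on every row under the same outer index entry.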