Summing per row over the outer index of a MultiIndex pandas DataFrame

2024-02-05 22:08


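Question

I have a DataFrame with the columns seller, item, price, shipping, free shipping minimum, count available, and count needed. My goal is to find the cheapest combination of seller and item based on a total that is calculated later (the calculation code is shown below). Sample data: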

import pandas as pd

item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['seller 1', 'seller 2', 'seller 3', 'seller 4', 'seller 1']
price1 = [1.85, 1.94, 2.00, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]

df1 = pd.DataFrame({'seller':seller1,
                    'item':item1,
                    'price':price1,
                    'shipping':shipping1,
                    'free shipping minimum':freeship1,
                    'count available':countavailable1,
                    'count needed':countneeded1})

# create a column that states whether the seller has the full count needed.
# this will be used for sorting, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['count available'] >= row['count needed']:
        df1.at[index, 'fulfills count needed'] = 'yes'
    else:
        df1.at[index, 'fulfills count needed'] = 'no'

# don't calc price on [count available] alone: if the seller has the count I need, calc cost on [count needed].
# if the seller doesn't have [count needed], calc cost on [count available].
for index, row in df1.iterrows():
    if row['count available'] >= row['count needed']:
        df1.at[index, 'price x count'] = row['count needed'] * row['price']
    else:
        df1.at[index, 'price x count'] = row['count available'] * row['price']
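
The two loops above can also be written without iterating over rows. The following is only a sketch on a three-row stand-in frame (the frame name `df` and its values are chosen for illustration), assuming the same column names as the question:

```python
import numpy as np
import pandas as pd

# Small stand-in for df1, using the question's column names.
df = pd.DataFrame({
    'price': [1.85, 1.94, 2.00],
    'count available': [1, 2, 2],
    'count needed': [2, 1, 2],
})

# 'yes' where the seller can cover the full quantity, otherwise 'no'.
df['fulfills count needed'] = np.where(
    df['count available'] >= df['count needed'], 'yes', 'no')

# Pay for the smaller of (count needed, count available) on each row.
units = df[['count available', 'count needed']].min(axis=1)
df['price x count'] = units * df['price']

print(df)
```

Taking the row-wise minimum gives the same result as the if/else branch, because the cost is always based on whichever of the two counts is smaller.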

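However, any one seller can sell multiple items. I want to minimize the shipping I pay, so I want to group items together by seller. Following an approach I saw in another thread, I grouped them with the .first() method so that every column is kept in the new grouped DataFrame.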

# don't calc [total] until sellers have been grouped
# use first() method to return all columns and perform no other aggregations
grouped1 = df1.sort_values('price').groupby(['seller', 'item']).first()

At this point I want to calculate total by seller. I have the following code, but it calculates total per item rather than per seller. This means shipping is added once for every item in a seller's group, or the free shipping minimum is not applied when the seller's combined price x count exceeds it.

# calc [total]
for index, row in grouped1.iterrows():
    if row['free shipping minimum'] == 50 and row['price x count'] > 50:
        grouped1.at[index, 'total'] = row['price x count'] + 0
    elif row['free shipping minimum'] == 5 and row['price x count'] > 5:
        grouped1.at[index, 'total'] = row['price x count'] + 0
    else:
        grouped1.at[index, 'total'] = row['price x count'] + row['shipping']

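It looks like I actually need to sum price x count for each seller when calculating total, but that is essentially the same problem, because I don't know how to calculate a column per row of the outer index. What method can I use to do this?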
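
One possible way to compute a value per outer-index group and keep it on every row is `groupby(level=...)` combined with `transform('sum')`. The sketch below uses a hand-built stand-in for the grouped frame; the frame name `grouped`, the `subtotal` column, and the numbers are illustrative assumptions, not the question's data:

```python
import numpy as np
import pandas as pd

# Stand-in for grouped1: a frame indexed by (seller, item). Values are made up.
grouped = pd.DataFrame({
    'seller': ['seller 1', 'seller 1', 'seller 2'],
    'item': ['item 1', 'item 2', 'item 2'],
    'price x count': [1.85, 4.04, 1.94],
    'shipping': [0.99, 0.99, 0.99],
    'free shipping minimum': [5, 5, 5],
}).set_index(['seller', 'item'])

# Sum within the outer index level and broadcast the result back to every row.
grouped['subtotal'] = (
    grouped.groupby(level='seller')['price x count'].transform('sum'))

# Shipping is added once per seller, and only when the subtotal
# does not exceed the free shipping minimum.
grouped['total'] = np.where(
    grouped['subtotal'] > grouped['free shipping minimum'],
    grouped['subtotal'],
    grouped['subtotal'] + grouped['shipping'])

print(grouped[['subtotal', 'total']])
```

Because `transform` returns a result aligned with the original index, no merge is needed, and the comparison against the free shipping minimum uses the seller's combined amount rather than a single item's.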

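Also, if anyone has suggestions for the second half of my goal, please share them. I only want to return the quantity of each item that I need. For example, say I need 2 of "item 1" and 2 of "item 2". If "seller 1" has 2 of "item 1" and 1 of "item 2", and "seller 2" has 1 of "item 1" and 1 of "item 2", then I want all of the items from "seller 1" (assuming it is the cheapest), but only 1 "item 2" from "seller 2". This seems like it would affect the calculation of the total column, but I am not sure how to implement it.

Correct answer

I ended up grouping by seller first and summing price x count to get subtotals, converting the result to a DataFrame, and then merging df1 with the new subtotal DataFrame to create the grouped DataFrame. I then created the Total column with np.where, as suggested (this is much more elegant than my for loop and handles NaN values easily). Finally, grouping by Seller, Total, and Item returns the result I wanted. The final code: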
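
A rough sketch of one way to approach this is a greedy fill: for each item, take units from the cheapest offer first until the needed quantity is covered. This is an assumption about what "cheapest" should mean; it ignores shipping and free shipping minimums, so it is not guaranteed to give the lowest overall total. The frame `offers`, the series `needed`, the `count to buy` column, and the prices are all hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical offers: each row is one seller's stock of one item.
offers = pd.DataFrame({
    'seller': ['seller 1', 'seller 2', 'seller 1', 'seller 2'],
    'item': ['item 1', 'item 1', 'item 2', 'item 2'],
    'price': [1.00, 1.50, 2.00, 2.50],
    'count available': [2, 1, 1, 1],
})
# Hypothetical demand: how many units of each item are wanted.
needed = pd.Series({'item 1': 2, 'item 2': 2})

# Cheapest offers first within each item.
offers = offers.sort_values(['item', 'price']).reset_index(drop=True)

# Units already covered by cheaper offers of the same item.
covered = (offers.groupby('item')['count available'].cumsum()
           - offers['count available'])
# Units still wanted when this offer is reached, never below zero.
remaining = (offers['item'].map(needed) - covered).clip(lower=0)
# Take what is still wanted, capped by what the seller has in stock.
offers['count to buy'] = np.minimum(remaining, offers['count available'])

print(offers)
```

With these quantities, both units of item 1 come from seller 1, and item 2 is split one unit each between the two sellers. The resulting `count to buy` could then replace the per-row quantity when computing price x count.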


import pandas as pd
import numpy as np

item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
price1 = [1.85, 1.94, 2.69, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]

df1 = pd.DataFrame({'Seller':seller1,
                    'Item':item1,
                    'Price':price1,
                    'Shipping':shipping1,
                    'Free Shipping Minimum':freeship1,
                    'Count Available':countavailable1,
                    'Count Needed':countneeded1})

# create a column that states whether the seller has the full count needed.
# this will be used for sorting, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Fulfills Count Needed'] = 'Yes'
    else:
        df1.at[index, 'Fulfills Count Needed'] = 'No'

# don't calc price on [Count Available] alone: if the seller has the count I need, calc cost on [Count Needed].
# if the seller doesn't have [Count Needed], calc cost on [Count Available].
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
    else:
        df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']

# subtotals by seller, then assign calcs to column called [Subtotal] and merge into dataframe
subtotals = df1.groupby(['Seller'])['Price x Count'].sum().reset_index()

subtotals.rename({'Price x Count':'Subtotal'}, axis=1, inplace=True)

grouped = df1.merge(subtotals[['Subtotal', 'Seller']], on='Seller')


# calc [Total]
grouped['Total'] = np.where(grouped['Subtotal'] > grouped['Free Shipping Minimum'],
                             grouped['Subtotal'], grouped['Subtotal'] + grouped['Shipping'])

grouped.groupby(['Seller', 'Total', 'Item']).first()
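
Since Total is repeated on every row of a seller, the result can be collapsed to one row per seller for comparison. This is a small sketch on a stand-in frame; the Subtotal and Total values were worked out by hand from the sample data for Seller 1 and Seller 2, and the name `per_seller` is made up for illustration:

```python
import pandas as pd

# Stand-in for the `grouped` frame after the merge and np.where steps.
grouped = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 1', 'Seller 2'],
    'Item': ['item 1', 'item 2', 'item 2'],
    'Subtotal': [3.87, 3.87, 1.94],
    'Total': [4.86, 4.86, 2.93],
})

# Total is identical on every row of a seller, so first() is enough.
per_seller = (grouped.groupby('Seller')['Total']
              .first()
              .sort_values())

print(per_seller)
```

Sorting the collapsed series puts the seller with the lowest order total first.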