How to extract columns of a specific dtype from a pandas.DataFrame

2023-07-05 05:55


This article explains how to extract columns of a specific dtype from a pandas.DataFrame, working through a concrete example.

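A pandas.DataFrame stores a data type (dtype) for each column.

To extract (select) only the columns with a specific dtype, use the select_dtypes() method of pandas.DataFrame.

The following pandas.DataFrame, which has columns of various data types, is used as the example.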

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 3],
                   'b': [0.4, 1.1, 0.1, 0.8],
                   'c': ['X', 'Y', 'X', 'Z'],
                   'd': [[0, 0], [0, 1], [1, 0], [1, 1]],
                   'e': [True, True, False, True]})

df['f'] = pd.to_datetime(['2018-01-01', '2018-03-15', '2018-02-20', '2018-03-15'])

print(df)
#    a    b  c       d      e          f
# 0  1  0.4  X  [0, 0]   True 2018-01-01
# 1  2  1.1  Y  [0, 1]   True 2018-03-15
# 2  1  0.1  X  [1, 0]  False 2018-02-20
# 3  3  0.8  Z  [1, 1]   True 2018-03-15

print(df.dtypes)
# a             int64
# b           float64
# c            object
# d            object
# e              bool
# f    datetime64[ns]
# dtype: object

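The following topics are covered.

Basic usage of select_dtypes()

Specifying the types to extract: the include parameter
Specifying the types to exclude: the exclude parameter

Basic usage of select_dtypes()

Specifying the types to extract: the include parameter

Pass the dtype to extract as the include parameter.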

print(df.select_dtypes(include=int))
#    a
# 0  1
# 1  2
# 2  1
# 3  3

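Types that exist as Python built-ins, such as int and float, can be passed as-is. You can also pass the string 'int', or 'int64' with an explicit bit width. (The default bit width depends on the environment.)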

print(df.select_dtypes(include='int'))
#    a
# 0  1
# 1  2
# 2  1
# 3  3

print(df.select_dtypes(include='int64'))
#    a
# 0  1
# 1  2
# 2  1
# 3  3

Naturally, when the specification includes a bit width, a column is selected only if its bit width matches.

print(df.select_dtypes(include='int32'))
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2, 3]
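When the exact bit width is not known in advance, one option is to pass NumPy's abstract type np.integer, which should match integer columns of any width. The following is a minimal sketch of that idea; the frame df_small and its column names are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a 32-bit integer column, a 64-bit integer column, and a float column.
df_small = pd.DataFrame({
    'i32': np.array([1, 2, 3], dtype='int32'),
    'i64': np.array([10, 20, 30], dtype='int64'),
    'x': [0.1, 0.2, 0.3],
})

# An exact width only matches columns of that width.
only_i32 = df_small.select_dtypes(include='int32')

# np.integer is the abstract parent of NumPy's integer types,
# so it matches both integer columns regardless of width.
any_int = df_small.select_dtypes(include=np.integer)

print(list(only_i32.columns))
print(list(any_int.columns))
```

This keeps the selection independent of whether the data was loaded as 32-bit or 64-bit integers.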

Multiple dtypes can be given as a list. The date and time dtype datetime64[ns] can be specified with 'datetime'.

print(df.select_dtypes(include=[int, float, 'datetime']))
#    a    b          f
# 0  1  0.4 2018-01-01
# 1  2  1.1 2018-03-15
# 2  1  0.1 2018-02-20
# 3  3  0.8 2018-03-15

Numeric types such as int and float can be selected together with the special value 'number'.

print(df.select_dtypes(include='number'))
#    a    b
# 0  1  0.4
# 1  2  1.1
# 2  1  0.1
# 3  3  0.8
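A common reason to select by 'number' is to run a numeric computation on only the columns where it makes sense. The sketch below assumes a small frame named df_demo (a cut-down copy of the example, introduced here for illustration) and computes the column means of the numeric part.

```python
import pandas as pd

# Cut-down copy of the example frame; the name df_demo is only for this sketch.
df_demo = pd.DataFrame({
    'a': [1, 2, 1, 3],
    'b': [0.4, 1.1, 0.1, 0.8],
    'c': ['X', 'Y', 'X', 'Z'],
    'e': [True, True, False, True],
})

# Keep only the numeric columns; the bool column 'e' is not selected by 'number'.
numeric = df_demo.select_dtypes(include='number')
numeric_cols = list(numeric.columns)

# Column-wise mean over the numeric columns only.
col_means = numeric.mean()

print(numeric_cols)
print(col_means)
```

Taking the column labels from the result (numeric_cols here) is also a convenient way to reuse the selection later, for example with df_demo[numeric_cols].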

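A column whose elements are strings (str) has dtype object, but object columns can also hold Python built-in types other than str. This is uncommon in practice, but as the example shows, a column whose elements are lists is also extracted by include=object.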

print(df.select_dtypes(include=object))
#    c       d
# 0  X  [0, 0]
# 1  Y  [0, 1]
# 2  X  [1, 0]
# 3  Z  [1, 1]

print(type(df.at[0, 'c']))
# <class 'str'>

print(type(df.at[0, 'd']))
# <class 'list'>

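However, unless you put them there deliberately, objects other than str are unlikely to end up as elements of a pandas.DataFrame, so there is no need to worry about this too much.

Specifying the types to exclude: the exclude parameter

Pass the dtype to exclude as the exclude parameter. Multiple dtypes can also be given as a list.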
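If it does matter whether an object column holds only strings, one possible check is to test each element's type after selecting the object columns. This is a minimal sketch, not a built-in pandas feature; the frame df_obj and the list str_only are names introduced here, and dtype=object is set explicitly as an assumption so that the result does not depend on how a given pandas version infers the dtype of string data.

```python
import pandas as pd

# dtype=object is set explicitly so the example does not depend on
# how a particular pandas version infers the dtype of string data.
df_obj = pd.DataFrame({
    'c': pd.Series(['X', 'Y', 'X', 'Z'], dtype=object),
    'd': pd.Series([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=object),
    'a': [1, 2, 1, 3],
})

# Both 'c' (strings) and 'd' (lists) are object columns.
obj_cols = df_obj.select_dtypes(include=object)

# Keep only the object columns in which every element is a str.
str_only = [col for col in obj_cols.columns
            if obj_cols[col].map(lambda v: isinstance(v, str)).all()]

print(list(obj_cols.columns))
print(str_only)
```

The element-wise check visits every value, so on large frames it is slower than the dtype-based selection itself.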

print(df.select_dtypes(exclude='number'))
#    c       d      e          f
# 0  X  [0, 0]   True 2018-01-01
# 1  Y  [0, 1]   True 2018-03-15
# 2  X  [1, 0]  False 2018-02-20
# 3  Z  [1, 1]   True 2018-03-15

print(df.select_dtypes(exclude=[bool, 'datetime']))
#    a    b  c       d
# 0  1  0.4  X  [0, 0]
# 1  2  1.1  Y  [0, 1]
# 2  1  0.1  X  [1, 0]
# 3  3  0.8  Z  [1, 1]

include and exclude can be specified at the same time, but an error is raised if the same type appears in both.

print(df.select_dtypes(include='number', exclude=int))
#      b
# 0  0.4
# 1  1.1
# 2  0.1
# 3  0.8

# print(df.select_dtypes(include=[int, bool], exclude=int))
# ValueError: include and exclude overlap on frozenset({<class 'numpy.int64'>})
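When the include and exclude lists are built dynamically, it may be worth guarding against the overlap error. The sketch below wraps the call in a small helper; safe_select and df_demo are hypothetical names used only for this illustration, and the dtypes are given as 'int64' strings so the behaviour does not depend on the platform's default integer width.

```python
import pandas as pd

# Small frame for this sketch only.
df_demo = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [0.5, 1.5, 2.5],
    'e': [True, False, True],
})


def safe_select(frame, include=None, exclude=None):
    """Return the selected columns, or None if pandas rejects the arguments."""
    try:
        return frame.select_dtypes(include=include, exclude=exclude)
    except ValueError as err:
        # Raised, for example, when include and exclude overlap.
        print(f'select_dtypes failed: {err}')
        return None


# Valid combination: numeric columns minus the int64 column.
ok = safe_select(df_demo, include='number', exclude='int64')

# Invalid combination: 'int64' appears in both include and exclude.
overlap = safe_select(df_demo, include=['int64', 'bool'], exclude='int64')

print(list(ok.columns))
print(overlap)
```

Returning None is just one design choice; re-raising with a clearer message or removing the overlapping types before the call would work as well.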
