Why doesn't the filter return what I expected?

Q&A / 42 / 4 / Created 1 year ago

import pandas as pd

df = pd.DataFrame([['小明', 15, '一中', 99, 4],
                   ['小美', 12, '一中', 63, 1],
                   ['灰灰', 19, '一中', 85, 2],
                   ['铭铭', 12, '一中', 76, 2],
                   ['豪豪', 15, '一中', 55, 4],
                   ['小黄', 18, '二中', 68, 1],
                   ['小黑', 18, '二中', 85, 1]],
                  columns=['姓名', '年龄', '学校', '成绩', '类别'])
df1 = (
    df.groupby(['年龄', '学校'])
    .filter(lambda x: (len(x) > 1) & (1 not in x.类别))
)
print(df1)

Result:
   姓名  年龄  学校  成绩  类别
0  小明  15  一中  99   4
4  豪豪  15  一中  55   4
5  小黄  18  二中  68   1
6  小黑  18  二中  85   1

Why do the rows for 小黄 and 小黑 show up? Didn't I set the filter condition "1 is not in 类别"?

xmlhwl

18 reputation

hustnzj

Moderator, 2.2k reputation

Best answer

I've switched to English column names because they are easier to type. Here is the complete code:

import pandas as pd
df = pd.DataFrame([['小明', 15, '一中', 99, 4],
                   ['小美', 12, '一中', 63, 1],
                   ['灰灰', 19, '一中', 85, 2],
                   ['铭铭', 12, '一中', 76, 2],
                   ['豪豪', 15, '一中', 55, 4],
                   ['小黄', 18, '二中', 68, 1],
                   ['小黑', 18, '二中', 85, 1]],
                  columns=['name', 'age', 'school', 'score', 'category'])
# Print the data to make the analysis below easier
# print(df)
# Group by age and school
grouped = df.groupby(['age', 'school'])
# Inspect each group
for name, x in grouped:
    # Group key, a tuple
    print(name)
    # _info_axis of a Series (a private attribute) is its index, here the row labels of the source data:
    print(x.category._info_axis)
    # Check whether 1 is absent from those row labels:
    print(1 not in x.category._info_axis)
    # Add the group-size check:
    print(len(x) > 1 and 1 not in x.category._info_axis)


# Requirement: keep the groups that have more than one row and in which no category equals 1. The checks above clearly do not test that, so `1 not in x.category` cannot be used directly.


grouped = df.groupby(['age', 'school'])
# Method 1: avoid `in`
filtered = grouped.filter(lambda x: len(x) > 1 and (x.category != 1).all())
print(filtered)

# Method 2: convert `x.category` to a list before using `in`
filtered = grouped.filter(lambda x: len(x) > 1 and 1 not in list(x.category))
print(filtered)
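
As a possible alternative to `filter`, the same selection can be sketched as a boolean mask built with `transform`. This is only a sketch and not part of the original answer; the names `is_one`, `has_one`, `group_size`, `mask` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['小明', 15, '一中', 99, 4],
                   ['小美', 12, '一中', 63, 1],
                   ['灰灰', 19, '一中', 85, 2],
                   ['铭铭', 12, '一中', 76, 2],
                   ['豪豪', 15, '一中', 55, 4],
                   ['小黄', 18, '二中', 68, 1],
                   ['小黑', 18, '二中', 85, 1]],
                  columns=['name', 'age', 'school', 'score', 'category'])

# True for every row whose category equals 1 (compares values, not index labels)
is_one = df['category'].eq(1)

# For each row: does its (age, school) group contain at least one category 1?
has_one = is_one.groupby([df['age'], df['school']]).transform('any')

# For each row: the number of rows in its (age, school) group
group_size = df.groupby(['age', 'school'])['category'].transform('size')

# Keep rows from groups with more than one row and no category equal to 1
mask = (group_size > 1) & ~has_one
result = df[mask]
print(result)
```

On the sample data this should keep only the rows for 小明 and 豪豪, the same rows that Method 1 and Method 2 return.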

Commented 1 year ago

Discussion: 4 replies

hustnzj

Moderator, 2.2k reputation

Answer

grouped = df.groupby(['年龄', '学校'])
# Method 1: avoid `in`
filtered = grouped.filter(lambda x: len(x) > 1 and (x.类别 != 1).all())
print(filtered)

# Method 2: convert `x.类别` to a list before using `in`
filtered = grouped.filter(lambda x: len(x) > 1 and 1 not in list(x.类别))
print(filtered)
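
A side note on combining conditions inside the filter lambda: in Python, `&` binds tighter than comparison operators such as `>` and `not in`, so an unparenthesized `&` between two comparisons is not a logical "and". The small sketch below uses made-up values (`n`, `labels`) to show how such an expression is parsed.

```python
n = 1
labels = [4]

# Parsed as `n > (1 & 2) not in labels`, i.e. `n > 0 not in labels`,
# which is a chained comparison meaning `(n > 0) and (0 not in labels)`.
surprising = n > 1 & 2 not in labels

# What was probably intended.
intended = (n > 1) and (2 not in labels)

print(surprising)  # True
print(intended)    # False
```

With the literal 1 on both sides, `1 & 1` is 1, so an expression like `len(x) > 1 & 1 not in values` happens to behave like the `and` form, but only by coincidence. Writing `and`, or parenthesizing both sides of `&`, avoids depending on that.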

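Analysis

See: stackoverflow.com/questions/493930...

This happens because x.类别 returns a pandas.Series, and applying `in` to it is interpreted as a call to Series.__contains__(value).

According to the pandas source (pandas/core/generic.py), that method actually checks the _info_axis attribute:

def __contains__(self, key) -> bool_t:
    """True if the key is in the info axis"""
    return key in self._info_axis

For a pandas.Series, _info_axis is the row index, so `1 not in x.类别` tests the row labels rather than the values. In the group containing 小黄 and 小黑 the row labels are 5 and 6, so the test is True and the group is kept. The following code can be used to check this: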

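# Inspect each group
for name, x in grouped:
    # Group key, a tuple
    print(name)
    # _info_axis of a Series (a private attribute) is its index, here the row labels of the source data:
    print(x.类别._info_axis)
    # Check whether 1 is absent from those row labels:
    print(1 not in x.类别._info_axis)
    # Add the group-size check:
    print(len(x) > 1 and 1 not in x.类别._info_axis)


# Requirement: keep the groups that have more than one row and in which no 类别 equals 1. The checks above clearly do not test that, so `1 not in x.类别` cannot be used directly.

That is why using `in` directly produces this behavior.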
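
To see the index-versus-values behavior in isolation, here is a minimal sketch on a toy Series. The Series `s` is made up for illustration and mirrors the 小黄 / 小黑 group (values 1 and 1, row labels 5 and 6).

```python
import pandas as pd

# Toy Series: the values are the categories, the index holds the original row labels
s = pd.Series([1, 1], index=[5, 6], name='类别')

print(1 in s)             # False: `in` looks at the index labels (5, 6)
print(5 in s)             # True: 5 is an index label
print(1 in s.values)      # True: membership test on the underlying values
print(s.isin([1]).any())  # True: value-based check using pandas itself
print((s != 1).all())     # False: not every value differs from 1
```

Any of the value-based forms (`s.values`, `list(s)`, `s.isin(...)`, or a comparison followed by `.all()` / `.any()`) avoids the index lookup.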

Commented 1 year ago

xmlhwl

18 reputation

Thank you very much!

Commented 1 year ago

xiong_d

2 reputation

df1 = (df.groupby(['年龄', '学校']).filter(lambda x: (len(x) > 1) & (1 not in x.类别.values)))

Commented 1 year ago
