Data Science and Artificial Intelligence Technical Notes
19. Data Wrangling (1)
Author: Chris Albon
Translator: 飞龙
License: CC BY-NC-SA 4.0
Applying Functions by Group in Pandas

```python
import pandas as pd

# Create an example data frame
data = {'Platoon': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
        'Casualties': [1,4,5,7,5,5,6,1,4,5,6,7,4,6,4,6]}
df = pd.DataFrame(data)
df
```

| | Casualties | Platoon |
|---|---|---|
| 0 | 1 | A |
| 1 | 4 | A |
| 2 | 5 | A |
| 3 | 7 | A |
| 4 | 5 | A |
| 5 | 5 | A |
| 6 | 6 | B |
| 7 | 1 | B |
| 8 | 4 | B |
| 9 | 5 | B |
| 10 | 6 | B |
| 11 | 7 | C |
| 12 | 4 | C |
| 13 | 6 | C |
| 14 | 4 | C |
| 15 | 6 | C |
```python
# Group df by the Platoon column, then apply a rolling-mean
# lambda function to the Casualties column
df.groupby('Platoon')['Casualties'].apply(lambda x: x.rolling(center=False, window=2).mean())

'''
0     NaN
1     2.5
2     4.5
3     6.0
4     6.0
5     5.0
6     NaN
7     3.5
8     2.5
9     4.5
10    5.5
11    NaN
12    5.5
13    5.0
14    5.0
15    5.0
dtype: float64
'''
```

Applying Operations to Groups in Pandas
```python
# Import modules
import pandas as pd

# Create a data frame
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
df
```

| | regiment | company | name | preTestScore | postTestScore |
|---|---|---|---|---|---|
| 0 | Nighthawks | 1st | Miller | 4 | 25 |
| 1 | Nighthawks | 1st | Jacobson | 24 | 94 |
| 2 | Nighthawks | 2nd | Ali | 31 | 57 |
| 3 | Nighthawks | 2nd | Milner | 2 | 62 |
| 4 | Dragoons | 1st | Cooze | 3 | 70 |
| 5 | Dragoons | 1st | Jacon | 4 | 25 |
| 6 | Dragoons | 2nd | Ryaner | 24 | 94 |
| 7 | Dragoons | 2nd | Sone | 31 | 57 |
| 8 | Scouts | 1st | Sloan | 2 | 62 |
| 9 | Scouts | 1st | Piger | 3 | 70 |
| 10 | Scouts | 2nd | Riani | 2 | 62 |
| 11 | Scouts | 2nd | Ali | 3 | 70 |
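```python
# Create a groupby variable that groups preTestScore by regiment
groupby_regiment = df['preTestScore'].groupby(df['regiment'])
groupby_regiment
# <pandas.core.groupby.SeriesGroupBy object at 0x113ddb550>
```

"This grouped variable is now a GroupBy object. It has not actually computed anything yet except for some intermediate data about the group key df['key1']. The idea is that this object has all of the information needed to then apply some operation to each of the groups." Source: PyDA (Python for Data Analysis)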
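A minimal sketch of what the quote describes, using a small made-up data frame (the names `scores`, `grouped`, `group_labels`, and `means` are illustrative and not from the original). Creating the GroupBy object only records which rows belong to which group; values are computed when an aggregation is requested.

```python
import pandas as pd

# Small illustrative data frame (hypothetical values)
scores = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons'],
    'preTestScore': [4, 24, 3, 31],
})

# Building the GroupBy object does not aggregate anything yet
grouped = scores['preTestScore'].groupby(scores['regiment'])

# The object knows which row labels belong to each group
group_labels = {key: list(idx) for key, idx in grouped.groups.items()}

# An aggregation is only computed when it is requested
means = grouped.mean()
```

Inspecting `group_labels` is a cheap way to confirm the grouping key behaves as expected before running a more expensive aggregation.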
Use `list()` to see what the grouping looks like.
```python
list(df['preTestScore'].groupby(df['regiment']))

'''
[('Dragoons', 4     3
  5     4
  6    24
  7    31
  Name: preTestScore, dtype: int64), ('Nighthawks', 0     4
  1    24
  2    31
  3     2
  Name: preTestScore, dtype: int64), ('Scouts', 8      2
  9      3
  10     2
  11     3
  Name: preTestScore, dtype: int64)]
'''

df['preTestScore'].groupby(df['regiment']).describe()
```

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| regiment | ||||||||
| Dragoons | 4.0 | 15.50 | 14.153916 | 3.0 | 3.75 | 14.0 | 25.75 | 31.0 |
| Nighthawks | 4.0 | 15.25 | 14.453950 | 2.0 | 3.50 | 14.0 | 25.75 | 31.0 |
| Scouts | 4.0 | 2.50 | 0.577350 | 2.0 | 2.00 | 2.5 | 3.00 | 3.0 |
```python
# Mean preTestScore for each regiment
groupby_regiment.mean()

'''
regiment
Dragoons      15.50
Nighthawks    15.25
Scouts         2.50
Name: preTestScore, dtype: float64
'''

df['preTestScore'].groupby([df['regiment'], df['company']]).mean()

'''
regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
Name: preTestScore, dtype: float64
'''

df['preTestScore'].groupby([df['regiment'], df['company']]).mean().unstack()
```

| company | 1st | 2nd |
|---|---|---|
| regiment | ||
| Dragoons | 3.5 | 27.5 |
| Nighthawks | 14.0 | 16.5 |
| Scouts | 2.5 | 2.5 |
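The group-then-unstack step above can also be written as a pivot table. The sketch below assumes a made-up subset of the regiment data (`df_small` is a hypothetical name) and compares the two approaches; it is an illustration of the equivalence, not a replacement for the original code.

```python
import pandas as pd

# Hypothetical subset of the regiment data used above
df_small = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Dragoons', 'Dragoons',
                 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
    'preTestScore': [3, 4, 24, 31, 2, 3, 2, 3],
})

# Group on two keys, then move the inner level (company) into columns
via_unstack = df_small.groupby(['regiment', 'company'])['preTestScore'].mean().unstack()

# pivot_table expresses the same reshaping in a single call
via_pivot = df_small.pivot_table(values='preTestScore', index='regiment',
                                 columns='company', aggfunc='mean')
```

Both results have one row per regiment and one column per company; which form to use is mostly a matter of readability.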
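```python
# Group the entire data frame by regiment and company
# (numeric_only=True skips the text column 'name', which recent pandas versions refuse to average)
df.groupby(['regiment', 'company']).mean(numeric_only=True)
```

| | | preTestScore | postTestScore |
|---|---|---|---|
| regiment | company | | |
| Dragoons | 1st | 3.5 | 47.5 |
| | 2nd | 27.5 | 75.5 |
| Nighthawks | 1st | 14.0 | 59.5 |
| | 2nd | 16.5 | 59.5 |
| Scouts | 1st | 2.5 | 66.0 |
| | 2nd | 2.5 | 66.0 |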
```python
# Number of observations in each regiment and company
df.groupby(['regiment', 'company']).size()

'''
regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64
'''

# Group the data frame by regiment, and for each regiment,
for name, group in df.groupby('regiment'):
    # print the name of the regiment
    print(name)
    # print its data
    print(group)

'''
Dragoons
   regiment company    name  preTestScore  postTestScore
4  Dragoons     1st   Cooze             3             70
5  Dragoons     1st   Jacon             4             25
6  Dragoons     2nd  Ryaner            24             94
7  Dragoons     2nd    Sone            31             57
Nighthawks
     regiment company      name  preTestScore  postTestScore
0  Nighthawks     1st    Miller             4             25
1  Nighthawks     1st  Jacobson            24             94
2  Nighthawks     2nd       Ali            31             57
3  Nighthawks     2nd    Milner             2             62
Scouts
   regiment company   name  preTestScore  postTestScore
8    Scouts     1st  Sloan             2             62
9    Scouts     1st  Piger             3             70
10   Scouts     2nd  Riani             2             62
11   Scouts     2nd    Ali             3             70
'''
```

Grouping by columns: in this case, group the columns by data type (i.e. `axis=1`), then use `list()` to see what that grouping looks like.
```python
# Note: passing axis=1 to groupby is deprecated in pandas 2.1 and later
list(df.groupby(df.dtypes, axis=1))

'''
[(dtype('int64'),     preTestScore  postTestScore
  0              4             25
  1             24             94
  2             31             57
  3              2             62
  4              3             70
  5              4             25
  6             24             94
  7             31             57
  8              2             62
  9              3             70
  10             2             62
  11             3             70),
 (dtype('O'),       regiment company      name
  0   Nighthawks     1st    Miller
  1   Nighthawks     1st  Jacobson
  2   Nighthawks     2nd       Ali
  3   Nighthawks     2nd    Milner
  4     Dragoons     1st     Cooze
  5     Dragoons     1st     Jacon
  6     Dragoons     2nd    Ryaner
  7     Dragoons     2nd      Sone
  8       Scouts     1st     Sloan
  9       Scouts     1st     Piger
  10      Scouts     2nd     Riani
  11      Scouts     2nd       Ali)]
'''

# Mean of the numeric columns per regiment, with a prefix added to each column name
df.groupby('regiment').mean(numeric_only=True).add_prefix('mean_')
```

| | mean_preTestScore | mean_postTestScore |
|---|---|---|
| regiment | ||
| Dragoons | 15.50 | 61.5 |
| Nighthawks | 15.25 | 59.5 |
| Scouts | 2.50 | 66.0 |
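The prefixed column names above can also be produced with named aggregation, which states the output name, the source column, and the function in one place. This is a sketch on a made-up subset (`scores` and `summary` are hypothetical names), assuming pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical subset of the regiment scores
scores = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'preTestScore': [3, 31, 2, 3],
    'postTestScore': [70, 57, 62, 70],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = scores.groupby('regiment').agg(
    mean_preTestScore=('preTestScore', 'mean'),
    mean_postTestScore=('postTestScore', 'mean'),
)
```

Unlike `add_prefix`, this form lets each output column use a different function if needed.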
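```python
# Create a function that returns summary statistics for a group
def get_stats(group):
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}

# Bin postTestScore into four labelled categories
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']
df['categories'] = pd.cut(df['postTestScore'], bins, labels=group_names)

# Apply get_stats to each category and spread the statistics into columns
df['postTestScore'].groupby(df['categories']).apply(get_stats).unstack()
```

| | count | max | mean | min |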
|---|---|---|---|---|
| categories | ||||
| Good | 8.0 | 70.0 | 63.75 | 57.0 |
| Great | 2.0 | 94.0 | 94.00 | 94.0 |
| Low | 2.0 | 25.0 | 25.00 | 25.0 |
| Okay | 0.0 | NaN | NaN | NaN |
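For the common statistics used in `get_stats`, a list of function names passed to `agg` gives a similar table without a custom function. The sketch below uses made-up scores (`scores`, `categories`, and `stats` are hypothetical names); `observed=False` is passed explicitly so that empty categories such as 'Okay' stay in the result.

```python
import pandas as pd

# Hypothetical scores, binned the same way as above
scores = pd.Series([25, 94, 57, 62, 70], name='postTestScore')
bins = [0, 25, 50, 75, 100]
labels = ['Low', 'Okay', 'Good', 'Great']
categories = pd.cut(scores, bins, labels=labels)

# agg with a list of function names returns one column per statistic;
# observed=False keeps categories that contain no rows
stats = scores.groupby(categories, observed=False).agg(['min', 'max', 'count', 'mean'])
```

The rows follow the category order given in `labels` rather than alphabetical order.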
Applying Operations over a Pandas Data Frame

```python
# Import modules
import pandas as pd
import numpy as np

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3],
        'coverage': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df
```

| | coverage | name | reports | year |
|---|---|---|---|---|
| Cochice | 25 | Jason | 4 | 2012 |
| Pima | 94 | Molly | 24 | 2012 |
| Santa Cruz | 57 | Tina | 31 | 2013 |
| Maricopa | 62 | Jake | 2 | 2014 |
| Yuma | 70 | Amy | 3 | 2014 |
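```python
# Create a lambda function that converts a string to upper case
capitalizer = lambda x: x.upper()
```

Apply the `capitalizer` function to the `name` column.

`apply()` can apply a function along any axis of the data frame.

```python
df['name'].apply(capitalizer)

'''
Cochice       JASON
Pima          MOLLY
Santa Cruz     TINA
Maricopa       JAKE
Yuma            AMY
Name: name, dtype: object
'''
```

Map the `capitalizer` lambda function over each element of the series `name`.

`map()` applies an operation to each element of a series.

```python
df['name'].map(capitalizer)

'''
Cochice       JASON
Pima          MOLLY
Santa Cruz     TINA
Maricopa       JAKE
Yuma            AMY
Name: name, dtype: object
'''
```

Apply a square root function to every cell in the whole data frame.

`applymap()` applies a function to every element of the whole data frame.

```python
# Drop the string variable so that applymap() can run
df = df.drop('name', axis=1)

# Return the square root of every cell in the data frame
df.applymap(np.sqrt)
```

| | coverage | reports | year |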
|---|---|---|---|
| Cochice | 5.000000 | 2.000000 | 44.855323 |
| Pima | 9.695360 | 4.898979 | 44.855323 |
| Santa Cruz | 7.549834 | 5.567764 | 44.866469 |
| Maricopa | 7.874008 | 1.414214 | 44.877611 |
| Yuma | 8.366600 | 1.732051 | 44.877611 |
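For a purely numeric data frame, a NumPy function can be applied to the whole frame at once instead of cell by cell. The sketch below uses a made-up two-row frame (`numbers` is a hypothetical name) and also shows a defensive way to call the element-wise method, on the assumption that `DataFrame.map` is the newer name for `applymap` from pandas 2.1 onward.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only data frame
numbers = pd.DataFrame({'coverage': [25, 94], 'reports': [4, 24]},
                       index=['Cochice', 'Pima'])

# NumPy ufuncs operate on a whole data frame in one call
vectorized = np.sqrt(numbers)

# Element-wise application: prefer DataFrame.map when this pandas version
# has it, and fall back to applymap otherwise
elementwise_method = numbers.map if hasattr(numbers, 'map') else numbers.applymap
elementwise = elementwise_method(np.sqrt)
```

Both results hold the same values; the vectorized call is usually the faster of the two on large frames.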
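Apply a function over the data frame.

```python
# Create a function called times100
def times100(x):
    # If x is a string,
    if type(x) is str:
        # return it untouched
        return x
    # If not, and x is truthy (non-zero), return it multiplied by 100
    elif x:
        return 100 * x
    # Otherwise return nothing (None)
    else:
        return

df.applymap(times100)
```

| | coverage | reports | year |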
|---|---|---|---|
| Cochice | 2500 | 400 | 201200 |
| Pima | 9400 | 2400 | 201200 |
| Santa Cruz | 5700 | 3100 | 201300 |
| Maricopa | 6200 | 200 | 201400 |
| Yuma | 7000 | 300 | 201400 |
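When the goal is simply to scale the numeric columns and leave text alone, the type check inside `times100` can be replaced by selecting columns by dtype. This is a sketch on a made-up frame (`report` and `scaled` are hypothetical names); note one behavioral difference from `times100`, which returns `None` for zero, while this version keeps zero as zero.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns
report = pd.DataFrame({'name': ['Jason', 'Molly'],
                       'reports': [4, 24],
                       'coverage': [25, 94]})

# Find the numeric columns, then multiply only those by 100
numeric_cols = report.select_dtypes(include='number').columns
scaled = report.copy()
scaled[numeric_cols] = scaled[numeric_cols] * 100
```

Working on a copy leaves `report` unchanged, which mirrors how `applymap` returns a new data frame.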
Assign a New Column to a Pandas Data Frame

```python
import pandas as pd

# Create an empty data frame
df = pd.DataFrame()

# Create a column
df['name'] = ['John', 'Steve', 'Sarah']

# View the data frame
df
```

| | name |
|---|---|
| 0 | John |
| 1 | Steve |
| 2 | Sarah |
```python
# Assign a new column called age to df, containing a list of ages
df.assign(age = [31, 32, 19])
```

| | name | age |
|---|---|---|
| 0 | John | 31 |
| 1 | Steve | 32 |
| 2 | Sarah | 19 |
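Two details of `assign` are easy to miss: it returns a new data frame rather than changing the original, and it accepts a callable so a column can be derived from existing ones. A minimal sketch, with hypothetical names (`people`, `with_age`, `with_flag`) and an arbitrary age threshold chosen only for illustration:

```python
import pandas as pd

people = pd.DataFrame({'name': ['John', 'Steve', 'Sarah']})

# assign returns a new data frame; `people` itself is unchanged
with_age = people.assign(age=[31, 32, 19])

# A callable receives the data frame being assigned to,
# so the new column can be computed from existing columns
with_flag = with_age.assign(is_adult=lambda d: d['age'] >= 21)
```

To keep the new column on the original variable, the result has to be assigned back, for example `people = people.assign(...)`.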
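Breaking a List into Chunks of Size N

In this snippet we take a list and break it up into chunks of size n. This is a very common practice when working with APIs that have a maximum request size.

Credit for this nifty function goes to Ned Batchelder, who posted it on StackOverflow.

```python
# Create a list of first names
first_names = ['Steve', 'Jane', 'Sara', 'Mary','Jack','Bob', 'Bily', 'Boni', 'Chris','Sori', 'Will', 'Won','Li']

# Create a function called chunks with two arguments, l and n
def chunks(l, n):
    # For each start index i from 0 up to the length of l, in steps of n
    for i in range(0, len(l), n):
        # yield the slice of l that holds the next n items
        yield l[i:i+n]

# Create a list from the results of the function chunks
list(chunks(first_names, 5))

'''
[['Steve', 'Jane', 'Sara', 'Mary', 'Jack'],
 ['Bob', 'Bily', 'Boni', 'Chris', 'Sori'],
 ['Will', 'Won', 'Li']]
'''
```

Breaking Up a String into Columns Using Regex in Pandas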
```python
# Import modules
import re
import pandas as pd

# Create a data frame with a single column of strings
data = {'raw': ['Arizona 1 2014-12-23 3242.0',
                'Iowa 1 2010-02-23 3453.7',
                'Oregon 0 2014-06-20 2123.0',
                'Maryland 0 2014-03-14 1123.6',
                'Florida 1 2013-01-15 2134.0',
                'Georgia 0 2012-07-14 2345.6']}
df = pd.DataFrame(data, columns = ['raw'])
df
```

| | raw |
|---|---|
| 0 | Arizona 1 2014-12-23 3242.0 |
| 1 | Iowa 1 2010-02-23 3453.7 |
| 2 | Oregon 0 2014-06-20 2123.0 |
| 3 | Maryland 0 2014-03-14 1123.6 |
| 4 | Florida 1 2013-01-15 2134.0 |
| 5 | Georgia 0 2012-07-14 2345.6 |
```python
# Which rows of df['raw'] contain 'xxxx-xx-xx'?
df['raw'].str.contains('....-..-..', regex=True)

'''
0    True
1    True
2    True
3    True
4    True
5    True
Name: raw, dtype: bool
'''

# In the raw column, extract the single digit in each string
df['female'] = df['raw'].str.extract(r'(\d)', expand=True)
df['female']

'''
0    1
1    1
2    0
3    0
4    1
5    0
Name: female, dtype: object
'''

# In the raw column, extract the xxxx-xx-xx date in each string
df['date'] = df['raw'].str.extract('(....-..-..)', expand=True)
df['date']

'''
0    2014-12-23
1    2010-02-23
2    2014-06-20
3    2014-03-14
4    2013-01-15
5    2012-07-14
Name: date, dtype: object
'''

# In the raw column, extract the ####.# number in each string
df['score'] = df['raw'].str.extract(r'(\d\d\d\d\.\d)', expand=True)
df['score']

'''
0    3242.0
1    3453.7
2    2123.0
3    1123.6
4    2134.0
5    2345.6
Name: score, dtype: object
'''

# In the raw column, extract the word in each string
df['state'] = df['raw'].str.extract(r'([A-Z]\w{0,})', expand=True)
df['state']

'''
0     Arizona
1        Iowa
2      Oregon
3    Maryland
4     Florida
5     Georgia
Name: state, dtype: object
'''

df
```

| | raw | female | date | score | state |
|---|---|---|---|---|---|
| 0 | Arizona 1 2014-12-23 3242.0 | 1 | 2014-12-23 | 3242.0 | Arizona |
| 1 | Iowa 1 2010-02-23 3453.7 | 1 | 2010-02-23 | 3453.7 | Iowa |
| 2 | Oregon 0 2014-06-20 2123.0 | 0 | 2014-06-20 | 2123.0 | Oregon |
| 3 | Maryland 0 2014-03-14 1123.6 | 0 | 2014-03-14 | 1123.6 | Maryland |
| 4 | Florida 1 2013-01-15 2134.0 | 1 | 2013-01-15 | 2134.0 | Florida |
| 5 | Georgia 0 2012-07-14 2345.6 | 0 | 2012-07-14 | 2345.6 | Georgia |
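The four separate extractions can also be done in one pass with named capture groups, where each group name becomes a column. The sketch below assumes every string follows the same `state flag date score` layout separated by single spaces; the names `raw`, `pattern`, and `parts` are illustrative. Because extracted values are strings, the last three lines convert them to more useful types.

```python
import pandas as pd

raw = pd.Series(['Arizona 1 2014-12-23 3242.0',
                 'Iowa 1 2010-02-23 3453.7'], name='raw')

# One pattern with named groups; each group becomes a column
pattern = (r'(?P<state>[A-Z]\w*) '
           r'(?P<female>\d) '
           r'(?P<date>\d{4}-\d{2}-\d{2}) '
           r'(?P<score>\d+\.\d+)')
parts = raw.str.extract(pattern)

# Extracted values are strings, so convert the ones that should be typed
parts['female'] = parts['female'].astype(int)
parts['date'] = pd.to_datetime(parts['date'])
parts['score'] = parts['score'].astype(float)
```

Rows that do not match the full pattern come back as missing values in every column, which makes malformed input easy to spot.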
Columns Shared by Two Data Frames

```python
# Import the library
import pandas as pd

# Create a data frame
dataframe_one = pd.DataFrame()
dataframe_one['1'] = ['1', '1', '1']
dataframe_one['B'] = ['b', 'b', 'b']

# Create a second data frame
dataframe_two = pd.DataFrame()
dataframe_two['2'] = ['2', '2', '2']
dataframe_two['B'] = ['b', 'b', 'b']

# Convert each data frame's columns into a set,
# then find the intersection of the two sets.
# This is the set of columns shared by both data frames.
set.intersection(set(dataframe_one), set(dataframe_two))
# {'B'}
```

Constructing a Dictionary from Multiple Lists

```python
# Create a list of officer names
officer_names = ['Sodoni Dogla', 'Chris Jefferson', 'Jessica Billars', 'Michael Mulligan', 'Steven Johnson']

# Create a list of officer armies
officer_armies = ['Purple Army', 'Orange Army', 'Green Army', 'Red Army', 'Blue Army']

# Create a dictionary that is the zip of the two lists
dict(zip(officer_names, officer_armies))

'''
{'Chris Jefferson': 'Orange Army',
 'Jessica Billars': 'Green Army',
 'Michael Mulligan': 'Red Army',
 'Sodoni Dogla': 'Purple Army',
 'Steven Johnson': 'Blue Army'}
'''
```

Converting a CSV into Python Code That Recreates It
```python
# Import the pandas package
import pandas as pd

# Load the CSV file as a data frame
df_original = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv')
df = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv')

# Print the code that recreates the data frame
print('==============================')
print('RUN THE CODE BELOW THIS LINE')
print('==============================')
print('raw_data =', df.to_dict(orient='list'))
print('df = pd.DataFrame(raw_data, columns = ' + str(list(df_original)) + ')')
```

'''

```
==============================
RUN THE CODE BELOW THIS LINE
==============================
```

raw_data = {'Sepal.Length': [5.0999999999999996, 4.9000000000000004, 4.7000000000000002, 4.5999999999999996, 5.0, 5.4000000000000004, 4.5999999999999996, 5.0, 4.4000000000000004, 4.9000000000000004, 5.4000000000000004, 4.7999999999999998, 4.7999999999999998, 4.2999999999999998, 5.7999999999999998, 5.7000000000000002, 5.4000000000000004, 5.0999999999999996, 5.7000000000000002, 5.0999999999999996, 5.4000000000000004, 5.0999999999999996, 4.5999999999999996, 5.0999999999999996, 4.7999999999999998, 5.0, 5.0, 5.2000000000000002, 5.2000000000000002, 4.7000000000000002, 4.7999999999999998, 5.4000000000000004, 5.2000000000000002, 5.5, 4.9000000000000004, 5.0, 5.5, 4.9000000000000004, 4.4000000000000004, 5.0999999999999996, 5.0, 4.5, 4.4000000000000004, 5.0, 5.0999999999999996, 4.7999999999999998, 5.0999999999999996, 4.5999999999999996, 5.2999999999999998, 5.0, 7.0, 6.4000000000000004, 6.9000000000000004, 5.5, 6.5, 5.7000000000000002, 6.2999999999999998, 4.9000000000000004, 6.5999999999999996, 5.2000000000000002, 5.0, 5.9000000000000004, 6.0, 6.0999999999999996, 5.5999999999999996, 6.7000000000000002, 5.5999999999999996, 5.7999999999999998, 6.2000000000000002, 5.5999999999999996, 5.9000000000000004, 6.0999999999999996, 6.2999999999999998, 6.0999999999999996, 6.4000000000000004, 6.5999999999999996, 6.7999999999999998, 6.7000000000000002, 6.0, 5.7000000000000002, 5.5, 5.5, 5.7999999999999998, 6.0, 
5.4000000000000004, 6.0, 6.7000000000000002, 6.2999999999999998, 5.5999999999999996, 5.5, 5.5, 6.0999999999999996, 5.7999999999999998, 5.0, 5.5999999999999996, 5.7000000000000002, 5.7000000000000002, 6.2000000000000002, 5.0999999999999996, 5.7000000000000002, 6.2999999999999998, 5.7999999999999998, 7.0999999999999996, 6.2999999999999998, 6.5, 7.5999999999999996, 4.9000000000000004, 7.2999999999999998, 6.7000000000000002, 7.2000000000000002, 6.5, 6.4000000000000004, 6.7999999999999998, 5.7000000000000002, 5.7999999999999998, 6.4000000000000004, 6.5, 7.7000000000000002, 7.7000000000000002, 6.0, 6.9000000000000004, 5.5999999999999996, 7.7000000000000002, 6.2999999999999998, 6.7000000000000002, 7.2000000000000002, 6.2000000000000002, 6.0999999999999996, 6.4000000000000004, 7.2000000000000002, 7.4000000000000004, 7.9000000000000004, 6.4000000000000004, 6.2999999999999998, 6.0999999999999996, 7.7000000000000002, 6.2999999999999998, 6.4000000000000004, 6.0, 6.9000000000000004, 6.7000000000000002, 6.9000000000000004, 5.7999999999999998, 6.7999999999999998, 6.7000000000000002, 6.7000000000000002, 6.2999999999999998, 6.5, 6.2000000000000002, 5.9000000000000004], 'Petal.Width': [0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.40000000000000002, 0.29999999999999999, 0.20000000000000001, 0.20000000000000001, 0.10000000000000001, 0.20000000000000001, 0.20000000000000001, 0.10000000000000001, 0.10000000000000001, 0.20000000000000001, 0.40000000000000002, 0.40000000000000002, 0.29999999999999999, 0.29999999999999999, 0.29999999999999999, 0.20000000000000001, 0.40000000000000002, 0.20000000000000001, 0.5, 0.20000000000000001, 0.20000000000000001, 0.40000000000000002, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.40000000000000002, 0.10000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.10000000000000001, 0.20000000000000001, 
0.20000000000000001, 0.29999999999999999, 0.29999999999999999, 0.20000000000000001, 0.59999999999999998, 0.40000000000000002, 0.29999999999999999, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 1.3999999999999999, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6000000000000001, 1.0, 1.3, 1.3999999999999999, 1.0, 1.5, 1.0, 1.3999999999999999, 1.3, 1.3999999999999999, 1.5, 1.0, 1.5, 1.1000000000000001, 1.8, 1.3, 1.5, 1.2, 1.3, 1.3999999999999999, 1.3999999999999999, 1.7, 1.5, 1.0, 1.1000000000000001, 1.0, 1.2, 1.6000000000000001, 1.5, 1.6000000000000001, 1.5, 1.3, 1.3, 1.3, 1.2, 1.3999999999999999, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1000000000000001, 1.3, 2.5, 1.8999999999999999, 2.1000000000000001, 1.8, 2.2000000000000002, 2.1000000000000001, 1.7, 1.8, 1.8, 2.5, 2.0, 1.8999999999999999, 2.1000000000000001, 2.0, 2.3999999999999999, 2.2999999999999998, 1.8, 2.2000000000000002, 2.2999999999999998, 1.5, 2.2999999999999998, 2.0, 2.0, 1.8, 2.1000000000000001, 1.8, 1.8, 1.8, 2.1000000000000001, 1.6000000000000001, 1.8999999999999999, 2.0, 2.2000000000000002, 1.5, 1.3999999999999999, 2.2999999999999998, 2.3999999999999999, 1.8, 1.8, 2.1000000000000001, 2.3999999999999999, 2.2999999999999998, 1.8999999999999999, 2.2999999999999998, 2.5, 2.2999999999999998, 1.8999999999999999, 2.0, 2.2999999999999998, 1.8], 'Petal.Length': [1.3999999999999999, 1.3999999999999999, 1.3, 1.5, 1.3999999999999999, 1.7, 1.3999999999999999, 1.5, 1.3999999999999999, 1.5, 1.5, 1.6000000000000001, 1.3999999999999999, 1.1000000000000001, 1.2, 1.5, 1.3, 1.3999999999999999, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.8999999999999999, 1.6000000000000001, 1.6000000000000001, 1.5, 1.3999999999999999, 1.6000000000000001, 1.6000000000000001, 1.5, 1.5, 1.3999999999999999, 1.5, 1.2, 1.3, 1.3999999999999999, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6000000000000001, 1.8999999999999999, 1.3999999999999999, 1.6000000000000001, 1.3999999999999999, 1.5, 1.3999999999999999, 4.7000000000000002, 4.5, 4.9000000000000004, 4.0, 
4.5999999999999996, 4.5, 4.7000000000000002, 3.2999999999999998, 4.5999999999999996, 3.8999999999999999, 3.5, 4.2000000000000002, 4.0, 4.7000000000000002, 3.6000000000000001, 4.4000000000000004, 4.5, 4.0999999999999996, 4.5, 3.8999999999999999, 4.7999999999999998, 4.0, 4.9000000000000004, 4.7000000000000002, 4.2999999999999998, 4.4000000000000004, 4.7999999999999998, 5.0, 4.5, 3.5, 3.7999999999999998, 3.7000000000000002, 3.8999999999999999, 5.0999999999999996, 4.5, 4.5, 4.7000000000000002, 4.4000000000000004, 4.0999999999999996, 4.0, 4.4000000000000004, 4.5999999999999996, 4.0, 3.2999999999999998, 4.2000000000000002, 4.2000000000000002, 4.2000000000000002, 4.2999999999999998, 3.0, 4.0999999999999996, 6.0, 5.0999999999999996, 5.9000000000000004, 5.5999999999999996, 5.7999999999999998, 6.5999999999999996, 4.5, 6.2999999999999998, 5.7999999999999998, 6.0999999999999996, 5.0999999999999996, 5.2999999999999998, 5.5, 5.0, 5.0999999999999996, 5.2999999999999998, 5.5, 6.7000000000000002, 6.9000000000000004, 5.0, 5.7000000000000002, 4.9000000000000004, 6.7000000000000002, 4.9000000000000004, 5.7000000000000002, 6.0, 4.7999999999999998, 4.9000000000000004, 5.5999999999999996, 5.7999999999999998, 6.0999999999999996, 6.4000000000000004, 5.5999999999999996, 5.0999999999999996, 5.5999999999999996, 6.0999999999999996, 5.5999999999999996, 5.5, 4.7999999999999998, 5.4000000000000004, 5.5999999999999996, 5.0999999999999996, 5.0999999999999996, 5.9000000000000004, 5.7000000000000002, 5.2000000000000002, 5.0, 5.2000000000000002, 5.4000000000000004, 5.0999999999999996], 'Species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 
'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica'], 'Sepal.Width': [3.5, 3.0, 3.2000000000000002, 3.1000000000000001, 3.6000000000000001, 3.8999999999999999, 3.3999999999999999, 3.3999999999999999, 2.8999999999999999, 3.1000000000000001, 3.7000000000000002, 3.3999999999999999, 3.0, 3.0, 4.0, 4.4000000000000004, 3.8999999999999999, 3.5, 3.7999999999999998, 3.7999999999999998, 3.3999999999999999, 3.7000000000000002, 3.6000000000000001, 3.2999999999999998, 3.3999999999999999, 3.0, 3.3999999999999999, 3.5, 3.3999999999999999, 3.2000000000000002, 3.1000000000000001, 3.3999999999999999, 4.0999999999999996, 
4.2000000000000002, 3.1000000000000001, 3.2000000000000002, 3.5, 3.6000000000000001, 3.0, 3.3999999999999999, 3.5, 2.2999999999999998, 3.2000000000000002, 3.5, 3.7999999999999998, 3.0, 3.7999999999999998, 3.2000000000000002, 3.7000000000000002, 3.2999999999999998, 3.2000000000000002, 3.2000000000000002, 3.1000000000000001, 2.2999999999999998, 2.7999999999999998, 2.7999999999999998, 3.2999999999999998, 2.3999999999999999, 2.8999999999999999, 2.7000000000000002, 2.0, 3.0, 2.2000000000000002, 2.8999999999999999, 2.8999999999999999, 3.1000000000000001, 3.0, 2.7000000000000002, 2.2000000000000002, 2.5, 3.2000000000000002, 2.7999999999999998, 2.5, 2.7999999999999998, 2.8999999999999999, 3.0, 2.7999999999999998, 3.0, 2.8999999999999999, 2.6000000000000001, 2.3999999999999999, 2.3999999999999999, 2.7000000000000002, 2.7000000000000002, 3.0, 3.3999999999999999, 3.1000000000000001, 2.2999999999999998, 3.0, 2.5, 2.6000000000000001, 3.0, 2.6000000000000001, 2.2999999999999998, 2.7000000000000002, 3.0, 2.8999999999999999, 2.8999999999999999, 2.5, 2.7999999999999998, 3.2999999999999998, 2.7000000000000002, 3.0, 2.8999999999999999, 3.0, 3.0, 2.5, 2.8999999999999999, 2.5, 3.6000000000000001, 3.2000000000000002, 2.7000000000000002, 3.0, 2.5, 2.7999999999999998, 3.2000000000000002, 3.0, 3.7999999999999998, 2.6000000000000001, 2.2000000000000002, 3.2000000000000002, 2.7999999999999998, 2.7999999999999998, 2.7000000000000002, 3.2999999999999998, 3.2000000000000002, 2.7999999999999998, 3.0, 2.7999999999999998, 3.0, 2.7999999999999998, 3.7999999999999998, 2.7999999999999998, 2.7999999999999998, 2.6000000000000001, 3.0, 3.3999999999999999, 3.1000000000000001, 3.0, 3.1000000000000001, 3.1000000000000001, 3.1000000000000001, 2.7000000000000002, 3.2000000000000002, 3.2999999999999998, 3.0, 2.5, 3.0, 3.3999999999999999, 3.0], 'Unnamed: 0': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]}
df = pd.DataFrame(raw_data, columns = ['Unnamed: 0', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'])
'''

If you want to check the result, first paste the code generated by the cell above into a new cell:

raw_data = {'Petal.Width': [0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.40000000000000002, 0.29999999999999999, 0.20000000000000001, 0.20000000000000001, 0.10000000000000001, 0.20000000000000001, 0.20000000000000001, 0.10000000000000001, 0.10000000000000001, 0.20000000000000001, 0.40000000000000002, 0.40000000000000002, 0.29999999999999999, 0.29999999999999999, 0.29999999999999999, 0.20000000000000001, 0.40000000000000002, 0.20000000000000001, 0.5, 0.20000000000000001, 0.20000000000000001, 0.40000000000000002, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.40000000000000002, 0.10000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.10000000000000001, 0.20000000000000001, 0.20000000000000001, 0.29999999999999999, 0.29999999999999999, 0.20000000000000001, 0.59999999999999998, 0.40000000000000002, 0.29999999999999999, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 0.20000000000000001, 1.3999999999999999, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6000000000000001, 1.0, 1.3, 1.3999999999999999, 1.0, 1.5, 1.0, 1.3999999999999999, 1.3, 1.3999999999999999, 1.5, 1.0, 1.5, 1.1000000000000001, 1.8, 1.3, 1.5, 1.2, 1.3, 1.3999999999999999, 1.3999999999999999, 1.7, 1.5, 1.0, 
1.1000000000000001, 1.0, 1.2, 1.6000000000000001, 1.5, 1.6000000000000001, 1.5, 1.3, 1.3, 1.3, 1.2, 1.3999999999999999, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1000000000000001, 1.3, 2.5, 1.8999999999999999, 2.1000000000000001, 1.8, 2.2000000000000002, 2.1000000000000001, 1.7, 1.8, 1.8, 2.5, 2.0, 1.8999999999999999, 2.1000000000000001, 2.0, 2.3999999999999999, 2.2999999999999998, 1.8, 2.2000000000000002, 2.2999999999999998, 1.5, 2.2999999999999998, 2.0, 2.0, 1.8, 2.1000000000000001, 1.8, 1.8, 1.8, 2.1000000000000001, 1.6000000000000001, 1.8999999999999999, 2.0, 2.2000000000000002, 1.5, 1.3999999999999999, 2.2999999999999998, 2.3999999999999999, 1.8, 1.8, 2.1000000000000001, 2.3999999999999999, 2.2999999999999998, 1.8999999999999999, 2.2999999999999998, 2.5, 2.2999999999999998, 1.8999999999999999, 2.0, 2.2999999999999998, 1.8], 'Sepal.Width': [3.5, 3.0, 3.2000000000000002, 3.1000000000000001, 3.6000000000000001, 3.8999999999999999, 3.3999999999999999, 3.3999999999999999, 2.8999999999999999, 3.1000000000000001, 3.7000000000000002, 3.3999999999999999, 3.0, 3.0, 4.0, 4.4000000000000004, 3.8999999999999999, 3.5, 3.7999999999999998, 3.7999999999999998, 3.3999999999999999, 3.7000000000000002, 3.6000000000000001, 3.2999999999999998, 3.3999999999999999, 3.0, 3.3999999999999999, 3.5, 3.3999999999999999, 3.2000000000000002, 3.1000000000000001, 3.3999999999999999, 4.0999999999999996, 4.2000000000000002, 3.1000000000000001, 3.2000000000000002, 3.5, 3.6000000000000001, 3.0, 3.3999999999999999, 3.5, 2.2999999999999998, 3.2000000000000002, 3.5, 3.7999999999999998, 3.0, 3.7999999999999998, 3.2000000000000002, 3.7000000000000002, 3.2999999999999998, 3.2000000000000002, 3.2000000000000002, 3.1000000000000001, 2.2999999999999998, 2.7999999999999998, 2.7999999999999998, 3.2999999999999998, 2.3999999999999999, 2.8999999999999999, 2.7000000000000002, 2.0, 3.0, 2.2000000000000002, 2.8999999999999999, 2.8999999999999999, 3.1000000000000001, 3.0, 2.7000000000000002, 2.2000000000000002, 2.5, 
3.2000000000000002, 2.7999999999999998, 2.5, 2.7999999999999998, 2.8999999999999999, 3.0, 2.7999999999999998, 3.0, 2.8999999999999999, 2.6000000000000001, 2.3999999999999999, 2.3999999999999999, 2.7000000000000002, 2.7000000000000002, 3.0, 3.3999999999999999, 3.1000000000000001, 2.2999999999999998, 3.0, 2.5, 2.6000000000000001, 3.0, 2.6000000000000001, 2.2999999999999998, 2.7000000000000002, 3.0, 2.8999999999999999, 2.8999999999999999, 2.5, 2.7999999999999998, 3.2999999999999998, 2.7000000000000002, 3.0, 2.8999999999999999, 3.0, 3.0, 2.5, 2.8999999999999999, 2.5, 3.6000000000000001, 3.2000000000000002, 2.7000000000000002, 3.0, 2.5, 2.7999999999999998, 3.2000000000000002, 3.0, 3.7999999999999998, 2.6000000000000001, 2.2000000000000002, 3.2000000000000002, 2.7999999999999998, 2.7999999999999998, 2.7000000000000002, 3.2999999999999998, 3.2000000000000002, 2.7999999999999998, 3.0, 2.7999999999999998, 3.0, 2.7999999999999998, 3.7999999999999998, 2.7999999999999998, 2.7999999999999998, 2.6000000000000001, 3.0, 3.3999999999999999, 3.1000000000000001, 3.0, 3.1000000000000001, 3.1000000000000001, 3.1000000000000001, 2.7000000000000002, 3.2000000000000002, 3.2999999999999998, 3.0, 2.5, 3.0, 3.3999999999999999, 3.0], 'Species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 
'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica'], 'Unnamed: 0': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150], 'Sepal.Length': [5.0999999999999996, 4.9000000000000004, 4.7000000000000002, 4.5999999999999996, 5.0, 5.4000000000000004, 4.5999999999999996, 5.0, 4.4000000000000004, 4.9000000000000004, 5.4000000000000004, 4.7999999999999998, 
4.7999999999999998, 4.2999999999999998, 5.7999999999999998, 5.7000000000000002, 5.4000000000000004, 5.0999999999999996, 5.7000000000000002, 5.0999999999999996, 5.4000000000000004, 5.0999999999999996, 4.5999999999999996, 5.0999999999999996, 4.7999999999999998, 5.0, 5.0, 5.2000000000000002, 5.2000000000000002, 4.7000000000000002, 4.7999999999999998, 5.4000000000000004, 5.2000000000000002, 5.5, 4.9000000000000004, 5.0, 5.5, 4.9000000000000004, 4.4000000000000004, 5.0999999999999996, 5.0, 4.5, 4.4000000000000004, 5.0, 5.0999999999999996, 4.7999999999999998, 5.0999999999999996, 4.5999999999999996, 5.2999999999999998, 5.0, 7.0, 6.4000000000000004, 6.9000000000000004, 5.5, 6.5, 5.7000000000000002, 6.2999999999999998, 4.9000000000000004, 6.5999999999999996, 5.2000000000000002, 5.0, 5.9000000000000004, 6.0, 6.0999999999999996, 5.5999999999999996, 6.7000000000000002, 5.5999999999999996, 5.7999999999999998, 6.2000000000000002, 5.5999999999999996, 5.9000000000000004, 6.0999999999999996, 6.2999999999999998, 6.0999999999999996, 6.4000000000000004, 6.5999999999999996, 6.7999999999999998, 6.7000000000000002, 6.0, 5.7000000000000002, 5.5, 5.5, 5.7999999999999998, 6.0, 5.4000000000000004, 6.0, 6.7000000000000002, 6.2999999999999998, 5.5999999999999996, 5.5, 5.5, 6.0999999999999996, 5.7999999999999998, 5.0, 5.5999999999999996, 5.7000000000000002, 5.7000000000000002, 6.2000000000000002, 5.0999999999999996, 5.7000000000000002, 6.2999999999999998, 5.7999999999999998, 7.0999999999999996, 6.2999999999999998, 6.5, 7.5999999999999996, 4.9000000000000004, 7.2999999999999998, 6.7000000000000002, 7.2000000000000002, 6.5, 6.4000000000000004, 6.7999999999999998, 5.7000000000000002, 5.7999999999999998, 6.4000000000000004, 6.5, 7.7000000000000002, 7.7000000000000002, 6.0, 6.9000000000000004, 5.5999999999999996, 7.7000000000000002, 6.2999999999999998, 6.7000000000000002, 7.2000000000000002, 6.2000000000000002, 6.0999999999999996, 6.4000000000000004, 7.2000000000000002, 7.4000000000000004, 
7.9000000000000004, 6.4000000000000004, 6.2999999999999998, 6.0999999999999996, 7.7000000000000002, 6.2999999999999998, 6.4000000000000004, 6.0, 6.9000000000000004, 6.7000000000000002, 6.9000000000000004, 5.7999999999999998, 6.7999999999999998, 6.7000000000000002, 6.7000000000000002, 6.2999999999999998, 6.5, 6.2000000000000002, 5.9000000000000004], 'Petal.Length': [1.3999999999999999, 1.3999999999999999, 1.3, 1.5, 1.3999999999999999, 1.7, 1.3999999999999999, 1.5, 1.3999999999999999, 1.5, 1.5, 1.6000000000000001, 1.3999999999999999, 1.1000000000000001, 1.2, 1.5, 1.3, 1.3999999999999999, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.8999999999999999, 1.6000000000000001, 1.6000000000000001, 1.5, 1.3999999999999999, 1.6000000000000001, 1.6000000000000001, 1.5, 1.5, 1.3999999999999999, 1.5, 1.2, 1.3, 1.3999999999999999, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6000000000000001, 1.8999999999999999, 1.3999999999999999, 1.6000000000000001, 1.3999999999999999, 1.5, 1.3999999999999999, 4.7000000000000002, 4.5, 4.9000000000000004, 4.0, 4.5999999999999996, 4.5, 4.7000000000000002, 3.2999999999999998, 4.5999999999999996, 3.8999999999999999, 3.5, 4.2000000000000002, 4.0, 4.7000000000000002, 3.6000000000000001, 4.4000000000000004, 4.5, 4.0999999999999996, 4.5, 3.8999999999999999, 4.7999999999999998, 4.0, 4.9000000000000004, 4.7000000000000002, 4.2999999999999998, 4.4000000000000004, 4.7999999999999998, 5.0, 4.5, 3.5, 3.7999999999999998, 3.7000000000000002, 3.8999999999999999, 5.0999999999999996, 4.5, 4.5, 4.7000000000000002, 4.4000000000000004, 4.0999999999999996, 4.0, 4.4000000000000004, 4.5999999999999996, 4.0, 3.2999999999999998, 4.2000000000000002, 4.2000000000000002, 4.2000000000000002, 4.2999999999999998, 3.0, 4.0999999999999996, 6.0, 5.0999999999999996, 5.9000000000000004, 5.5999999999999996, 5.7999999999999998, 6.5999999999999996, 4.5, 6.2999999999999998, 5.7999999999999998, 6.0999999999999996, 5.0999999999999996, 5.2999999999999998, 5.5, 5.0, 5.0999999999999996, 5.2999999999999998, 5.5, 
6.7000000000000002, 6.9000000000000004, 5.0, 5.7000000000000002, 4.9000000000000004, 6.7000000000000002, 4.9000000000000004, 5.7000000000000002, 6.0, 4.7999999999999998, 4.9000000000000004, 5.5999999999999996, 5.7999999999999998, 6.0999999999999996, 6.4000000000000004, 5.5999999999999996, 5.0999999999999996, 5.5999999999999996, 6.0999999999999996, 5.5999999999999996, 5.5, 4.7999999999999998, 5.4000000000000004, 5.5999999999999996, 5.0999999999999996, 5.0999999999999996, 5.9000000000000004, 5.7000000000000002, 5.2000000000000002, 5.0, 5.2000000000000002, 5.4000000000000004, 5.0999999999999996]}

df = pd.DataFrame(raw_data, columns = ['Unnamed: 0', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'])

# View the first few rows of df, the dataframe rebuilt from the dictionary above
df.head()

| | Unnamed: 0 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

# View the first few rows of df_original for comparison
df_original.head()

| | Unnamed: 0 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
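
Comparing two `head()` outputs by eye only checks five rows. As a rough sketch, the round trip can also be checked programmatically. The small frame `df_small` below is a hypothetical stand-in for `df_original`, and using `to_dict(orient='list')` to produce the dictionary is an assumption about how `raw_data` was generated.

```python
import pandas as pd

# Hypothetical stand-in for df_original (not the full iris data used above)
df_small = pd.DataFrame({
    'Sepal.Length': [5.1, 4.9, 4.7],
    'Species': ['setosa', 'setosa', 'setosa'],
})

# Dump the frame to a plain dict of lists; printing this gives code that can be pasted
raw = df_small.to_dict(orient='list')

# Rebuild the frame from the dict, keeping the original column order
df_rebuilt = pd.DataFrame(raw, columns=list(df_small.columns))

# equals() compares values, dtypes, index and columns
round_trip_ok = df_rebuilt.equals(df_small)
print(round_trip_ok)
```

If the check fails on real data, `pd.testing.assert_frame_equal` reports which column or dtype differs.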

Converting a Categorical Variable into Dummy Variables

# Import modules
import pandas as pd

# Create the dataframe
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 'sex': ['male', 'female', 'male', 'female', 'female']}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'sex'])
df

| | first_name | last_name | sex |
|---|---|---|---|
| 0 | Jason | Miller | male |
| 1 | Molly | Jacobson | female |
| 2 | Tina | Ali | male |
| 3 | Jake | Milner | female |
| 4 | Amy | Cooze | female |

# Create a set of dummy variables from the sex variable
df_sex = pd.get_dummies(df['sex'])

# Join the dummy variables to the main dataframe
df_new = pd.concat([df, df_sex], axis=1)
df_new

| | first_name | last_name | sex | female | male |
|---|---|---|---|---|---|
| 0 | Jason | Miller | male | 0.0 | 1.0 |
| 1 | Molly | Jacobson | female | 1.0 | 0.0 |
| 2 | Tina | Ali | male | 0.0 | 1.0 |
| 3 | Jake | Milner | female | 1.0 | 0.0 |
| 4 | Amy | Cooze | female | 1.0 | 0.0 |

# An alternative way to join the new columns
df_new = df.join(df_sex)
df_new

| | first_name | last_name | sex | female | male |
|---|---|---|---|---|---|
| 0 | Jason | Miller | male | 0.0 | 1.0 |
| 1 | Molly | Jacobson | female | 1.0 | 0.0 |
| 2 | Tina | Ali | male | 0.0 | 1.0 |
| 3 | Jake | Milner | female | 1.0 | 0.0 |
| 4 | Amy | Cooze | female | 1.0 | 0.0 |
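
The two-step approach above (build the dummies, then join them) can also be collapsed into a single `get_dummies` call on the whole dataframe. The sketch below is one possible variant, not the method used above: it replaces the `sex` column instead of keeping it, and the names `df_encoded` and `df_reduced` are illustrative. `dtype=float` is passed explicitly because the default output dtype of `get_dummies` differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'sex': ['male', 'female', 'male', 'female', 'female'],
})

# Encode 'sex' directly: the column is replaced by sex_female and sex_male
df_encoded = pd.get_dummies(df, columns=['sex'], dtype=float)

# drop_first=True keeps k-1 columns, which avoids perfectly collinear
# dummies when the result feeds a regression with an intercept
df_reduced = pd.get_dummies(df, columns=['sex'], drop_first=True, dtype=float)

print(df_encoded)
print(df_reduced)
```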

Converting a Categorical Variable into Dummy Variables with patsy

# Import modules
import pandas as pd
import patsy

# Create the dataframe
raw_data = {'countrycode': [1, 2, 3, 2, 1]}
df = pd.DataFrame(raw_data, columns = ['countrycode'])
df

| | countrycode |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
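
# Convert the countrycode variable into three binary variables
# (the "-1" drops the intercept, so every level gets its own column)
patsy.dmatrix('C(countrycode)-1', df, return_type='dataframe')

| | C(countrycode)[1] | C(countrycode)[2] | C(countrycode)[3] |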
|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 0.0 | 1.0 |
| 3 | 0.0 | 1.0 | 0.0 |
| 4 | 1.0 | 0.0 | 0.0 |
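
For readers without patsy installed, a similar indicator matrix can be built with pandas alone. This is a sketch of an alternative, not the patsy call above: the column names come out as `countrycode_1` and so on rather than `C(countrycode)[1]`, and the variable names `dummies` and `dummies_baseline` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'countrycode': [1, 2, 3, 2, 1]})

# One indicator column per distinct code, comparable to 'C(countrycode)-1'
dummies = pd.get_dummies(df['countrycode'], prefix='countrycode', dtype=float)

# Dropping the first level leaves code 1 as the implicit baseline,
# which is roughly what a formula that keeps the intercept would encode
dummies_baseline = pd.get_dummies(
    df['countrycode'], prefix='countrycode', drop_first=True, dtype=float
)

print(dummies)
print(dummies_baseline)
```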

Converting a String Categorical Variable into a Numeric Variable

# Import modules
import pandas as pd

# Create the dataframe
raw_data = {'patient': [1, 1, 1, 2, 2], 'obs': [1, 2, 3, 1, 2], 'treatment': [0, 1, 0, 1, 0], 'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df

| | patient | obs | treatment | score |
|---|---|---|---|---|
| 0 | 1 | 1 | 0 | strong |
| 1 | 1 | 2 | 1 | weak |
| 2 | 1 | 3 | 0 | normal |
| 3 | 2 | 1 | 1 | weak |
| 4 | 2 | 2 | 0 | strong |

# Create a function that converts each value of df['score'] to a number
def score_to_numeric(x):
    if x == 'strong':
        return 3
    if x == 'normal':
        return 2
    if x == 'weak':
        return 1

# Apply the function to every value in the score column
df['score_num'] = df['score'].apply(score_to_numeric)
df

| | patient | obs | treatment | score | score_num |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | strong | 3 |
| 1 | 1 | 2 | 1 | weak | 1 |
| 2 | 1 | 3 | 0 | normal | 2 |
| 3 | 2 | 1 | 1 | weak | 1 |
| 4 | 2 | 2 | 0 | strong | 3 |
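
The same conversion can be written without a helper function. The sketch below shows two alternatives under the same weak < normal < strong ordering used above: a dictionary lookup with `Series.map`, and an ordered `pd.Categorical`. The names `score_levels` and `score_code` are illustrative and do not appear in the original.

```python
import pandas as pd

df = pd.DataFrame({'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# Explicit lookup table; any value missing from the dict becomes NaN
score_levels = {'weak': 1, 'normal': 2, 'strong': 3}
df['score_num'] = df['score'].map(score_levels)

# Alternative: an ordered categorical, whose integer codes start at 0
ordered = pd.Categorical(
    df['score'], categories=['weak', 'normal', 'strong'], ordered=True
)
df['score_code'] = ordered.codes + 1

print(df)
```

The dictionary keeps the mapping in one place, which makes it easier to extend than a chain of `if` statements when more levels are added.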