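Create a DataFrame:
import numpy as np
from pandas import DataFrame,Series
# 100 rows x 5 columns, one column per subject; row labels run from 100 to 199
df=DataFrame(data=np.random.randint(0,150,size=(100,5)),index=np.arange(100,200),columns=['Python','En','Math','Physics','Chen'])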
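Check df for missing values:
df.isnull().any()        # per column: does it contain at least one NaN?
df.notnull().all()       # per column: are all values present?
df.isnull().sum()        # number of NaN values in each column
df.isnull().sum().sum()  # number of NaN values in the whole table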
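The four checks above are easier to read on a table where the answer is known in advance. The sketch below uses a small made-up frame (`demo` and its values are illustrative, not part of the data above) and also shows one way to list the rows that contain a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three known NaN cells
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, np.nan, 9.0],
})

has_null = demo.isnull().any()                # per column: any NaN?
all_present = demo.notnull().all()            # per column: no NaN at all?
null_per_col = demo.isnull().sum()            # NaN count per column
null_total = int(demo.isnull().sum().sum())   # NaN count for the whole table

# Rows that contain at least one NaN (axis=1 checks across columns)
rows_with_null = demo[demo.isnull().any(axis=1)]
```

`any()` and `all()` work column by column by default; passing `axis=1` switches them to a row-by-row check.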
Set some cells to NaN, then handle them:
for i in range(50):
    # pick a random row label (100..199) and a random column
    index=np.random.randint(100,200,size=1)[0]
    cols=df.columns
    col=np.random.choice(cols)
    df.loc[index,col]=np.nan
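The loop above can pick the same cell more than once, so the table may end up with fewer than 50 missing values. If an exact count matters, one option is to sample distinct cell positions without replacement. This is a minimal sketch, not the method used in these notes; `demo`, `n_missing` and the seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible

# Float data, so NaN can be stored without changing the dtype
demo = pd.DataFrame(
    rng.integers(0, 150, size=(100, 5)).astype(float),
    index=np.arange(100, 200),
    columns=['Python', 'En', 'Math', 'Physics', 'Chen'],
)

n_missing = 50
# Draw 50 distinct flat positions out of the 100*5 cells
flat = rng.choice(demo.size, size=n_missing, replace=False)
rows, cols = np.unravel_index(flat, demo.shape)

# Blank the chosen cells on a copy of the underlying array
values = demo.to_numpy(copy=True)
values[rows, cols] = np.nan
demo_nan = pd.DataFrame(values, index=demo.index, columns=demo.columns)
```

Because the positions are drawn with `replace=False`, `demo_nan` contains exactly `n_missing` NaN cells.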
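Fill missing values:
df2=df.copy()                     # assumption: df2 is the table with the NaN cells set above
df2.fillna(value=0)               # fill with a constant
df3=df2.fillna(value=df2.mean())  # fill each column with its mean
df3.astype(np.int16)              # cast back to integers (the fractional part is truncated)
df4=df2.fillna(df2.median())      # fill each column with its median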
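# zhongshu: the mode (most frequent value) of each column
zhongshu=[]
for col in df.columns:
    # value_counts() sorts by frequency, so index[0] is the most frequent value
    zhongshu.append(df[col].value_counts().index[0])
s=Series(data=zhongshu,index=df.columns)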
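A Series of per-column modes like `s` can be passed to `fillna`, which matches its index against the column labels. The sketch below shows this on a small made-up frame and uses `DataFrame.mode()` as a shorter way to get the modes; `demo`, `modes` and `filled` are illustrative names, not from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the mode of "a" is 1.0, the mode of "b" is 4.0
demo = pd.DataFrame({
    "a": [1.0, 1.0, np.nan, 2.0],
    "b": [3.0, np.nan, 4.0, 4.0],
})

# mode() ignores NaN by default; row 0 holds the first mode of every column
modes = demo.mode().iloc[0]

# Each NaN is replaced by the mode of its own column
filled = demo.fillna(modes)
```

When a column has several equally frequent values, `mode()` returns all of them in ascending order, so `iloc[0]` picks the smallest one.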
"""
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap
"""
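# use df2, which still contains NaN (df3 has none left after the mean fill)
# fillna(method=...) is deprecated since pandas 2.1; df2.bfill() and df2.ffill(axis=1) are the equivalents
df2.fillna(method='backfill')    # fill each NaN with the next valid value below it
df2.fillna(method='pad',axis=1)  # fill each NaN with the last valid value to its left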
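The direction of forward and backward filling is easier to see on a tiny table. This sketch uses the `ffill()` and `bfill()` methods rather than the `method=` argument; `demo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
demo = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, 5.0, np.nan],
})

forward = demo.ffill()       # copy the last valid value downwards
backward = demo.bfill()      # copy the next valid value upwards
across = demo.ffill(axis=1)  # copy the last valid value from left to right
```

A gap with no valid value on the filling side stays NaN: the first cell of `y` survives `ffill()`, and the last cell of `y` survives `bfill()`.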
Drop missing values:
df4=df.dropna()  # drop every row that contains at least one NaN
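By default `dropna()` removes a row as soon as it has a single NaN, which can discard a lot of data. The sketch below shows a few of its other parameters on a small made-up frame; `demo` and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 0 is complete, row 3 is entirely NaN
demo = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0, np.nan],
})

any_nan = demo.dropna()                    # keep rows with no NaN at all
all_nan = demo.dropna(how="all")           # drop only rows where every value is NaN
at_least_two = demo.dropna(thresh=2)       # keep rows with at least 2 non-NaN values
by_column = demo.dropna(subset=["b"])      # look for NaN in column "b" only
full_cols = demo.iloc[:3].dropna(axis=1)   # drop columns (not rows) that contain NaN
```

`dropna` returns a new DataFrame in every case; `demo` itself is left unchanged.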