apply a function to each row with extra arguments and expand multiple return values into columns
def function_name(row, arg1, arg2):
    # do stuff with row.column_name
    return [out1, out2]
dfnew = df.apply(function_name, args=(arg1, arg2),
                 result_type='expand', axis=1)
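A minimal runnable sketch of the row-wise apply pattern. The data, the function name scale_and_shift, and its arguments are made up for illustration.

```python
import pandas as pd

# Hypothetical data and function, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def scale_and_shift(row, factor, offset):
    # Return two values per row; with result_type="expand" each becomes a column.
    return [row.a * factor, row.b + offset]

dfnew = df.apply(scale_and_shift, args=(2, 5), result_type="expand", axis=1)
# The expanded columns are labelled 0 and 1 by default, so name them explicitly.
dfnew.columns = ["a_scaled", "b_shifted"]
print(dfnew)
```

Returning a pd.Series with named entries instead of a list is another way to get named columns.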
plotly in pandas (via cufflinks)
import cufflinks as cf
cf.go_offline()
df.iplot(kind='scatter')
rename columns
df2 = df.rename(columns={'int_col' : 'some_other_name'})
df2.rename(columns={'some_other_name' : 'int_col'}, inplace = True)
concatenate dataframes
result = pd.concat([df1, df2, df3, df4])
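A small sketch of what concat does to the index, using two made-up frames. By default the original row labels are kept, so they can repeat; ignore_index=True renumbers them.

```python
import pandas as pd

# Hypothetical frames, for illustration only.
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

stacked = pd.concat([df1, df2])                        # keeps original labels
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 labels

print(stacked.index.tolist())
print(renumbered.index.tolist())
```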
deal with date string / datetime
pd.to_datetime(z.datestring)  # no unit= for strings; unit applies to numeric epoch values
pd.to_datetime(unix_time_stamp, unit='ms')
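A short sketch of both cases, with made-up values: date strings are parsed directly, while unit='ms' is for a numeric count of milliseconds since the Unix epoch.

```python
import pandas as pd

# Hypothetical inputs, for illustration only.
date_strings = pd.Series(["2021-03-01", "2021-03-02"])
from_strings = pd.to_datetime(date_strings)

# 1614556800000 ms after 1970-01-01 is midnight on 2021-03-01 (UTC).
from_epoch_ms = pd.to_datetime(1614556800000, unit="ms")

print(from_strings)
print(from_epoch_ms)
```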
Index issues
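df.reindex()      # DataFrame method: conform rows to a new set of labels
df.reset_index()  # DataFrame method: move the index into a column, renumber from 0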
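A sketch of the difference between the two, on a made-up frame. reindex selects and reorders by label (missing labels give NaN rows); reset_index replaces the index with 0..n-1.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"v": [10, 20, 30]}, index=["a", "b", "c"])

reordered = df.reindex(["c", "a", "d"])  # "d" is not in df, so its row is NaN
flat = df.reset_index()                  # old index becomes a column named "index"
fresh = df.reset_index(drop=True)        # old index is discarded

print(reordered)
print(flat)
```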
missing values
df2.dropna()
df3['float_col'].fillna(df3['float_col'].mean())
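A runnable sketch of both calls on made-up data. Both return new objects and leave the original frame unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, for illustration only.
df3 = pd.DataFrame({"float_col": [1.0, np.nan, 3.0],
                    "other": ["x", "y", None]})

complete_rows = df3.dropna()            # drops every row with any missing value
mean = df3["float_col"].mean()          # NaN is skipped when averaging
filled = df3["float_col"].fillna(mean)  # replace NaN with the column mean

print(complete_rows)
print(filled)
```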
delete a row in pandas
# delete rows where col_name == 'some val'
df1 = df1[df1.col_name != 'some val']
df1 = df1.query("col_name != 'some val'")
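A sketch comparing three equivalent ways to drop the matching rows, on a made-up frame. Note that query takes the condition as a string.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df1 = pd.DataFrame({"col_name": ["keep", "some val", "keep too"],
                    "n": [1, 2, 3]})

by_mask = df1[df1.col_name != "some val"]       # boolean mask
by_query = df1.query("col_name != 'some val'")  # same condition as a string
by_drop = df1.drop(df1[df1.col_name == "some val"].index)  # drop by row label

print(by_mask)
```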
pivot tables / plotting after groupby
reset_index()  # call on the groupby/pivot result to turn the group keys back into columns
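A sketch of why reset_index helps after a groupby, assuming a simple sum aggregation on made-up data. The aggregated result is indexed by the group key, and reset_index turns that key back into an ordinary column, which is usually easier to pivot or plot.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"col_name": ["a", "a", "b"], "value": [1, 2, 5]})

totals = df.groupby("col_name")["value"].sum()  # Series indexed by col_name
flat = totals.reset_index()                     # columns: col_name, value

print(flat)
```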
iterate over groupby groups
for k, gdf in df.groupby('col_name'):
    print(k, len(gdf))  # k is the group key, gdf the rows of that group
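A complete version of the loop on made-up data, collecting the size of each group. The dict name sizes is just for illustration.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"col_name": ["a", "b", "a"], "value": [1, 2, 3]})

sizes = {}
for k, gdf in df.groupby("col_name"):
    # k is the group key; gdf is a DataFrame holding only that group's rows.
    sizes[k] = len(gdf)

print(sizes)
```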
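count occurrences of each unique value in a column (or each unique combination in a set of columns)
df.value_counts(subset=None, normalize=False, sort=True, ascending=False)
subset -> columns to count over (default: all columns)
normalize -> return proportions (fractions of the total) instead of counts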
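A sketch of value_counts on made-up data: one column, a pair of columns, and the normalized form. For a single column, the Series method is used here.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"city": ["Oslo", "Oslo", "Rome", "Oslo"],
                   "kind": ["x", "y", "x", "x"]})

per_city = df["city"].value_counts()                  # counts per value
combos = df.value_counts(subset=["city", "kind"])     # counts per combination
shares = df["city"].value_counts(normalize=True)      # fractions of the total

print(per_city)
print(combos)
print(shares)
```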
number of unique values per column or per row
df.nunique(axis=0)
axis=0 -> count per column (default)
axis=1 -> count per row
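A sketch of both axis settings on a made-up frame, to show which direction each one counts in.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 3, 2]})

per_column = df.nunique()      # axis=0: distinct values within each column
per_row = df.nunique(axis=1)   # axis=1: distinct values within each row

print(per_column)
print(per_row)
```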