- The data table is a pandas `DataFrame`
- Each data column is a pandas `Series`
- We can obtain information about the dataset:
  - First 10 rows: `df.head(10)`
  - Number of rows and variables: `df.shape`
  - Names of variables: `df.columns`
  - Types of variables: `df.info()`
  - Stats of numerical variables: `df.describe()`
  - Stats of non-numerical variables: `df.describe(include=['object', 'bool'])`
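As a minimal sketch of the dataset overview calls, the snippet below builds a small toy table. The column names (`age`, `city`, `active`) and their values are made up for illustration and are not part of the original notes.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Madrid", "Lima", "Madrid", "Quito"],
    "active": [True, False, True, True],
})

first_rows = df.head(2)          # first 2 rows of the table
n_rows, n_cols = df.shape        # (4, 3)
column_names = list(df.columns)  # ['age', 'city', 'active']
df.info()                        # prints dtypes and non-null counts
numeric_stats = df.describe()    # count, mean, std, min, quartiles, max
other_stats = df.describe(include=["object", "bool"])  # count, unique, top, freq
```

Note that `df.info()` prints its report rather than returning it, while the other calls return objects that can be stored or inspected further.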
- Information about numerical variables:
  - Minimum: `df.num_var.min()`
  - Maximum: `df.num_var.max()`
  - Mean: `df.num_var.mean()`
  - Median: `df.num_var.median()`
  - Standard deviation: `df.num_var.std()`
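A short sketch of these statistics on a single column follows. The column name `num_var` mirrors the placeholder used above and the values are invented for the example. Bracket access (`df["num_var"]`) is used here; it is equivalent to `df.num_var` and also works when the column name is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical numeric column; the values are illustrative only.
df = pd.DataFrame({"num_var": [2.0, 4.0, 4.0, 10.0]})

minimum = df["num_var"].min()      # 2.0
maximum = df["num_var"].max()      # 10.0
mean = df["num_var"].mean()        # 5.0
median = df["num_var"].median()    # 4.0
std = df["num_var"].std()          # sample standard deviation (ddof=1 by default)
```

Because `std()` defaults to `ddof=1`, it returns the sample standard deviation; pass `ddof=0` if the population value is wanted.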
- Information about categorical variables:
  - Unique values (count): `df.cat_var.value_counts()`
  - Unique values (proportion): `df.cat_var.value_counts(normalize=True)`
  - Two categorical variables (count): `pd.crosstab(df.cat_var1, df.cat_var2, margins=True)`
  - Two categorical variables (proportion): `pd.crosstab(df.cat_var1, df.cat_var2, margins=True, normalize=True)`
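The following sketch shows what the frequency and cross-tabulation calls return on a tiny example. The columns `cat_var1` and `cat_var2` reuse the placeholders above, and the category labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical categorical columns; labels are illustrative only.
df = pd.DataFrame({
    "cat_var1": ["a", "a", "b", "b", "b"],
    "cat_var2": ["x", "y", "x", "x", "y"],
})

counts = df["cat_var1"].value_counts()                 # b: 3, a: 2
shares = df["cat_var1"].value_counts(normalize=True)   # b: 0.6, a: 0.4

# Contingency table with row/column totals in the "All" margin.
table = pd.crosstab(df["cat_var1"], df["cat_var2"], margins=True)

# Same table expressed as fractions of the grand total.
table_share = pd.crosstab(
    df["cat_var1"], df["cat_var2"], margins=True, normalize=True
)
```

With `normalize=True` the values are fractions between 0 and 1; multiply by 100 if percentages are needed.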
- Information about numerical and categorical variables:
  - Pivot table: `df.pivot_table(values=[num_var1, num_var2, ...], index=[cat_var1, cat_var2], aggfunc='mean')`
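To make the pivot table call concrete, here is a minimal sketch that averages two numeric columns for each combination of two categorical columns. All column names and values are hypothetical; in real use the names are passed as strings, as shown.

```python
import pandas as pd

# Hypothetical data; column names follow the placeholders used above.
df = pd.DataFrame({
    "cat_var1": ["a", "a", "b", "b"],
    "cat_var2": ["x", "x", "x", "y"],
    "num_var1": [1.0, 3.0, 5.0, 7.0],
    "num_var2": [10.0, 20.0, 30.0, 40.0],
})

# One row per (cat_var1, cat_var2) pair, one column per numeric variable.
pivot = df.pivot_table(
    values=["num_var1", "num_var2"],
    index=["cat_var1", "cat_var2"],
    aggfunc="mean",
)
```

The result has a two-level row index, so a single cell can be read with a tuple, for example `pivot.loc[("a", "x"), "num_var1"]`.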
- Filtering rows:
  - One filter: `df[condition]`
  - Multiple filters: `df[(condition1) & (condition2)]`
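A brief sketch of row filtering with boolean masks is given below. The columns `age` and `city` and the thresholds are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical dataset; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Madrid", "Lima", "Madrid", "Quito"],
})

# One filter: keep rows where the condition is True.
over_30 = df[df["age"] > 30]

# Multiple filters: combine conditions with & (and) or | (or).
madrid_over_30 = df[(df["age"] > 30) & (df["city"] == "Madrid")]
```

Each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators such as `>` and `==`.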