Learning Objectives

  • Learn how to combine data-frames
  • Learn how to reshape a data-frame

Combine Datasets in Pandas

  • Create two data-frames that share a common column (x1)
import pandas as pd

adf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x2' : [1, 2, 3]})

bdf = pd.DataFrame(data={'x1': ['A', 'B', 'D'], 'x3' : ['T', 'F', 'T']})

Merge adf and bdf, keeping every row of adf (left join)

pd.merge(adf, bdf, how='left')

Merge adf and bdf, keeping every row of bdf (right join)

pd.merge(adf, bdf, how= 'right')
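
To make the difference between the two joins concrete, here is a minimal sketch that reuses adf and bdf from above. The variable names left and right are introduced only for illustration.

```python
import pandas as pd

adf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
bdf = pd.DataFrame(data={'x1': ['A', 'B', 'D'], 'x3': ['T', 'F', 'T']})

# Left join: every row of adf survives; 'C' has no match in bdf, so x3 is NaN.
left = pd.merge(adf, bdf, how='left')

# Right join: every row of bdf survives; 'D' has no match in adf, so x2 is NaN.
right = pd.merge(adf, bdf, how='right')

print(left)
print(right)
```

Rows without a partner are kept on the preserved side and filled with NaN on the other side.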

Merge adf and bdf, keeping only the x1 values they have in common (inner join)

pd.merge(adf, bdf, how='inner')
# Equivalent call: how='inner' is the default
pd.merge(adf, bdf)
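
When no key is given, pd.merge joins on the columns the two data-frames share, which here is only x1. The sketch below spells the key out with on='x1' and checks that both calls agree; default_merge and explicit_merge are illustrative names.

```python
import pandas as pd

adf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
bdf = pd.DataFrame(data={'x1': ['A', 'B', 'D'], 'x3': ['T', 'F', 'T']})

# Default call: inner join on the shared column x1.
default_merge = pd.merge(adf, bdf)

# Same join with the key and join type written out explicitly.
explicit_merge = pd.merge(adf, bdf, on='x1', how='inner')

# Only 'A' and 'B' appear in both data-frames.
print(explicit_merge)
print(default_merge.equals(explicit_merge))
```

Naming the key explicitly is usually safer once the data-frames share more than one column.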

Merge adf and bdf, keeping all rows from both (outer join, the union of the keys)

pd.merge(adf, bdf, how= 'outer')
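
To see which side each row of the outer join came from, pd.merge accepts indicator=True, which adds a _merge column. A small sketch with the same adf and bdf (the name outer is illustrative):

```python
import pandas as pd

adf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
bdf = pd.DataFrame(data={'x1': ['A', 'B', 'D'], 'x3': ['T', 'F', 'T']})

# Outer join on x1; _merge records 'both', 'left_only', or 'right_only'.
outer = pd.merge(adf, bdf, on='x1', how='outer', indicator=True)

print(outer)
```

Here 'A' and 'B' are marked both, 'C' is left_only, and 'D' is right_only.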

Appending Pandas DataFrames

# Append bdf after adf (row-wise concatenation)
pd.concat([adf, bdf], sort=True)

# Column-wise concatenation
pd.concat([adf, bdf], axis=1)

# cdf has the same x1 values as adf
cdf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x3': ['T', 'F', 'T']})
pd.concat([adf, cdf], axis=1)
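
Column-wise concatenation aligns rows on the index, not on the values of x1, and it keeps both x1 columns. The sketch below illustrates this; shifted is a hypothetical copy of cdf with a different index, added only to show the alignment.

```python
import pandas as pd

adf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
cdf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x3': ['T', 'F', 'T']})

# Same index (0, 1, 2) on both sides: rows line up, x1 appears twice.
side_by_side = pd.concat([adf, cdf], axis=1)
print(list(side_by_side.columns))  # ['x1', 'x2', 'x1', 'x3']

# Hypothetical copy of cdf with index 1, 2, 3 instead of 0, 1, 2.
shifted = cdf.copy()
shifted.index = [1, 2, 3]

# The result uses the union of both indexes, so it has 4 rows with NaN gaps.
misaligned = pd.concat([adf, shifted], axis=1)
print(misaligned)
```

If the goal is to match rows by the value of x1, pd.merge is the better fit.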

pd.concat([adf, cdf], sort=False)

# Renumber the index after concatenation
df = pd.concat([adf, cdf], ignore_index=True)
df
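
The effect of ignore_index is easiest to see by comparing the index with and without it. A minimal sketch, where stacked and renumbered are illustrative names:

```python
import pandas as pd

adf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
cdf = pd.DataFrame(data={'x1': ['A', 'B', 'C'], 'x3': ['T', 'F', 'T']})

# Without ignore_index each frame keeps its own labels, so they repeat.
stacked = pd.concat([adf, cdf], sort=False)
print(stacked.index.tolist())     # [0, 1, 2, 0, 1, 2]

# With ignore_index=True the rows are renumbered 0..n-1.
renumbered = pd.concat([adf, cdf], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Repeated labels make selections such as stacked.loc[0] return two rows, which is often not what is wanted.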

pd.merge(adf, cdf, how='left')

When two data-frames have exactly the same columns

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, df2], ignore_index=True)

Selecting the rows with the largest values in a specific column

import numpy as np

df = pd.DataFrame({'a': [1, 10, 8, 11, -1],'b': list('abdce'), 'c': [1.0, 2.0, np.nan, 3.0, 4.0]})
df.nlargest(3, 'a')
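
As a cross-check, nlargest(3, 'a') returns the same rows as sorting by a in descending order and taking the first three. A short sketch on the same data (top3 and sorted_top3 are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 10, 8, 11, -1],
                   'b': list('abdce'),
                   'c': [1.0, 2.0, np.nan, 3.0, 4.0]})

# Three rows with the largest values of column a, largest first.
top3 = df.nlargest(3, 'a')

# Same selection written as an explicit sort followed by head().
sorted_top3 = df.sort_values('a', ascending=False).head(3)

print(top3['a'].tolist())          # [11, 10, 8]
print(sorted_top3['a'].tolist())   # [11, 10, 8]
```

Both results keep the original index labels (3, 1, 2), so the rows can still be traced back to df.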

df

# Count the missing (NaN) values in column c, then in every column
df['c'].isnull().sum()

df.isnull().sum()
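
For reference, the two counts above can be checked on the same data. In this sketch missing_in_c and missing_per_column are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 10, 8, 11, -1],
                   'b': list('abdce'),
                   'c': [1.0, 2.0, np.nan, 3.0, 4.0]})

# isnull() marks each missing cell as True; sum() counts the True values.
missing_in_c = df['c'].isnull().sum()
missing_per_column = df.isnull().sum()

print(missing_in_c)                  # 1
print(missing_per_column.to_dict())  # one missing value, in column c
```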

Reshape Pandas data-frame with Melt

data = {'weekday': ["Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday", "Sunday"],
        'Person 1': [12, 6, 5, 8, 11, 6, 4],
        'Person 2': [10, 6, 11, 5, 8, 9, 12],
        'Person 3': [8, 5, 7, 3, 7, 11, 15]}
df = pd.DataFrame(data, columns=['weekday',
        'Person 1', 'Person 2', 'Person 3'])

# Melt from wide to long format: one row per (weekday, person) pair
melted = pd.melt(df, id_vars=["weekday"],
                 var_name="Person", value_name="Score")

melted                         
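
One way to check the melted result is to look at its shape and then reverse the operation with pivot. This is a sketch on the same data; wide is an illustrative name, and note that pivot sorts the weekday index alphabetically, so the original row order is not restored.

```python
import pandas as pd

data = {'weekday': ["Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday", "Sunday"],
        'Person 1': [12, 6, 5, 8, 11, 6, 4],
        'Person 2': [10, 6, 11, 5, 8, 9, 12],
        'Person 3': [8, 5, 7, 3, 7, 11, 15]}
df = pd.DataFrame(data)

# Long format: 7 weekdays x 3 persons = 21 rows, 3 columns.
melted = pd.melt(df, id_vars=["weekday"],
                 var_name="Person", value_name="Score")
print(melted.shape)  # (21, 3)

# Back to wide format: one row per weekday, one column per person.
wide = melted.pivot(index="weekday", columns="Person", values="Score")
print(wide.loc["Monday", "Person 1"])  # 12
```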

More Exercises