Problem no. 3, 5, 6 and 7 in Exercise-4 #37

@saurabhp75

The solutions for problems 3, 5, 6 and 7 in Exercise-4 appear to omit data points from the plots for the years with no actress roles (viz. 1900, 1905, 1907, 1909).
This can be verified by plotting a subset (using head() with plot()).
It can be fixed (see below) by calling "fillna(0)" after unstacking the series into a DataFrame, so that missing year/type combinations become 0 instead of NaN.
Surprisingly, the area plot (kind='area') used in problem 4 is not affected by the NaNs, likely because pandas fills NaN with 0 when drawing area plots.
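
The effect can be reproduced without the full dataset. The sketch below uses a small hypothetical stand-in for `cast` (the years and counts are made up for illustration) and compares the unstacked frame with and without the fix. It also shows `unstack(..., fill_value=0)` as a possible alternative to `fillna(0)`; that variant is not part of the original solutions.

```python
import pandas as pd

# Hypothetical toy stand-in for the exercise's `cast` DataFrame:
# 1900 has an actor role but no actress role.
cast = pd.DataFrame({
    'year': [1900, 1901, 1901, 1902, 1902, 1902],
    'type': ['actor', 'actor', 'actress', 'actor', 'actor', 'actress'],
})

counts = cast.groupby(['year', 'type']).size()

raw = counts.unstack('type')                  # missing year/type pairs become NaN
filled = counts.unstack('type').fillna(0)     # the fix proposed in this issue
alt = counts.unstack('type', fill_value=0)    # alternative: fill while unstacking

# Years for which the actress column has no value in the unfilled frame
missing_years = raw[raw['actress'].isna()].index.tolist()

print(missing_years)
print(filled)
```

Calling `.plot()` on `raw` and on `filled` should then show the difference described above: the unfilled frame has no actress point at the years listed in `missing_years`.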

# Plot the number of actor roles each year
# and the number of actress roles each year
# over the history of film.

c = cast
c = c.groupby(['year', 'type']).size()
# c = c.unstack('type')             # Leaves NaN for years with no actress roles, so points are missing from the plot
c = c.unstack('type').fillna(0)     # Missing counts become 0, so every year is plotted
c.plot()                            # Verify with c.head(10).plot()

# Plot the difference between the number of actor roles each year
# and the number of actress roles each year over the history of film.

c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')            # Missing data 
c = c.unstack('type').fillna(0)   # No missing data  
(c.actor - c.actress).plot()



# Plot the fraction of roles that have been 'actor' roles
# each year in the history of film.
c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')            # Missing data 
c = c.unstack('type').fillna(0)   # No missing data
c1 = c.head(100)                  # Subset of the first 100 rows (years), to make the early years easier to inspect
(c1.actor / (c1.actor + c1.actress)).plot(ylim=[0, 1])



# Plot the fraction of supporting (n=2) roles
# that have been 'actor' roles
# each year in the history of film.

c = cast
c = c[c.n == 2]
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')            # Missing data 
c = c.unstack('type').fillna(0) # No missing data
(c.actor / (c.actor + c.actress)).plot(ylim=[0,1]) 
