Analysing the most popular film locations in San Francisco. Part 2
Continuing with our project, we will clean up some messy aspects of our film_locations dataset.
import pandas as pd
df = pd.read_csv('./Film_Locations_in_San_Francisco.csv', nrows=20)
df
| | Title | Release Year | Locations | Fun Facts | Production Company | Distributor | Director | Writer | Actor 1 | Actor 2 | Actor 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 1 | 180 | 2011 | Mason & California Streets (Nob Hill) | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 2 | 180 | 2011 | Justin Herman Plaza | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 3 | 180 | 2011 | 200 block Market Street | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 4 | 180 | 2011 | City Hall | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 5 | 180 | 2011 | Polk & Larkin Streets | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 6 | 180 | 2011 | Randall Museum | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 7 | 180 | 2011 | 555 Market St. | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 8 | 24 Hours on Craigslist | 2005 | NaN | NaN | Yerba Buena Productions | Zealot Pictures | Michael Ferris Gibson | NaN | Craig Newmark | NaN | NaN |
| 9 | A Night Full of Rain | 1978 | Embarcadero Freeway | Embarcadero Freeway, which was featured in the... | Liberty Film | Warner Bros. Pictures | Lina Wertmuller | Lina Wertmuller | Candice Bergen | Giancarlo Gianni | NaN |
| 10 | A Night Full of Rain | 1978 | Fairmont Hotel (950 Mason Street, Nob Hill) | In 1945 the Fairmont hosted the United Nations... | Liberty Film | Warner Bros. Pictures | Lina Wertmuller | Lina Wertmuller | Candice Bergen | Giancarlo Gianni | NaN |
| 11 | A Night Full of Rain | 1978 | San Francisco Chronicle (901 Mission Street at... | The San Francisco Zodiac Killer of the late 19... | Liberty Film | Warner Bros. Pictures | Lina Wertmuller | Lina Wertmuller | Candice Bergen | Giancarlo Gianni | NaN |
| 12 | A Night Full of Rain | 1978 | Broadway (North Beach) | NaN | Liberty Film | Warner Bros. Pictures | Lina Wertmuller | Lina Wertmuller | Candice Bergen | Giancarlo Gianni | NaN |
| 13 | About a Boy | 2014 | Broderick from Fulton to McAlister | NaN | NBC Studios | National Broadcasting Company | Mark J. Kunerth | Jason Katims | David Walton | Minnie Driver | NaN |
| 14 | About a Boy | 2014 | Crissy Field | NaN | NBC Studios | National Broadcasting Company | Mark J. Kunerth | Jason Katims | David Walton | Minnie Driver | NaN |
| 15 | About a Boy | 2014 | Powell from Bush and Sutter | NaN | NBC Studios | National Broadcasting Company | Mark J. Kunerth | Jason Katims | David Walton | Minnie Driver | NaN |
| 16 | Age of Adaline | 2015 | Pier 50- end of the pier | NaN | Lionsgate / Sidney Kimmel Entertainment / Lake... | NaN | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
| 17 | Age of Adaline | 2015 | California @ Montgomery | NaN | Lionsgate / Sidney Kimmel Entertainment / Lake... | NaN | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
| 18 | Age of Adaline | 2015 | Montgomery/Green | NaN | Lionsgate / Sidney Kimmel Entertainment / Lake... | NaN | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
| 19 | Age of Adaline | 2015 | Driving various SF Streets | NaN | Lionsgate / Sidney Kimmel Entertainment / Lake... | NaN | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
Before continuing, we need to fix our first problem: missing values (NaN). We will eventually drop the rows that contain them, but first we check how many columns contain missing values:
print("Number of columns containing null values")
print(len(df.columns[df.isna().any()]))
print("Number of columns not containing null values")
print(len(df.columns[df.notna().all()]))
print("Total number of columns in the dataframe")
print(len(df.columns))
Number of columns containing null values
6
Number of columns not containing null values
5
Total number of columns in the dataframe
11
Our dataframe contains 11 columns, 6 of which contain at least one null value.
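To see which columns are affected and by how much, the same `isna()` check can be summed per column. The following is a minimal sketch on a small made-up dataframe (the `toy` data and its values are assumptions for illustration, not rows from the real file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the film locations sample.
toy = pd.DataFrame({
    "Title": ["A", "A", "B"],
    "Locations": ["City Hall", np.nan, "Pier 50"],
    "Fun Facts": [np.nan, np.nan, "Some fact"],
})

# Number of missing values in each column.
null_counts = toy.isna().sum()
print(null_counts)

# Names of the columns that contain at least one missing value.
cols_with_nulls = toy.columns[toy.isna().any()].tolist()
print(cols_with_nulls)
```

Running the same two expressions on `df` would show how the nulls are spread across the columns before deciding what to drop.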
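We will first drop the columns whose number of null values exceeds a threshold (here, the number of columns in the dataframe, 11), and then drop every remaining row that still contains a null value:
# Drop columns with more null values than the threshold
df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
# Drop rows that still contain a null value and renumber the index
df = df.dropna(axis=0).reset_index(drop=True)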
df
| | Title | Release Year | Locations | Production Company | Director | Writer | Actor 1 | Actor 2 | Actor 3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 1 | 180 | 2011 | Mason & California Streets (Nob Hill) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 2 | 180 | 2011 | Justin Herman Plaza | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 3 | 180 | 2011 | 200 block Market Street | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 4 | 180 | 2011 | City Hall | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 5 | 180 | 2011 | Polk & Larkin Streets | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 6 | 180 | 2011 | Randall Museum | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 7 | 180 | 2011 | 555 Market St. | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand |
| 8 | Age of Adaline | 2015 | Pier 50- end of the pier | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
| 9 | Age of Adaline | 2015 | California @ Montgomery | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
| 10 | Age of Adaline | 2015 | Montgomery/Green | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
| 11 | Age of Adaline | 2015 | Driving various SF Streets | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Blake Lively | Harrison Ford | Ellen Burstyn |
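The two-step cleanup can be traced on a tiny example. This is only a sketch under assumed data: the `toy` dataframe is hypothetical and the threshold is again the number of columns, which is 3 here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, 3 columns.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Fun Facts": [np.nan, np.nan, np.nan, np.nan],  # 4 missing values
    "Actor 3": ["X", np.nan, "Y", "Z"],             # 1 missing value
})

# Step 1: drop columns whose missing-value count exceeds the threshold
# (the number of columns, 3 in this toy example).
threshold = len(toy.columns)
mostly_empty = toy.columns[toy.isna().sum() > threshold]
cleaned = toy.drop(columns=mostly_empty)

# Step 2: drop rows that still contain a missing value, then renumber the rows.
cleaned = cleaned.dropna(axis=0).reset_index(drop=True)
print(cleaned)
```

"Fun Facts" is removed as a column because it is empty in more rows than the threshold, while "Actor 3" survives and only costs the single row where it is missing.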
The next problem with the dataframe is that the columns Actor 1, Actor 2, and Actor 3 hold the same kind of information and can be reduced to a single variable. We will melt the data, keeping the rest of the columns intact, and name the new variable Actors and its value Actor_Name.
# Melting the data:
df_long = pd.melt(df,
                  id_vars=['Title', 'Release Year', 'Locations',
                           'Production Company', 'Director', 'Writer'],
                  var_name='Actors',
                  value_name='Actor_Name')
df_long
| | Title | Release Year | Locations | Production Company | Director | Writer | Actors | Actor_Name |
|---|---|---|---|---|---|---|---|---|
| 0 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 1 | 180 | 2011 | Mason & California Streets (Nob Hill) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 2 | 180 | 2011 | Justin Herman Plaza | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 3 | 180 | 2011 | 200 block Market Street | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 4 | 180 | 2011 | City Hall | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 5 | 180 | 2011 | Polk & Larkin Streets | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 6 | 180 | 2011 | Randall Museum | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 7 | 180 | 2011 | 555 Market St. | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 1 | Siddarth |
| 8 | Age of Adaline | 2015 | Pier 50- end of the pier | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 1 | Blake Lively |
| 9 | Age of Adaline | 2015 | California @ Montgomery | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 1 | Blake Lively |
| 10 | Age of Adaline | 2015 | Montgomery/Green | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 1 | Blake Lively |
| 11 | Age of Adaline | 2015 | Driving various SF Streets | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 1 | Blake Lively |
| 12 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 13 | 180 | 2011 | Mason & California Streets (Nob Hill) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 14 | 180 | 2011 | Justin Herman Plaza | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 15 | 180 | 2011 | 200 block Market Street | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 16 | 180 | 2011 | City Hall | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 17 | 180 | 2011 | Polk & Larkin Streets | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 18 | 180 | 2011 | Randall Museum | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 19 | 180 | 2011 | 555 Market St. | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 2 | Nithya Menon |
| 20 | Age of Adaline | 2015 | Pier 50- end of the pier | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 2 | Harrison Ford |
| 21 | Age of Adaline | 2015 | California @ Montgomery | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 2 | Harrison Ford |
| 22 | Age of Adaline | 2015 | Montgomery/Green | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 2 | Harrison Ford |
| 23 | Age of Adaline | 2015 | Driving various SF Streets | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 2 | Harrison Ford |
| 24 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 25 | 180 | 2011 | Mason & California Streets (Nob Hill) | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 26 | 180 | 2011 | Justin Herman Plaza | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 27 | 180 | 2011 | 200 block Market Street | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 28 | 180 | 2011 | City Hall | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 29 | 180 | 2011 | Polk & Larkin Streets | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 30 | 180 | 2011 | Randall Museum | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 31 | 180 | 2011 | 555 Market St. | SPI Cinemas | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Actor 3 | Priya Anand |
| 32 | Age of Adaline | 2015 | Pier 50- end of the pier | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 3 | Ellen Burstyn |
| 33 | Age of Adaline | 2015 | California @ Montgomery | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 3 | Ellen Burstyn |
| 34 | Age of Adaline | 2015 | Montgomery/Green | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 3 | Ellen Burstyn |
| 35 | Age of Adaline | 2015 | Driving various SF Streets | Lionsgate / Sidney Kimmel Entertainment / Lake... | Lee Toland Krieger | J. Mills Goodloe | Actor 3 | Ellen Burstyn |
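The row order in the melted table (all Actor 1 rows, then Actor 2, then Actor 3) is easier to follow on a small example. The sketch below uses made-up titles and names, and passes `value_vars` explicitly, which the call above leaves implicit:

```python
import pandas as pd

# Hypothetical wide data: one row per film, one column per actor slot.
toy = pd.DataFrame({
    "Title": ["Film A", "Film B"],
    "Actor 1": ["Ann", "Cal"],
    "Actor 2": ["Bob", "Dee"],
})

# Stack the actor columns into (Actors, Actor_Name) pairs.
toy_long = pd.melt(
    toy,
    id_vars=["Title"],
    value_vars=["Actor 1", "Actor 2"],
    var_name="Actors",
    value_name="Actor_Name",
)
print(toy_long)
```

Each original row is repeated once per melted column, so two rows and two actor columns give four rows.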
Melting is very useful for modeling purposes. For example, before transferring data to a database, we can use this technique to organize the data further.
df.shape
(12, 9)
df_long.shape
(36, 8)
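As one possible way of organizing melted data before loading it into a database, the long format makes it easy to extract a separate film-to-actor table without repeats. This is a sketch on assumed data; the `long_toy` frame and the `cast` table name are hypothetical and not part of the original project.

```python
import pandas as pd

# Hypothetical melted data: one film shot at two locations with two actors.
long_toy = pd.DataFrame({
    "Title": ["Film A"] * 4,
    "Locations": ["City Hall", "Pier 50", "City Hall", "Pier 50"],
    "Actors": ["Actor 1", "Actor 1", "Actor 2", "Actor 2"],
    "Actor_Name": ["Ann", "Ann", "Bob", "Bob"],
})

# Keep one row per (film, actor) pair, ignoring the location.
cast = (
    long_toy[["Title", "Actor_Name"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(cast)
```

The actor names are repeated once per location in the melted frame, so dropping duplicates on the two key columns leaves one row per film and actor.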
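Note that melting does not reduce memory usage: df_long has three times as many rows as df, and info() reports that memory usage grows from about 992 bytes to about 2.4 KB.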
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Title               12 non-null     object
 1   Release Year        12 non-null     int64
 2   Locations           12 non-null     object
 3   Production Company  12 non-null     object
 4   Director            12 non-null     object
 5   Writer              12 non-null     object
 6   Actor 1             12 non-null     object
 7   Actor 2             12 non-null     object
 8   Actor 3             12 non-null     object
dtypes: int64(1), object(8)
memory usage: 992.0+ bytes
df_long.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Title               36 non-null     object
 1   Release Year        36 non-null     int64
 2   Locations           36 non-null     object
 3   Production Company  36 non-null     object
 4   Director            36 non-null     object
 5   Writer              36 non-null     object
 6   Actors              36 non-null     object
 7   Actor_Name          36 non-null     object
dtypes: int64(1), object(7)
memory usage: 2.4+ KB
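The "+" in the memory figures printed by info() means the string contents are not fully counted. For a closer comparison, `memory_usage(deep=True)` can be summed for both frames. A minimal sketch with made-up data (the `wide` and `long` frames are assumptions for illustration):

```python
import pandas as pd

# Hypothetical wide data with three actor columns.
wide = pd.DataFrame({
    "Title": ["Film A", "Film B"],
    "Actor 1": ["Ann", "Cal"],
    "Actor 2": ["Bob", "Dee"],
    "Actor 3": ["Eve", "Fay"],
})

# Melt the three actor columns into one.
long = pd.melt(wide, id_vars=["Title"], var_name="Actors", value_name="Actor_Name")

# Total bytes per frame, including the string contents.
wide_bytes = wide.memory_usage(deep=True).sum()
long_bytes = long.memory_usage(deep=True).sum()
print("wide:", wide_bytes, "bytes")
print("long:", long_bytes, "bytes")
```

The exact numbers depend on the pandas version and platform, but the long frame stores the id columns once per melted column, so it is expected to take more space than the wide one.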
Finally, we will save our dataframe into a csv file:
df_long.to_csv('df_long.csv', index=False)