Exploring the dataset:

import pandas

df = pandas.read_csv('./Film_Locations_in_San_Francisco.csv')
df.head()

df.columns

Index(['Title', 'Release Year', 'Locations', 'Fun Facts', 'Production Company',
       'Distributor', 'Director', 'Writer', 'Actor 1', 'Actor 2', 'Actor 3'],
      dtype='object')

df.index

RangeIndex(start=0, stop=3414, step=1)

NumPy representation of the dataset:

df.values

array([['180', 2011, 'Epic Roasthouse (399 Embarcadero)', ...,
        'Siddarth', 'Nithya Menon', 'Priya Anand'],
       ['180', 2011, 'Mason & California Streets (Nob Hill)', ...,
        'Siddarth', 'Nithya Menon', 'Priya Anand'],
       ['180', 2011, 'Justin Herman Plaza', ..., 'Siddarth',
        'Nithya Menon', 'Priya Anand'],
       ...,
       ['Murder in the First, Season 3', 2016,
        'Linden Alley between Octavia and Gough Streets', ...,
        'Taye Diggs', 'Kathleen Robertson', 'Ian Anthony Dale'],
       ['George of the Jungle', 1997, '755 Vallejo Street', ...,
        'Brendan Fraser', 'Leslie Mann', nan],
       ['Alcatraz', 2012, 'Chestnut St. from Larkin to Columbus', ...,
        'Sarah Jones', 'Jorge Garcia', nan]], dtype=object)
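
As a side note, the pandas documentation recommends `DataFrame.to_numpy()` over `.values` for new code. The sketch below uses a small, hypothetical three-row sample (not the real CSV) to show that mixed column types collapse into a single `object` array, while a single numeric column keeps its numeric dtype.

```python
import pandas as pd

# Hypothetical sample that mimics a few columns of the dataset.
sample = pd.DataFrame({
    "Title": ["180", "180", "Alcatraz"],
    "Release Year": [2011, 2011, 2012],
    "Actor 3": ["Priya Anand", "Priya Anand", None],
})

# Mixed string/integer columns are stored in one object-dtype array.
arr = sample.to_numpy()
print(arr.shape)
print(arr.dtype)

# Converting a single numeric column keeps an integer dtype.
years = sample["Release Year"].to_numpy()
print(years.dtype)
```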

type(df)

pandas.core.frame.DataFrame

df.shape

(3414, 11)

Getting the info about the data frame:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3414 entries, 0 to 3413
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Title               3414 non-null   object
 1   Release Year        3414 non-null   int64 
 2   Locations           3306 non-null   object
 3   Fun Facts           898 non-null    object
 4   Production Company  3410 non-null   object
 5   Distributor         3212 non-null   object
 6   Director            3414 non-null   object
 7   Writer              3404 non-null   object
 8   Actor 1             3406 non-null   object
 9   Actor 2             3228 non-null   object
 10  Actor 3             2470 non-null   object
dtypes: int64(1), object(10)
memory usage: 293.5+ KB
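
The non-null counts above imply missing values: for example, 'Fun Facts' has 898 non-null entries out of 3414, so 2516 are missing. A minimal sketch of how to count these directly, again on a hypothetical sample rather than the real file:

```python
import pandas as pd

# Hypothetical sample with some missing entries.
sample = pd.DataFrame({
    "Title": ["180", "George of the Jungle", "Alcatraz"],
    "Fun Facts": [None, None, "some fact"],
    "Actor 3": ["Priya Anand", None, None],
})

# Number of missing values in each column.
missing = sample.isna().sum()
print(missing)

# Fraction of missing values in each column.
missing_share = sample.isna().mean()
print(missing_share)
```

On the full dataset the same calls would be `df.isna().sum()` and `df.isna().mean()`.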

Filtering and slicing the data:

df['Actor 1']

0             Siddarth
1             Siddarth
2             Siddarth
3             Siddarth
4             Siddarth
             ...      
3409       Kevin Bacon
3410        Taye Diggs
3411        Taye Diggs
3412    Brendan Fraser
3413       Sarah Jones
Name: Actor 1, Length: 3414, dtype: object

Subset one or more columns in the data frame:

subset = df[['Title', 'Release Year', 'Locations', 'Production Company', 'Director']]

subset.head()
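
One detail worth keeping in mind here: single brackets with a column name return a Series, while double brackets (a list of names) return a DataFrame, even for one column. A short sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical two-row sample.
sample = pd.DataFrame({
    "Title": ["180", "Alcatraz"],
    "Release Year": [2011, 2012],
})

one_column = sample["Title"]       # Series
as_frame = sample[["Title"]]       # DataFrame with a single column

print(type(one_column).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```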

Subset rows:

# loc selects rows by index label; here it returns the row whose label is 2.
df.loc[2]

Title                                                               180
Release Year                                                       2011
Locations                                           Justin Herman Plaza
Fun Facts                                                           NaN
Production Company                                          SPI Cinemas
Distributor                                                         NaN
Director                                                       Jayendra
Writer                Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba 
Actor 1                                                        Siddarth
Actor 2                                                    Nithya Menon
Actor 3                                                     Priya Anand
Name: 2, dtype: object

df.loc[[2, 0]]

df.loc[[4, 5]]
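
Because this data frame has a default RangeIndex, labels and positions happen to coincide. The sketch below uses a hypothetical sample with a non-default index to show how `loc` (label-based) differs from `iloc` (position-based), and that a list of labels returns rows in the order requested.

```python
import pandas as pd

# Hypothetical sample whose index labels differ from row positions.
sample = pd.DataFrame({"Title": ["A", "B", "C"]}, index=[10, 20, 30])

by_label = sample.loc[20]       # row with index label 20
by_position = sample.iloc[1]    # second row, regardless of its label

# A list of labels keeps the order in which the labels are given.
reordered = sample.loc[[30, 10]]

print(by_label["Title"], by_position["Title"])
print(reordered)
```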

Subsetting columns while keeping all rows:

subset = df.loc[:, ['Release Year', 'Locations', 'Director']]

subset.head()

Selecting only the rows released in 2015, restricted to specific columns:

df.loc[df['Release Year'] == 2015, ['Release Year', 'Locations', 'Director']]

	Release Year	Locations	Director
16	2015	Pier 50- end of the pier	Lee Toland Krieger
17	2015	California @ Montgomery	Lee Toland Krieger
18	2015	Montgomery/Green	Lee Toland Krieger
19	2015	Driving various SF Streets	Lee Toland Krieger
20	2015	Plate Shots SF streets various	Lee Toland Krieger
...	...	...	...
3373	2015	Pine St. between Market and Kearny	Alan Taylor
3378	2015	Roxie Theater (3117 16th St.)	Zachary Shedd
3382	2015	Daniel's Pharmacy, 943 Geneva Ave.	Andrew Haigh
3388	2015	Fisherman's Wharf pier near Chapel (Port Walk ...	Gabriele Muccino
3397	2015	Mission between 3rd and 4th St.	Brad Peyton
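
The same pattern extends to more than one condition. This is a minimal sketch on hypothetical rows (the values are illustrative, not taken from the CSV): boolean masks are combined with `&`, each wrapped in parentheses or stored in a variable, and `isin` matches any of several values.

```python
import pandas as pd

# Hypothetical rows; names and places are made up for illustration.
sample = pd.DataFrame({
    "Release Year": [2011, 2015, 2015, 2016],
    "Locations": ["City Hall", "Pier 50", "Roxie Theater", "Linden Alley"],
    "Director": ["Director A", "Director B", "Director C", "Director B"],
})

is_2015 = sample["Release Year"] == 2015
by_director_b = sample["Director"] == "Director B"

# Rows that satisfy both conditions, limited to two columns.
both = sample.loc[is_2015 & by_director_b, ["Release Year", "Locations"]]
print(both)

# Rows whose year is any of the listed values.
recent = sample.loc[sample["Release Year"].isin([2015, 2016]), "Locations"]
print(recent)
```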


Analysing the most popular film locations in San Francisco. Part 1


	Title	Release Year	Locations	Fun Facts	Production Company	Distributor	Director	Writer	Actor 1	Actor 2	Actor 3
0	180	2011	Epic Roasthouse (399 Embarcadero)	NaN	SPI Cinemas	NaN	Jayendra	Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba	Siddarth	Nithya Menon	Priya Anand
1	180	2011	Mason & California Streets (Nob Hill)	NaN	SPI Cinemas	NaN	Jayendra	Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba	Siddarth	Nithya Menon	Priya Anand
2	180	2011	Justin Herman Plaza	NaN	SPI Cinemas	NaN	Jayendra	Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba	Siddarth	Nithya Menon	Priya Anand
3	180	2011	200 block Market Street	NaN	SPI Cinemas	NaN	Jayendra	Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba	Siddarth	Nithya Menon	Priya Anand
4	180	2011	City Hall	NaN	SPI Cinemas	NaN	Jayendra	Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba	Siddarth	Nithya Menon	Priya Anand