Analysing the most popular film locations in San Francisco. Part 1

Exploring the dataset:

In [1]:
import pandas
In [2]:
df = pandas.read_csv('./Film_Locations_in_San_Francisco.csv')
df.head()
Out[2]:
  | Title | Release Year | Locations | Fun Facts | Production Company | Distributor | Director | Writer | Actor 1 | Actor 2 | Actor 3
0 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
1 | 180 | 2011 | Mason & California Streets (Nob Hill) | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
2 | 180 | 2011 | Justin Herman Plaza | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
3 | 180 | 2011 | 200 block Market Street | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
4 | 180 | 2011 | City Hall | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
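Since the full CSV file may not be at hand, here is a minimal sketch of the same read_csv/head steps on an in-memory string via io.StringIO. The sample text is hypothetical: three rows copied from the output above, trimmed to four of the eleven columns.

```python
import io

import pandas

# Hypothetical CSV sample: three rows from the output above, four columns only.
sample_csv = """Title,Release Year,Locations,Director
180,2011,Epic Roasthouse (399 Embarcadero),Jayendra
180,2011,Mason & California Streets (Nob Hill),Jayendra
180,2011,Justin Herman Plaza,Jayendra
"""

# read_csv accepts any file-like object, not only a path on disk.
# Keep Title as text: on its own, '180' would otherwise be parsed as an integer.
sample = pandas.read_csv(io.StringIO(sample_csv), dtype={"Title": str})

# head(n) returns the first n rows (n defaults to 5).
print(sample.head(2))
```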
In [3]:
df.columns
Out[3]:
Index(['Title', 'Release Year', 'Locations', 'Fun Facts', 'Production Company',
       'Distributor', 'Director', 'Writer', 'Actor 1', 'Actor 2', 'Actor 3'],
      dtype='object')
In [4]:
df.index
Out[4]:
RangeIndex(start=0, stop=3414, step=1)

NumPy representation of the dataset:

In [5]:
df.values
Out[5]:
array([['180', 2011, 'Epic Roasthouse (399 Embarcadero)', ...,
        'Siddarth', 'Nithya Menon', 'Priya Anand'],
       ['180', 2011, 'Mason & California Streets (Nob Hill)', ...,
        'Siddarth', 'Nithya Menon', 'Priya Anand'],
       ['180', 2011, 'Justin Herman Plaza', ..., 'Siddarth',
        'Nithya Menon', 'Priya Anand'],
       ...,
       ['Murder in the First, Season 3', 2016,
        'Linden Alley between Octavia and Gough Streets', ...,
        'Taye Diggs', 'Kathleen Robertson', 'Ian Anthony Dale'],
       ['George of the Jungle', 1997, '755 Vallejo Street', ...,
        'Brendan Fraser', 'Leslie Mann', nan],
       ['Alcatraz', 2012, 'Chestnut St. from Larkin to Columbus', ...,
        'Sarah Jones', 'Jorge Garcia', nan]], dtype=object)
In [6]:
type(df)
Out[6]:
pandas.core.frame.DataFrame
In [7]:
df.shape
Out[7]:
(3414, 11)
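The pandas documentation recommends DataFrame.to_numpy() over the .values attribute used above; both return a NumPy array here. The sketch below uses a small hypothetical stand-in for df and also checks that shape is just (number of index labels, number of columns).

```python
import pandas

# Hypothetical stand-in for df: two rows and three columns from the dataset.
frame = pandas.DataFrame(
    {
        "Title": ["180", "George of the Jungle"],
        "Release Year": [2011, 1997],
        "Locations": ["City Hall", "755 Vallejo Street"],
    }
)

# to_numpy() returns the underlying values as a NumPy array.
arr = frame.to_numpy()

# Mixing text and integer columns forces a common dtype: object.
print(arr.dtype)  # object

# shape is (rows, columns), i.e. the lengths of the index and the columns.
assert frame.shape == (len(frame.index), len(frame.columns))
print(frame.shape)  # (2, 3)
```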

Getting information about the data frame:

In [8]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3414 entries, 0 to 3413
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Title               3414 non-null   object
 1   Release Year        3414 non-null   int64 
 2   Locations           3306 non-null   object
 3   Fun Facts           898 non-null    object
 4   Production Company  3410 non-null   object
 5   Distributor         3212 non-null   object
 6   Director            3414 non-null   object
 7   Writer              3404 non-null   object
 8   Actor 1             3406 non-null   object
 9   Actor 2             3228 non-null   object
 10  Actor 3             2470 non-null   object
dtypes: int64(1), object(10)
memory usage: 293.5+ KB
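info() reports non-null counts, so the number of missing values in a column is the row count minus that figure (for example, 3414 - 3306 = 108 rows without a Locations value). A minimal sketch that counts missing values directly with isna().sum(), on a hypothetical three-row frame:

```python
import numpy as np
import pandas

# Hypothetical frame with gaps, mimicking the NaNs seen in df.
frame = pandas.DataFrame(
    {
        "Title": ["180", "180", "Alcatraz"],
        "Locations": ["City Hall", np.nan, "Chestnut St. from Larkin to Columbus"],
        "Actor 3": ["Priya Anand", "Priya Anand", np.nan],
    }
)

# isna() marks each missing cell True; sum() counts the Trues per column.
missing = frame.isna().sum()
print(missing)

# Same result as the row count minus the non-null count that info() shows.
assert (missing == len(frame) - frame.count()).all()
```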

Filtering and slicing the data:

In [9]:
df['Actor 1']
Out[9]:
0             Siddarth
1             Siddarth
2             Siddarth
3             Siddarth
4             Siddarth
             ...      
3409       Kevin Bacon
3410        Taye Diggs
3411        Taye Diggs
3412    Brendan Fraser
3413       Sarah Jones
Name: Actor 1, Length: 3414, dtype: object
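Selecting with a single column name returns a Series, while passing a list of names (even a one-element list) returns a DataFrame. A short check on a hypothetical two-row frame:

```python
import pandas

# Hypothetical two-row frame using values from the dataset.
frame = pandas.DataFrame(
    {"Title": ["180", "Alcatraz"], "Actor 1": ["Siddarth", "Sarah Jones"]}
)

# One label: a Series named after the column.
actors = frame["Actor 1"]
print(type(actors).__name__, actors.name)  # Series Actor 1

# A list of labels: a DataFrame, even with a single label.
actors_df = frame[["Actor 1"]]
print(type(actors_df).__name__, actors_df.shape)  # DataFrame (2, 1)
```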

Subsetting one or more columns of the data frame:

In [10]:
subset = df[['Title', 'Release Year', 'Locations', 'Production Company', 'Director']]
In [11]:
subset.head()
Out[11]:
  | Title | Release Year | Locations | Production Company | Director
0 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | SPI Cinemas | Jayendra
1 | 180 | 2011 | Mason & California Streets (Nob Hill) | SPI Cinemas | Jayendra
2 | 180 | 2011 | Justin Herman Plaza | SPI Cinemas | Jayendra
3 | 180 | 2011 | 200 block Market Street | SPI Cinemas | Jayendra
4 | 180 | 2011 | City Hall | SPI Cinemas | Jayendra

Subsetting rows by index label:

In [12]:
# .loc selects rows by index label; here, the row whose label is 2.
df.loc[2]
Out[12]:
Title                                                               180
Release Year                                                       2011
Locations                                           Justin Herman Plaza
Fun Facts                                                           NaN
Production Company                                          SPI Cinemas
Distributor                                                         NaN
Director                                                       Jayendra
Writer                Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba 
Actor 1                                                        Siddarth
Actor 2                                                    Nithya Menon
Actor 3                                                     Priya Anand
Name: 2, dtype: object
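.loc works on index labels, not positions. With the default RangeIndex the two coincide, which is why df.loc[2] returned the third row. A sketch with a hypothetical non-default index shows where .loc and the position-based .iloc diverge:

```python
import pandas

# Hypothetical frame whose index labels are not 0, 1, 2.
frame = pandas.DataFrame(
    {"Locations": ["City Hall", "Justin Herman Plaza", "Polk & Larkin Streets"]},
    index=[10, 2, 5],
)

# .loc[2] looks up the row labelled 2 (the second row here).
print(frame.loc[2, "Locations"])   # Justin Herman Plaza

# .iloc[2] takes the row at position 2 (the third row).
print(frame.iloc[2]["Locations"])  # Polk & Larkin Streets
```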
In [13]:
df.loc[[2, 0]]
Out[13]:
  | Title | Release Year | Locations | Fun Facts | Production Company | Distributor | Director | Writer | Actor 1 | Actor 2 | Actor 3
2 | 180 | 2011 | Justin Herman Plaza | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
0 | 180 | 2011 | Epic Roasthouse (399 Embarcadero) | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
In [14]:
df.loc[[4, 5]]
Out[14]:
  | Title | Release Year | Locations | Fun Facts | Production Company | Distributor | Director | Writer | Actor 1 | Actor 2 | Actor 3
4 | 180 | 2011 | City Hall | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
5 | 180 | 2011 | Polk & Larkin Streets | NaN | SPI Cinemas | NaN | Jayendra | Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba | Siddarth | Nithya Menon | Priya Anand
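Two details of label-based selection are worth noting: a list of labels returns rows in the order given (hence row 2 before row 0 in Out[13]), and a label slice such as df.loc[4:5] includes both endpoints, unlike the position-based iloc[4:5]. A small sketch on a hypothetical frame built from the first six locations in the dataset:

```python
import pandas

# Hypothetical frame: the first six locations shown above, default RangeIndex.
frame = pandas.DataFrame(
    {
        "Locations": [
            "Epic Roasthouse (399 Embarcadero)",
            "Mason & California Streets (Nob Hill)",
            "Justin Herman Plaza",
            "200 block Market Street",
            "City Hall",
            "Polk & Larkin Streets",
        ]
    }
)

# List of labels: rows come back in the requested order.
print(frame.loc[[2, 0]].index.tolist())  # [2, 0]

# Label slice: both endpoints are included.
print(frame.loc[4:5].index.tolist())     # [4, 5]

# Positional slice: the stop position is excluded, as in plain Python.
print(frame.iloc[4:5].index.tolist())    # [4]
```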

Subsetting columns with .loc (all rows, selected columns):

In [15]:
subset = df.loc[:, ['Release Year', 'Locations', 'Director']]
In [16]:
subset.head()
Out[16]:
  | Release Year | Locations | Director
0 | 2011 | Epic Roasthouse (399 Embarcadero) | Jayendra
1 | 2011 | Mason & California Streets (Nob Hill) | Jayendra
2 | 2011 | Justin Herman Plaza | Jayendra
3 | 2011 | 200 block Market Street | Jayendra
4 | 2011 | City Hall | Jayendra
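With : as the row selector, df.loc[:, cols] keeps every row, so it gives the same result as plain df[cols]; .loc becomes necessary once the row selector is something other than :. A quick equality check on a hypothetical frame:

```python
import pandas

# Hypothetical frame with rows taken from the outputs in this notebook.
frame = pandas.DataFrame(
    {
        "Release Year": [2011, 2015, 2015],
        "Locations": ["City Hall", "Montgomery/Green", "Roxie Theater (3117 16th St.)"],
        "Director": ["Jayendra", "Lee Toland Krieger", "Zachary Shedd"],
    }
)

cols = ["Release Year", "Director"]

via_loc = frame.loc[:, cols]   # all rows, chosen columns
via_brackets = frame[cols]     # the same selection with plain brackets

# equals() compares shape, element values and dtypes.
print(via_loc.equals(via_brackets))  # True
```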

Filtering the rows released in 2015 and subsetting specific columns:

In [17]:
df.loc[df['Release Year'] == 2015, ['Release Year', 'Locations', 'Director']]
Out[17]:
     | Release Year | Locations | Director
16   | 2015 | Pier 50- end of the pier | Lee Toland Krieger
17   | 2015 | California @ Montgomery | Lee Toland Krieger
18   | 2015 | Montgomery/Green | Lee Toland Krieger
19   | 2015 | Driving various SF Streets | Lee Toland Krieger
20   | 2015 | Plate Shots SF streets various | Lee Toland Krieger
...  | ... | ... | ...
3373 | 2015 | Pine St. between Market and Kearny | Alan Taylor
3378 | 2015 | Roxie Theater (3117 16th St.) | Zachary Shedd
3382 | 2015 | Daniel's Pharmacy, 943 Geneva Ave. | Andrew Haigh
3388 | 2015 | Fisherman's Wharf pier near Chapel (Port Walk ... | Gabriele Muccino
3397 | 2015 | Mission between 3rd and 4th St. | Brad Peyton

578 rows × 3 columns
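The filter works in two steps: df['Release Year'] == 2015 builds a boolean Series with one True/False per row, and .loc keeps the rows where it is True. Summing the mask counts the matching rows (578 on the full dataset, per the output above), and conditions can be combined with & and |, each wrapped in parentheses. A sketch on a hypothetical four-row frame:

```python
import pandas

# Hypothetical frame with rows taken from the outputs in this notebook.
frame = pandas.DataFrame(
    {
        "Release Year": [2011, 2015, 2015, 2011],
        "Locations": [
            "City Hall",
            "Montgomery/Green",
            "Roxie Theater (3117 16th St.)",
            "Justin Herman Plaza",
        ],
        "Director": ["Jayendra", "Lee Toland Krieger", "Zachary Shedd", "Jayendra"],
    }
)

# Step 1: one boolean per row.
mask = frame["Release Year"] == 2015
print(mask.tolist())  # [False, True, True, False]

# Step 2: keep the matching rows and the chosen columns.
print(frame.loc[mask, ["Locations", "Director"]])

# True counts as 1, so the sum is the number of matching rows.
print(int(mask.sum()))  # 2

# Combine conditions with & (and) or | (or); the parentheses are required.
both = frame.loc[(frame["Release Year"] == 2015) & (frame["Director"] == "Zachary Shedd")]
print(both["Locations"].tolist())  # ['Roxie Theater (3117 16th St.)']
```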
