Fundamental Analysis

Structure of the Dataframe (.info())

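info() prints a summary of the DataFrame to the screen: the index range, each column's non-null count and dtype, and the memory usage. It returns None rather than an object.

dataframe.info()  # print each column's dtype and its count of non-null (non-missing) values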

In [263]:

ny.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 971063 entries, 1 to 971063
Data columns (total 7 columns):
Created Date    971063 non-null object
Closed Date     882944 non-null object
Agency          971063 non-null object
Incident Zip    911140 non-null object
Borough         971063 non-null object
Latitude        887284 non-null float64
Longitude       887284 non-null float64
dtypes: float64(2), object(5)
memory usage: 59.3+ MB
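Because info() prints rather than returns, its output cannot be assigned to a variable directly. The sketch below shows one way to capture it as a string with the buf parameter. The toy frame is a hypothetical stand-in for ny, not the real dataset.

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for `ny` (not the real 311 data)
toy = pd.DataFrame({
    "Borough": ["QUEENS", "BRONX", "QUEENS"],
    "Latitude": [None, 40.85, 40.73],
})

buffer = io.StringIO()
returned = toy.info(buf=buffer)  # write the summary into the buffer instead of stdout
summary = buffer.getvalue()      # the same text info() would normally print

print(returned)  # info() itself returns None
print(summary)
```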

First Few Rows (.head())

dataframe.head(n)  # return a DataFrame of the first n rows; default n = 5

In [264]:

ny.head()

Out[264]:

                Created Date       Closed Date Agency  \
Unique Key                                              
1           10/11/2016 11:53  10/11/2016 12:00   DSNY   
2           10/11/2016 11:36  10/11/2016 12:00   DSNY   
3           10/11/2016 11:36  10/11/2016 12:00   DSNY   
4           10/11/2016 12:39  10/11/2016 12:39   DSNY   
5           10/11/2016 12:18  10/11/2016 12:18   DSNY   

           Incident Zip Borough  Latitude  Longitude  
Unique Key                                            
1                   NaN  QUEENS       NaN        NaN  
2                   NaN  QUEENS       NaN        NaN  
3                   NaN  QUEENS       NaN        NaN  
4                   NaN  QUEENS       NaN        NaN  
5                   NaN  QUEENS       NaN        NaN  
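As a small illustration of the n argument, the sketch below uses a hypothetical four-row frame (the agency values are made up for the example). A negative n returns every row except the last |n|.

```python
import pandas as pd

# Hypothetical toy frame; the index name mirrors the "Unique Key" index of `ny`
toy = pd.DataFrame(
    {"Agency": ["DSNY", "NYPD", "DOT", "HPD"]},
    index=pd.Index([1, 2, 3, 4], name="Unique Key"),
)

first_two = toy.head(2)      # rows with keys 1 and 2
all_but_last = toy.head(-1)  # negative n: everything except the last row

print(first_two)
print(all_but_last)
```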

Missing Data

How Much Data Is Missing in Each Column?

In [265]:

ny.count()

Out[265]:

Created Date    971063
Closed Date     882944
Agency          971063
Incident Zip    911140
Borough         971063
Latitude        887284
Longitude       887284
dtype: int64

In [266]:

len(ny.index) - ny.count()

Out[266]:

Created Date        0
Closed Date     88119
Agency              0
Incident Zip    59923
Borough             0
Latitude        83779
Longitude       83779
dtype: int64
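The subtraction above works because count() only counts non-null cells. An equivalent and commonly used form is isnull().sum(), which counts the missing cells directly. The sketch below checks that the two agree on a hypothetical toy frame (values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps, standing in for `ny`
toy = pd.DataFrame({
    "Agency": ["DSNY", "NYPD", "DOT"],
    "Incident Zip": [np.nan, "11101", np.nan],
    "Latitude": [np.nan, 40.74, 40.71],
})

by_subtraction = len(toy.index) - toy.count()  # total rows minus non-null cells
by_isnull = toy.isnull().sum()                 # True counts as 1 when summed

print(by_subtraction)
print(by_isnull)
```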

In [267]:

ny.isnull()

Out[267]:

            Created Date  Closed Date  Agency  \
Unique Key                                      
1                  False        False   False   
2                  False        False   False   
3                  False        False   False   
4                  False        False   False   
5                  False        False   False   
...                  ...          ...     ...   
971059             False        False   False   
971060             False        False   False   
971061             False        False   False   
971062             False        False   False   
971063             False        False   False   

            Incident Zip  Borough  Latitude  Longitude  
Unique Key                                              
1                   True    False      True       True  
2                   True    False      True       True  
3                   True    False      True       True  
4                   True    False      True       True  
5                   True    False      True       True  
...                  ...      ...       ...        ...  
971059             False    False     False      False  
971060             False    False     False      False  
971061             False    False     False      False  
971062             False    False     False      False  
971063             False    False     False      False  

[971063 rows x 7 columns]
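On its own, the boolean frame returned by isnull() is too large to read. It is usually more useful as a mask that is then aggregated or used to filter. The sketch below, on a hypothetical toy frame, selects the rows that have at least one missing value and computes the share of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; coordinates are illustrative only
toy = pd.DataFrame(
    {
        "Agency": ["DSNY", "NYPD", "DOT", "HPD"],
        "Latitude": [np.nan, 40.74, np.nan, 40.71],
        "Longitude": [np.nan, -73.93, np.nan, -73.95],
    },
    index=pd.Index([1, 2, 3, 4], name="Unique Key"),
)

mask = toy.isnull()                     # same shape as toy, True where a cell is missing
rows_with_gaps = toy[mask.any(axis=1)]  # keep rows with at least one missing cell
share_missing = mask.mean()             # fraction of missing cells per column

print(rows_with_gaps)
print(share_missing)
```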

In [268]:

ny.describe()

Out[268]:

            Latitude      Longitude
count  887284.000000  887284.000000
mean       40.732962     -73.925957
std         0.086321       0.078325
min        40.498807     -74.255211
25%        40.668923     -73.970263
50%        40.726060     -73.928597
75%        40.814237     -73.881897
max        40.912828     -73.700597
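Two details of this output relate to missing data: the count row excludes NaN values (887284 matches the non-null count reported earlier), and by default describe() summarizes only the numeric columns, which is why only Latitude and Longitude appear. The sketch below shows both behaviours on a hypothetical toy frame; include="all" asks for non-numeric columns as well.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `ny`
toy = pd.DataFrame({
    "Borough": ["QUEENS", "BRONX", "QUEENS"],
    "Latitude": [np.nan, 40.85, 40.73],
})

numeric_only = toy.describe()            # default: numeric columns only
everything = toy.describe(include="all") # also summarizes text columns

print(numeric_only)
print(everything)
```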
