Fundamental Analysis¶
Structure of the Dataframe (.info())¶
info() prints a summary of the dataframe to the screen. It does not return an object (the return value is None).
dataframe.info() # display each column with its dtype and its number of non-null (non-missing) values
In [263]:
ny.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 971063 entries, 1 to 971063
Data columns (total 7 columns):
Created Date    971063 non-null object
Closed Date     882944 non-null object
Agency          971063 non-null object
Incident Zip    911140 non-null object
Borough         971063 non-null object
Latitude        887284 non-null float64
Longitude       887284 non-null float64
dtypes: float64(2), object(5)
memory usage: 59.3+ MB
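Because info() prints rather than returns, its summary cannot be assigned to a variable directly. The sketch below shows one way to capture it as a string through the `buf` parameter. The `toy` frame and its values are made up for illustration and are not part of the `ny` dataset.

```python
import io

import pandas as pd

# Hypothetical toy frame standing in for `ny`; the values are invented.
toy = pd.DataFrame({
    "Agency": ["DSNY", "DSNY", "NYPD"],
    "Latitude": [40.7, None, 40.8],
})

# info() writes the summary to stdout and returns None.
result = toy.info()

# To keep the summary as text, pass a writable buffer via `buf`.
buffer = io.StringIO()
toy.info(buf=buffer)
summary = buffer.getvalue()
```

The captured `summary` string can then be logged or written to a file.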
First Few Rows (.head())¶
dataframe.head(n) # return a dataframe with the first n rows; default n = 5
In [264]:
ny.head()
Out[264]:
Created Date Closed Date Agency \
Unique Key
1 10/11/2016 11:53 10/11/2016 12:00 DSNY
2 10/11/2016 11:36 10/11/2016 12:00 DSNY
3 10/11/2016 11:36 10/11/2016 12:00 DSNY
4 10/11/2016 12:39 10/11/2016 12:39 DSNY
5 10/11/2016 12:18 10/11/2016 12:18 DSNY
Incident Zip Borough Latitude Longitude
Unique Key
1 NaN QUEENS NaN NaN
2 NaN QUEENS NaN NaN
3 NaN QUEENS NaN NaN
4 NaN QUEENS NaN NaN
5 NaN QUEENS NaN NaN
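As a small illustration of the `n` argument, the sketch below calls head() with and without it, and also shows tail(), the counterpart for the end of the frame. The `toy` frame is a hypothetical stand-in with seven invented rows.

```python
import pandas as pd

# Hypothetical toy frame; seven made-up rows indexed like the `ny` data.
toy = pd.DataFrame(
    {"Agency": ["DSNY"] * 7},
    index=pd.Index(range(1, 8), name="Unique Key"),
)

first_five = toy.head()   # default n = 5
first_two = toy.head(2)   # explicit n
last_two = toy.tail(2)    # last n rows instead of the first n
```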
Missing Data¶
How much data is missing in each column?
In [265]:
ny.count()
Out[265]:
Created Date    971063
Closed Date     882944
Agency          971063
Incident Zip    911140
Borough         971063
Latitude        887284
Longitude       887284
dtype: int64
In [266]:
len(ny.index) - ny.count()
Out[266]:
Created Date        0
Closed Date     88119
Agency              0
Incident Zip    59923
Borough             0
Latitude        83779
Longitude       83779
dtype: int64
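Subtracting count() from the number of rows gives the missing values per column. A minimal sketch of an equivalent route, summing the boolean mask from isnull(), is shown below on a hypothetical `toy` frame with invented values; taking the mean of the same mask gives the missing fraction instead of the count.

```python
import pandas as pd

# Hypothetical toy frame standing in for `ny`; the values are invented.
toy = pd.DataFrame({
    "Agency": ["DSNY", "DSNY", "NYPD"],
    "Incident Zip": ["11101", None, None],
    "Latitude": [40.7, None, 40.8],
})

# Approach used above: total rows minus non-null count per column.
missing_by_subtraction = len(toy.index) - toy.count()

# Equivalent: True counts as 1 when summing the isnull() mask.
missing_by_isnull = toy.isnull().sum()

# Fraction of missing values per column (mean of a boolean mask).
missing_share = toy.isnull().mean()
```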
In [267]:
ny.isnull()
Out[267]:
Created Date Closed Date Agency \
Unique Key
1 False False False
2 False False False
3 False False False
4 False False False
5 False False False
... ... ... ...
971059 False False False
971060 False False False
971061 False False False
971062 False False False
971063 False False False
Incident Zip Borough Latitude Longitude
Unique Key
1 True False True True
2 True False True True
3 True False True True
4 True False True True
5 True False True True
... ... ... ... ...
971059 False False False False
971060 False False False False
971061 False False False False
971062 False False False False
971063 False False False False
[971063 rows x 7 columns]
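The boolean frame returned by isnull() is rarely read directly at this size; it is more often used as a mask. The sketch below, on a hypothetical `toy` frame with invented values, selects the rows that have at least one missing value and the rows that have none.

```python
import pandas as pd

# Hypothetical toy frame; three made-up rows indexed like the `ny` data.
toy = pd.DataFrame(
    {
        "Borough": ["QUEENS", "BRONX", "QUEENS"],
        "Latitude": [None, 40.85, 40.73],
        "Longitude": [None, -73.88, None],
    },
    index=pd.Index([1, 2, 3], name="Unique Key"),
)

mask = toy.isnull()                      # same shape as toy, True where missing
has_gap = mask.any(axis=1)               # one boolean per row

rows_with_any_gap = toy[has_gap]         # rows missing at least one value
complete_rows = toy[~has_gap]            # rows with no missing values
```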
In [268]:
ny.describe()
Out[268]:
            Latitude      Longitude
count  887284.000000  887284.000000
mean       40.732962     -73.925957
std         0.086321       0.078325
min        40.498807     -74.255211
25%        40.668923     -73.970263
50%        40.726060     -73.928597
75%        40.814237     -73.881897
max        40.912828     -73.700597
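The output above lists only Latitude and Longitude because describe() summarises numeric columns by default when the frame mixes types. A minimal sketch on a hypothetical `toy` frame (invented values) shows the default next to `include="all"`, which also reports the non-numeric columns.

```python
import pandas as pd

# Hypothetical toy frame standing in for `ny`; the values are invented.
toy = pd.DataFrame({
    "Agency": ["DSNY", "DSNY", "NYPD"],
    "Latitude": [40.7, None, 40.8],
})

# Default: numeric columns only; count ignores missing values.
numeric_only = toy.describe()

# include="all" adds non-numeric columns (count, unique, top, freq).
full = toy.describe(include="all")
```

Note that the `count` row reports non-missing values, so it matches the result of count() for the same column.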