Series¶
Creating Series¶
Empty Series¶
Calling pd.Series() with no arguments returns an empty Series
In [135]:
s = pd.Series()
print(s)
type(s)
Series([], dtype: float64)
Out[135]:
pandas.core.series.Series
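The default dtype of an empty Series has differed between pandas releases, so the float64 shown above may not appear on every installation. A minimal sketch that passes dtype explicitly (the variable name empty is only illustrative) could look like this:

```python
import pandas as pd

# Passing dtype explicitly keeps the result independent of the pandas version.
empty = pd.Series(dtype="float64")

print(len(empty))    # number of elements in the Series
print(empty.dtype)   # the dtype requested above
```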
From Scalar¶
If data is a scalar value, an index must be provided. The value is repeated to match the length of the index.
In [136]:
pd.Series( 1, index = ['a','b','c','d'])
Out[136]:
a    1
b    1
c    1
d    1
dtype: int64
From list or np.array¶
If no index is specified, a default integer index starting at 0 is used
In [137]:
pd.Series(np.array(['a','b','c','d','e'])) # from np.array
Out[137]:
0    a
1    b
2    c
3    d
4    e
dtype: object
In [138]:
pd.Series(['a','b','c','d','e']) # from Python list
Out[138]:
0    a
1    b
2    c
3    d
4    e
dtype: object
From Dictionary¶
The dictionary keys become the index
If no index sequence is specified, the Series in this example comes out sorted by key. Newer pandas versions (0.23 and later, on Python 3.6+) keep the insertion order of the dict instead.
In [139]:
pd.Series({'a' : 0., 'c' : 1., 'b' : 2.}) # from Python dict, autosort by default key
Out[139]:
a    0.0
b    2.0
c    1.0
dtype: float64
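Since the ordering of a dict-built Series depends on the pandas version, a label-sorted result can be requested explicitly. The sketch below assumes the same dict as above; the names d and by_key are only illustrative:

```python
import pandas as pd

d = {"a": 0.0, "c": 1.0, "b": 2.0}

# sort_index() returns a new Series ordered by label, whatever order
# the constructor used for the dict keys.
by_key = pd.Series(d).sort_index()

print(list(by_key.index))
print(list(by_key.values))
```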
If an index sequence is specified, the Series follows the order of that index.
Observe that missing data (an index label without a value) is marked as NaN.
In [140]:
pd.Series({'a' : 0., 'c' : 1., 'b' : 2.},index = ['a','b','c','d']) # from Python Dict, index specified, no auto sort
Out[140]:
a    0.0
b    2.0
c    1.0
d    NaN
dtype: float64
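One way to find or discard the labels that ended up as NaN is sketched below. It assumes the same dict and index as the cell above; the names s_dict, missing_labels and complete are only illustrative:

```python
import pandas as pd

s_dict = pd.Series({"a": 0.0, "c": 1.0, "b": 2.0}, index=["a", "b", "c", "d"])

missing = s_dict.isna()                        # boolean Series, True where the value is NaN
missing_labels = list(s_dict[missing].index)   # labels that had no value in the dict
complete = s_dict.dropna()                     # new Series without the NaN entries

print(missing_labels)
print(list(complete.index))
```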
Specify Index During Creation¶
In [141]:
pd.Series(['a','b','c','d','e'], index=[10,20,30,40,50])
Out[141]:
10    a
20    b
30    c
40    d
50    e
dtype: object
Accessing Series¶
series[ single/list/range_of_row_label/number ] # can cause confusion
series.loc[ single/list/range_of_row_label ]
series.iloc[ single/list/range_of_row_number ]
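The confusion is easiest to see when the labels are themselves integers that do not match the positions. The following sketch uses a hypothetical Series t (not part of the sample data) to show that .loc and .iloc stay unambiguous in that case:

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions.
t = pd.Series(["x", "y", "z"], index=[10, 20, 30])

by_label = t.loc[10]      # looks up the label 10
by_position = t.iloc[0]   # looks up the first position

# There is no label 0, so a label lookup for 0 fails with KeyError.
try:
    t.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_label, by_position, label_zero_exists)
```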
Sample Data¶
In [142]:
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
s
Out[142]:
a    1
b    2
c    3
d    4
e    5
dtype: int64
Retrieve by Position¶
Single Item
In [144]:
s.iloc[1]
Out[144]:
2
Multiple Items
In [145]:
s.iloc[[1,3]] # list of positions
Out[145]:
b    2
d    4
dtype: int64
Range (First 3)
In [146]:
s.iloc[:3]
Out[146]:
a    1
b    2
c    3
dtype: int64
Range (Last 3)
In [147]:
s.iloc[-3:]
Out[147]:
c    3
d    4
e    5
dtype: int64
Range (in between)
In [148]:
s.iloc[2:3]
Out[148]:
c    3
dtype: int64
Retrieve by Label¶
Single Label
In [149]:
s.loc['c'] # or ... s['c']
Out[149]:
3
Multiple Labels
In [150]:
s.loc[['b','c']]
Out[150]:
b    2
c    3
dtype: int64
Range of Labels
In [151]:
s.loc['b':'d']
Out[151]:
b    2
c    3
d    4
dtype: int64
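Note that a label range includes its end label, whereas a position range excludes its end position. A short sketch of the difference, using a copy of the sample data under the illustrative name s_demo:

```python
import pandas as pd

s_demo = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

label_slice = s_demo.loc["b":"d"]   # end label "d" is included
position_slice = s_demo.iloc[1:3]   # end position 3 is excluded

print(list(label_slice.index))
print(list(position_slice.index))
```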
Series Properties¶
.index¶
In [152]:
s.index
Out[152]:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Series Functions¶
.reset_index ()¶
Resetting the index returns a new DataFrame in which:
- The old index becomes a normal column
- The new index is numbered 0, 1, 2, 3, ...
The original Series is left unchanged.
In [665]:
s.reset_index()
Out[665]:
  index  0
0     a  1
1     b  2
2     c  3
3     d  4
4     e  5
In [666]:
s.index
Out[666]:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
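Two optional parameters of reset_index() are often useful here: name, which labels the value column of the resulting DataFrame, and drop, which discards the old index and returns a Series. A minimal sketch, with s_demo standing in for the sample data:

```python
import pandas as pd

s_demo = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

as_frame = s_demo.reset_index(name="value")   # DataFrame with columns "index" and "value"
as_series = s_demo.reset_index(drop=True)     # Series with a fresh 0..4 index, old labels discarded

print(list(as_frame.columns))
print(list(as_series.index))
print(list(s_demo.index))                     # the original Series keeps its labels
```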
Series Number Operator¶
Applying an operator (arithmetic or logic) to a Series object returns a new Series object
Arithmetic Operator¶
In [658]:
s1 = pd.Series( [100,200,300,400,500] )
s2 = pd.Series( [10, 20, 30, 40, 50] )
Apply To One Series Object
In [154]:
100 - s2
Out[154]:
0    90
1    80
2    70
3    60
4    50
dtype: int64
Apply To Two Series Objects
In [155]:
s1 - s2
Out[155]:
0     90
1    180
2    270
3    360
4    450
dtype: int64
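When two Series are combined, pandas pairs values by index label rather than by position. In the example above both Series share the default 0..4 index, so the difference is not visible. The sketch below uses two hypothetical Series, left and right, with only partly overlapping labels:

```python
import pandas as pd

left = pd.Series([100, 200, 300], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

diff = left - right                           # labels present on only one side give NaN
diff_filled = left.sub(right, fill_value=0)   # treat the missing side as 0 instead

print(diff)
print(diff_filled)
```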
Logic Operator (Boolean Selection)¶
- Applying a logic operator to a Series returns a new Series of boolean results
- This can be used for DataFrame filtering
In [156]:
bs = pd.Series(range(0,10))
bs
Out[156]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
In [157]:
print(bs>3)
print(type(bs>3))
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9     True
dtype: bool
<class 'pandas.core.series.Series'>
In [158]:
~((bs>3) & (bs<8))
Out[158]:
0     True
1     True
2     True
3     True
4    False
5    False
6    False
7    False
8     True
9     True
dtype: bool
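A boolean Series like this can be passed back inside square brackets to keep only the rows where it is True. A minimal sketch, with bs_demo as an illustrative copy of the data above:

```python
import pandas as pd

bs_demo = pd.Series(range(0, 10))

# Each comparison needs its own parentheses because & binds tighter than > and <.
mask = (bs_demo > 3) & (bs_demo < 8)
inside = bs_demo[mask]     # values where the mask is True
outside = bs_demo[~mask]   # values where the mask is False

print(inside.tolist())
print(outside.tolist())
```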
Series String Operator¶
This chapter focuses on string functions that are applied element-wise to an entire Series through the .str accessor
SeriesObj.str.operatorFunction()
In [669]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
Case Conversion¶
SeriesObj.str.upper()
SeriesObj.str.lower()
In [670]:
s.str.upper()
Out[670]:
0       A
1       B
2       C
3    AABA
4    BACA
5     NaN
6    CABA
7     DOG
8     CAT
dtype: object
Number of Characters¶
In [672]:
s.str.len()
Out[672]:
0    1.0
1    1.0
2    1.0
3    4.0
4    4.0
5    NaN
6    4.0
7    3.0
8    3.0
dtype: float64
String Indexing¶
In [688]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,'CABA', 'dog', 'cat'])
s
Out[688]:
0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object
In [689]:
s.str[1] # return the character at index 1 (the second character) of every item; NaN if the string is too short
Out[689]:
0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7      o
8      a
dtype: object
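The same accessor also accepts slices, which do not produce NaN for strings shorter than the slice. A brief sketch on a smaller, hypothetical Series named words:

```python
import numpy as np
import pandas as pd

words = pd.Series(["A", "Aaba", np.nan, "dog"])

first_two = words.str[:2]        # slice of each string, missing values stay NaN
sliced = words.str.slice(0, 2)   # same result through the method form

print(first_two)
```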
Splitting¶
Sample Data
In [676]:
s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
Splitting based on a delimiter
The result is a Series in which each element is a list of the split parts
In [678]:
sp = s.str.split('_')
sp
Out[678]:
0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object
Retrieving Split Result
Use .str.get() to retrieve an element from each split list
In [681]:
sp.str.get(-1)
Out[681]:
0      c
1      e
2    NaN
3      h
dtype: object
Alternatively, use str[ ] for the same result
In [682]:
sp.str[-1]
Out[682]:
0      c
1      e
2    NaN
3      h
dtype: object
Split and Expand Into DataFrame¶
In [685]:
s.str.split('_',expand=True, n=5) # n limits the number of splits, so at most n+1 columns
Out[685]:
     0    1    2
0    a    b    c
1    c    d    e
2  NaN  NaN  NaN
3    f    g    h
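With n=5 the limit is never reached, because each string contains only two delimiters. The effect of n is easier to see with a smaller value, as in this sketch (parts and one_split are illustrative names):

```python
import numpy as np
import pandas as pd

parts = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])

# n=1 allows at most one split per string, so the result has two columns
# and the second column keeps the unsplit remainder.
one_split = parts.str.split("_", n=1, expand=True)

print(one_split)
```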
Series Substring Extraction¶
Sample Data
In [695]:
s = pd.Series(['a1', 'b2', 'c3'])
s
Out[695]:
0    a1
1    b2
2    c3
dtype: object
Extract based on regex matching. With more than one capture group the result is a DataFrame, even when expand=False.
In [696]:
type(s.str.extract(r'([ab])(\d)', expand=False)) # raw string so \d is passed to the regex engine unchanged
Out[696]:
pandas.core.frame.DataFrame
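To see what the extraction actually contains, the sketch below repeats it with named groups, which become the column names. The group names letter and digit and the variable names are assumptions made for illustration:

```python
import pandas as pd

codes = pd.Series(["a1", "b2", "c3"])

# Named groups become column names; rows that do not match are filled with NaN.
extracted = codes.str.extract(r"(?P<letter>[ab])(?P<digit>\d)", expand=False)

# With a single capture group, expand=False gives a Series instead of a DataFrame.
digits = codes.str.extract(r"[ab](\d)", expand=False)

print(extracted)
print(digits)
```

Here 'c3' does not match the pattern, so its row holds NaN in both columns.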
Series DateTime Operator¶
Sample Data¶
In [701]:
import datetime as dt

s = pd.Series([
    dt.datetime(2000,1,1,0,0,0),
    dt.datetime(1999,12,15,12,34,55),
    dt.datetime(2020,3,8,5,7,12),
    dt.datetime(2018,1,1,0,0,0),
    dt.datetime(2003,3,4,5,6,7)
])
s
Out[701]:
0   2000-01-01 00:00:00
1   1999-12-15 12:34:55
2   2020-03-08 05:07:12
3   2018-01-01 00:00:00
4   2003-03-04 05:06:07
dtype: datetime64[ns]
Date Related Extraction¶
A datetime Series supports the properties below through the .dt accessor:
- date
- month
- day
- year
- dayofweek
- dayofyear
- weekday
- weekday_name
- quarter
- daysinmonth
In [703]:
s.dt.date
Out[703]:
0    2000-01-01
1    1999-12-15
2    2020-03-08
3    2018-01-01
4    2003-03-04
dtype: object
In [711]:
s.dt.month
Out[711]:
0     1
1    12
2     3
3     1
4     3
dtype: int64
In [717]:
s.dt.dayofweek
Out[717]:
0    5
1    2
2    6
3    0
4    1
dtype: int64
In [718]:
s.dt.weekday
Out[718]:
0    5
1    2
2    6
3    0
4    1
dtype: int64
In [720]:
s.dt.weekday_name # removed in pandas 1.0; newer versions use s.dt.day_name()
Out[720]:
0     Saturday
1    Wednesday
2       Sunday
3       Monday
4      Tuesday
dtype: object
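The weekday_name property was removed in pandas 1.0. On newer versions the same information is available through the day_name() method, as in this sketch (dates and names are illustrative variable names, and only three of the sample dates are used):

```python
import datetime as dt
import pandas as pd

dates = pd.Series([
    dt.datetime(2000, 1, 1),
    dt.datetime(1999, 12, 15),
    dt.datetime(2018, 1, 1),
])

names = dates.dt.day_name()   # a method call, unlike the weekday_name property

print(names.tolist())
```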
In [721]:
s.dt.quarter
Out[721]:
0    1
1    4
2    1
3    1
4    1
dtype: int64
In [724]:
s.dt.daysinmonth
Out[724]:
0    31
1    31
2    31
3    31
4    31
dtype: int64
Time Related Extraction¶
In [710]:
s.dt.time # extract time as time Object
Out[710]:
0    00:00:00
1    12:34:55
2    05:07:12
3    00:00:00
4    05:06:07
dtype: object
In [708]:
s.dt.hour # extract hour as integer
Out[708]:
0     0
1    12
2     5
3     0
4     5
dtype: int64
In [709]:
s.dt.minute # extract minute as integer
Out[709]:
0     0
1    34
2     7
3     0
4     6
dtype: int64