Series

Creating Series

Empty Series

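Passing no data results in an empty Series. The float64 dtype shown below is the default in older pandas versions; pandas 2.0 and later give an empty Series the object dtype.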

In [135]:

import numpy as np
import pandas as pd
import datetime as dt

s = pd.Series()
print(s)
type(s)
Series([], dtype: float64)

Out[135]:

pandas.core.series.Series
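Because the default dtype of an empty Series depends on the pandas version, it can be passed explicitly. A minimal sketch:

```python
import pandas as pd

# An empty Series with an explicit dtype, so the result does not depend on
# the pandas version's default for empty data
empty = pd.Series(dtype="float64")
print(empty.empty, len(empty), empty.dtype)  # True 0 float64
```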

From Scalar

If data is a scalar value, an index should be provided: the value is repeated to match the length of the index. (Without an index, a scalar produces a single-element Series.)

In [136]:

pd.Series( 1, index = ['a','b','c','d'])

Out[136]:

a    1
b    1
c    1
d    1
dtype: int64
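A short sketch of both cases, with and without an index (the value 7 and `range(3)` are arbitrary):

```python
import pandas as pd

# The scalar is broadcast to every label in the index
s = pd.Series(7, index=range(3))
print(s.tolist())  # [7, 7, 7]

# Without an index, a scalar yields a single-element Series labelled 0
one = pd.Series(7)
print(one.index.tolist(), one.tolist())  # [0] [7]
```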

From list or np.array

If no index is specified, a default integer index starting at 0 (0, 1, 2, ...) is used.

In [137]:

pd.Series(np.array(['a','b','c','d','e']))  # from np.array

Out[137]:

0    a
1    b
2    c
3    d
4    e
dtype: object

In [138]:

pd.Series(['a','b','c','d','e'])           # from Python list

Out[138]:

0    a
1    b
2    c
3    d
4    e
dtype: object
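The default index can be inspected directly. In current pandas it is a `RangeIndex`, which stores only start, stop and step rather than a list of labels. A small sketch:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
# The default index is a RangeIndex 0, 1, 2, ...
print(type(s.index).__name__)  # RangeIndex
print(s.index.tolist())        # [0, 1, 2]
```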

From Dictionary

The dictionary keys become the index.

If no index sequence is specified, pandas 0.23 and later (on Python 3.6+) keep the dictionary's insertion order. Older versions sorted the Series by key instead.

In [139]:

pd.Series({'a' : 0., 'c' : 1., 'b' : 2.})  # from Python dict, keys kept in insertion order

Out[139]:

a    0.0
c    1.0
b    2.0
dtype: float64
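When key order is wanted, it can be requested explicitly with `sort_index()`, which behaves the same across versions. A sketch, assuming pandas 0.23+ on Python 3.6+:

```python
import pandas as pd

d = {'a': 0., 'c': 1., 'b': 2.}
s = pd.Series(d)
print(s.index.tolist())               # ['a', 'c', 'b']  (insertion order)
print(s.sort_index().index.tolist())  # ['a', 'b', 'c']  (explicit sort by key)
```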

If an index sequence is specified, the Series follows that index order. Observe that labels with no matching dictionary key are marked as missing data (NaN).

In [140]:

pd.Series({'a' : 0., 'c' : 1., 'b' : 2.}, index=['a','b','c','d'])  # from Python dict, result follows the given index

Out[140]:

a    0.0
b    2.0
c    1.0
d    NaN
dtype: float64
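The missing entries can be located or dropped afterwards. A minimal sketch using `isna()` and `dropna()` on the same data:

```python
import pandas as pd

s = pd.Series({'a': 0., 'c': 1., 'b': 2.}, index=['a', 'b', 'c', 'd'])
# isna() flags labels that had no matching key in the dict
print(s.isna().tolist())          # [False, False, False, True]
# dropna() removes them
print(s.dropna().index.tolist())  # ['a', 'b', 'c']
```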

Specify Index During Creation

In [141]:

pd.Series(['a','b','c','d','e'], index=[10,20,30,40,50])

Out[141]:

10    a
20    b
30    c
40    d
50    e
dtype: object
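The index passed at creation must have the same length as the data; otherwise pandas raises a `ValueError`. A short sketch (the exact error message varies between versions):

```python
import pandas as pd

raised = False
try:
    # 3 values but only 2 labels
    pd.Series(['a', 'b', 'c'], index=[10, 20])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```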

Accessing Series

series[ single/list/range of row labels or positions ]   # ambiguous, can cause confusion
series.loc[ single/list/range of row labels ]
series.iloc[ single/list/range of row positions ]
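The confusion with plain `[]` is easiest to see when the labels are themselves integers. A small sketch, reusing the index from the creation example above:

```python
import pandas as pd

# Integer labels that differ from positions make plain [] ambiguous
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

print(s.loc[10])   # a  (label 10)
print(s.iloc[0])   # a  (position 0)
print(s[10])       # a  (with an integer index, [] treats 10 as a label)
```

Using `.loc` or `.iloc` makes the intent explicit regardless of the index type.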

Sample Data

In [142]:

s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
s

Out[142]:

a    1
b    2
c    3
d    4
e    5
dtype: int64

Retrieve by Position

Single Item

In [144]:

s.iloc[1]

Out[144]:

2

Multiple Items

In [145]:

s.iloc[[1,3]]  # list of positions

Out[145]:

b    2
d    4
dtype: int64

Range (First 3)

In [146]:

s.iloc[:3]

Out[146]:

a    1
b    2
c    3
dtype: int64

Range (Last 3)

In [147]:

s.iloc[-3:]

Out[147]:

c    3
d    4
e    5
dtype: int64

Range (in between)

In [148]:

s.iloc[2:3]

Out[148]:

c    3
dtype: int64

Retrieve by Label

Single Label

In [149]:

s.loc['c']
# or  ... s['c']

Out[149]:

3

Multiple Labels

In [150]:

s.loc[['b','c']]

Out[150]:

b    2
c    3
dtype: int64

Range of Labels

In [151]:

s.loc['b':'d']

Out[151]:

b    2
c    3
d    4
dtype: int64
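Note that a label range includes its end label, while a position range excludes its end position. A side-by-side sketch on the sample data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label slices include the end label ...
print(s.loc['b':'d'].tolist())  # [2, 3, 4]
# ... while position slices exclude the end position
print(s.iloc[1:3].tolist())     # [2, 3]
```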

Series Properties

.index

In [152]:

s.index

Out[152]:

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
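A few other commonly used properties, sketched on the same sample data (the exact integer dtype can depend on platform and version):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.values)  # underlying NumPy array: [1 2 3 4 5]
print(s.size)    # number of elements: 5
print(s.dtype)   # element type, typically int64
```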

Series Functions

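.reset_index()

Calling reset_index() on a Series returns a new DataFrame in which:

  • the old index becomes a regular column (named index)
  • the rows get a default integer index 0, 1, 2, ...

The original Series is not modified, as s.index below shows.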

In [665]:

s.reset_index()

Out[665]:

  index  0
0     a  1
1     b  2
2     c  3
3     d  4
4     e  5

In [666]:

s.index

Out[666]:

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
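Two `reset_index()` parameters are often useful: `name` sets the column name for the values (instead of `0`), and `drop=True` discards the old labels and returns a Series. A minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# name= sets the column name for the values
df = s.reset_index(name='value')
print(df.columns.tolist())  # ['index', 'value']

# drop=True discards the old labels and keeps a Series with 0..n-1
r = s.reset_index(drop=True)
print(r.index.tolist())     # [0, 1, 2, 3, 4]
```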

Series Number Operator

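Applying an operator (arithmetic or logic) to a Series returns a new Series object; the original Series is left unchanged. Between two Series, the operation is applied element by element, matching items by index label.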

Arithmetic Operator

In [658]:

s1 = pd.Series( [100,200,300,400,500] )
s2 = pd.Series( [10, 20, 30, 40, 50] )

Apply To One Series Object

In [154]:

100 - s2

Out[154]:

0    90
1    80
2    70
3    60
4    50
dtype: int64

Apply To Two Series Objects

In [155]:

s1 - s2

Out[155]:

0     90
1    180
2    270
3    360
4    450
dtype: int64
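s1 and s2 share the same default index, so every item finds a partner. When the indexes differ, labels present in only one Series produce NaN. A sketch with two hypothetical Series `a` and `b`:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# Items are matched by index label; 'x' has no partner in b, so it becomes NaN
print(a + b)

# The method form can treat missing partners as a fill value instead
print(a.add(b, fill_value=0))
```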

Logic Operator (Boolean Selection)

  • Applying a comparison or logic operator to a Series returns a new Series of boolean results
  • This boolean Series can be used to filter a Series or DataFrame

In [156]:

bs = pd.Series(range(0,10))
bs

Out[156]:

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

In [157]:

print (bs>3)
print (type (bs>3))
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9     True
dtype: bool
<class 'pandas.core.series.Series'>

In [158]:

~((bs>3) & (bs<8))

Out[158]:

0     True
1     True
2     True
3     True
4    False
5    False
6    False
7    False
8     True
9     True
dtype: bool
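The boolean Series can be passed back into `[]` to keep only the matching items. A short sketch using the same `bs`:

```python
import pandas as pd

bs = pd.Series(range(0, 10))

# Combine conditions with & (and), | (or), ~ (not); each comparison needs parentheses
mask = (bs > 3) & (bs < 8)
print(bs[mask].tolist())   # [4, 5, 6, 7]
print(bs[~mask].tolist())  # [0, 1, 2, 3, 8, 9]
```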

Series String Operator

  • This section covers string functions that are applied element-wise to an entire Series through the .str accessor:

    SeriesObj.str.operatorFunction()
    

In [669]:

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

Case Conversion

SeriesObj.str.upper()
SeriesObj.str.lower()

In [670]:

s.str.upper()

Out[670]:

0       A
1       B
2       C
3    AABA
4    BACA
5     NaN
6    CABA
7     DOG
8     CAT
dtype: object
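`str.lower()` works the same way; as with `upper()`, missing values pass through unchanged. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

low = s.str.lower()
print(low[3])           # aaba
print(pd.isna(low[5]))  # True: the NaN item stays NaN
```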

Number of Characters

In [672]:

s.str.len()

Out[672]:

0    1.0
1    1.0
2    1.0
3    4.0
4    4.0
5    NaN
6    4.0
7    3.0
8    3.0
dtype: float64
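The result is float64 only because of the NaN item. If integer lengths are preferred, one option (an assumption about the desired output, not the only approach) is the nullable `Int64` dtype, which holds integers alongside missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# Cast the float lengths to the nullable integer dtype; NaN becomes <NA>
lengths = s.str.len().astype('Int64')
print(lengths.dtype)  # Int64
print(lengths.tolist())
```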

String Indexing

In [688]:

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,'CABA', 'dog', 'cat'])
s

Out[688]:

0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object

In [689]:

s.str[1]  # character at position 1 (the second character) of every item; NaN if the item is too short

Out[689]:

0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7      o
8      a
dtype: object
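`.str[]` also accepts slices, equivalent to `str.slice()`. Items shorter than the slice return whatever characters they have. A sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# First two characters of every item
first_two = s.str[:2]
print(first_two.tolist())
```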

Splitting

Sample Data

In [676]:

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

Splitting on a delimiter returns a Series whose items are lists of the split substrings

In [678]:

sp = s.str.split('_')
sp

Out[678]:

0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object

Retrieving Split Results

Use .str.get() to retrieve an element from each split list

In [681]:

sp.str.get(-1) 

Out[681]:

0      c
1      e
2    NaN
3      h
dtype: object

Alternatively, use .str[] for the same result

In [682]:

sp.str[-1]

Out[682]:

0      c
1      e
2    NaN
3      h
dtype: object
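A position beyond the end of a list does not raise an error; `.str.get()` returns NaN for that item. A small sketch on the same sample data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
sp = s.str.split('_')

# Position 5 does not exist in any list, so every result is NaN
print(sp.str.get(5).isna().all())  # True
print(sp.str.get(0).tolist())
```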

Split and Expand Into DataFrame

In [685]:

s.str.split('_', expand=True, n=5)  # n limits the number of splits (at most n+1 columns)

Out[685]:

     0    1    2
0    a    b    c
1    c    d    e
2  NaN  NaN  NaN
3    f    g    h
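Since `n` counts splits rather than columns, `n=1` splits only at the first delimiter and yields at most two columns. A sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# Split at the first '_' only: at most n + 1 = 2 columns
df = s.str.split('_', n=1, expand=True)
print(df.shape)      # (4, 2)
print(df.loc[0, 1])  # b_c
```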

Series Substring Extraction

Sample Data

In [695]:

s = pd.Series(['a1', 'b2', 'c3'])
s

Out[695]:

0    a1
1    b2
2    c3
dtype: object

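Extract substrings based on regex matching: each capture group in the pattern becomes one column. With expand=False, a pattern with two or more groups still returns a DataFrame.

In [696]:

type(s.str.extract(r'([ab])(\d)', expand=False))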

Out[696]:

pandas.core.frame.DataFrame
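Named groups (`(?P<name>...)`) become the column names, and items that do not match give a row of NaN. A sketch; the group names `letter` and `digit` are arbitrary:

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# 'c3' does not match [ab], so its row is NaN
df = s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
print(df)
```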

Series DateTime Operator

Sample Data

In [701]:

s = pd.Series([
    dt.datetime(2000,1,1,0,0,0),
    dt.datetime(1999,12,15,12,34,55),
    dt.datetime(2020,3,8,5,7,12),
    dt.datetime(2018,1,1,0,0,0),
    dt.datetime(2003,3,4,5,6,7)
])
s

Out[701]:

0   2000-01-01 00:00:00
1   1999-12-15 12:34:55
2   2020-03-08 05:07:12
3   2018-01-01 00:00:00
4   2003-03-04 05:06:07
dtype: datetime64[ns]
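A datetime Series can also be built from strings with `pd.to_datetime`. A sketch; the strings share one format so parsing does not depend on format inference, and the exact datetime64 resolution may vary by version:

```python
import pandas as pd

# Parse strings into datetime64 values
s = pd.Series(pd.to_datetime(['2000-01-01 00:00:00', '2020-03-08 05:07:12']))
print(s.dtype)             # a datetime64 dtype
print(s.dt.year.tolist())  # [2000, 2020]
```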

A datetime Series exposes the following properties through the .dt accessor:

  • date
  • month
  • day
  • year
  • dayofweek
  • dayofyear
  • weekday
  • weekday_name (removed in pandas 1.0; use the day_name() method instead)
  • quarter
  • daysinmonth
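Some of the listed properties are not shown in the cells that follow. A sketch of `year`, `day` and `dayofyear` on the first three sample dates:

```python
import datetime as dt
import pandas as pd

s = pd.Series([
    dt.datetime(2000, 1, 1, 0, 0, 0),
    dt.datetime(1999, 12, 15, 12, 34, 55),
    dt.datetime(2020, 3, 8, 5, 7, 12),
])

print(s.dt.year.tolist())       # [2000, 1999, 2020]
print(s.dt.day.tolist())        # [1, 15, 8]
print(s.dt.dayofyear.tolist())  # [1, 349, 68]  (2020 is a leap year)
```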

In [703]:

s.dt.date

Out[703]:

0    2000-01-01
1    1999-12-15
2    2020-03-08
3    2018-01-01
4    2003-03-04
dtype: object

In [711]:

s.dt.month

Out[711]:

0     1
1    12
2     3
3     1
4     3
dtype: int64

In [717]:

s.dt.dayofweek

Out[717]:

0    5
1    2
2    6
3    0
4    1
dtype: int64

In [718]:

s.dt.weekday

Out[718]:

0    5
1    2
2    6
3    0
4    1
dtype: int64

In [720]:

s.dt.day_name()  # weekday_name in pandas < 1.0

Out[720]:

0     Saturday
1    Wednesday
2       Sunday
3       Monday
4      Tuesday
dtype: object

In [721]:

s.dt.quarter

Out[721]:

0    1
1    4
2    1
3    1
4    1
dtype: int64

In [724]:

s.dt.daysinmonth

Out[724]:

0    31
1    31
2    31
3    31
4    31
dtype: int64

In [710]:

s.dt.time   # extract time as datetime.time objects

Out[710]:

0    00:00:00
1    12:34:55
2    05:07:12
3    00:00:00
4    05:06:07
dtype: object

In [708]:

s.dt.hour  # extract hour as integer

Out[708]:

0     0
1    12
2     5
3     0
4     5
dtype: int64

In [709]:

s.dt.minute # extract minute as integer

Out[709]:

0     0
1    34
2     7
3     0
4     6
dtype: int64
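To produce formatted text rather than individual components, `dt.strftime()` applies standard `strftime` format codes to each item. A sketch on two of the sample dates:

```python
import datetime as dt
import pandas as pd

s = pd.Series([
    dt.datetime(2000, 1, 1, 0, 0, 0),
    dt.datetime(1999, 12, 15, 12, 34, 55),
])

# Format each timestamp as text
print(s.dt.strftime('%Y-%m-%d %H:%M').tolist())  # ['2000-01-01 00:00', '1999-12-15 12:34']
print(s.dt.second.tolist())                      # [0, 55]
```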
