Series¶

Creating Series¶

Empty Series¶

Calling the constructor with no data results in an empty Series.

In [135]:

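import numpy as np
import pandas as pd

s = pd.Series(dtype='float64')   # dtype given explicitly; newer pandas defaults an empty Series to object
print(s)
type(s)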

Series([], dtype: float64)

Out[135]:

pandas.core.series.Series

From Scalar¶

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [136]:

pd.Series( 1, index = ['a','b','c','d'])

Out[136]:

a    1
b    1
c    1
d    1
dtype: int64

From list or np.array¶

If no index is specified, a default integer index is used, starting at 0 and increasing by 1.

In [137]:

pd.Series(np.array(['a','b','c','d','e']))  # from np.array

Out[137]:

0    a
1    b
2    c
3    d
4    e
dtype: object

In [138]:

pd.Series(['a','b','c','d','e'])           # from Python list

Out[138]:

0    a
1    b
2    c
3    d
4    e
dtype: object

From Dictionary¶

The dictionary keys become the index.

If no index is specified, pandas 0.23 and later (on Python 3.6+) keep the dictionary's insertion order. Older versions sorted the Series by key, which is what the output below shows.

In [139]:

pd.Series({'a' : 0., 'c' : 1., 'b' : 2.})  # from Python dict; keys become the index

Out[139]:

a    0.0
b    2.0
c    1.0
dtype: float64

If an index is specified, the Series follows that index order. Observe that missing data (an index label with no matching key) is marked as NaN.

In [140]:

pd.Series({'a' : 0., 'c' : 1., 'b' : 2.}, index=['a','b','c','d'])  # from Python dict, index specified: follows the index order

Out[140]:

a    0.0
b    2.0
c    1.0
d    NaN
dtype: float64
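A minimal sketch, assuming pandas 0.23 or later, of how a Series built from a dict keeps the insertion order, and how sort_index() recovers the key-sorted order that older versions produced. The names d, s_dict and s_sorted are illustrative.

```python
import pandas as pd

d = {'a': 0., 'c': 1., 'b': 2.}

# pandas >= 0.23 on Python >= 3.6 keeps the dict's insertion order
s_dict = pd.Series(d)
print(s_dict.index.tolist())    # ['a', 'c', 'b']

# sort explicitly by label when key order is wanted
s_sorted = s_dict.sort_index()
print(s_sorted.index.tolist())  # ['a', 'b', 'c']
```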

Specify Index During Creation¶

In [141]:

pd.Series(['a','b','c','d','e'], index=[10,20,30,40,50])

Out[141]:

10    a
20    b
30    c
40    d
50    e
dtype: object

Accessing Series¶

series     [ single/list/range of row labels or row positions ]   # can cause confusion
series.loc [ single/list/range of row labels ]
series.iloc[ single/list/range of row positions ]
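The confusion with plain [] shows up most clearly when the labels are integers. The sketch below (the Series s_int is a hypothetical example) shows that .loc uses labels, .iloc uses positions, and plain [] on an integer index looks up labels, so a position such as 0 raises KeyError.

```python
import pandas as pd

# integer labels that differ from the positions 0, 1, 2
s_int = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s_int.loc[20]      # label 20 -> 'b'
by_position = s_int.iloc[0]   # position 0 -> 'a'
plain = s_int[20]             # integer index: [] looks up labels -> 'b'

try:
    s_int[0]                  # 0 is not a label, so this raises KeyError
    plain_zero_failed = False
except KeyError:
    plain_zero_failed = True

print(by_label, by_position, plain, plain_zero_failed)  # b a b True
```

Using .loc or .iloc explicitly avoids relying on how [] interprets the key.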

Sample Data¶

In [142]:

s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
s

Out[142]:

a    1
b    2
c    3
d    4
e    5
dtype: int64

Retrieve by Position¶

Single Item

In [144]:

s.iloc[1]

Out[144]:

2

Multiple Items

In [145]:

s.iloc[[1,3]] # list of positions

Out[145]:

b    2
d    4
dtype: int64

Range (First 3)

In [146]:

s.iloc[:3]

Out[146]:

a    1
b    2
c    3
dtype: int64

Range (Last 3)

In [147]:

s.iloc[-3:]

Out[147]:

c    3
d    4
e    5
dtype: int64

Range (in between)

In [148]:

s.iloc[2:3]

Out[148]:

c    3
dtype: int64

Retrieve by Label¶

Single Label

In [149]:

s.loc['c']
# or ... s['c']

Out[149]:

3

Multiple Labels

In [150]:

s.loc[['b','c']]

Out[150]:

b    2
c    3
dtype: int64

Range of Labels

In [151]:

s.loc['b':'d']

Out[151]:

b    2
c    3
d    4
dtype: int64
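Note that a label range includes its end label, while a position range excludes its end position. A short comparison on the same sample data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

label_slice = s.loc['b':'d']   # end label 'd' is included
pos_slice = s.iloc[1:3]        # end position 3 is excluded

print(label_slice.tolist())    # [2, 3, 4]
print(pos_slice.tolist())      # [2, 3]
```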

Series Properties¶

.index¶

In [152]:

s.index

Out[152]:

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

Series Functions¶

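.reset_index()¶

Resetting the index will:

Convert the index into a normal column
Renumber the index as 0, 1, 2, ...

reset_index() returns a new object (a DataFrame here, because the old index becomes a column); s itself is unchanged, as s.index in the following cell confirms.

In [665]:

s.reset_index()

Out[665]:

   index  0
0      a  1
1      b  2
2      c  3
3      d  4
4      e  5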

In [666]:

s.index

Out[666]:

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
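Series.reset_index() also accepts name= (the column name for the values) and drop=True (discard the old index instead of turning it into a column). A small sketch; the column name 'value' is illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# name= labels the values column; the unnamed old index becomes column 'index'
df = s.reset_index(name='value')
print(df.columns.tolist())       # ['index', 'value']

# drop=True returns a Series with a fresh 0..n-1 index
s_dropped = s.reset_index(drop=True)
print(s_dropped.index.tolist())  # [0, 1, 2, 3, 4]
```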

Series Number Operator¶

Applying an operator (arithmetic or logical) to a Series object returns a new Series object.

Arithmetic Operator¶

In [658]:

s1 = pd.Series( [100,200,300,400,500] )
s2 = pd.Series( [10, 20, 30, 40, 50] )

Apply To One Series Object

In [154]:

100 - s2

Out[154]:

0    90
1    80
2    70
3    60
4    50
dtype: int64

Apply To Two Series Objects

In [155]:

s1 - s2

Out[155]:

0     90
1    180
2    270
3    360
4    450
dtype: int64
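With two Series, the values are matched by index label, not by position; s1 and s2 above happen to share the same default index. A sketch with hypothetical Series a and b whose labels only partly overlap:

```python
import pandas as pd

a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'z'])

# labels present on only one side produce NaN
total = a + b
print(total)                      # x: NaN, y: 12.0, z: NaN

# fill_value=0 treats a label missing on one side as 0
total_filled = a.add(b, fill_value=0)
print(total_filled.tolist())      # [1.0, 12.0, 20.0]
```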

Logic Operator (Boolean Selection)¶

Applying a logical operator to a Series returns a new Series of boolean values.
This boolean Series can be used to filter a Series or a DataFrame.

In [156]:

bs = pd.Series(range(0,10))
bs

Out[156]:

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

In [157]:

print (bs>3)
print (type (bs>3))

0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9     True
dtype: bool
<class 'pandas.core.series.Series'>

In [158]:

~((bs>3) & (bs<8))

Out[158]:

0     True
1     True
2     True
3     True
4    False
5    False
6    False
7    False
8     True
9     True
dtype: bool
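Passing a boolean Series inside [] keeps only the rows where it is True. A minimal sketch; the DataFrame and its column name 'n' are illustrative.

```python
import pandas as pd

bs = pd.Series(range(0, 10))

mask = (bs > 3) & (bs < 8)   # boolean Series with the same index as bs
selected = bs[mask]          # keep rows where mask is True
print(selected.tolist())     # [4, 5, 6, 7]

# the same idea filters DataFrame rows
df = pd.DataFrame({'n': range(0, 10)})
print(df[df['n'] > 7]['n'].tolist())  # [8, 9]
```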

Series String Operator¶

This chapter focuses on string functions that can be applied to every element of a Series at once, through the .str accessor:
```
SeriesObj.str.operatorFunction()
```

In [669]:

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

Case Conversion¶

SeriesObj.str.upper()
SeriesObj.str.lower()

In [670]:

s.str.upper()

Out[670]:

0       A
1       B
2       C
3    AABA
4    BACA
5     NaN
6    CABA
7     DOG
8     CAT
dtype: object
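The lower() counterpart works the same way; as with upper(), the missing value stays NaN instead of raising an error.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

lowered = s.str.lower()   # element-wise lowercase; NaN passes through
print(lowered.tolist())
```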

Number of Characters¶

In [672]:

s.str.len()

Out[672]:

0    1.0
1    1.0
2    1.0
3    4.0
4    4.0
5    NaN
6    4.0
7    3.0
8    3.0
dtype: float64

String Indexing¶

In [688]:

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,'CABA', 'dog', 'cat'])
s

Out[688]:

0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object

In [689]:

s.str[1]  # return the character at position 1 (the second character) of every item; NaN if the string is too short

Out[689]:

0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7      o
8      a
dtype: object
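.str[] also accepts a slice, which cuts a substring out of each element; .str.slice(start, stop) is the equivalent method form. A short sketch on a smaller sample:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'Aaba', np.nan, 'dog'])

first_two = s.str[:2]       # first two characters; shorter strings return what exists
same = s.str.slice(0, 2)    # method form of the same slice
print(first_two.tolist())   # ['A', 'Aa', nan, 'do']
```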

Splitting¶

Sample Data

In [676]:

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

Split each string on a delimiter. The result is a Series whose elements are lists of the split substrings (missing values stay NaN).

In [678]:

sp = s.str.split('_')
sp

Out[678]:

0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object

Retrieving Split Result

Use .str.get() to retrieve an element from each split list (here -1, the last element).

In [681]:

sp.str.get(-1)

Out[681]:

0      c
1      e
2    NaN
3      h
dtype: object

Alternatively, use .str[] for the same result.

In [682]:

sp.str[-1]

Out[682]:

0      c
1      e
2    NaN
3      h
dtype: object

Split and Expand Into DataFrame¶

In [685]:

s.str.split('_', expand=True, n=5)  # n limits the number of splits (at most n+1 columns)

Out[685]:

     0    1    2
0    a    b    c
1    c    d    e
2  NaN  NaN  NaN
3    f    g    h
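Because n counts splits rather than columns, n=5 above has no visible effect on three-part strings. A sketch with n=1, where only the first delimiter is used and the remainder stays together in the last column:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# one split per string -> two columns; 'b_c' is kept whole
parts = s.str.split('_', n=1, expand=True)
print(parts)
```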

Series Substring Extraction¶

Sample Data

In [695]:

s = pd.Series(['a1', 'b2', 'c3'])
s

Out[695]:

0    a1
1    b2
2    c3
dtype: object

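Extract substrings based on regex capture groups. With more than one capture group the result is a DataFrame (one column per group), even with expand=False.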

In [696]:

type(s.str.extract(r'([ab])(\d)', expand=False))   # raw string keeps \d intact

Out[696]:

pandas.core.frame.DataFrame
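To see what the DataFrame contains: each capture group becomes a column, a string with no match (here 'c3') becomes a row of NaN, and named groups, written (?P<name>...), become the column names. The group names letter and digit are illustrative.

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# one column per capture group; 'c3' does not match [ab], so its row is NaN
groups = s.str.extract(r'([ab])(\d)')

# named groups set the column names
named = s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
print(named)
```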

Series DateTime Operator¶

Sample Data¶

In [701]:

import datetime as dt

s = pd.Series([
    dt.datetime(2000,1,1,0,0,0),
    dt.datetime(1999,12,15,12,34,55),
    dt.datetime(2020,3,8,5,7,12),
    dt.datetime(2018,1,1,0,0,0),
    dt.datetime(2003,3,4,5,6,7)
])
s

Out[701]:

0   2000-01-01 00:00:00
1   1999-12-15 12:34:55
2   2020-03-08 05:07:12
3   2018-01-01 00:00:00
4   2003-03-04 05:06:07
dtype: datetime64[ns]

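A datetime Series supports the following properties through the .dt accessor:

date
month
day
year
dayofweek (Monday=0, Sunday=6)
dayofyear
weekday (same as dayofweek)
weekday_name (removed in pandas 1.0; use the day_name() method instead)
quarter
daysinmonth
time
hour
minute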
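The year, day and dayofyear properties are not demonstrated in the cells that follow, so here is a small sketch on the first two sample dates, together with day_name(), the replacement for weekday_name in current pandas.

```python
import datetime as dt
import pandas as pd

s = pd.Series([
    dt.datetime(2000, 1, 1, 0, 0, 0),
    dt.datetime(1999, 12, 15, 12, 34, 55),
])

print(s.dt.year.tolist())        # [2000, 1999]
print(s.dt.day.tolist())         # [1, 15]
print(s.dt.dayofyear.tolist())   # [1, 349]
print(s.dt.day_name().tolist())  # ['Saturday', 'Wednesday']
```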

In [703]:

s.dt.date

Out[703]:

0    2000-01-01
1    1999-12-15
2    2020-03-08
3    2018-01-01
4    2003-03-04
dtype: object

In [711]:

s.dt.month

Out[711]:

0     1
1    12
2     3
3     1
4     3
dtype: int64

In [717]:

s.dt.dayofweek

Out[717]:

0    5
1    2
2    6
3    0
4    1
dtype: int64

In [718]:

s.dt.weekday

Out[718]:

0    5
1    2
2    6
3    0
4    1
dtype: int64

In [720]:

s.dt.day_name()   # weekday_name was removed in pandas 1.0; day_name() returns the same names

Out[720]:

0     Saturday
1    Wednesday
2       Sunday
3       Monday
4      Tuesday
dtype: object

In [721]:

s.dt.quarter

Out[721]:

0    1
1    4
2    1
3    1
4    1
dtype: int64

In [724]:

s.dt.daysinmonth

Out[724]:

0    31
1    31
2    31
3    31
4    31
dtype: int64

In [710]:

s.dt.time   # extract the time part as datetime.time objects

Out[710]:

0    00:00:00
1    12:34:55
2    05:07:12
3    00:00:00
4    05:06:07
dtype: object

In [708]:

s.dt.hour  # extract hour as integer

Out[708]:

0     0
1    12
2     5
3     0
4     5
dtype: int64

In [709]:

s.dt.minute # extract minute as integer

Out[709]:

0     0
1    34
2     7
3     0
4     6
dtype: int64
