Column Manipulation¶

Copy Column¶

In [498]:

mydf >> define(newcol = 'value1')                 # simple method for one column

Out[498]:

    comp dept grp     value1     value2     newcol
0     C2   D4  G1  50.265891  24.075876  50.265891
1     C2   D3  G2  44.306847  19.393717  44.306847
2     C2   D1  G2  56.247403  17.939938  56.247403
3     C3   D1  G2  41.746750  18.240598  41.746750
4     C1   D4  G2  50.915616  20.373281  50.915616
..   ...  ...  ..        ...        ...        ...
195   C3   D3  G1  50.185141  15.609130  50.185141
196   C2   D2  G1  54.906576  22.447863  54.906576
197   C1   D2  G1  53.081448  19.785567  53.081448
198   C3   D4  G2  44.667053  21.552565  44.667053
199   C3   D3  G1  59.081494  15.113429  59.081494

[200 rows x 6 columns]

In [507]:

mydf >> define (('newcol1', 'value1'), newcol2='value2')  # method for muiltiple new columns

Out[507]:

    comp dept grp     value1     value2    newcol1  \
0     C2   D4  G1  50.265891  24.075876  50.265891   
1     C2   D3  G2  44.306847  19.393717  44.306847   
2     C2   D1  G2  56.247403  17.939938  56.247403   
3     C3   D1  G2  41.746750  18.240598  41.746750   
4     C1   D4  G2  50.915616  20.373281  50.915616   
..   ...  ...  ..        ...        ...        ...   
195   C3   D3  G1  50.185141  15.609130  50.185141   
196   C2   D2  G1  54.906576  22.447863  54.906576   
197   C1   D2  G1  53.081448  19.785567  53.081448   
198   C3   D4  G2  44.667053  21.552565  44.667053   
199   C3   D3  G1  59.081494  15.113429  59.081494   

       newcol2  
0    24.075876  
1    19.393717  
2    17.939938  
3    18.240598  
4    20.373281  
..         ...  
195  15.609130  
196  22.447863  
197  19.785567  
198  21.552565  
199  15.113429  

[200 rows x 7 columns]

New Column from existing Column¶

Without specify the new column name, it will be derived from expression

In [474]:

mydf >> define ('value1*2')

Out[474]:

    comp dept grp     value1     value2    value1*2
0     C2   D4  G1  50.265891  24.075876  100.531782
1     C2   D3  G2  44.306847  19.393717   88.613694
2     C2   D1  G2  56.247403  17.939938  112.494805
3     C3   D1  G2  41.746750  18.240598   83.493501
4     C1   D4  G2  50.915616  20.373281  101.831232
..   ...  ...  ..        ...        ...         ...
195   C3   D3  G1  50.185141  15.609130  100.370282
196   C2   D2  G1  54.906576  22.447863  109.813152
197   C1   D2  G1  53.081448  19.785567  106.162897
198   C3   D4  G2  44.667053  21.552565   89.334105
199   C3   D3  G1  59.081494  15.113429  118.162988

[200 rows x 6 columns]

Specify the new column name

In [475]:

mydf >> define(value3 = 'value1*2')

Out[475]:

    comp dept grp     value1     value2      value3
0     C2   D4  G1  50.265891  24.075876  100.531782
1     C2   D3  G2  44.306847  19.393717   88.613694
2     C2   D1  G2  56.247403  17.939938  112.494805
3     C3   D1  G2  41.746750  18.240598   83.493501
4     C1   D4  G2  50.915616  20.373281  101.831232
..   ...  ...  ..        ...        ...         ...
195   C3   D3  G1  50.185141  15.609130  100.370282
196   C2   D2  G1  54.906576  22.447863  109.813152
197   C1   D2  G1  53.081448  19.785567  106.162897
198   C3   D4  G2  44.667053  21.552565   89.334105
199   C3   D3  G1  59.081494  15.113429  118.162988

[200 rows x 6 columns]

Define multiple new columns in one go. Observe there are three ways to specify the new columns

In [563]:

mydf >> define('value1*2',('newcol2','value2*2'),newcol3='value2*3')

Out[563]:

    comp dept grp    ...        value1*2    newcol2  \
0     C2   D4  G1    ...      100.531782  48.151753   
1     C2   D3  G2    ...       88.613694  38.787435   
2     C2   D1  G2    ...      112.494805  35.879876   
3     C3   D1  G2    ...       83.493501  36.481197   
4     C1   D4  G2    ...      101.831232  40.746562   
..   ...  ...  ..    ...             ...        ...   
195   C3   D3  G1    ...      100.370282  31.218259   
196   C2   D2  G1    ...      109.813152  44.895727   
197   C1   D2  G1    ...      106.162897  39.571135   
198   C3   D4  G2    ...       89.334105  43.105131   
199   C3   D3  G1    ...      118.162988  30.226857   

       newcol3  
0    72.227629  
1    58.181152  
2    53.819815  
3    54.721795  
4    61.119844  
..         ...  
195  46.827389  
196  67.343590  
197  59.356702  
198  64.657696  
199  45.340286  

[200 rows x 8 columns]

Select Column(s)¶

In [516]:

mydf2 = mydf >> define(newcol1='value1',newcol2='value2')
mydf2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
comp       200 non-null object
dept       200 non-null object
grp        200 non-null object
value1     200 non-null float64
value2     200 non-null float64
newcol1    200 non-null float64
newcol2    200 non-null float64
dtypes: float64(4), object(3)
memory usage: 11.0+ KB

By Column Names¶

Exact Coumn Name

In [513]:

mydf2 >> select ('comp','dept','value1')

Out[513]:

    comp dept     value1
0     C2   D4  50.265891
1     C2   D3  44.306847
2     C2   D1  56.247403
3     C3   D1  41.746750
4     C1   D4  50.915616
..   ...  ...        ...
195   C3   D3  50.185141
196   C2   D2  54.906576
197   C1   D2  53.081448
198   C3   D4  44.667053
199   C3   D3  59.081494

[200 rows x 3 columns]

Column Name Starts With ...

In [517]:

mydf2 >> select ('comp', startswith='val')

Out[517]:

    comp     value1     value2
0     C2  50.265891  24.075876
1     C2  44.306847  19.393717
2     C2  56.247403  17.939938
3     C3  41.746750  18.240598
4     C1  50.915616  20.373281
..   ...        ...        ...
195   C3  50.185141  15.609130
196   C2  54.906576  22.447863
197   C1  53.081448  19.785567
198   C3  44.667053  21.552565
199   C3  59.081494  15.113429

[200 rows x 3 columns]

Column Name Ends With ...

In [520]:

mydf2 >> select ('comp',endswith=('1','2','3'))

Out[520]:

    comp     value1     value2    newcol1    newcol2
0     C2  50.265891  24.075876  50.265891  24.075876
1     C2  44.306847  19.393717  44.306847  19.393717
2     C2  56.247403  17.939938  56.247403  17.939938
3     C3  41.746750  18.240598  41.746750  18.240598
4     C1  50.915616  20.373281  50.915616  20.373281
..   ...        ...        ...        ...        ...
195   C3  50.185141  15.609130  50.185141  15.609130
196   C2  54.906576  22.447863  54.906576  22.447863
197   C1  53.081448  19.785567  53.081448  19.785567
198   C3  44.667053  21.552565  44.667053  21.552565
199   C3  59.081494  15.113429  59.081494  15.113429

[200 rows x 5 columns]

Column Name Contains ...

In [519]:

mydf2 >> select('comp', contains=('col','val'))

Out[519]:

    comp     value1     value2    newcol1    newcol2
0     C2  50.265891  24.075876  50.265891  24.075876
1     C2  44.306847  19.393717  44.306847  19.393717
2     C2  56.247403  17.939938  56.247403  17.939938
3     C3  41.746750  18.240598  41.746750  18.240598
4     C1  50.915616  20.373281  50.915616  20.373281
..   ...        ...        ...        ...        ...
195   C3  50.185141  15.609130  50.185141  15.609130
196   C2  54.906576  22.447863  54.906576  22.447863
197   C1  53.081448  19.785567  53.081448  19.785567
198   C3  44.667053  21.552565  44.667053  21.552565
199   C3  59.081494  15.113429  59.081494  15.113429

[200 rows x 5 columns]

Specify Column Range¶

In [518]:

mydf2 >> select ('comp', slice('value1','newcol2'))

Out[518]:

    comp     value1     value2    newcol1    newcol2
0     C2  50.265891  24.075876  50.265891  24.075876
1     C2  44.306847  19.393717  44.306847  19.393717
2     C2  56.247403  17.939938  56.247403  17.939938
3     C3  41.746750  18.240598  41.746750  18.240598
4     C1  50.915616  20.373281  50.915616  20.373281
..   ...        ...        ...        ...        ...
195   C3  50.185141  15.609130  50.185141  15.609130
196   C2  54.906576  22.447863  54.906576  22.447863
197   C1  53.081448  19.785567  53.081448  19.785567
198   C3  44.667053  21.552565  44.667053  21.552565
199   C3  59.081494  15.113429  59.081494  15.113429

[200 rows x 5 columns]

Drop Column(s)¶

In [514]:

mydf2 >> select('newcol1','newcol2',drop=True)

Out[514]:

    comp dept grp     value1     value2
0     C2   D4  G1  50.265891  24.075876
1     C2   D3  G2  44.306847  19.393717
2     C2   D1  G2  56.247403  17.939938
3     C3   D1  G2  41.746750  18.240598
4     C1   D4  G2  50.915616  20.373281
..   ...  ...  ..        ...        ...
195   C3   D3  G1  50.185141  15.609130
196   C2   D2  G1  54.906576  22.447863
197   C1   D2  G1  53.081448  19.785567
198   C3   D4  G2  44.667053  21.552565
199   C3   D3  G1  59.081494  15.113429

[200 rows x 5 columns]

Column Manipulation¶