Column Manipulation¶
Copy Column¶
In [498]:
mydf >> define(newcol = 'value1') # simple method for one column
Out[498]:
comp dept grp value1 value2 newcol 0 C2 D4 G1 50.265891 24.075876 50.265891 1 C2 D3 G2 44.306847 19.393717 44.306847 2 C2 D1 G2 56.247403 17.939938 56.247403 3 C3 D1 G2 41.746750 18.240598 41.746750 4 C1 D4 G2 50.915616 20.373281 50.915616 .. ... ... .. ... ... ... 195 C3 D3 G1 50.185141 15.609130 50.185141 196 C2 D2 G1 54.906576 22.447863 54.906576 197 C1 D2 G1 53.081448 19.785567 53.081448 198 C3 D4 G2 44.667053 21.552565 44.667053 199 C3 D3 G1 59.081494 15.113429 59.081494 [200 rows x 6 columns]
In [507]:
mydf >> define (('newcol1', 'value1'), newcol2='value2') # method for muiltiple new columns
Out[507]:
comp dept grp value1 value2 newcol1 \
0 C2 D4 G1 50.265891 24.075876 50.265891
1 C2 D3 G2 44.306847 19.393717 44.306847
2 C2 D1 G2 56.247403 17.939938 56.247403
3 C3 D1 G2 41.746750 18.240598 41.746750
4 C1 D4 G2 50.915616 20.373281 50.915616
.. ... ... .. ... ... ...
195 C3 D3 G1 50.185141 15.609130 50.185141
196 C2 D2 G1 54.906576 22.447863 54.906576
197 C1 D2 G1 53.081448 19.785567 53.081448
198 C3 D4 G2 44.667053 21.552565 44.667053
199 C3 D3 G1 59.081494 15.113429 59.081494
newcol2
0 24.075876
1 19.393717
2 17.939938
3 18.240598
4 20.373281
.. ...
195 15.609130
196 22.447863
197 19.785567
198 21.552565
199 15.113429
[200 rows x 7 columns]
New Column from existing Column¶
Without specify the new column name, it will be derived from expression
In [474]:
mydf >> define ('value1*2')
Out[474]:
comp dept grp value1 value2 value1*2 0 C2 D4 G1 50.265891 24.075876 100.531782 1 C2 D3 G2 44.306847 19.393717 88.613694 2 C2 D1 G2 56.247403 17.939938 112.494805 3 C3 D1 G2 41.746750 18.240598 83.493501 4 C1 D4 G2 50.915616 20.373281 101.831232 .. ... ... .. ... ... ... 195 C3 D3 G1 50.185141 15.609130 100.370282 196 C2 D2 G1 54.906576 22.447863 109.813152 197 C1 D2 G1 53.081448 19.785567 106.162897 198 C3 D4 G2 44.667053 21.552565 89.334105 199 C3 D3 G1 59.081494 15.113429 118.162988 [200 rows x 6 columns]
Specify the new column name
In [475]:
mydf >> define(value3 = 'value1*2')
Out[475]:
comp dept grp value1 value2 value3 0 C2 D4 G1 50.265891 24.075876 100.531782 1 C2 D3 G2 44.306847 19.393717 88.613694 2 C2 D1 G2 56.247403 17.939938 112.494805 3 C3 D1 G2 41.746750 18.240598 83.493501 4 C1 D4 G2 50.915616 20.373281 101.831232 .. ... ... .. ... ... ... 195 C3 D3 G1 50.185141 15.609130 100.370282 196 C2 D2 G1 54.906576 22.447863 109.813152 197 C1 D2 G1 53.081448 19.785567 106.162897 198 C3 D4 G2 44.667053 21.552565 89.334105 199 C3 D3 G1 59.081494 15.113429 118.162988 [200 rows x 6 columns]
Define multiple new columns in one go. Observe there are three ways to specify the new columns
In [563]:
mydf >> define('value1*2',('newcol2','value2*2'),newcol3='value2*3')
Out[563]:
comp dept grp ... value1*2 newcol2 \
0 C2 D4 G1 ... 100.531782 48.151753
1 C2 D3 G2 ... 88.613694 38.787435
2 C2 D1 G2 ... 112.494805 35.879876
3 C3 D1 G2 ... 83.493501 36.481197
4 C1 D4 G2 ... 101.831232 40.746562
.. ... ... .. ... ... ...
195 C3 D3 G1 ... 100.370282 31.218259
196 C2 D2 G1 ... 109.813152 44.895727
197 C1 D2 G1 ... 106.162897 39.571135
198 C3 D4 G2 ... 89.334105 43.105131
199 C3 D3 G1 ... 118.162988 30.226857
newcol3
0 72.227629
1 58.181152
2 53.819815
3 54.721795
4 61.119844
.. ...
195 46.827389
196 67.343590
197 59.356702
198 64.657696
199 45.340286
[200 rows x 8 columns]
Select Column(s)¶
In [516]:
mydf2 = mydf >> define(newcol1='value1',newcol2='value2') mydf2.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 200 entries, 0 to 199 Data columns (total 7 columns): comp 200 non-null object dept 200 non-null object grp 200 non-null object value1 200 non-null float64 value2 200 non-null float64 newcol1 200 non-null float64 newcol2 200 non-null float64 dtypes: float64(4), object(3) memory usage: 11.0+ KB
By Column Names¶
Exact Coumn Name
In [513]:
mydf2 >> select ('comp','dept','value1')
Out[513]:
comp dept value1 0 C2 D4 50.265891 1 C2 D3 44.306847 2 C2 D1 56.247403 3 C3 D1 41.746750 4 C1 D4 50.915616 .. ... ... ... 195 C3 D3 50.185141 196 C2 D2 54.906576 197 C1 D2 53.081448 198 C3 D4 44.667053 199 C3 D3 59.081494 [200 rows x 3 columns]
Column Name Starts With ...
In [517]:
mydf2 >> select ('comp', startswith='val')
Out[517]:
comp value1 value2 0 C2 50.265891 24.075876 1 C2 44.306847 19.393717 2 C2 56.247403 17.939938 3 C3 41.746750 18.240598 4 C1 50.915616 20.373281 .. ... ... ... 195 C3 50.185141 15.609130 196 C2 54.906576 22.447863 197 C1 53.081448 19.785567 198 C3 44.667053 21.552565 199 C3 59.081494 15.113429 [200 rows x 3 columns]
Column Name Ends With ...
In [520]:
mydf2 >> select ('comp',endswith=('1','2','3'))
Out[520]:
comp value1 value2 newcol1 newcol2 0 C2 50.265891 24.075876 50.265891 24.075876 1 C2 44.306847 19.393717 44.306847 19.393717 2 C2 56.247403 17.939938 56.247403 17.939938 3 C3 41.746750 18.240598 41.746750 18.240598 4 C1 50.915616 20.373281 50.915616 20.373281 .. ... ... ... ... ... 195 C3 50.185141 15.609130 50.185141 15.609130 196 C2 54.906576 22.447863 54.906576 22.447863 197 C1 53.081448 19.785567 53.081448 19.785567 198 C3 44.667053 21.552565 44.667053 21.552565 199 C3 59.081494 15.113429 59.081494 15.113429 [200 rows x 5 columns]
Column Name Contains ...
In [519]:
mydf2 >> select('comp', contains=('col','val'))
Out[519]:
comp value1 value2 newcol1 newcol2 0 C2 50.265891 24.075876 50.265891 24.075876 1 C2 44.306847 19.393717 44.306847 19.393717 2 C2 56.247403 17.939938 56.247403 17.939938 3 C3 41.746750 18.240598 41.746750 18.240598 4 C1 50.915616 20.373281 50.915616 20.373281 .. ... ... ... ... ... 195 C3 50.185141 15.609130 50.185141 15.609130 196 C2 54.906576 22.447863 54.906576 22.447863 197 C1 53.081448 19.785567 53.081448 19.785567 198 C3 44.667053 21.552565 44.667053 21.552565 199 C3 59.081494 15.113429 59.081494 15.113429 [200 rows x 5 columns]
Specify Column Range¶
In [518]:
mydf2 >> select ('comp', slice('value1','newcol2'))
Out[518]:
comp value1 value2 newcol1 newcol2 0 C2 50.265891 24.075876 50.265891 24.075876 1 C2 44.306847 19.393717 44.306847 19.393717 2 C2 56.247403 17.939938 56.247403 17.939938 3 C3 41.746750 18.240598 41.746750 18.240598 4 C1 50.915616 20.373281 50.915616 20.373281 .. ... ... ... ... ... 195 C3 50.185141 15.609130 50.185141 15.609130 196 C2 54.906576 22.447863 54.906576 22.447863 197 C1 53.081448 19.785567 53.081448 19.785567 198 C3 44.667053 21.552565 44.667053 21.552565 199 C3 59.081494 15.113429 59.081494 15.113429 [200 rows x 5 columns]
Drop Column(s)¶
In [514]:
mydf2 >> select('newcol1','newcol2',drop=True)
Out[514]:
comp dept grp value1 value2 0 C2 D4 G1 50.265891 24.075876 1 C2 D3 G2 44.306847 19.393717 2 C2 D1 G2 56.247403 17.939938 3 C3 D1 G2 41.746750 18.240598 4 C1 D4 G2 50.915616 20.373281 .. ... ... .. ... ... 195 C3 D3 G1 50.185141 15.609130 196 C2 D2 G1 54.906576 22.447863 197 C1 D2 G1 53.081448 19.785567 198 C3 D4 G2 44.667053 21.552565 199 C3 D3 G1 59.081494 15.113429 [200 rows x 5 columns]