Summarization
Simple Method
Passing Multiple Expressions
In [605]:
gdf >> summarize('n()','sum(value1)','mean(value2)')
Out[605]:
    comp  dept  n()  sum(value1)  mean(value2)
0     C2    D4   17   822.616781     21.727354
1     C2    D3    8   382.508590     19.204159
2     C2    D1   15   792.951922     19.384431
3     C3    D1   16   781.894494     19.218751
4     C1    D4   14   692.099862     21.086066
..   ...   ...  ...          ...           ...
10    C1    D2    8   427.151466     19.759528
11    C1    D5   13   649.086524     19.445141
12    C3    D3   16   799.665401     19.711510
13    C3    D2    9   447.793311     18.818598
14    C2    D2   11   573.303485     19.999535

[15 rows x 5 columns]
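To make the grouped result above easier to read, here is a minimal plain-Python sketch of what a grouped summarize computes: one output row per (comp, dept) group, with one value per expression. The toy rows and the names `rows`, `groups` and `summary` are invented for illustration; this is not how the library implements `summarize`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows (comp, dept, value1, value2); not the notebook's data.
rows = [
    ("C1", "D1", 10.0, 1.0),
    ("C1", "D1", 20.0, 3.0),
    ("C2", "D1", 5.0, 4.0),
]

# Collect the value pairs of each (comp, dept) group.
groups = defaultdict(list)
for comp, dept, v1, v2 in rows:
    groups[(comp, dept)].append((v1, v2))

# One summary entry per group, keyed by the expression text,
# mirroring the column names in the output above.
summary = {
    key: {
        "n()": len(members),
        "sum(value1)": sum(v1 for v1, _ in members),
        "mean(value2)": mean(v2 for _, v2 in members),
    }
    for key, members in groups.items()
}

for key, stats in summary.items():
    print(key, stats)
```

When no name is given, the expression string itself becomes the column name, which is why the output columns are labelled `n()`, `sum(value1)` and `mean(value2)`.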
Specify Summarized Column Name
Assignment Method
- Pass colName='expression' as a keyword argument
- The column name must be a valid Python identifier, so it cannot contain special characters such as '.'
In [621]:
gdf >> summarize(count='n()',v1sum='sum(value1)',v2_mean='mean(value2)')
Out[621]:
    comp  dept  count       v1sum    v2_mean
0     C2    D4     17  822.616781  21.727354
1     C2    D3      8  382.508590  19.204159
2     C2    D1     15  792.951922  19.384431
3     C3    D1     16  781.894494  19.218751
4     C1    D4     14  692.099862  21.086066
..   ...   ...    ...         ...        ...
10    C1    D2      8  427.151466  19.759528
11    C1    D5     13  649.086524  19.445141
12    C3    D3     16  799.665401  19.711510
13    C3    D2      9  447.793311  18.818598
14    C2    D2     11  573.303485  19.999535

[15 rows x 5 columns]
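The restriction on column names in the assignment method comes from Python itself: a keyword-argument name has to be a valid identifier. The short check below is a sketch using only the standard library; the helper `usable_as_keyword` is a hypothetical name introduced here, not part of any DataFrame library.

```python
import keyword

def usable_as_keyword(name: str) -> bool:
    """Return True if `name` can be written as a keyword-argument name in a call."""
    return name.isidentifier() and not keyword.iskeyword(name)

# Names in the style used in this section, plus two dotted names.
names = ["count", "v1sum", "v2_mean", "v1.sum", "s2.sum"]
validity = {name: usable_as_keyword(name) for name in names}

for name, ok in validity.items():
    print(f"{name!r}: {'ok as keyword' if ok else 'needs another way to pass the name'}")
```

Names containing a dot, a space or a leading digit fail this check, so they cannot be given in the `colName='expression'` form.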
Tuple Method
- Pass ('colName','expression') as a tuple
- Use when the column name contains special characters
In [623]:
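# np.mean(value2) is not a string, so Python evaluates it once before summarize() runs;
# v2mean therefore has the same value in every group
gdf >> summarize(('count','n()'),('v1.sum','sum(value1)'),('s2.sum','sum(value2)'),v2mean=np.mean(value2))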
Out[623]:
    comp  dept  count      v1.sum      s2.sum     v2mean
0     C2    D4     17  822.616781  369.365011  20.102874
1     C2    D3      8  382.508590  153.633271  20.102874
2     C2    D1     15  792.951922  290.766469  20.102874
3     C3    D1     16  781.894494  307.500019  20.102874
4     C1    D4     14  692.099862  295.204927  20.102874
..   ...   ...    ...         ...         ...        ...
10    C1    D2      8  427.151466  158.076226  20.102874
11    C1    D5     13  649.086524  252.786832  20.102874
12    C3    D3     16  799.665401  315.384162  20.102874
13    C3    D2      9  447.793311  169.367385  20.102874
14    C2    D2     11  573.303485  219.994881  20.102874

[15 rows x 6 columns]
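In the output above, `v2mean` is identical in every group, while the string expressions vary by group. That is consistent with ordinary Python evaluation order: an argument written as a call is computed once, before the function receives it. The sketch below illustrates the difference with a toy stand-in; `summarize_sketch` and its data are hypothetical and only mimic the deferred-versus-eager distinction, not the library's internals.

```python
from statistics import mean

# Hypothetical per-group values; not the notebook's data.
groups = {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0]}
all_values = [v for vals in groups.values() for v in vals]

def summarize_sketch(groups, agg):
    """Toy stand-in: a callable is applied per group, any other value is reused as-is."""
    return {name: agg(vals) if callable(agg) else agg for name, vals in groups.items()}

# Deferred: the aggregation runs inside each group.
per_group = summarize_sketch(groups, mean)

# Eager: mean(all_values) is a single number by the time the function is called.
constant = summarize_sketch(groups, mean(all_values))

print(per_group)
print(constant)
```

To get a per-group mean in the notebook's style, the expression would be passed as a string, as in the earlier `'mean(value2)'` examples.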
Number of Rows in Group
- n() : number of rows in the group
- n_unique() : number of unique values of the given column in the group
In [626]:
gdf >> summarize(count='n()', va11_unique='n_unique(value1)')
Out[626]:
    comp  dept  count  va11_unique
0     C2    D4     17           17
1     C2    D3      8            8
2     C2    D1     15           15
3     C3    D1     16           16
4     C1    D4     14           14
..   ...   ...    ...          ...
10    C1    D2      8            8
11    C1    D5     13           13
12    C3    D3     16           16
13    C3    D2      9            9
14    C2    D2     11           11

[15 rows x 4 columns]
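In the output above the two counts are equal in every group because no value1 repeats within a group. The following plain-Python sketch, on invented data with a repeated value, shows how the two counts diverge; the dictionary names are hypothetical and the code only illustrates the counting idea.

```python
# Hypothetical group values with one repeated entry; not the notebook's data.
groups = {
    ("C1", "D1"): [3.5, 3.5, 7.0],
    ("C2", "D1"): [1.0, 2.0],
}

counts = {
    key: {
        "count": len(vals),        # analogous to n(): every row counts
        "unique": len(set(vals)),  # analogous to n_unique(): duplicates collapse
    }
    for key, vals in groups.items()
}

for key, c in counts.items():
    print(key, c)
```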