Summarization

Simple Method

Passing Multiple Expressions

In [605]:

gdf >> summarize('n()','sum(value1)','mean(value2)')

Out[605]:

   comp dept  n()  sum(value1)  mean(value2)
0    C2   D4   17   822.616781     21.727354
1    C2   D3    8   382.508590     19.204159
2    C2   D1   15   792.951922     19.384431
3    C3   D1   16   781.894494     19.218751
4    C1   D4   14   692.099862     21.086066
..  ...  ...  ...          ...           ...
10   C1   D2    8   427.151466     19.759528
11   C1   D5   13   649.086524     19.445141
12   C3   D3   16   799.665401     19.711510
13   C3   D2    9   447.793311     18.818598
14   C2   D2   11   573.303485     19.999535

[15 rows x 5 columns]

Specify Summarized Column Name

Assignment Method

  • Pass colName='expression'
  • The column name must be a valid Python identifier, so it cannot contain special characters such as '.'

In [621]:

gdf >> summarize(count='n()',v1sum='sum(value1)',v2_mean='mean(value2)')

Out[621]:

   comp dept  count       v1sum    v2_mean
0    C2   D4     17  822.616781  21.727354
1    C2   D3      8  382.508590  19.204159
2    C2   D1     15  792.951922  19.384431
3    C3   D1     16  781.894494  19.218751
4    C1   D4     14  692.099862  21.086066
..  ...  ...    ...         ...        ...
10   C1   D2      8  427.151466  19.759528
11   C1   D5     13  649.086524  19.445141
12   C3   D3     16  799.665401  19.711510
13   C3   D2      9  447.793311  18.818598
14   C2   D2     11  573.303485  19.999535

[15 rows x 5 columns]
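The restriction on special characters comes from Python itself: with the assignment method the column name is written as a keyword-argument name, and that must be a valid identifier. The snippet below is a minimal sketch using only the standard library; `can_be_keyword` is a hypothetical helper introduced for illustration, not part of plydata.

```python
import keyword


def can_be_keyword(name: str) -> bool:
    """Return True if `name` can be written as a keyword-argument name."""
    # Must be a valid identifier and must not be a reserved word such as 'class'.
    return name.isidentifier() and not keyword.iskeyword(name)


# Names taken from the examples in this section.
for name in ["count", "v1sum", "v2_mean", "v1.sum", "s2.sum"]:
    print(f"{name!r}: {can_be_keyword(name)}")
```

Names for which this returns False, such as 'v1.sum', cannot be used with the assignment method.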

Tuple Method

  • Pass ('colName','expression')
  • Use when the column name contains special characters
  • Tuples and colName='expression' assignments can be mixed in one call

In [623]:

gdf >> summarize(('count','n()'),('v1.sum','sum(value1)'),('s2.sum','sum(value2)'),v2mean=np.mean(value2))

Out[623]:

   comp dept  count      v1.sum      s2.sum     v2mean
0    C2   D4     17  822.616781  369.365011  20.102874
1    C2   D3      8  382.508590  153.633271  20.102874
2    C2   D1     15  792.951922  290.766469  20.102874
3    C3   D1     16  781.894494  307.500019  20.102874
4    C1   D4     14  692.099862  295.204927  20.102874
..  ...  ...    ...         ...         ...        ...
10   C1   D2      8  427.151466  158.076226  20.102874
11   C1   D5     13  649.086524  252.786832  20.102874
12   C3   D3     16  799.665401  315.384162  20.102874
13   C3   D2      9  447.793311  169.367385  20.102874
14   C2   D2     11  573.303485  219.994881  20.102874

[15 rows x 6 columns]
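Note that v2mean is 20.102874 in every row of Out[623]. The likely reason is that np.mean(value2) is not quoted: Python evaluates it once, before summarize runs, so a single number is repeated for every group (this assumes value2 is a variable already defined in the session). The quoted form 'mean(value2)' used in In [605] is evaluated per group. The plain-Python sketch below illustrates the difference on hypothetical rows; it does not use plydata, and the data and variable names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (comp, dept, value2)
rows = [
    ("C1", "D1", 10.0),
    ("C1", "D1", 20.0),
    ("C2", "D1", 30.0),
    ("C2", "D1", 50.0),
]

# Evaluated once over all rows, before any grouping:
# analogous to passing the unquoted np.mean(value2).
overall_mean = mean(v for _, _, v in rows)

# Collect the values of each (comp, dept) group.
groups = defaultdict(list)
for comp, dept, v in rows:
    groups[(comp, dept)].append(v)

# Evaluated separately for each group:
# analogous to passing the quoted expression 'mean(value2)'.
per_group_mean = {key: mean(vals) for key, vals in groups.items()}

for key in groups:
    print(key, "overall:", overall_mean, "per group:", per_group_mean[key])
```

The overall value is identical for every group, while the per-group value changes with the group.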

Number of Rows in Group

  • n() : number of rows in the group
  • n_unique(col) : number of distinct values of col in the group

In [626]:

gdf >> summarize(count='n()', val1_unique='n_unique(value1)')

Out[626]:

   comp dept  count  val1_unique
0    C2   D4     17           17
1    C2   D3      8            8
2    C2   D1     15           15
3    C3   D1     16           16
4    C1   D4     14           14
..  ...  ...    ...          ...
10   C1   D2      8            8
11   C1   D5     13           13
12   C3   D3     16           16
13   C3   D2      9            9
14   C2   D2     11           11

[15 rows x 4 columns]
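In the output above the two counts are equal, presumably because value1 has no repeated values within any group. The plain-Python sketch below shows how the two counts differ once a group contains duplicates. It is an illustration of the idea on hypothetical rows, not plydata's implementation.

```python
from collections import defaultdict

# Hypothetical rows: (comp, dept, value1); the first group repeats 5.0.
rows = [
    ("C1", "D1", 5.0),
    ("C1", "D1", 5.0),
    ("C1", "D1", 7.5),
    ("C2", "D1", 1.0),
]

# Collect the values of each (comp, dept) group.
groups = defaultdict(list)
for comp, dept, value1 in rows:
    groups[(comp, dept)].append(value1)

# "n" counts every row; "n_unique" counts distinct values only.
summary = {
    key: {"n": len(vals), "n_unique": len(set(vals))}
    for key, vals in groups.items()
}

for key, counts in summary.items():
    print(key, counts)
```

Here group ('C1', 'D1') has three rows but only two distinct values.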
