Frequency Table (crosstab)

crosstab returns Dataframe Object

crosstab( index = <SeriesObj>, columns = <colName> )                    # one dimension table
crosstab( index = <SeriesObj>, columns = <SeriesObj> )                  # two dimension table
crosstab( index = <SeriesObj>, columns = [<SeriesObj1>, <SeriesObj2>] ) # multi dimension table   
crosstab( index = <SeriesObj>, columns = <SeriesObj>, margines=True )   # add column and row margins

Sample Data

In [ ]:

n = 200
comp = ['C' + i for i in np.random.randint( 1,4, size  = n).astype(str)] # 3x Company
dept = ['D' + i for i in np.random.randint( 1,6, size  = n).astype(str)] # 5x Department
grp =  ['G' + i for i in np.random.randint( 1,3, size  = n).astype(str)] # 2x Groups
value1 = np.random.normal( loc=50 , scale=5 , size = n)
value2 = np.random.normal( loc=20 , scale=3 , size = n)
value3 = np.random.normal( loc=5 , scale=30 , size = n)

mydf = pd.DataFrame({
    'comp':comp, 
    'dept':dept, 
    'grp': grp,
    'value1':value1, 
    'value2':value2,
    'value3':value3 })
mydf.head()

One DimensionTable

In [391]:

pd.crosstab(index=mydf.comp, columns='counter')

Out[391]:

col_0  counter
comp          
C1          64
C2          69
C3          67

In [419]:

type(pd.crosstab(index=mydf.comp, columns='counter'))

Out[419]:

pandas.core.frame.DataFrame

Two Dimension Table

In [404]:

pd.crosstab(index=mydf.comp, columns=mydf.dept)

Out[404]:

dept  D1  D2  D3  D4  D5
comp                    
C1    15   7  16  12  14
C2    16  13  16  17   7
C3    10  11  15  16  15

Higher Dimension Table

In [424]:

tb = pd.crosstab(index=mydf.comp, columns=[mydf.dept, mydf.grp])
tb

Out[424]:

dept  D1    D2 ... D4 D5   
grp   G1 G2 G1 ... G2 G1 G2
comp           ...         
C1     7  8  4 ...  5  6  8
C2    10  6  7 ...  9  5  2
C3     5  5  5 ...  7  9  6

[3 rows x 10 columns]

Get the subtable under D2

In [400]:

tb['D2']

Out[400]:

grp   G1  G2
comp        
C1     4   3
C2     7   6
C3     5   6
All   16  15

Getting Margin

New column and row labeled 'All' will be created

In [428]:

tb = pd.crosstab(index=mydf.dept, columns=mydf.grp, margins=True)
tb

Out[428]:

grp    G1  G2  All
dept              
D1     22  19   41
D2     16  15   31
D3     27  20   47
D4     24  21   45
D5     20  16   36
All   109  91  200

In [429]:

tb['All']   # row total, return a Series

Out[429]:

dept
D1      41
D2      31
D3      47
D4      45
D5      36
All    200
Name: All, dtype: int64

In [430]:

tb.loc['All'] # column total, return a Series

Out[430]:

grp
G1     109
G2      91
All    200
Name: All, dtype: int64

Getting Proportion

Use matrix operation divide for each cells over the margin

In [431]:

tb/tb.loc['All']

Out[431]:

grp         G1        G2    All
dept                           
D1    0.201835  0.208791  0.205
D2    0.146789  0.164835  0.155
D3    0.247706  0.219780  0.235
D4    0.220183  0.230769  0.225
D5    0.183486  0.175824  0.180
All   1.000000  1.000000  1.000

Reseting Index

  • When creating a crosstab, column specified by index will become index
  • To convert it to normal column, use reset_index()

    DataFrameObj.reset_index( inpalce=False )
    

In [656]:

tb.reset_index()

Out[656]:

grp dept   G1  G2  All
0     D1   22  19   41
1     D2   16  15   31
2     D3   27  20   47
3     D4   24  21   45
4     D5   20  16   36
5    All  109  91  200

results matching ""

    No results matching ""