Frequency Table (crosstab)¶
crosstab returns Dataframe Object
crosstab( index = <SeriesObj>, columns = <colName> ) # one dimension table
crosstab( index = <SeriesObj>, columns = <SeriesObj> ) # two dimension table
crosstab( index = <SeriesObj>, columns = [<SeriesObj1>, <SeriesObj2>] ) # multi dimension table
crosstab( index = <SeriesObj>, columns = <SeriesObj>, margines=True ) # add column and row margins
Sample Data¶
In [ ]:
n = 200
comp = ['C' + i for i in np.random.randint( 1,4, size = n).astype(str)] # 3x Company
dept = ['D' + i for i in np.random.randint( 1,6, size = n).astype(str)] # 5x Department
grp = ['G' + i for i in np.random.randint( 1,3, size = n).astype(str)] # 2x Groups
value1 = np.random.normal( loc=50 , scale=5 , size = n)
value2 = np.random.normal( loc=20 , scale=3 , size = n)
value3 = np.random.normal( loc=5 , scale=30 , size = n)
mydf = pd.DataFrame({
'comp':comp,
'dept':dept,
'grp': grp,
'value1':value1,
'value2':value2,
'value3':value3 })
mydf.head()
One DimensionTable¶
In [391]:
pd.crosstab(index=mydf.comp, columns='counter')
Out[391]:
col_0 counter comp C1 64 C2 69 C3 67
In [419]:
type(pd.crosstab(index=mydf.comp, columns='counter'))
Out[419]:
pandas.core.frame.DataFrame
Two Dimension Table¶
In [404]:
pd.crosstab(index=mydf.comp, columns=mydf.dept)
Out[404]:
dept D1 D2 D3 D4 D5 comp C1 15 7 16 12 14 C2 16 13 16 17 7 C3 10 11 15 16 15
Higher Dimension Table¶
In [424]:
tb = pd.crosstab(index=mydf.comp, columns=[mydf.dept, mydf.grp]) tb
Out[424]:
dept D1 D2 ... D4 D5 grp G1 G2 G1 ... G2 G1 G2 comp ... C1 7 8 4 ... 5 6 8 C2 10 6 7 ... 9 5 2 C3 5 5 5 ... 7 9 6 [3 rows x 10 columns]
Get the subtable under D2
In [400]:
tb['D2']
Out[400]:
grp G1 G2 comp C1 4 3 C2 7 6 C3 5 6 All 16 15
Getting Margin¶
New column and row labeled 'All' will be created
In [428]:
tb = pd.crosstab(index=mydf.dept, columns=mydf.grp, margins=True) tb
Out[428]:
grp G1 G2 All dept D1 22 19 41 D2 16 15 31 D3 27 20 47 D4 24 21 45 D5 20 16 36 All 109 91 200
In [429]:
tb['All'] # row total, return a Series
Out[429]:
dept D1 41 D2 31 D3 47 D4 45 D5 36 All 200 Name: All, dtype: int64
In [430]:
tb.loc['All'] # column total, return a Series
Out[430]:
grp G1 109 G2 91 All 200 Name: All, dtype: int64
Getting Proportion¶
Use matrix operation divide for each cells over the margin
In [431]:
tb/tb.loc['All']
Out[431]:
grp G1 G2 All dept D1 0.201835 0.208791 0.205 D2 0.146789 0.164835 0.155 D3 0.247706 0.219780 0.235 D4 0.220183 0.230769 0.225 D5 0.183486 0.175824 0.180 All 1.000000 1.000000 1.000
Reseting Index¶
- When creating a crosstab, column specified by index will become index
To convert it to normal column, use reset_index()
DataFrameObj.reset_index( inpalce=False )
In [656]:
tb.reset_index()
Out[656]:
grp dept G1 G2 All 0 D1 22 19 41 1 D2 16 15 31 2 D3 27 20 47 3 D4 24 21 45 4 D5 20 16 36 5 All 109 91 200