Implement the chi-squared test
You can add the chi-squared test to your visualization to highlight whether there is a significant association between two categorical variables.
Note: The chi-squared test can be used with a maximum of two values of a
pivoted field, such as Wave 1 and Wave 2. Use filters to exclude any additional
values included in your data (e.g. Wave 3).
-
In the Field Picker, select the Count measure to include it in the
Data panel.
The following step creates a table calculation for use in subsequent table calculations. After you create this table calculation you can hide the fields for these measures in the Data panel.
-
Create a table calculation for
Count:
- In the Field Picker, select Add > Table Calculation.
-
Enter
${ and then select the
Count measure from the drop-down list.
- In the Name text box, enter Count as the field name.
- Click Save.
-
Create a table calculation for
CountA:
- In the Field Picker, select Add > Table Calculation.
-
Copy the appropriate table calculation template into the
Expression text box.
# Convert NULLS to 0s to enable proper calculation if(is_null(${sample1_count}),0,${sample1_count})
-
Replace all instances of
${sample1_count}
with the${count}
table calculation. -
In the
Name text box, enter the field name.
For example: CountA
- Click Save.
-
Create a table calculation for
Column total count:
- In the Field Picker, select Add > Table Calculation.
-
Copy the appropriate table calculation template into the
Expression text box.
sum(${sample1_count})
-
Replace
${sample1_count}
with the${counta}
table calculation. -
In the
Name text box, enter the field name.
For example: Column Total Count
- Click Save.
-
Create a table calculation for
Proportion%:
- In the Field Picker, select Add > Table Calculation.
-
Copy the appropriate table calculation template into the
Expression text box.
${counta}/${column_total_count}
-
In the
Name text box, enter the field name.
For example: Proportion %
- Click Save.
-
Create a table calculation for
Period over period change (PoP%):
- In the Field Picker, select Add > Table Calculation.
-
Copy the table calculation template into the
Expression text box.
if(${period/sample_group}=1 # 1 is equal to the period number # This can alternatively be equal to a sample_group name # ----- Begin second pivot section ---------------------------------- OR ${second_pivot}!=pivot_offset(${second_pivot}, -1) # what the previous line of code is doing is that given the current column # like "Canada", return true if the column before's value is not "Canada" # ----- End second pivot section ---------------------------------- ,null, ((${proportion}-pivot_offset(${proportion}, -1)))) # the -1 arguments above are taking the previous value
-
Replace
${period/sample_group}
in the first line with the period, sample group or wave dimension in your data, and if you are using a second pivot, replace${second_pivot}
with the second field you are pivoting on. - In the Format drop-down list, select Percent.
- In the Decimals text box, enter 0.
-
In the
Name text box, enter the field name.
For example: PoP%
- Click Save.
-
Create a custom row total table calculation named
Row total count:
- In the Field Picker, select Add > Table Calculation.
-
Copy the appropriate table calculation template into the
Expression text box.
if(${period/sample_group}="1" # 1 is equal to the period number # This can alternatively be equal to a sample_group name # ----- Begin second pivot section ---------------------------------- # Remove this section if you aren't including a second pivoted field OR ${second_pivot}!=pivot_offset(${second_pivot}, -1) # what the previous line of code is doing is that given the current column # like "Canada", return true if the column before's value is not "Canada" # ----- End second pivot section ---------------------------------- ,null, ${sample1_count}+if(is_null(pivot_offset(${sample2_count},-1)),0,pivot_offset(${sample2_count},-1))) # the -1 arguments above are taking the previous value
-
Replace
${period/sample_group}
in the first line with the period, sample group or wave dimension in your data, and replace${second_pivot}
with the second field you are pivoting on. -
Replace all instances of
${sample1_count}
and${sample2_count}
with the${count}
table calculation. -
In the
Name text box, enter the field name.
For example: Row Total Count
- Click Save.
-
Create a
Max number of unique row values per 2 quarters table
calculation:
- In the Field Picker, select Add > Table Calculation.
-
Copy this table calculation template into the
Expression text box:
if(${period/sample_group}="1" # 1 is equal to the period number # This can alternatively be equal to a sample_group name # ----- Begin second pivot section ---------------------------------- # Remove this section if you aren't including a second pivoted field OR ${second_pivot}!=pivot_offset(${second_pivot}, -1) # what the previous line of code is doing is that given the current column # like "Canada", return true if the column before's value is not "Canada" # ----- End second pivot section ---------------------------------- ,null, max(list(sum(if(${counta}>=1,1,0)),sum(if(pivot_offset(${counta},-1)>=1,1,0)))))
-
Replace
${period/sample_group}
in the first line with the period, sample group or wave dimension in your data, and replace${second_pivot}
with the second field you are pivoting on. -
In the
Name text box, enter the field name.
For example: Max # of Unique Row Values per 2 Quarters
- Click Save.
-
Create a
Degrees of freedom table calculation:
- In the Field Picker, select Add > Table Calculation.
-
Copy this table calculation template into the
Expression text box:
# Assuming only comparing max 2 waves at a time # Thus, the degrees of freedom will always be # (2-1) '(# of waves -1)' x (n-1) '# of choices -1' (2-1)*(${max_number_of_unique_row_values_per_2_quarters}-1)
-
In the
Name text box, enter the field name.
For example: Degrees of Freedom
- Click Save.
-
Create an
Expected frequency table calculation:
- In the Field Picker, select Add > Table Calculation.
-
Copy this table calculation template into the
Expression text box:
if(is_null( (pivot_offset(${row_total_count},1)*${column_total_count}) / sum(pivot_offset(${row_total_count},1)) ), #True, This marks the last period of the pivot table ${row_total_count}*${column_total_count} / sum(${row_total_count}), #False (pivot_offset(${row_total_count},1)*${column_total_count}) / sum(pivot_offset(${row_total_count},1)) )
- In the Format drop-down list, select Decimals.
- In the Decimals text box, enter 2.
-
In the
Name text box, enter the field name.
For example: Expected Frequency
- Click Save.
-
Create a
Chi-Squared table calculation:
- In the Field Picker, select Add > Table Calculation.
-
Copy this table calculation into the
Expression text box:
#χ2 = (observed value - expected value)² / expected value power(${counta}-${expected_frequency},2) / ${expected_frequency}
-
In the
Name text box, enter the field name.
For example: Chi-Squared
- Click Save.
-
Create a
Final Chi-Squared table calculation:
- In the Field Picker, select Add > Table Calculation.
-
Copy this table calculation template into the
Expression text box:
if(${period/sample_group}="1" # 1 is equal to the period number # This can alternatively be equal to a sample_group name # ----- Begin second pivot section ---------------------------------- # Remove this section if you aren't including a second pivoted field OR ${second_pivot}!=pivot_offset(${second_pivot}, -1) # what the previous line of code is doing is that given the current column # like "Canada", return true if the column before's value is not "Canada" # ----- End second pivot section ---------------------------------- OR ${row_total_count}=0 ,null, sum(${chi_squared})+sum(pivot_offset(${chi_squared},-1)))
-
Replace
${period/sample_group}
in the first line with the period, sample group or wave dimension in your data, and replace${second_pivot}
with the second field you are pivoting on. -
In the
Name text box, enter the field name.
For example: Final Chi-Squared
- Click Save.
-
Create a
Chi-Squared Significant table calculation:
- In the Field Picker, select Add > Table Calculation.
-
Copy this table calculation template into the
Expression text box:
if(${period/sample_group}="1" # 1 is equal to the period number # This can alternatively be equal to a sample_group name # ----- Begin second pivot section ---------------------------------- # Remove this section if you aren't including a second pivoted field OR ${second_pivot}!=pivot_offset(${second_pivot}, -1) # what the previous line of code is doing is that given the current column # like "Canada", return true if the column before's value is not "Canada" # ----- End second pivot section ---------------------------------- ,null, # If the max expected_frequency of the current period or the max expected_frequency # of the previous period is less than 5, than null (minimum rule of thumb expected_frequency is 5) if(max(${expected_frequency_1})<5 OR max(pivot_offset(${expected_frequency_2},-1)) < 5,null, # If chisq_dist is greater than 95% signficance level, return 1(true) else 0(false) if( chisq_dist(${final_chi_squared},${degrees_of_freedom},yes) # Significance % >0.95,1,0)))
-
Replace
${period/sample_group}
in the first line with the period, sample group or wave dimension in your data, and replace${second_pivot}
with the second field you are pivoting on. -
Replace
${expected_frequency_1}
and${expected_frequency_1}
with the${expected_frequency}
table calculation. -
In the
Name text box, enter the field name.
For example: Chi-Squared Significant
- Click Save.