Complete Enumeration of all possible Binary Splits (MR)

Automatic Interaction Detector (AID): The first step in the AID procedures involves a complete enumeration of all possible binary splits. Because all independent variables used in an AID analysis are categorical, the categories of each independent variable can be used to split the total sample into two sub-samples.

To illustrate, assume that researchers measured only two independent variables in the aforementioned study of appliance purchasers: the age of the head of the household and the household’s annual income. The age of the head of the household was recorded in two categories (45 years of age or younger; older than 45 years of age), and the annual household income was recorded in three categories (less than $20,000; $20,000 to $30,000; more than $30,000). AID procedures could use either the age variable or the income variable, but not both variables together, to split the total sample into two sub-samples. There would therefore be a total of four possible binary splits: one based on age and three based on income. Those four binary splits would be based on the following pairs of categories:

1. Heads of households 45 years of age or younger and heads of households older than 45 years of age.
2. Households with annual incomes of less than $20,000 and households with annual incomes of $20,000 or more.
3. Households with annual incomes of $30,000 or less and households with annual incomes of more than $30,000.
4. Households with annual incomes in the $20,000 to $30,000 range and households with annual incomes less than $20,000 or more than $30,000.

These four pairs of sub-samples represent all of the possible ways the total sample can be broken down into two sub-samples when age and income are the only independent variables used as the basis for splitting. Of course, if there were more than these two independent variables, or if age and income had each been recorded in four categories rather than in two and three categories, it would be possible to create more than four binary splits.

This is an important aspect of AID, because the AID procedures determine every possible binary split of every independent variable (age, income, and so on) included in the analysis. This is referred to as a complete enumeration of all possible binary splits.
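The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name binary_splits and the dictionary layout are hypothetical, and every category is treated as freely combinable with any other, which matches the four splits listed for age and income.

```python
from itertools import combinations

def binary_splits(categories):
    """Return every way to divide `categories` into two non-empty groups."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    # Keep the first category in the left group so each split is counted once.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = [first, *extra]
            right = [c for c in cats if c not in left]
            if right:  # skip the case where every category lands in one group
                splits.append((left, right))
    return splits

# Category labels taken from the age and income example in the text.
variables = {
    "age": ["45 or younger", "older than 45"],
    "income": ["less than $20,000", "$20,000 to $30,000", "more than $30,000"],
}

all_splits = [
    (name, left, right)
    for name, cats in variables.items()
    for left, right in binary_splits(cats)
]

for name, left, right in all_splits:
    print(f"{name}: {left} vs. {right}")
```

Under this assumption a variable with k categories yields 2^(k-1) - 1 binary splits: one for age, three for income, and seven for any variable recorded in four categories.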

Selecting the best first split: To determine how the total sample should first be split, the AID procedures apply the basic analysis to every possible binary split of the total sample that can be made using the categorical independent variables. For each possible split, a one-way analysis of variance is used to determine whether the difference between the average number of information sources used (the dependent variable) by the two sub-samples is statistically significant. Each binary split that results in a significant difference between the two sub-samples is listed for possible use as the basis for making the first AID split.
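The comparison of candidate splits might be sketched as follows. The records are invented toy data, not results from the appliance study, and the names f_statistic and candidate_splits are hypothetical. The sketch computes only the one-way analysis of variance F statistic for each split and ranks the splits by it. Ranking by F is an assumption made for illustration, and judging significance would additionally require comparing each F with the F distribution (for example with scipy.stats.f_oneway).

```python
from statistics import mean

def f_statistic(group_a, group_b):
    """One-way ANOVA F statistic for two groups of observations."""
    groups = [group_a, group_b]
    all_values = group_a + group_b
    grand_mean = mean(all_values)
    # Between-group sum of squares (1 degree of freedom for two groups).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (n - 2 degrees of freedom).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_within = len(all_values) - 2
    return ss_between / (ss_within / df_within)

# Invented toy records: (age category, income category, information sources used)
records = [
    ("young", "low", 3), ("young", "mid", 2), ("young", "high", 3),
    ("young", "low", 2), ("old", "mid", 1), ("old", "high", 2),
    ("old", "low", 1), ("old", "high", 1),
]

# Candidate splits: (label, column index, categories in the first sub-sample)
candidate_splits = [
    ("age: young vs. old", 0, {"young"}),
    ("income: low vs. mid/high", 1, {"low"}),
    ("income: low/mid vs. high", 1, {"low", "mid"}),
    ("income: mid vs. low/high", 1, {"mid"}),
]

results = []
for label, col, left_cats in candidate_splits:
    left = [r[2] for r in records if r[col] in left_cats]
    right = [r[2] for r in records if r[col] not in left_cats]
    results.append((f_statistic(left, right), label))

# The split with the largest F separates the two sub-sample means most sharply.
best_f, best_label = max(results)
print(best_label, round(best_f, 2))
```

With these toy records the age split separates the two sub-sample means far more sharply than any income split, so it would head the list of candidates for the first split.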

For example, consider a representative (but much abbreviated) list of the type of information available from a complete enumeration of all possible binary splits in the study of appliance purchasers. Such a list shows that when the independent variable “cost of product” was used to divide the total sample into products costing less than $200 and products costing $200 or more, the average number of information sources used by the two sub-samples was 1.95 and 1.68, respectively. That difference was judged statistically significant by the one-way analysis of variance. The information listed for the age and education variables should be interpreted in the same way.