Introductory comments on correlation and regression analysis

Cross tabulation is used when the variables to be analyzed are in categorical form. Correlation and regression analysis are used in situations where both the dependent and independent variables are of the continuous type.

Continuous Variables: Continuous variables are ones that can be quantified or measured on a continuum and that can take on any value along it, from zero up to the largest value possible in the series. A respondent’s age (in years) and the number of miles he or she drives annually are two examples of continuous variables. Percentages are also continuous variables, as are a household’s annual expenditure on all types of insurance and the pounds of coffee it consumes annually. Many interval scales used in attitude measurement are also treated as continuous variables.

Generally speaking, because of the mathematical equation involved, the results from correlation and regression analyses are judged to be (1) more accurate representations of the relationships between variables and (2) more objectively arrived at than similar results from cross tabulation.

The following discussions of correlation and regression analysis are designed to help the reader gain a better understanding of the basic concepts underlying them. These methods are very complex mathematically, but the discussions attempt to avoid that aspect of these two topics.

Correlation analysis

Perhaps the easiest way for the student to understand correlation analysis is to begin with a series of scatter diagrams of hypothetical survey data. By illustrating how different patterns of survey data appear in a scatter diagram, and by showing the different results a correlation analysis produces when applied to those data, we can help the student quickly learn when correlation analysis can be applied and what is learned from doing so.

Data typically used in correlation analysis:

Assume that our hypothetical survey data were obtained from interviews with a large number of respondents, say 100 or more. During the interviews the respondents were asked how often they consumed a certain food product in a typical month. They were also asked other questions, including their annual income in dollars, their age in years, and other things.

These data possess two characteristics typical of the data to which correlation analysis can be applied.

1. The data are in continuous form, specifically consumption frequency per month, annual income in dollars, and age in years.
2. More than one variable is measured for each respondent; that is, each respondent is asked a number of questions, each of which represents a different variable.
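To make these two characteristics concrete, the following minimal Python sketch shows one way such survey data might be laid out, with one record per respondent and one field per variable. The class name Respondent, the field names, the helper suits_correlation, and all of the values are hypothetical and chosen only for illustration; the check confirms that values are numeric, which is a rough stand-in for "continuous" rather than a full test of it.

```python
from dataclasses import dataclass, fields


@dataclass
class Respondent:
    """One interview; each field is a separate variable (names are hypothetical)."""
    consumption_per_month: float  # times the product is consumed in a typical month
    annual_income: float          # dollars
    age: float                    # years


# Invented values for illustration only; a real survey would have 100 or more rows.
survey = [
    Respondent(4, 32000, 27),
    Respondent(9, 58000, 41),
    Respondent(2, 21000, 63),
    Respondent(12, 74000, 35),
]


def suits_correlation(rows):
    """Check both characteristics: numeric values, and more than one variable per respondent."""
    names = [f.name for f in fields(Respondent)]
    all_numeric = all(
        isinstance(getattr(row, name), (int, float)) for row in rows for name in names
    )
    return all_numeric and len(names) > 1


print(suits_correlation(survey))
```

Any two of the three fields could then be paired as the variables in a correlation analysis.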

Constructing Scatter Diagrams of Data from Variables:

For the student unfamiliar with correlation analysis, a good way to approach the topic is to discuss the construction of scatter diagrams of data. Since a correlation analysis typically involves one dependent variable and one independent variable, we can begin by constructing a scatter diagram using the data from two variables in our hypothetical study. A respondent’s consumption frequency per month will be used as the dependent variable, and it will be identified as variable Y. That same respondent’s annual income will be used as the independent variable, and it will be identified as variable X.

A scatter diagram typically uses the dependent variable as the vertical (or Y) axis of the diagram and the independent variable as the horizontal (or X) axis of the diagram. Each respondent is represented by a dot on the scatter diagram. For any given respondent, the dot’s vertical (Y) location is determined by the respondent’s consumption frequency per month, and its horizontal (X) location is determined by that same respondent’s annual income. The scatter diagram is complete when every respondent is represented by a dot that reflects that respondent’s consumption frequency per month and annual income.
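The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the income and consumption values are invented, the function name scatter_rows and the grid size are arbitrary choices, and the diagram is drawn with text characters rather than with a plotting package. Each respondent becomes one "*" whose column is set by annual income (X) and whose row is set by consumption frequency per month (Y).

```python
# Hypothetical (annual income in dollars, consumption frequency per month) pairs,
# one pair per respondent. The values are invented for illustration.
respondents = [
    (20000, 2), (30000, 4), (40000, 5), (50000, 7), (60000, 8), (70000, 10),
]


def scatter_rows(points, width=30, height=10):
    """Return the diagram as a list of text rows; the top row holds the largest Y."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1 so identical values do not cause division by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))  # horizontal (X) location
        row = round((y - y_min) / y_span * (height - 1))  # vertical (Y) location
        grid[height - 1 - row][col] = "*"  # flip rows so Y increases upward
    return ["".join(line) for line in grid]


for line in scatter_rows(respondents):
    print("|" + line)
print("+" + "-" * 30)
print(" X: annual income   Y: consumption per month")
```

With these invented values the dots run from the lower left to the upper right of the grid, because the respondents with higher incomes were also given higher consumption frequencies.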