# Using the Correlation Tool from the Excel 2007 and Excel 2010 Analysis ToolPak

## Introduction

Welcome to the third hub in my series on the Excel 2007 and Excel 2010 Analysis ToolPak. In this hub, I will be looking at using the Correlation Tool.

A correlation is a relationship between two variables.

For example, it is easy to imagine that there is a correlation between cold weather and the sale of pies at a shop (as the weather gets cooler it would be reasonable to expect that the number of pies sold increases). Using the Correlation tool you can determine if there is in fact a statistically significant correlation (or relationship) between cold weather and pie sales.

## The three types of Correlation, Positive, Negative and No Correlation

Before we begin to analyse our data, let’s look at the three types of correlation.

• The first is a Positive correlation. In this case, when one variable goes up, the other goes up. So sun hat sales going up as the weather gets hotter (the temperature increases) is an example of a positive correlation.
• The second is a Negative correlation where one variable goes down as another increases. An example of this is pie sales in warm weather. Pie sales fall as the temperature increases.
• The final correlation is No correlation. This occurs when the variables are not linked, so for example sales of televisions and the weather are very unlikely to be linked.

In order to confirm these relationships, we would test for a correlation between the variables using the Correlation tool which is part of the Excel 2007 / Excel 2010 Analysis ToolPak.

Below are examples of charts showing the three types of correlation with trend lines illustrating the correlation:

Statistically,

• A strongly positive correlation is close to or equal to 1
• A strongly negative correlation is close to or equal to -1
• No correlation is equal to 0

For example, a positive correlation of 0.6 is stronger than a correlation of 0.4. A correlation of 1 is a perfect positive correlation.

## Testing for a Correlation using the Correlation Tool from the Analysis ToolPak in Excel 2007 and Excel 2010

Before beginning the test, ensure that the Analysis ToolPak is installed within Excel 2007 or Excel 2010 and is visible. You can access the ToolPak via the Data Analysis button on the Data tab, in the Analysis group. If you cannot see the button or are unsure how to add it, please refer to my hub which covers using the Analysis ToolPak in both Excel 2007 and Excel 2010 and can be found here:

http://robbiecwilson.hubpages.com/hub/Using-the-Analysis-ToolPak-in-Excel-2007-and-Excel-2010

In order to begin testing for a correlation, you need data for your two variables in two columns (or rows) in your spreadsheet.

For my example, I am going to test my data for sun hat sales and the average temperature on that day for a positive correlation.

• To begin with, I click on Data Analysis button on the Data tab, in the Analysis group
• Then select Correlation
• Select the Input Range of your data
• If you have Labels in first row then select that check box
• The Data is Grouped By Columns so I leave that as default
• I then select an Output Range and click OK

The Correlation tool gave me a correlation of .97 (shown in the figure below) which is a very strong positive correlation. I graphed the two data series using a Scatter chart and the results can be seen below. To learn more about graphs and charts in Excel 2007, please refer to my hub which goes into greater detail which can be found here:

http://robbiecwilson.hubpages.com/hub/Creating-charts-and-graphs-in-Excel-2007

I also added a trend line to the scatter chart to illustrate the correlation. I have also written a hub on adding trend lines to charts and that can be found here:

Scroll to Continue

Now I will repeat the process for my data on pie sales in cold weather. You can see the results in the figure below. The correlation is once again very strong (-.93). As temperature drops, the number of pies sold goes up significantly.

Finally, we will look at the sales of televisions and the average daily temperature. We expect there to be no correlation as I cannot think of a single reason why the two would be related. If you can, please leave a comment below in the comments section. As you would expect, when I us the Correlation tool on my data, it finds almost no correlation (.07) at all between TV sales and temperature. As the chart shows, the data is all over the place as expected.

## Let’s look at Correlation and Cause in more detail

Now that we understand correlation and how to test for it, we need to consider the relationship between correlation and cause. In our first example, the sales of sun hats and the average daily temperature, it is easy to see the correlation.

But is the sale of sun hats caused only by higher temperatures? We cannot say statistically without considering all the variables that could possibly affect sun hat sales:

• A favourable weekend forecast could impact hat sales
• Also, a hot day may also be cloudy (or if you are in the tropics it may actually be raining) which you would expect not to cause people to go out and buy sun hats
• Celebrity endorsement may affect sales

In summary, correlation and cause are not necessarily linked. Although logically sun hat sales and temperature are linked, in order to be mathematically certain all variables influencing sales would have to be investigated and tested.

## Now, let’s look at the Limitations of Correlation

There is one instance where a correlation is not the most appropriate statistical measure of the dependence of two variables on each other. That is where the relationship between the two variables changes from positive to negative (or vice versa). To better explain this, I will use an example.

Suppose I have a plant that really likes low humidity. As humidity increases from zero to fifty my plant just grows faster and faster. After humidity goes above fifty, the plant begins to grow slower and slower and continues to decrease as humidity goes up. So the correlation starts off positive, reaches a certain point and then becomes negative. To see this graphically, I have charted my plants response to humidity.

## Conclusion

In this hub, I have investigated correlation which is the dependence of two variables on each other. Variables can be positively or negatively correlated or they can not be correlated at all. I provided examples of each three in chart form with trend lines to illustrate the difference. Then we went through the process of actually testing the data for a correlation. Positive and negative correlations were investigated as well as data with no correlation at all. Finally, we investigated the idea that correlation and cause are not necessarily linked and the limitations of correlation.

The other hubs in my series on the Analysis ToolPak in Excel 2007 and Excel 2010 cover many of the statistical and mathematical tools available in the ToolPak, including:

The Histogram: this tool allows you to examine the distribution of your data across definable boundaries. In this hub, I look at students’ exam results across grade boundaries.

http://robbiecwilson.hubpages.com/hub/Using-the-Histogram-Tool-from-the-Analysis-ToolPak-in-Excel-2007-and-Excel-2010

Rank and Percentile: allows you to rank your data and also to apply percentiles to it. I look again at my students’ exam data and use the ranking results to assign grades using the calculated percentiles.

http://robbiecwilson.hubpages.com/hub/Use-the-Rank-and-Percentile-Tool-from-the-Analysis-ToolPak-in-Excel-2007-and-Excel-2010-to-create-ranking-tables

Regression enables you to visually summarise the relationship between variables. You can use the results to predict variables based on the known results of other variables. In this example, I examine the relationship between fish mortality and the concentration of Phosphate and Nitrogen in water.

http://robbiecwilson.hubpages.com/hub/Using-the-Regression-Tool-from-the-Analysis-ToolPak-in-Excel-2007-and-Excel-2010

Sampling lets you randomly create a sample from a larger population and perform analysis on that sample. I use sampling to create lottery numbers from a pool of numbers.

http://robbiecwilson.hubpages.com/hub/Use-the-Sampling-Tool-in-the-Excel-2007-and-Excel-2010-Analysis-ToolPak-for-sampling-and-to-analyse-periodic-variations

Finally, the Moving Average tool is used to look for trends in your data. This tool also creates a trend line on your charted data to illustrate the trend should one exist. I use the Moving Average tool to investigate whether my daily hits on my websites are trending upwards as I hope they are.

http://robbiecwilson.hubpages.com/hub/Using-the-Moving-Average-Tool-from-the-Excel-2007-and-Excel-2010-Analysis-ToolPak

I hope that you enjoyed working through this hub and that you found it useful and informative. Many thanks for reading and please feel free to leave any comments you may have.

I hope that you enjoyed working through this hub and that you found it useful and informative. Many thanks for reading and please feel free to leave any comments you may have.