A scatter plot is a set of points plotted on horizontal and vertical axes.
Scatter plots are important in statistics because they can show the extent of correlation, if any, between the values of observed quantities or phenomena (called variables). If no correlation exists between the variables, the points appear randomly scattered on the coordinate plane. If a strong correlation exists, the points concentrate near a straight line. Scatter plots are useful data visualization tools for illustrating a trend.
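To put a number on the extent of correlation, one common measure is Pearson's correlation coefficient r. The sketch below is a minimal pure-Python implementation, assuming paired numeric samples. The function name `pearson_r` and both small datasets are made up for illustration. Points clustered near a straight line give a value of r close to 1, while points at the four corners of a square, which show no linear trend, give r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (numerator) and the two spread terms (denominator).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up points lying close to the line y = 2x + 1 (small deviations added).
near_line_x = [1, 2, 3, 4, 5, 6]
near_line_y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

# Corners of a square: no linear trend at all.
scattered_x = [-1, 1, -1, 1]
scattered_y = [1, 1, -1, -1]

print(pearson_r(near_line_x, near_line_y))  # close to 1
print(pearson_r(scattered_x, scattered_y))  # 0.0
```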
Besides showing the extent of correlation, a scatter plot shows the direction (sign) of the correlation:
- If the vertical (or y-axis) variable increases as the horizontal (or x-axis) variable increases, the correlation is positive.
- If the y-axis variable decreases as the x-axis variable increases, or vice versa, the correlation is negative.
- If neither of the above trends can be established, the correlation is zero.
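The three cases above can be sketched by checking the sign of a correlation coefficient. This is a minimal illustration, assuming Pearson's r as the measure. The function `correlation_direction` and its `tolerance` parameter are hypothetical. Because real data almost never produce r exactly equal to 0, values of r within an arbitrary `tolerance` of zero are reported as "zero"; the default of 0.1 is only an illustrative choice, not a statistical standard.

```python
import math


def correlation_direction(xs, ys, tolerance=0.1):
    """Classify the direction of the linear trend between xs and ys.

    Hypothetical helper: |r| <= tolerance is treated as "zero" correlation;
    the 0.1 default is an arbitrary illustrative cutoff.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    if r > tolerance:
        return "positive"  # y tends to rise as x rises
    if r < -tolerance:
        return "negative"  # y tends to fall as x rises
    return "zero"  # neither trend can be established


x = [1, 2, 3, 4, 5]
print(correlation_direction(x, [2, 4, 5, 7, 9]))  # positive
print(correlation_direction(x, [9, 7, 6, 3, 1]))  # negative
print(correlation_direction(x, [3, 1, 4, 1, 3]))  # zero
```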
The strongest possible positive correlation is +1 (or +100%), which occurs when all the points in a scatter plot lie exactly along a straight line with a positive slope. The strongest possible negative correlation is -1 (or -100%), in which case all the points lie exactly along a straight line with a negative slope.
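Assuming the correlation referred to here is Pearson's coefficient r (the text does not name a specific measure), the limits of +1 and -1 follow from its definition for n paired observations (x_i, y_i) with means x̄ and ȳ:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

By the Cauchy-Schwarz inequality, |r| = 1 exactly when y_i - ȳ = k(x_i - x̄) for every i and some constant k ≠ 0, that is, when all the points lie on one straight line. In that case the numerator equals k times the sum of squared x-deviations, so r has the same sign as the slope k.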
Correlation is often confused with causation, either accidentally (as a result of false or unproved hypotheses) or deliberately (with intent to deceive). However, while a scatter plot can reveal the nature and extent of a correlation, it says nothing about causation.