Key insights from
How to Lie with Statistics
By Darrell Huff
What you’ll learn
We encounter statistics every day: in advertisements, in the news, everywhere. But are statistics trustworthy and accurate? Often, the answer is no. They can be, and often are, skewed with specific goals in mind, to push decisions toward certain outcomes. By looking closer at the statistical process, we can judge the validity of a statistic and make more informed choices.
Read on for key insights from How to Lie with Statistics.
1. Although sampling is the simplest and most relied-upon statistical method, samples are often unreliable.
Many statistics are gathered through sampling. A sample is taken from a group, the statistics are based on that sample, and the sample is said to represent the group. If the sample is large, the results will be more accurate. If the sample is too small, the results will not be representative of the whole group.
A handful of multicolored beans illustrates the difference. A small handful may consist mainly of black beans, but a larger handful may show black, red, and white beans in equal numbers. When we form opinions from small samples, we do not get an accurate picture of the whole group. Unfortunately, gathering larger samples takes more time.
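To make the bean example concrete, here is a minimal Python sketch. The bag's even three-way split and the sample sizes are our own illustrative assumptions, not figures from the book:

```python
import random

# The true "bag" holds equal numbers of black, red, and white beans,
# but a small handful can look very lopsided.
population = ["black"] * 1000 + ["red"] * 1000 + ["white"] * 1000

def proportions(sample):
    return {c: sample.count(c) / len(sample) for c in ("black", "red", "white")}

random.seed(1)
small = random.sample(population, 10)     # a small handful
large = random.sample(population, 1000)   # a much larger sample

print("small sample:", proportions(small))
print("large sample:", proportions(large))
# The small handful routinely departs from the true 1/3-1/3-1/3 split;
# the large sample lands close to it.
```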
For a statistic to be accurate, a representative sample must be taken; that is, a sample from which all biases have been eliminated. Looking closely at the sampling process helps us pinpoint these biases, but a healthy dose of skepticism is still necessary. Random sampling is the most basic form: every member of the whole has an equal chance of being chosen, so the small part selected can stand in for the whole.
The form of sampling most often used in market research and polling is called stratified random sampling. This approach breaks the whole into smaller subgroups and samples within each of them. Inaccuracies creep in because the selection within each subgroup is rarely truly random. Consider a poll taken door-to-door: if it is conducted during the day, it excludes employed residents; if it is conducted in the evening, it misses residents who are out socializing or running errands. It is extremely difficult to get a representative sample this way.
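As a hypothetical illustration of the door-to-door problem, the sketch below simulates a daytime canvass in which employed residents are rarely home. The population mix and at-home probabilities are invented for illustration:

```python
import random

# Daytime canvass: each resident is reached only if they happen to be home.
random.seed(0)
population = ["employed"] * 600 + ["retired"] * 250 + ["homemaker"] * 150
home_prob = {"employed": 0.1, "retired": 0.8, "homemaker": 0.7}

reached = [p for p in population if random.random() < home_prob[p]]
for group in ("employed", "retired", "homemaker"):
    print(group,
          f"population share: {population.count(group)/len(population):.0%},",
          f"sample share: {reached.count(group)/len(reached):.0%}")
# Employed residents are 60% of the population but a much smaller share
# of the daytime sample, so the "representative" poll is anything but.
```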
2. By identifying the biases that limit a set of statistics, we can determine the data’s accuracy.
One bias that undermines methods such as polling is the tendency of participants to give the desirable answer. People often do not want to voice an unpopular opinion, which makes popular-opinion polls largely useless: the sample is almost always twisted by a bias in one direction or the other.
Another common bias lies in how an average is presented. An average can be a mean, a median, or a mode. The mean is the arithmetic average, the sum of all figures divided by their count, and a few large values can pull it upward into a larger figure. If we wanted the figure to appear smaller, we might use the median: the middle figure, with half of the values above it and half below. Finally, there is the mode: the value that appears most frequently in the sample. Each is a legitimate "average," which means averages can be presented in whichever form best favors what we are trying to convey.
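A quick worked example shows how far the three "averages" can diverge on the same data. The income figures below are invented for illustration:

```python
from statistics import mean, median, mode

# An invented, right-skewed set of neighborhood incomes.
incomes = [15_000, 15_000, 20_000, 25_000, 30_000, 45_000, 250_000]

print("mean:  ", mean(incomes))    # ~57,143 — pulled up by the one outlier
print("median:", median(incomes))  # 25,000 — the middle figure
print("mode:  ", mode(incomes))    # 15,000 — the most frequent figure
# All three are legitimately "the average income," yet they differ
# almost fourfold — exactly the room a presenter can exploit.
```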
A conscious bias is one in which only favorable data are chosen for presentation: unfavorable data are suppressed, or only the favorable outcome is shown. By picking among the different "averages," the presenter consciously chooses what he or she wants to convey. An unconscious bias is more difficult to discern. Detecting it means finding the source behind the statistics and asking whether the publisher has a vested interest in the outcome of the survey, poll, or other statistical representation. Uncovering this type of bias requires a bit of research, but it is in our best interest to discover any biases behind the "facts" being presented.
3. Charts, maps, and other types of graphs are not always what they seem.
A researcher or data presenter can skew a graphic in ways that deceive the reader. By changing the scale on an axis, or by zooming in on the rising portion of a chart, the publisher can easily lead the audience to see exactly what he or she wants them to see.
Line charts are among the most common ways to present data. The smallest increase can be made to look like a dramatic rise simply by changing the numerical increments on one or both axes. The chart looks impressive, but if we take the time to check the numbers, the change is underwhelming. Bar charts can deceive as well: changing the width of the bars, or truncating them, can make the display suggest something the data do not support.
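The truncated-axis trick is easy to reproduce. The sketch below plots the same modest rise (invented sales figures, under 3 percent over a year) twice with matplotlib, once on a zero-based axis and once zoomed in on the change:

```python
import matplotlib.pyplot as plt

# Invented data: a rise from 100 to about 102.75 over twelve months.
months = list(range(12))
sales = [100 + 0.25 * m for m in months]

fig, (honest, dramatic) = plt.subplots(1, 2, figsize=(8, 3))

honest.plot(months, sales)
honest.set_ylim(0, 120)           # axis starts at zero: a flat-looking line
honest.set_title("Zero-based axis")

dramatic.plot(months, sales)
dramatic.set_ylim(99.5, 103.5)    # axis zoomed to the change: a "soaring" line
dramatic.set_title("Truncated axis")

plt.tight_layout()
plt.show()
```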
The author gives the example of a graphic containing two cows: a smaller cow labeled 1860 and a noticeably larger cow labeled 1936. The reader is led to believe that cows themselves have grown over time, when the information actually being conveyed is the growth of the dairy cow population. But the large cow exaggerates that growth: it is three times the size of its fellow cow, even though the dairy cow population did not grow by that amount in the years referenced. And because a picture that grows in height also grows in width, the eye reads an even larger increase than the stated figures.
4. Statistical fallacies are worth watching for, but we need to know where they usually pop up.
One of the major fallacies of statistics occurs when we reason: if B follows A, then A must have caused B. This is known as the post hoc fallacy. The inference can fail in several ways: the sequence may be pure coincidence, the causation may run in the opposite direction, or A and B may both be products of a third factor. To discern the truth, we must look closely at the information we are given.
There are several reasons why B may follow A. One is pure chance: if the experiment were repeated, B might not occur again. A small test group is especially convenient for anyone chasing a desired result, because chance fluctuations loom much larger in small samples. Another possibility is that A and B really are related, but it is nearly impossible to tell which is the cause and which is the effect, much like asking which came first, the chicken or the egg.
Finally, there are cases with no causal link at all between the two factors, which are instead products of a third. The author gives the example of the rising salaries of Presbyterian ministers and the rising price of rum. Why would these two things be related? A third factor is at play: the price of everything is rising. Presented in just the right way, the figures could "show" that the ministers' salaries benefit from the rum trade, merely because both are rising.
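A small simulation makes the third-factor trap visible. In the sketch below (all series invented; statistics.correlation requires Python 3.10+), salaries and rum prices are both driven by the same inflation curve and nothing else, yet they correlate almost perfectly:

```python
import random
from statistics import correlation  # Python 3.10+

# Both series ride general inflation, plus a little independent noise.
random.seed(42)
price_level = [1.03 ** y for y in range(30)]   # steady 3% inflation
salaries = [20 * p * random.uniform(0.95, 1.05) for p in price_level]
rum = [2 * p * random.uniform(0.95, 1.05) for p in price_level]

print(f"correlation(salaries, rum) = {correlation(salaries, rum):.2f}")
# Typically 0.95 or higher — an impressive figure that says nothing
# about causation, since neither series influences the other.
```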
Another statistical manipulation occurs when the data presented do not accurately represent the correlation between factors. A positive correlation can turn negative when extended beyond the range of the data. Take rainfall and crop health: crops grow when it rains, but if it rains too much, the crops are destroyed. Knowing only that it rained more does not paint the whole picture; the reader needs to know how much rain is beneficial. A correlation does exist in such cases, but we must look closely to identify all the variables at play.
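The rain-and-crops reversal can be seen with an invented quadratic yield curve (a stand-in for real agronomy, not data from the book): below the optimum, more rain correlates positively with yield; above it, the same curve makes the correlation strongly negative.

```python
from statistics import correlation  # Python 3.10+

def yield_for(rain):
    # Invented yield curve: peaks at 30 inches of rain, falls off beyond it.
    return 100 - (rain - 30) ** 2 / 10

dry_years = list(range(10, 30))   # rainfall below the optimum
wet_years = list(range(31, 51))   # rainfall above the optimum

print("dry range:", round(correlation(dry_years, [yield_for(r) for r in dry_years]), 2))
print("wet range:", round(correlation(wet_years, [yield_for(r) for r in wet_years]), 2))
# Roughly +0.97 in the dry range and -0.97 in the wet range: the same
# curve yields opposite "correlations," so extrapolating "more rain,
# more crops" beyond the data fails.
```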
5. Not all statistics are misleading, but there are five questions that can help us determine if they might be.
If we really want to know whether a statistic is misleading us, we can ask the following five questions:
· Who says so?
· How does he know?
· What’s missing?
· Did somebody change the subject?
· Does it make sense?
Determining exactly who is presenting us with the statistic will allow us to see if any of the previously discussed biases exist. Is the statistic coming from a source that has something to prove? Do they wish to sway us a certain way? First, we check for both the conscious and unconscious biases we were warned about. Second, we check the validity of the source. If no biases exist, and the statistic comes from a reputable source, it may be trustworthy.
Again, we must look at the size of the sample we are being presented with. Consider a survey by the Chicago Journal of Commerce: a questionnaire on price gouging and hoarding was sent to 1,200 companies, and 86 percent did not respond. Only the 14 percent that replied, 168 companies, are represented in the sample, and those self-selected respondents are too few to support a conclusion about the whole group.
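Here is the arithmetic, plus a hypothetical sketch of why self-selection matters. The response probabilities below are invented for illustration, not Huff's figures:

```python
import random

surveyed = 1200
responses = round(0.14 * surveyed)   # 168 replies carry the whole conclusion
print(f"{responses} of {surveyed} companies speak for the rest")

# Suppose (hypothetically) firms that hoard are far less likely to reply:
random.seed(7)
firms = [True] * 240 + [False] * 960          # 20% of firms actually hoard
replied = [h for h in firms if random.random() < (0.02 if h else 0.17)]
print(f"hoarders in population: {240/1200:.0%}, "
      f"in responses: {sum(replied)/len(replied):.0%}")
# Hoarders are 20% of the firms but only a few percent of the replies,
# so the survey's conclusion is skewed before anyone reads a number.
```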
Certain figures are often missing from statistics; sometimes the number of cases is left out entirely. In the previous example, the Journal could simply cite the 14 percent that responded and never tell the reader about the other 86 percent. We are reminded again to check exactly which average is being presented: if switching between mean and median would shift the result substantially, an unlabeled "average" should not be trusted. The same goes for omitted factors: if a missing factor would change how the data are presented or interpreted, the statistic is being misrepresented.
Something else to watch out for is a sudden change of subject: if something shifts between the raw figures and the conclusion drawn from them, it is worth taking a closer look. Long-term trends, in particular, are often cited without evidence that the claimed connection holds. Recall the Presbyterian ministers and the price of rum: things change together over time without any direct correlation existing between the two factors.
The last question to be answered is whether the statistic makes sense. Precise figures give an appearance of validity; after all, why would someone fabricate an exact number? Yet common sense tells us that many statistics cannot be that precise. We should also be wary of statistics that extend a trend into the future: a trend may exist, but the future cannot be measured, so the projection is merely an educated guess.
The more we know what to look for, the easier it is to determine whether a statistic is trustworthy. By establishing which average is being presented, or by looking closer at a line graph, we can tell whether our first impression of a statistic is accurate, or whether there is more at play than meets the eye.
Endnotes
These insights are just an introduction. If you're ready to dive deeper, pick up a copy of How to Lie with Statistics.