
Monday, August 3, 2015

I'm Business Insider's math reporter, and these 10 everyday things drive me insane

Math, statistics, empirical analysis, and data visualization are all incredibly powerful tools for understanding the world. Unfortunately, these tools are misused and abused in many ways that, to greater or lesser degrees, lead to confusion rather than clarification and make the world a slightly worse place. Here are 10 such things that aggravate me.

1. Misleading vertical axes

There are a few ways in which graphs can have badly misleading y-axes. Column or bar charts are a great way to compare values, since the lengths of the columns or bars should be proportional to the values being displayed. But things can go horribly wrong if the base of the vertical axis is not set at zero.

A classic example was a chart shown on Fox News last year comparing Obamacare enrollment numbers just before the enrollment deadline to the administration's goal:

As Media Matters pointed out in its post on the chart, the actual enrollment figure of 6 million was about 85% of the goal of a little over 7 million, while the column representing the current enrollment was about a third the height of the column representing the goal. This is a deeply misleading way to represent these figures. Fortunately, Fox News later presented a corrected and more responsible version of the chart:

The "start at zero" rule applies mostly to column and bar charts. For line charts, it's fine to use whatever axis boundaries you need to show the trend in which you're most interested. This chart from FRED shows the decline in the labor-force participation rate, or the percentage of adults either employed or looking for work, since 2007. It has an axis ranging only from 62.5% to 66.5%:

That drop of roughly 3 percentage points in labor-force participation, however, represents millions of Americans who have stopped working or looking for work, and it is one of the biggest mysteries of our current economic situation. The downward trend is the main story here, and so it's fine to choose axis bounds that clearly tell that story.

2.
Multiple vertical axis scales

Another unfortunately common abuse of axes is putting multiple scales on a graph. This is usually done to show some kind of relationship or correlation between two time series. But because one can basically choose any scale one wants for the two axes, it's very easy to insinuate relationships that may not actually exist or matter. Further, even if there is a valid relationship between the two series, the dual y-axis design can still be visually confusing, making it difficult to see the nuances of that relationship. Scatter plots are usually a better option for showing the relationship between two sets of values.

One of the most egregious examples of a misleading multiple scale graph is this chart combining the Dow Jones Industrial Average in the run-up to the 1929 stock market crash with more recent stock market movements:

The implication is that the vague similarities between the two time periods mean that a 1929-like stock crash is imminent. This, of course, makes no sense, because this apparent pattern emerges only with a very selective choice of vertical axis scales and because two lines looking somewhat alike tell us nothing about the similarities and differences between the underlying market and economic situations during the two time periods, which are the things that actually matter when trying to figure out the likelihood of a crash.

3. Horizontal axis disasters

Things can go wrong with the horizontal axis as well. One of the biggest problems is a missing horizontal axis on a time series chart. Showing how a quantity changes over time is a lot less useful if the actual time period being analyzed is unclear:

Having an x-axis for a time series graph still doesn't necessarily mean you're in the clear, though.
Business Insider deputy editor Sam Ro tweeted out this intriguing chart from a Bank of America research note, ostensibly showing technological development and population growth over time:

The time scale is uneven and appears to have no actual relationship with the data being presented. Apparently Greece and Rome peaked in about A.D. 1000, and the industrial revolution, moon landing, and invention of railroads all occurred in the past 15 years.

When big events happen, Twitter will frequently visualize activity on the social network related to those events. Unfortunately, its charts usually lack both an x-axis and a y-axis, making it rather difficult to get any insight:

4. The lottery

Taking a break from aggravating things in charts, I am not a huge fan of playing the lottery. Buying a lottery ticket is almost always a losing proposition. Even in the case of immensely large jackpots, the probability of winning is so low that the expected value of a lottery ticket will almost certainly be negative.

Of course, this is a matter of personal taste. I'd rather not waste a dollar, but other people can certainly enjoy buying a ticket for non-monetary reasons like fear of missing out on a jackpot, or the simple rush of taking the gamble.

5. The concept of wind chill

Wind chill combines temperature and wind speed into a single index value, represented as an adjusted temperature. The goal is to capture the interaction between wind and cold: wind blowing over exposed skin will pull heat away more quickly than still air of the same temperature.

This measure, however, is flawed. First, several other factors affect a person's perception of weather: Is it raining? Is it sunny, or overcast? What time of day is it? Wind chill, while bringing together two important parts of weather, ignores others.

Second, representing the combination of temperature and wind as another temperature is odd.
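To make the "adjusted temperature" concrete, here is a minimal sketch of how such an index can be computed. It assumes the formula the U.S. National Weather Service adopted in 2001, which the post itself does not state; the function name wind_chill_f and the sample inputs are illustrative choices.

```python
def wind_chill_f(temp_f: float, wind_mph: float) -> float:
    """Wind chill in degrees Fahrenheit.

    Assumption: this is the 2001 NWS formula, which is not given in the post.
    It is only defined for temperatures at or below 50 F and winds above 3 mph.
    """
    if temp_f > 50 or wind_mph <= 3:
        # Outside the formula's range: report the air temperature unchanged.
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


# A 35 F day with 25 mph winds
print(round(wind_chill_f(35, 25)))  # 23
```

The output is a number on the temperature scale, yet it describes the rate of heat loss from exposed skin. It says nothing about the temperature the air, or a glass of water, will actually reach.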
A 35 degree Fahrenheit (1.7 degrees Celsius) day with 25 mph (40 kph) winds doesn't really "feel like" a 23° F (-5° C) day. Most immediately, a glass of water left outside on a windy 35° F day will never freeze, as the actual temperature is still above the freezing point, while a glass left outside on a still 23° F day will eventually freeze. Temperature is temperature, and wind speed is wind speed.

That said, wind speed (and other factors) are still very important! In conditions of extreme cold, exposed skin will suffer from frostbite faster in windy situations than in still situations, all other things being equal. I just find the representation of a combined temperature and wind speed as a new "temperature" somewhat odd.

6. Pie charts

Pie charts are intended to show how some whole is broken into component parts. In most cases, they fail at that goal. When we're breaking a big circle into many pieces, it can be hard to directly compare the sizes of those pieces and thus the proportions of interest.

Here's a chart breaking down the popularity of various pizza toppings. Note that each pie wedge needs to be labeled with its percentage, because otherwise it would be hard to tell, say, whether sausage or mushrooms are more popular, given the similar size of the two wedges:

Bar or column charts tend to do a better job of representing these kinds of breakdowns for a large number of subcategories.

On the flip side, pie charts can be somewhat clearer when looking at just a small number of categories with large differences between the percentages:

Of course, given that the relevant information from this pie chart is directly printed as text, and we're basically just looking at a single number (the proportion of climate scientists who reject human-caused global warming), one might wonder why we'd bother with the chart at all.

7. Bad map-coloring schemes

Maps can be an incredibly useful way to display geographically varying information.
However, they must be designed carefully to clearly convey their data. One somewhat frequent problem in creating maps is using arbitrarily chosen colors to display data. This map, from Imgur via @BeautifulMaps on Twitter, uses a very unintuitive color scheme to show speed limits around the world:

There isn't a natural flow in the color scheme to go along with the naturally increasing scale of speed limits. I have no idea, at a glance, whether Texas' blue speed limit is higher or lower than neighboring Mexico's light green speed limit. I have to refer to the key every time I look at a different country to have any idea what that country's color means.

A better option is to stick with one color, but vary the saturation, brightness, or intensity of that color. This map from the Census Bureau showing the minority proportion of each state's population in 2000 has a scale from light blue to dark blue, making regional patterns immediately apparent:

We can clearly see, even without looking at the key, that minorities tend to be a larger percentage of the population in the South and in more urban states, while the less densely populated states of the Midwest and Great Plains tend to have smaller proportions.

Two colors, varying by intensity, can be helpful in situations in which there is a natural midpoint. Comparing Democratic votes with Republican votes in an election, seeing where incomes are above the national average or below the national average, or seeing where populations increased and declined in a given year are all cases in which a two-color scheme can work well. As an example of the last case, here is a map we made using Census data showing which US counties had population growth or loss between 2013 and 2014. Growing counties are in blue, with darker shades indicating faster growth; shrinking counties are in red, with darker shades indicating faster loss:

8.
Questionable psychological measures

The human mind is an incredibly complex thing, and we know very little about how it works. This does not stop us from making often clumsy attempts to measure and compare people based on intelligence or personality.

One of the worst offenders is the Myers-Briggs Type Indicator, which attempts to assign a personality "type" to test-takers. The test sorts people into 16 categories, based on four binary personality trait variables: introverted versus extroverted, intuitive versus sensing, thinking versus feeling, and judging versus perceiving.

The test has numerous problems. First off is the dichotomous nature of the four trait scales: A person who takes the test and scores just slightly more extroverted than introverted is placed solidly in the "extrovert" bucket, despite having a mixture of traits. Related to this problem is the reliability of the test: It's not uncommon for people who take the test and then retake it a few weeks later to end up assigned to a completely different personality type. Because the test is supposed to be measuring something fundamental about a person's psyche, that variability is problematic.

The MBTI also has somewhat questionable origins. It was developed by a mother-daughter team in the 1940s, neither of whom had any formal psychological training. The test also has come under strong criticism from social scientists for its lack of empirical validity or theoretical justification in the decades since its development.

9. General bad chart design

In addition to the sins of axes, pie charts, and map colors mentioned above, charts and infographics can fail at their task of conveying data in plenty of other ways.
In his seminal 1983 book "The Visual Display of Quantitative Information," data visualization pioneer Edward Tufte coined the word "chartjunk" to describe unnecessary and distracting elements of a graph that either add nothing to the reader's understanding of the information being presented or even actively detract from that understanding.

Chartjunk can take on many forms. Some common forms include poorly chosen shading, background, or border options that draw the eye away from the information being presented, excessively distracting decorative elements, and the use of poorly scaled 3D and related design effects that distort the reader's perception of the data.

Tufte includes the chart to the left of the age breakdown of college students in his book, writing, "This may well be the worst graphic ever to find its way into print."

The chart essentially displays only five numbers: the proportions of college students 25 and older over a five-year period. To do this, the chart has four brightly colored regions, two of which are there just to provide an off-center 3D perspective effect that is both distracting and makes the graph harder to read. Like the misleading y-axis of the initial Fox News Obamacare chart above, the blue region draws the reader's eye up, confusingly suggesting that the earlier years' proportions are higher than they actually are.

Further, the top half of the chart, showing the proportions of college students under the age of 25, is redundant: This is just the mirror image of the lower half of the chart, because the percentage of students under age 25 is just 100% minus the percentage of students over age 25.

The chart and charts like it that have poor chartjunk-laden design decisions take very simple data sets and present them in an almost incomprehensibly overcomplicated and ugly way. Former Business Insider reporter Walt Hickey found several examples of extremely poor chart design and compiled them here.

10.
The number 10, and other big round numbers

On December 23, 2014, the Dow Jones Industrial Average crossed 18,000 for the first time in the index's history, and the headline on that morning's Business Insider market update post reflected this "milestone."

Several of my friends will be turning 30 this year, which seems somewhat more momentous and important than my coming 29th birthday.

Privately held tech startups that raise money at a valuation of at least $1 billion are labeled "unicorns," while presumably an app developer worth only $990 million on paper would just be a run-of-the-mill horse.

There are 10 items on this list, not nine or 11, either of which would easily have been possible by removing an item or finding more things that annoy me.

In each of these cases, and in several other everyday situations, multiples of powers of 10 are favored over other numbers as important cutoffs or milestones. But this is an essentially arbitrary thing: The big round numbers we view as important are seen as such only because the most common number system in the modern world is the base 10 decimal system.

The most likely reason we use decimal rather than a different number system is that human beings generally have 10 fingers. This itself is an arbitrary side effect of human evolution.

This arbitrary nature of big round numbers, and of related decimal-biased numerical events, is a thing that annoys me. Sigh.
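One way to see how much a "round" number depends on the base is to write the same quantity in a few other number systems. The sketch below is an illustration added here, not part of the original post: to_base is a hypothetical helper name, and bases 7, 8, and 16 are arbitrary picks.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer n using the digits of the given base (2 to 36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        # Peel off the least significant digit, then shift right by one place.
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


# The Dow "milestone" of 18,000 written in a few bases
for base in (7, 8, 10, 16):
    print(base, to_base(18_000, base))
# 7 103323
# 8 43120
# 10 18000
# 16 4650
```

A species with seven fingers would have had no reason to write a headline about the Dow reaching 103323.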


READ THE ORIGINAL POST AT uk.businessinsider.com