The simplified statistics behind the most recent running debate

Very recently, a paper was published in the Journal of the American College of Cardiology about the supposed dangers of strenuous running. Alex Hutchinson of Sweat Science wrote a great piece that summarizes the results well and points out potential flaws in how many people are interpreting them.

For this entry I am going to digress from discussing triathlons and delve into the statistics in this study, which suggests that running fast and more than 20 miles per week (MPW) could be dangerous. The original paper may be found here and Alex Hutchinson's review here:

This journal article has quickly gained recognition as "proof" that running too much can be dangerous for your health. While that is true in a trivial sense (anything in excess can be dangerous), there is not enough evidence that 20 MPW is the cutoff between healthy running and overtraining. There are certainly hazards to overly stressing one's body, but they are generally outweighed by the benefits of moderate exercise. The other problem is that the average American is nowhere near that upper limit and is much closer to the lower one, which is costing the healthcare system billions, not to mention decreasing the quality of life for many. Society in its current state may need more encouragement to exercise, not excuses to skip it.

In this study, a large group of individuals was followed from 2001 onward and assessed for mortality. A fairly complex statistical method called Cox proportional hazards regression was used to interpret the results because the study did not have what is referred to as a "balanced design." Without a balanced design, each group has a different number of individuals and very different vital statistics (age, BMI, etc.); such groups are referred to as non-homogeneous. Because of that, a simple comparison of average death rates is not valid. To rectify this, certain assumptions must be made, and the model must be adjusted to take into account covariates such as age. There are inherent risks whenever such assumptions are made, and whenever possible, a balanced design is more helpful. For a more basic analysis of the numbers, see below.

The four groups analyzed were as follows (the number in parentheses being the total number of individuals in the group): 1. sedentary (413), 2. light (576), 3. moderate (262), and 4. strenuous (40). As you can see, the numbers are far from balanced. This creates a basic problem when trying to compare averages. At the most basic level, one would want to compare average mortality rates across groups. (Note: the actual data analysis used Cox proportional hazards regression; this is a simplified explanation.) A simple comparison (ANOVA, or analysis of variance) would be an incorrect assessment tool because it does not take covariates into account. For this explanation, however, we will discuss the fundamental ideas behind the results.

A basic rule of statistics is that when you have a large number of subjects, you have more "statistical power," meaning you can detect smaller and smaller differences between groups. As a hypothetical example, suppose you have two groups of 500 individuals, where one group receives a medical intervention and the other gets a placebo. After six weeks, you can detect a relatively small difference in effectiveness between the two. A difference as small as 12 people could be seen as "significant". However, if your sample size is 50, you need a far greater difference between the groups to detect a "significant" difference. "Significant" here refers to statistical significance, where you accept a 5% likelihood of declaring an effect that is not actually there (a false positive). Whether or not statistical significance always equates to real-world significance is a topic for a different entry. When you have small numbers (such as in the last group), it becomes very difficult to detect small differences because there is more chance of random events affecting your results (such a random event could be a death unrelated to running). If each group had 500 subjects, smaller differences could be detected, but because some groups are small, the analysis loses statistical power. With the small sample size in the strenuous group, it becomes difficult to show whether its death rate differed significantly from that of the non-running group (which had a large sample size).
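To make the sample-size point concrete, here is a minimal sketch using a pooled two-proportion z-test. The response rates (10% on placebo, 20% on treatment) are invented purely for illustration, and the z-test is a stand-in for whatever test a real trial would use. The same 10-point difference is tested once with 500 people per group and once with 50.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 10% respond on placebo, 20% respond on treatment.
p_large = two_proportion_p_value(50, 500, 100, 500)  # 500 per group
p_small = two_proportion_p_value(5, 50, 10, 50)      # 50 per group

print(f"500 per group: p = {p_large:.6f}")
print(f" 50 per group: p = {p_small:.3f}")
```

With these made-up numbers, the large trial clears the 5% threshold easily, while the small trial does not, even though the observed rates are identical.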

It would be very inappropriate to compare raw proportions of deaths directly, because the groups differed greatly in age. Nevertheless, a simple comparison of death rates would yield:

Sedentary: 31%
Light: 1.21%
Moderate: 3%
Strenuous: 5%
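The percentages above can be turned back into approximate death counts using the group sizes given earlier. This is only a rough sketch: the counts are back-calculated from rounded percentages, so they are estimates and not figures quoted from the paper.

```python
# Group sizes and rounded death rates as quoted in this post.
group_sizes = {"sedentary": 413, "light": 576, "moderate": 262, "strenuous": 40}
reported_rates = {"sedentary": 0.31, "light": 0.0121, "moderate": 0.03, "strenuous": 0.05}

# ASSUMPTION: deaths are estimated as rate * size, rounded to a whole person.
estimated_deaths = {
    group: round(reported_rates[group] * size)
    for group, size in group_sizes.items()
}

for group, size in group_sizes.items():
    deaths = estimated_deaths[group]
    print(f"{group:>9}: about {deaths} deaths out of {size} ({100 * deaths / size:.2f}%)")
```

Seeing the counts next to the rates makes it easier to notice how few deaths the rates for the running groups rest on.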

These four data points would suggest that doing some running significantly decreases your death risk (a simple ANOVA would show every running group doing significantly better than the sedentary group). The problem is that these data are misleading, and one could speculate that the root cause is that the groups were sampled at such different age ranges. The raw rates also do not show statistically significant differences among light, moderate, and strenuous runners. One cannot tease apart the effect of starting age on death rate from the effect of the level of running.
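As a rough check on the claim about the three running groups, here is a sketch of a chi-square test of homogeneity on the raw counts. It is not the paper's method (the paper used Cox regression), and the death counts (7, 8, and 2) are assumptions back-calculated from the rounded rates and group sizes quoted in this post.

```python
import math

# (deaths, group size) for runners only. Sizes are from the text above;
# death counts are ASSUMED, inferred from the rounded percentages.
runners = {"light": (7, 576), "moderate": (8, 262), "strenuous": (2, 40)}

total_deaths = sum(deaths for deaths, _ in runners.values())
total_people = sum(size for _, size in runners.values())
overall_rate = total_deaths / total_people

chi2 = 0.0
for deaths, size in runners.values():
    expected_deaths = size * overall_rate
    expected_alive = size - expected_deaths
    chi2 += (deaths - expected_deaths) ** 2 / expected_deaths
    chi2 += ((size - deaths) - expected_alive) ** 2 / expected_alive

# A 3x2 table has 2 degrees of freedom, and for 2 degrees of freedom
# the chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumed counts the p-value lands a little above 0.05. The expected number of deaths in the strenuous group is below one, so the chi-square approximation is shaky here and an exact test would be the more careful choice. This is just another symptom of the small group.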

Using adjustment equations, the group goes on to say that a U-shaped curve exists where moderate exercise is optimal for lower mortality rates, but given the above data, can we be so sure we have found it? Using their survival equations, they adjusted for age and claimed that the light running group had the lowest mortality rate. While this is fair, if you take away the sedentary group and just compare the three running groups, with mortality rates of 1.21%, 3%, and 5%, is there a glaringly obvious answer? Not necessarily, which brings us back to the original point: the groups are not evenly matched in size. The strenuous group had only 40 subjects, and two died. If one had died, the death rate would be 2.5%, and if three had died, it would be 7.5%. Compare that to the light group: if one extra person had died, the rate would change only from 1.21% to 1.39%! You would need 22 more people in the light jogging group to die for the rate to go all the way to 5%! Simple "back of the envelope" calculations reveal that it takes a much larger shift in the number of deaths to change the death rate in the light running group. This illustrates the point of needing equal sample sizes pretty well.
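The back-of-the-envelope arithmetic can be reproduced in a few lines. The two deaths among 40 strenuous runners come from the text; the seven deaths in the light group are an assumption inferred from 1.21% of 576.

```python
import math

def rate_pct(deaths, size):
    """Death rate as a percentage of the group."""
    return 100 * deaths / size

# (deaths, group size); the light-group death count is ASSUMED (1.21% of 576).
groups = {"light": (7, 576), "strenuous": (2, 40)}

for name, (deaths, size) in groups.items():
    now = rate_pct(deaths, size)
    plus_one = rate_pct(deaths + 1, size)
    print(f"{name:>9}: {now:.2f}% now, {plus_one:.2f}% with one more death")

# How many extra deaths would lift the light group's rate to at least 5%?
target_rate = 0.05
light_deaths, light_size = groups["light"]
extra_needed = math.ceil(target_rate * light_size) - light_deaths
print(f"Extra deaths needed in the light group to reach 5%: {extra_needed}")
```

One additional death moves the strenuous group's rate by 2.5 percentage points but moves the light group's rate by less than a fifth of a point, which is why the small group's rate is so unstable.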

When it comes down to the root of the issue, addicted runners will likely not stop running (they love it), but it is possible to change the attitude of sedentary individuals, who may yet be encouraged to exercise. The authors do not argue against exercise, and one is a doctor who freely admits exercise is very beneficial (though we are not sure of the optimal level). These data can only hurt the individuals who most need to be encouraged to exercise. News articles will read "Running: too much may be dangerous" because it is a catchy headline. They will not tout the fact that running at all was associated with much lower mortality rates, because that is "old" news. The research in this article is sound, but the interpretations cross into dangerous territory once news outlets pick them up! It is important to take a step back, simplify what the authors found, and try to understand what the data really suggest at their core. Moderation is key in any aspect of life, and each individual is likely to respond differently to doses of running, so don't stop exercising!!