The next phase I've been trying to present is the impact that a blanket ban on abortion would have on US population growth. I'm getting close, but I keep running into problems compiling the data. So today I'm going to blow off some steam, and you get to see why it is so difficult to make informed decisions about controversial topics like abortion. It's really quite simple. Data suck^{1}.

Alright, maybe that isn't completely accurate. When you have good data that were gathered in a systematic way with a specific goal in mind, they are usually wonderful to work with. However, with a topic like abortion, no one is gathering data in a systematic manner. The data to answer a question with any kind of complexity have to be gathered from multiple sources.

My current task is to forecast US population growth under two separate conditions: one in which abortion remains legal, and one in which abortion is banned. The idea is to compare the populations between the two scenarios. It is also a necessary step toward broader discussions about economic, welfare, and social policies. So it's an important base of information for my hypotheses about what happens if we make abortion illegal.

Now, consider what we need to know to forecast population growth. First, we need the current population. Next, the annual birth rate. Third, the annual death rate. That part is easy enough, but it tells us nothing beyond the gross population. What would be more informative, especially for the discussions to follow, is an understanding of population growth by race.

So now we need the populations, birth rates, and death rates by race. A lot of that can be found through the CDC. And for forecasting population growth under legal abortion (the current state of things), that would probably be enough. What gets tricky is forecasting population growth if abortion were illegal. Theoretically, this would have an effect on the birth rate, so we need to know how many abortions are performed within each race. With this information, we can calculate the new birth rate for each race. Unfortunately, we can't simply apply that birth rate to the forecasted population numbers.

You see, we can increase the birth rate and calculate next year's population quite easily. However, all of those additional newborns will be less than 1 year old during the next year (thank you, Captain Obvious). This means that the new birth rate doesn't apply to them, because their birth rate is known to be 0. In other words, the age of the population is important in forecasting population growth when we change anything about the current conditions.
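To make the age problem concrete, here is a minimal sketch of a one-year, age-aware projection. It is not the forecasting routine I actually wrote; the function name, the single-year age buckets, and every number in it are made up purely for illustration.

```python
def project_one_year(pop_by_age, birth_rate_by_age, death_rate_by_age):
    """Advance a population by one year.

    pop_by_age[a] is the number of people aged a, and the last entry is an
    open-ended "oldest" bucket. Rates are per person per year.
    (Hypothetical helper for illustration only.)
    """
    # Births come only from age groups with a nonzero birth rate.
    births = sum(p * b for p, b in zip(pop_by_age, birth_rate_by_age))
    # Apply deaths within each age group.
    survivors = [p * (1 - d) for p, d in zip(pop_by_age, death_rate_by_age)]
    # Everyone who survives gets one year older; newborns enter at age 0.
    next_pop = [births] + survivors[:-1]
    # Survivors of the oldest bucket stay in the oldest bucket.
    next_pop[-1] += survivors[-1]
    return next_pop


# Made-up inputs: four age buckets, only the third one has births.
pop = [100.0, 100.0, 100.0, 100.0]
deaths = [0.0, 0.0, 0.0, 0.5]
current_births = [0.0, 0.0, 0.1, 0.0]
higher_births = [0.0, 0.0, 0.2, 0.0]

year1_current = project_one_year(pop, current_births, deaths)
year1_higher = project_one_year(pop, higher_births, deaths)
year2_higher = project_one_year(year1_higher, higher_births, deaths)
```

In this toy setup, the higher birth rate only changes the age-0 bucket in year one. Those extra newborns contribute nothing to the births in year two, which is exactly why one overall birth rate applied to the whole population would overstate growth.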

So now we're working in three dimensions: US population by race and age. Still sound easy? Well, let me complicate it further.

I can find population data from 2011 by age and race. The racial categories available to me are:

- African American
- American Indian or Alaskan Native
- Asian or Pacific Islander
- Caucasian
- Hispanic

The age categories are divided into 5-year segments starting with 0-4, 5-9, 10-14, etc., and ending with > 100.

Birth rate data are available in the same racial categories from 2009 and in similar age categories starting at age 10 and ending at age 50. But I had to pick them out carefully from a document over 100 pages long. On top of that, the birth rates were calculated from the population of women (not the total population) and so the numbers won't translate exactly to the 2011 data.
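As a rough sketch of the mismatch: a rate based on women has to be turned into a birth count using the number of women in each age group, not the total head count. The function name, the assumption that rates are quoted per 1,000 women, and the female share used here are all my own placeholders, not figures from the data.

```python
def expected_births(pop_in_age_group, female_share, births_per_1000_women):
    """Estimate births from a total-population count and a rate based on women.

    Assumes the rate is expressed per 1,000 women (an assumption for this
    sketch) and that female_share is the fraction of the age group that is
    female (also an assumption, since the population table may not split by sex).
    """
    women = pop_in_age_group * female_share
    return women * births_per_1000_women / 1000


# Made-up numbers, purely illustrative.
births = expected_births(1_000_000, 0.5, 100.0)
rate_per_person = births / 1_000_000
```

Applying the women-based rate directly to the total population would roughly double the births in this made-up example, so the conversion step matters even before worrying about the 2009 versus 2011 mismatch.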

And we can't get abortion data by age and race at all. The best we can do is find the proportion of abortions by race and the proportion of abortions by age. But the categories are different. For race, I can find:

- African American
- Caucasian
- Hispanic
- Other

And for age I can find:

- < 20
- 20-24
- 25 and older

In order to get abortions by age and race, I had to assume that age and race are independent with respect to abortions (probably not true). And then I had to make assumptions about how many of those abortions occurred in 10-14 year olds, 15-19 year olds, 25-29 year olds, etc.^{2}
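Here is a small sketch of those two assumptions: independence (the share for each race and age cell is the product of the two marginal shares) and the proportional split described in footnote 2. The helper names, group labels, and all of the numbers are placeholders I invented for illustration, not the real proportions.

```python
def joint_from_marginals(race_share, age_share):
    """Share of abortions in each (race, age) cell, ASSUMING independence."""
    return {
        (race, age): r * a
        for race, r in race_share.items()
        for age, a in age_share.items()
    }


def split_by_births(total, births_by_age_group):
    """Split a coarse age bucket's total across finer age groups,
    in proportion to the number of births in each finer group."""
    all_births = sum(births_by_age_group.values())
    return {
        group: total * b / all_births
        for group, b in births_by_age_group.items()
    }


# Made-up marginal shares; each set sums to 1.
race_share = {"group A": 0.25, "group B": 0.75}
age_share = {"under 20": 0.25, "20-24": 0.25, "25 and older": 0.5}

joint = joint_from_marginals(race_share, age_share)

# Made-up example: 100 abortions in the "under 20" bucket, spread over
# two finer age groups with hypothetical birth counts of 1 and 3.
under_20_split = split_by_births(100, {"10-14": 1, "15-19": 3})
```

If age and race are not actually independent, every cell in `joint` is off by some unknown amount, which is one of the flaws I'll have to own up to.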

So far, I've succeeded in building the US population, the current US birth rates, and I'm very close to having projected birth rates under the condition of illegal abortion. I even have the forecasting routine written and it produces a lovely graph that I'm really very pleased with. And at this point, after many hours of searching for data, entering tables, writing code, and scratching notes on paper, I've just come to a crucial realization:

*I haven't put together the death data yet!*

*Head, meet desk.*

Okay, if you've made it through all of my rant so far, here are the take-home messages:

- When I finish this analysis, it will be flawed. I will do my best to admit those flaws and explain my assumptions. At the same time, while it won't be perfect, it will be a decent approximation.
- There's a reason that it's so hard to make informed decisions on controversial issues. The data are hard to compile. It's rare to find a data set on a controversial subject that allows you to see all the nuance and character of what you are trying to measure.
- My head hurts.

Happy Wednesday, everyone!

^{1}If that sounds weird, keep in mind that data is a plural noun. The singular form is datum.

^{2}I chose to assign the abortions proportionally to the number of births in each age group. It seems like a relatively safe assumption, but it probably introduces a little bit of bias.