In preparation for Veterans Day 2016, I want to give an example of how I would “Interview My Data,” something I talked about in my blog last week. Today I want to look at some facts and figures related to Veterans Day. In case you’re unfamiliar with the Veterans Day holiday, the History channel has a great summary.
The Department of Veterans Affairs (VA) has lots of data prepared by the National Center for Veterans Analysis and Statistics. From a data science student and open data advocate’s perspective, it’s too bad that most of their data is only in PDF format, but I digress. The choropleth map at the beginning of this post shows the number of veterans living in each U.S. state. The darker the shade, the more veterans live in that state. It’s a little unclear how the number ranges were chosen for each color. For example, the lightest yellow range is 29,825-100,000 people, which seems like a very large range to me. For those of us who are beginners looking at this data, an explanation of how the ranges were chosen would provide helpful context.
This map shows that California, Texas and Florida have the largest veteran populations, which prompts me to ask more questions. Is there something about these three states that naturally attracts veterans (climate, number of nearby military bases, access to great healthcare, etc.)? Or do these states have larger veteran populations because they have larger populations in general?
I’d love to have access to the actual data that was used to make this map, published as an addendum for further analysis and data reuse. You can find the data for 9/30/15 if you click on each state, but since it’s from a different year than the entire U.S. map (2014), the numbers don’t match. Another way to improve the usefulness of the data would be to include historical data prior to 2014 so readers can compare changes over time.
Fortunately, the International Brotherhood of Veterans has published the actual numbers of veterans by state in 2014 on their website. Upon further investigation, we can see California had 1,851,470 veterans, Florida had 1,583,697 and Texas had 1,680,418. All three of these numbers fall within the darkest brown/red on the VA’s map above.
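To make this kind of “interview” a little more concrete, here is a minimal Python sketch of how I might start. It uses only the three 2014 figures quoted above. The function names (rank_by_count, veterans_per_1000) are my own hypothetical helpers, not anything published by the VA. The per-capita helper assumes you bring total state population figures from a separate source, since none are given here.

```python
# 2014 veteran counts for the three states, as quoted in this post.
veterans_2014 = {
    "California": 1_851_470,
    "Florida": 1_583_697,
    "Texas": 1_680_418,
}


def rank_by_count(counts):
    """Return (state, count) pairs sorted from largest to smallest count."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def veterans_per_1000(veterans, population):
    """Veterans per 1,000 residents.

    `population` is an assumption: it must come from another dataset
    (for example, census estimates for the same year).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * veterans / population


# Print the three states from most to fewest veterans.
for state, count in rank_by_count(veterans_2014):
    print(f"{state}: {count:,}")
```

Ranking by raw count answers the “what.” Comparing the per-1,000 rates across states would be a first step toward the question of whether the big three simply have more people in general.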
I think this data is a great example of what I described in my Data Journalism article last week. Publicly available data was collected for a different purpose than the one you might be using it for, so it will take some cleaning before it can answer your questions and tell the data story. Also, this data alone answers the what but not the why: it shows that states have different veteran populations, but not the reasons behind the differences. Next week I’ll return to my mini-series on “Interviewing Your Data.”