One of the key ways data scientists explore data is with the R programming language. Today I want to talk about a short course I took this summer. The first part covered the basics of R programming, including the data structures R supports, control structures, functions, and debugging techniques. The second part covered exploratory data analysis (EDA) and visualization. I learned the rules for building appropriate graphs and how to use the ggplot2 package to make plots that tell the data’s story, which I’ll talk about more next week.
One assignment was to write R code that determines the type of a triangle and calculates Cartesian distance. Note that it’s important to document each step you take in the code so that others can reproduce it. In this case we needed the length of each side of the triangle to determine whether it was equilateral, isosceles, or scalene. The Cartesian distance calculation was translated directly from its mathematical equation, d = sqrt((x2 - x1)^2 + (y2 - y1)^2), into R.
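The assignment's original listing isn't reproduced here, so the following is only a minimal sketch of how the two pieces could fit together. The function names (cartesian_distance, triangle_type), the tolerance argument, and the "not a triangle" guard are my own assumptions, not part of the course material.

```r
# Cartesian (Euclidean) distance between points (x1, y1) and (x2, y2).
cartesian_distance <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# Classify a triangle from its three side lengths.
# 'tol' is a hypothetical tolerance so that sides computed with sqrt()
# can still compare as equal despite floating-point rounding.
triangle_type <- function(side_a, side_b, side_c, tol = 1e-9) {
  sides <- sort(c(side_a, side_b, side_c))

  # Guard: sides must be positive and satisfy the triangle inequality.
  if (any(sides <= 0) || sides[1] + sides[2] <= sides[3]) {
    return("not a triangle")
  }

  same <- function(p, q) abs(p - q) < tol
  n_equal <- sum(same(side_a, side_b), same(side_b, side_c), same(side_a, side_c))

  if (n_equal == 3) {
    "equilateral"
  } else if (n_equal >= 1) {
    "isosceles"
  } else {
    "scalene"
  }
}

# Example: a triangle with vertices (0, 0), (4, 0), and (0, 3).
ab <- cartesian_distance(0, 0, 4, 0)  # 4
bc <- cartesian_distance(4, 0, 0, 3)  # 5
ca <- cartesian_distance(0, 3, 0, 0)  # 3
triangle_type(ab, bc, ca)             # "scalene"
```

Comparing sides with a tolerance rather than == is a design choice on my part: side lengths that come out of sqrt() are rarely exactly equal even when they should be.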
The other assignments included working with data frames loaded from CSV files in R and querying them for specific information. I did exploratory data analysis on US Census data and also looked at NASA weather data to find the number of hurricanes in the state of Michigan in October from 1995 to 2000. Note that in both of these instances the data (“nasaweather”) was already pre-processed and available in an R package. I queried the hurricane data with R’s subset function.
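To give a feel for what such a query looks like, here is a rough sketch that runs without the package installed. The storms_toy data frame and every row in it are made up for illustration, and the column names (name, year, month, lat, long, type) are my assumption about how the storms table in nasaweather is laid out.

```r
# Toy stand-in for the storms table. All rows are invented for illustration;
# the column names are an assumption about the real table's layout.
storms_toy <- data.frame(
  name  = c("StormA", "StormA", "StormB", "StormC"),
  year  = c(1995, 1995, 1996, 1998),
  month = c(10, 10, 9, 10),
  lat   = c(21.1, 30.3, 33.9, 44.3),
  long  = c(-88.5, -87.1, -78.0, -85.6),  # negative values = degrees west
  type  = c("Hurricane", "Hurricane", "Hurricane", "Tropical Storm"),
  stringsAsFactors = FALSE
)

# Strict query: hurricane, October, and an exact position match.
exact_match <- subset(
  storms_toy,
  type == "Hurricane" & month == 10 & lat == 44.3 & long == -85.6
)
nrow(exact_match)  # 0: no row meets all four conditions at once

# Relaxed query: drop the position filter and keep type and month only.
october_hurricanes <- subset(storms_toy, type == "Hurricane" & month == 10)
unique(october_hurricanes$name)  # "StormA"
```

If the package is installed and its table does use these column names, the same subset() calls should work with the real data frame swapped in for storms_toy.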
The subset call filters for storms of type ‘Hurricane’ in month 10 (October) at the latitude and longitude of Michigan (44.3° N, 85.6° W). Note that there are no results when all of these filters are applied. This is probably because the latitude and longitude in the table record where the hurricane initially touched down, which would most likely have been somewhere other than Michigan. To verify this, let’s look up where Hurricanes Opal and Fran initially touched down. A quick search shows that Opal touched down in Florida on October 4, 1995, and Fran touched down in North Carolina on September 5, 1996.
Next week I’ll show you some fun visualizations you can do with R.