Last week I talked about setting up vectors and querying them using R code. Today I want to go into a bit more detail about using R to produce visualizations. The question I was trying to answer: which of the two Major League Baseball leagues, the American or the National, had more wins in the 2015 season?
To begin, I got the regular season standings data from MLB.com. Since I don’t yet know how to scrape data from the website into R, I manually entered it into RStudio by creating one vector of win totals for each league. I named one vector “a” for the American League and the other vector “n” for the National League. Then I made a vector “z” to combine the data from both leagues into a single vector.
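Here is a minimal sketch of that setup. The numbers are placeholders I made up to keep the example short, not the actual 2015 standings; the real vectors would hold one win total per team.

```r
# Placeholder win totals (NOT the real 2015 standings), one value per team
a <- c(90, 85, 81, 76, 70)   # American League
n <- c(98, 88, 80, 72, 65)   # National League

# Combine both leagues into a single vector
z <- c(a, n)

# Quick checks: how many values ended up in z, and total wins per league
length(z)
sum(a)
sum(n)
```

Comparing `sum(a)` and `sum(n)` gives a first numeric answer to the question before any plotting.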
Next, I assigned the name ‘dataList’ to the ‘z’ vector. Since I wanted the box plots to be side by side and in different colors, blue for the American League and lavender for the National League, I used the following code. Note that the same call also assigns labels to the x and y axes as well as the main title. I think it’s pretty cool that you can do so many things with a single boxplot call.
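For anyone following along without the data, here is a rough sketch of this kind of call. It is one way to get side-by-side boxes, not necessarily the exact code from this post: the win values are placeholders, and the `league` grouping factor, the axis labels, and the title text are my assumptions for illustration.

```r
# Placeholder win totals (NOT the real 2015 standings)
a <- c(90, 85, 81, 76, 70)   # American League
n <- c(98, 88, 80, 72, 65)   # National League
z <- c(a, n)
dataList <- z

# Assumed helper: a factor marking which league each value in dataList belongs to
league <- factor(rep(c("American", "National"),
                     times = c(length(a), length(n))))

# One call draws both boxes side by side, colors them, and labels the plot
bp <- boxplot(dataList ~ league,
              col  = c("blue", "lavender"),
              xlab = "League",
              ylab = "Wins",
              main = "MLB Regular Season Wins by League")
```

The formula `dataList ~ league` is what splits the combined vector into one box per league; passing the bare vector alone would draw a single box for all teams.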