Today I’ll go into a bit more detail about the initial findings from the Morning Joe project using histograms and kernel density estimates. The exploratory data visualization process begins by looking at summary statistics of the data in Python. Average monthly rain ranges from 0.2 to 407.7 mm, average monthly temperature from 23.36 to 27.16 °C, average monthly rust from 0.33 to 50%, monthly calculated production from 80.33 to 3832.67 (thousands of 60 kg bags), and bi-weekly futures from 21.98 to 175.18 US$.
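As a rough sketch of the summary-statistics step, assuming the data sits in a pandas DataFrame: the column names and the middle row below are invented for illustration, while the first and last rows reuse the minimum and maximum values quoted above.

```python
import pandas as pd

# Hypothetical stand-in for the project data. Column names and the middle
# row are invented; the first and last rows reuse the extremes quoted above.
coffee = pd.DataFrame({
    "rain_mm": [0.2, 150.0, 407.7],
    "temp_c": [23.36, 25.0, 27.16],
    "rust_pct": [0.33, 12.0, 50.0],
})

# describe() reports count, mean, std, min, quartiles and max per column.
summary = coffee.describe()

# Keep only the rows that give the range of each variable.
ranges = summary.loc[["min", "max"]]
print(ranges)
```

With the real data loaded in place of the placeholder frame, the same two calls give the range of every numeric column at once.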
To understand how the coffee data is distributed, a histogram and kernel density estimate (KDE) were created for each of the variables: rust, production and futures. After a bit of experimentation with other values, the default of 10 bins was kept for each histogram. The KDE color was changed to purple since it contrasts better with the histogram than the default blue.
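A plot of this kind can be sketched roughly as follows. This is not necessarily how the original figures were produced: it assumes matplotlib for the histogram and SciPy's gaussian_kde for the density curve, and the values are synthetic stand-ins rather than the project data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic, made-up values standing in for one variable (e.g. rust in %).
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=6.0, size=200)

fig, ax = plt.subplots()
# density=True rescales the bars so they share a y-axis with the KDE.
counts, edges, _ = ax.hist(values, bins=10, density=True, label="histogram")

# Gaussian KDE evaluated on a grid spanning the observed data.
kde = gaussian_kde(values)
grid = np.linspace(values.min(), values.max(), 200)
ax.plot(grid, kde(grid), color="purple", label="KDE")

ax.set_xlabel("value")
ax.set_ylabel("density")
ax.legend()
```

Calling fig.savefig with a file name (or plt.show() with an interactive backend) displays the result. Changing the bins argument is the quickest way to experiment with other bin counts.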
Note: the original KDEs were removed from the histograms on 4 December 2017 because they were misleading. The KDE curves extended into negative values even though all of the variables’ values are positive.
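Why a KDE can spill below zero is easy to see with a hand-rolled Gaussian KDE. The sketch below is an illustration only: the sample values and the bandwidth are made up, and the helper gaussian_kde_at is a hypothetical name, not part of any library.

```python
import math

def gaussian_kde_at(x, samples, bandwidth):
    """Gaussian kernel density estimate evaluated at a single point x."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / norm

# Made-up, strictly positive sample values (not the project data).
samples = [0.5, 1.0, 2.0, 4.0, 8.0]
bandwidth = 2.0  # arbitrary choice for illustration

# Each sample contributes a bell curve centred on itself, and the tails of
# the curves near zero reach past zero into negative x.
density_below_zero = gaussian_kde_at(-1.0, samples, bandwidth)

print(min(samples) > 0)        # every data point is positive
print(density_below_zero > 0)  # yet the estimate is non-zero at x = -1
```

Because a Gaussian kernel has unbounded tails, any sample close to zero pushes some estimated density onto negative values, which is misleading for quantities such as rust percentage or production that cannot be negative.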
The coffee rust histogram peaks between 5% and 20%. Note that farmers begin coffee rust pest management practices when leaf rust exceeds 3%.
From the coffee production histogram and KDE, coffee production peaks around 1000 and again between 3000 and 4000 (thousands of 60 kg bags). There is a gap in the data between 1000 and 3000 since Papua New Guinea’s production is much lower (80.33 to 97.92 in 1989-1991) than Colombia’s production of 1010.33 in 2011-2012.
Next week I’ll shift focus and talk in more detail about the other visualizations I used to find correlations between the variables.