Morning Joe Viz Project


More than 2 billion cups of coffee (cafe arábica variety) are consumed worldwide each day. The livelihood of 120 million people depends on the coffee supply chain.

Coffee futures are the second most popular commodity traded, worth over $100 billion annually. Coffee futures are standardized, exchange-traded contracts in which the buyer agrees to take delivery from the seller of a specific quantity of coffee (e.g. 10 tonnes) at a predetermined price on a future delivery date.



Coffee rust leads to losses of more than $500 million worldwide. Coffee rust is the main disease that causes plant leaves to turn yellow. It is caused by the Hemileia vastatrix fungus at temperatures from 10-30 °C, and is one of the main diseases of arábica worldwide.


This project will explore the following questions:

  • Does coffee rust affect production and futures prices?
  • Can visualizations be used to help draw conclusions about potential relationships between these variables?
  • Can relationships between rust, production and futures be quantified?

There is no known research on this topic to date.


The project focuses on rust, production and futures data from Brazil, Colombia and Papua New Guinea from 1989-2013.

Data was collected from 10 different English, Portuguese and Spanish sources and the acquisition process is described here.

These three countries produce 48% of the world’s coffee. The coffee maps show the regions in each country where cafe arábica is grown (shown in brown).


Data Assumptions and Limitations

It’s important to note there are gaps in the available data. Brazilian data was from 2005-2006 and 2008-2009. Colombian data was from 1995 and 2011-2013, and the Papua New Guinea data was from 1989-1991.

Since the growing conditions in Brazil, Colombia and Papua New Guinea are similar, it is acceptable to compare data between these three countries. Percent rust on plants was calculated from yearly statistics. Futures are worldwide prices rather than prices by country. Finally, I assume the public data is accurate.

Research Questions

  • Is there a link between coffee rust and the amount of coffee produced?
  • Is there a link between coffee rust and futures prices?
  • Can links be quantified?

The dataset includes 337 observations with the following features:

  • Rainfall
  • Temperature
  • Rust percent
  • Production amount
  • Futures prices

Data visualizations help decide whether to accept the following hypotheses:

  • More rain = more coffee rust
  • Higher temperatures = more coffee rust
  • More coffee rust = less production
  • More coffee rust = lower futures prices
  • More coffee production = lower futures prices

Analysis and Results

During the exploratory data analysis phase, summary statistics were calculated for each variable.


Rain average was 183.55 cm per month with a range of 0.2-407.7 inches. Temperature average was 25.12 °C with a range of 23.36-27.16 °C. Rust average was 16.41% with a range of 0.33-50%. Production average was 1731.40 thousand 60 kg bags of beans per month with a range of 80.33-3832.67. Futures average was $93.19 USD with a range of $21.98-$175.18 USD.
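Summary statistics like these can be reproduced with a few lines of Python. The sketch below is illustrative only: the `summarize` helper, the column names and the sample values are hypothetical and are not the project's data or code.

```python
from statistics import mean


def summarize(values):
    """Return the mean (rounded to 2 decimals), minimum and maximum."""
    return {"mean": round(mean(values), 2), "min": min(values), "max": max(values)}


# Hypothetical sample values, not the project's observations.
sample = {
    "rust_percent": [0.5, 12.0, 20.0, 33.5],
    "futures_usd": [22.0, 95.0, 140.0, 175.0],
}

for name, values in sample.items():
    print(name, summarize(values))
```

Printing the mean together with the range makes it easier to spot variables whose units or scales need checking before they are plotted together.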

The visualization process began with three histograms: one for production, one for rust and one for futures. Coffee production peaks at 1000 and again at 3000-4000 (thousand 60 kg bags). Coffee rust peaks between 5-20%.
For the futures histogram, a bin size of 30 was chosen since it best visually encoded the variation in the data. Coffee futures peak around $30 and $140 USD. The gap in the histogram follows the gap in the data, with futures ranging from $20-40 in 1989-1991 and 1995, and from $78-$175.18 in 2005-2013. This gap is probably caused by natural increases in futures prices between 1995 and 2005, but this visualization does not explain why this happens.
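To make the binning step concrete, here is a minimal sketch of how values fall into histogram bins. It assumes that "bin size of 30" means a bin width of 30 USD; it could equally mean 30 bins, in which case the width would be derived from the data range. The `histogram` helper and the prices are hypothetical.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical futures prices in USD, not the project's data.
futures = [21.98, 35.0, 38.5, 78.0, 95.0, 130.0, 140.0, 145.0, 175.18]

print(histogram(futures, 30))
```

Bins that receive no values do not appear in the result, which is how a gap in the data shows up as a gap in the histogram.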


The next step in the analysis was making a correlation matrix. A correlation matrix visualization quantifies how strongly each variable is correlated with every other variable.

Positive correlation values mean variables change in the same direction. Negative correlation values mean variables change in opposite directions. The intensity scale on the right shows how strongly the variables are correlated with each other. The darkest value at the top of the scale, 0.8, is the strongest positive correlation, while the darkest value at the bottom, -0.8, is the strongest negative correlation.
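The values in such a matrix can be computed as follows. This sketch assumes the Pearson correlation coefficient, which the text does not specify; the helper names and the sample columns are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Correlate every column with every other column, rounded to 2 decimals."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 2) for b in names}
            for a in names}


# Hypothetical columns chosen so the correlations are easy to verify by eye.
data = {
    "rust": [1.0, 2.0, 3.0, 4.0],
    "production": [2.0, 4.0, 6.0, 8.0],
    "futures": [8.0, 6.0, 4.0, 2.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, row)
```

Every value lies between -1 and 1, and the diagonal is always 1 because each variable is perfectly correlated with itself.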


The correlation matrix can be used to examine the original hypotheses:

  • More rain = more coffee rust –> Fail to accept
  • Higher temperatures = more coffee rust –> Fail to accept
  • More coffee rust = less production –> Fail to accept
  • More coffee rust = lower futures prices –> Fail to accept
  • More coffee production = lower futures prices –> Accept

Line plots show relationships between the variables. Rain and temperature are not included since the correlation matrix showed they are not meaningfully correlated with rust or futures. From the first two line plots, rust and production, and rust and futures, may be correlated with each other, but it’s difficult to say.




From the line plot of rust versus futures, it appears there may be a positive correlation between the variables when rust is less than 50%. The range of data values for futures (80-175) is much larger than the range for rust (0-50). Logarithmic scales were not used for either visualization to avoid misleading the reader. For rust and futures, a scatter plot shows the data better than a line plot, but the drawback of this visualization is that the relationship between the variables is difficult to see.


The final part of the data analysis process was fitting linear regressions. A linear regression plot of rust versus production shows a positive correlation: an increase in rust corresponds to an increase in production. I fail to accept the original hypothesis that an increase in rust means a decrease in production.


This visualization shows production and futures are positively correlated with each other, meaning an increase in production corresponds to an increase in futures. I fail to accept the original hypothesis that an increase in production would mean a decrease in futures.
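The sign of the slope is what decides each of these hypotheses. As a minimal sketch, assuming an ordinary least-squares fit (the text does not name the fitting method), the slope and intercept can be computed like this. The `linear_fit` helper and the sample values are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical production (thousand 60 kg bags) and futures (USD) values.
production = [500.0, 1000.0, 1500.0, 2000.0]
futures = [60.0, 80.0, 100.0, 120.0]

slope, intercept = linear_fit(production, futures)
print(f"slope={slope:.4f}, intercept={intercept:.2f}")
```

A positive slope means the two variables rise together, which is the pattern that contradicts a hypothesis predicting a decrease.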


A linear regression did not fit the rust versus futures data well, so I used a polynomial fit. There is a positive correlation between the variables, which is the opposite of what we saw in the correlation matrix. Let’s examine this more carefully to see if this original hypothesis should be rejected.


The regression of rust versus futures shows the opposite correlation from the correlation matrix. Rather than a linear regression that did not fit the data well, I used a polynomial fit that minimizes the squared error, and calculated the root-mean-square deviation and slope. If we take another look at the data, we see some large differences between the x-variable, rust, and the y-variable, futures. These variations explain the difference between the regression plot and the correlation matrix. The original hypothesis that more rust decreases futures must also be rejected.
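A polynomial least-squares fit and its root-mean-square deviation can be sketched in plain Python as below. The polynomial degree used in the project is not stated, so degree 2 is an assumption here, and the helper names and data are hypothetical. The fit solves the normal equations directly, which is adequate for low degrees and small data sets.

```python
from math import sqrt


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns c with y ~ c[0] + c[1]*x + ..."""
    m = degree + 1
    # Normal equations A c = b: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def rmse(coeffs, xs, ys):
    """Root-mean-square deviation between fitted and observed values."""
    return sqrt(sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical rust (%) and futures (USD) values; degree 2 is an assumption.
rust = [0.0, 10.0, 20.0, 30.0, 40.0]
futures = [80.0, 95.0, 120.0, 150.0, 175.0]

coeffs = polyfit(rust, futures, 2)
print("coefficients:", coeffs)
print("RMSE:", rmse(coeffs, rust, futures))
```

A lower RMSE for the polynomial than for a straight line on the same data is one way to justify choosing the polynomial fit.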


Based on the visualization analysis of the correlation matrix, line and scatter plots and regressions, the original hypotheses can be accepted or not.

  • More rain increases rust. –> Fail to accept
  • Higher temperatures increase rust. –> Fail to accept
  • More rust decreases production. –> Fail to accept
  • More production decreases futures. –> Fail to accept
  • More rust decreases futures. –> Fail to accept

Research Questions Answered

  • Is there a link between coffee rust and the amount of coffee produced? Yes
  • Is there a link between coffee rust and futures prices? Yes
  • Can links be quantified? Yes


Coffee rust varies in severity, with levels from 1-6. The higher the percentage of rust that covers a leaf, the higher the level of rust. Leaves at level 6 have the most coffee rust. The D3 visual is tangentially related to the project since it describes rust level rather than rust percent, which is why it was added as an appendix.

In this boxplot visualization, you will see minimum, average and maximum values as you hover over a rust intensity level.

Rust Severity Level vs. Futures Box Plot – HTML File
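The per-level values shown on hover can be derived by grouping futures prices by rust severity level. The sketch below is illustrative: the `stats_by_level` helper and the records are hypothetical and do not reproduce the D3 visual's data.

```python
from collections import defaultdict
from statistics import mean


def stats_by_level(records):
    """Group (level, price) pairs by level and report min, mean and max."""
    groups = defaultdict(list)
    for level, price in records:
        groups[level].append(price)
    return {
        level: {"min": min(prices), "mean": mean(prices), "max": max(prices)}
        for level, prices in sorted(groups.items())
    }


# Hypothetical (severity level, futures price in USD) pairs.
records = [(1, 90.0), (1, 110.0), (3, 120.0), (3, 140.0), (3, 160.0), (6, 175.0)]

for level, stats in stats_by_level(records).items():
    print(level, stats)
```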


  • Alves, Marcelo de Carvalho. Cafeicultura De Precisão Na Proteção de Plantas: Monitoramento de Pragas E Doenças. Expocafe 2012 slide 28. Accessed 17 October 2017.
  • Avelino, J., H. Zelaya, A. Merlo, A. Pineda, M. Ordoñez and S. Savary. The intensity of a coffee rust epidemic is dependent on production situations. Ecological Modeling, 197 (2006): 431-447.
  • Avelino, J., Cristancho, M., Georgiou, S. et al. The coffee rust crisis in Colombia and Central America (2008-2013): impacts, plausible causes and proposed solutions. Food Sec. (2015) 7: 303.
  • Bock, K.R., 1962b. Dispersal of uredospores of Hemileia vastatrix under field conditions. Trans. Br. Mycol. Soc. 45, 63-74.
  • Cintra, M., C.A.A. Meira, M.C. Monard, H.A. Camargo and L.H.A. Rodrigues. The Use of Fuzzy Decision Trees for Coffee Rust Warning in Brazilian Crops. 2011 11th International Conference on Intelligent Systems Design and Applications. DOI:10.1109/ISDA.2011.6121847. Accessed 17 October 2017.
  • Color-Hex Color Palettes. Accessed 21 November 2017.
  • Commodity Futures Price Quotes for Coffee. Accessed 11 September 2017.
  • Custudio, Adriano Augusto de Paiva et al. Comparison and validation of diagrammatic scales for brown eye spots in coffee tree leaves. Ciênc. agrotec., Lavras, v. 35, n. 6, p. 1067-1076, Dec. 2011. Accessed 3 October 2017.
  • Cunha, R.L; Mendes, A.N.G.; Chalfoun, S.M. Controle químico da ferrugem do cafeeiro (Coffea arabica L.) e seus efeitos na produção e preservação de enfolhamento. Ciência e Agrotecnologia, v. 28, n.5, p.990-996, 2004.
  • De Carvalho Alves, M., da Silva, F.M., Sanches, L. et al. Geospatial analysis of ecological vulnerability of coffee agroecosystems in Brazil. Appl Geomat (2013) 5: 87.
  • Economics of coffee. Wikipedia: Accessed 3 October 2017.
  • Escobar, Kimberly. “Coffee Rust is killing Latin American Plants.” Accessed 18 October 2017.
  • Galli, F.; Carvalho, P.C.T. Doenças do cafeeiro. In: GALLI, F. (Coord.). Manual de fitopatologia. São Paulo: Editora Ceres, 1980. 587p. V.2. p.128-140.
  • Hemileia Vastatrix. Wikipedia article. Accessed 22 October 2017.
  • Japiassú, L. B, A. W.R. Garcia, A.E. Miguel, M.S. A. Mendonça, C.H.S. Carvalho, R.A. Ferreira. Influencia da Carga Pendente, Do Espaçamento e de Fatores Climáticos no Desenvolvimento da Ferrugem do Cafeeiro. Estação de Avisos Fitossanitários Do Mapa/Fundação Procafé. Accessed 10 October 2017.
  • Jaramillo J., Muchugu E., Vega F.E., Davis A., Borgemeister C., Chabi-Olaye A. (2011). Some Like It Hot: The Influence and Implications of Climate Change on Coffee Berry Borer (Hypothenemus hampei) and Coffee Production in East Africa. PLoS ONE 6(9): e24528.
  • Kim, K. and W. Lee. Stock market prediction using artificial neural networks with optimal feature transformation. Neural Computing and Applications (2004), 13: 225-260. doi: 10.1007/s00521-004-0428-x.
  • Lamouroux, N., F. Pellegrin, D. Nandris and F. Kohler. The Coffea arabica Fungal Pathosystem in New Caledonia: Interactions at Two Different Spatial Scales. J. Phytopathology 143, 403-413 (1995).
  • Luaces, O., L. H.A. Rodrigues, C. A.A. Meira, A. Bahamonde. Using nondeterministic learners to alert on coffee rust disease. Expert Systems with Applications 38(2011): 14276-14283.
  • Magrach A, Ghazoul J (2015). Climate and Pest-Driven Geographic Shifts in Global Coffee Production: Implications for Forest Cover, Biodiversity and Carbon Storage. PLoS ONE 10(7): e0133071. doi:10.1371/journal.pone.0133071.
  • Montage, D. Algorithmic Trading of Futures via Machine Learning. Stanford University CS229 course material. Accessed 11 September 2017.
  • Paulo, E.M., Montes, S.M.N.M., & Fischer, I.H.. (2013). Progresso temporal da ferrugem alaranjada em cultivares de cafeeiro no Oeste de São Paulo. Arquivos do Instituto Biológico, 80(1), 59-64.
  • Quinones, P., Javier, A. Effects of Daylength and Soil Humidity on the Flowering of Coffee Coffea Arabica L. in Colombia. Rev.Fac.Nal.Agr.Medellín, Medellín, v. 64, n. 1, p. 5745-5754, June 2011. Accessed 3 October 2017.
  • Perez-Ariza, CB, Nicholson, AE & Flores, MJ 2012, Prediction of coffee rust disease using Bayesian networks. in A Cano, M Gomez-Olmedo & TD Nielsen (eds), Proceedings of the Sixth European Workshop on Probabilistic Graphical Models. University of Granada, Granada Spain, pp. 259 – 266, European Workshop on Probabilistic Graphical Models (PGM), Granada Spain, 1 January.
  • Seven Things You Must Know about Coffee Futures. Accessed 12 September 2017.
  • Yap, Jamie. Stat 531 Midterm Project. Accessed 27 November 2017.
  • Zambolim, L.; Vale, F.X.R.; Pereira, A.A.; Chaves, G.M. Café (Coffea arábica L.): Controle de doenças causadas por fungos, bactérias e vírus. In: VALE, F.X.R.; ZAMBOLIM, L. (Ed.). Controle de doenças de plantas. Viçosa: Suprema Gráfica e Editora, 1997. pp.83-180.