Today I want to describe the results of my final project for the Applied Data Science course at Indiana University. The scope of this research is a bit different from the other project on coffee futures, rust and production that I just finished discussing for my data visualization course.
Coffee (arábica variety) is the second-largest traded commodity worldwide, with about $100 billion in volume traded annually. Coffee futures are standardized, exchange-traded contracts in which the buyer agrees to take delivery from the seller of a specific quantity of coffee (e.g., 10 tonnes) at a predetermined price on a future delivery date. Coffee futures are traded on the New York Stock Exchange (NYSE) from 9:30 a.m. to 1:30 p.m. daily.
My two hypotheses are: (1) “Opening Price”, “High Price”, “Low Price” and “Closing Price” are related variables for coffee futures; and (2) historical daily data on coffee futures can be used to predict the future daily “Closing Price” more accurately.
Historical coffee futures data from January 1, 2010 to November 15, 2017 was obtained manually from the “Historical Data” section of the investing.com website.
Initially, linear regression, decision tree regression, and AdaBoost models were chosen because they are transparent and easy to interpret. Furthermore, since this is a regression problem rather than a classification or clustering problem, these models were a suitable choice for the analysis. The pre-processed CSV file was split 80/20 into training and test data sets, and each of the models was fitted. More than 2,800 observations from this time period were included in the analysis.
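The split-and-fit step described above might look roughly like the sketch below. It assumes scikit-learn, which the post does not name, and it uses synthetic prices in place of the investing.com CSV, so the printed scores are not the project's results. The shuffled split, the fixed random seeds, and the default hyperparameters (including `alpha=1.0` for ridge) are also assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the pre-processed CSV (NOT the investing.com data):
# an upward drift plus a random walk, with open/high/low derived from the close.
rng = np.random.default_rng(0)
n = 500
close = 100 + np.linspace(0, 100, n) + np.cumsum(rng.normal(0, 1.0, n))
open_ = np.concatenate(([close[0]], close[:-1]))  # open = previous day's close
high = np.maximum(open_, close) + np.abs(rng.normal(0, 0.5, n))
low = np.minimum(open_, close) - np.abs(rng.normal(0, 0.5, n))

X = np.column_stack([open_, high, low])  # features: open, high, low
y = close                                # target: closing price

# 80/20 train/test split; shuffling and the fixed seed are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Ridge Regression": Ridge(alpha=1.0),
}

# Fit each model on the training set and record R^2 on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Because the rows are daily prices, a chronological split (training on earlier dates, testing on later ones) is a common alternative to a shuffled split; passing `shuffle=False` to `train_test_split` would do that here.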
Each of the models had a high goodness of fit, measured by the R² value: Linear Regression 0.996, Decision Tree Regression 0.799, AdaBoost 0.749 and Ridge Regression 0.996. The best-fitting models were Linear Regression and Ridge Regression. With the goodness of fit established for each model, the predicted values were calculated.
| Algorithm Type | Goodness of Fit | Prediction Error |
| Decision Tree Regression | 0.799 | 0.00000542 |
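For readers unfamiliar with the two columns in the table, here is a minimal sketch of how a goodness-of-fit value (R²) and a prediction error can be computed from actual and predicted closing prices. The post does not say which error metric was used, so mean squared error is an assumption here, and the prices are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical closing prices and predictions, for illustration only.
actual = [120.0, 125.0, 130.0, 135.0]
predicted = [121.0, 124.0, 131.0, 134.0]

r2 = r_squared(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(f"R^2 = {r2:.3f}, MSE = {mse:.3f}")  # R^2 = 0.968, MSE = 1.000
```

An R² close to 1 means the predictions explain most of the variation in the closing price, while the error term measures how far off the predictions are on average, in squared price units.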
The hypothesis that “Opening Price”, “High Price” and “Low Price” are related to the “Closing Price” can be accepted. The hypothesis that historical “Opening Price”, “High Price” and “Low Price” data can be used to predict the daily “Closing Price” was also supported, most strongly by the linear and ridge regression models, which both reached an R² of 0.996. The objectives of the project were therefore achieved.
Recommendations include implementing the linear regression, AdaBoost or ridge regression algorithms to predict coffee futures prices.