ODI Cricket Prediction Shiny App

TLDR: I made an app that predicts outcomes of ODI cricket matches - find it here.

Following on from my previous play with visualising the cricket data, I wanted to apply some of the machine learning tools I learnt in my first-year classes at UCLA and see how well I could do at building a predictive model for ODI cricket results.

I excluded a little more data this time. ODI cricket has evolved a lot in recent years, so to help predict current games I only considered matches played in 2010 or later.
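As a rough illustration of that cutoff, here is a minimal Python sketch. The record layout and field names (`date`, `team_a`, `team_b`, `winner`) and the three example rows are made up for illustration; they are not the real dataset or the code behind the app.

```python
from datetime import date

# Hypothetical match records; field names and values are illustrative only.
matches = [
    {"date": date(2008, 3, 4), "team_a": "A", "team_b": "B", "winner": "A"},
    {"date": date(2010, 1, 5), "team_a": "A", "team_b": "C", "winner": "C"},
    {"date": date(2015, 7, 19), "team_a": "B", "team_b": "C", "winner": "B"},
]

CUTOFF_YEAR = 2010


def recent_matches(records, cutoff_year=CUTOFF_YEAR):
    """Keep only matches played in cutoff_year or later."""
    return [m for m in records if m["date"].year >= cutoff_year]


recent = recent_matches(matches)
```

Keeping the cutoff as a named constant makes it easy to check how sensitive the results are to the choice of year.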

I split the data into training and testing sets and tried a large array of models. For each model I tried various terms, including interaction terms, to find the best performer in terms of prediction error. A non-exhaustive list follows:

Logistic/Probit/Cauchit Regression
Logistic Lasso Regression
Logistic Ridge Regression
Random Forest
Naive Bayes Classifier
Support Vector Machine (SVM)
Kernel Regularised Regression
Random Forest Ensemble
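The comparison loop can be sketched roughly as below, in plain Python. Everything here is an assumption for illustration: the data is synthetic, the helper names (`train_test_split`, `error_rate`, `best_model`) are my own, and the two "models" are fixed rules standing in for the fitted models in the list above. In a real run each candidate would be fitted on `train` before being scored on `test`.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(predict, rows):
    """Fraction of rows where predict(features) differs from the label."""
    wrong = sum(1 for features, label in rows if predict(features) != label)
    return wrong / len(rows)


def best_model(candidates, test_rows):
    """Return the (name, error) pair with the lowest test error."""
    scored = [(name, error_rate(fn, test_rows)) for name, fn in candidates.items()]
    return min(scored, key=lambda pair: pair[1])


# Synthetic toy data: (features, label); label is 1 if the first team won.
rng = random.Random(1)
rows = []
for _ in range(200):
    rank_diff = rng.uniform(-1, 1)   # made-up strength difference
    home = rng.choice([0, 1])        # made-up home-advantage flag
    label = 1 if rank_diff + 0.2 * home + rng.gauss(0, 0.3) > 0 else 0
    rows.append(((rank_diff, home), label))

train, test = train_test_split(rows)

# Stand-in predictors; real candidates would be fitted on `train` first.
candidates = {
    "always_first_team": lambda x: 1,
    "higher_ranked_wins": lambda x: 1 if x[0] > 0 else 0,
}
name, err = best_model(candidates, test)
print(name, err)
```

Scoring every candidate on the same held-out rows is what makes the prediction errors comparable across very different model families.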

The best model in the end was ridge regularized logistic regression on a subset of the available terms. Notably, the more flexible kernel regularized least squares did not improve on it, suggesting that the issue was not a mis-specified model but inherently noisy data. This seems anecdotally reasonable.
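For readers unfamiliar with the ridge penalty, here is a small self-contained Python sketch of ridge regularized logistic regression fitted by plain gradient descent. It is not the code behind the app: the function names, the penalty scaling, the learning rate and the six toy data points are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + (lam / 2n) * ||w||^2.

    The intercept b is left unpenalised.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The lam * wj term is the ridge penalty pulling weights toward zero.
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the label is 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data (made up): first feature separates the classes, second is noise.
X = [(-1.0, 0), (-0.5, 1), (0.2, 0), (0.8, 1), (-0.3, 0), (0.6, 0)]
y = [0, 0, 1, 1, 0, 1]

w_free, b_free = fit_ridge_logistic(X, y, lam=0.0)
w_ridge, b_ridge = fit_ridge_logistic(X, y, lam=5.0)

norm_free = math.sqrt(sum(v * v for v in w_free))
norm_ridge = math.sqrt(sum(v * v for v in w_ridge))
print(norm_free, norm_ridge)
```

The penalised fit ends up with smaller coefficients than the unpenalised one, which is the mechanism that helps when the data is noisy: the model is discouraged from leaning too hard on any single term.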