This is one of Kaggle's getting-started, no-prize competitions, based on the handwritten-digit MNIST data. A summary of my approach:

- Reserve the last 7000 data samples for testing (i.e., as a held-out reference for computing prediction accuracy).
- Try SVM with PCA, reducing the number of features from 784 down to 80, on a small subset of the data first to see whether it works and how fast it runs.

I actually did plenty of
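The PCA-then-SVM step can be sketched as a scikit-learn pipeline. This is a minimal, hypothetical sketch: it uses random arrays as a stand-in for Kaggle's `train.csv` (42,000 rows of 784 pixel columns plus a label column), and it holds out the last rows proportionally rather than the full 7000.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical stand-in for the Kaggle training data; real code would
# load train.csv and split pixel columns from the label column.
rng = np.random.default_rng(0)
X = rng.random((1000, 784))          # 784 pixel features per sample
y = rng.integers(0, 10, size=1000)   # digit labels 0-9

# Reserve the last samples for testing (the write-up reserves the last
# 7000 of the full dataset; the toy data here uses the last 200).
X_train, X_test = X[:-200], X[-200:]
y_train, y_test = y[:-200], y[-200:]

# PCA from 784 features down to 80, then an RBF-kernel SVM.
model = make_pipeline(PCA(n_components=80), SVC())
model.fit(X_train, y_train)

# Prediction accuracy on the held-out samples.
acc = (model.predict(X_test) == y_test).mean()
print(f"held-out accuracy: {acc:.3f}")
```

On random labels the accuracy is near chance (~0.1); on the real MNIST data the same pipeline is a reasonable first baseline, and fitting on a small subset first is a cheap way to check the runtime before training on everything.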