
[1] Agresti, A. *Categorical Data Analysis*, 2nd Ed. Hoboken, NJ: John Wiley & Sons, Inc., 2002.

[2] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.”
*Journal of Machine Learning Research*. Vol. 1, 2000, pp. 113–141.

[3] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised Classification Learning Algorithms.”
*Neural Computation*, Vol. 11, No. 8, 1999, pp. 1885–1892.

[4] Blackard, J. A., and D. J. Dean. "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables." *Computers and Electronics in Agriculture*, Vol. 24, Issue 3, 1999, pp. 131–151.

[5] Bottou, L., and C.-J. Lin. “Support Vector Machine Solvers.”
*Large Scale Kernel Machines* (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.). Cambridge, MA: MIT Press, 2007.

[6] Bouckaert, R. “Choosing Between Two Learning Algorithms Based on Calibrated Tests.”
*International Conference on Machine Learning*, pp. 51–58, 2003.

[7] Bouckaert, R. and E. Frank. “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms.”
In *Advances in Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference*, 2004, pp. 3–12.

[8] Breiman, L. "Bagging Predictors." *Machine Learning*, Vol. 24, No. 2, 1996, pp. 123–140.

[9] Breiman, L. "Random Forests." *Machine Learning*, Vol. 45, 2001, pp. 5–32.

[10] Breiman, L. `https://www.stat.berkeley.edu/~breiman/RandomForests/`

[11] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. *Classification and Regression Trees.* Boca Raton, FL: Chapman & Hall, 1984.

[12] Cristianini, N., and J. Shawe-Taylor. *An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods*. Cambridge, UK: Cambridge University Press, 2000.

[13] Dietterich, T. “Approximate statistical tests for comparing supervised classification learning algorithms.”
*Neural Computation*, Vol. 10, No. 7, 1998, pp. 1895–1923.

[14] Dietterich, T., and G. Bakiri. “Solving Multiclass Learning Problems Via Error-Correcting Output Codes.”
*Journal of Artificial Intelligence Research*. Vol. 2, 1995, pp. 263–286.

[15] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.”
*IEEE Transactions on Pattern Analysis and Machine Intelligence*. Vol. 32, Issue 1, 2010, pp. 120–134.

[16] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.”
*Pattern Recognition Letters*. Vol. 30, Issue 3, 2009, pp. 285–297.

[17] Fan, R.-E., P.-H. Chen, and C.-J. Lin. “Working set selection using second order information for training support vector machines.”
*Journal of Machine Learning Research*, Vol. 6, 2005, pp. 1889–1918.

[18] Fagerland, M. W., S. Lydersen, and P. Laake. “The McNemar Test for Binary Matched-Pairs Data: Mid-p and Asymptotic Are Better Than Exact Conditional.”
*BMC Medical Research Methodology*. Vol. 13, 2013, pp. 1–8.

[19] Freund, Y. "A more robust boosting algorithm." arXiv:0905.2138v1, 2009.

[20] Freund, Y. and R. E. Schapire. "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting." *Journal of Computer and System Sciences*, Vol. 55, 1997, pp. 119–139.

[21] Friedman, J. "Greedy function approximation: A gradient boosting machine." *Annals of Statistics,* Vol. 29, No. 5, 2001, pp. 1189–1232.

[22] Friedman, J., T. Hastie, and R. Tibshirani. "Additive logistic regression: A statistical view of boosting." *Annals of Statistics*, Vol. 28, No. 2, 2000, pp. 337–407.

[23] Hastie, T., and R. Tibshirani. “Classification by Pairwise Coupling.”
*Annals of Statistics*. Vol. 26, Issue 2, 1998, pp. 451–471.

[24] Hastie, T., R. Tibshirani, and J. Friedman. *The Elements of Statistical Learning*, second edition. New York: Springer, 2008.

[25] Ho, C. H. and C. J. Lin. “Large-Scale Linear Support Vector Regression.”
*Journal of Machine Learning Research*, Vol. 13, 2012, pp. 3323–3348.

[26] Ho, T. K. "The random subspace method for constructing decision forests." *IEEE Transactions on Pattern Analysis and Machine Intelligence,* Vol. 20, No. 8, 1998, pp. 832–844.

[27] Hsieh, C. J., K. W. Chang, C. J. Lin, S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent Method for Large-Scale Linear SVM.”
*Proceedings of the 25th International Conference on Machine Learning, ICML ’08*, 2008, pp. 408–415.

[28] Hsu, C.-W., C.-C. Chang, and C.-J. Lin. *A Practical Guide to Support Vector Classification*. Available at `https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf`.

[29] Hu, Q., X. Che, L. Zhang, and D. Yu. “Feature Evaluation and Selection Based on Neighborhood Soft Margin.”
*Neurocomputing*. Vol. 73, 2010, pp. 2114–2124.

[30] Kecman, V., T.-M. Huang, and M. Vogt. “Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance.” In *Support Vector Machines: Theory and Applications*. Edited by Lipo Wang, pp. 255–274. Berlin: Springer-Verlag, 2005.

[31] Kohavi, R. “Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid.”
*Proceedings of the Second International Conference on Knowledge Discovery and Data Mining*, 1996.

[32] Lancaster, H.O. “Significance Tests in Discrete Distributions.”
*Journal of the American Statistical Association*, Vol. 56, No. 294, 1961, pp. 223–234.

[33] Langford, J., L. Li, and T. Zhang. “Sparse Online Learning Via Truncated Gradient.”
*Journal of Machine Learning Research*, Vol. 10, 2009, pp. 777–801.

[34] Loh, W.Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.”
*Statistica Sinica*, Vol. 12, 2002, pp. 361–386.

[35] Loh, W.Y. and Y.S. Shih. “Split Selection Methods for Classification Trees.”
*Statistica Sinica*, Vol. 7, 1997, pp. 815–840.

[36] McNemar, Q. “Note on the Sampling Error of the Difference Between Correlated Proportions or Percentages.”
*Psychometrika*, Vol. 12, No. 2, 1947, pp. 153–157.

[37] Meinshausen, N. “Quantile Regression Forests.”
*Journal of Machine Learning Research*, Vol. 7, 2006, pp. 983–999.

[38] Mosteller, F. “Some Statistical Problems in Measuring the Subjective Response to Drugs.”
*Biometrics*, Vol. 8, No. 3, 1952, pp. 220–226.

[39] Nocedal, J. and S. J. Wright. *Numerical Optimization*, 2nd ed., New York: Springer, 2006.

[40] Schapire, R. E. et al. "Boosting the margin: A new explanation for the effectiveness of voting methods." *Annals of Statistics,* Vol. 26, No. 5, 1998, pp. 1651–1686.

[41] Schapire, R., and Y. Singer. "Improved boosting algorithms using confidence-rated predictions." *Machine Learning,* Vol. 37, No. 3, 1999, pp. 297–336.

[42] Shalev-Shwartz, S., Y. Singer, and N. Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for SVM.”
*Proceedings of the 24th International Conference on Machine Learning, ICML ’07*, 2007, pp. 807–814.

[43] Seiffert, C., T. Khoshgoftaar, J. Van Hulse, and A. Napolitano. "RUSBoost: Improving classification performance when training data is skewed." *19th International Conference on Pattern Recognition,* 2008, pp. 1–4.

[44] Warmuth, M., J. Liao, and G. Rätsch. "Totally corrective boosting algorithms that maximize the margin." *Proc. 23rd Int’l. Conf. on Machine Learning, ACM,* New York, 2006, pp. 1001–1008.

[45] Wu, T. F., C. J. Lin, and R. Weng. “Probability Estimates for Multi-Class Classification by Pairwise Coupling.”
*Journal of Machine Learning Research*. Vol. 5, 2004, pp. 975–1005.

[46] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. “Sparse Reconstruction by Separable Approximation.”
*IEEE Transactions on Signal Processing*, Vol. 57, No. 7, 2009, pp. 2479–2493.

[47] Xiao, Lin. “Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization.”
*Journal of Machine Learning Research*, Vol. 11, 2010, pp. 2543–2596.

[48] Xu, Wei. “Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent.”
*CoRR*, abs/1107.2490, 2011.

[49] Zadrozny, B. “Reducing Multiclass to Binary by Coupling Probability Estimates.”
*NIPS 2001: Proceedings of Advances in Neural Information Processing Systems 14*, 2001, pp. 1041–1048.

[50] Zadrozny, B., J. Langford, and N. Abe. “Cost-Sensitive Learning by Cost-Proportionate Example Weighting.”
*Third IEEE International Conference on Data Mining*, 2003, pp. 435–442.

[51] Zhou, Z.-H. and X.-Y. Liu. “On Multi-Class Cost-Sensitive Learning.”
*Computational Intelligence*. Vol. 26, Issue 3, 2010, pp. 232–257.