References

Agresti, Alan. 2003. Categorical Data Analysis. Wiley Series in Probability and Statistics. Wiley.

Allaire, JJ. 2018. Tfruns: Training Run Tools for ’Tensorflow’. https://CRAN.R-project.org/package=tfruns.

Allaire, JJ, and François Chollet. 2019. Keras: R Interface to ’Keras’. https://keras.rstudio.com.

Banfield, Jeffrey D, and Adrian E Raftery. 1993. “Model-Based Gaussian and Non-Gaussian Clustering.” Biometrics. JSTOR, 803–21.

Bengio, Yoshua, Li Yao, Guillaume Alain, and Pascal Vincent. 2013. “Generalized Denoising Auto-Encoders as Generative Models.” In Advances in Neural Information Processing Systems, 899–907.

Berge, Laurent, Charles Bouveyron, and Stephane Girard. 2018. HDclassif: High Dimensional Supervised Classification and Clustering. https://CRAN.R-project.org/package=HDclassif.

Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (Feb): 281–305.

Beygelzimer, Alina, Sham Kakade, and John Langford. 2006. “Cover Trees for Nearest Neighbor.” In Proceedings of the 23rd International Conference on Machine Learning, 97–104. ACM.

Beygelzimer, Alina, Sham Kakadet, John Langford, Sunil Arya, David Mount, and Shengqiao Li. 2019. FNN: Fast Nearest Neighbor Search Algorithms and Applications. https://CRAN.R-project.org/package=FNN.

Biecek, Przemyslaw. 2019. DALEX: Descriptive mAchine Learning Explanations. https://CRAN.R-project.org/package=DALEX.

Bourlard, Hervé, and Yves Kamp. 1988. “Auto-Association by Multilayer Perceptrons and Singular Value Decomposition.” Biological Cybernetics 59 (4-5). Springer: 291–94.

Bouveyron, Charles, and Camille Brunet-Saumard. 2014. “Model-Based Clustering of High-Dimensional Data: A Review.” Computational Statistics & Data Analysis 71. Elsevier: 52–78.

Bouveyron, Charles, Stéphane Girard, and Cordelia Schmid. 2007. “High-Dimensional Data Clustering.” Computational Statistics & Data Analysis 52 (1). Elsevier: 502–19.

Box, George EP, and David R Cox. 1964. “An Analysis of Transformations.” Journal of the Royal Statistical Society. Series B (Methodological). JSTOR, 211–52.

Breiman, Leo. 1984. Classification and Regression Trees. Routledge.

———. 1996a. “Bagging Predictors.” Machine Learning 24 (2). Springer: 123–40.

———. 1996b. “Stacked Regressions.” Machine Learning 24 (1). Springer: 49–64.

———. 2001. “Random Forests.” Machine Learning 45 (1). Springer: 5–32.

Breiman, Leo, and Ross Ihaka. 1984. Nonlinear Discriminant Analysis via Scaling and ACE. Department of Statistics, University of California.

Breiman, Leo, and others. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16 (3). Institute of Mathematical Statistics: 199–231.

Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists: 50 Essential Concepts. O’Reilly Media, Inc.

Carroll, Raymond J, and David Ruppert. 1981. “On Prediction and the Power Transformation Family.” Biometrika 68 (3). Oxford University Press: 609–15.

Celeux, Gilles, and Gérard Govaert. 1995. “Gaussian Parsimonious Clustering Models.” Pattern Recognition 28 (5). Elsevier: 781–93.

Charrad, Malika, Nadia Ghazzali, Veronique Boiteau, and Azam Niknafs. 2015. NbClust: Determining the Best Number of Clusters in a Data Set. https://CRAN.R-project.org/package=NbClust.

Chawla, Nitesh V, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. “SMOTE: Synthetic Minority over-Sampling Technique.” Journal of Artificial Intelligence Research 16: 321–57.

Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. KDD ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785.

Chen, Tianqi, Tong He, Michael Benesty, Vadim Khotilovich, Yuan Tang, Hyunsu Cho, Kailong Chen, et al. 2018. Xgboost: Extreme Gradient Boosting. https://CRAN.R-project.org/package=xgboost.

Chollet, François, and Joseph J Allaire. 2018. Deep Learning with R. Manning Publications Company.

Cireşan, Dan, Ueli Meier, and Jürgen Schmidhuber. 2012. “Multi-Column Deep Neural Networks for Image Classification.” arXiv Preprint arXiv:1202.2745.

Cunningham, Padraig, and Sarah Jane Delany. 2007. “K-Nearest Neighbour Classifiers.” Multiple Classifier Systems 34 (8). Springer New York, NY, USA: 1–17.

Dasgupta, Abhijit, and Adrian E Raftery. 1998. “Detecting Features in Spatial Point Processes with Clutter via Model-Based Clustering.” Journal of the American Statistical Association 93 (441). Taylor & Francis: 294–302.

Davison, Anthony Christopher, David Victor Hinkley, and others. 1997. Bootstrap Methods and Their Application. Vol. 1. Cambridge University Press.

Deane-Mayer, Zachary A., and Jared E. Knowles. 2016. CaretEnsemble: Ensembles of Caret Models. https://CRAN.R-project.org/package=caretEnsemble.

De Cock, Dean. 2011. “Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project.” Journal of Statistics Education 19 (3). Taylor & Francis.

De Maesschalck, Roy, Delphine Jouan-Rimbaud, and Désiré L Massart. 2000. “The Mahalanobis Distance.” Chemometrics and Intelligent Laboratory Systems 50 (1). Elsevier: 1–18.

Díaz-Uriarte, Ramón, and Sara Alvarez de Andrés. 2006. “Gene Selection and Classification of Microarray Data Using Random Forest.” BMC Bioinformatics 7 (1). BioMed Central: 3.

Dietterich, Thomas G. 2000a. “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization.” Machine Learning 40 (2). Springer: 139–57.

———. 2000b. “Ensemble Methods in Machine Learning.” In International Workshop on Multiple Classifier Systems, 1–15. Springer.

Doersch, Carl. 2016. “Tutorial on Variational Autoencoders.” arXiv Preprint arXiv:1606.05908.

Dorogush, Anna Veronika, Vasily Ershov, and Andrey Gulin. 2018. “CatBoost: Gradient Boosting with Categorical Features Support.” arXiv Preprint arXiv:1810.11363.

Doshi-Velez, Finale, and Been Kim. 2017. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv Preprint arXiv:1702.08608.

Efron, Bradley. 1983. “Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.” Journal of the American Statistical Association 78 (382). Taylor & Francis: 316–31.

Efron, Bradley, and Trevor Hastie. 2016. Computer Age Statistical Inference. Vol. 5. Cambridge University Press.

Efron, Bradley, and Robert Tibshirani. 1986. “Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy.” Statistical Science. JSTOR, 54–75.

———. 1997. “Improvements on Cross-Validation: The 632+ Bootstrap Method.” Journal of the American Statistical Association 92 (438). Taylor & Francis: 548–60.

Erichson, N. Benjamin, Peng Zheng, and Sasha Aravkin. 2018. Sparsepca: Sparse Principal Component Analysis (SPCA). https://CRAN.R-project.org/package=sparsepca.

Faraway, Julian J. 2016a. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Vol. 124. CRC Press.

———. 2016b. Linear Models with R. Chapman & Hall/CRC.

Fisher, Aaron, Cynthia Rudin, and Francesca Dominici. 2018. “Model Class Reliance: Variable Importance Measures for Any Machine Learning Model Class, from the ‘Rashomon’ Perspective.” arXiv Preprint arXiv:1801.01489.

Fisher, Ronald A. 1936. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7 (2). Wiley Online Library: 179–88.

Fisher, Walter D. 1958. “On Grouping for Maximum Homogeneity.” Journal of the American Statistical Association 53 (284). Taylor & Francis: 789–98.

Fraley, Chris, and Adrian E Raftery. 1998. “How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis.” The Computer Journal 41 (8). Oxford University Press: 578–88.

———. 2002. “Model-Based Clustering, Discriminant Analysis, and Density Estimation.” Journal of the American Statistical Association 97 (458). Taylor & Francis: 611–31.

Fraley, Chris, Adrian E Raftery, T Brendan Murphy, and Luca Scrucca. 2012. “Mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation.” University of Washington.

Fraley, Chris, Adrian E. Raftery, and Luca Scrucca. 2019. Mclust: Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation. https://CRAN.R-project.org/package=mclust.

Freund, Yoav, and Robert E Schapire. 1999. “Adaptive Game Playing Using Multiplicative Weights.” Games and Economic Behavior 29 (1-2). Elsevier: 79–103.

Friedman, Jerome H. 1991. “Multivariate Adaptive Regression Splines.” The Annals of Statistics. JSTOR, 1–67.

———. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics. JSTOR, 1189–1232.

———. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis 38 (4). Elsevier: 367–78.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. New York, NY: Springer.

Friedman, Jerome, Trevor Hastie, Rob Tibshirani, Noah Simon, Balasubramanian Narasimhan, and Junyang Qian. 2018. Glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. https://CRAN.R-project.org/package=glmnet.

Friedman, Jerome H, Bogdan E Popescu, and others. 2008. “Predictive Learning via Rule Ensembles.” The Annals of Applied Statistics 2 (3). Institute of Mathematical Statistics: 916–54.

Geladi, Paul, and Bruce R Kowalski. 1986. “Partial Least-Squares Regression: A Tutorial.” Analytica Chimica Acta 185. Elsevier: 1–17.

Geurts, Pierre, Damien Ernst, and Louis Wehenkel. 2006. “Extremely Randomized Trees.” Machine Learning 63 (1). Springer: 3–42.

Géron, Aurélien. 2017. Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1). Taylor & Francis: 44–65.

Goldstein, Benjamin A, Eric C Polley, and Farren BS Briggs. 2011. “Random Forests for Genetic Association Studies.” Statistical Applications in Genetics and Molecular Biology 10 (1). De Gruyter.

Golub, Gene H, Michael Heath, and Grace Wahba. 1979. “Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter.” Technometrics 21 (2). Taylor & Francis Group: 215–23.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Vol. 1. MIT Press Cambridge.

Gower, John C. 1971. “A General Coefficient of Similarity and Some of Its Properties.” Biometrics. JSTOR, 857–71.

Granitto, Pablo M, Cesare Furlanello, Franco Biasioli, and Flavia Gasperi. 2006. “Recursive Feature Elimination with Random Forest for PTR-MS Analysis of Agroindustrial Products.” Chemometrics and Intelligent Laboratory Systems 83 (2). Elsevier: 83–90.

Greenwell, B, B Boehmke, J Cunningham, and GBM Developers. 2018. Gbm: Generalized Boosted Regression Models. R package version 2.1.4. https://CRAN.R-project.org/package=gbm.

Greenwell, Brandon. 2018. Pdp: Partial Dependence Plots. https://CRAN.R-project.org/package=pdp.

Greenwell, Brandon M., and Bradley C. Boehmke. 2019. “Quantifying the Strength of Potential Interaction Effects.”

Greenwell, Brandon M, Bradley C Boehmke, and Andrew J McCarthy. 2018. “A Simple and Effective Model-Based Variable Importance Measure.” arXiv Preprint arXiv:1805.04755.

Greenwell, Brandon M., Andrew J. McCarthy, Bradley C. Boehmke, and Dungang Liu. 2018. “Residuals and Diagnostics for Binary and Ordinal Regression Models: An Introduction to the Sure Package.” The R Journal 10 (1): 1–14. https://journal.r-project.org/archive/2018/RJ-2018-004/index.html.

Guo, Cheng, and Felix Berkhahn. 2016. “Entity Embeddings of Categorical Variables.” arXiv Preprint arXiv:1604.06737.

Hair, Joseph F. 2006. Multivariate Data Analysis. Pearson Education India.

Hall, Patrick. 2018. “Awesome Machine Learning Interpretability: A Curated, but Probably Biased and Incomplete, List of Awesome Machine Learning Interpretability Resources.”

Han, Jiawei, Jian Pei, and Micheline Kamber. 2011. Data Mining: Concepts and Techniques. Elsevier.

Harrell, Frank E. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer Series in Statistics. Springer International Publishing. https://books.google.com/books?id=94RgCgAAQBAJ.

Harrison Jr, David, and Daniel L Rubinfeld. 1978. “Hedonic Housing Prices and the Demand for Clean Air.” Journal of Environmental Economics and Management 5 (1). Elsevier: 81–102.

Hartigan, John A, and Manchek A Wong. 1979. “Algorithm as 136: A K-Means Clustering Algorithm.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 28 (1). JSTOR: 100–108.

Hastie, Trevor. 2016. Svmpath: The SVM Path Algorithm. https://CRAN.R-project.org/package=svmpath.

Hastie, T., R. Tibshirani, and M. Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.

Hawkins, Douglas M, Subhash C Basak, and Denise Mills. 2003. “Assessing Model Fit by Cross-Validation.” Journal of Chemical Information and Computer Sciences 43 (2). ACS Publications: 579–86.

Hinton, Geoffrey E, and Ruslan R Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786). American Association for the Advancement of Science: 504–7.

Hinton, Geoffrey E, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. “Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors.” arXiv Preprint arXiv:1207.0580.

Hinton, Geoffrey E, and Richard S Zemel. 1994. “Autoencoders, Minimum Description Length and Helmholtz Free Energy.” In Advances in Neural Information Processing Systems, 3–10.

Hoerl, Arthur E, and Robert W Kennard. 1970. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics 12 (1). Taylor & Francis Group: 55–67.

Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15 (3). Taylor & Francis: 651–74.

Hothorn, Torsten, and Achim Zeileis. 2015. “Partykit: A Modular Toolkit for Recursive Partytioning in R.” The Journal of Machine Learning Research 16 (1). JMLR.org: 3905–9.

Hunt, Tyler. 2018. ModelMetrics: Rapid Calculation of Model Metrics. https://CRAN.R-project.org/package=ModelMetrics.

Hyndman, Rob J, and George Athanasopoulos. 2018. Forecasting: Principles and Practice. OTexts.

Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” arXiv Preprint arXiv:1502.03167.

Irizarry, Rafael A. 2018. Dslabs: Data Science Labs. https://CRAN.R-project.org/package=dslabs.

Janitza, Silke, Harald Binder, and Anne-Laure Boulesteix. 2016. “Pitfalls of Hypothesis Tests and Model Selection on Bootstrap Samples: Causes and Consequences in Biometrical Applications.” Biometrical Journal 58 (3). Wiley Online Library: 447–73.

Jiang, Shengyi, Guansong Pang, Meiling Wu, and Limin Kuang. 2012. “An Improved K-Nearest-Neighbor Algorithm for Text Categorization.” Expert Systems with Applications 39 (1). Elsevier: 1503–9.

Karatzoglou, Alexandros, Alex Smola, and Kurt Hornik. 2018. Kernlab: Kernel-Based Machine Learning Lab. https://CRAN.R-project.org/package=kernlab.

Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Achim Zeileis. 2004. “Kernlab – an S4 Package for Kernel Methods in R.” Journal of Statistical Software 11 (9): 1–20. http://www.jstatsoft.org/v11/i09/.

Kass, Gordon V. 1980. “An Exploratory Technique for Investigating Large Quantities of Categorical Data.” Applied Statistics. JSTOR, 119–27.

Kaufman, Leonard, and Peter J Rousseeuw. 2009. Finding Groups in Data: An Introduction to Cluster Analysis. Vol. 344. John Wiley & Sons.

Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree.” In Advances in Neural Information Processing Systems, 3146–54.

Ketchen, David J, and Christopher L Shook. 1996. “The Application of Cluster Analysis in Strategic Management Research: An Analysis and Critique.” Strategic Management Journal 17 (6). Wiley Online Library: 441–58.

Kim, Ji-Hyun. 2009. “Estimating Classification Error Rate: Repeated Cross-Validation, Repeated Hold-Out and Bootstrap.” Computational Statistics & Data Analysis 53 (11). Elsevier: 3735–45.

Kingma, Diederik P, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980.

Kuhn, Max. 2014. “Futility Analysis in the Cross-Validation of Machine Learning Models.” arXiv Preprint arXiv:1405.6974.

———. 2017a. AmesHousing: The Ames Iowa Housing Data. https://CRAN.R-project.org/package=AmesHousing.

———. 2017b. “The R Formula Method: The Bad Parts.” R Views. https://rviews.rstudio.com/2017/03/01/the-r-formula-method-the-bad-parts/.

———. 2018. “Applied Machine Learning Workshop.”

———. 2019. “Applied Machine Learning.”

Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive Modeling. Vol. 26. Springer.

———. 2018. AppliedPredictiveModeling: Functions and Data Sets for ’Applied Predictive Modeling’. https://CRAN.R-project.org/package=AppliedPredictiveModeling.

———. 2019. Feature Engineering and Selection: A Practical Approach for Predictive Models. Chapman & Hall/CRC.

Kuhn, Max, and Hadley Wickham. 2019. Rsample: General Resampling Infrastructure. https://CRAN.R-project.org/package=rsample.

Kursa, Miron B, and Witold R Rudnicki. 2010. “Feature Selection with the Boruta Package.” Journal of Statistical Software 36 (11): 1–13.

Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. 2005. Applied Linear Statistical Models. 5th ed. McGraw Hill.

Laan, Mark J. van der, Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).

LeCun, Yann. 1987. “Modèles Connexionnistes de l’Apprentissage (Connectionist Learning Models).” Ph.D. thesis, Université P. et M. Curie (Paris 6).

LeCun, Yann, Bernhard E Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne E Hubbard, and Lawrence D Jackel. 1990. “Handwritten Digit Recognition with a Back-Propagation Network.” In Advances in Neural Information Processing Systems, 396–404.

LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11). IEEE: 2278–2324.

LeDell, Erin, Stephanie Sapp, and Mark van der Laan. 2014. Subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits. https://CRAN.R-project.org/package=subsemble.

Lee, Honglak, Chaitanya Ekanadham, and Andrew Y Ng. 2008. “Sparse Deep Belief Net Model for Visual Area V2.” In Advances in Neural Information Processing Systems, 873–80.

Lee, Sharon X, and Geoffrey J McLachlan. 2013. “Model-Based Clustering and Classification with Non-Normal Mixture Distributions.” Statistical Methods & Applications 22 (4). Springer: 427–54.

Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/.

Little, Roderick JA, and Donald B Rubin. 2014. Statistical Analysis with Missing Data. Vol. 333. John Wiley & Sons.

Liu, Dungang, and Heping Zhang. 2018. “Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach.” Journal of the American Statistical Association 113 (522). Taylor & Francis: 845–54. https://doi.org/10.1080/01621459.2017.1292915.

Loh, Wei-Yin, and Nunta Vanichsetakul. 1988. “Tree-Structured Classification via Generalized Discriminant Analysis.” Journal of the American Statistical Association 83 (403). Taylor & Francis Group: 715–25.

Lundberg, Scott, and Su-In Lee. 2016. “An Unexpected Unity Among Methods for Interpreting Model Predictions.” arXiv Preprint arXiv:1611.07478.

Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems, 4765–74.

Luu, Keurcien, Michael Blum, and Florian Privé. 2019. Pcadapt: Fast Principal Component Analysis for Outlier Detection. https://CRAN.R-project.org/package=pcadapt.

Ma, Yi, Harm Derksen, Wei Hong, and John Wright. 2007. “Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression.” IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (9). IEEE: 1546–62.

Makhzani, Alireza, and Brendan Frey. 2014. “A Winner-Take-All Method for Training Sparse Convolutional Autoencoders.” In NIPS Deep Learning Workshop. Citeseer.

Makhzani, Alireza, and Brendan J Frey. 2015. “Winner-Take-All Autoencoders.” In Advances in Neural Information Processing Systems, 2791–9.

Makhzani, Alireza, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. 2015. “Adversarial Autoencoders.” arXiv Preprint arXiv:1511.05644.

Maldonado, Sebastián, and Richard Weber. 2009. “A Wrapper Method for Feature Selection Using Support Vector Machines.” Information Sciences 179 (13). Elsevier: 2208–17.

Masci, Jonathan, Ueli Meier, Dan Cireşan, and Jürgen Schmidhuber. 2011. “Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction.” In International Conference on Artificial Neural Networks, 52–59. Springer.

Massy, William F. 1965. “Principal Components Regression in Exploratory Statistical Research.” Journal of the American Statistical Association 60 (309). Taylor & Francis Group: 234–56.

McCord, Michael, and M Chuah. 2011. “Spam Detection on Twitter Using Traditional Classifiers.” In International Conference on Autonomic and Trusted Computing, 175–86. Springer.

Meyer, David, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel, and Friedrich Leisch. 2019. E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071.

Micci-Barreca, Daniele. 2001. “A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems.” ACM SIGKDD Explorations Newsletter 3 (1). ACM: 27–32.

Molinaro, Annette M, Richard Simon, and Ruth M Pfeiffer. 2005. “Prediction Error Estimation: A Comparison of Resampling Methods.” Bioinformatics 21 (15). Oxford University Press: 3301–7.

Molnar, Christoph. 2019. Iml: Interpretable Machine Learning. https://CRAN.R-project.org/package=iml.

Molnar, Christoph, and others. 2018. “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.” E-book at https://christophm.github.io/interpretable-ml-book/.

Pearson, Karl. 1901. “LIII. On Lines and Planes of Closest Fit to Systems of Points in Space.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11). Taylor & Francis: 559–72.

Pedersen, Thomas Lin, and Michaël Benesty. 2018. Lime: Local Interpretable Model-Agnostic Explanations. https://CRAN.R-project.org/package=lime.

Platt, John C. 1999. “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.” In Advances in Large Margin Classifiers, 61–74. MIT Press.

Polley, Eric, Erin LeDell, Chris Kennedy, and Mark van der Laan. 2019. SuperLearner: Super Learner Prediction. https://CRAN.R-project.org/package=SuperLearner.

Poultney, Christopher, Sumit Chopra, Yann LeCun, and others. 2007. “Efficient Learning of Sparse Representations with an Energy-Based Model.” In Advances in Neural Information Processing Systems, 1137–44.

Probst, Philipp, Bernd Bischl, and Anne-Laure Boulesteix. 2018. “Tunability: Importance of Hyperparameters of Machine Learning Algorithms.” arXiv Preprint arXiv:1802.09596.

Probst, Philipp, Marvin N Wright, and Anne-Laure Boulesteix. 2019. “Hyperparameters and Tuning Strategies for Random Forest.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. Wiley Online Library, e1301.

Quinlan, J. Ross. 1986. “Induction of Decision Trees.” Machine Learning 1 (1). Springer: 81–106.

Quinlan, J Ross, and others. 1996. “Bagging, Boosting, and C4.5.” In AAAI/IAAI, Vol. 1, 725–30.

Rashmi, Korlakai Vinayak, and Ran Gilad-Bachrach. 2015. “DART: Dropouts Meet Multiple Additive Regression Trees.” In AISTATS, 489–97.

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. ACM.

Rifai, Salah, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. 2011. “Contractive Auto-Encoders: Explicit Invariance During Feature Extraction.” In Proceedings of the 28th International Conference on International Conference on Machine Learning, 833–40. Omnipress.

Ripley, Brian D. 2007. Pattern Recognition and Neural Networks. Cambridge University Press.

Robinson, John T. 1981. “The K-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes.” In Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, 10–18. ACM.

Rousseeuw, Peter J. 1987. “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis.” Journal of Computational and Applied Mathematics 20. Elsevier: 53–65.

Ruder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.” arXiv Preprint arXiv:1609.04747.

Russolillo, Giorgio, and Carlo Natale Lauro. 2011. “A Proposal for Handling Categorical Predictors in PLS Regression Framework.” In Classification and Multivariate Analysis for Complex Data Structures, 343–50. Springer.

Saeys, Yvan, Iñaki Inza, and Pedro Larrañaga. 2007. “A Review of Feature Selection Techniques in Bioinformatics.” Bioinformatics 23 (19). Oxford University Press: 2507–17.

Sakurada, Mayu, and Takehisa Yairi. 2014. “Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction.” In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, 4. ACM.

Sapp, Stephanie, Mark J van der Laan, and John Canny. 2014. “Subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits.” Journal of Applied Statistics 41 (6). Taylor & Francis: 1247–59.

Sarle, Warren S. n.d. “comp.ai.neural-nets FAQ.” http://www.faqs.org/faqs/ai-faq/neural-nets/part2/.

Segal, Mark R. 2004. “Machine Learning Benchmarks and Random Forest Regression.” UCSF: Center for Bioinformatics and Molecular Biostatistics.

Shah, Anoop D, Jonathan W Bartlett, James Carpenter, Owen Nicholas, and Harry Hemingway. 2014. “Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study.” American Journal of Epidemiology 179 (6). Oxford University Press: 764–74.

Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014a. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15: 1929–58. http://jmlr.org/papers/v15/srivastava14a.html.

———. 2014b. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” The Journal of Machine Learning Research 15 (1). JMLR.org: 1929–58.

Staniak, Mateusz, and Przemyslaw Biecek. 2018. “Explanations of Model Predictions with live and breakDown Packages.” arXiv Preprint arXiv:1804.01955.

Stekhoven, Daniel J. 2015. “MissForest: Nonparametric Missing Value Imputation Using Random Forest.” Astrophysics Source Code Library.

Stone, Charles J, Mark H Hansen, Charles Kooperberg, Young K Truong, and others. 1997. “Polynomial Splines and Their Tensor Products in Extended Linear Modeling: 1994 Wald Memorial Lecture.” The Annals of Statistics 25 (4). Institute of Mathematical Statistics: 1371–1470.

Strobl, Carolin, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. 2007. “Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution.” BMC Bioinformatics 8 (1). BioMed Central: 25.

Surowiecki, James. 2005. The Wisdom of Crowds. Anchor.

Štrumbelj, Erik, and Igor Kononenko. 2014. “Explaining Prediction Models and Individual Predictions with Feature Contributions.” Knowledge and Information Systems 41 (3): 647–65. https://doi.org/10.1007/s10115-013-0679-x.

Therneau, Terry M, Elizabeth J Atkinson, and others. 1997. “An Introduction to Recursive Partitioning Using the RPART Routines.” Mayo Foundation.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological). JSTOR, 267–88.

Tibshirani, Robert, Guenther Walther, and Trevor Hastie. 2001. “Estimating the Number of Clusters in a Data Set via the Gap Statistic.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2). Wiley Online Library: 411–23.

Tierney, Nicholas. 2019. Visdat: Preliminary Visualisation of Data. https://CRAN.R-project.org/package=visdat.

Milborrow, Stephen. 2019. Earth: Multivariate Adaptive Regression Splines. Derived from mda:mars by Trevor Hastie and Rob Tibshirani; uses Alan Miller’s Fortran utilities with Thomas Lumley’s leaps wrapper. https://CRAN.R-project.org/package=earth.

Udell, Madeleine, Corinne Horn, Reza Zadeh, Stephen Boyd, and others. 2016. “Generalized Low Rank Models.” Foundations and Trends in Machine Learning 9 (1). Now Publishers, Inc.: 1–118.

Van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). De Gruyter.

Vincent, Pascal. 2011. “A Connection Between Score Matching and Denoising Autoencoders.” Neural Computation 23 (7). MIT Press: 1661–74.

Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. “Extracting and Composing Robust Features with Denoising Autoencoders.” In Proceedings of the 25th International Conference on Machine Learning, 1096–1103. ACM.

West, Brady T, Kathleen B Welch, and Andrzej T Galecki. 2014. Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC.

Wickham, Hadley. 2014. Advanced R. Chapman & Hall/CRC.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Wickham, Hadley, and others. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). Foundation for Open Access Statistics: 1–23.

Wolpert, David H. 1992. “Stacked Generalization.” Neural Networks 5: 241–59.

———. 1996. “The Lack of a Priori Distinctions Between Learning Algorithms.” Neural Computation 8 (7). MIT Press: 1341–90.

Wright, Marvin, and Andreas Ziegler. 2017. “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software, Articles 77 (1): 1–17. https://doi.org/10.18637/jss.v077.i01.

Zhang, Wei, Deli Zhao, and Xiaogang Wang. 2013. “Agglomerative Clustering via Maximum Incremental Path Integral.” Pattern Recognition 46 (11). Elsevier: 3056–65.

Zhao, Deli, and Xiaoou Tang. 2009. “Cyclizing Clusters via Zeta Function of a Graph.” In Advances in Neural Information Processing Systems, 1953–60.

Zheng, Alice, and Amanda Casari. 2018. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O’Reilly Media, Inc.

Zhou, Chong, and Randy C Paffenroth. 2017. “Anomaly Detection with Robust Deep Autoencoders.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 665–74. ACM.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2). Wiley Online Library: 301–20.

Zumel, Nina, and John Mount. 2016. “Vtreat: A data.frame Processor for Predictive Modeling.” arXiv Preprint arXiv:1611.09477.