class: title-slide <a href="https://github.com/bradleyboehmke/random-forest-training"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"></a> <br><br><br><br> # .font130[Decision Trees, Bagging, & Random Forests] ## .font130[with an example implementation in
<i class="fab fa-r-project faa-pulse animated faa-slow " style=" color:steelblue;"></i>
] ### Brad Boehmke ### 2018-12-05 ### Slides: [bit.ly/random-forests-training](http://bit.ly/random-forests-training) --- class: center, middle, inverse # Introduction --- # About me .pull-left[ <img src="images/name-tag.png" width="1360" style="display: block; margin: auto;" /> * <svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 496 512"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"/></svg> bradleyboehmke.github.io * <svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 496 512"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> @bradleyboehmke * <svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @bradleyboehmke * <svg 
style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 448 512"><path d="M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z"/></svg> @bradleyboehmke * <svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 512 512"><path d="M502.3 190.8c3.9-3.1 9.7-.2 9.7 4.7V400c0 26.5-21.5 48-48 48H48c-26.5 0-48-21.5-48-48V195.6c0-5 5.7-7.8 9.7-4.7 22.4 17.4 52.1 39.5 154.1 113.6 21.1 15.4 56.7 47.8 92.2 47.6 35.7.3 72-32.8 92.3-47.6 102-74.1 131.6-96.3 154-113.7zM256 320c23.2.4 56.6-29.2 73.4-41.4 132.7-96.3 142.8-104.7 173.4-128.7 5.8-4.5 9.2-11.5 9.2-18.9v-19c0-26.5-21.5-48-48-48H48C21.5 64 0 85.5 0 112v19c0 7.4 3.4 14.3 9.2 18.9 30.6 23.9 40.7 32.4 173.4 128.7 16.8 12.2 50.2 41.8 73.4 41.4z"/></svg> bradleyboehmke@gmail.com ] .pull-right[ #### Family <img src="images/family.png" align="right" alt="family" width="130" /> * Dayton, OH * Kate, Alivia (9), Jules (6) #### Professional * 84.51° - Data Science Enabler <img src="images/logo8451.jpg" align="right" alt="family" width="150" /> #### Academic * University of Cincinnati <img src="images/uc.png" align="right" alt="family" width="100" /> * Air Force Institute of Technology #### R Ecosystem <img src="images/r-contributions-hex.png" alt="family" width="700" /> ] --- class: clear, center, middle background-image: url(images/single-tree.gif) background-size: cover .font300.white[Decision Trees] ??? Image credit: [giphy](https://giphy.com/gifs/tree-U85Z0lxOwDoys?utm_source=media-link&utm_medium=landing&utm_campaign=Media%20Links&utm_term=) --- # Basic Idea <img src="images/dt-01.png" width="90%" height="90%" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Will a customer redeem a coupon]]] --- # A .red[ruleset] model <img src="images/dt-03.png" width="90%" height="90%" style="display: block; margin: auto;" /> .font90[`if Loyal Customer = Yes and Household income >= $150K and Shopping mode = store then coupon redemption = Yes`] --- # Terminology <img src="images/dt-02.png" width="90%" height="90%" style="display: block; margin: auto;" /> --- # Growing the tree .pull-left[ ### Algorithms - ID3 (Iterative Dichotomiser 3) - C4.5 (successor of ID3) - CART (Classification And Regression Tree) - CHAID (CHi-squared Automatic Interaction Detector) - MARS: (Multivariate Adaptive Regression Splines) - Conditional Inference Trees - and more... ] --- # Growing the tree .pull-left[ ### Algorithms - ID3 (Iterative Dichotomiser 3) - C4.5 (successor of ID3) - .bold.blue[CART (Classification And Regression Tree)] - CHAID (CHi-squared Automatic Interaction Detector) - MARS: (Multivariate Adaptive Regression Splines) - Conditional Inference Trees - and more... ] .pull-right[ ### CART Features
<i class="fas fa-shopping-cart faa-passing animated faa-slow "></i>
- Classification and regression trees - Continuous and discrete features - Partitioning - Greedy top-down - Strictly binary splits (tends to produce tall/deep trees) - Variance reduction in regression trees - Gini impurity in classification trees - Cost complexity pruning - [
<i class="ai ai-google-scholar faa-tada animated-hover "></i>(Breiman, 1984)
](https://www.taylorfrancis.com/books/9781351460491) ] <br> .center[.content-box-gray[.bold[Most common decision tree algorithm]]] --- # Best .red[
Binary
] Partitioning .pull-left[ .center.font130.bold[Regression tree] <img src="images/regression-partition.png" width="90%" style="display: block; margin: auto;" /> ] .pull-right[ .center.font130.bold[Classification tree] <img src="images/classification-partition.png" width="90%" style="display: block; margin: auto;" /> ] <br> .center[.content-box-gray[.bold[Objective: Minimize dissimilarity in terminal nodes]]] --- # Best .red[Binary] Partitioning .pull-left[ <br> - __Numeric feature__: Numeric split to minimize loss function <br><br><br><br><br> - __Binary feature__: Category split to minimize loss function <br><br><br><br><br> - __Multiclass feature__: Order feature classes based on mean target variable (regression) or class proportion (classification) and choose split to minimize loss function ([
<i class="ai ai-google-scholar faa-tada animated-hover "></i>See ESL, section 9.2.4 for details
](https://web.stanford.edu/~hastie/ElemStatLearn/)). ] .pull-right[ <img src="images/splitting-rules.png" width="55%" height="55%" style="display: block; margin: auto;" /> ] --- # How deep to grow a tree? Say we have the following data, generated from the underlying .blue["truth"] function <br><br> <img src="slides-source_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- # Depth = 1 (decision .red[stump]
<img src="images/stump.png" style="height:1em; width:auto; "/>
) .scrollable90[ .pull-left[ <img src="slides-source_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ``` ## ## Model formula: ## y ~ x ## ## Fitted party: ## [1] root ## | [2] x >= 3.07863: -0.665 (n = 255, err = 95.5) ## | [3] x < 3.07863: 0.640 (n = 245, err = 75.9) ## ## Number of inner nodes: 1 ## Number of terminal nodes: 2 ``` ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] ] --- # Depth = 3
<img src="images/small-tree-icon.png" style="height:1em; width:auto; "/>
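A minimal sketch of how a depth-limited tree like the one below could be fit (the slides' own fitting code isn't shown, so treat `sim_data` as a hypothetical data frame holding the simulated `x`/`y` values):

```r
# Fit a CART tree capped at depth 3 (rpart), then print it in partykit's rule format
library(rpart)
library(partykit)

fit_d3 <- rpart(
  y ~ x,
  data    = sim_data,                             # hypothetical simulated data
  control = rpart.control(maxdepth = 3, cp = 0)   # disable the complexity penalty so depth is the binding constraint
)
as.party(fit_d3)
```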
.scrollable90[ .pull-left[ <img src="slides-source_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" /> ``` ## ## Model formula: ## y ~ x ## ## Fitted party: ## [1] root ## | [2] x >= 3.07863 ## | | [3] x >= 3.65785 ## | | | [4] x < 5.53399: -0.948 (n = 149, err = 40.0) ## | | | [5] x >= 5.53399: -0.316 (n = 60, err = 15.6) ## | | [6] x < 3.65785 ## | | | [7] x < 3.20455: -0.476 (n = 10, err = 0.9) ## | | | [8] x >= 3.20455: -0.130 (n = 36, err = 9.0) ## | [9] x < 3.07863 ## | | [10] x < 0.52255 ## | | | [11] x < 0.28331: 0.142 (n = 23, err = 4.8) ## | | | [12] x >= 0.28331: 0.390 (n = 19, err = 5.1) ## | | [13] x >= 0.52255 ## | | | [14] x >= 2.26018: 0.440 (n = 65, err = 13.7) ## | | | [15] x < 2.26018: 0.852 (n = 138, err = 36.6) ## ## Number of inner nodes: 7 ## Number of terminal nodes: 8 ``` ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ] ] --- # Depth = 20 (.red[complex tree]
<img src="images/large-tree-icon.png" style="height:1em; width:auto; "/>
) .scrollable90[ .pull-left[ <img src="slides-source_files/figure-html/unnamed-chunk-12-1.png" height="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> ] ] --- # .red[Two Predictor] Decision Boundaries .pull-left[ ### Classification problem: Iris data <img src="slides-source_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> ] --- # .red[Two Predictor] Decision Boundaries .pull-left[ ### Classification problem: Iris data <img src="slides-source_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ### Classification tree <img src="slides-source_files/figure-html/unnamed-chunk-16-1.png" height="100%" style="display: block; margin: auto;" /> ] --- # Minimize overfitting .pull-left[ .font110[Must balance the depth and complexity of the tree to .bold[generalize] to unseen data] 2 main options: * Early stopping * Restrict tree depth * Restrict node size * Pruning ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Trees have a tendency to overfit]]] ] --- # Minimize overfitting: Early stopping
<i class="fas fa-stop-circle faa-tada animated-hover " style=" color:red;"></i>
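Both early-stopping rules map to arguments of `rpart.control()`; a minimal sketch (again assuming the hypothetical `sim_data`), not the slides' actual code:

```r
library(rpart)

# limit tree depth: stop splitting once the tree is 3 levels deep
fit_depth <- rpart(y ~ x, data = sim_data,
                   control = rpart.control(maxdepth = 3, cp = 0))

# minimum node size: don't attempt a split on nodes with fewer than 20 observations,
# and require at least 10 observations in every terminal node
fit_size <- rpart(y ~ x, data = sim_data,
                  control = rpart.control(minsplit = 20, minbucket = 10, cp = 0))
```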
.pull-left[ .blue[Limit tree depth]: Stop splitting after a certain depth <img src="slides-source_files/figure-html/maxdepth-1.gif" style="display: block; margin: auto;" /> ] -- .pull-right[ .blue[Minimum node “size”]: Do not split an intermediate node that contains too few data points <img src="slides-source_files/figure-html/minbucket-1.gif" style="display: block; margin: auto;" /> ] --- # Minimize overfitting: Pruning
<i class="fas fa-cut faa-tada animated-hover " style=" color:red;"></i>
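Step 1 as a minimal rpart sketch (hypothetical `sim_data`; `cp = 0` and a tiny `minsplit` let the tree grow essentially unchecked), which the next slide prunes back:

```r
library(rpart)

# grow an intentionally overgrown tree: no complexity penalty, keep splitting
fit_big <- rpart(y ~ x, data = sim_data,
                 control = rpart.control(cp = 0, minsplit = 2, xval = 10))
```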
.pull-left[ 1. .font120[Grow a very large tree] ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Deep trees overfit]]] ] --- # Minimize overfitting: Pruning
<i class="fas fa-cut faa-tada animated-hover " style=" color:red;"></i>
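Continuing the sketch from the previous slide: rpart records the cross-validated error over a grid of its `cp` parameter (a scaled version of `\(\alpha\)` ), so the overgrown `fit_big` can be pruned back to the best subtree:

```r
# cross-validated error for each candidate complexity value
printcp(fit_big)

# prune back to the cp with the lowest cross-validated error (xerror)
best_cp <- fit_big$cptable[which.min(fit_big$cptable[, "xerror"]), "CP"]
fit_pruned <- prune(fit_big, cp = best_cp)
```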
.pull-left[ 1. Grow a very large tree 2. Prune it back with a _.red[cost complexity parameter]_ ( `\(\alpha\)` ) `\(\times\)` number of terminal nodes ( `\(T\)` ) to find an optimal subtree: - Very similar to lasso penalty in regularized regression - Large `\(\alpha =\)` small tree - Small `\(\alpha =\)` large tree - Find optimal `\(\alpha\)` with cross validation $$ \text{minimize: loss function} + \alpha |T| $$ ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Penalize depth to generalize]]] ] --- # Feature/Target Pre-processing Considerations <br> * __Monotonic transformations__ (e.g., log, exp, sqrt): .blue[Not required] to meet algorithm assumptions as in many parametric models; only shifts the optimal split points. * __Removing outliers__: .blue[unnecessary] as the emphasis is on a single binary split and outliers are not going to bias that split. * __One-hot encoding__: .blue[unnecessary] and actually forces artificial relationships between categorical levels. Also, by increasing `\(p\)`, we reduce the probability that influential levels and variable interactions will be identified. * __Missing values__: .blue[unnecessary] as most algorithms will 1) create a new "missing" class for categorical variables, 2) auto-impute for continuous variables, or 3) use *surrogate* splits --- # Variable importance Once we have a final model, we can find the most .red[influential variables] based on those that have the .red[largest reduction] in our loss function: .pull-left[ <img src="slides-source_files/figure-html/unnamed-chunk-20-1.png" height="100%" style="display: block; margin: auto;" /> ] .pull-right[ ``` ## Variable Importance ## 1 rm 23825.9224 ## 2 lstat 15047.9426 ## 3 dis 5385.2076 ## 4 indus 5313.9748 ## 5 tax 4205.2067 ## 6 ptratio 4202.2984 ## 7 nox 4166.1230 ## 8 age 3969.2913 ## 9 crim 2753.2843 ## 10 zn 1604.5566 ## 11 rad 1007.6588 ## 12 black 408.1277 ``` ] --- # Variable importance Once we have a final model, we can find the most .red[influential variables] based on those that have the .red[largest reduction] in our loss function: .pull-left[ <img src="slides-source_files/figure-html/unnamed-chunk-22-1.png" height="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> ] --- # Strengths & Weaknesses .pull-left[ ### Strengths
<img src="https://emojis.slackmojis.com/emojis/images/1471045870/910/rock.gif?1471045870" style="height:1em; width:auto; "/>
- .green[Small trees are easy to interpret] - .green[Trees scale well to large _N_] (fast!!) - .green[Can handle data of all types] (i.e., requires little, if any, preprocessing) - .green[Automatic variable selection] - .green[Can handle missing data] - .green[Completely nonparametric] ] -- .pull-right[ ### Weaknesses
<img src="https://emojis.slackmojis.com/emojis/images/1471045885/967/wtf.gif?1471045885" style="height:1.25em; width:auto; "/>
- .red[Large trees can be difficult to interpret] - .red[All splits depend on previous splits] (i.e. capturing interactions
<i class="fas fa-thumbs-up faa-FALSE animated " style=" color:green;"></i>
; additive models
<i class="fas fa-thumbs-down faa-FALSE animated " style=" color:red;"></i>
) - .red[Trees are step functions] (i.e., binary splits) - .red[Single trees typically have poor predictive accuracy] - .red[Single trees have high variance] (easy to overfit to training data) ] --- class: clear, center, middle background-image: url(images/bagging-icon.jpg) background-size: cover .font300.white[Bagging] ??? Image credit: [unsplash](https://unsplash.com/photos/19SC2oaVZW0) --- # The problem with single trees .pull-left[ .center[.font120[.bold[Single pruned trees are poor predictors]]] <img src="slides-source_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .center[.font120[.bold[Single deep trees are noisy]]] <img src="slides-source_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> ] .center[.content-box-gray[Bagging uses this high variance to our advantage
<i class="fas fa-arrow-up faa-FALSE animated " style=" color:red;"></i>
]] --- # .red[B]ootstrap .red[Agg]regat.red[ing]: wisdom of the crowd .pull-left[ 1. Sample records with replacement (aka "bootstrap" the training data) 2. .white[Fit an overgrown tree to the resampled data set] 3. .white[Average predictions] ] .pull-right[ <img src="images/bagging-fig1.png" width="1379" style="display: block; margin: auto;" /> ] --- # .red[B]ootstrap .red[Agg]regat.red[ing]: wisdom of the crowd .pull-left[ 1. .opacity[.grey[Sample records with replacement (aka "bootstrap" the training data)]] 2. Fit an
overgrown
tree to each resampled data set 3. .white[Average predictions] ] .pull-right[ <img src="images/bagging-fig2.png" width="1384" style="display: block; margin: auto;" /> ] --- # .red[B]ootstrap .red[Agg]regat.red[ing]: wisdom of the crowd .pull-left[ 1. .opacity[.grey[Sample records with replacement (aka "bootstrap" the training data)]] 2. Fit an
overgrown
tree to each resampled data set 3. .white[Average predictions] ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> ] --- # .red[B]ootstrap .red[Agg]regat.red[ing]: wisdom of the crowd .pull-left[ 1. .opacity[.grey[Sample records with replacement (aka "bootstrap" the training data)]] 2. .opacity[.grey[Fit an overgrown tree to each resampled data set]] 3. Average predictions ] .pull-right[ <img src="images/bagging-fig3.png" width="1385" style="display: block; margin: auto;" /> ] --- # .red[B]ootstrap .red[Agg]regat.red[ing]: wisdom of the crowd .pull-left[ .font120.bold[As we add more trees...] <img src="slides-source_files/figure-html/unnamed-chunk-30-1.gif" style="display: block; margin: auto;" /> ] .pull-right[ .font120.bold[our average prediction error reduces] <img src="slides-source_files/figure-html/unnamed-chunk-31-1.gif" style="display: block; margin: auto;" /> ] .center[.content-box-gray[.bold[Wisdom of the crowd in action]]] --- # However, a .red[problem remains] .bold[Bagging results in tree correlation...] <img src="images/tree-correlation-1.png" width="70%" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[which prevents bagging from optimally reducing variance of the predictive values]]
<img src="https://emojis.slackmojis.com/emojis/images/1471045851/836/headbang.gif?1471045851" style="height:3em; width:auto; "/>
] --- class: clear, center, middle background-image: url(images/rf-icon.jpg) background-size: cover .font300.white[Random Forests] ??? Image credit: [unsplash](https://unsplash.com/photos/5KvErlbdeyo) --- # Idea .pull-left[ ### Split-variable randomization * .font120[Follow a similar bagging process but... ] ] .pull-right[ <img src="images/bagged-trees-illustration.png" width="1484" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Bagging produces many correlated trees]]] ] --- # Idea .pull-left[ ### Split-variable randomization * Follow a similar bagging process but... * each time a split is to be performed, the search for the split variable is .blue[limited to a random subset of *m* of the *p* variables] - regression trees: `\(m = \frac{p}{3}\)` - classification trees: `\(m = \sqrt{p}\)` - `\(m\)` is commonly referred to as .blue[___mtry___] .white[ * Bagging introduces randomness into the rows of the data * Random forest introduces randomness into the rows and columns of the data ] ] .pull-right[ <img src="images/rf-trees-illustration.png" width="1491" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Random Forests produce many unique trees]]] ] --- # Bagging vs Random Forest .pull-left[ .opacity[ ### Split-variable randomization * Follow a similar bagging process but... * each time a split is to be performed, the search for the split variable is limited to a random subset of *m* of the *p* variables ] * Bagging introduces .red[randomness into the rows] of the data * Random forest introduces .red[randomness into the rows and columns] of the data ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-35-1.gif" style="display: block; margin: auto;" /> ] .center[.bold[.green[Combined, this provides a more diverse set of trees that almost always lowers our prediction error.]]] --- # Out-of-bag
<i class="fas fa-shopping-bag faa-pulse animated-hover " style=" color:red;"></i>
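The ~63.2% / ~36.8% split quoted below is easy to verify with a quick base-R simulation (a sketch added for illustration, not part of the original deck):

```r
# average fraction of unique original records that appear in a bootstrap sample
set.seed(123)
n <- 10000
in_bag <- replicate(100, length(unique(sample(seq_len(n), replace = TRUE))) / n)
mean(in_bag)   # ~0.632 in-bag, so ~0.368 out-of-bag on average
```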
.pull-left[ .font80[ * For large enough N, on average, 63.21% of the original records end up in any bootstrap sample * Roughly 36.79% of the observations are not used in the construction of a particular tree * These observations are considered .red[out-of-bag (OOB)] and can be used for efficient assessment of model performance (.bold[unstructured, but free, cross-validation])] .font90[.blue[Pro tip: - When N is small, OOB is less reliable than a validation set - As N increases, OOB is far more efficient than *k*-fold CV - When the number of trees is about 3x the number needed for the random forest to stabilize, the OOB error estimate is equivalent to leave-one-out cross-validation error. ] ] ] .pull-right[ <img src="slides-source_files/figure-html/unnamed-chunk-36-1.gif" style="display: block; margin: auto;" /> ] --- # Tuning
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance. -- .pull-left[ - .blue[Number of trees] - .blue[mtry] - .grey[Node size] - .grey[Sampling scheme] - .green[Split rule] ] .pull-right[ - .blue[Typically have the largest impact on predictive accuracy.] <br> - .grey[Tend to have marginal impact on predictive accuracy but still worth exploring. Can also increase computational efficiency.] <br> - .green[Generally used to increase computational efficiency] ] --- # Tuning
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance. .font90[ .pull-left[ - .blue.bold[Number of trees] `\(^a\)` - .bold[Why]: stabilize the error - .bold[Rule of thumb]: start with `\(p \times 10\)` trees and adjust as necessary - .bold[Caveats]: - small mtry and sample size values and/or larger node size values result in less correlated trees and therefore require more trees to converge. - more trees provide more robust/stable error & variable importance measures - .bold[Impact on computation time]: increases linearly with the number of trees ] ] .pull-right[ <img src="slides-source_files/figure-html/tuning-trees-1.png" style="display: block; margin: auto;" /> ] .font70[ *a) Technically, the number of trees is not a real tuning parameter but it is important to have a sufficient number for the estimate to stabilize.*]
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance. .font90[ .pull-left[ - .blue.bold[Mtry] - .bold[Why]: balance low tree correlation and reasonable predictive strength - .bold[Rule of thumb]: - Regression default: `\(\frac{p}{3}\)` - Classification default: `\(\sqrt{p}\)` - start with 5 values evenly spaced across the range from 2 to *p* (include the default) - .bold[Caveats]: - few relevant predictors:
<i class="fas fa-arrow-up faa-FALSE animated "></i>
mtry - many relevant predictors:
<i class="fas fa-arrow-down faa-FALSE animated "></i>
mtry - .bold[Impact on computation time]: increases approx linearly with higher mtry values. ] ] .pull-right[ <img src="slides-source_files/figure-html/tuning-mtry-1.png" style="display: block; margin: auto;" /> ] --- # Tuning
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance. .font90[ .pull-left[ - .blue.bold[Node size] - .bold[Why]: balance tree complexity - .bold[Rule of thumb]: - Regression default: 5 - Classification default: 1 - start with 3 values (1, 5, 10) - .bold[Caveats]: - many noisy predictors:
<i class="fas fa-arrow-up faa-FALSE animated "></i>
node size - if higher mtry values are performing best,
<i class="fas fa-arrow-up faa-FALSE animated "></i>
node size - .bold[Impact on computation time]: increases approx exponentially with small node sizes. - for very large data sets:
<i class="fas fa-arrow-up faa-FALSE animated "></i>
node size ] ] .pull-right[ <img src="slides-source_files/figure-html/tuning-node-size-1.png" style="display: block; margin: auto;" /> ] --- # Tuning
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance. <br><br> .font90[ - .blue.bold[.opacity20[Node size] / Required split size / Max number of nodes / Max depth] - Alternative parameters exist that can control tree complexity; however, most preferred random forest packages (__ranger__, __H2O__) focus on node size. - See [
<i class="ai ai-google-scholar faa-tada animated-hover "></i>(Probst et al., 2018)
](https://arxiv.org/pdf/1804.03515.pdf) for short discussion. ] --- # Tuning
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance. .font90[ .pull-left[ - .blue.bold[Sampling scheme] - .bold[Why]: balance low tree correlation and reasonable predictive strength - .bold[Rule of thumb]: - default value is 100% with replacement - assess 3-4 values ranging from 25%-100% - .bold[Caveats]: - if you have dominating features -
<i class="fas fa-arrow-down faa-FALSE animated "></i>
sample size to minimize tree correlation - if you have many categorical features with varying number of levels - try sampling without replacement - .bold[Impact on computation time]: - for very large data sets:
<i class="fas fa-arrow-down faa-FALSE animated "></i>
sample size to decrease compute time ] ] .pull-right[ <img src="slides-source_files/figure-html/tuning-sampling-scheme-1.png" style="display: block; margin: auto;" /> ] --- # Tuning
<i class="fas fa-cog faa-spin animated faa-slow " style=" color:red;"></i>
Random forests provide good "out-of-the-
<i class="fas fa-box-open faa-pulse animated-hover "></i>
" performance but there are a few parameters we can tune to increase performance.<br><br> .font90[ - .blue.bold[Split rule] - .bold[Why]: Balance tree correlation and run time - .bold[Rule of thumb]: - Regression default: variance - Classification default: Gini / cross-entropy - .bold[Caveats]: - Default split rules favor variables with many possible splits (continuous & categorical w/many levels) - Try extra random tree splitting [
<i class="ai ai-google-scholar faa-tada animated-hover "></i>(Geurts et al., 2006)
](https://link.springer.com/article/10.1007/s10994-006-6226-1) if: - many categorical variables with few levels - need to reduce run time - .bold[Impact on computation time]: Completely random split rule minimizes compute time since optimal split is not assessed; splits are made at random ] --- # Variable Importance We have two approaches for .blue[model-specific variable importance] with random forests: .font80[ .pull-left[ .font120.bold[Impurity] 1. At each split in each tree, compute the improvement in the split-criterion 2. Average the improvement made by each variable across all the trees in which the variable is used 3. The variables with the largest average decrease in MSE are considered most important. <br> Notes: - more trees lead to more stable vi estimates - smaller mtry values lead to more equal vi estimates across all variables - bias towards variables with many categories or numeric values ] .pull-right[ .font120.bold[Permutation] 1. For each tree, the OOB sample is passed down the tree and the prediction accuracy is recorded. 2. Then the values for each variable (one at a time) are randomly permuted and the accuracy is again computed. 3. The decrease in accuracy resulting from this random "shaking up" of variable values is averaged over all the trees for each variable. 4. The variables with the largest average decrease in accuracy are considered most important. Notes: - more trees lead to more stable vi estimates - smaller mtry values lead to more equal vi estimates across all variables - categorical variables with many levels can have high variance vi estimates ] ] --- # Variable Importance The two tend to .blue[produce similar results but with slight differences in rank order]: .font80[ .pull-left[ .font120.bold[Impurity] <img src="slides-source_files/figure-html/vi-impurity-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .font120.bold[Permutation] <img src="slides-source_files/figure-html/vi-permutation-1.png" style="display: block; margin: auto;" /> ] ] --- class: clear, center, middle background-image: url(images/everyone-can-random-forest.jpg) background-size: cover .pull-left[ <br><br><br><br><br><br><br><br><br><br><br><br><br> .font200.white[Implementation] ] ??? Image credit: [unsplash](https://unsplash.com/photos/mDinBvq1Sfg) --- # Prereqs .font130[Random Forest
<i class="fas fa-toolbox faa-FALSE animated "></i>
] * __h2o__: `\(n >> p\)` * __ranger__: `\(p >> n\)` (for this presentation I will demo __ranger__) .code60[ ```r # general EDA library(dplyr) library(ggplot2) # machine learning *library(ranger) *library(h2o) library(rsample) # data splitting library(vip) # visualize feature importance library(pdp) # visualize feature effects ``` ] -- .font130[Data] .code60[ ```r # Create training (70%) and test (30%) sets for the AmesHousing::make_ames() data. # Use set.seed for reproducibility set.seed(8451) ames_split <- initial_split(AmesHousing::make_ames(), prop = .7, strata = "Sale_Price") ames_train <- training(ames_split) ames_test <- testing(ames_split) ``` ] --- # Initial Implementation - training .pull-left[ .font80[ * `formula`: formula specification * `data`: training data * `num.trees`: number of trees in the forest * `mtry`: randomly selected predictor variables at each split. Default is `\(\texttt{floor}(\sqrt{\texttt{number of features}})\)` ; however, for regression problems the preferred `mtry` to start with is `\(\texttt{floor}(\frac{\texttt{number of features}}{3}) = \texttt{floor}(\frac{80}{3}) = 26\)` * `respect.unordered.factors`: specifies how to treat unordered factor variables. We recommend setting this to "order" ([
<i class="ai ai-google-scholar faa-tada animated-hover "></i>See ESL, section 9.2.4 for details
](https://web.stanford.edu/~hastie/ElemStatLearn/)). * `seed`: because this is a random algorithm, you will set the seed to get reproducible results ] ] .pull-right[ ```r # number of features features <- setdiff(names(ames_train), "Sale_Price") # perform basic random forest model fit_default <- ranger( formula = Sale_Price ~ ., data = ames_train, num.trees = length(features) * 10, mtry = floor(length(features) / 3), respect.unordered.factors = 'order', verbose = FALSE, seed = 123 ) ``` ] --- # Initial Implementation - results .code70[ ```r # look at results fit_default ## Ranger result ## ## Call: ## ranger(formula = Sale_Price ~ ., data = ames_train, num.trees = length(features) * 10, mtry = floor(length(features)/3), respect.unordered.factors = "order", verbose = FALSE, seed = 123) ## ## Type: Regression ## Number of trees: 800 ## Sample size: 2054 ## Number of independent variables: 80 ## Mtry: 26 ## Target node size: 5 ## Variable importance mode: none ## Splitrule: variance ## OOB prediction error (MSE): 620208087 ## R squared (OOB): 0.8957654 # compute RMSE (RMSE = square root of MSE) sqrt(fit_default$prediction.error) ## [1] 24903.98 ``` ] .center[.content-box-grey[.bold[Default results are based on OOB errors]]] --- # Characteristics to Consider What we do next should be driven by attributes of our data: -- .scrollable90[ .pull-left[ - Half our variables are numeric - Half are categorical variables with moderate number of levels - Likely will favor .blue[variance split rule] - May benefit from .blue[sampling w/o replacement] ```r ames_train %>% summarise_if(is.factor, n_distinct) %>% gather() %>% arrange(desc(value)) ## # A tibble: 46 x 2 ## key value ## <chr> <int> ## 1 Neighborhood 27 ## 2 Exterior_1st 16 ## 3 Exterior_2nd 16 ## 4 MS_SubClass 15 ## 5 Overall_Qual 10 ## 6 Sale_Type 10 ## 7 Condition_1 9 ## 8 Overall_Cond 9 ## 9 House_Style 8 ## 10 Functional 8 ## # ... with 36 more rows ``` ] .pull-right[ - We have highly correlated data (both btwn features and with target) - May favor .blue[lower mtry] and - .blue[lower node size] to help decorrelate the trees<br><br> ```r cor_matrix <- ames_train %>% mutate_if(is.factor, as.numeric) %>% cor() # feature correlation data_frame( row = rownames(cor_matrix)[row(cor_matrix)[upper.tri(cor_matrix)]], col = colnames(cor_matrix)[col(cor_matrix)[upper.tri(cor_matrix)]], corr = cor_matrix[upper.tri(cor_matrix)] ) %>% arrange(desc(abs(corr))) ## # A tibble: 3,240 x 3 ## row col corr ## <chr> <chr> <dbl> ## 1 BsmtFin_Type_1 BsmtFin_SF_1 1 ## 2 Garage_Cars Garage_Area 0.888 ## 3 Exterior_1st Exterior_2nd 0.856 ## 4 Gr_Liv_Area TotRms_AbvGrd 0.802 ## 5 Overall_Qual Sale_Price 0.800 ## 6 Total_Bsmt_SF First_Flr_SF 0.789 ## 7 MS_SubClass Bldg_Type 0.719 ## 8 House_Style Second_Flr_SF 0.713 ## 9 BsmtFin_Type_2 BsmtFin_SF_2 -0.702 ## 10 Gr_Liv_Area Sale_Price 0.694 ## # ... 
with 3,230 more rows # target correlation data_frame( row = rownames(cor_matrix)[row(cor_matrix)[upper.tri(cor_matrix)]], col = colnames(cor_matrix)[col(cor_matrix)[upper.tri(cor_matrix)]], corr = cor_matrix[upper.tri(cor_matrix)] ) %>% filter(col == "Sale_Price") %>% arrange(desc(abs(corr))) ## # A tibble: 78 x 3 ## row col corr ## <chr> <chr> <dbl> ## 1 Overall_Qual Sale_Price 0.800 ## 2 Gr_Liv_Area Sale_Price 0.694 ## 3 Exter_Qual Sale_Price -0.662 ## 4 Garage_Cars Sale_Price 0.655 ## 5 Garage_Area Sale_Price 0.652 ## 6 Total_Bsmt_SF Sale_Price 0.630 ## 7 Kitchen_Qual Sale_Price -0.625 ## 8 First_Flr_SF Sale_Price 0.617 ## 9 Bsmt_Qual Sale_Price -0.575 ## 10 Year_Built Sale_Price 0.571 ## # ... with 68 more rows ``` ] ] --- # Tuning But before we tune, do we have enough
<img src="images/large-tree-icon.png" style="height:1em; width:auto; "/>
s? .scrollable90[ .pull-left[ - Some pkgs provide OOB error for each tree - __ranger__ only provides overall OOB .code80[ ```r # number of features n_features <- ncol(ames_train) - 1 # tuning grid tuning_grid <- expand.grid( trees = seq(10, 1000, by = 20), rmse = NA ) for(i in seq_len(nrow(tuning_grid))) { fit <- ranger( formula = Sale_Price ~ ., data = ames_train, * num.trees = tuning_grid$trees[i], mtry = floor(n_features / 3), respect.unordered.factors = 'order', verbose = FALSE, seed = 123 ) tuning_grid$rmse[i] <- sqrt(fit$prediction.error) } ``` ] ] .pull-right[ - using `\(p \times 10 = 800\)` trees is sufficient - may increase if we decrease mtry or sample size ```r ggplot(tuning_grid, aes(trees, rmse)) + geom_line(size = 1) ``` <img src="slides-source_files/figure-html/implementation-trees-plot-1.png" style="display: block; margin: auto;" /> ] ] --- # Tuning .scrollable90[ .pull-left[ .font120[Tuning grid] - lower end of mtry range due to correlation - lower end of node size range due to correlation - sampling w/o replacement due to categorical features <br><br> .code80[ ```r hyper_grid <- expand.grid( mtry = floor(n_features * c(.05, .15, .25, .333, .4)), min.node.size = c(1, 3, 5), replace = c(TRUE, FALSE), sample.fraction = c(.5, .63, .8), rmse = NA ) # number of hyperparameter combinations nrow(hyper_grid) ## [1] 90 head(hyper_grid) ## mtry min.node.size replace sample.fraction rmse ## 1 4 1 TRUE 0.5 NA ## 2 12 1 TRUE 0.5 NA ## 3 20 1 TRUE 0.5 NA ## 4 26 1 TRUE 0.5 NA ## 5 32 1 TRUE 0.5 NA ## 6 4 3 TRUE 0.5 NA ``` ] ] .pull-right[ .font120[Grid search execution] - This search grid took ~2.5 minutes - __caret__ provides grid search [
<i class="fas fa-external-link-alt faa-FALSE animated " style=" color:blue;"></i>
](https://topepo.github.io/caret/model-training-and-tuning.html) - For larger data, use __H2O__'s random grid search with early stopping [
<i class="fas fa-external-link-alt faa-FALSE animated " style=" color:blue;"></i>
](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html) .code80[ ```r for(i in seq_len(nrow(hyper_grid))) { # fit model for ith hyperparameter combination fit <- ranger( formula = Sale_Price ~ ., data = ames_train, num.trees = 1000, * mtry = hyper_grid$mtry[i], * min.node.size = hyper_grid$min.node.size[i], * replace = hyper_grid$replace[i], * sample.fraction = hyper_grid$sample.fraction[i], verbose = FALSE, seed = 123, respect.unordered.factors = 'order', ) # export OOB error hyper_grid$rmse[i] <- sqrt(fit$prediction.error) } ``` ] ] ] --- # Tuning results .pull-left[ Our top 10 models: - have ~1% or higher performance improvement over the default model - sample w/o replacement - primarily include higher sampling - primarily use mtry = 20 or 26 - node size appears non-influential I would follow this up with an additional grid search that focuses on: - mtry values around 15, 18, 21, 24 - sample fraction around 63%, 70%, 75%, 80% .center[.blue[_using too high of sampling fraction without replacement runs the risk of overfitting to your training data!_]] ] .pull-right[ ```r default_rmse <- sqrt(fit_default$prediction.error) hyper_grid %>% arrange(rmse) %>% mutate(perc_gain = (default_rmse - rmse) / default_rmse * 100) %>% head(10) ## mtry min.node.size replace sample.fraction rmse perc_gain ## 1 20 1 FALSE 0.80 24474.19 1.7257766 ## 2 20 5 FALSE 0.80 24485.64 1.6798126 ## 3 20 3 FALSE 0.80 24555.24 1.4003421 ## 4 26 3 FALSE 0.80 24612.76 1.1693799 ## 5 20 1 FALSE 0.63 24613.27 1.1673219 ## 6 26 1 FALSE 0.80 24615.42 1.1586911 ## 7 26 5 FALSE 0.80 24617.94 1.1485760 ## 8 20 3 FALSE 0.63 24642.72 1.0490463 ## 9 12 1 FALSE 0.80 24659.98 0.9797534 ## 10 12 3 FALSE 0.80 24702.53 0.8089133 ``` ] --- # Feature Importance <a href="https://koalaverse.github.io/vip/index.html"><img src="images/logo-vip.png" class="pdp-hex", align="right"></a> .pull-left[ Once you find your optimal model: - re-run with the respective hyperparameters - include `importance` parameter - crank up the # of trees to ensure stable vi estimates ```r fit_final <- ranger( formula = Sale_Price ~ ., data = ames_train, * num.trees = 2000, mtry = 20, min.node.size = 1, sample.fraction = .8, replace = FALSE, * importance = 'permutation', respect.unordered.factors = 'order', verbose = FALSE, seed = 123 ) ``` ] .pull-right[ ```r vip(fit_final, num_features = 15) ``` <img src="slides-source_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" /> ] --- # Feature Effects <a href="https://bgreenwell.github.io/pdp/index.html"><img src="images/pdp-logo.png" class="pdp-hex", align="right"></a> Partial dependence plots (PDPs), Individual Conditional Expectation (ICE) curves, and other approaches allow us to understand how _important_ variables influence our model's predictions: .pull-left[ .center.bold[PDP: Overall Home Quality] ```r fit_final %>% partial(pred.var = "Overall_Qual", train = as.data.frame(ames_train)) %>% autoplot() ``` <img src="slides-source_files/figure-html/pdp-overall-qual-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .center.bold[ICE: Overall Home Quality] ```r fit_final %>% partial(pred.var = "Overall_Qual", train = as.data.frame(ames_train), ice = TRUE) %>% autoplot(alpha = 0.05, center = TRUE) ``` <img src="slides-source_files/figure-html/ice-overall-qual-1.png" style="display: block; margin: auto;" /> ] --- # Feature Effects <a href="https://bgreenwell.github.io/pdp/index.html"><img src="images/pdp-logo.png" class="pdp-hex", align="right"></a> Partial dependence 
plots (PDPs), Individual Conditional Expectation (ICE) curves, and other approaches allow us to understand how _important_ variables influence our model's predictions: .pull-left[ .center.bold[PDP: Above Ground SqFt] ```r fit_final %>% partial(pred.var = "Gr_Liv_Area", train = as.data.frame(ames_train)) %>% autoplot() ``` <img src="slides-source_files/figure-html/pdp-ground-liv-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .center.bold[ICE: Above Ground SqFt] ```r fit_final %>% partial(pred.var = "Gr_Liv_Area", train = as.data.frame(ames_train), ice = TRUE) %>% autoplot(alpha = 0.05, center = TRUE) ``` <img src="slides-source_files/figure-html/ice-ground-liv-1.png" style="display: block; margin: auto;" /> ] --- # Feature Effects <a href="https://bgreenwell.github.io/pdp/index.html"><img src="images/pdp-logo.png" class="pdp-hex", align="right"></a> Interaction between two influential variables: .pull-left[ ```r fit_final %>% partial( pred.var = c("Gr_Liv_Area", "Year_Built"), train = as.data.frame(ames_train) ) %>% plotPartial( zlab = "Sale_Price", levelplot = FALSE, drape = TRUE, colorkey = FALSE, screen = list(z = 50, x = -60) ) ``` ] .pull-right[ <img src="slides-source_files/figure-html/interaction-pdp-output-1.png" style="display: block; margin: auto;" /> ] .center[.content-bog-gray[.bold[Read more about machine learning interpretation [here](https://christophm.github.io/interpretable-ml-book/)]]] --- # Random Forest Summary .pull-left[ ### Strengths
<img src="https://emojis.slackmojis.com/emojis/images/1471045870/910/rock.gif?1471045870" style="height:1em; width:auto; "/>
- .green[Competitive performance.] - .green[Remarkably good "out-of-the box"] (very little tuning required). - .green[Built-in validation set] (don't need to sacrifice data for extra validation). - .green[Typically does not overfit.] - .green[Robust to outliers.] - .green[Handles missing data] (imputation not required). - .green[Provide automatic feature selection.] - .green[Minimal preprocessing required.] ] -- .pull-right[ ### Weaknesses
<img src="https://emojis.slackmojis.com/emojis/images/1471045885/967/wtf.gif?1471045885" style="height:1.25em; width:auto; "/>
- .red[Although accurate, often cannot compete with the accuracy of advanced boosting algorithms.] - .red[Can become slow on large data sets.] - .red[Less interpretable] (although this is easily addressed with various tools such as variable importance, partial dependence plots, LIME, etc.). ] --- # Random Forest Summary .pull-left[ <img src="images/leo-breiman.jpg" width="70%" height="70%" style="display: block; margin: auto;" /> ] .pull-right[ <br><br> .font120[ _"Take the output of random forests not as absolute truth, but as smart computer generated guesses that may be helpful in leading to a deeper understanding of the problem."_ --- Leo Breiman ] ] --- # Learning More .pull-left[ <img src="images/isl.jpg" width="55%" height="55%" style="display: block; margin: auto;" /> .center.font150[[Book website](http://www-bcf.usc.edu/~gareth/ISL/)] ] .pull-right[ <img src="images/esl.jpg" width="55%" height="55%" style="display: block; margin: auto;" /> .center.font150[[Book website](https://web.stanford.edu/~hastie/ElemStatLearn/)] ] --- class: clear, center, middle background-image: url(images/raising-hand.gif) background-size: cover <br><br><br><br><br><br><br><br><br><br><br><br> .font300.bold[
Questions?
]