16 Overview
Tree-based models are a class of nonparametric algorithms that work by partitioning the feature space into a number of smaller (non-overlapping) regions with similar response values using a set of splitting rules. Such divide-and-conquer methods can produce simple rules that are easy to interpret and visualize with tree diagrams. As we'll see, decision trees offer many benefits; however, they typically lack the predictive performance of more complex algorithms like neural networks and MARS. Fortunately, more advanced decision tree ensemble algorithms, such as bagging and random forests, combine many decision trees and can perform quite well.
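To make the partitioning idea concrete, here is a minimal sketch of recursive binary splitting for a regression tree on a toy one-dimensional dataset, written in plain Python. This is illustrative only: real implementations handle many features, categorical splits, pruning, and more sophisticated stopping rules, and all function names here (`sse`, `best_split`, `grow_tree`, `predict`) are made up for this example.

```python
def sse(ys):
    """Sum of squared errors around the mean response of a region."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    """Find the threshold on x that minimizes the total SSE of the two regions."""
    best = None
    for t in sorted(set(xs))[1:]:               # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

def grow_tree(xs, ys, depth, max_depth):
    """Recursively partition until max_depth; leaves predict the region mean."""
    if depth == max_depth or len(set(xs)) < 2:
        return sum(ys) / len(ys)                # leaf node: mean response
    t, _ = best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (t,
            grow_tree(*zip(*left), depth + 1, max_depth),
            grow_tree(*zip(*right), depth + 1, max_depth))

def predict(node, x):
    """Route x down the tree to a leaf and return that leaf's prediction."""
    if not isinstance(node, tuple):
        return node
    t, left, right = node
    return predict(left if x < t else right, x)

# Toy data: a step function with a jump at x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
tree = grow_tree(xs, ys, depth=0, max_depth=1)
print(predict(tree, 2))   # → 1.0 (mean of the left region)
print(predict(tree, 7))   # → 9.0 (mean of the right region)
```

Note how `max_depth` controls the trade-off the learning objectives mention: a depth-1 tree (a "stump") captures only one split, while deeper trees carve the feature space into ever-finer regions and can overfit.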
16.1 Learning objectives
By the end of this module you should be able to:
- Explain how decision tree models partition data and how the depth of a tree impacts performance.
- Fit, tune, and assess decision tree models.
- Explain and apply decision tree ensemble algorithms such as bagging and random forests.
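The bagging idea from the objectives above can also be sketched in a few lines of plain Python: fit a depth-1 tree ("stump") to each bootstrap resample of the data and average the stumps' predictions. This is a self-contained toy illustration, not the course's implementation; a random forest would additionally sample a random subset of features at each split, which a one-dimensional example cannot show.

```python
import random

def fit_stump(xs, ys):
    """One split minimizing SSE; each side predicts its mean response."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = (sum((y - sum(left) / len(left)) ** 2 for y in left)
               + sum((y - sum(right) / len(right)) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    _, t, lmean, rmean = best
    return lambda x: lmean if x < t else rmean

def bagged_predict(stumps, x):
    """Ensemble prediction: average over all bootstrapped stumps."""
    return sum(s(x) for s in stumps) / len(stumps)

random.seed(42)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]

stumps = []
for _ in range(25):                          # 25 bootstrap resamples
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
    if len(set(bx)) > 1:                     # need two distinct x values to split
        stumps.append(fit_stump(bx, by))

print(bagged_predict(stumps, 2), bagged_predict(stumps, 7))
```

Averaging over many trees grown on resampled data smooths out the high variance of any single deep tree, which is the core reason bagging and random forests tend to outperform an individual decision tree.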
16.2 Estimated time requirement
The estimated time to go through the module lessons is about:
- Reading only: 3-4 hours
- Reading + videos: 4 hours
16.3 Tasks
- Work through the 3 module lessons.
- Upon finishing each lesson, take the associated lesson quiz on Canvas. Be sure to complete each lesson quiz no later than the due date listed on Canvas.
- Check Canvas for this week's lab, the lab quiz due date, and any additional content (e.g., in-class material).