16 Overview

Tree-based models are a class of nonparametric algorithms that partition the feature space into a number of smaller (non-overlapping) regions with similar response values using a set of splitting rules. Such divide-and-conquer methods can produce simple rules that are easy to interpret and visualize with tree diagrams. As we'll see, decision trees offer many benefits; however, they typically fall short in predictive performance compared to more complex algorithms like neural networks and MARS. Fortunately, more advanced decision tree ensemble algorithms, such as bagging and random forests, combine many decision trees and can perform quite well.
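As a rough illustration of these ideas, the sketch below fits a single regression tree and a random forest with scikit-learn. It is a minimal example, not the module's own code: the dataset (`load_diabetes`), the tree depth, and the number of trees are arbitrary choices for demonstration.

```python
# Minimal sketch (illustrative assumptions, not the course's example):
# compare one shallow regression tree against a random forest ensemble.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single decision tree recursively splits the feature space into regions;
# max_depth controls how finely it partitions (deeper trees tend to overfit).
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# A random forest averages many decorrelated trees, which usually improves
# predictive performance over any single tree.
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

print("single tree R^2:   ", tree.score(X_test, y_test))
print("random forest R^2: ", forest.score(X_test, y_test))
```

Varying `max_depth` in this sketch is a quick way to see how tree depth trades off underfitting against overfitting, which is one of the learning objectives below.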

16.1 Learning objectives

By the end of this module you should be able to:

  • Explain how decision tree models partition data and how the depth of a tree impacts performance.
  • Fit, tune, and assess decision tree models.
  • Explain and apply decision tree ensemble algorithms such as bagging and random forests.

16.2 Estimated time requirement

The estimated time to go through the module lessons is about:

  • Reading only: 3-4 hours
  • Reading + videos: 4 hours

16.3 Tasks

  • Work through the 3 module lessons.
  • Upon finishing each lesson, take the associated lesson quiz on Canvas. Be sure to complete the lesson quiz no later than the due date listed on Canvas.
  • Check Canvas for this week’s lab, the lab quiz due date, and any additional content (e.g., in-class material).