Overview

In the last module we discussed general guidelines for getting acquainted with a new dataset and for performing common data transformation tasks. In this module we build on those activities by learning how to clean and tidy our data. It’s also rare for a data analysis to involve only a single table of data. Typically you have many tables, and you must combine them to answer the questions you’re interested in. Collectively, multiple related tables are called relational data because it is the relations between the tables, not just the individual tables, that are important. We’ll also spend some time learning how to work with text data.
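As a small illustration of why the relations matter, the sketch below joins two tables on a shared key using pandas. The tables, column names, and values are hypothetical and invented for this example; they are not taken from the module's datasets.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cho"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "customer_id": [1, 1, 3],
    "amount": [25.0, 40.0, 15.0],
})

# The shared customer_id column is the relation linking the two tables.
# An inner join keeps only customers that have at least one order.
combined = orders.merge(customers, on="customer_id", how="inner")

# Total spend per customer name: a question neither table answers alone.
spend = combined.groupby("name")["amount"].sum()
print(spend)
```

Neither table can answer "how much did each named customer spend?" on its own; the answer only appears once the two are combined through their common key.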

Learning objectives

By the end of this module you should be able to:

  • Explain the basic concepts of “tidy” data.

  • Perform data tidying tasks with Python such as reshaping, splitting, and combining data along with handling missing values.

  • Describe and apply different join operations for relational datasets.

  • Perform basic character string manipulations and apply regular expressions to identify string patterns.

Estimated time requirement

Working through the module lessons should take about 3 hours.