This book is currently in active development.
Welcome to BANA 4080: Introduction to Data Mining with Python. This course provides an immersive, hands-on introduction to the tools and techniques used in modern data science. You’ll learn how to explore, analyze, and model data using Python and gain practical experience through labs, projects, and real-world datasets.
Along the way you will develop core skills in data wrangling, exploratory data analysis, and data visualization, and you will work with key machine learning techniques, including supervised, unsupervised, and deep learning models. We’ll also take a quick detour into generative AI and large language models (LLMs). Throughout, real-world data and hands-on practice will guide your learning.
By the end of the course, students will be able to:
This book is designed for upper-level undergraduate students who may have little to no prior programming experience but are eager to explore the world of data science using Python. It’s also an ideal resource for early-career professionals or students in analytics, business, or quantitative fields who are looking to upskill—whether by learning Python for the first time or by building a deeper understanding of how to explore, visualize, and model data. The content is structured to be accessible and hands-on, guiding readers step-by-step through the core tools and techniques used in modern data-driven problem solving.
This book is broken into 14 modules, each aligned with a week of instruction in the BANA 4080 course. Every module introduces key concepts or techniques in data science, combining concise explanations with interactive, hands-on code examples. Whether you’re reading independently or following along with the course, the modular structure makes it easy to work through the content at your own pace, week by week.
Module & Topics | Summary of Concepts Covered |
---|---|
1. Fundamentals I | Course overview, coding environment setup, Python basics |
2. Fundamentals II | Using Jupyter notebooks, data structures, Python libraries |
3. Pandas DataFrames | Importing data, DataFrame fundamentals, subsetting DataFrames |
4. Data Wrangling I | Cleaning, filtering, aggregating, and merging tabular data |
5. Data Wrangling II | Working with datetime, text data, and joining data like SQL |
6. Data Visualization | Creating plots using matplotlib and seaborn, exploratory data analysis |
7. Writing Efficient Python Code | Control flow, defining functions, loops, list comprehensions |
8. Introduction to Machine Learning | Overview of ML, features/labels, train/test split, scikit-learn basics |
9. Unsupervised Learning | Clustering (k-means), PCA, dimensionality reduction, t-SNE visualization |
10. Supervised Learning | Regression and classification models: linear regression and logistic regression |
11. Deep Learning & Neural Networks | Neural networks using Keras; simple classification tasks |
12. Generative AI & Prompt Engineering | Working with LLMs, OpenAI API, prompt design, building AI agents |
13. Final Project Kickoff | Scoping and starting a capstone data science project |
14. Final Project Presentations & Wrap-Up | Presenting project findings, course reflection, and next steps |
The following typographical conventions are used in this book:
inline code
: monospaced highlighted text indicates functions or other commands that could be typed literally by the user. For example, the expression below is followed by its output:
1 + 2
3
In addition to the general text used throughout, you will notice the following callout boxes, each marked with an icon:
Signifies a tip or suggestion
Signifies a general note
Signifies a warning or caution
This book is built around an open-source Python-based data science ecosystem. While the list of tools evolves with the field, the examples and exercises in this book are designed to work with Python 3.x, currently using…
# Display the Python version
import sys
print("Python version:", sys.version.split()[0])
Python version: 3.13.5
…and are executed within Jupyter Notebooks, which provide an interactive, beginner-friendly environment for writing and running code.
Throughout the modules, we use foundational Python libraries such as:
Each module explicitly introduces the relevant software and libraries, explains how and why they are used, and provides reproducible code so that readers can follow along and generate similar results in their own environment.
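To see how your own environment compares before starting a module, a short check like the one below may help. It is only a minimal sketch: the package names are taken from the module table above and are an assumption about what the course installs, and `installed_versions` is a hypothetical helper, not part of the course materials. It uses only the standard library, so it runs even when none of the packages are present.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names inferred from the module table; the exact set used by the
# course is an assumption and may differ.
PACKAGES = ["pandas", "matplotlib", "seaborn", "scikit-learn", "keras"]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version,
    or to None when the package is not installed (hypothetical helper)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so missing libraries are easy to spot.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this in a Jupyter notebook cell lists each library with its version, which makes it easier to tell whether a difference in results comes from the environment rather than from the code.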