One of the most enjoyable parts of this learning journey so far has been feeling my brain properly kick into gear again. This course gave me that — the kind of challenge that forces you to slow down, focus, and really think.
Weirdly, it also brought back memories of university. Some of the topics (like linear regression and polynomial expressions) reminded me of lectures during my Chemical Engineering degree. The difference? This time I actually understood how and why they’re useful. The practical nature of the course made everything click.
The labs were a real highlight — I used a dataset on used cars from 1985 to train simple models that could predict car prices based on different features. The models themselves were basic, but it got me thinking: If this works for cars… what else could I predict if I had the right data?
What the Course Covered
We started by diving into Python’s core data libraries: NumPy, Pandas, Scikit-learn, and Matplotlib/Seaborn for visualisation. From there, we built basic data pipelines and explored how to structure and manipulate data in a way that machine learning models can actually understand.
The project was centred on predicting car prices using 26 different variables — things like engine size, horsepower, and fuel type. Sounds simple, but there was a lot to unpack.
Wrangling the Data (a.k.a. Making Sense of the Mess)
There’s a saying I hear all the time at work: “Crap in = crap out.” That couldn’t be more relevant here.
If your data’s a mess, your model will be too. So this part of the course focused on cleaning and preparing the data properly — and I actually found it really satisfying.
Here’s some of what I learned:
- Fixing data types (e.g. making sure price is treated as a number, not a string)
- Handling missing values by replacing them with means or most frequent values
- Normalising columns using min-max scaling so everything is on the same scale
- Using binning to group values into ranges (super useful for analysis)
- Converting categories (like fuel type) into dummy variables so models can work with them
Turning messy data into something clean and structured is weirdly rewarding. Definitely one of my favourite parts.
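To make those steps concrete, here’s a minimal sketch of that cleaning workflow in pandas. I’m using a tiny made-up frame as a stand-in for the 1985 dataset — the column names and values here are my own, not the course’s:

```python
import pandas as pd
import numpy as np

# Toy stand-in for the used-car dataset (illustrative columns, not the real data)
df = pd.DataFrame({
    "price": ["13495", "16500", None, "17450"],   # numbers stored as strings
    "horsepower": [111, 111, 154, np.nan],
    "fuel_type": ["gas", "gas", "diesel", "gas"],
})

# 1. Fix data types: price should be numeric, not a string
df["price"] = pd.to_numeric(df["price"])

# 2. Handle missing values: replace with the column mean
df["price"] = df["price"].fillna(df["price"].mean())
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# 3. Normalise with min-max scaling so features share a 0-1 range
df["horsepower_scaled"] = (df["horsepower"] - df["horsepower"].min()) / (
    df["horsepower"].max() - df["horsepower"].min()
)

# 4. Bin a continuous column into labelled ranges
df["price_band"] = pd.cut(df["price"], bins=3, labels=["low", "medium", "high"])

# 5. Convert categories (like fuel type) into dummy variables
df = pd.concat([df, pd.get_dummies(df["fuel_type"], prefix="fuel")], axis=1)

print(df[["price", "horsepower_scaled", "price_band", "fuel_gas"]])
```

Five lines of data, but the same five moves scale straight up to the full 26-column dataset.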
Exploring the Data
Next up was Exploratory Data Analysis (EDA) — the step where you poke around and try to figure out what’s actually going on.
We used Python tools to:
- Summarise and inspect datasets (describe() and info() are your friends)
- Build box plots, scatter plots, and heatmaps
- Group and pivot data to explore relationships (e.g. how drivetrain affects price)
- Run ANOVA tests and correlation analysis
Key learning here: correlation doesn’t mean causation. Just because engine size and price go up together doesn’t mean one causes the other.
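The EDA steps above only take a few lines of pandas. Again, this is a toy stand-in for the car data with illustrative column names, not the actual dataset:

```python
import pandas as pd

# Small made-up sample standing in for the car dataset
df = pd.DataFrame({
    "drive_wheels": ["fwd", "rwd", "fwd", "rwd", "4wd", "fwd"],
    "engine_size": [109, 209, 97, 304, 122, 110],
    "price": [13950, 30760, 9295, 40960, 11850, 15250],
})

# Quick summaries: stats and structure
print(df.describe())   # count, mean, std, quartiles for numeric columns
df.info()              # dtypes and non-null counts

# Group to explore relationships, e.g. average price by drivetrain
avg_price = df.groupby("drive_wheels")["price"].mean()
print(avg_price)

# Pearson correlation between engine size and price
corr = df["engine_size"].corr(df["price"])
print(f"engine_size vs price correlation: {corr:.2f}")
```

Even on six rows you can see the pattern — bigger engines correlate with higher prices. Which, as noted above, tells you they move together, not that one causes the other.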
Building and Testing Models
This is where the real fun started. I built both simple and multiple linear regression models using Scikit-learn (plus polynomial regression for more complex, non-linear relationships), and then evaluated them using:
- R-squared to check how well the model explains variance
- Mean Squared Error to measure the average size of prediction errors
- Residual plots to spot patterns in what the model gets wrong
I also finally got my head around overfitting vs underfitting, train/test splits, and cross-validation. It was a lot at first, but once I saw it play out in the labs, it started to make sense.
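Here’s roughly how that workflow fits together in Scikit-learn. This is a sketch on synthetic data (the numbers are invented, not the course dataset), but the train/test split, fitting, and evaluation steps are the same ones from the labs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in: price roughly linear in engine size, plus noise
rng = np.random.default_rng(42)
engine_size = rng.uniform(70, 300, size=200).reshape(-1, 1)
price = 150 * engine_size.ravel() + 2000 + rng.normal(0, 1500, size=200)

# Hold out a test set so the model is judged on data it hasn't seen
X_train, X_test, y_train, y_test = train_test_split(
    engine_size, price, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

r2 = r2_score(y_test, predictions)             # share of variance explained
mse = mean_squared_error(y_test, predictions)  # average squared error

# Cross-validation: repeat the split several ways for a more stable estimate
cv_scores = cross_val_score(model, engine_size, price, cv=5, scoring="r2")

print(f"test R²: {r2:.3f}, mean CV R²: {cv_scores.mean():.3f}")
```

The test-set score tells you whether the model generalises; if training scores are high but test scores collapse, that’s the overfitting the course warns about.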
How This Applies to the Real World
What really hit home during this course is how useful these skills are — not just in AI, but across industries.
Some ideas that came to mind:
- Predictive pricing in e-commerce, insurance, or real estate
- Customer segmentation based on behavioural data
- Trend analysis in employee or customer feedback
- Building stakeholder dashboards using tools like Seaborn or Dash
Once you know how to structure and analyse data, the use cases are endless.
Final Thoughts
This module took me longer than expected, but I stuck with it — and I’m so glad I did. It’s given me a much stronger foundation, not just in Python, but in how to approach data more strategically.
I’ve also realised that data analysis is just as much about asking the right questions as it is about writing good code. And when the answers start to come together, it’s seriously rewarding.
Next up: Machine Learning with Python. I’m looking forward to getting deeper into model building and seeing how these techniques scale up.
Thanks for reading — and if you’re also learning, keep going. The frustrating parts are where the real progress happens. As always, I’d love to hear your thoughts or feedback!