Learn SQL fundamentals using DuckDB and Python. A practical series for Python developers who want to level up their data skills.
SQL is the language of data — and if you work in Python, learning it will make you better at nearly everything you already do. This series walks you through SQL from first principles using DuckDB, a modern, local database that runs inside Python with no server setup required. No credentials, no Docker, no configuration. Just Python and data.
Each post builds on the last. By the end you’ll be writing queries, filtering and aggregating data, joining related tables, and using advanced features like CTEs and subqueries — all explained from a Python developer’s perspective.
Last week, we talked about the superpower of relational databases: the ability to join tables, which makes data storage more efficient. In fact, we have already covered much of the syntax that you would use on a daily basis. But SQL’s simplicity hides surprising flexibility. You can model data in many ways, and you can often get the same results with different syntax.
The art of SQL is optimizing your queries so that they run well. This comes with experience, so I encourage you to start playing around with the queries and data we are working with. We will see some of this flexibility with today’s topic: subqueries.
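As a rough illustration of getting the same result with different syntax, here is a minimal sketch that answers one question ("which customers have placed an order?") twice, once with a subquery and once with a JOIN. It uses Python's built-in sqlite3 module as a stand-in for DuckDB so it runs without installing anything; the SQL itself should carry over. The tables and rows are invented for the example.

```python
import sqlite3  # stand-in for DuckDB so the sketch needs only the standard library

con = sqlite3.connect(":memory:")
# Hypothetical tables, made up for this example
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob"), (3, "Cara")])
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(10, 1), (11, 1), (12, 3)])

# Version 1: a subquery inside the WHERE clause
with_subquery = con.execute("""
    SELECT name
    FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY name
""").fetchall()

# Version 2: a JOIN, with DISTINCT to collapse customers who have several orders
with_join = con.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

print(with_subquery)
print(with_join)
```

Both queries return Alice and Cara. Which form runs faster depends on the engine and the data, so it is worth trying both.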
...
JOINs Explained for Python Developers
So far in this series we have covered all the core SQL clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. We can do quite a bit with those tools, but we have been working with a single table. SQL is the language of relational databases, and it is time to talk about the relational part.
JOINs connect related tables. It’s like looking up values in a Python dictionary or merging pandas DataFrames, except that the database handles the matching. Today we are going to see how this works, but first we need a little setup.
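To make the dictionary comparison concrete, the sketch below does the same lookup two ways: by hand with a Python dict, and with a JOIN. It assumes two made-up tables, customers and orders, and uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs as-is; the JOIN syntax should be the same in DuckDB.

```python
import sqlite3  # stand-in for DuckDB; only the standard library is needed

# The dictionary version: look up each order's customer name by key
customers = {1: "Alice", 2: "Bob"}
orders = [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0)]  # (order_id, customer_id, total)
by_dict = [(order_id, customers[customer_id], total)
           for order_id, customer_id, total in orders]

# The SQL version: the database does the matching
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
con.executemany("INSERT INTO customers VALUES (?, ?)", list(customers.items()))
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

by_join = con.execute("""
    SELECT o.id, c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

print(by_dict)
print(by_join)
```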
...
HAVING: Filtering Grouped Results
When I first encountered HAVING, I thought, “Why do we need this? It’s just like WHERE.”
Then I tried filtering on COUNT() and hit a strange error. That’s when it clicked: HAVING filters after grouping, not before. It’s what you need when WHERE won’t work because the thing you want to filter on doesn’t exist until after GROUP BY runs.
Let’s start with a simple query of customer count by city. But there are a lot of cities and we only care about those with more than ten customers.
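One way such a query might look is sketched below. This is not the post's own dataset: the customers table and its rows are invented, and Python's built-in sqlite3 module stands in for DuckDB so the example runs without extra installs. The GROUP BY and HAVING syntax should be the same in DuckDB.

```python
import sqlite3  # stand-in for DuckDB; only the standard library is needed

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (city TEXT)")  # hypothetical, one row per customer

# Made-up data: 12 customers in Austin, 3 in Boise, 11 in Denver
rows = [("Austin",)] * 12 + [("Boise",)] * 3 + [("Denver",)] * 11
con.executemany("INSERT INTO customers VALUES (?)", rows)

# HAVING runs after GROUP BY, so it can filter on the per-city count
busy_cities = con.execute("""
    SELECT city, COUNT(*) AS customer_count
    FROM customers
    GROUP BY city
    HAVING COUNT(*) > 10
    ORDER BY city
""").fetchall()

print(busy_cities)
```

Boise drops out because its count of 3 does not pass the HAVING condition.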
...
GROUP BY: Aggregating Your Data
Last week, we learned to use WHERE to efficiently return only the rows that we want from a database. But what if you want to summarize the data more efficiently?
It turns out that you can have the database do the summarization for you with the GROUP BY keyword.
Like Python’s collections.Counter or pandas groupby(), SQL’s GROUP BY lets you summarize data by category. It allows you to count, sum, and average across groups.
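The comparison with collections.Counter can be sketched side by side. The orders table and its values are invented for illustration, and Python's built-in sqlite3 module is used as a stand-in for DuckDB so the code runs with no installation; the aggregate functions shown also exist in DuckDB.

```python
import sqlite3  # stand-in for DuckDB; only the standard library is needed
from collections import Counter

orders = [("Austin", 10.0), ("Boise", 20.0), ("Austin", 30.0)]  # (city, total), made up

# Python: count orders per city by hand
counts = Counter(city for city, _ in orders)

# SQL: let the database count, sum, and average per city
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (city TEXT, total REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", orders)

summary = con.execute("""
    SELECT city, COUNT(*) AS n, SUM(total) AS total, AVG(total) AS average
    FROM orders
    GROUP BY city
    ORDER BY city
""").fetchall()

print(sorted(counts.items()))
print(summary)
```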
...
WHERE: Filtering Your Data
We have come a long way in the past couple of months, working through the core SQL keywords. So far, we can SELECT columns, specify FROM where our data lives, and ORDER BY to sort results.
That is quite a lot, and today we are going to unlock the real power of SQL: the ability to filter your results before the database returns them.
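As a small sketch of the difference, the code below filters the same rows twice: first in Python after fetching everything, then with a WHERE clause so that only matching rows come back. The customers table is hypothetical, and Python's built-in sqlite3 module stands in for DuckDB to keep the example dependency-free.

```python
import sqlite3  # stand-in for DuckDB; only the standard library is needed

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT, city TEXT)")  # hypothetical table
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [("Alice", "Austin"), ("Bob", "Boise"), ("Cara", "Austin")])

# Without WHERE: fetch every row, then filter in Python
all_rows = con.execute("SELECT name, city FROM customers").fetchall()
filtered_in_python = sorted(name for name, city in all_rows if city == "Austin")

# With WHERE: the database returns only the matching rows
filtered_in_sql = [name for (name,) in con.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Austin",))]

print(filtered_in_python)
print(filtered_in_sql)
```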
...
ORDER BY: Sorting Your Results
From “SELECT: Choosing Your Columns” and “FROM: Where Your Data Lives”, we now have a firm grasp on how to tell the database where to find data and how to format the columns it returns. With this knowledge, we can pull back all of the data from a table in a database.
There is still a problem with the data that we receive from a query. It can come back in any order. It may return in the same order 9 times out of 10, but there is no guarantee that it will come back in the same order next time. This happens because database engines optimize execution plans based on factors like data volume, indexes, and available memory, and those optimizations can change between queries.
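The remedy is to state the order you want explicitly. Below is a minimal sketch with an invented customers table, using Python's built-in sqlite3 module as a stand-in for DuckDB; ORDER BY and DESC are written the same way in both.

```python
import sqlite3  # stand-in for DuckDB; only the standard library is needed

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT, signup_year INTEGER)")  # hypothetical
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [("Cara", 2024), ("Alice", 2022), ("Bob", 2023)])

# Ascending is the default sort direction
by_name = con.execute(
    "SELECT name FROM customers ORDER BY name").fetchall()

# DESC reverses it: most recent signup first
newest_first = con.execute(
    "SELECT name FROM customers ORDER BY signup_year DESC").fetchall()

print(by_name)
print(newest_first)
```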
...
SELECT: Choosing Your Columns
You have written SELECT * many times by now. It works, but it’s a bit like asking for everything in the fridge when you just want milk. This week, we will look at the SELECT clause and see that it does more than just pick columns. It transforms your output.
Previously, we looked closely at the FROM clause, which tells the database where the query will find the data. The SELECT clause defines which columns will be returned, and you can reshape data on the way out.
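A short sketch of reshaping data on the way out: the query below renames a column, upper-cases another, and computes a new one. The orders table and its single row are invented, and Python's built-in sqlite3 module stands in for DuckDB so the example runs without installing anything.

```python
import sqlite3  # stand-in for DuckDB; only the standard library is needed

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (customer TEXT, city TEXT, price REAL, quantity INTEGER)"
)  # hypothetical table
con.execute("INSERT INTO orders VALUES ('Alice', 'Austin', 2.5, 4)")

cur = con.execute("""
    SELECT customer AS name,            -- rename a column with AS
           UPPER(city) AS city_upper,   -- transform a value with a function
           price * quantity AS total    -- compute a new column
    FROM orders
""")
columns = [d[0] for d in cur.description]  # column names of the result
row = cur.fetchone()

print(columns)
print(row)
```

The underlying table is unchanged; only the shape of the result differs.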
...
FROM: Where Your Data Lives
We have come a long way over the last five posts, but we are just getting started. So far, we have explored concepts that will help us along our journey, but haven’t talked a whole lot about SQL itself.
We have seen some basic SQL that uses a couple of keywords, SELECT and FROM, but we haven’t looked very closely at what these do. Let’s do that now, starting with FROM.
...
SQL Thinks in Sets, Not Loops
Remember back when we started, I mentioned SQL was difficult because of how I was thinking? I was asking it to perform steps to return data. This didn’t work because SQL uses a declarative syntax that describes the final result. Until I realized this, SQL felt hard. Let’s explore this concept further.
Working with lists and loops
When you work with lists in Python, one of the first tools you reach for is the for loop. The for loop is great because it lets you take every item in the list and apply some logic to it, one at a time. It might look something like this.
...
Don't forget to save! Persisting your DuckDB database
I still remember losing schoolwork and video game progress because I forgot to save. That sinking feeling when hours of work vanish because you were too caught up in the flow to pause and save.
In our last post, we created a customer database and generated 500 rows of fake data. Our in-memory database has the same problem: when Python exits, all that data vanishes:
    import duckdb

    con = duckdb.connect(':memory:')
    con.execute("CREATE TABLE customers (id INT, name VARCHAR)")
    con.execute("INSERT INTO customers VALUES (1, 'Alice')")
    print(con.execute("SELECT * FROM customers").fetchdf())
    # Script ends... and the data is gone forever

A database is supposed to provide persistent storage, isn’t it? Let’s fix that with one small change.
...