Learn SQL fundamentals using DuckDB and Python. A practical series for Python developers who want to level up their data skills.
SQL is the language of data — and if you work in Python, learning it will make you better at nearly everything you already do. This series walks you through SQL from first principles using DuckDB, a modern, local database that runs inside Python with no server setup required. No credentials, no Docker, no configuration. Just Python and data.
Each post builds on the last. By the end you’ll be writing queries, filtering and aggregating data, joining related tables, and using advanced features like CTEs and subqueries — all explained from a Python developer’s perspective.
I learned SQL very early in my career. At the time, I didn’t understand why, and for the first month or so, it didn’t make sense to me. The syntax didn’t resemble any language I had seen before, and it employed concepts with which I was unfamiliar. This all made SQL seem scary, and oddly enough, SQL hasn’t changed much in the past 40 years, which makes it even more of an oddity.
...
Zero-Setup SQL: Run your first SQL query in under 5 minutes with DuckDB
Have you ever tried setting up a database server just to learn SQL? Docker containers, admin credentials… Forget all that. Let me show you how to go from zero to running SQL in under 5 minutes.
Why are we using DuckDB to learn SQL? No setup - Just import and start using it Real SQL - PostgreSQL compatible syntax so what you learn transfers Fast enough - Handles millions of rows on your laptop Python-native - Works well with lists, DataFrames, CSV files It is perfect for learning without infrastructure headaches.
...
Generate Practice Data with faker
Last week, we got DuckDB running with three hardcoded rows. That got us started—but three rows? You can eyeball that. Let’s generate hundreds of realistic customers and build a dataset worth exploring.
Python has the perfect tool: faker. It’s a library that generates realistic fake data—names, emails, addresses, dates—anything you’d find in a real database. Let’s use it to build a dataset we can explore for the rest of this series.
...
Don't forget to save! Persisting your DuckDB database
I still remember losing schoolwork and video game progress because I forgot to save. That sinking feeling when hours of work vanish because you were too caught up in the flow to pause and save.
In our last post, we created a customer database and generated 500 rows of fake data. Our in-memory database has the same problem—when Python exits, all that data vanishes:
import duckdb con = duckdb.connect(':memory:') con.execute("CREATE TABLE customers (id INT, name VARCHAR)") con.execute("INSERT INTO customers VALUES (1, 'Alice')") print(con.execute("SELECT * FROM customers").fetchdf()) # Script ends... and the data is gone forever A database is supposed to provide persistent storage, isn’t it? Let’s fix that with one small change.
...
SQL Thinks in Sets, Not Loops
Remember back when we started, I mentioned SQL was difficult because of how I was thinking? I was asking it to perform steps to return data. This didn’t work because SQL uses a declarative syntax that describes the final result. Until I realized this, SQL felt hard. Let’s explore this concept further.
Working with lists and loops When you work with lists in Python, one of the first tools you reach for is the for loop. The for loop is great because it lets you take every item in the list and apply some logic to it, one at a time. It might look something like this.
...
FROM: Where Your Data Lives
We have come a long way over the last five posts, but we are just getting started. So far, we have explored concepts that will help us along our journey, but haven’t talked a whole lot about SQL itself.
We have seen some basic SQL that uses a couple of keywords, SELECT and FROM, but we haven’t looked very closely at what these do. Let’s do that now, starting with FROM.
...
SELECT: Choosing Your Columns
You have written SELECT * many times by now. It works, but it’s a bit like asking for everything in the fridge when you just want milk. This week, we will look at the SELECT clause and see that it does more than just pick columns. It transforms your output.
Previously, we looked closely at the FROM clause, which tells the database where the query will find the data. The SELECT clause defines which columns will be returned, and you can reshape data on the way out.
...
ORDER BY: Sorting Your Results
We now have a firm grasp on how to use SELECT: Choosing Your Columns and FROM: Where Your Data Lives to tell the database where to find data and how to format the columns when it returns it. With this knowledge, we can pull back all of the data from a table in a database.
There is still a problem with the data that we receive from a query. It can come back in any order. It may return in the same order 9 times out of 10, but there is no guarantee that it will come back in the same order next time. This happens because database engines optimize execution plans based on factors like data volume, indexes, and available memory, and those optimizations can change between queries.
...