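Real programs spend a lot of time transforming data: filtering out invalid items, reshaping structures, computing summaries, and grouping records. Python handles these tasks with comprehensions, built-in functions such as sum() and sorted(), and helpers from standard library modules such as collections and functools.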
Filtering data
Use a comprehension with an if clause to keep only items that match a condition:
users = [
    {"name": "Ada", "active": True},
    {"name": "Bob", "active": False},
    {"name": "Cia", "active": True},
]
active_users = [u for u in users if u["active"]]
# [{"name": "Ada", "active": True}, {"name": "Cia", "active": True}]
The filter() function does the same thing with a function:
def is_active(user):
    return user["active"]
active_users = list(filter(is_active, users))
Comprehensions are more idiomatic in modern Python. Use filter() only when you already have a named function that does the check.
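Both versions select the same users. The sketch below, which reuses the users data from above, shows why the filter() example is wrapped in list(): in Python 3, filter() returns a lazy, single-use iterator rather than a list.

```python
users = [
    {"name": "Ada", "active": True},
    {"name": "Bob", "active": False},
    {"name": "Cia", "active": True},
]


def is_active(user):
    return user["active"]


# filter() produces items on demand; nothing has been checked yet.
lazy = filter(is_active, users)
print(type(lazy).__name__)  # filter

# Consuming the iterator gives the same result as the comprehension.
from_filter = list(lazy)
from_comprehension = [u for u in users if u["active"]]
print(from_filter == from_comprehension)  # True

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(lazy)
print(second_pass)  # []
```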
Mapping data
Transform each item in a collection:
user_names = [user["name"] for user in users]
# ["Ada", "Bob", "Cia"]
This is often called “mapping” — applying a function to each item. The map() function does the same:
user_names = list(map(lambda u: u["name"], users))
lambda creates an anonymous function inline — lambda u: u["name"] is equivalent to def f(u): return u["name"]. Lambdas are convenient for short, one-off functions passed to map(), sorted(), and other functions that accept a callable.
Comprehensions are more idiomatic in modern Python. Use map() only when you already have a named function.
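As a sketch of the case where map() reads well: when a suitable named function already exists, such as the built-in str.upper method, map() needs no lambda. The names list here is illustrative.

```python
names = ["Ada", "Bob", "Cia"]

# str.upper is an existing function, so map() can call it directly.
upper_names = list(map(str.upper, names))
print(upper_names)  # ['ADA', 'BOB', 'CIA']

# The equivalent comprehension, for comparison.
same = [name.upper() for name in names]
print(same == upper_names)  # True
```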
Reducing data to a single value
Use sum(), min(), max(), or len() for common reductions:
prices = [10, 20, 30]
sum(prices) # 60
min(prices) # 10
max(prices) # 30
len(prices) # 3
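For custom reductions, pass a generator expression to sum(). It computes each value as sum() consumes it, without building an intermediate list:
people = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 28}, {"name": "Cia", "age": 42}]
total_age = sum(p["age"] for p in people) # 106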
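A generator expression can also reduce only the items that match a condition. A minimal sketch, assuming records with illustrative "age" and "active" fields; the second reduction counts matches by adding 1 for each one:

```python
people = [
    {"name": "Ada", "age": 36, "active": True},
    {"name": "Bob", "age": 28, "active": False},
    {"name": "Cia", "age": 42, "active": True},
]

# Sum a field, but only for items that pass the condition.
active_age_total = sum(p["age"] for p in people if p["active"])
print(active_age_total)  # 78

# Count matching items by summing 1 per match.
active_count = sum(1 for p in people if p["active"])
print(active_count)  # 2
```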
For more complex folding, use functools.reduce():
from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda acc, n: acc * n, numbers, 1)
# 24
reduce() takes a function, an iterable, and an optional initial value. It is powerful but often less readable than a simple loop:
product = 1
for n in numbers:
    product *= n
Use whichever is clearer for your specific case.
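The initial value matters most for empty input. Without it, reduce() uses the first item as the starting accumulator, so an empty iterable raises TypeError; with it, reduce() simply returns the initial value. A short sketch (multiply is an illustrative helper name):

```python
from functools import reduce


def multiply(acc, n):
    """Combine the running product with the next number."""
    return acc * n


# Without an initial value, the first item seeds the accumulator.
product = reduce(multiply, [2, 3, 4])
print(product)  # 24

# With an initial value, an empty list reduces to that value.
empty_product = reduce(multiply, [], 1)
print(empty_product)  # 1

# Without one, an empty list has no first item, so reduce() raises TypeError.
raised_type_error = False
try:
    reduce(multiply, [])
except TypeError:
    raised_type_error = True
print(raised_type_error)  # True
```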
Grouping data
Grouping items by a key is a common task. The itertools.groupby() function can do it, but it only groups consecutive items with equal keys, so the data must first be sorted by that key. A simpler approach uses a dictionary:
from collections import defaultdict
users = [
    {"name": "Ada", "role": "admin"},
    {"name": "Bob", "role": "user"},
    {"name": "Cia", "role": "admin"},
]
by_role = defaultdict(list)
for user in users:
    by_role[user["role"]].append(user)
# defaultdict(<class 'list'>, {
# "admin": [{"name": "Ada", ...}, {"name": "Cia", ...}],
# "user": [{"name": "Bob", ...}],
# })
defaultdict(list) creates a new empty list automatically for any key that does not exist yet. This avoids the need to check and initialize keys manually. defaultdict is from Python’s collections module in the standard library — you will see it covered in more detail in the standard library lesson.
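For comparison, the sketch below builds the same grouping two other ways: first with a plain dict, checking and initializing each key by hand (the work defaultdict(list) saves you), then with itertools.groupby(). Because groupby() only groups adjacent items with equal keys, the data is sorted by the same key first; the last lines show what happens on unsorted input. The get_role helper is an illustrative name.

```python
from itertools import groupby

users = [
    {"name": "Ada", "role": "admin"},
    {"name": "Bob", "role": "user"},
    {"name": "Cia", "role": "admin"},
]


def get_role(user):
    return user["role"]


# 1. Plain dict: create the list the first time a key appears.
by_role = {}
for user in users:
    role = get_role(user)
    if role not in by_role:
        by_role[role] = []
    by_role[role].append(user)

# 2. groupby(): sort by the key first, then group adjacent items.
sorted_users = sorted(users, key=get_role)
grouped = {role: list(group) for role, group in groupby(sorted_users, key=get_role)}
print(by_role == grouped)  # True

# Without sorting, the two "admin" users are not adjacent and form separate groups.
unsorted_keys = [role for role, _ in groupby(users, key=get_role)]
print(unsorted_keys)  # ['admin', 'user', 'admin']
```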
Counting items
Use a dict or collections.Counter to count occurrences:
from collections import Counter
roles = [u["role"] for u in users]
role_counts = Counter(roles)
# Counter({"admin": 2, "user": 1})
role_counts.most_common() # [("admin", 2), ("user", 1)]
Counter is the cleanest approach when you need frequencies.
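The plain-dict alternative mentioned above is a short loop that uses dict.get() with a default of 0. A sketch using the same role values; Counter produces the same counts and adds helpers such as most_common():

```python
from collections import Counter

roles = ["admin", "user", "admin"]

# Plain dict: get() returns 0 the first time a key is seen.
counts = {}
for role in roles:
    counts[role] = counts.get(role, 0) + 1
print(counts)  # {'admin': 2, 'user': 1}

# Converting the Counter to a plain dict shows the counts match.
print(counts == dict(Counter(roles)))  # True
```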
Finding items
Use next() with a generator expression to find the first match:
first_admin = next((u for u in users if u["role"] == "admin"), None)
The second argument to next() is a default returned when no match is found. Without it, next() raises StopIteration — the built-in exception that signals the end of an iterator.
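To see the difference the default makes, the sketch below searches for a role that no user has (the "guest" value is illustrative):

```python
users = [
    {"name": "Ada", "role": "admin"},
    {"name": "Bob", "role": "user"},
]

# A match exists, so next() returns the first one.
first_admin = next((u for u in users if u["role"] == "admin"), None)
print(first_admin)  # {'name': 'Ada', 'role': 'admin'}

# With a default, a missing match returns None instead of raising.
first_guest = next((u for u in users if u["role"] == "guest"), None)
print(first_guest)  # None

# Without a default, the exhausted generator raises StopIteration.
raised_stop = False
try:
    next(u for u in users if u["role"] == "guest")
except StopIteration:
    raised_stop = True
print(raised_stop)  # True
```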
To find all matches, use a list comprehension:
admins = [u for u in users if u["role"] == "admin"]
Sorting with keys
sorted() and list.sort() accept a key function that determines the sort value:
users = [
    {"name": "Cia", "age": 42},
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": 28},
]
by_name = sorted(users, key=lambda u: u["name"])
by_age = sorted(users, key=lambda u: u["age"])
Sort by multiple fields with a tuple key:
by_age_then_name = sorted(users, key=lambda u: (u["age"], u["name"]))
Python’s sort is stable — items with equal keys keep their original order. This makes multi-pass sorting predictable.
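A sketch of what stability buys: sort by the secondary key first, then by the primary key, and ties on the primary key stay in secondary-key order. Multi-pass sorting is handy when fields need different directions, because reverse=True also preserves stability. The duplicate ages below are made up so that ties exist.

```python
people = [
    {"name": "Cia", "age": 36},
    {"name": "Bob", "age": 28},
    {"name": "Ada", "age": 36},
]

# Pass 1: secondary key (name).
by_name = sorted(people, key=lambda p: p["name"])

# Pass 2: primary key (age). Stability keeps Ada before Cia within age 36.
by_age_then_name = sorted(by_name, key=lambda p: p["age"])
print([p["name"] for p in by_age_then_name])  # ['Bob', 'Ada', 'Cia']

# Same result as a single sort with a tuple key.
tuple_sorted = sorted(people, key=lambda p: (p["age"], p["name"]))
print(tuple_sorted == by_age_then_name)  # True

# Mixed directions: age descending, ties still in name order.
oldest_first = sorted(by_name, key=lambda p: p["age"], reverse=True)
print([p["name"] for p in oldest_first])  # ['Ada', 'Cia', 'Bob']
```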
Chaining transformations
Comprehensions and generator expressions let you chain operations cleanly:
def process_users(users):
    return sorted(
        [
            {"name": u["name"].title(), "email": u["email"].lower()}
            for u in users
            if u.get("active")
        ],
        key=lambda u: u["name"],
    )
This filters, transforms, and sorts in one readable expression. For more complex pipelines, break it into named steps:
def process_users(users):
    active = [u for u in users if u.get("active")]
    cleaned = [
        {"name": u["name"].title(), "email": u["email"].lower()}
        for u in active
    ]
    return sorted(cleaned, key=lambda u: u["name"])
Each step has a name and a clear purpose. This is easier to debug — you can inspect any intermediate variable.
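A usage sketch of the named-step version with made-up input records. Both the inactive user and the record with no "active" key are dropped, because u.get("active") returns None when the key is missing:

```python
def process_users(users):
    # Step 1: keep only users whose "active" value is truthy.
    active = [u for u in users if u.get("active")]
    # Step 2: normalize name capitalization and email case.
    cleaned = [
        {"name": u["name"].title(), "email": u["email"].lower()}
        for u in active
    ]
    # Step 3: order the result by name.
    return sorted(cleaned, key=lambda u: u["name"])


raw_users = [
    {"name": "cia lee", "email": "CIA@Example.com", "active": True},
    {"name": "bob", "email": "bob@example.com", "active": False},
    {"name": "ada", "email": "Ada@Example.COM", "active": True},
    {"name": "dee", "email": "dee@example.com"},
]

result = process_users(raw_users)
print(result)
# [{'name': 'Ada', 'email': 'ada@example.com'}, {'name': 'Cia Lee', 'email': 'cia@example.com'}]
```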
What to carry forward
- filter with [x for x in items if condition]
- map with [transform(x) for x in items]
- reduce with sum(), min(), max(), or a loop
- group with defaultdict(list)
- count with collections.Counter
- find with next((x for x in items if condition), default)
- sort with key= functions
- break complex transformations into named steps for clarity
These patterns cover most data transformation tasks. The next lesson covers how to split code across multiple files using modules.