🐍 Comprehensions: Python's Most Elegant Syntactic Sugar
🕐 Estimated time: 2-3 hours | 🎯 Goal: master list, dict, and set comprehensions and generator expressions
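1. What Is a Comprehension?
A comprehension is Python's one-line construction technique: a single expression that combines a loop with building a list, dict, or set.
# Traditional approach: 4 lines
squares = []
for x in range(10):
    if x % 2 == 0:
        squares.append(x ** 2)
# Comprehension: 1 line
squares = [x ** 2 for x in range(10) if x % 2 == 0]
# Both produce the same list, but the comprehension is more concise and more Pythonic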
2. List Comprehensions (Basics)
📖 Basic Syntax
# Syntax: [expression for variable in iterable]
# Squares
squares = [x ** 2 for x in range(1, 6)]
print(squares) # [1, 4, 9, 16, 25]
# Convert to uppercase
words = ["hello", "world", "python"]
upper = [w.upper() for w in words]
print(upper) # ['HELLO', 'WORLD', 'PYTHON']
# Convert strings to integers
str_nums = ["1", "2", "3", "4", "5"]
nums = [int(s) for s in str_nums]
print(nums) # [1, 2, 3, 4, 5]
# Extract values from dictionaries
users = [{"name": "张三", "age": 25}, {"name": "李四", "age": 30}]
names = [u["name"] for u in users]
print(names) # ['张三', '李四']
🔍 Anatomy of a Comprehension
# List comprehension
result = [x ** 2 for x in range(5)]
# Equivalent to
result = []
for x in range(5):
    result.append(x ** 2)
# Structure:
# x ** 2       → expression (what each element becomes)
# for x        → loop variable
# in range(5)  → iterable
3. List Comprehensions (Filtering)
📖 With an if Condition
# Syntax: [expression for variable in iterable if condition]
# Keep even numbers
evens = [x for x in range(20) if x % 2 == 0]
print(evens) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
# Keep long words
words = ["apple", "hi", "banana", "ok", "cherry"]
long_words = [w for w in words if len(w) > 3]
print(long_words) # ['apple', 'banana', 'cherry']
# Keep positive numbers
nums = [3, -1, 4, -5, 9, -2, 6]
positive = [n for n in nums if n > 0]
print(positive) # [3, 4, 9, 6]
# Keep adults
people = [("张三", 25), ("李四", 17), ("王五", 30), ("赵六", 15)]
adults = [name for name, age in people if age >= 18]
print(adults) # ['张三', '王五']
📖 With if-else (Conditional Expression)
# Syntax: [expr1 if condition else expr2 for variable in iterable]
# Note: the if-else goes before the for!
# Odd/even labels
nums = [1, 2, 3, 4, 5]
labels = ["even" if x % 2 == 0 else "odd" for x in nums]
print(labels) # ['odd', 'even', 'odd', 'even', 'odd']
# Pass/fail grading
scores = [85, 62, 90, 45, 78]
grades = ["pass" if s >= 60 else "fail" for s in scores]
print(grades) # ['pass', 'pass', 'pass', 'fail', 'pass']
# Replace missing values
data = [1, None, 3, None, 5]
clean = [x if x is not None else 0 for x in data]
print(clean) # [1, 0, 3, 0, 5]
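💡 The position of if determines its meaning:
[x for x in range(10) if x > 5]: a filter (placed at the end)
["even" if x % 2 == 0 else "odd" for x in range(5)]: a transformation (placed at the front)
Use a trailing if to filter and a leading if-else to transform; the two go in different positions.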
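Both forms can appear in the same comprehension: the trailing if decides which items are kept, and the leading if-else decides what each kept item becomes. A minimal sketch, where the `readings` list and the label strings are made up for illustration:

```python
# Hypothetical sample data: None marks a missing reading.
readings = [12, None, -3, 7, None, 0]

# Trailing `if` drops the missing values; leading `if`-`else` labels the rest.
labels = ["non-negative" if r >= 0 else "negative" for r in readings if r is not None]
print(labels)  # ['non-negative', 'negative', 'non-negative', 'non-negative']
```

The filter runs first for each item, so the conditional expression never sees a None.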
4. List Comprehensions (Nested Loops)
📖 Two Loops
# Syntax: [expression for var1 in seq1 for var2 in seq2]
# Cartesian product
pairs = [(x, y) for x in range(3) for y in range(3)]
print(pairs)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
# Deck of cards
suits = ["♠", "♥", "♦", "♣"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
cards = [f"{s}{r}" for s in suits for r in ranks]
print(cards[:6]) # ['♠A', '♠2', '♠3', '♠4', '♠5', '♠6']
# Flatten a nested list
nested = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat = [item for sublist in nested for item in sublist]
print(flat) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
📖 Nested Comprehensions (Matrix Operations)
# Transpose a matrix
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
transposed = [[row[i] for row in matrix] for i in range(3)]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# Build an identity matrix
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
for row in identity:
    print(row)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# Flatten a matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
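The transpose example hard-codes range(3), so it only fits a matrix with exactly 3 columns. The sketch below reads the column count from the first row instead, assuming the matrix is non-empty and every row has the same length. The built-in `zip(*matrix)` is shown as an alternative technique; it is not what the original example uses, and the 2x3 sample data is made up.

```python
# A 2x3 matrix (hypothetical data) to show the non-square case.
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# Assumes a non-empty, rectangular matrix: every row is as long as the first.
n_cols = len(matrix[0])
transposed = [[row[i] for row in matrix] for i in range(n_cols)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# Alternative: zip(*matrix) yields each column as a tuple.
transposed_zip = [list(col) for col in zip(*matrix)]
print(transposed_zip)  # [[1, 4], [2, 5], [3, 6]]
```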
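⚠️ Readability warning for nested comprehensions:
Comprehensions with more than 2 levels of nesting are hard to read; prefer an ordinary loop.
[x for a in A for b in B for c in C if ...] ← too complex!
Rule of thumb: once a comprehension goes beyond 2 levels of nesting, use a for loop.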
5. Dict Comprehensions
📖 Basic Syntax
# Syntax: {key_expr: value_expr for variable in iterable}
# Numbers and their squares
squares = {x: x**2 for x in range(1, 6)}
print(squares) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Invert a dictionary (swap keys and values)
original = {"a": 1, "b": 2, "c": 3}
reversed_d = {v: k for k, v in original.items()}
print(reversed_d) # {1: 'a', 2: 'b', 3: 'c'}
# Turn a list into a dictionary
names = ["张三", "李四", "王五"]
indexed = {name: i for i, name in enumerate(names)}
print(indexed) # {'张三': 0, '李四': 1, '王五': 2}
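One caveat with dictionary inversion: the values become keys, so they must be hashable, and duplicate values collapse because a later entry overwrites an earlier one. A small sketch with made-up data, including one possible way to keep every original key by grouping them into lists:

```python
# Hypothetical data: two keys share the value 1.
original = {"a": 1, "b": 2, "c": 1}

# Plain inversion: "c" overwrites "a" because both map to 1.
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Grouped inversion: each value maps to the list of keys that had it.
grouped = {
    value: [k for k, v in original.items() if v == value]
    for value in set(original.values())
}
print(grouped[1])  # ['a', 'c']
print(grouped[2])  # ['b']
```

The grouped version rescans the dictionary once per distinct value, which is fine for small data; for large dictionaries a single loop that appends to lists would likely be cheaper.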
📖 Dict Comprehensions with Conditions
# Filter
scores = {"张三": 85, "李四": 92, "王五": 78, "赵六": 95, "孙七": 60}
passed = {name: score for name, score in scores.items() if score >= 80}
print(passed) # {'张三': 85, '李四': 92, '赵六': 95}
# Transform (Celsius to Fahrenheit)
celsius = {"北京": 5, "上海": 12, "广州": 25, "哈尔滨": -10}
fahrenheit = {city: c * 9/5 + 32 for city, c in celsius.items()}
print(fahrenheit) # {'北京': 41.0, '上海': 53.6, ...}
# Filter and transform
data = {"a": 10, "b": -5, "c": 20, "d": -3, "e": 15}
result = {k: v * 2 for k, v in data.items() if v > 0}
print(result) # {'a': 20, 'c': 40, 'e': 30}
📖 Practical Scenarios
# Merge two dictionaries, summing values for shared keys
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
merged = {k: dict1.get(k, 0) + dict2.get(k, 0) for k in set(dict1) | set(dict2)}
print(merged) # {'a': 1, 'b': 5, 'c': 4} (key order may vary, since sets are unordered)
# Character frequency count
text = "hello world"
freq = {c: text.count(c) for c in set(text) if c != ' '}
print(freq) # {'h': 1, 'e': 1, 'l': 3, 'o': 2, 'w': 1, 'r': 1, 'd': 1} (key order may vary)
# Convert raw string settings to typed values
raw = {"DEBUG": "true", "PORT": "8080", "MAX_CONN": "100"}
config = {
    k.lower(): (v.lower() == "true" if v.lower() in ("true", "false") else
                int(v) if v.isdigit() else v)
    for k, v in raw.items()
}
print(config) # {'debug': True, 'port': 8080, 'max_conn': 100}
6. Set Comprehensions
# Syntax: {expression for variable in iterable} (same as a list comprehension, with [] replaced by {})
# Automatic deduplication
nums = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_squares = {x**2 for x in nums}
print(unique_squares) # {16, 1, 4, 9} (sets are unordered; display order may vary)
# Unique characters
text = "hello world"
unique_chars = {c for c in text if c != ' '}
print(unique_chars) # {'h', 'e', 'l', 'o', 'w', 'r', 'd'} (order may vary)
# Unique word lengths
words = ["apple", "hi", "banana", "ok", "cherry", "go"]
lengths = {len(w) for w in words}
print(lengths) # {2, 5, 6}
# Unique tags from nested data
articles = [
    {"title": "A", "tags": ["python", "web"]},
    {"title": "B", "tags": ["python", "data"]},
    {"title": "C", "tags": ["web", "data", "ml"]},
]
all_tags = {tag for article in articles for tag in article["tags"]}
print(all_tags) # {'python', 'web', 'data', 'ml'} (order may vary)
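💡 Choosing among set, list, and dict comprehensions:
Need deduplication → set comprehension {}
Need to keep order → list comprehension []
Need a key-value mapping → dict comprehension {k: v}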
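When you need deduplication and the original order at the same time, neither choice above is enough on its own, since sets do not keep insertion order. One common approach, offered here as a sketch rather than something this lesson prescribes, is `dict.fromkeys`, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7). The sample list is made up.

```python
nums = [3, 1, 3, 2, 1, 2, 4]

# Set comprehension: duplicates removed, but iteration order is not guaranteed.
unique_set = {n for n in nums}

# dict.fromkeys keeps the first occurrence of each item, in original order.
unique_ordered = list(dict.fromkeys(nums))
print(unique_ordered)  # [3, 1, 2, 4]
```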
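7. Generator Expressions
A generator expression is the lazy counterpart of a list comprehension: values are not computed up front, only when they are requested.
# List comprehension: computed immediately, held in memory
squares_list = [x**2 for x in range(1000000)]
print(type(squares_list)) # <class 'list'>
print(len(squares_list)) # 1000000 (every element is stored in memory)
# Generator expression: computed lazily, saves memory
squares_gen = (x**2 for x in range(1000000))
print(type(squares_gen)) # <class 'generator'>
# Only one value is computed at a time, so memory use stays very small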
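To make the memory difference concrete, the sketch below compares the two objects with `sys.getsizeof`. Note that `getsizeof` reports only the size of the container object itself (for the list, its array of references, not the integers it points to), and the exact numbers depend on the Python version and platform, so no specific output is shown.

```python
import sys

squares_list = [x**2 for x in range(1_000_000)]
squares_gen = (x**2 for x in range(1_000_000))

# Size of the list object (a million references) vs. the generator object.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(f"list:      {list_bytes:,} bytes")
print(f"generator: {gen_bytes:,} bytes")
```

The generator's size stays the same no matter how large the range is, because it stores only its current state.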
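📖 How to Consume a Generator
# A generator can be iterated only once!
gen = (x**2 for x in range(5))
print(next(gen)) # 0
print(next(gen)) # 1
print(next(gen)) # 4
for val in gen:
    print(val) # 9, 16 (continues where next() left off; nothing repeats)
# Once exhausted, the generator yields nothing more
print(list(gen)) # [] (already used up)
# Create a new one to iterate again
gen = (x**2 for x in range(5))
print(list(gen)) # [0, 1, 4, 9, 16]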
📖 Generators in Practice
# 1. Pass directly to sum/max/min (no need to build a list)
total = sum(x**2 for x in range(100))
print(total) # 328350
max_val = max(x**2 for x in range(-10, 11))
print(max_val) # 100
# 2. Pass to any/all (short-circuit evaluation)
has_even = any(x % 2 == 0 for x in [1, 3, 5, 8, 7])
print(has_even) # True (stops as soon as it reaches 8)
all_positive = all(x > 0 for x in [1, 2, 3, 4, 5])
print(all_positive) # True
# 3. Processing large data (saves memory)
def read_large_file(filename):
    """Read a large file line by line (generator function)."""
    with open(filename, "r") as f:
        for line in f:
            yield line.strip()
# Even a file with millions of lines is processed without loading it all into memory
# lines = read_large_file("huge_file.txt")
# long_lines = (line for line in lines if len(line) > 100)
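The commented-out pipeline above can be tried end to end with a small temporary file standing in for the huge one. This is a sketch under that assumption: the file contents are made up, and `encoding="utf-8"` is added here as an illustrative choice, not something the original function specifies.

```python
import os
import tempfile

def read_large_file(filename):
    """Yield stripped lines one at a time (same idea as the function above)."""
    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            yield line.strip()

# Build a small stand-in file; a real input would be far larger.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("short\n" + "x" * 120 + "\n" + "also short\n" + "y" * 150 + "\n")
    path = tmp.name

lines = read_large_file(path)                              # nothing is read yet
long_lines = (line for line in lines if len(line) > 100)   # still nothing is read
long_count = sum(1 for _ in long_lines)                    # the file is read here, line by line
print(long_count)  # 2

os.remove(path)
```

Chaining the generator function with a generator expression means only one line is held in memory at any moment.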
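📋 Comparing the Four Forms
| Syntax | Form | Result |
|---|---|---|
| [x for x in ...] | List comprehension | list, built immediately, keeps order |
| {k: v for ...} | Dict comprehension | dict, key-value mapping |
| {x for x in ...} | Set comprehension | set, duplicates removed |
| (x for x in ...) | Generator expression | generator, lazy, single-use |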
8. Comprehension or Loop: When to Use Which?
# ✅ Good fits for a comprehension
# 1. Simple transformations
upper = [w.upper() for w in words]
# 2. Simple filtering
evens = [x for x in nums if x % 2 == 0]
# 3. Simple mappings
squares = {x: x**2 for x in range(10)}
# 4. As a function argument
total = sum(x**2 for x in range(100))
# ❌ Poor fits for a comprehension
# 1. Complex logic (multi-step processing)
# Bad
result = [complex_transform(x) if x > 0 else handle_negative(x, extra) for x in data]
# Good
result = []
for x in data:
    if x > 0:
        result.append(complex_transform(x))
    else:
        result.append(handle_negative(x, extra))
# 2. Side effects (printing, writing files)
# Bad
[print(x) for x in range(10)] # builds a useless [None, None, ...] list
# Good
for x in range(10):
    print(x)
# 3. More than 2 levels of nesting
# Bad
result = [f(a, b, c) for a in A for b in B[a] for c in C if g(a, b, c)]
# Good
result = []
for a in A:
    for b in B[a]:
        for c in C:
            if g(a, b, c):
                result.append(f(a, b, c))
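🎯 Rules of thumb for comprehensions:
1. Readable on one line → use a comprehension
2. Needs line breaks to be readable → consider a loop
3. More than 2 levels of nesting → always use a loop
4. Has side effects (print/write) → use a loop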
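There is also a middle ground between the first two rules: when branching logic is what makes a comprehension hard to read, moving that logic into a named helper function often keeps the comprehension to one readable line. A minimal sketch, in which the `normalize` helper and the sample data are hypothetical:

```python
def normalize(value):
    """Hypothetical helper: map None and negatives to 0, round everything else."""
    if value is None:
        return 0
    if value < 0:
        return 0
    return round(value)

data = [3.6, -1.2, None, 7.4]

# The branching lives in normalize(); the comprehension stays a simple transformation.
result = [normalize(x) for x in data]
print(result)  # [4, 0, 0, 7]
```

The helper can also be tested on its own, which is harder to do with logic buried inside a long conditional expression.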
9. Hands-On Exercises
🎯 Exercise 1: Data-Cleaning Pipeline
# Raw data
raw_data = [
    " 张三, 85 ",
    "李四, N/A",
    " 王五, 92 ",
    "赵六, ",
    "孙七, 78",
    ", 88",
]
# Cleaning pipeline built from comprehensions
# Step 1: strip surrounding whitespace
stripped = [line.strip() for line in raw_data]
# Step 2: drop empty lines and lines with no name
non_empty = [line for line in stripped if line and not line.startswith(",")]
# Step 3: parse each line into a tuple
parsed = [tuple(item.strip() for item in line.split(",")) for line in non_empty]
# Step 4: drop invalid records and convert scores to int
valid = [(name, int(score)) for name, score in parsed
         if score.isdigit() and name]
print("Cleaned data:")
for name, score in valid:
    print(f" {name}: {score} points")
# Single-expression version (challenge; the := operator requires Python 3.8+)
clean = [
    (parts[0].strip(), int(parts[1].strip()))
    for line in raw_data
    if (parts := line.strip().split(",")) and len(parts) == 2
    and parts[0].strip() and parts[1].strip().isdigit()
]
print(clean)
🎯 Exercise 2: Grade Report Generator
# Student score data
students = {
    "张三": {"Math": 85, "English": 90, "Python": 88},
    "李四": {"Math": 92, "English": 78, "Python": 95},
    "王五": {"Math": 78, "English": 85, "Python": 82},
    "赵六": {"Math": 95, "English": 92, "Python": 90},
    "孙七": {"Math": 60, "English": 65, "Python": 70},
}
# 1. Average score per student (dict comprehension)
avgs = {name: round(sum(scores.values()) / len(scores), 1)
        for name, scores in students.items()}
print("Averages:", avgs)
# 2. Rank by average, highest first (comprehension + sorted)
ranked = {name: avg for name, avg in
          sorted(avgs.items(), key=lambda x: -x[1])}
print("Ranking:", ranked)
# 3. Top students (average >= 80)
excellent = {name: avg for name, avg in avgs.items() if avg >= 80}
print("Top students:", excellent)
# 4. Highest score per subject (dict comprehension + generator expression)
subjects = list(next(iter(students.values())))
best_per_subject = {
    # key= compares by score; without it, max() would compare the names first
    sub: max(((name, scores[sub]) for name, scores in students.items()),
             key=lambda pair: pair[1])
    for sub in subjects
}
print("Best per subject:", best_per_subject)
# 5. Failed subjects per student (score < 60); empty for this data set
fails = {name: [sub for sub, score in scores.items() if score < 60]
         for name, scores in students.items()
         if any(score < 60 for score in scores.values())}
print("Failed subjects:", fails)
🎯 Exercise 3: Text Analyzer
import string
text = """
Python is a high-level programming language.
Python is easy to learn and powerful.
Many developers love Python for its simplicity.
Python supports multiple programming paradigms.
"""
# 1. Clean the text and split it into words
words = [
    w.strip(string.punctuation).lower()
    for line in text.strip().split("\n")
    for w in line.split()
    if w.strip(string.punctuation)
]
# 2. Word frequency count (dict comprehension)
unique_words = set(words)
freq = {w: words.count(w) for w in unique_words}
# 3. Sort by frequency, highest first
sorted_freq = dict(sorted(freq.items(), key=lambda x: -x[1]))
print("📊 Word frequencies (top 10):")
for word, count in list(sorted_freq.items())[:10]:
    bar = "█" * count
    print(f" {word:15s} | {bar} ({count})")
# 4. Group by first letter (groupby requires the input to be sorted by the same key)
from itertools import groupby
words_sorted = sorted(set(words))
by_letter = {
    letter: list(group)
    for letter, group in groupby(words_sorted, key=lambda w: w[0])
}
print("\n📖 Grouped by first letter:")
for letter, word_list in sorted(by_letter.items()):
    print(f" {letter}: {', '.join(word_list)}")
# 5. Word-length distribution
length_dist = {n: sum(1 for word in words if len(word) == n)
               for n in {len(w) for w in words}}
# A more concise way
from collections import Counter
len_counter = Counter(len(w) for w in words)
print("\n📏 Word-length distribution:")
for length in sorted(len_counter.keys()):
    print(f" {length} letters: {'█' * len_counter[length]} ({len_counter[length]})")
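The `words.count(w)` approach in step 2 rescans the whole word list once per unique word. `collections.Counter`, already used for the length distribution, counts everything in a single pass, and its `most_common` method returns the entries sorted by frequency. A sketch using a shortened, made-up stand-in for the `words` list:

```python
from collections import Counter

# Shortened stand-in for the `words` list built in step 1.
words = "python is easy python is powerful python".split()

freq = Counter(words)          # one pass over the list
top = freq.most_common(2)      # the two most frequent words, highest first
print(top)  # [('python', 3), ('is', 2)]
```

For the exercise, `freq.most_common(10)` would take the place of both the dict comprehension and the manual sort.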
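10. Today's Summary
| Syntax | Purpose |
|---|---|
| [expr for x in seq] | Basic list comprehension |
| [expr for x in seq if cond] | Filtering |
| [a if cond else b for x in seq] | Conditional transformation |
| [expr for x in A for y in B] | Nested loops |
| {k: v for x in seq} | Dict comprehension |
| {expr for x in seq} | Set comprehension |
| (expr for x in seq) | Generator expression |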
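🧠 Memory aids:
Square brackets for lists, curly braces for dicts and sets, parentheses for generators.
A trailing if filters; a leading if-else transforms.
Two levels of nesting are fine; at three, switch to a loop.
Generators save memory, but can be consumed only once.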
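🔮 Coming up: Day 20 review exercises, with a word-frequency program and a contact address book (with file persistence). Everything from Day 13 through Day 19 comes together!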