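After a few years of writing Python, you will have stumbled on tricks you wish you had known sooner. They are not first-chapter textbook syntax, nor obscure black magic that only makes sense after reading the source code. They sit in between: everyday efficiency tools that can save you dozens of lines of code and half an hour of debugging.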
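Everything here is based on the standard library and built-in syntax. The only exception is one example that uses the third-party requests package.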
Most people's understanding of f-strings stops at `f"Hello {name}"`, but f-strings can do a lot more.
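Number formatting. You have certainly run into this: thousands separators for amounts, aligned percentages, zero-padding. f-strings use the same format-spec mini-language as `.format()`, but put the spec right next to the value, so each case takes one line: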
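```python
amount = 1234567.89
success_rate = 0.8732
task_id = 42

print(f"¥{amount:,.2f}")                    # ¥1,234,567.89
print(f"Success rate: {success_rate:.1%}")  # Success rate: 87.3%
print(f"Task #{task_id:05d}")               # Task #00042
```

In `:,.2f`, the `,` adds thousands separators and `.2f` keeps two decimal places. `:.1%` multiplies by 100 and appends a percent sign. `:05d` zero-pads to a width of 5. The result is noticeably more concise than the equivalent `.format()` call.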
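The alignment mentioned above uses the same mini-language: `<` and `>` choose the alignment, and the number in front of the precision sets the field width. A minimal sketch with made-up report rows:

```python
# Hypothetical metrics: (label, ratio)
rows = [("cache hit", 0.8732), ("timeout", 0.05), ("retry", 0.1234)]

lines = []
for label, ratio in rows:
    # :<12 left-aligns the label in 12 columns; :>8.1% right-aligns the percentage in 8 columns
    lines.append(f"{label:<12}{ratio:>8.1%}")

print("\n".join(lines))
# cache hit      87.3%
# timeout         5.0%
# retry          12.3%
```

Because every row has the same total width, the percentages line up on the right no matter how long the labels are (as long as they fit in 12 columns).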
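Another easily overlooked feature is the `=` specifier for debugging. Since Python 3.8, if you want a log line to show both a variable's name and its value, you no longer need to write `f"x={x}"`; just write `f"{x=}"`: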
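```python
x = 3 + 4
print(f"{x=}")  # x=7

name = "Zhang San"
print(f"{name.upper()=}")  # name.upper()='ZHANG SAN'
```

Both the expression text and its value are printed automatically, which saves a lot of hand-typed labels while debugging.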
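The `=` specifier also combines with format specs and conversions, and any spaces you put around `=` are kept in the output. A short sketch (the variables are just for illustration):

```python
rate = 0.8732
name = "Zhang San"

outputs = [
    f"{rate=:.1%}",  # a format spec applies to the value: rate=87.3%
    f"{rate = }",    # spaces around = are preserved: rate = 0.8732
    f"{name=}",      # repr() is used by default: name='Zhang San'
    f"{name=!s}",    # !s switches to str(): name=Zhang San
]
for line in outputs:
    print(line)
```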
`zip()` is most often used to iterate over two lists in parallel, but it can do much more.
Transposing a matrix. `*matrix` unpacks each row into a separate argument to `zip`, so the transpose takes one line:
```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
transposed = list(zip(*matrix))
print(transposed)  # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
```

Pairing two lists into a dictionary:
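```python
keys = ["name", "age", "city"]
values = ["Li Si", 28, "Shenzhen"]
user = dict(zip(keys, values))
print(user)  # {'name': 'Li Si', 'age': 28, 'city': 'Shenzhen'}
```

The reverse operation: `zip` combined with unpacking splits a list of tuples back into separate sequences: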
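```python
pairs = [("A", 1), ("B", 2), ("C", 3)]
letters, numbers = zip(*pairs)
print(letters)  # ('A', 'B', 'C')
print(numbers)  # (1, 2, 3)
```

One pitfall: when the inputs have different lengths, `zip` silently truncates to the shortest one. `itertools.zip_longest` fills in the missing values instead: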
```python
from itertools import zip_longest

a = [1, 2, 3, 4]
b = ["a", "b"]
for x, y in zip_longest(a, b, fillvalue="missing"):
    print(f"{x} -> {y}")
# 1 -> a
# 2 -> b
# 3 -> missing
# 4 -> missing
```

To count how many times each element appears in a list, `collections.Counter` is the best choice:
```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
count = Counter(words)
print(count)                 # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
print(count.most_common(2))  # [('apple', 3), ('banana', 2)]
```

Counters also support arithmetic with one another:
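```python
from collections import Counter

a = Counter(["a", "b", "a", "c"])
b = Counter(["a", "b", "b", "d"])
print(a + b)  # Counter({'a': 3, 'b': 3, 'c': 1, 'd': 1})
print(a - b)  # Counter({'a': 1, 'c': 1})  keys whose count drops to 0 or below are removed
print(a & b)  # Counter({'a': 1, 'b': 1})  intersection: minimum of each count
print(a | b)  # Counter({'a': 2, 'b': 2, 'c': 1, 'd': 1})  union: maximum of each count
```

These four operators make short work of log analysis. For example, to compare today's traffic sources with yesterday's, `today_counter - yesterday_counter` gives the sources whose counts went up in a single line.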
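The today-versus-yesterday comparison might look like this. The traffic data is made up for illustration. Note that `-` discards sources whose count fell, while `Counter.subtract()` keeps zero and negative differences:

```python
from collections import Counter

# Hypothetical traffic sources parsed from two days of access logs
yesterday = Counter(["google", "google", "twitter", "newsletter"])
today = Counter(["google", "google", "google", "twitter", "twitter", "github"])

# Sources that grew: only positive differences survive the - operator
grown = today - yesterday
print(grown)  # Counter({'google': 1, 'twitter': 1, 'github': 1})

# Full picture including drops: subtract() updates in place and keeps negatives
delta = today.copy()
delta.subtract(yesterday)
print(delta)  # Counter({'google': 1, 'twitter': 1, 'github': 1, 'newsletter': -1})
```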
Swapping variables without a temporary:
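```python
a, b = 1, 2
a, b = b, a  # now a == 2 and b == 1
```

Starred assignment (`*`) collects every element between the first and the last: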
```python
first, *middle, last = [1, 2, 3, 4, 5, 6]
print(first, middle, last)  # 1 [2, 3, 4, 5] 6
```

Nested unpacking handles more complex data structures:
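```python
data = [("Zhang San", 28, ("Python", "Go")), ("Li Si", 32, ("Java", "Rust"))]
for name, age, (lang1, lang2) in data:
    print(f"{name} ({age}) works mainly with {lang1} and {lang2}")
# Zhang San (28) works mainly with Python and Go
# Li Si (32) works mainly with Java and Rust
```

`**` merges dictionaries, a natural fit for the "defaults plus user overrides" pattern (later keys win):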
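```python
defaults = {"timeout": 30, "retries": 3}
user_config = {"timeout": 60, "debug": True}
merged = {**defaults, **user_config}  # user_config overrides defaults
print(merged)  # {'timeout': 60, 'retries': 3, 'debug': True}
```

Use `any()` to check whether at least one element meets a condition. Pass it a generator expression rather than a list: evaluation short-circuits, so it stops at the first match instead of walking the whole sequence: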
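```python
logs = [
    "INFO: service started",
    "WARNING: memory usage at 80%",
    "ERROR: connection timed out",
    "INFO: request completed",
]
has_error = any("ERROR" in line for line in logs)
print(has_error)  # True
all_info = all("INFO" in line for line in logs)
print(all_info)  # False
```

A more practical case, validating all the fields of a form at once: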
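```python
def validate_form(data):
    # A list on purpose: every check runs, so all failures can be reported together
    checks = [
        len(data.get("username", "")) >= 3,
        "@" in data.get("email", ""),
        data.get("age", 0) >= 18,
        data.get("password", "") != "",
    ]
    if not all(checks):
        failed = [
            desc
            for desc, ok in zip(
                ["username too short", "invalid email", "under 18", "empty password"],
                checks,
            )
            if not ok
        ]
        raise ValueError(f"Validation failed: {', '.join(failed)}")
    return True

form = {"username": "ab", "email": "xxx", "age": 16, "password": ""}
try:
    validate_form(form)
except ValueError as e:
    print(e)  # Validation failed: username too short, invalid email, under 18, empty password
```

Python 3.9 added `|` and `|=` for merging dictionaries, which reads more naturally than `{**a, **b}`: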
```python
a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}
merged = a | b
print(merged)  # {'x': 1, 'y': 3, 'z': 4}
a |= b  # update a in place
```

The shortest way to reverse a sequence:
```python
text = "Python"
print(text[::-1])  # nohtyP
```

Taking every other element:
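```python
data = list(range(1, 11))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(data[::2])   # [1, 3, 5, 7, 9]   indices 0, 2, 4, ...
print(data[1::2])  # [2, 4, 6, 8, 10]  indices 1, 3, 5, ...
```

Slice assignment: inserting, replacing, and deleting in bulk: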
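```python
items = ["a", "d", "e"]
items[1:1] = ["b", "c"]  # insert at index 1
print(items)  # ['a', 'b', 'c', 'd', 'e']
items[1:3] = ["X", "Y", "Z"]  # replace a subsequence (the lengths may differ)
print(items)  # ['a', 'X', 'Y', 'Z', 'd', 'e']
items[1:4] = []  # delete a subsequence
print(items)  # ['a', 'd', 'e']
```

For quick ad-hoc business analysis, set operations are often much faster to write than a SQL join: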
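```python
today_users = {"Zhang San", "Li Si", "Wang Wu", "Zhao Liu"}
yesterday_users = {"Li Si", "Zhao Liu", "Sun Qi"}

new_users = today_users - yesterday_users       # new users
lost_users = yesterday_users - today_users      # churned users
retained = today_users & yesterday_users        # retained users
all_users = today_users | yesterday_users       # all users
symmetric_diff = today_users ^ yesterday_users  # users seen on only one of the two days

print(f"{len(new_users)} new: {new_users}")     # 2 new
print(f"{len(lost_users)} lost: {lost_users}")  # 1 lost
print(f"{len(retained)} retained: {retained}")  # 2 retained
```

One easily overlooked fact: a membership test on a set takes O(1) time on average, versus O(n) for a list. For set operations on around 100,000 elements, a native Python set is often faster than a round trip to the database.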
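The practical consequence: if you test membership repeatedly, convert the list to a set once. A rough sketch with made-up IDs; timings depend on the machine, so no numbers are shown:

```python
import timeit

# Hypothetical data: 100,000 user IDs and a few IDs to look up
all_ids = list(range(100_000))
id_set = set(all_ids)  # one O(n) pass to build the set
lookups = [5, 99_999, 123_456]

found = [uid for uid in lookups if uid in id_set]  # each `in` is O(1) on average
print(found)  # [5, 99999]

# Worst case for the list: the target sits at the end, so every element is scanned
list_time = timeit.timeit(lambda: 99_999 in all_ids, number=100)
set_time = timeit.timeit(lambda: 99_999 in id_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```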
Dictionary and set comprehensions:
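```python
# Invert a key-value mapping (assumes the values are unique; duplicates keep the last key)
name_to_id = {"Zhang San": 101, "Li Si": 102, "Wang Wu": 103}
id_to_name = {v: k for k, v in name_to_id.items()}
print(id_to_name)  # {101: 'Zhang San', 102: 'Li Si', 103: 'Wang Wu'}

# Collect all the unique first letters
names = ["Alice", "Bob", "Charlie", "Anna", "David"]
initials = {name[0] for name in names}
print(initials)  # {'A', 'B', 'C', 'D'} (set order may vary)
```

A comprehension with a condition filters and transforms in one line: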
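```python
scores = [85, 92, 45, 78, 60, 55, 88]
pass_scores_squared = [s**2 for s in scores if s >= 60]
print(pass_scores_squared)  # [7225, 8464, 6084, 3600, 7744]
```

A word on nested comprehensions: two levels is the practical limit. Beyond that, just write a regular for loop.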
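For reference, a two-level comprehension reads its `for` clauses left to right, in the same order as the equivalent nested loops. A small sketch with a made-up matrix:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Outer loop first, inner loop second, exactly like the nested for statements below
flat = [x for row in matrix for x in row if x % 2 == 1]

flat_loop = []
for row in matrix:
    for x in row:
        if x % 2 == 1:
            flat_loop.append(x)

print(flat)  # [1, 3, 5]
```

Once a third `for` or a second condition creeps in, the loop version is usually the easier one to read.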
The `key` parameter accepts any function, letting you define custom sort logic:
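```python
words = ["watermelon", "apple", "banana", "kiwi", "grape"]
sorted_words = sorted(words, key=len)  # the sort is stable: "apple" stays ahead of "grape"
print(sorted_words)  # ['kiwi', 'apple', 'grape', 'banana', 'watermelon']
```

`itemgetter` from the `operator` module reads more clearly than a lambda and is usually slightly faster: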
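```python
from operator import itemgetter

users = [
    {"name": "Zhang San", "score": 85},
    {"name": "Li Si", "score": 92},
    {"name": "Wang Wu", "score": 78},
]
sorted_users = sorted(users, key=itemgetter("score"), reverse=True)
print([u["name"] for u in sorted_users])  # ['Li Si', 'Zhang San', 'Wang Wu']

# Multi-level sort: tuples compare element by element (score descending, then name ascending)
sorted_users = sorted(users, key=lambda u: (-u["score"], u["name"]))
```

A list comprehension builds the whole list at once, while a generator expression returns a lazy iterator that uses almost no extra memory: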
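```python
# List comprehension: builds all 10 million values at once. The list's pointer array
# alone is about 80 MB, and the int objects add a few hundred MB on top of that.
# nums_list = [i**2 for i in range(10_000_000)]

# Generator: values are produced on demand
squares = (i**2 for i in range(10_000_000))
print(next(squares))  # 0
print(next(squares))  # 1
```

Pipeline processing: when every stage is a generator, memory use stays roughly flat: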
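```python
import re
from collections import Counter

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def read_lines(filename):
    with open(filename, encoding="utf-8") as f:
        for line in f:
            yield line.strip()

def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_ips(error_lines):
    for line in error_lines:
        match = IP_PATTERN.search(line)
        if match:
            yield match.group()

# Chain the stages: each step is lazy, so even a log file of several hundred MB is read
# one line at a time; only the Counter grows, with the number of distinct IPs
lines = read_lines("server.log")
errors = filter_errors(lines)
ips = extract_ips(errors)
top_ips = Counter(ips).most_common(10)
```

`functools.lru_cache` lets a function remember earlier results, turning brute-force recursion into memoized dynamic programming without touching the function body: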
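```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Without the cache, fibonacci(35) makes nearly 30 million calls and takes seconds.
print(fibonacci(35))  # 9227465, fast even the first time: each n is computed only once
print(fibonacci(35))  # the second call is a single cache hit
print(fibonacci.cache_info())
# CacheInfo(hits=34, misses=36, maxsize=128, currsize=36)
```

A real business scenario: caching lookups against a configuration service to avoid repeated network requests: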
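```python
import time
from functools import lru_cache

@lru_cache(maxsize=16)
def get_db_config(db_name: str):
    """Fetch database connection info from the configuration service."""
    time.sleep(0.5)  # simulated network latency
    return {"host": f"{db_name}-master.internal", "port": 5432}

config1 = get_db_config("orders")  # about 0.5 s
config2 = get_db_config("orders")  # near 0 s, cache hit
```

Caveats: the decorated function's arguments must be hashable (lists won't work), and caching is a poor fit when the underlying data changes often, because cached results never expire on their own. Also, every hit returns the same object, so mutating `config1` would change what later callers receive.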
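Both caveats in practice, sketched with a throwaway function (`total` is hypothetical, just for illustration): an unhashable argument raises `TypeError`, a tuple works, and `cache_clear()` is the manual way to drop stale results:

```python
from functools import lru_cache

@lru_cache(maxsize=16)
def total(values):
    return sum(values)

try:
    total([1, 2, 3])  # lists are unhashable, so the cache cannot build a key
except TypeError as exc:
    error = str(exc)

result = total((1, 2, 3))  # a tuple is hashable: computed once, then cached
print(result, total.cache_info().currsize)  # 6 1

# When the underlying data changes, clear the cache explicitly
total.cache_clear()
print(total.cache_info().currsize)  # 0
```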
Nested `os.path` calls are hard to read; `pathlib` chains method calls and states the intent clearly:
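```python
from pathlib import Path

# Join paths with the / operator
dir_path = Path.home() / "projects" / "data"
dir_path.mkdir(parents=True, exist_ok=True)

file_path = dir_path / "result.csv"
stem = file_path.stem  # "result" (file_path.name is "result.csv"); no os.path calls needed

# Write or read a whole file in one call
file_path.write_text("new content", encoding="utf-8")
content = file_path.read_text(encoding="utf-8")

# List a directory
for csv_file in dir_path.glob("*.csv"):
    print(csv_file.name)

# Recursive search
all_md = list(dir_path.rglob("*.md"))
```

The `else` clause runs only when the `try` block raised no exception, which cleanly separates "logic that should run only on success" from "the operation that might fail":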
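```python
def process_file(path):
    file = None
    try:
        file = open(path, "r", encoding="utf-8")
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None
    except PermissionError:
        print(f"Permission denied: {path}")
        return None
    else:
        # Runs only if open() succeeded; errors raised here are NOT caught by the excepts above
        content = file.read()
        return len(content)
    finally:
        # Always runs: the safest place for cleanup
        if file is not None:
            file.close()
```

When you need several context managers at once, `ExitStack` avoids deeply nested `with` blocks, even when the number of files is only known at runtime: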
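```python
from contextlib import ExitStack

def process_multiple_files(filenames):
    with ExitStack() as stack:
        files = []
        for name in filenames:
            try:
                f = stack.enter_context(open(name, encoding="utf-8"))
                files.append(f)
            except FileNotFoundError:
                print(f"Skipping missing file: {name}")
        # Every file registered above is closed when the with block exits
        return [f.readline().strip() for f in files]
```

The walrus operator (`:=`) can reuse a computed result inside a comprehension, avoiding repeated calls to an expensive function: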
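```python
# Hypothetical stand-ins so the example runs
all_users = [{"id": 1, "name": "Zhang San"}, {"id": 2, "name": "Li Si"}]

def get_user_score(user_id):
    return {1: 95, 2: 80}[user_id]

# Without the walrus: get_user_score can be called twice per user
top_users = [
    user["name"]
    for user in all_users
    if get_user_score(user["id"]) > 90 and get_user_score(user["id"]) < 100
]

# With the walrus: one call, and the result is reused
top_users = [
    user["name"]
    for user in all_users
    if (score := get_user_score(user["id"])) > 90 and score < 100
]
```

In a while loop, it lets you fetch and test in a single condition: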
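```python
import requests  # third-party (pip install requests): the one non-stdlib example here

def fetch_all(url_template):
    result = []
    page = 1
    # Stop when a request fails or a page comes back empty (assumes each page is a JSON list);
    # the JSON body is parsed only once per page
    while (
        (resp := requests.get(url_template.format(page), timeout=10)).ok
        and (items := resp.json())
    ):
        result.extend(items)
        page += 1
    return result
```

`functools.partial` "freezes" some of a function's arguments in advance, producing a specialized function: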
```python
from functools import partial

def send_notification(user, message, urgency="normal", channel="email"):
    print(f"[{urgency.upper()}] via {channel} to {user}: {message}")

send_urgent = partial(send_notification, urgency="critical")
send_sms = partial(send_notification, channel="sms", urgency="normal")

send_urgent("Zhang San", "Server CPU at 99%")   # [CRITICAL] via email to Zhang San: ...
send_sms("Li Si", "Verification code: 123456")  # [NORMAL] via sms to Li Si: ...
```

partial is especially handy for GUI callbacks: you need to pass an argument to the callback, but you must register the function itself rather than call it with parentheses, and partial solves exactly that:
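```python
from functools import partial

def on_click(button_name, event):
    print(f"User clicked {button_name}")

# Each callback still accepts the event argument the GUI framework passes in
save_callback = partial(on_click, "Save button")
cancel_callback = partial(on_click, "Cancel button")
```

`startswith` and `endswith` accept a tuple, so one call checks several prefixes or suffixes: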
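```python
log_line = "ERROR: database connection timed out"
if log_line.startswith(("ERROR", "FATAL", "CRITICAL")):
    print(f"Severe log entry: {log_line}")

def is_code_file(filename):
    return filename.endswith((".py", ".js", ".ts", ".go", ".rs"))
```

`removeprefix` and `removesuffix` (Python 3.9+) remove only a genuine prefix or suffix and leave the rest of the string alone, which makes them safer than `replace`: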
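```python
tricky = ".gz.tar.gz"
print(tricky.removesuffix(".gz"))  # .gz.tar  (correct)
print(tricky.replace(".gz", ""))   # .tar     (wrong: the leading .gz was removed too)
```

`translate` with `str.maketrans` performs a batch of character replacements in a single pass, which is typically faster than several chained `replace` calls: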
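```python
trans = str.maketrans({"'": "''", "\\": "\\\\", '"': '\\"'})
dangerous = "O'Brien's \"data\"\\path"
escaped = dangerous.translate(trans)
print(escaped)  # O''Brien''s \"data\"\\path
# This only illustrates translate(); to pass values into SQL, use parameterized queries
# rather than escaping by hand.
```

`product` computes the Cartesian product, a great tool for generating test cases: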
```python
from itertools import product

browsers = ["Chrome", "Firefox", "Safari"]
os_list = ["Windows", "Mac", "Linux"]
resolutions = ["1920x1080", "1366x768"]

test_cases = list(product(browsers, os_list, resolutions))
print(len(test_cases))  # 18 combinations (3 × 3 × 2)
```

`combinations`, for picking feature groups in A/B tests:
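```python
from itertools import combinations

features = ["big button", "red border", "animation", "countdown copy"]
for combo in combinations(features, 3):  # C(4, 3) = 4 groups; order does not matter
    print(combo)
```

`chain` flattens several iterables into one, avoiding nested loops: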
```python
from itertools import chain

active = ["Zhang San", "Li Si"]
inactive = ["Wang Wu"]
pending = ["Zhao Liu"]
all_users = list(chain(active, inactive, pending))
print(all_users)  # ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu']

# Flatten a nested list (one level deep)
nested = [["a", "b"], ["c"], ["d", "e", "f"]]
flat = list(chain.from_iterable(nested))
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

The `@contextmanager` decorator lets you write your own `with` logic in a few lines:
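```python
import time
from contextlib import contextmanager

@contextmanager
def timer(description: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed time even if the with block raises
        elapsed = time.perf_counter() - start
        print(f"{description}: {elapsed:.3f}s")

with timer("data processing"):
    data = [i**2 for i in range(5_000_000)]
# Prints something like "data processing: 0.237s" (the number depends on the machine)
```

Temporarily switching the working directory: entering the `with` block changes into the directory, and leaving it switches back automatically: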
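```python
import os
from pathlib import Path
from contextlib import contextmanager

@contextmanager
def work_in(path):
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)

with work_in("/tmp"):
    Path("test.txt").write_text("temporary file")
# After the with block we are back in the original directory.
# Python 3.11+ ships an equivalent in the standard library: contextlib.chdir
```

Code strewn with magic strings like `status = "active"` and `if status == "canceled"` is fragile: misspell one letter and the comparison is simply false, so the bug hides until runtime. `enum` makes such constants safe, because a misspelled member name raises `AttributeError` instead of silently failing to match: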
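```python
from enum import Enum, auto

class OrderStatus(Enum):
    PENDING = auto()
    CONFIRMED = auto()
    SHIPPED = auto()
    DELIVERED = auto()
    CANCELED = auto()

def process_order(order_id: str, status: OrderStatus):
    if status == OrderStatus.CANCELED:
        print(f"Order {order_id} canceled, starting the refund process")
    elif status == OrderStatus.DELIVERED:
        print(f"Order {order_id} delivered, sending a review reminder")
    else:
        print(f"Order {order_id} status: {status.name}")

process_order("ORD-10001", OrderStatus.SHIPPED)  # Order ORD-10001 status: SHIPPED

# Members can be looked up by name or by value
status = OrderStatus["CANCELED"]  # by name
status = OrderStatus(3)           # by value: auto() starts at 1, so this is SHIPPED
```

A good engineering rule of thumb: once a string constant shows up in three or more places, replace it with an enum. Your future self, coming back to the code three months later, will thank you.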
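When status values arrive as strings from a database or a JSON payload, a common variant gives the members string values and converts once, at the boundary. A sketch; the member values below are assumptions for illustration:

```python
from enum import Enum

class OrderStatus(Enum):
    # Hypothetical: values match the strings stored in the database
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELED = "canceled"

# Convert once, where the raw string enters the program
status = OrderStatus("canceled")
print(status is OrderStatus.CANCELED)  # True

# A typo in external data fails loudly instead of silently matching nothing
try:
    OrderStatus("cancelled")
except ValueError as exc:
    print(f"rejected: {exc}")

# A typo in code fails loudly too
try:
    OrderStatus.CANCELLED
except AttributeError:
    print("no such member: CANCELLED")

# .value gives the raw string back when writing to storage
print(OrderStatus.SHIPPED.value)  # shipped
```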
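The standard library is far more powerful than most people assume: many problems you think need a third-party package are already solved by `collections`, `itertools`, `functools`, and `pathlib`. The key is knowing these tools exist and reaching for them in the right situation.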