作者:还怪好嘞 专栏:Python全栈修炼之路 标签:Python、字典、集合、哈希表、哈希冲突、时间复杂度
前言
字典(dict)和集合(set)是Python中最强大的数据结构之一。它们基于哈希表实现,提供了平均O(1)时间复杂度的查找、插入和删除操作。理解哈希原理,能让你更好地利用这些数据结构,写出更高效的代码。本文将深入剖析哈希表原理,并通过实战项目展示它们的强大威力。
一、知识点详解
1.1 字典基础操作
1.1.1 创建字典
# 1. 字面量创建empty = {}person = {"name": "Alice", "age": 25, "city": "Beijing"}# 2. dict()构造函数from_pairs = dict([("a", 1), ("b", 2)])from_kwargs = dict(name="Bob", age=30)from_zip = dict(zip(["x", "y", "z"], [1, 2, 3]))# 3. 字典推导式squares = {x: x**2 for x in range(6)}# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}# 4. fromkeys() - 创建有默认值的字典keys = ["a", "b", "c"]d = dict.fromkeys(keys, 0) # {'a': 0, 'b': 0, 'c': 0}# 注意:默认值是同一个对象的引用!
1.1.2 访问与修改
# 访问元素person = {"name": "Alice", "age": 25}print(person["name"]) # "Alice"# print(person["gender"]) # KeyError!# 安全访问print(person.get("gender")) # Noneprint(person.get("gender", "N/A")) # "N/A"# 设置值person["age"] = 26 # 修改person["gender"] = "female" # 新增# setdefault - 不存在则设置默认值person.setdefault("country", "China") # 设置并返回值person.setdefault("name", "Bob") # 已存在,返回原值# update - 批量更新person.update({"age": 27, "job": "Engineer"})person.update(age=28, hobby="Reading")
1.1.3 删除操作
person = {"name": "Alice", "age": 25, "city": "Beijing"}# pop - 删除并返回值age = person.pop("age") # 25# person.pop("gender") # KeyError!# popitem - 删除并返回最后插入的项(Python 3.7+)item = person.popitem() # ('city', 'Beijing')# del - 删除键值对del person["name"]# clear - 清空person.clear() # {}
1.1.4 遍历字典
person = {"name": "Alice", "age": 25, "city": "Beijing"}# 遍历键(默认)for key in person: print(key)# 遍历键值对for key, value in person.items(): print(f"{key}: {value}")# 遍历值for value in person.values(): print(value)# 遍历键for key in person.keys(): print(key)# 遍历时修改(需要复制)for key in list(person.keys()): if key == "age": del person[key]
1.2 字典方法速查表
| | | |
|---|
d[key] | | | d["name"] |
d.get(key, default) | | | d.get("x", 0) |
d[key] = value | | | d["x"] = 1 |
del d[key] | | | del d["x"] |
d.pop(key) | | | d.pop("x") |
d.popitem() | | | d.popitem() |
d.update(other) | | | d.update({"a":1}) |
d.keys() | | | list(d.keys()) |
d.values() | | | list(d.values()) |
d.items() | | | list(d.items()) |
key in d | | | "x" in d |
1.3 高级字典类型
1.3.1 defaultdict —— 自动默认值
from collections import defaultdict# 基本用法# 当访问不存在的键时,自动调用工厂函数创建默认值counts = defaultdict(int) # 默认0lists = defaultdict(list) # 默认[]sets = defaultdict(set) # 默认set()# 词频统计words = ["apple", "banana", "apple", "cherry", "banana", "apple"]for word in words: counts[word] += 1 # 无需检查key是否存在print(dict(counts)) # {'apple': 3, 'banana': 2, 'cherry': 1}# 分组pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]grouped = defaultdict(list)for key, value in pairs: grouped[key].append(value)print(dict(grouped)) # {'a': [1, 3], 'b': [2, 4]}# 自定义工厂函数def default_user(): return {"name": "", "age": 0, "scores": []}users = defaultdict(default_user)users["alice"]["name"] = "Alice" # 自动创建默认用户
1.3.2 Counter —— 计数器
from collections import Counter# 创建计数器words = ["apple", "banana", "apple", "cherry", "banana", "apple"]c = Counter(words)print(c) # Counter({'apple': 3, 'banana': 2, 'cherry': 1})# 常用方法print(c.most_common(2)) # [('apple', 3), ('banana', 2)]print(c["apple"]) # 3print(c["orange"]) # 0(不报错)# 数学运算c1 = Counter(a=3, b=1)c2 = Counter(a=1, b=2)print(c1 + c2) # Counter({'a': 4, 'b': 3})print(c1 - c2) # Counter({'a': 2})print(c1 & c2) # Counter({'a': 1, 'b': 1}) 取最小print(c1 | c2) # Counter({'a': 3, 'b': 2}) 取最大# 更新计数c.update(["apple", "apple"]) # apple计数+2c.subtract(["banana"]) # banana计数-1
1.3.3 OrderedDict —— 有序字典(Python 3.7+后普通dict已足够)
from collections import OrderedDict# Python 3.7+ 普通字典已经保持插入顺序# OrderedDict 的特殊功能:# 1. move_to_end - 移动键到末尾od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])od.move_to_end("a") # a移到最后od.move_to_end("b", last=False) # b移到最前# 2. popitem(last=False) - 弹出第一个first = od.popitem(last=False)# 3. 相等性比较考虑顺序od1 = OrderedDict([("a", 1), ("b", 2)])od2 = OrderedDict([("b", 2), ("a", 1)])print(od1 == od2) # False(顺序不同)d1 = {"a": 1, "b": 2}d2 = {"b": 2, "a": 1}print(d1 == d2) # True(普通字典不考虑顺序)
1.4 集合操作
1.4.1 集合基础
# 创建集合empty = set() # 注意:{}创建的是空字典!nums = {1, 2, 3, 4, 5}from_list = set([1, 2, 2, 3, 3, 3]) # {1, 2, 3}# 添加删除nums.add(6)nums.remove(3) # KeyError if not existsnums.discard(10) # 安全删除,不存在不报错item = nums.pop() # 随机删除并返回nums.clear()# 集合推导式evens = {x for x in range(20) if x % 2 == 0}
1.4.2 集合运算
a = {1, 2, 3, 4, 5}b = {4, 5, 6, 7, 8}# 交集print(a & b) # {4, 5}print(a.intersection(b)) # {4, 5}# 并集print(a | b) # {1, 2, 3, 4, 5, 6, 7, 8}print(a.union(b)) # {1, 2, 3, 4, 5, 6, 7, 8}# 差集print(a - b) # {1, 2, 3}print(a.difference(b)) # {1, 2, 3}# 对称差集(异或)print(a ^ b) # {1, 2, 3, 6, 7, 8}print(a.symmetric_difference(b)) # {1, 2, 3, 6, 7, 8}# 子集/超集print(a.issubset(b)) # Falseprint(a.issuperset({1, 2})) # Trueprint(a.isdisjoint({6, 7})) # True(无交集)# 原地修改a.intersection_update(b) # a = a & ba.update({9, 10}) # a = a | {9, 10}
集合运算可视化:
并集 (|) 交集 (&) 差集 (-) 对称差集 (^) ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ A B │ │ A │ │ A │ │ A B │ │ ◆ │ │ ◆ │ │ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ A ∪ B A ∩ B A - B A △ B
二、底层原理深度解析
2.1 哈希表原理
核心概念: 哈希表(Hash Table)通过哈希函数将键映射到数组索引,实现O(1)查找。
哈希表结构示意:键 → 哈希函数 → 哈希值 → 取模 → 数组索引 → 存储位置示例:"apple" → hash() → 123456789 → % 8 → 5 → 存储到索引5"banana" → hash() → 987654321 → % 8 → 1 → 存储到索引1
Python字典的内存布局:
┌─────────────────────────────────────────┐│ 字典对象头部 ││ - 引用计数、类型指针 ││ - 条目数(used) ││ - 容量(capacity) │├─────────────────────────────────────────┤│ 哈希表(索引数组) ││ ┌────┬────┬────┬────┬────┬────┬────┐ ││ │ -1 │ 1 │ -1 │ 0 │ -1 │ -1 │ 2 │ ││ └────┴────┴────┴────┴────┴────┴────┘ ││ 0 1 2 3 4 5 6 ││ (-1表示空,其他数字是条目数组索引) │├─────────────────────────────────────────┤│ 条目数组(存储键值对) ││ ┌─────────┬─────────┬─────────┐ ││ │ hash: │ hash: │ hash: │ ││ │ key: │ key: │ key: │ ││ │ value: │ value: │ value: │ ││ │ idx: 0 │ idx: 1 │ idx: 2 │ ││ └─────────┴─────────┴─────────┘ │└─────────────────────────────────────────┘
2.2 哈希冲突与解决
哈希冲突: 不同键产生相同哈希值(或相同索引)。
Python的解决方案——开放寻址法:
# 探测序列(伪代码)def find_slot(key): hash_val = hash(key) idx = hash_val % capacity while table[idx] is not empty: if table[idx].key == key: return idx # 找到 # 冲突!探测下一个位置 idx = (idx * 5 + 1) % capacity # 二次探测变体 return idx # 空槽位
冲突处理可视化:
初始状态:索引: 0 1 2 3 4 5 6 7 [] [] [] [] [] [] [] []插入"a"(hash=5):索引: 0 1 2 3 4 5 6 7 [] [] [] [] [] [a] [] []插入"b"(hash=5,冲突!):探测序列:5→6索引: 0 1 2 3 4 5 6 7 [] [] [] [] [] [a] [b] []插入"c"(hash=6,冲突!):探测序列:6→7索引: 0 1 2 3 4 5 6 7 [] [] [] [] [] [a] [b] [c]
2.3 Python 3.7+ 字典有序性保证
历史演进:
紧凑字典实现:
传统字典(稀疏):索引数组:[空, 条目2, 空, 条目0, 空, 空, 条目1, 空]条目数组:分散存储紧凑字典(Python 3.6+):索引数组:[空, 2, 空, 0, 空, 空, 1, 空] (指向条目数组索引)条目数组:[条目0, 条目1, 条目2] (按插入顺序紧密排列)优势:1. 更少的内存占用(条目数组连续)2. 缓存友好(顺序遍历更快)3. 自然保持插入顺序
2.4 字典查找O(1)底层证明
理论分析:
假设:- 哈希函数分布均匀- 负载因子 α = n/m ≤ 2/3(Python默认)- 使用开放寻址法查找时间 = 计算哈希 + 探测次数成功查找的平均探测次数:- 线性探测:≈ 1/2 * (1 + 1/(1-α))- 二次探测:≈ -ln(1-α)/α- Python的变体:接近1次当 α = 2/3 时:- 平均探测次数 < 2- 时间复杂度 ≈ O(1)
实际测试:
import timeimport randomdef benchmark_lookup(): sizes = [1000, 10000, 100000, 1000000] print("字典查找性能测试") print("=" * 50) print(f"{'大小':<15}{'总时间(ms)':<15}{'单次(ns)':<15}") print("-" * 50) for size in sizes: # 创建字典 d = {i: f"value_{i}" for i in range(size)} # 随机查找 keys = list(d.keys()) random.shuffle(keys) # 计时 start = time.perf_counter() for key in keys: _ = d[key] elapsed = (time.perf_counter() - start) * 1000 per_op = (elapsed * 1000000) / size # 纳秒 print(f"{size:<15}{elapsed:<15.3f}{per_op:<15.1f}")benchmark_lookup()# 典型输出(证明O(1)):# 大小 总时间(ms) 单次(ns)# 1000 0.234 234.0# 10000 2.156 215.6# 100000 21.890 218.9# 1000000 234.567 234.6# 时间随大小线性增长,单次操作基本恒定
2.5 扩容机制
# Python字典扩容策略:# - 初始容量:8# - 扩容时机:当 used > 2/3 * capacity# - 扩容倍数:约4倍(小字典)或2倍(大字典)import sysdef show_dict_growth(): d = {} for i in range(100): d[i] = i if i in [0, 5, 10, 21, 42, 85]: print(f"元素数: {len(d):3}, 内存: {sys.getsizeof(d):4} bytes")show_dict_growth()# 输出示例:# 元素数: 1, 内存: 232 bytes# 元素数: 6, 内存: 360 bytes# 元素数: 11, 内存: 640 bytes# 元素数: 22, 内存: 1176 bytes# 元素数: 43, 内存: 2504 bytes# 元素数: 86, 内存: 5208 bytes
三、实战项目
3.1 项目一:词频统计TOP-N
from collections import Counterimport refrom typing import List, Tupleimport heapqclass WordFrequencyAnalyzer: """词频分析器 - 支持多种统计方式""" def __init__(self, text: str = ""): self.text = text self.words = [] self.counter = None def tokenize(self, text: str = None) -> List[str]: """分词(简化版)""" if text is None: text = self.text # 转小写,提取单词 text = text.lower() # 匹配中文字符或英文单词 words = re.findall(r'[\u4e00-\u9fff]|[a-z]+', text) # 过滤停用词(简化版) stopwords = {'the', 'a', 'an', 'is', 'are', 'was', 'were', '的', '了', '在', '是', '和', '有'} return [w for w in words if w not in stopwords and len(w) > 1] def analyze(self, text: str = None) -> Counter: """分析词频""" self.words = self.tokenize(text) self.counter = Counter(self.words) return self.counter def top_n(self, n: int = 10) -> List[Tuple[str, int]]: """获取TOP N""" if self.counter is None: self.analyze() return self.counter.most_common(n) def top_n_heap(self, n: int = 10) -> List[Tuple[str, int]]: """使用堆获取TOP N(适合大数据)""" if self.counter is None: self.analyze() # nlargest比排序整个Counter更高效 return heapq.nlargest(n, self.counter.items(), key=lambda x: x[1]) def frequency_distribution(self) -> dict: """频率分布统计""" if self.counter is None: self.analyze() dist = {} for word, count in self.counter.items(): dist[count] = dist.get(count, 0) + 1 return { "total_words": len(self.words), "unique_words": len(self.counter), "frequency_dist": sorted(dist.items(), reverse=True) } def find_phrases(self, n: int = 2, min_freq: int = 2) -> List[Tuple[Tuple[str, ...], int]]: """查找高频短语(n-gram)""" if not self.words: self.words = self.tokenize() phrases = [] for i in range(len(self.words) - n + 1): phrase = tuple(self.words[i:i+n]) phrases.append(phrase) phrase_counter = Counter(phrases) return [(p, c) for p, c in phrase_counter.most_common() if c >= min_freq] def generate_report(self, top_n: int = 20) -> str: """生成分析报告""" if self.counter is None: self.analyze() dist = self.frequency_distribution() top_words = self.top_n(top_n) report = f"""╔════════════════════════════════════════════════╗║ 词 频 统 计 报 告 ║╚════════════════════════════════════════════════╝【基本信息】 总词数: {dist['total_words']} 不同词汇数: {dist['unique_words']} 词汇丰富度: {dist['unique_words']/dist['total_words']:.2%}【高频词汇 TOP {top_n}】""" max_count = top_words[0][1] if top_words else 1 for rank, (word, count) in enumerate(top_words, 1): bar_len = int(30 * count / max_count) bar = "█" * bar_len report += f" {rank:2}. {word:15}{count:4}{bar}\n" # 频率分布 report += "\n【频率分布】\n" for freq, count in dist['frequency_dist'][:10]: report += f" 出现{freq:3}次的词有 {count:4} 个\n" return report# 使用示例text = """Python是一种解释型、面向对象、动态数据类型的高级程序设计语言。Python由Guido van Rossum于1989年底发明,第一个公开发行版发行于1991年。Python源代码遵循GPL协议。Python语法简洁清晰,特色之一是强制用空白符缩进。Python具有丰富的标准库和第三方库,广泛应用于Web开发、数据分析、人工智能等领域。"""analyzer = WordFrequencyAnalyzer(text)print(analyzer.generate_report(15))# 查找高频短语print("\n【高频短语(2-gram)】")for phrase, count in analyzer.find_phrases(n=2, min_freq=1): print(f" {' '.join(phrase)}: {count}")
3.2 项目二:电话簿系统
from typing import Dict, List, Optional, Setfrom dataclasses import dataclass, fieldfrom collections import defaultdict@dataclassclass Contact: """联系人""" name: str phone: str email: str = "" groups: Set[str] = field(default_factory=set) notes: str = "" def __hash__(self): return hash(self.phone) def __eq__(self, other): if isinstance(other, Contact): return self.phone == other.phone return Falseclass PhoneBook: """电话簿系统 - 支持多索引查询""" def __init__(self): # 主存储:手机号 -> 联系人 self._by_phone: Dict[str, Contact] = {} # 辅助索引 self._by_name: Dict[str, Set[str]] = defaultdict(set) # 姓名 -> 手机号集合 self._by_group: Dict[str, Set[str]] = defaultdict(set) # 分组 -> 手机号集合 self._by_email: Dict[str, str] = {} # 邮箱 -> 手机号 def add(self, contact: Contact) -> bool: """添加联系人""" if contact.phone in self._by_phone: print(f"手机号 {contact.phone} 已存在") return False # 主存储 self._by_phone[contact.phone] = contact # 更新索引 self._by_name[contact.name].add(contact.phone) for group in contact.groups: self._by_group[group].add(contact.phone) if contact.email: self._by_email[contact.email] = contact.phone return True def remove(self, phone: str) -> bool: """删除联系人""" if phone not in self._by_phone: return False contact = self._by_phone[phone] # 清理索引 self._by_name[contact.name].discard(phone) if not self._by_name[contact.name]: del self._by_name[contact.name] for group in contact.groups: self._by_group[group].discard(phone) if not self._by_group[group]: del self._by_group[group] if contact.email: del self._by_email[contact.email] # 主存储删除 del self._by_phone[phone] return True def update(self, phone: str, **kwargs) -> bool: """更新联系人""" if phone not in self._by_phone: return False contact = self._by_phone[phone] # 清理旧索引 self._by_name[contact.name].discard(phone) for group in contact.groups: self._by_group[group].discard(phone) if contact.email: del self._by_email[contact.email] # 更新字段 for key, value in kwargs.items(): if hasattr(contact, key): setattr(contact, key, value) # 重建索引 self._by_name[contact.name].add(phone) for group in contact.groups: self._by_group[group].add(phone) if contact.email: self._by_email[contact.email] = phone return True def find_by_phone(self, phone: str) -> Optional[Contact]: """按手机号查找""" return self._by_phone.get(phone) def find_by_name(self, name: str) -> List[Contact]: """按姓名查找""" phones = self._by_name.get(name, set()) return [self._by_phone[p] for p in phones] def find_by_group(self, group: str) -> List[Contact]: """按分组查找""" phones = self._by_group.get(group, set()) return [self._by_phone[p] for p in phones] def find_by_email(self, email: str) -> Optional[Contact]: """按邮箱查找""" phone = self._by_email.get(email) return self._by_phone.get(phone) if phone else None def search(self, keyword: str) -> List[Contact]: """模糊搜索""" results = set() keyword = keyword.lower() for contact in self._by_phone.values(): if (keyword in contact.name.lower() or keyword in contact.phone or keyword in contact.email.lower() or keyword in contact.notes.lower()): results.add(contact) return list(results) def add_to_group(self, phone: str, group: str) -> bool: """添加联系人到分组""" if phone not in self._by_phone: return False contact = self._by_phone[phone] contact.groups.add(group) self._by_group[group].add(phone) return True def get_groups(self) -> List[str]: """获取所有分组""" return list(self._by_group.keys()) def get_stats(self) -> dict: """获取统计信息""" return { "total_contacts": len(self._by_phone), "unique_names": len(self._by_name), "groups": len(self._by_group), "with_email": len(self._by_email) } def export(self) -> List[dict]: """导出所有联系人""" return [ { "name": c.name, "phone": c.phone, "email": c.email, "groups": list(c.groups), "notes": c.notes } for c in self._by_phone.values() ] def print_all(self): """打印所有联系人""" print("\n" + "=" * 80) print("电话簿".center(80)) print("=" * 80) for contact in sorted(self._by_phone.values(), key=lambda c: c.name): print(f"\n姓名: {contact.name}") print(f" 电话: {contact.phone}") if contact.email: print(f" 邮箱: {contact.email}") if contact.groups: print(f" 分组: {', '.join(contact.groups)}") if contact.notes: print(f" 备注: {contact.notes}")# 使用示例def demo(): book = PhoneBook() # 添加联系人 contacts = [ Contact("张三", "13800138000", "zhangsan@example.com", {"同事", "好友"}), Contact("李四", "13800138001", "lisi@example.com", {"同事"}), Contact("王五", "13800138002", "wangwu@example.com", {"好友"}), Contact("赵六", "13800138003", groups={"家人"}), ] for c in contacts: book.add(c) # 查询演示 print("\n按姓名查找'张三':") for c in book.find_by_name("张三"): print(f" {c}") print("\n按分组查找'同事':") for c in book.find_by_group("同事"): print(f" {c.name}: {c.phone}") print("\n模糊搜索'138':") for c in book.search("138"): print(f" {c.name}: {c.phone}") # 统计 print("\n统计信息:") print(book.get_stats()) # 打印全部 book.print_all()if __name__ == "__main__": demo()
四、常见陷阱与解决方案
4.1 可变对象作为字典键
# 陷阱:使用可变对象作为键# d = {[1, 2]: "value"} # TypeError: unhashable type: 'list'# 原因:列表是可变的,哈希值会改变,破坏哈希表不变性# 解决方案1:转换为元组d = {(1, 2): "value"}# 解决方案2:使用frozensets = frozenset([1, 2, 3])d = {s: "value"}# 解决方案3:自定义类实现__hash__class Point: def __init__(self, x, y): self.x = x self.y = y def __hash__(self): return hash((self.x, self.y)) def __eq__(self, other): return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)d = {Point(1, 2): "A"}
4.2 字典遍历时修改
# 陷阱:遍历时修改字典# d = {"a": 1, "b": 2, "c": 3}# for k in d:# if d[k] < 2:# del d[k] # RuntimeError: dictionary changed size during iteration# 正确做法1:遍历键的副本for k in list(d.keys()): if d[k] < 2: del d[k]# 正确做法2:字典推导式d = {k: v for k, v in d.items() if v >= 2}# 正确做法3:标记后删除to_delete = [k for k, v in d.items() if v < 2]for k in to_delete: del d[k]
4.3 defaultdict的默认工厂陷阱
from collections import defaultdict# 陷阱:默认工厂返回可变对象的同一个引用d = defaultdict(list)d["a"].append(1)print(d["b"]) # [] - 新的空列表# 但如果是自定义对象:class Container: def __init__(self): self.items = []d = defaultdict(Container)c1 = d["a"]c2 = d["a"]print(c1 is c2) # True - 同一个对象!# 正确做法:使用lambda每次创建新对象d = defaultdict(lambda: Container())c1 = d["a"]c2 = d["a"]print(c1 is c2) # True - 还是同一个(因为"a"已存在)c3 = d["b"]print(c1 is c3) # False - 不同key不同对象
4.4 集合运算的副作用
a = {1, 2, 3}b = {3, 4, 5}# 陷阱:& 和 | 返回新集合,但 &= 和 |= 原地修改original_id = id(a)# 返回新集合result = a & bprint(id(result) == original_id) # False# 原地修改a &= bprint(id(a) == original_id) # True# 注意:使用赋值会改变变量指向a = {1, 2, 3}a = a & b # 新集合,原集合可能被GC
4.5 哈希一致性陷阱
# 陷阱:修改对象后哈希值改变class BadKey: def __init__(self, value): self.value = value def __hash__(self): return hash(self.value) def __eq__(self, other): return isinstance(other, BadKey) and self.value == other.valuekey = BadKey(1)d = {key: "value"}print(d[key]) # "value"key.value = 2 # 修改了!# print(d[key]) # KeyError! 哈希值变了,找不到# 正确做法:使用不可变对象或确保对象不可变class GoodKey: def __init__(self, value): self._value = value # 私有,不暴露修改接口 @property def value(self): return self._value def __hash__(self): return hash(self._value) def __eq__(self, other): return isinstance(other, GoodKey) and self._value == other._value
五、本章小结
核心知识点回顾
| |
|---|
| |
| get/setdefault/update/pop/popitem |
| defaultdict、Counter、OrderedDict |
| |
| |
| |
| |
数据结构选择指南
需要键值映射 → 字典(dict) ├─ 需要自动默认值 → defaultdict ├─ 需要计数 → Counter └─ 需要保持顺序 → dict (3.7+) 或 OrderedDict需要去重 → 集合(set)/frozenset ├─ 需要可哈希 → frozenset └─ 集合运算 → set需要有序序列 → 列表(list)/元组(tuple)查找效率对比(n个元素): 列表查找: O(n) 字典/集合查找: O(1)
最佳实践清单
- 使用setdefault或defaultdict处理缺失键
六、课后练习
基础练习
两数之和:给定数组和目标值,找出和为目标值的两个数的索引
two_sum([2, 7, 11, 15], 9) # [0, 1] 因为 2+7=9
要求:使用字典实现O(n)时间复杂度
字符异位词:判断两个字符串是否为异位词(字符相同,顺序不同)
is_anagram("listen", "silent") # True
要求:使用Counter实现
最长连续序列:找出未排序数组中最长连续序列长度
longest_consecutive([100, 4, 200, 1, 3, 2]) # 4 ([1,2,3,4])
要求:使用集合实现O(n)复杂度
进阶练习
LRU缓存:实现LRU(最近最少使用)缓存
cache = LRUCache(2)cache.put(1, 1)cache.put(2, 2)cache.get(1) # 返回 1cache.put(3, 3) # 淘汰 key 2cache.get(2) # 返回 -1 (未找到)
一致性哈希:实现简单的一致性哈希环
倒排索引:为文档集合构建倒排索引
挑战练习
布隆过滤器:实现布隆过滤器
跳表:实现跳表(Skip List)
参考资源
本文是《Python全栈修炼之路》系列第5篇,持续更新中,欢迎关注专栏获取更多内容!