当前位置：首页>python>Python 字典与集合专项练习:6 道编程题从入门到精通

Python 字典与集合专项练习:6 道编程题从入门到精通

2026-07-01 18:57:01

配套专栏：Python 全栈修炼之路第 5 篇《字典与集合 —— 哈希的威力》
难度分布：⭐ → ⭐⭐ → ⭐⭐ → ⭐⭐⭐ → ⭐⭐⭐ → ⭐⭐⭐⭐
核心覆盖：字典查找 O(1)、Counter 计数、集合运算、defaultdict 分组、OrderedDict、哈希表底层原理

题目一：两数之和 ⭐

📌 题目描述

给定一个整数数组 nums 和一个目标值 target，在数组中找出和为目标值的那两个整数，返回它们的索引。假设每个输入只有一个解。

输入：nums = [2, 7, 11, 15], target = 9输出：[0, 1]解释：nums[0] + nums[1] = 2 + 7 = 9

要求：时间复杂度 O(n)，不能使用暴力双重循环 O(n²)。

💡 编程思路

这道题是 LeetCode 1，经典的「用空间换时间」案例：

暴力法
：双重循环枚举所有数对，O(n²)，面试不推荐。
哈希表法
：遍历数组，用字典记录「已遍历数字 → 索引」。对于当前数字 num，检查 target - num 是否在字典中。字典查找 O(1)，整体 O(n)。

关键洞察：字典的 in 操作是 O(1)，利用这一点可以将查找从 O(n) 降到 O(1)。

🖥️ 参考代码

def two_sum(nums, target):    """    两数之和 —— 哈希表法 O(n)    思路：遍历时记录已见数字，对于每个num检查(target-num)是否已存在    """    seen = {}  # 数字 -> 索引    for i, num in enumerate(nums):        complement = target - num        if complement in seen:            return [seen[complement], i]        seen[num] = i    return []  # 无解# 测试if __name__ == "__main__":    print(two_sum([2, 7, 11, 15], 9))   # [0, 1]    print(two_sum([3, 2, 4], 6))        # [1, 2]    print(two_sum([3, 3], 6))           # [0, 1]    # 边界测试    print(two_sum([], 10))              # []    print(two_sum([1], 2))              # []

🔗 关联知识点

知识点	说明
字典 `in` 操作	O(1) 时间复杂度，哈希查找
哈希表存储「已见」	空间换时间的经典模式
`enumerate`	同时获取索引和值

题目二：字符异位词分组 ⭐⭐

📌 题目描述

给定一个字符串数组，将异位词（字母相同、排列不同的词）分组在一起。

输入：["eat", "tea", "tan", "ate", "nat", "bat"]输出：[["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

💡 编程思路

异位词的本质是字母组成相同，因此可以用「排序后的字符串」或「字母计数」作为分组键：

方法一（排序键）
：将每个字符串按字母排序，"eat" → "aet"。所有异位词排序后相同，作为字典键。
方法二（计数键）
：统计每个字母出现次数，用元组 (count_a, count_b, ...) 作为键。避免排序开销。

方法一更直观，方法一和方法二都是 O(n·k log k) 和 O(n·k)，其中 k 是字符串平均长度。

🖥️ 参考代码

from collections import defaultdictdef group_anagrams_sort(strs):    """方法一：排序作为键"""    groups = defaultdict(list)    for s in strs:        key = ''.join(sorted(s))  # 排序后作为键        groups[key].append(s)    return list(groups.values())def group_anagrams_count(strs):    """方法二：字母计数作为键（避免排序）"""    groups = defaultdict(list)    for s in strs:        # 统计26个字母出现次数        count = [0] * 26        for ch in s:            count[ord(ch) - ord('a')] += 1        key = tuple(count)  # 列表不可哈希，转元组        groups[key].append(s)    return list(groups.values())# 测试if __name__ == "__main__":    strs = ["eat", "tea", "tan", "ate", "nat", "bat"]    print(group_anagrams_sort(strs))    # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]    print(group_anagrams_count(strs))    # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]    # 边界测试    print(group_anagrams_sort([""]))    # [['']]    print(group_anagrams_sort(["a"]))   # [['a']]

🔗 关联知识点

知识点	说明
`defaultdict(list)`	自动创建空列表，避免 `if key not in d` 判断
`sorted(s)`	返回排序后的字符列表
元组作为字典键	元组不可变、可哈希，可作为键
`ord(ch) - ord('a')`	字母转 0-25 索引

题目三：前 K 个高频元素 ⭐⭐

📌 题目描述

给定一个整数数组 nums 和一个整数 k，返回出现频率前 k 高的元素。

输入：nums = [1, 1, 1, 2, 2, 3], k = 2输出：[1, 2]解释：1出现3次，2出现2次，是前两个高频元素

要求：时间复杂度优于 O(n log n)。

💡 编程思路

这道题综合考察 Counter + 堆：

方法一（排序）
：用 Counter 统计频率，排序后取前 k。O(n log n)，简单但非最优。
方法二（堆）
：用 Counter 统计后，用 heapq.nlargest(k, ...) 取前 k 个。堆操作 O(n + k log n)，当 k << n 时更优。
方法三（桶排序）
：频率最大为 n，用桶排序思想，O(n)。适合面试展示深度。

🖥️ 参考代码

from collections import Counterimport heapqdef top_k_frequent_sort(nums, k):    """方法一：排序 —— O(n log n)"""    counter = Counter(nums)    # 按频率降序排序    sorted_items = sorted(counter.items(), key=lambda x: x[1], reverse=True)    return [item[0] for item in sorted_items[:k]]def top_k_frequent_heap(nums, k):    """方法二：堆 —— O(n + k log n)"""    counter = Counter(nums)    # heapq.nlargest 内部使用堆    return [item[0] for item in heapq.nlargest(k, counter.items(), key=lambda x: x[1])]def top_k_frequent_bucket(nums, k):    """方法三：桶排序 —— O(n)"""    counter = Counter(nums)    # 桶：索引是频率，值是该频率的元素列表    max_freq = max(counter.values())    buckets = [[] for _ in range(max_freq + 1)]    for num, freq in counter.items():        buckets[freq].append(num)    # 从高频到低频收集    result = []    for i in range(max_freq, 0, -1):        result.extend(buckets[i])        if len(result) >= k:            return result[:k]    return result# 测试if __name__ == "__main__":    nums = [1, 1, 1, 2, 2, 3]    k = 2    print(top_k_frequent_sort(nums, k))   # [1, 2]    print(top_k_frequent_heap(nums, k))   # [1, 2]    print(top_k_frequent_bucket(nums, k)) # [1, 2]    # 性能对比    import timeit    import random    big_nums = [random.randint(1, 1000) for _ in range(100000)]    t1 = timeit.timeit(lambda: top_k_frequent_sort(big_nums, 10), number=10)    t2 = timeit.timeit(lambda: top_k_frequent_heap(big_nums, 10), number=10)    t3 = timeit.timeit(lambda: top_k_frequent_bucket(big_nums, 10), number=10)    print(f"\n排序法: {t1:.4f}s")    print(f"堆方法: {t2:.4f}s")    print(f"桶排序: {t3:.4f}s")

🔗 关联知识点

知识点	说明
`Counter`	一行完成词频统计
`most_common(k)`	Counter 内置方法，直接取前 k
`heapq.nlargest`	堆实现的高效 Top-K
桶排序思想	用频率作为桶索引

题目四：最长连续序列 ⭐⭐⭐

📌 题目描述

给定一个未排序的整数数组，找出数字连续的最长序列的长度。要求算法时间复杂度为 O(n)。

输入：nums = [100, 4, 200, 1, 3, 2]输出：4解释：最长连续序列是 [1, 2, 3, 4]，长度为 4

💡 编程思路

这道题的关键是利用集合的 O(1) 查找：

将所有数字放入集合。
遍历集合，对于每个数字 num，只有当 num - 1不在集合中时，才以 num 为起点开始计数（避免重复计算）。
从起点不断检查 num + 1, num + 2, ... 是否在集合中，统计序列长度。

为什么是 O(n)？每个数字最多被访问两次（一次作为起点检查，一次在序列中），总操作 O(n)。

🖥️ 参考代码

def longest_consecutive(nums):    """最长连续序列 —— 集合 + O(n) 遍历"""    if not nums:        return 0    num_set = set(nums)    max_length = 0    for num in num_set:        # 只有当 num-1 不存在时，才作为序列起点        if num - 1 not in num_set:            current_num = num            current_length = 1            # 向后查找连续数字            while current_num + 1 in num_set:                current_num += 1                current_length += 1            max_length = max(max_length, current_length)    return max_length# 测试if __name__ == "__main__":    print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4    print(longest_consecutive([0, 3, 7, 2, 5, 8, 4, 6, 0, 1]))  # 9    print(longest_consecutive([]))  # 0    print(longest_consecutive([1]))  # 1    # 性能测试    import timeit    import random    big_nums = random.sample(range(1000000), 100000)    t = timeit.timeit(lambda: longest_consecutive(big_nums), number=10)    print(f"\n10万数据耗时: {t:.4f}s")

🔗 关联知识点

知识点	说明
`set(nums)`	O(n) 去重，O(1) 成员判断
起点判断技巧	`num - 1 not in set` 确保只从序列头开始
哈希表查找 O(1)	这是本题能 O(n) 的核心原因

题目五：LRU 缓存 ⭐⭐⭐

📌 题目描述

实现一个 LRU（最近最少使用）缓存，支持 get(key) 和 put(key, value) 操作：

get(key)
：存在则返回值并移到最前；不存在返回 -1
put(key, value)
：存在则更新并移到最前；不存在则插入。超出容量时淘汰最久未使用的

cache = LRUCache(2)cache.put(1, 1)cache.put(2, 2)print(cache.get(1))    # 返回 1cache.put(3, 3)        # 淘汰 key=2print(cache.get(2))    # 返回 -1（未找到）cache.put(4, 4)        # 淘汰 key=1print(cache.get(1))    # 返回 -1print(cache.get(3))    # 返回 3print(cache.get(4))    # 返回 4

要求：get 和 put 均为 O(1) 时间复杂度。

💡 编程思路

LRU 缓存需要两个核心能力：

O(1) 查找
→ 用字典
O(1) 移动/删除最旧元素
→ 用双向链表

组合方案：字典存 key → 链表节点，链表维护访问顺序：

头部：最近使用
尾部：最久未使用

Python 中可以直接用 collections.OrderedDict，它内部就是「字典 + 双向链表」的实现，move_to_end() 和 popitem(last=False) 都是 O(1)。

🖥️ 参考代码

from collections import OrderedDictclass LRUCache:    """LRU 缓存 —— 基于 OrderedDict 实现"""    def __init__(self, capacity: int):        self.capacity = capacity        self.cache = OrderedDict()    def get(self, key: int) -> int:        if key not in self.cache:            return -1        # 移动到末尾（表示最近使用）        self.cache.move_to_end(key)        return self.cache[key]    def put(self, key: int, value: int) -> None:        if key in self.cache:            # 更新并移动到末尾            self.cache.move_to_end(key)        else:            # 检查容量            if len(self.cache) >= self.capacity:                # 弹出最久未使用的（头部）                self.cache.popitem(last=False)        self.cache[key] = value# 手动实现版本（双向链表 + 字典）class DLinkedNode:    """双向链表节点"""    def __init__(self, key=0, value=0):        self.key = key        self.value = value        self.prev = None        self.next = Noneclass LRUCacheManual:    """LRU 缓存 —— 手动实现双向链表"""    def __init__(self, capacity: int):        self.capacity = capacity        self.cache = {}  # key -> Node        # 虚拟头尾节点（简化边界处理）        self.head = DLinkedNode()        self.tail = DLinkedNode()        self.head.next = self.tail        self.tail.prev = self.head    def _add_node(self, node):        """添加到头部（最近使用）"""        node.prev = self.head        node.next = self.head.next        self.head.next.prev = node        self.head.next = node    def _remove_node(self, node):        """移除节点"""        node.prev.next = node.next        node.next.prev = node.prev    def _move_to_head(self, node):        """移动到头部"""        self._remove_node(node)        self._add_node(node)    def _pop_tail(self):        """弹出尾部节点（最久未使用）"""        node = self.tail.prev        self._remove_node(node)        return node    def get(self, key: int) -> int:        if key not in self.cache:            return -1        node = self.cache[key]        self._move_to_head(node)        return node.value    def put(self, key: int, value: int) -> None:        if key in self.cache:            node = self.cache[key]            node.value = value            self._move_to_head(node)        else:            if len(self.cache) >= self.capacity:                tail = self._pop_tail()                del self.cache[tail.key]            node = DLinkedNode(key, value)            self.cache[key] = node            self._add_node(node)# 测试if __name__ == "__main__":    # OrderedDict 版本    cache = LRUCache(2)    cache.put(1, 1)    cache.put(2, 2)    print(cache.get(1))  # 1    cache.put(3, 3)    print(cache.get(2))  # -1    cache.put(4, 4)    print(cache.get(1))  # -1    print(cache.get(3))  # 3    print(cache.get(4))  # 4    print("\n--- 手动实现版本 ---")    cache2 = LRUCacheManual(2)    cache2.put(1, 1)    cache2.put(2, 2)    print(cache2.get(1))  # 1    cache2.put(3, 3)    print(cache2.get(2))  # -1

🔗 关联知识点

知识点	说明
`OrderedDict`	Python 内置的有序字典
`move_to_end(key)`	将键移到末尾，O(1)
`popitem(last=False)`	弹出头部（最旧），O(1)
双向链表	手动实现时的核心数据结构
虚拟头尾节点	简化链表边界判断

题目六：实现简易哈希表 ⭐⭐⭐⭐

📌 题目描述

不使用 Python 内置的 dict，实现一个简易哈希表 MyHashMap，支持以下操作：

方法	说明
`put(key, value)`	插入或更新键值对
`get(key)`	返回值，不存在返回 -1
`remove(key)`	删除键值对

hashMap = MyHashMap()hashMap.put(1, 10)hashMap.put(2, 20)print(hashMap.get(1))   # 10print(hashMap.get(3))   # -1hashMap.put(2, 30)print(hashMap.get(2))   # 30hashMap.remove(2)print(hashMap.get(2))   # -1

💡 编程思路

这道题直击第5篇的底层原理 —— 哈希表的核心机制：

哈希函数
：将 key 映射到数组索引，hash(key) % capacity。
冲突处理
：使用链地址法（Separate Chaining），每个桶是一个链表，存储哈希值相同的键值对。
扩容
：当元素数超过 capacity * load_factor 时，扩容并重新哈希。

链地址法 vs 开放寻址法：

链地址法：冲突时在桶内链表追加，实现简单。
开放寻址法：冲突时探测下一个位置，Python 字典采用此方法。

🖥️ 参考代码

class MyHashMap:    """简易哈希表 —— 链地址法实现"""    def __init__(self, capacity=1000, load_factor=0.75):        self.capacity = capacity        self.load_factor = load_factor        self.size = 0        # 每个桶是一个列表，存储 (key, value) 元组        self.buckets = [[] for _ in range(capacity)]    def _hash(self, key):        """哈希函数"""        return key % self.capacity    def _resize(self):        """扩容并重新哈希"""        old_buckets = self.buckets        self.capacity *= 2        self.buckets = [[] for _ in range(self.capacity)]        self.size = 0        # 重新插入所有元素        for bucket in old_buckets:            for key, value in bucket:                self.put(key, value)    def put(self, key, value):        """插入或更新"""        if self.size >= self.capacity * self.load_factor:            self._resize()        idx = self._hash(key)        bucket = self.buckets[idx]        # 查找是否已存在        for i, (k, v) in enumerate(bucket):            if k == key:                bucket[i] = (key, value)  # 更新                return        # 不存在则追加        bucket.append((key, value))        self.size += 1    def get(self, key):        """获取值"""        idx = self._hash(key)        bucket = self.buckets[idx]        for k, v in bucket:            if k == key:                return v        return -1    def remove(self, key):        """删除键值对"""        idx = self._hash(key)        bucket = self.buckets[idx]        for i, (k, v) in enumerate(bucket):            if k == key:                del bucket[i]                self.size -= 1                return    def __repr__(self):        items = []        for bucket in self.buckets:            for k, v in bucket:                items.append(f"{k}: {v}")        return "{" + ", ".join(items) + "}"# 测试if __name__ == "__main__":    hashMap = MyHashMap()    hashMap.put(1, 10)    hashMap.put(2, 20)    print(hashMap.get(1))   # 10    print(hashMap.get(3))   # -1    hashMap.put(2, 30)    print(hashMap.get(2))   # 30    hashMap.remove(2)    print(hashMap.get(2))   # -1    # 冲突测试    hashMap.put(1, 100)    hashMap.put(1001, 200)  # 与 key=1 哈希冲突（假设 capacity=1000）    print(hashMap.get(1))    # 100    print(hashMap.get(1001)) # 200    # 扩容测试    print(f"\n扩容前 capacity: {hashMap.capacity}")    for i in range(1000):        hashMap.put(i, i * 10)    print(f"扩容后 capacity: {hashMap.capacity}")    print(hashMap.get(500))  # 5000

🔗 关联知识点

知识点	说明
哈希函数	`key % capacity` 简单取模
链地址法	每个桶用链表存储冲突元素
负载因子	`size / capacity` ，超过阈值触发扩容
扩容重哈希	容量翻倍后，所有元素重新计算哈希位置
时间复杂度	平均 O(1)，最坏 O(n)（所有元素冲突）

总结：知识点覆盖矩阵

题目	难度	核心考察点
两数之和	⭐	字典 O(1) 查找、空间换时间
异位词分组	⭐⭐	`defaultdict` 、排序/计数作为键
前 K 高频元素	⭐⭐	`Counter` 、堆、桶排序
最长连续序列	⭐⭐⭐	集合 O(1) 查找、起点判断技巧
LRU 缓存	⭐⭐⭐	`OrderedDict` 、双向链表、字典+链表组合
简易哈希表	⭐⭐⭐⭐	哈希函数、链地址法、扩容机制

建议：按顺序从第 1 题做到第 6 题。第 6 题是本篇的「压轴题」，完整实现后你将真正理解 Python 字典的底层工作原理 —— 哈希函数、冲突处理、扩容策略。

Python 字典与集合专项练习:6 道编程题从入门到精通

题目一：两数之和 ⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目二：字符异位词分组 ⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目三：前 K 个高频元素 ⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目四：最长连续序列 ⭐⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目五：LRU 缓存 ⭐⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目六：实现简易哈希表 ⭐⭐⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

总结：知识点覆盖矩阵

延伸阅读

最新文章

热门文章

随机文章

Python 字典与集合专项练习:6 道编程题从入门到精通

题目一：两数之和 ⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目二：字符异位词分组 ⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目三：前 K 个高频元素 ⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目四：最长连续序列 ⭐⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目五：LRU 缓存 ⭐⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

题目六：实现简易哈希表 ⭐⭐⭐⭐

📌 题目描述

💡 编程思路

🖥️ 参考代码

🔗 关联知识点

总结：知识点覆盖矩阵

延伸阅读

【黑金云课堂】FPGA技术教程Linux开发:NVMe/Qt/OpenCV人脸检测

升学分享|Python仅入门,三年后同时拿下NUS和NTU计算机offer

最新文章

热门文章

随机文章