
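Hello everyone, I'm 煜道 (Yudao).
Today we'll learn about iterators and generators.
Introduction
Iterators and generators are Python's core tools for working with streams of data. Both evaluate lazily: values are produced one at a time, on demand, so large datasets and even infinite sequences can be processed without loading everything into memory at once. Understanding how they work is essential for writing high-performance Python code.
This article covers Python's iteration protocol, iterables, iterators, generator functions, and generator expressions. Working through them in order shows how to write more memory-efficient code and lays the groundwork for understanding modern asynchronous programming in Python.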

01 Iterables and Iterators
1.1 Iterables
An iterable is an object that can return its members one at a time, which is what allows it to be traversed by a for loop:

    # Common iterables
    lst = [1, 2, 3]          # list
    tup = (1, 2, 3)          # tuple
    s = "hello"              # string
    d = {'a': 1, 'b': 2}     # dict
    st = {1, 2, 3}           # set
    r = range(5)             # range object

    # Check whether an object is iterable
    from collections.abc import Iterable
    print(isinstance(lst, Iterable))  # True
    print(isinstance(s, Iterable))    # True
    print(isinstance(42, Iterable))   # False

    # Iterate manually
    it = iter(lst)     # get an iterator
    print(next(it))    # 1
    print(next(it))    # 2
    print(next(it))    # 3
    # print(next(it))  # raises StopIteration
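The manual iter()/next() calls above are essentially what a for loop does on your behalf. As a rough sketch (the variable name `collected` is only illustrative), a for loop over a list behaves approximately like this while loop, which stops when next() raises StopIteration:

```python
lst = [1, 2, 3]

# Roughly what "for item in lst: collected.append(item)" does internally
it = iter(lst)            # ask the iterable for an iterator
collected = []
while True:
    try:
        item = next(it)   # fetch the next element
    except StopIteration:
        break             # the iterator is exhausted: leave the loop
    collected.append(item)

print(collected)
```

The for statement simply hides the StopIteration handling, which is why that exception never surfaces in ordinary loops.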
1.2 The Iteration Protocol
The iteration protocol consists of two methods, __iter__() and __next__():

    class MyList:
        def __init__(self, data):
            self.data = data
            self.index = 0

        def __iter__(self):
            """Return the iterator object."""
            return self

        def __next__(self):
            """Return the next element."""
            if self.index >= len(self.data):
                raise StopIteration
            value = self.data[self.index]
            self.index += 1
            return value

    mlist = MyList([1, 2, 3])
    for item in mlist:
        print(item)
    # 1
    # 2
    # 3
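Because MyList returns itself from __iter__() and never resets its index, an instance is exhausted after one pass. A minimal sketch of a common alternative, assuming a hypothetical class name ReusableList that is not part of the original example: keep the data in the iterable and hand out a fresh iterator on every __iter__() call, here by writing __iter__() as a generator method.

```python
class ReusableList:
    """Hypothetical iterable that creates a new iterator for every loop."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # A generator method: each call returns an independent iterator
        for value in self.data:
            yield value


rlist = ReusableList([1, 2, 3])
first_pass = list(rlist)
second_pass = list(rlist)   # works again because a fresh iterator is created
print(first_pass, second_pass)
```

This mirrors how built-in containers such as list behave: the container is the iterable, and each loop gets its own iterator.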

1.3 Advanced Usage of iter()

    # Basic usage
    lst = [1, 2, 3, 4, 5]
    it = iter(lst)

    # With a sentinel: call f.readline repeatedly until it returns ''
    with open('file.txt') as f:
        for line in iter(f.readline, ''):
            print(line.strip())
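The two-argument form iter(callable, sentinel) works with any zero-argument callable, not only readline. The following sketch is an illustration rather than part of the original example: it uses io.StringIO as a stand-in for an open file and functools.partial to read fixed-size blocks until read() returns the empty string.

```python
import io
from functools import partial

stream = io.StringIO("abcdefghij")   # stands in for an open text file

# Call stream.read(4) repeatedly until it returns the sentinel ''
blocks = list(iter(partial(stream.read, 4), ''))
print(blocks)
```

The block size of 4 is arbitrary; the last block is shorter when the data length is not a multiple of it.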

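02 Generator Functions
2.1 What Is a Generator
A generator is a special kind of iterator, created by a function whose body contains a yield statement:

    def my_range(start, end):
        """A custom range generator."""
        current = start
        while current < end:
            yield current
            current += 1

    # Use the generator
    for i in my_range(0, 5):
        print(i)
    # 0
    # 1
    # 2
    # 3
    # 4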
2.2 How Generators Work

    def simple_gen():
        print("Before first yield")
        yield 1
        print("After first yield, before second")
        yield 2
        print("After second yield")
        yield 3
        print("Generator done")

    gen = simple_gen()
    print("Generator created, not executed yet")

    print(next(gen))   # runs up to the first yield
    # prints:  Before first yield
    # returns: 1

    print(next(gen))   # resumes right after the first yield
    # prints:  After first yield, before second
    # returns: 2

    print(next(gen))
    # prints:  After second yield
    # returns: 3

    # print(next(gen))  # prints "Generator done", then raises StopIteration

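2.3 Advantages of Generators

    import sys

    # Two ways to produce a large amount of data

    # Option 1: a list (everything is built up front)
    def generate_numbers_list(n):
        return [i * i for i in range(n)]

    # High memory use: the whole list must be stored
    large_list = generate_numbers_list(1000000)
    # About 8 MB on 64-bit CPython; getsizeof counts only the list's
    # pointer array, not the int objects it refers to
    print(sys.getsizeof(large_list))

    # Option 2: a generator (lazy evaluation)
    def generate_numbers_gen(n):
        for i in range(n):
            yield i * i

    # Low memory use: only one value is produced at a time
    large_gen = generate_numbers_gen(1000000)
    print(sys.getsizeof(large_gen))   # on the order of a few hundred bytes

    # Practical use
    total = sum(generate_numbers_gen(1000000))
    print(total)   # 0² + 1² + ... + 999999²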
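Lazy evaluation also makes infinite sequences practical, as mentioned in the introduction. A minimal sketch, assuming a hypothetical generator named naturals: the generator never terminates on its own, and itertools.islice takes only as many values as are needed.

```python
from itertools import islice

def naturals(start=0):
    """Hypothetical infinite generator: start, start + 1, start + 2, ..."""
    n = start
    while True:
        yield n
        n += 1

# Only the first five values are ever computed
first_squares = [n * n for n in islice(naturals(), 5)]
print(first_squares)
```

Calling list(naturals()) directly would never finish, so an infinite generator always needs a consumer that stops on its own.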

03 Generator Expressions
3.1 List Comprehensions vs. Generator Expressions

    # List comprehension (evaluated immediately)
    squares_list = [x * x for x in range(10)]
    print(type(squares_list))   # <class 'list'>
    print(squares_list)         # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

    # Generator expression (evaluated lazily)
    squares_gen = (x * x for x in range(10))
    print(type(squares_gen))    # <class 'generator'>
    print(squares_gen)          # <generator object <genexpr> at 0x...>

    # Take values one at a time
    print(next(squares_gen))    # 0
    print(next(squares_gen))    # 1
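One practical difference worth keeping in mind is that a generator expression can be consumed only once, whereas a list can be traversed repeatedly. A small sketch (the variable names are illustrative):

```python
squares_gen = (x * x for x in range(5))

first = list(squares_gen)    # consumes every value
second = list(squares_gen)   # the generator is already exhausted

print(first)
print(second)
```

If the values are needed more than once, either build a list or create a new generator expression for each pass.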
3.2 When to Use Generator Expressions

    # Good fits for a generator:
    # 1. The data is traversed only once
    # 2. Memory is tight
    # 3. The data is huge or infinite

    # Count the lines in a file (memory friendly)
    with open('large_file.txt') as f:
        line_count = sum(1 for _ in f)

    # Pipeline-style processing
    def read_lines(filename):
        with open(filename) as f:
            for line in f:
                yield line.strip()

    def filter_lines(lines, keyword):
        for line in lines:
            if keyword in line:
                yield line

    def count_words(lines):
        return sum(len(line.split()) for line in lines)

    # Chain the stages together
    lines = read_lines('file.txt')
    filtered = filter_lines(lines, 'Python')
    result = count_words(filtered)

3.3 Generator Expressions as Function Arguments

    # sum() with a generator expression
    total = sum(x * x for x in range(1000))

    # max() with a generator expression
    max_val = max(x for x in range(100) if x % 2 == 0)

    # any() with a generator expression
    has_even = any(x % 2 == 0 for x in [1, 3, 5, 7, 9])

    # all() with a generator expression
    all_positive = all(x > 0 for x in [1, 2, 3, 4, 5])

    # Nested generator expression
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    flat = (x for row in matrix for x in row)

04 yield from
4.1 Basic Usage of yield from

    # yield from was introduced in Python 3.3

    # Traditional approach: loop over each source and re-yield
    def gen1():
        for i in range(3):
            yield i

    def gen2():
        for i in range(3):
            yield i
        for j in range(3, 6):
            yield j

    # With yield from: more concise
    def gen3():
        yield from range(3)
        yield from range(3, 6)

    # Equivalent to
    def gen4():
        for i in range(3):
            yield i
        for i in range(3, 6):
            yield i
4.2 yield from and Deep Nesting

    def flatten(nested_list):
        """Flatten an arbitrarily nested list."""
        for item in nested_list:
            if isinstance(item, list):
                yield from flatten(item)
            else:
                yield item

    # Usage
    nested = [1, [2, 3], [4, [5, 6]], 7]
    print(list(flatten(nested)))   # [1, 2, 3, 4, 5, 6, 7]
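4.3 Passing Values with yield from

    # Note: this example calls send() on a plain generator. yield from
    # forwards send() calls and the return value to a subgenerator in
    # the same way.
    def echo():
        """Echo generator: receives values and sends results back."""
        received = yield
        print(f"Received: {received}")
        received = yield received * 2
        print(f"Received: {received}")
        return received * 3

    gen = echo()
    print(next(gen))        # prime the generator; prints None
    print(gen.send(10))     # prints "Received: 10", then 20
    try:
        gen.send(20)        # prints "Received: 20", then the generator returns
    except StopIteration as e:
        print(e.value)      # 60 (the return value travels on StopIteration)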
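To show the delegation itself, here is a minimal sketch with hypothetical names (averager, delegator) that are not part of the original article. yield from passes each send() through to the subgenerator and evaluates to the subgenerator's return value once it finishes.

```python
def averager():
    """Hypothetical subgenerator: receives numbers via send() until None."""
    total = 0
    count = 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    return total / count if count else 0.0


def delegator(results):
    # yield from forwards send() to averager() and captures its return value
    average = yield from averager()
    results.append(average)


results = []
gen = delegator(results)
next(gen)            # prime: runs to the first yield inside averager()
gen.send(10)
gen.send(20)
try:
    gen.send(None)   # averager() returns, so delegator() finishes as well
except StopIteration:
    pass
print(results)
```

Without yield from, the delegating generator would have to forward send(), throw(), and close() by hand and catch StopIteration itself to obtain the return value.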

05 Coroutine Basics
5.1 Coroutines and Generators
Generators can be used as coroutines to implement cooperative multitasking:

    def grep(pattern):
        """Coroutine that searches for a pattern."""
        print(f"Looking for {pattern}")
        try:
            while True:
                line = yield
                if pattern in line:
                    print(line)
        except GeneratorExit:
            print("Stopping coroutine")

    # Use the coroutine
    search = grep("Python")
    next(search)   # prime it: run to the first yield
    search.send("Hello world")
    search.send("Python is great")
    search.send("Java is good")
    search.send("Python rocks!")
    search.close()
    # Output:
    # Looking for Python
    # Python is great
    # Python rocks!
    # Stopping coroutine
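5.2 Pipeline-Style Processing

    def countdown(n):
        """Countdown stage (a generator)."""
        while n > 0:
            yield n
            n -= 1

    def square(numbers):
        """Squaring stage (a generator)."""
        for n in numbers:
            yield n * n

    def output(items):
        """Output stage: an ordinary function that consumes the pipeline."""
        for item in items:
            print(f"Output: {item}")

    # Build the pipeline; nothing runs yet because generators are lazy
    c = countdown(5)
    s = square(c)

    # Start the pipeline. output() contains no yield, so calling it pulls
    # every value through the stages. (It returns None, so calling
    # .send() on its result would raise AttributeError.)
    output(s)
    # Output: 25
    # Output: 16
    # Output: 9
    # Output: 4
    # Output: 1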
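The stages above pull values through ordinary iteration. A push-style variant, in which each stage is a true coroutine that receives values through send(), might look like the following sketch. The coroutine decorator and the stage names square_stage and collect are hypothetical helpers introduced only for this illustration.

```python
from functools import wraps

def coroutine(func):
    """Hypothetical helper: create the generator and advance it to its first yield."""
    @wraps(func)
    def primed(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return primed

@coroutine
def square_stage(target):
    """Receive a number and push its square to the next stage."""
    while True:
        n = yield
        target.send(n * n)

@coroutine
def collect(results):
    """Final stage: store everything that arrives."""
    while True:
        item = yield
        results.append(item)

results = []
pipeline = square_stage(collect(results))
for n in range(5, 0, -1):   # the producer pushes 5, 4, 3, 2, 1
    pipeline.send(n)
pipeline.close()
print(results)
```

In this design the producer drives the flow, whereas in the pull-style pipeline the final consumer does.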


06 Generator State
6.1 Inspecting Generator State

    import types

    def simple_gen():
        yield 1
        yield 2
        yield 3

    gen = simple_gen()

    # gi_code: the code object
    print(gen.gi_code.co_name)      # 'simple_gen'

    # gi_frame: the current frame
    print(gen.gi_frame.f_locals)    # {}

    # gi_running: whether the generator is currently executing
    print(gen.gi_running)           # False
    next(gen)
    print(gen.gi_running)           # False (suspended at a yield)

    print(isinstance(gen, types.GeneratorType))   # True
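Beyond the gi_* attributes, the standard library's inspect.getgeneratorstate() reports the lifecycle state as a string. A short sketch (the generator name two_values is illustrative):

```python
import inspect

def two_values():
    yield 1
    yield 2

gen = two_values()
states = [inspect.getgeneratorstate(gen)]       # before the first next()
next(gen)
states.append(inspect.getgeneratorstate(gen))   # paused at a yield
list(gen)                                       # run the generator to completion
states.append(inspect.getgeneratorstate(gen))
print(states)
```

The fourth possible state, GEN_RUNNING, is only observable from code executing inside the generator itself.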
6.2 Advanced Generator Control

    def process():
        result1 = yield "Ready for first value"
        result2 = yield f"Got {result1}, ready for second"
        return f"Done with {result1} and {result2}"

    gen = process()
    print(gen.send(None))      # 'Ready for first value'
    print(gen.send("apple"))   # 'Got apple, ready for second'
    try:
        gen.send("banana")
    except StopIteration as e:
        print(f"Final result: {e.value}")
        # Final result: Done with apple and banana
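Besides send(), generators also support throw(), which raises an exception at the point where the generator is suspended. The sketch below uses a hypothetical accumulator generator that treats ValueError as a request to reset its running total; this convention is an assumption made for the illustration, not an established idiom.

```python
def accumulator():
    """Hypothetical generator: keeps a running total, resets on ValueError."""
    total = 0
    while True:
        try:
            value = yield total
        except ValueError:
            total = 0            # reset requested through throw()
        else:
            total += value

acc = accumulator()
next(acc)                              # prime: yields the initial total
a = acc.send(5)
b = acc.send(7)
c = acc.throw(ValueError("reset"))     # handled inside; yields the reset total
d = acc.send(3)
acc.close()
print(a, b, c, d)
```

If the generator does not catch the exception, throw() propagates it back to the caller and the generator is finished.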

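07 Practical Examples
7.1 Reading a Large File Lazily

    def read_large_file(file_path, chunk_size=8192):
        """Lazily read a large file in fixed-size chunks."""
        with open(file_path, 'r') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    # Usage: process lines instead of chunks
    def process_file_line_by_line(file_path):
        leftover = ''
        for chunk in read_large_file(file_path):
            # A line may straddle two chunks, so keep the unfinished tail
            lines = (leftover + chunk).split('\n')
            leftover = lines.pop()
            for line in lines:
                yield line.strip()
        if leftover:
            yield leftover.strip()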
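7.2 Processing Data with a Generator Pipeline

    def extract_numbers(text):
        """Extract numbers from text.

        Simple heuristic: a '-' seen since the previous number makes
        the next number negative.
        """
        num = ''
        sign = 1
        for char in text:
            if char.isdigit() or (char == '.' and num and '.' not in num):
                num += char
            else:
                if num:
                    yield sign * float(num)
                    num = ''
                    sign = 1
                if char == '-':
                    sign = -1
        if num:
            # Emit a number that ends exactly at the end of the text
            yield sign * float(num)

    def filter_positive(numbers):
        """Keep only positive numbers."""
        for n in numbers:
            if n > 0:
                yield n

    def running_total(numbers):
        """Yield the running total."""
        total = 0
        for n in numbers:
            total += n
            yield total

    # Compose the pipeline
    text = "Prices: $12.50, $25.00, $8.75, -$5.00"
    pipeline = running_total(filter_positive(extract_numbers(text)))
    print(list(pipeline))   # [12.5, 37.5, 46.25]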
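7.3 Implementing an Iterator Adapter

    class IteratorAdapter:
        """Wrap an iterable as an iterator that tracks progress."""

        def __init__(self, iterable):
            self.iterable = iter(iterable)
            self.count = 0

        def __iter__(self):
            return self

        def __next__(self):
            value = next(self.iterable)   # propagates StopIteration at the end
            self.count += 1               # count only items actually produced
            return value

    # Usage
    adapter = IteratorAdapter(range(5))
    for item in adapter:
        print(f"Item {adapter.count}: {item}")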

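08 Summary
This article took a close look at Python's iterators and generators:
- Iterator: an object that implements the __iter__ and __next__ methods.
- Generator function: a function that uses the yield statement and returns a generator object.
Iterators and generators are Python's core tools for working with streams of data. Lazy evaluation gives them their memory efficiency, generator expressions provide a concise syntax, and generator-based coroutines support cooperative multitasking. Mastering these concepts is essential for writing high-quality Python code.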
