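Still using for loops on everything? Time to appreciate the romance of iterators
Hello, fellow intermediate Python folks.
Today's topic is a pair of concepts that plenty of people study for a long time without really getting: generators and iterators.
Do those two words leave you with a blank look? "I'm perfectly happy looping over lists with for. Why would I need to understand this stuff?"
Because reading Python source code, writing high-performance code, and understanding how coroutines and async work under the hood all depend on them. Today Hunzi Ge will walk you through this love-it-or-hate-it concept from start to finish.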
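First, iterators: what exactly are they?
An iterator is an object that remembers its position in a traversal, and it must implement two methods: __iter__ and __next__.
In plain language: an object that keeps handing you the next value, and raises StopIteration once it has nothing left.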
my_list = [1, 2, 3]
it = iter(my_list)  # ask the list for an iterator
print(next(it))  # 1
print(next(it))  # 2
print(next(it))  # 3
print(next(it))  # raises StopIteration: the iterator is exhausted
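To make the link with the for loop concrete, here is a rough sketch of what a for loop does with iter() and next() behind the scenes. The helper name `manual_for` is made up for this illustration; it is a simplified picture, not the interpreter's actual implementation.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    it = iter(iterable)          # step 1: get an iterator
    while True:
        try:
            item = next(it)      # step 2: ask for the next value
        except StopIteration:    # step 3: exhaustion ends the loop quietly
            break
        collected.append(item)
    return collected


print(manual_for([1, 2, 3]))  # [1, 2, 3]
```

This is why a for loop never shows you StopIteration: it catches the exception and treats it as the signal to stop.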
Rolling your own iterator
class CountDown:
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1
countdown = CountDown(5)
for num in countdown:
    print(num, end=" ")  # 5 4 3 2 1
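One consequence worth seeing in code: an iterator that returns itself from __iter__ can only be walked once. A common workaround, sketched below, is a separate iterable that hands out a fresh iterator on every loop. The class name `CountDownFrom` is hypothetical and not part of the original example.

```python
class CountDownFrom:
    """Reusable iterable: each __iter__ call starts a new countdown."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a brand-new iterator instead of self.
        return iter(range(self.start, 0, -1))


reusable = CountDownFrom(3)
first_pass = list(reusable)
second_pass = list(reusable)   # works again: a new iterator was created

one_shot = iter(reusable)      # a single iterator, by contrast...
drained = list(one_shot)
leftover = list(one_shot)      # ...has nothing left the second time

print(first_pass, second_pass, drained, leftover)
```

Lists, dicts, and range objects follow the same split: the container is the iterable, and iter() gives you a throwaway iterator over it.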
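Generators: syntactic sugar for iterators, a lazy programmer's essential
A generator is an iterator you get with far less code. You write an ordinary-looking function that uses yield; calling it returns a generator object, which is itself an iterator.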
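def simple_range(n):
    # Eager version: builds the whole list before returning.
    result = []
    for i in range(n):
        result.append(i)
    return result
def gen_range(n):
    # Lazy version: produces one value each time it is asked.
    for i in range(n):
        yield i
for i in gen_range(5):
    print(i, end=" ")  # 0 1 2 3 4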
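The "lazy" part is easier to believe once you watch it happen. The sketch below uses a made-up generator, `traced_gen`, that writes to a log so you can see that calling it runs nothing, and that each next() only advances to the following yield.

```python
def traced_gen(log):
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finished")


log = []
g = traced_gen(log)       # nothing runs yet: calling only creates the generator
created_log = list(log)   # snapshot of the log right after creation
first = next(g)           # runs the body up to the first yield
after_first = list(log)   # snapshot after one step
rest = list(g)            # runs the remainder of the body

print(created_log, first, after_first, rest, log)
```

Between steps the generator is simply paused, with its local variables kept intact until the next request.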
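So what's actually better about generators? Let the memory numbers talk
import sys
def simple_range(n):
    result = []
    for i in range(n):
        result.append(i)
    return result
def gen_range(n):
    for i in range(n):
        yield i
simple = simple_range(10_000_000)
gen = gen_range(10_000_000)
# sys.getsizeof counts only the list's own pointer array, not the int objects it refers to,
# so the list's real footprint is even larger than the number printed here.
print(f"List memory: {sys.getsizeof(simple) / 1024 / 1024:.2f} MB")
print(f"Generator memory: {sys.getsizeof(gen)} bytes")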
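Since sys.getsizeof only looks at the outer object, a second opinion can help. The sketch below uses the standard-library tracemalloc module to compare peak allocations while summing the same numbers eagerly and lazily. The helper `peak_bytes` and the size `N` are choices made for this illustration, and exact numbers will vary by Python version and platform.

```python
import tracemalloc


def peak_bytes(work):
    """Run work() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        work()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


N = 200_000  # kept small so the sketch runs quickly

# Eager: the full list exists in memory before sum() starts.
list_peak = peak_bytes(lambda: sum([i for i in range(N)]))
# Lazy: sum() pulls one value at a time from the generator expression.
gen_peak = peak_bytes(lambda: sum(i for i in range(N)))

print(f"list peak: {list_peak} bytes, generator peak: {gen_peak} bytes")
```

The gap grows with N for the list version, while the generator version should stay roughly flat.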
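Generator expressions: the lazy version of list comprehensions
squares = [x**2 for x in range(1000000)]      # builds a million-element list right away
squares_gen = (x**2 for x in range(1000000))  # computes nothing until asked
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4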
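In practice you rarely call next() on a generator expression by hand; you pass it straight to something that consumes it. A small sketch with made-up data, showing that too, plus the catch that consumed values are gone for good:

```python
numbers = range(1, 11)

# Parentheses can be dropped when the expression is the only argument.
total = sum(x * x for x in numbers)

# any() stops pulling values as soon as it sees a truthy one.
evens = (x for x in numbers if x % 2 == 0)
has_even = any(evens)        # consumes only up to the first even number, 2
remaining = list(evens)      # what is left after that partial consumption

print(total, has_even, remaining)
```

If you need to go over the values more than once, build a list instead, or recreate the generator expression.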
Chained generators: a great tool for processing big data
def read_large_file(filepath):
    # Yield one stripped line at a time instead of loading the whole file.
    with open(filepath, 'r') as f:
        for line in f:
            yield line.strip()
def filter_errors(lines):
    for line in lines:
        if 'ERROR' in line:
            yield line
def extract_timestamp(lines):
    for line in lines:
        parts = line.split()
        if parts:
            yield parts[0]
# Assumes big_log.txt exists and each line starts with a timestamp.
for timestamp in extract_timestamp(filter_errors(read_large_file('big_log.txt'))):
    print(timestamp)
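If you don't have a big log file handy, the same idea can be tried on an in-memory list. Everything below is illustrative: the sample lines are invented, and the function names (`read_lines`, `only_errors`, `first_field`) are stand-ins for the stages above. The `pulled` list is there to show that a chained pipeline reads only as much input as the consumer asks for.

```python
from itertools import islice

# Invented sample data, standing in for lines of a real log file.
SAMPLE_LOG = [
    "2024-01-01T10:00:00 INFO service started",
    "2024-01-01T10:00:05 ERROR database timeout",
    "",
    "2024-01-01T10:00:09 ERROR disk full",
    "2024-01-01T10:00:12 INFO retry succeeded",
    "2024-01-01T10:00:15 ERROR cache miss storm",
]

pulled = []  # records which lines the pipeline actually read


def read_lines(lines):
    for line in lines:
        pulled.append(line)
        yield line.strip()


def only_errors(lines):
    return (line for line in lines if "ERROR" in line)


def first_field(lines):
    return (line.split()[0] for line in lines if line.split())


pipeline = first_field(only_errors(read_lines(SAMPLE_LOG)))
first_two = list(islice(pipeline, 2))  # stop after two results

print(first_two)
print(len(pulled))  # 4 of the 6 lines were read
```

Each stage holds just one line at a time, so the pipeline's memory use does not depend on how long the input is.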
yield from: generators within generators
def flatten(nested):
    for item in nested:
        for sub_item in item:
            yield sub_item
def flatten_with_yield_from(nested):
    for item in nested:
        yield from item  # replaces the inner for loop above
matrix = [[1, 2], [3, 4], [5, 6]]
for num in flatten_with_yield_from(matrix):
    print(num, end=" ")  # 1 2 3 4 5 6
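Flattening one level is the warm-up. Where yield from really pays off is recursion, because a generator can delegate to another call of itself. A minimal sketch, assuming only lists and tuples count as nesting (the name `flatten_deep` is made up here):

```python
def flatten_deep(nested):
    """Yield leaf values from arbitrarily nested lists and tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten_deep(item)   # delegate to the recursive call
        else:
            yield item


print(list(flatten_deep([1, [2, [3, 4]], (5, [6]), 7])))  # [1, 2, 3, 4, 5, 6, 7]
```

Without yield from, each level of recursion would need its own explicit loop that re-yields every value.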
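Coroutines: the flashy trick generators can pull off
A generator-based coroutine is a kind of "fake multithreading": tasks take turns inside a single thread, each one pausing at yield so that another can run.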
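def consumer():
    print("Consumer waiting...")
    while True:
        item = yield  # pause here until the producer sends a value
        print(f"Consumed: {item}")
def producer():
    c = consumer()
    c.send(None)  # prime the coroutine: run it up to the first yield
    for item in range(5):
        print(f"Produced: {item}")
        c.send(item)
producer()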
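To see the "concurrency in a single thread" claim in action, here is a toy round-robin scheduler. It is a teaching sketch only, with hypothetical names (`task`, `run_round_robin`), and a real event loop does far more. Each task yields to give up its turn, and the scheduler keeps rotating through whatever is still running.

```python
from collections import deque


def task(name, steps, log):
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Advance each task one step at a time until all are finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue            # task finished, drop it
        queue.append(current)   # not finished, back of the line


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The two tasks interleave even though nothing runs in parallel: switching happens only at yield, which is why this style is called cooperative.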
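Under the hood of asyncio: it all grew out of generators
import asyncio
def blocking_io(url):
    # Placeholder for a blocking call; the original leaves blocking_io undefined.
    return url
async def fetch(url):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_io, url)
# Conceptually close to the old generator-based coroutine style
# (written with yield from before Python 3.5, removed in Python 3.11):
# def fetch(url):
#     return (yield from asyncio.get_event_loop().run_in_executor(None, blocking_io, url))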
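You can check the family resemblance yourself by driving a native coroutine by hand, the same way you would drive a generator. This sketch deliberately avoids the asyncio event loop; the `Pause` awaitable is a made-up class for illustration, and the string it yields would not be accepted by a real asyncio loop.

```python
class Pause:
    """Minimal awaitable: suspends once, then resumes with a value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        yield "paused"        # what the driver sees at the suspension point
        return self.value     # what the await expression evaluates to


async def fetch(url):
    body = await Pause(f"body of {url}")
    return body.upper()


coro = fetch("example")
signal = coro.send(None)      # run to the first suspension, like next() on a generator
try:
    coro.send(None)           # resume; the coroutine runs to its return
except StopIteration as stop:
    result = stop.value       # a return value travels inside StopIteration

print(signal, result)
```

An event loop is, very roughly, a scheduler that makes these send() calls for you and decides which coroutine to resume next.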
Advanced generator techniques
# 1. send(): two-way communication
def counter():
    count = 0
    while True:
        received = yield count
        if received is not None:
            count = received
        else:
            count += 1
c = counter()
print(next(c))     # 0
print(next(c))     # 1
print(c.send(10))  # 10
print(next(c))     # 11
# 2. throw(): inject an exception
def gen():
    try:
        yield 1
    except ValueError:
        yield "caught ValueError"
g = gen()
print(next(g))              # 1
print(g.throw(ValueError))  # caught ValueError
# 3. close(): stop gracefully
def gen():
    yield 1
    yield 2
g = gen()
print(next(g))  # 1
g.close()       # raises GeneratorExit at the paused yield; the generator is now finished
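What makes close() "graceful" is that the generator gets a chance to clean up. A short sketch, using a made-up generator `managed` that records events so the order is visible:

```python
def managed(events):
    events.append("open")
    try:
        yield 1
        yield 2
    finally:
        # Runs when the generator is closed early or finishes normally.
        events.append("cleanup")


events = []
g = managed(events)
first = next(g)
g.close()                 # GeneratorExit is raised at the paused yield

try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True      # a closed generator yields nothing more

print(first, events, exhausted)  # 1 ['open', 'cleanup'] True
```

This is the same mechanism that lets a with block inside a generator release its file or lock when the consumer stops early.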