
Say Goodbye to Manual Saving: Batch-Extract Images from Word Tables Automatically with Python

2026-06-24 19:02:20

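An earlier post, "An Office Power Tool! Handling Registration Forms with Photos in Python: 100 Documents in Just 1 Second (Code Included)", showed how to batch-insert photos into a specific position in a table. Later I ran into a problem that needed the reverse operation.

The problem is this: a folder contains a number of Word résumés. In each one, the name is in row 1, column 2 of a table, and the photo is in row 1, column 7. The requirement is to save each photo into a designated folder, named according to a fixed rule (here, the person's name).

The usual manual routine is: open the document, find the table, right-click the image, choose Save As, then rename the file. With a handful of photos that is easy enough, but with dozens or even hundreds it becomes very tedious.

Below is a Python solution. It not only processes files in batch, it also targets a specific cell, reads the name automatically, and uses that name for the saved image.

Here is the complete code: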

from pathlib import Path
from docx import Document
import re

word_path = Path(r"D:\简历")
image_out = word_path.joinpath("提取的图片")
image_out.mkdir(parents=True, exist_ok=True)
word_files = list(word_path.glob("*.docx"))

if not word_files:
    print("⚠️ No .docx files found; please check the path.")
else:
    print(f"🔍 Found {len(word_files)} file(s), starting...")
    for word_file in word_files:
        try:
            doc = Document(word_file)
            # Per-document counter so images from the same document do not overwrite each other
            file_img_count = 0
            for table in doc.tables:
                # Safety check: skip tables with no rows or fewer than 7 columns
                if len(table.rows) == 0 or len(table.columns) < 7:
                    continue
                try:
                    # Name: row 1, column 2 (indexes start at 0)
                    name = table.cell(0, 1).text.strip()
                    if not name:  # skip the table if the name is empty
                        continue
                    # Target cell: row 1, column 7
                    target_cell = table.cell(0, 6)
                    # Read the cell's XML
                    cell_xml = target_cell._element.xml
                    # Find every r:embed attribute (single or double quotes)
                    img_ids = re.findall(r'r:embed=["\']([^"\']+)["\']', cell_xml)
                    for rId in img_ids:
                        # Make sure the rId exists among the related parts
                        if rId in doc.part.related_parts:
                            img_part = doc.part.related_parts[rId]
                            # Only accept parts whose content type is an image
                            if "image" in img_part.content_type:
                                file_img_count += 1
                                save_path = image_out / f"{name}_{file_img_count}.png"
                                save_path.write_bytes(img_part.blob)
                                print(f"✅ [{word_file.name}] Extracted: {save_path.name}")
                except Exception as e:
                    # An error in one table should not stop the remaining tables
                    print(f"⚠️ A table in {word_file.name} could not be processed: {e}")
                    continue
        except Exception as e:
            print(f"❌ Could not open file {word_file.name}: {e}")

print("\n==== Extraction complete ====")

Run the code and the results appear right away.

Next, let's walk through how the code works:

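1. Set the paths and collect all .docx documents

word_path = Path(r"D:\简历")
image_out = word_path.joinpath("提取的图片")
image_out.mkdir(parents=True, exist_ok=True)
word_files = list(word_path.glob("*.docx"))

word_path.joinpath("提取的图片"): builds the path of the folder where the images will be saved.
image_out.mkdir(parents=True, exist_ok=True): creates that folder; exist_ok=True means no error is raised if it already exists.
list(word_path.glob("*.docx")): collects every .docx file directly inside the directory (subfolders are not searched) and converts the result to a list.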
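
One practical caveat with the glob step: while a document is open in Word, Word usually keeps a hidden lock file whose name starts with `~$` in the same folder, and `*.docx` matches that file as well. The following is a minimal sketch, assuming such files should simply be skipped. The helper name `list_docx_files` is hypothetical and is not part of the original script.

```python
from pathlib import Path


def list_docx_files(folder: Path) -> list:
    """Return the .docx files in `folder`, skipping Word's `~$` lock files."""
    return sorted(
        p for p in folder.glob("*.docx")
        if p.is_file() and not p.name.startswith("~$")
    )
```

In the script, `word_files = list_docx_files(word_path)` could stand in for the plain `glob` call; sorting is only there to make the processing order predictable.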

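2. Loop over every .docx document and every table in it

for word_file in word_files:
    doc = Document(word_file)
    # Per-document counter so images from the same document do not overwrite each other
    file_img_count = 0
    for table in doc.tables:
        # Safety check: skip tables with no rows or fewer than 7 columns
        if len(table.rows) == 0 or len(table.columns) < 7:
            continue

for word_file in word_files: iterates over each document that was found.
doc = Document(word_file): opens the Word document.
for table in doc.tables: iterates over all tables in the document.
if len(table.rows) == 0 or len(table.columns) < 7: filters out empty tables and tables with fewer than 7 columns, since the photo is expected in column 7.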

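3. Locate the key data

name = table.cell(0, 1).text.strip()
target_cell = table.cell(0, 6)

table.cell(0, 1).text.strip(): reads the name, which sits in row 1, column 2 of the table (indexes 0, 1), and strips surrounding whitespace.
table.cell(0, 6): the cell in row 1, column 7 (indexes 0, 6), where the image is located.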
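
Because the cell text ends up inside a file name, it may be worth cleaning it first: a cell can contain line breaks or characters that Windows does not allow in file names. The following is a minimal sketch under that assumption. The helper `safe_filename` and its `fallback` parameter are hypothetical and do not appear in the original script.

```python
import re


def safe_filename(text: str, fallback: str = "unnamed") -> str:
    """Turn cell text into a string that is safe to use in a file name."""
    # Replace characters Windows forbids in file names, plus control characters
    # (this includes line breaks and tabs inside the cell).
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", text)
    # Collapse remaining runs of whitespace, then trim spaces and dots at the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned or fallback
```

With this in place, `name = safe_filename(table.cell(0, 1).text)` would replace the plain `.strip()` call, and the existing `if not name` check could be kept or dropped depending on whether unnamed photos should still be saved.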

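4. Read the XML and extract IDs with a regular expression

cell_xml = target_cell._element.xml
img_ids = re.findall(r'r:embed=["\']([^"\']+)["\']', cell_xml)

target_cell._element.xml: reads the underlying XML of the target cell as a string. Note that _element is a private attribute of python-docx, so this relies on library internals.
re.findall(r'r:embed=["\']([^"\']+)["\']', cell_xml): searches the XML string for every r:embed attribute and returns the attribute values. Each value is the relationship ID (rId) of an embedded image.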
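
To make the regular expression concrete, here is a small standalone sketch that runs it against a hand-written XML fragment. The fragment is a simplified illustration, not output copied from a real document; real cell XML is much more verbose, and the ID value `rId5` is made up.

```python
import re

# Simplified, hand-written stand-in for a table cell's XML (illustration only).
sample_xml = (
    '<w:tc><w:p><w:r><w:drawing>'
    '<a:blip r:embed="rId5"/>'
    '</w:drawing></w:r></w:p></w:tc>'
)

# Same pattern as in the script: capture the value of every r:embed attribute,
# whether it is wrapped in single or double quotes.
EMBED_PATTERN = re.compile(r'r:embed=["\']([^"\']+)["\']')

img_ids = EMBED_PATTERN.findall(sample_xml)
print(img_ids)  # ['rId5']
```

The capture group returns only the ID itself, which is why the script can use each match directly as a key into `doc.part.related_parts`.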

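5. Extract and save the image

if rId in doc.part.related_parts:
    img_part = doc.part.related_parts[rId]
    # Only accept parts whose content type is an image
    if "image" in img_part.content_type:
        file_img_count += 1
        save_path = image_out / f"{name}_{file_img_count}.png"
        save_path.write_bytes(img_part.blob)

doc.part.related_parts[rId]: uses the rId to look up the corresponding image part. doc.part.related_parts is a dictionary of the parts related to the document, keyed by rId.
img_part.content_type: the MIME type of the part, for example image/jpeg.
image_out / f"{name}_{file_img_count}.png": builds the output file name. The suffix is always .png, even when the content type says the image is in another format such as JPEG.
save_path.write_bytes(img_part.blob): saves the image. img_part.blob is the image's raw binary data, written unchanged.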
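
Since the bytes are written unchanged, a JPEG photo saved with a `.png` suffix is still a JPEG. One possible refinement is to choose the suffix from the content type. This is a minimal sketch, not part of the original script: the lookup table `EXT_BY_CONTENT_TYPE` and the helper `extension_for` are hypothetical names, and the table only covers a few common types.

```python
# Hypothetical lookup table; extend it with any other types your documents use.
EXT_BY_CONTENT_TYPE = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/bmp": ".bmp",
    "image/tiff": ".tiff",
}


def extension_for(content_type: str, default: str = ".png") -> str:
    """Return a file suffix for an image content type, or `default` if unknown."""
    return EXT_BY_CONTENT_TYPE.get(content_type.lower(), default)
```

The file name could then be built as `f"{name}_{file_img_count}{extension_for(img_part.content_type)}"`. Unknown types fall back to `.png`, which matches what the original script does for every image.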

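This code is written for a fixed table layout. If your documents are structured differently or are more complex, you can adjust the loop logic further, for example by iterating over all of table.rows instead of reading one fixed cell.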
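
As a rough illustration of that adjustment, the sketch below scans every cell of a table and reports which ones contain embedded images. It assumes only that the table object exposes `.rows`, that each row exposes `.cells`, and that each cell exposes `._element.xml`, as python-docx tables do. The function name `find_image_cells` is hypothetical. Merged cells may be returned more than once when iterating, so the sketch handles each underlying element only once.

```python
import re

EMBED_PATTERN = re.compile(r'r:embed=["\']([^"\']+)["\']')


def find_image_cells(table):
    """Yield (row_index, col_index, rIds) for every cell that embeds images."""
    seen = []  # keep references so the identity checks below stay valid
    for r, row in enumerate(table.rows):
        for c, cell in enumerate(row.cells):
            element = cell._element
            # A merged cell can show up at several positions; report it once.
            if any(element is s for s in seen):
                continue
            seen.append(element)
            ids = EMBED_PATTERN.findall(element.xml)
            if ids:
                yield r, c, ids
```

Each yielded rId can be resolved through `doc.part.related_parts` exactly as in step 5; how to pick a name for images found this way depends on your layout and is left open here.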

The next time you run into a similar task, give this code a try; it can save you a good deal of repetitive work. Likes, shares, and bookmarks are welcome!
