Python办公自动化工程师必会的指令
掌握这些办公自动化指令,让你的工作效率提升10倍
欢迎大家关注此公众号,后台点击按钮【免费资料】可免费获取【Python入门30节课】电子书
此外小庄推荐一本适合于新手\小白入手一本 Python基础书籍,欢迎大家订阅,也感谢大家支持,我才有更新的动力
前言
Python办公自动化工程师需要掌握处理Word、Excel、PPT、PDF等办公文档的技能。本文将系统性地介绍办公自动化工程师必须掌握的指令和库,帮助你实现办公文档的批量处理和自动化。
一、环境准备与基础指令
1.1 安装必要的库
pip install python-docx
pip install openpyxl
pip install xlrd
pip install python-pptx
pip install PyPDF2
pip install pdfplumber
pip install reportlab
pip install python-docx-template
pip install jinja2
pip install schedule
二、Word文档处理
2.1 创建Word文档
from docx import Document
from docx.shared import Inches, Pt, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
# 创建文档
doc = Document()
# 添加标题
doc.add_heading('文档标题', 0)
doc.add_heading('一级标题', 1)
doc.add_heading('二级标题', 2)
# 添加段落
paragraph = doc.add_paragraph('这是一个段落。')
paragraph.add_run('这是追加的文字。').bold = True
# 设置段落格式
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER # 居中对齐
# 添加列表
doc.add_paragraph('列表项1', style='List Bullet')
doc.add_paragraph('列表项2', style='List Bullet')
doc.add_paragraph('列表项3', style='List Bullet')
# 添加表格
table = doc.add_table(rows=3, cols=3)
for i inrange(3):
for j inrange(3):
table.cell(i, j).text = f'单元格{i+1}-{j+1}'
# 添加图片
doc.add_picture('image.png', width=Inches(4))
# 保存文档
doc.save('output.docx')
2.2 读取Word文档
from docx import Document
# 读取文档
doc = Document('input.docx')
# 读取所有段落
for paragraph in doc.paragraphs:
print(f'文本: {paragraph.text}')
print(f'样式: {paragraph.style.name}')
# 读取所有表格
for table in doc.tables:
for row in table.rows:
for cell in row.cells:
print(cell.text, end='\t')
print()
# 读取所有图片
for rel in doc.part.rels.values():
if"image"in rel.reltype:
print(f'图片: {rel.target_ref}')
2.3 修改Word文档
from docx import Document
from docx.shared import Pt, RGBColor
# 读取文档
doc = Document('input.docx')
# 修改段落文本
for paragraph in doc.paragraphs:
if'旧文本'in paragraph.text:
for run in paragraph.runs:
run.text = run.text.replace('旧文本', '新文本')
# 修改字体样式
for paragraph in doc.paragraphs:
for run in paragraph.runs:
run.font.size = Pt(12)
run.font.color.rgb = RGBColor(0, 0, 0)
# 添加新内容
doc.add_paragraph('这是新添加的内容')
# 保存修改
doc.save('modified.docx')
2.4 使用docxtpl模板
from docxtpl import DocxTemplate
# 加载模板
tpl = DocxTemplate('template.docx')
# 准备数据
context = {
'name': '张三',
'date': '2024-01-01',
'items': [
{'item': '项目1', 'amount': 1000},
{'item': '项目2', 'amount': 2000},
{'item': '项目3', 'amount': 3000},
]
}
# 渲染模板
tpl.render(context)
# 保存文档
tpl.save('output.docx')
三、Excel文档处理
3.1 创建Excel文档
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
# 创建工作簿
wb = Workbook()
ws = wb.active
ws.title = '数据表'
# 写入数据
ws['A1'] = '姓名'
ws['B1'] = '年龄'
ws['C1'] = '城市'
ws['A2'] = '张三'
ws['B2'] = 25
ws['C2'] = '北京'
# 批量写入
data = [
['李四', 30, '上海'],
['王五', 35, '广州'],
['赵六', 28, '深圳']
]
for row in data:
ws.append(row)
# 设置样式
header_font = Font(name='微软雅黑', size=12, bold=True, color='FFFFFF')
header_fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
header_alignment = Alignment(horizontal='center', vertical='center')
for cell in ws[1]:
cell.font = header_font
cell.fill = header_fill
cell.alignment = header_alignment
# 设置列宽
ws.column_dimensions['A'].width = 15
ws.column_dimensions['B'].width = 10
ws.column_dimensions['C'].width = 15
# 保存
wb.save('output.xlsx')
3.2 读取Excel文档
from openpyxl import load_workbook
# 加载工作簿
wb = load_workbook('input.xlsx')
# 获取所有工作表名
print(wb.sheetnames)
# 获取活动工作表
ws = wb.active
# 读取单个单元格
print(ws['A1'].value)
# 读取指定单元格
print(ws.cell(row=1, column=1).value)
# 遍历所有数据
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, values_only=True):
print(row)
# 遍历指定范围
for row in ws.iter_rows(min_row=1, max_row=5, min_col=1, max_col=3):
for cell in row:
print(cell.value, end='\t')
print()
# 获取工作表尺寸
print(f'行数: {ws.max_row}')
print(f'列数: {ws.max_column}')
3.3 使用pandas处理Excel
import pandas as pd
# 读取Excel
df = pd.read_excel('input.xlsx', sheet_name='Sheet1')
print(df.head())
# 读取多个工作表
all_sheets = pd.read_excel('input.xlsx', sheet_name=None)
for sheet_name, sheet_data in all_sheets.items():
print(f'工作表: {sheet_name}')
print(sheet_data.head())
# 写入Excel
df.to_excel('output.xlsx', index=False, sheet_name='数据')
# 写入多个工作表
with pd.ExcelWriter('output.xlsx', engine='openpyxl') as writer:
df1.to_excel(writer, sheet_name='Sheet1', index=False)
df2.to_excel(writer, sheet_name='Sheet2', index=False)
# 追加到现有工作表
with pd.ExcelWriter('output.xlsx', engine='openpyxl', mode='a') as writer:
df.to_excel(writer, sheet_name='NewSheet', index=False)
3.4 Excel公式和图表
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
# 创建工作簿
wb = Workbook()
ws = wb.active
# 写入数据
data = [
['月份', '销售额', '成本'],
['1月', 1000, 600],
['2月', 1500, 900],
['3月', 2000, 1200],
['4月', 1800, 1000],
]
for row in data:
ws.append(row)
# 添加公式
ws['D1'] = '利润'
for row inrange(2, 6):
ws[f'D{row}'] = f'=B{row}-C{row}'
# 创建图表
chart = BarChart()
chart.title = '销售数据图表'
chart.x_axis.title = '月份'
chart.y_axis.title = '金额'
data_ref = Reference(ws, min_col=2, min_row=1, max_col=3, max_row=5)
cats_ref = Reference(ws, min_col=1, min_row=2, max_row=5)
chart.add_data(data_ref, titles_from_data=True)
chart.set_categories(cats_ref)
ws.add_chart(chart, 'F2')
wb.save('chart_example.xlsx')
四、PPT文档处理
4.1 创建PPT文档
from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.dml.color import RGBColor
from pptx.enum.text import PP_ALIGN
# 创建演示文稿
prs = Presentation()
# 添加标题幻灯片
slide_layout = prs.slide_layouts[0] # 标题幻灯片
slide = prs.slides.add_slide(slide_layout)
title = slide.shapes.title
subtitle = slide.placeholders[1]
title.text = '演示文稿标题'
subtitle.text = '副标题'
# 添加内容幻灯片
slide_layout = prs.slide_layouts[1] # 标题和内容
slide = prs.slides.add_slide(slide_layout)
title = slide.shapes.title
title.text = '内容标题'
# 添加文本框
left = Inches(1)
top = Inches(2)
width = Inches(8)
height = Inches(1)
textbox = slide.shapes.add_textbox(left, top, width, height)
tf = textbox.text_frame
tf.text = '这是文本框内容'
# 添加项目符号列表
content = slide.placeholders[1]
tf = content.text_frame
tf.text = '第一项'
p = tf.add_paragraph()
p.text = '第二项'
p.level = 1
p = tf.add_paragraph()
p.text = '第三项'
p.level = 2
# 添加图片
slide = prs.slides.add_slide(prs.slide_layouts[5]) # 空白幻灯片
slide.shapes.add_picture('image.png', Inches(1), Inches(1), width=Inches(5))
# 添加表格
slide = prs.slides.add_slide(prs.slide_layouts[5])
rows, cols = 3, 3
left = Inches(1)
top = Inches(2)
width = Inches(8)
height = Inches(2)
table = slide.shapes.add_table(rows, cols, left, top, width, height).table
for i inrange(rows):
for j inrange(cols):
table.cell(i, j).text = f'单元格{i+1}-{j+1}'
# 保存
prs.save('output.pptx')
4.2 读取PPT文档
from pptx import Presentation
# 加载演示文稿
prs = Presentation('input.pptx')
# 遍历所有幻灯片
for slide in prs.slides:
print(f'幻灯片编号: {slide.slide_id}')
# 遍历所有形状
for shape in slide.shapes:
ifhasattr(shape, 'text'):
print(f'文本: {shape.text}')
if shape.has_table:
table = shape.table
for row in table.rows:
for cell in row.cells:
print(cell.text, end='\t')
print()
五、PDF文档处理
5.1 读取PDF文档
import PyPDF2
# 读取PDF
withopen('input.pdf', 'rb') as f:
reader = PyPDF2.PdfReader(f)
# 获取页数
print(f'页数: {len(reader.pages)}')
# 读取第一页
page = reader.pages[0]
print(f'文本: {page.extract_text()}')
# 读取所有页面
for page in reader.pages:
print(page.extract_text())
5.2 使用pdfplumber读取PDF
import pdfplumber
# 读取PDF
with pdfplumber.open('input.pdf') as pdf:
# 获取页数
print(f'页数: {len(pdf.pages)}')
# 读取第一页
page = pdf.pages[0]
print(f'文本: {page.extract_text()}')
# 提取表格
tables = page.extract_tables()
for table in tables:
for row in table:
print(row)
# 读取所有页面
for page in pdf.pages:
text = page.extract_text()
if text:
print(text)
5.3 创建PDF文档
from reportlab.lib.pagesizes import letter, A4
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
# 注册中文字体
pdfmetrics.registerFont(TTFont('SimHei', 'SimHei.ttf'))
# 创建PDF
c = canvas.Canvas('output.pdf', pagesize=A4)
# 设置字体
c.setFont('SimHei', 24)
# 写入文本
c.drawString(100, 750, 'PDF文档标题')
# 设置小字体
c.setFont('SimHei', 12)
c.drawString(100, 700, '这是正文内容')
# 画线
c.line(100, 680, 500, 680)
# 画矩形
c.rect(100, 600, 200, 100)
# 添加图片
c.drawImage('image.png', 100, 400, width=4*inch, height=3*inch)
# 保存
c.save()
5.4 PDF合并与拆分
import PyPDF2
# 合并PDF
defmerge_pdfs(pdf_list, output):
"""合并多个PDF文件"""
merger = PdfMerger()
for pdf in pdf_list:
merger.append(pdf)
merger.write(output)
merger.close()
# 拆分PDF
defsplit_pdf(input_pdf, output_dir):
"""拆分PDF为单页文件"""
reader = PdfReader(input_pdf)
for i, page inenumerate(reader.pages):
writer = PdfWriter()
writer.add_page(page)
output_path = f'{output_dir}/page_{i+1}.pdf'
withopen(output_path, 'wb') as f:
writer.write(f)
# 使用示例
from PyPDF2 import PdfMerger, PdfReader, PdfWriter
# 合并
pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']
merge_pdfs(pdfs, 'merged.pdf')
# 拆分
split_pdf('input.pdf', './pages')
5.5 PDF添加水印
from PyPDF2 import PdfReader, PdfWriter
defadd_watermark(input_pdf, watermark_pdf, output_pdf):
"""添加水印到PDF"""
reader = PdfReader(input_pdf)
watermark_reader = PdfReader(watermark_pdf)
watermark_page = watermark_reader.pages[0]
writer = PdfWriter()
for page in reader.pages:
page.merge_page(watermark_page)
writer.add_page(page)
withopen(output_pdf, 'wb') as f:
writer.write(f)
# 使用示例
add_watermark('input.pdf', 'watermark.pdf', 'output.pdf')
六、邮件处理
6.1 发送邮件
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
defsend_email(sender, receiver, subject, body, attachments=None):
"""发送邮件"""
msg = MIMEMultipart()
msg['From'] = sender
msg['To'] = receiver
msg['Subject'] = subject
# 添加正文
msg.attach(MIMEText(body, 'html'))
# 添加附件
if attachments:
for file_path in attachments:
withopen(file_path, 'rb') as f:
part = MIMEBase('application', 'octet-stream')
part.set_payload(f.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', f'attachment; filename="{file_path}"')
msg.attach(part)
# 发送邮件
with smtplib.SMTP('smtp.example.com', 587) as server:
server.starttls()
server.login(sender, 'password')
server.send_message(msg)
# 使用示例
send_email(
sender='sender@example.com',
receiver='receiver@example.com',
subject='测试邮件',
body='<h1>这是一封测试邮件</h1><p>包含附件</p>',
attachments=['file1.pdf', 'file2.xlsx']
)
6.2 读取邮件
import imaplib
import email
from email.header import decode_header
defread_emails(host, username, password, folder='INBOX'):
"""读取邮件"""
# 连接邮箱
mail = imaplib.IMAP4_SSL(host)
mail.login(username, password)
# 选择文件夹
mail.select(folder)
# 搜索邮件
status, messages = mail.search(None, 'ALL')
# 获取邮件列表
email_ids = messages[0].split()
emails = []
for email_id in email_ids[-10:]: # 获取最近10封
status, msg_data = mail.fetch(email_id, '(RFC822)')
for response_part in msg_data:
ifisinstance(response_part, tuple):
msg = email.message_from_bytes(response_part[1])
# 获取主题
subject = decode_header(msg['Subject'])[0][0]
ifisinstance(subject, bytes):
subject = subject.decode()
# 获取发件人
from_addr = msg['From']
# 获取日期
date = msg['Date']
emails.append({
'subject': subject,
'from': from_addr,
'date': date
})
mail.close()
mail.logout()
return emails
# 使用示例
emails = read_emails('imap.example.com', 'user@example.com', 'password')
for email_info in emails:
print(f"主题: {email_info['subject']}")
print(f"发件人: {email_info['from']}")
print(f"日期: {email_info['date']}")
print('---')
七、文件批量处理
7.1 文件批量重命名
import os
import glob
defbatch_rename(directory, old_pattern, new_pattern):
"""批量重命名文件"""
files = glob.glob(os.path.join(directory, old_pattern))
for file_path in files:
dir_name = os.path.dirname(file_path)
file_name = os.path.basename(file_path)
new_name = file_name.replace(old_pattern.replace('*', ''),
new_pattern.replace('*', ''))
new_path = os.path.join(dir_name, new_name)
os.rename(file_path, new_path)
print(f'重命名: {file_name} -> {new_name}')
# 使用示例
batch_rename('./files', '*.txt', '*.md')
7.2 文件批量转换
import os
from docx import Document
from PyPDF2 import PdfReader
defdocx_to_txt(input_file, output_file):
"""Word转TXT"""
doc = Document(input_file)
withopen(output_file, 'w', encoding='utf-8') as f:
for paragraph in doc.paragraphs:
f.write(paragraph.text + '\n')
defpdf_to_txt(input_file, output_file):
"""PDF转TXT"""
reader = PdfReader(input_file)
withopen(output_file, 'w', encoding='utf-8') as f:
for page in reader.pages:
f.write(page.extract_text() + '\n')
# 批量转换
for file in os.listdir('./documents'):
if file.endswith('.docx'):
input_path = os.path.join('./documents', file)
output_path = os.path.join('./output', file.replace('.docx', '.txt'))
docx_to_txt(input_path, output_path)
7.3 文件批量压缩
import zipfile
import os
defcompress_files(file_list, output_zip):
"""压缩多个文件"""
with zipfile.ZipFile(output_zip, 'w', zipfile.ZIP_DEFLATED) as zipf:
for file_path in file_list:
if os.path.isfile(file_path):
zipf.write(file_path, os.path.basename(file_path))
elif os.path.isdir(file_path):
for root, dirs, files in os.walk(file_path):
for file in files:
file_path_full = os.path.join(root, file)
arcname = os.path.relpath(file_path_full, os.path.dirname(file_path))
zipf.write(file_path_full, arcname)
defextract_zip(zip_path, extract_dir):
"""解压文件"""
with zipfile.ZipFile(zip_path, 'r') as zipf:
zipf.extractall(extract_dir)
# 使用示例
files = ['file1.txt', 'file2.docx', 'images/']
compress_files(files, 'archive.zip')
extract_zip('archive.zip', './extracted')
八、定时任务自动化
8.1 定时执行任务
import schedule
import time
from datetime import datetime
defdaily_report():
"""每日报告任务"""
print(f'[{datetime.now()}] 生成每日报告...')
# 执行报告生成逻辑
defweekly_backup():
"""每周备份任务"""
print(f'[{datetime.now()}] 执行每周备份...')
# 执行备份逻辑
# 设置定时任务
schedule.every().day.at("09:00").do(daily_report)
schedule.every().monday.at("00:00").do(weekly_backup)
# 运行定时任务
whileTrue:
schedule.run_pending()
time.sleep(60)
8.2 文件监控自动化
import os
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
classFileHandler(FileSystemEventHandler):
defon_created(self, event):
ifnot event.is_directory:
print(f'新文件创建: {event.src_path}')
self.process_file(event.src_path)
defprocess_file(self, file_path):
"""处理新文件"""
if file_path.endswith('.xlsx'):
print(f'处理Excel文件: {file_path}')
# 执行Excel处理逻辑
elif file_path.endswith('.pdf'):
print(f'处理PDF文件: {file_path}')
# 执行PDF处理逻辑
# 设置监控
observer = Observer()
observer.schedule(FileHandler(), path='./watch_folder', recursive=False)
observer.start()
try:
whileTrue:
time.sleep(1)
except KeyboardInterrupt:
observer.stop()
observer.join()
九、数据报表自动化
9.1 自动生成Excel报表
import pandas as pd
from datetime import datetime, timedelta
defgenerate_sales_report(data_file, output_file):
"""生成销售报表"""
# 读取数据
df = pd.read_excel(data_file)
# 数据处理
df['日期'] = pd.to_datetime(df['日期'])
# 按月汇总
monthly_summary = df.groupby(df['日期'].dt.month).agg({
'销售额': 'sum',
'订单数': 'count',
'客户数': 'nunique'
}).reset_index()
# 写入Excel
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
# 原始数据
df.to_excel(writer, sheet_name='原始数据', index=False)
# 月度汇总
monthly_summary.to_excel(writer, sheet_name='月度汇总', index=False)
# 图表
workbook = writer.book
worksheet = writer.sheets['月度汇总']
# 创建图表
chart = workbook.add_chart({'type': 'column'})
chart.add_series({
'name': '销售额',
'categories': ['月度汇总', 1, 0, len(monthly_summary), 0],
'values': ['月度汇总', 1, 1, len(monthly_summary), 1],
})
worksheet.insert_chart('E2', chart)
# 使用示例
generate_sales_report('sales_data.xlsx', 'sales_report.xlsx')
9.2 自动生成Word报告
from docx import Document
from docx.shared import Inches
import pandas as pd
defgenerate_word_report(data_file, template_file, output_file):
"""生成Word报告"""
# 读取数据
df = pd.read_excel(data_file)
# 创建文档
doc = Document()
# 添加标题
doc.add_heading('数据分析报告', 0)
# 添加摘要
doc.add_heading('摘要', 1)
summary = f"本报告基于{len(df)}条数据记录进行分析。"
doc.add_paragraph(summary)
# 添加数据分析
doc.add_heading('数据分析', 1)
# 添加表格
table = doc.add_table(rows=1, cols=len(df.columns))
table.style = 'Table Grid'
# 写入表头
for i, column inenumerate(df.columns):
table.rows[0].cells[i].text = column
# 写入数据
for index, row in df.head(10).iterrows():
cells = table.add_row().cells
for i, value inenumerate(row):
cells[i].text = str(value)
# 添加结论
doc.add_heading('结论', 1)
doc.add_paragraph('基于以上数据分析,得出以下结论...')
# 保存
doc.save(output_file)
# 使用示例
generate_word_report('data.xlsx', 'template.docx', 'report.docx')
总结
作为Python办公自动化工程师,掌握这些指令是核心技能:
- 2. Excel处理 - openpyxl/pandas库
- 4. PDF处理 - PyPDF2/pdfplumber库
- 5. 邮件处理 - smtplib/imaplib库
掌握这些工具,你就能大幅提升办公效率。
关注我,获取更多Python技术干货!