
Python from Zero in 100 Days: Day 26, Regular Expression Basics

2026-07-01 19:41:04

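🐍 Regular Expression Basics: the Swiss Army knife of text processing

🕐 Estimated time: 2-3 hours | 🎯 Goal: learn the re module, metacharacters, and match/search/findall

📖 Today's contents

What is a regular expression?
Getting started with the re module
Basic metacharacters
Character classes []
Quantifiers
Position anchors
Common functions: match/search/findall
Hands-on exercises
Summary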

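1. What is a regular expression?

A regular expression (regex for short) is a pattern-matching language: you describe a text pattern with special symbols, then use that pattern to find, extract, or validate exactly the pieces you want in a large body of text.

# Without regex: you would have to inspect the text character by character
text = "My number is 13800138000 and his is 13912345678"
# Extracting the phone numbers by hand means writing your own scanner, which is tedious

# With regex: one line does it
import re
phones = re.findall(r"1[3-9]\d{9}", text)
print(phones)  # ['13800138000', '13912345678']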
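For contrast, here is a rough sketch of what the "by hand" approach might look like. The helper name `find_phones_manually` is hypothetical and only covers this one pattern; it is meant to show how much bookkeeping the one-line regex replaces, not to be a recommended implementation.

```python
import re

def find_phones_manually(text):
    """Hypothetical helper: scan for 11 ASCII digits starting with 1, second digit 3-9."""
    phones = []
    i = 0
    while i < len(text):
        chunk = text[i:i + 11]
        looks_like_phone = (
            len(chunk) == 11
            and chunk.isascii()
            and chunk.isdigit()
            and chunk[0] == "1"
            and chunk[1] in "3456789"
        )
        if looks_like_phone:
            phones.append(chunk)
            i += 11  # skip past the number we just consumed
        else:
            i += 1   # no match here, try the next position
    return phones

text = "My number is 13800138000 and his is 13912345678"
manual = find_phones_manually(text)
with_regex = re.findall(r"1[3-9]\d{9}", text)

print(manual)                # ['13800138000', '13912345678']
print(manual == with_regex)  # True
```

Every new pattern would need its own hand-written scanner like this, while with regex only the pattern string changes.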

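Scenario	Example
Validate a mainland China mobile number	`1[3-9]\d{9}`
Validate an email address (simplified)	`\w+@\w+\.\w+`
Extract dates	`\d{4}-\d{2}-\d{2}`
Replace sensitive words	`re.sub("badword", "***", text)`
Split a string	`re.split(r"[,;\s]", text)`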

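2. Getting started with the re module

import re

# The simplest regex: ordinary characters match themselves
result = re.search(r"hello", "say hello to python")
print(result)         # <re.Match object; span=(4, 9), match='hello'>
print(result.group()) # hello
print(result.start()) # 4 (index where the match starts)
print(result.end())   # 9 (index just past the end of the match)

💡 What is the r prefix?
r"hello" is a raw string: Python leaves the backslashes in it untouched instead of interpreting them as escape sequences.
Regexes use \ heavily. Without r, "\b" turns into a backspace character before re ever sees it, and "\d" only works by accident (recent Python versions warn about it), so you would have to write "\\b" and "\\d" instead. Ugly.
Always write regex patterns with the r prefix!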
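A small sketch of the difference, using the word-boundary escape `\b` as the example. The variable names `plain` and `raw` are only for illustration.

```python
import re

plain = "\bcat\b"   # here \b is the single backspace character
raw = r"\bcat\b"    # here \b is backslash + b, which re reads as a word boundary

print(len(plain), len(raw))            # 5 7

plain_hits = re.findall(plain, "a cat sat")
raw_hits = re.findall(raw, "a cat sat")
print(plain_hits)  # []
print(raw_hits)    # ['cat']
```

The two patterns look identical in the source, but only the raw string reaches the regex engine with its backslashes intact.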

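3. Basic metacharacters

Metacharacter	Meaning	Example	Matches
`.`	Any single character (except newline)	`a.c`	"abc", "a1c", "a@c"
`\d`	A digit; [0-9] for ASCII text	`a\db`	"a1b", "a9b"
`\D`	A non-digit, [^0-9]	`a\Db`	"a_b", "a b"
`\w`	A letter, digit, or underscore (for str patterns this includes non-ASCII letters such as Chinese characters)	`\w+`	"hello_123"
`\W`	Anything that is not a letter, digit, or underscore	`\W+`	"@#$"
`\s`	A whitespace character	`a\sb`	"a b", "a\tb"
`\S`	A non-whitespace character	`a\Sb`	"a1b", "a_b"
`\`	Escapes the special character that follows	`a\.b`	"a.b" (a literal dot, not "any character")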

import re

# . matches any single character
print(re.findall(r"a.c", "abc a1c a@c a\nc"))  # ['abc', 'a1c', 'a@c'] (a\nc is skipped: . does not match a newline)

# \d matches digits
print(re.findall(r"\d+", "Today is 2024-1-15"))  # ['2024', '1', '15']

# \w matches letters, digits, and underscores
print(re.findall(r"\w+", "hello_world 123 @#$"))  # ['hello_world', '123']

# \s matches whitespace
print(re.findall(r"\s+", "a  b\tc\nd"))  # ['  ', '\t', '\n']

# Escaping a special character
print(re.findall(r"a\.b", "a.b acb a1b"))  # ['a.b'] (only the literal dot matches)
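One detail worth seeing in action: for ordinary `str` patterns in Python 3, `\w` and `\d` are Unicode-aware, so they match more than the ASCII ranges. The sketch below compares the default behavior with the `re.ASCII` flag and with an explicit `[0-9]` class; the sample string is made up for the demonstration.

```python
import re

# "\uff14\uff12" are the full-width digits ４２
text = "邮箱abc_1 \uff14\uff12"

unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
unicode_digits = re.findall(r"\d+", text)
ascii_digits = re.findall(r"[0-9]+", text)

print(unicode_words)   # ['邮箱abc_1', '４２']
print(ascii_words)     # ['abc_1']
print(unicode_digits)  # ['1', '４２']
print(ascii_digits)    # ['1']
```

This matters when a pattern such as `\w+@\w+\.\w+` runs on Chinese text: any Chinese characters written directly before the address become part of the match unless you restrict the class or pass `re.ASCII`.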

4. Character classes []

Square brackets define a set of allowed characters; the class matches any one character from that set.

# [abc]: matches a, b, or c
print(re.findall(r"[abc]", "apple banana cherry"))  # ['a', 'b', 'a', 'a', 'a', 'c']

# [a-z]: matches any lowercase letter from a to z
print(re.findall(r"[a-z]+", "Hello World 123"))  # ['ello', 'orld']

# [A-Za-z]: matches any ASCII letter
print(re.findall(r"[A-Za-z]+", "Hello World 123"))  # ['Hello', 'World']

# [0-9]: equivalent to \d for ASCII digits
print(re.findall(r"[0-9]+", "abc123def456"))  # ['123', '456']

# [^...]: negation (matches anything NOT in the set)
print(re.findall(r"[^0-9]+", "abc123def456"))  # ['abc', 'def']

# Shorthand equivalents (exact when matching ASCII only, e.g. with re.ASCII)
# [0-9]  → \d
# [a-zA-Z0-9_] → \w
# [ \t\n\r\f\v] → \s

# Practical examples

# Extract Chinese characters
text = "Hello 你好 World 世界 123"
chinese = re.findall(r"[\u4e00-\u9fa5]+", text)
print(chinese)  # ['你好', '世界']

# Match mobile numbers (start with 1, second digit 3-9, then nine more digits)
print(re.findall(r"1[3-9]\d{9}", "13800138000 12345 15912345678"))
# ['13800138000', '15912345678']

# Match hexadecimal color values
text = "Background is #FF5733, foreground is #00AACC"
colors = re.findall(r"#[0-9A-Fa-f]{6}", text)
print(colors)  # ['#FF5733', '#00AACC']
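A short sketch of how metacharacters behave inside square brackets: most of them lose their special meaning there, shorthand classes such as `\d` still work, and a hyphen placed at the end of the class is taken literally. The sample strings are invented for illustration.

```python
import re

# Inside [], the characters . + * are plain literals
operators = re.findall(r"[.+*]", "1.5 + 2*3")
print(operators)  # ['.', '+', '*']

# Shorthand classes can be combined with literals: digits or a dot
decimals = re.findall(r"[\d.]+", "v1.25 and 3.0")
print(decimals)   # ['1.25', '3.0']

# A hyphen at the end of the class is a literal hyphen, not a range
hyphenated = re.findall(r"[a-z-]+", "well-known co-op")
print(hyphenated)  # ['well-known', 'co-op']
```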

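5. Quantifiers

A quantifier controls how many times the preceding element may repeat.

Quantifier	Meaning	Example	Matches
`*`	0 or more times	`ab*c`	"ac", "abc", "abbc"
`+`	1 or more times	`ab+c`	"abc", "abbc" (not "ac")
`?`	0 or 1 time	`ab?c`	"ac", "abc" (not "abbc")
`{n}`	Exactly n times	`a{3}`	"aaa"
`{n,}`	At least n times	`a{2,}`	"aa", "aaa", "aaaa"...
`{n,m}`	Between n and m times	`a{2,4}`	"aa", "aaa", "aaaa"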

import re

# * zero or more times
print(re.findall(r"ab*c", "ac abc abbc abbbc"))  # ['ac', 'abc', 'abbc', 'abbbc']

# + one or more times
print(re.findall(r"ab+c", "ac abc abbc abbbc"))  # ['abc', 'abbc', 'abbbc']

# ? zero or one time
print(re.findall(r"colou?r", "color colour"))  # ['color', 'colour']

# {n} exactly n times
print(re.findall(r"\d{3}", "12 345 6789 12345"))  # ['345', '678', '123']

# {n,m} between n and m times
print(re.findall(r"\d{2,4}", "1 12 123 1234 12345"))
# ['12', '123', '1234', '1234'] → 12345 yields only 1234; the leftover 5 (like the lone 1) is a single digit, too short for {2,4}

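⚡ Greedy vs. non-greedy

# Quantifiers are greedy by default (they match as much as possible)
text = "<div>hello</div><div>world</div>"

# Greedy: match as much as possible
greedy = re.findall(r"<div>.*</div>", text)
print(greedy)  # ['<div>hello</div><div>world</div>'] (one match covering everything)

# Non-greedy: add ? to match as little as possible
lazy = re.findall(r"<div>.*?</div>", text)
print(lazy)    # ['<div>hello</div>', '<div>world</div>'] (two separate matches)

💡 Greedy vs. non-greedy:
Greedy: .*, .+, .{n,m} match as much as possible
Non-greedy: .*?, .+?, .{n,m}? add a ? after the quantifier to match as little as possible
When pulling content out of HTML tags, you almost always want the non-greedy form!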
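The same effect shows up with quoted strings. The sketch below compares the greedy and non-greedy forms and adds a third option, a negated character class, which is a different technique that often gives the same result as the non-greedy version. The sample text is made up.

```python
import re

text = 'name="Ann" city="Oslo"'

greedy = re.search(r'".*"', text).group()
lazy = re.search(r'".*?"', text).group()
negated = re.findall(r'"[^"]*"', text)

print(greedy)   # "Ann" city="Oslo"
print(lazy)     # "Ann"
print(negated)  # ['"Ann"', '"Oslo"']
```

The greedy version runs from the first quote to the last one in the string, while the other two stop at the nearest closing quote.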

6. Position anchors

# ^ start of the string
print(re.findall(r"^hello", "hello world"))   # ['hello']
print(re.findall(r"^world", "hello world"))   # [] (not at the start)

# $ end of the string
print(re.findall(r"world$", "hello world"))   # ['world']
print(re.findall(r"hello$", "hello world"))   # [] (not at the end)

# \b word boundary
print(re.findall(r"\bcat\b", "cat category scatter"))
# ['cat'] (only the standalone cat, not category or scatter)

print(re.findall(r"\bpython\b", "python3 python snake"))
# ['python'] (only the standalone python; in python3 the 3 is a word character, so there is no boundary)

# Practical use: check that the whole string matches
# Anchor both ends (^ at the start, $ at the end)
print(bool(re.match(r"^\d{6}$", "123456")))   # True (6 digits)
print(bool(re.match(r"^\d{6}$", "12345")))    # False (5 digits)
print(bool(re.match(r"^\d{6}$", "1234567")))  # False (7 digits)
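Two behaviors of the anchors are easy to miss, so here is a small sketch of them. First, `$` also matches just before a trailing newline, which is why `re.fullmatch` can be the stricter choice for validation. Second, `^` and `$` only work line by line when the `re.MULTILINE` flag is passed. The variable names are only illustrative.

```python
import re

code = "123456\n"  # note the trailing newline

anchored = bool(re.match(r"^\d{6}$", code))
strict = bool(re.fullmatch(r"\d{6}", code))
strict_clean = bool(re.fullmatch(r"\d{6}", "123456"))

print(anchored)      # True: $ also matches right before a final newline
print(strict)        # False: fullmatch requires the entire string to match
print(strict_clean)  # True

lines = "alpha\nbeta\ngamma"
first_only = re.findall(r"^\w+", lines)
every_line = re.findall(r"^\w+", lines, flags=re.MULTILINE)

print(first_only)  # ['alpha']
print(every_line)  # ['alpha', 'beta', 'gamma']
```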

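7. Common functions

🔍 re.match: match at the start of the string

# match only tries to match at the very beginning of the string
result = re.match(r"\d+", "123abc")
print(result.group())  # 123

result = re.match(r"\d+", "abc123")
print(result)  # None (the string does not start with a digit)

🔍 re.search: find the first match

# search scans the whole string and returns the first match
result = re.search(r"\d+", "abc123def456")
print(result.group())  # 123 (the first match)

# Returns None when nothing is found
result = re.search(r"\d+", "no numbers here")
print(result)  # None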
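Because both functions return None on failure, calling `.group()` directly on the result raises an AttributeError when there is no match. A minimal sketch of a guard, using a hypothetical helper named `first_number`:

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    m = re.search(r"\d+", text)
    return m.group() if m else None

found = first_number("abc123def456")
missing = first_number("no numbers here")
at_start = re.match(r"\d+", "abc123")

print(found)     # 123
print(missing)   # None
print(at_start)  # None, because match only looks at the beginning
```

Checking the Match object before using it keeps a "not found" case from turning into an exception.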

🔍 re.findall: find all matches

# findall returns a list of all matches
result = re.findall(r"\d+", "abc123def456ghi789")
print(result)  # ['123', '456', '789']

# Extract all email addresses
text = "Contact zhang@test.com or li@example.org"
emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails)  # ['zhang@test.com', 'li@example.org']
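One thing to watch for: as soon as the pattern contains capturing parentheses, `findall` returns the captured groups rather than the whole match. A brief sketch with the same sample text (groups themselves are a topic for later, so this is only a heads-up):

```python
import re

text = "Contact zhang@test.com or li@example.org"

whole = re.findall(r"\w+@\w+\.\w+", text)
one_group = re.findall(r"\w+@(\w+\.\w+)", text)
two_groups = re.findall(r"(\w+)@(\w+\.\w+)", text)

print(whole)       # ['zhang@test.com', 'li@example.org']
print(one_group)   # ['test.com', 'example.org']
print(two_groups)  # [('zhang', 'test.com'), ('li', 'example.org')]
```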

🔍 re.finditer: return an iterator

# finditer returns an iterator of Match objects (more memory-friendly on large texts)
for match in re.finditer(r"\d+", "abc123def456"):
    print(f"Found '{match.group()}' at position {match.start()}-{match.end()}")
# Found '123' at position 3-6
# Found '456' at position 9-12

🔄 re.sub: replace

# sub replaces whatever the pattern matches
result = re.sub(r"\d+", "***", "Phone 13800138000, zip 100000")
print(result)  # Phone ***, zip ***

# Replace using a function: it receives each Match and returns the replacement string
def double_number(match):
    return str(int(match.group()) * 2)

result = re.sub(r"\d+", double_number, "a1 b2 c3")
print(result)  # a2 b4 c6
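Two related options, sketched briefly: the `count` argument limits how many replacements are made, and `re.subn` returns the new string together with the number of replacements.

```python
import re

first_only = re.sub(r"\d+", "#", "a1 b2 c3", count=1)
new_text, replaced = re.subn(r"\d+", "#", "a1 b2 c3")

print(first_only)          # a# b2 c3
print(new_text, replaced)  # a# b# c# 3
```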

✂️ re.split: split

# Split on several different separators at once
result = re.split(r"[,;\s]+", "apple, banana; cherry  date")
print(result)  # ['apple', 'banana', 'cherry', 'date']
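A short sketch of two edge cases: separators at the very start or end of the string produce empty strings in the result, and `maxsplit` caps the number of splits. The sample strings are invented.

```python
import re

parts = re.split(r"[,;\s]+", " apple, banana ")
cleaned = [p for p in parts if p]  # drop the empty strings
limited = re.split(r"[,;\s]+", "apple, banana; cherry", maxsplit=1)

print(parts)    # ['', 'apple', 'banana', '']
print(cleaned)  # ['apple', 'banana']
print(limited)  # ['apple', 'banana; cherry']
```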

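📋 re function cheat sheet

Function	Purpose	Returns
`re.match(p, s)`	Match at the start	Match or None
`re.search(p, s)`	Find the first match	Match or None
`re.findall(p, s)`	Find all matches	List of strings (tuples if the pattern has more than one group)
`re.finditer(p, s)`	Find all matches (iterator)	Iterator of Match objects
`re.sub(p, r, s)`	Replace	New string
`re.split(p, s)`	Split	List of strings
`re.compile(p)`	Compile a pattern	Pattern object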
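`re.compile` is the only entry in the table without an example so far, so here is a minimal sketch. The constant name `PHONE` and the sample messages are made up; the Pattern object offers the same methods (`findall`, `search`, `sub`, and so on) without repeating the pattern string.

```python
import re

# Compile once, reuse many times
PHONE = re.compile(r"1[3-9]\d{9}")

messages = ["call 13800138000", "no number", "13912345678 or 15912345678"]
found = [PHONE.findall(m) for m in messages]

print(found)          # [['13800138000'], [], ['13912345678', '15912345678']]
print(PHONE.pattern)  # 1[3-9]\d{9}
```

The re module also caches recently used patterns internally, so the main benefit in small scripts is usually readability: the pattern gets a name and lives in one place.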

8. Hands-on exercises

🎯 Exercise 1: phone / email / ID card validators

import re

def validate_phone(phone):
    """Validate a mainland China mobile number (11 digits, starts with 1, second digit 3-9)."""
    pattern = r"^1[3-9]\d{9}$"
    return bool(re.match(pattern, phone))

def validate_email(email):
    """Validate an email address (a practical approximation, not the full standard)."""
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return bool(re.match(pattern, email))

def validate_id_card(id_card):
    """Validate the shape of an 18-character ID card number (the check character is not verified)."""
    # 6-digit region + birth date YYYYMMDD + 3-digit sequence + check character (digit or X)
    pattern = r"^[1-9]\d{5}(19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])\d{3}[\dXx]$"
    return bool(re.match(pattern, id_card))

# Tests
tests = [
    ("13800138000", "phone", validate_phone),
    ("12345678901", "phone", validate_phone),
    ("zhang@test.com", "email", validate_email),
    ("invalid@", "email", validate_email),
    ("110101199003076531", "ID card", validate_id_card),
]

for value, label, func in tests:
    status = "✅" if func(value) else "❌"
    print(f"  {status} {label}: {value}")
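A regex checks the shape of the input, not whether it makes sense. The ID card pattern above, for example, accepts a birth date of 30 February. One possible follow-up, sketched here under the assumption that positions 7 to 14 hold the date as YYYYMMDD, is to hand the date part to `datetime.strptime`. The names `ID_SHAPE` and `has_valid_birthdate` are hypothetical, and the check character is still not verified.

```python
import re
from datetime import datetime

# Hypothetical pattern: the group captures the 8-digit birth date
ID_SHAPE = re.compile(r"[1-9]\d{5}(\d{8})\d{3}[\dXx]")

def has_valid_birthdate(id_card):
    """Return True if the string has the ID shape and a real calendar date inside."""
    m = ID_SHAPE.fullmatch(id_card)
    if m is None:
        return False
    try:
        datetime.strptime(m.group(1), "%Y%m%d")
    except ValueError:
        return False  # e.g. month 13 or 30 February
    return True

real_date = has_valid_birthdate("110101199003076531")
fake_date = has_valid_birthdate("110101199002306531")

print(real_date)  # True
print(fake_date)  # False: 30 February does not exist
```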

🎯 Exercise 2: text information extractor

import re

def extract_info(text):
    """Extract several kinds of information from a piece of text."""
    info = {}

    # Mobile numbers
    info["phones"] = re.findall(r"1[3-9]\d{9}", text)

    # Email addresses (ASCII classes instead of \w, which would also swallow
    # Chinese characters written directly before the address)
    info["emails"] = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", text)

    # Dates
    info["dates"] = re.findall(r"\d{4}[-/]\d{1,2}[-/]\d{1,2}", text)

    # URLs
    info["urls"] = re.findall(r"https?://\S+", text)

    # IP addresses
    info["ips"] = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)

    # Chinese characters
    info["chinese"] = re.findall(r"[\u4e00-\u9fa5]+", text)

    # Numbers
    info["numbers"] = re.findall(r"\b\d+\.?\d*\b", text)

    return info

# Test (the sample stays in Chinese so the Chinese-character extraction has something to find)
text = """
联系张三：手机13800138000，邮箱zhangsan@company.com
访问 https://www.example.com 获取更多信息
服务器地址：192.168.1.100
日期：2024-01-15，价格 99.5 元
"""

info = extract_info(text)
for key, values in info.items():
    if values:
        print(f"  {key}: {values}")
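The IP pattern in this exercise only checks for four dot-separated groups of one to three digits, so it also accepts strings like 999.300.1.1. A possible refinement, sketched here with the standard-library `ipaddress` module, is to let the regex find candidates and then filter them. The helper name `extract_valid_ipv4` and the sample text are made up for the illustration.

```python
import re
import ipaddress

def extract_valid_ipv4(text):
    """Hypothetical helper: regex finds candidates, ipaddress rejects impossible ones."""
    candidates = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)
    valid = []
    for candidate in candidates:
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # an octet is out of range, so skip it
        valid.append(candidate)
    return valid

addresses = extract_valid_ipv4("ok 192.168.1.100, bad 999.300.1.1")
print(addresses)  # ['192.168.1.100']
```

Splitting the work this way keeps the regex short: it handles the shape, and ordinary code handles the range check.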

🎯 Exercise 3: log parser

import re
from collections import Counter

def parse_log(log_text):
    """Parse log text and compute some statistics."""
    # Log line format: [time] [level] message
    pattern = r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.+)"

    records = []
    for match in re.finditer(pattern, log_text):
        records.append({
            "time": match.group(1),
            "level": match.group(2),
            "message": match.group(3),
        })

    # Count records per level
    level_count = Counter(r["level"] for r in records)

    # Extract all IP addresses
    ips = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", log_text)

    # Collect all error messages
    errors = [r["message"] for r in records if r["level"] == "ERROR"]

    return {
        "records": records,
        "level_count": level_count,
        "ips": list(set(ips)),  # deduplicated; order is not guaranteed
        "errors": errors,
    }

log_text = """
[2024-01-15 08:30:15] [INFO] User login 192.168.1.100
[2024-01-15 08:31:22] [INFO] Page view /home
[2024-01-15 08:35:45] [ERROR] Database connection failed 10.0.0.50
[2024-01-15 09:00:12] [WARNING] Disk usage 85%
[2024-01-15 09:15:33] [ERROR] API timeout /api/users
[2024-01-15 10:00:00] [INFO] Scheduled job finished
"""

result = parse_log(log_text)
print("📊 Log analysis results:")
print(f"  Total records: {len(result['records'])}")
print(f"  Level counts: {dict(result['level_count'])}")
print(f"  IP addresses: {result['ips']}")
print(f"  Error messages: {result['errors']}")

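9. Summary

Topic	Key points
Metacharacters	`.` any character, `\d` digit, `\w` word character, `\s` whitespace
Character classes	`[abc]` one of, `[a-z]` range, `[^abc]` negation
Quantifiers	`*` 0+, `+` 1+, `?` 0/1, `{n,m}` range
Greedy/non-greedy	`.*` greedy vs `.*?` non-greedy
Anchors	`^` start, `$` end, `\b` word boundary
match	Match at the start
search	Find the first match
findall	Find all matches (list)
sub	Replace
split	Split

🧠 Mnemonic:
Backslash d is a digit, w a word character, s whitespace.
Uppercase negates: D non-digit, W non-word, S non-space.
Star is zero or more, plus is one or more.
Question mark is zero or one; put it after a quantifier to make it non-greedy.
match looks at the start, search scans the whole text.
findall grabs everything, sub replaces.

🔮 Coming up: Day 27, Advanced Regex: groups, a deeper look at greedy vs. non-greedy, compiled patterns, and real-world applications (email/phone validation). The advanced side of regex!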
