Python Learning Diary | Regular Expressions (re) / 2

2026-07-04 05:34:18

VI. Grouping and Capturing

1. Basic Grouping

(1) () - Capturing group

import re

match = re.search(r'(\d{4})-(\d{2})-(\d{2})', '2024-01-15')
if match:
    print(match.group(0)) # '2024-01-15' (the whole match)
    print(match.group(1)) # '2024' (first group)
    print(match.group(2)) # '01'
    print(match.group(3)) # '15'
    print(match.groups()) # ('2024', '01', '15')
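Captured groups are always strings, and each group also has its own position in the input. The following is a minimal sketch of both points; the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample sentence; the date sits in the middle of the text.
text = 'Released on 2024-01-15.'
match = re.search(r'(\d{4})-(\d{2})-(\d{2})', text)

if match:
    # groups() returns strings; convert them when numbers are needed.
    year, month, day = (int(part) for part in match.groups())
    print(year, month, day)   # 2024 1 15

    # start(), end() and span() accept a group number as well.
    print(match.span(1))      # (12, 16)
    print(text[match.start(3):match.end(3)])  # 15
```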

(2) Named groups (?P<name>...)

match = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', '2024-01')
print(match.group('year')) # '2024'
print(match.group('month')) # '01'
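Named groups can also be read all at once as a dictionary, and they remain reachable by number. A short sketch using the same pattern:

```python
import re

match = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', '2024-01')

if match:
    # groupdict() maps every group name to its captured text.
    print(match.groupdict())  # {'year': '2024', 'month': '01'}

    # Match objects support indexing by name or number (Python 3.6+).
    print(match['year'])      # '2024'

    # A named group still gets a number, counted from the left.
    print(match.group(1))     # '2024'
```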

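2. Non-capturing Groups

(1) (?:...) - Non-capturing group

# Use this when you need grouping but do not need to capture the text
text = 'color colour'
match = re.search(r'col(?:ou)?r', text)
print(match.group()) # 'color' (search() returns the first match in the string)
print(match.groups()) # () empty tuple, because the pattern has no capturing groups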

3. Backreferences

(1) \1, \2 - Refer back to earlier groups

# Match repeated words
text = 'the the cat cat dog'
print(re.findall(r'\b(\w+)\s+\1\b', text)) # ['the', 'cat']

# Check that the opening and closing tags match
html = '<h1>Title</h1>'
match = re.search(r'<(\w+)>.*?</\1>', html)
print(match.group(1)) # 'h1'
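Backreferences also work with named groups, and they can be used in the replacement string of sub(). The sketch below reuses the repeated-word example; the group name `word` is chosen only for illustration.

```python
import re

text = 'the the cat cat dog'

# (?P=word) refers back to the named group 'word' inside the pattern.
duplicates = re.findall(r'\b(?P<word>\w+)\s+(?P=word)\b', text)
print(duplicates)  # ['the', 'cat']

# In the replacement string, \1 inserts the text of group 1,
# so each doubled word collapses to a single copy.
collapsed = re.sub(r'\b(\w+)\s+\1\b', r'\1', text)
print(collapsed)   # 'the cat dog'
```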

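VII. Common Functions

(1) match() - Matches only at the start of the string; returns None if the pattern does not match there

print(re.match(r'\d+', '123abc')) # match object for '123'
print(re.match(r'\d+', 'abc123')) # None

(2) search() - Scans the whole string and returns the first match

print(re.search(r'\d+', 'abc123')) # match object for '123'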
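A related function not covered above is fullmatch(), which succeeds only if the entire string matches. The sketch below puts the three side by side on the same inputs:

```python
import re

# match() anchors at the start only; trailing text is allowed.
at_start = re.match(r'\d+', '123abc')
print(at_start.group())   # '123'

# search() finds the first match anywhere in the string.
anywhere = re.search(r'\d+', 'abc123')
print(anywhere.group())   # '123'

# fullmatch() requires the pattern to cover the whole string.
partial = re.fullmatch(r'\d+', '123abc')
whole = re.fullmatch(r'\d+', '123')
print(partial)            # None
print(whole.group())      # '123'
```

fullmatch() is convenient for validating input, since it avoids adding ^ and $ by hand.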

(3) findall() - Returns a list of all non-overlapping matches in the string

emails = 'test@example.com, user@test.org, invalid@'
print(re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', emails)) # ['test@example.com', 'user@test.org']
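When the pattern contains more than one capturing group, findall() returns a list of tuples, one tuple per match. A sketch that splits each address into user and domain, using the same simplified email pattern (it is not a full email validator):

```python
import re

emails = 'test@example.com, user@test.org, invalid@'

# Two capturing groups: the part before '@' and the part after it.
pairs = re.findall(r'\b([\w.-]+)@([\w.-]+\.\w+)\b', emails)
print(pairs)  # [('test', 'example.com'), ('user', 'test.org')]

for user, domain in pairs:
    print(f'{user} at {domain}')
```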

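(4) finditer() - Returns an iterator that yields one match object per match

text = 'Hello 123 World 456'
for match in re.finditer(r'\d+', text):
    print(f'Found {match.group()} at position {match.span()}')
# Found 123 at position (6, 9)
# Found 456 at position (16, 19)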

(5) sub() - Replaces the parts of the string that match the pattern

text = 'Hello 123, World 456'
result = re.sub(r'\d+', 'NUM', text)
print(result) # 'Hello NUM, World NUM'
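The replacement passed to sub() can also be a function that receives each match object, and the `count` argument limits how many replacements are made. A minimal sketch; the helper name `double` is made up for illustration.

```python
import re

text = 'Hello 123, World 456'

def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group()) * 2)

# A callable replacement is invoked once per match.
doubled = re.sub(r'\d+', double, text)
print(doubled)     # 'Hello 246, World 912'

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', 'NUM', text, count=1)
print(first_only)  # 'Hello NUM, World 456'
```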

(6) subn() - Returns a tuple (new string, number of replacements)

text = 'Hello 123, World 456'
result, count = re.subn(r'\d+', 'NUM', text)
print(f'Result: {result}, replacements: {count}') # Result: Hello NUM, World NUM, replacements: 2

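(7) split() - Splits the string by the pattern

text = 'apple, banana; orange grape|pear'
result = re.split(r'[,;|\s]+', text)
print(result) # ['apple', 'banana', 'orange', 'grape', 'pear']

# Keep the separators (by using a capturing group)
text = 'a1b2c3'
result = re.split(r'(\d+)', text)
print(result) # ['a', '1', 'b', '2', 'c', '3', ''] (trailing '' because the string ends with a separator)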
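Two details of split() are worth a quick sketch: `maxsplit` caps the number of splits, and a separator at the very start or end of the string produces an empty string in the result.

```python
import re

text = 'apple, banana; orange grape|pear'

# maxsplit=2 performs at most two splits; the rest stays in one piece.
limited = re.split(r'[,;|\s]+', text, maxsplit=2)
print(limited)  # ['apple', 'banana', 'orange grape|pear']

# Separators at both ends yield empty strings at both ends.
edges = re.split(r'(\d+)', '1a2')
print(edges)    # ['', '1', 'a', '2', '']

# Filtering out empty strings is a common follow-up step.
print([part for part in edges if part])  # ['1', 'a', '2']
```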

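VIII. Compiling Regular Expressions

# For a pattern that is used many times, compile it once and reuse the pattern object.
# (The re module also caches recently used patterns, so the speed gain is often modest.)
pattern = re.compile(r'\b\d{3}-\d{4}\b')

# Use the compiled pattern
text = 'My phone: 123-4567, office: 987-6543'
print(pattern.findall(text)) # ['123-4567', '987-6543']
print(pattern.search(text).group()) # '123-4567'

# Set flags at compile time
pattern = re.compile(r'hello', re.IGNORECASE)
print(pattern.findall('HELLO hello HeLLo')) # ['HELLO', 'hello', 'HeLLo']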
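Another flag that pairs well with compile() is re.VERBOSE, which lets a pattern span several lines with whitespace and comments ignored. A sketch based on the phone-number pattern above; the group names `prefix` and `line` are made up for illustration.

```python
import re

# In VERBOSE mode, whitespace and '#' comments inside the pattern are ignored.
phone = re.compile(r'''
    \b
    (?P<prefix>\d{3})   # three-digit prefix
    -
    (?P<line>\d{4})     # four-digit line number
    \b
''', re.VERBOSE)

text = 'My phone: 123-4567, office: 987-6543'

# Compiled patterns offer the same methods as the module-level functions.
parsed = [m.groupdict() for m in phone.finditer(text)]
print(parsed)
# [{'prefix': '123', 'line': '4567'}, {'prefix': '987', 'line': '6543'}]
```

Several flags can be combined with `|`, for example `re.VERBOSE | re.IGNORECASE`.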

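The above is compiled from my in-class and after-class notes. It is for personal study only and is not official learning material; corrections of any errors or omissions are welcome.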

清妍小筑
