当前位置：首页>python>Python 3 入门与进阶(十二):正则表达式与JSON

Python 3 入门与进阶(十二):正则表达式与JSON

2026-02-06 22:45:14

大家好，我是煜道。

今天我们一起来学习 正则表达式与JSON。

引言

正则表达式（Regular Expression）是一种强大的文本模式匹配工具，用于检索、替换或验证符合特定模式的字符串。JSON（JavaScript Object Notation）是一种轻量级的数据交换格式，因其简洁性和跨语言兼容性，已成为Web API数据传输的事实标准。

本文将系统介绍Python正则表达式的语法和使用方法，包括元字符、字符集、量词、边界匹配、分组以及常见操作函数。同时，本文也将详细讲解JSON数据格式的特点、Python中的JSON序列化和反序列化操作，以及它们在实际开发中的应用。

01 正则表达式基础

1.1 正则表达式简介

正则表达式是一个特殊的字符序列，用于描述字符串的匹配模式：

import retext = "Python is powerful. Python is popular. Python is fun."# 简单匹配pattern = "Python"result = re.findall(pattern, text)print(result)  # ['Python', 'Python', 'Python']

1.2 元字符

元字符是正则表达式中具有特殊意义的字符：

元字符	含义
.	匹配任意单个字符（除换行符）
^	匹配字符串开头
$	匹配字符串结尾
*	匹配前一个字符0次或多次
+	匹配前一个字符1次或多次
?	匹配前一个字符0次或1次
\	转义字符
\|	或运算符
[]	字符集
()	分组

import re# . 匹配任意字符print(re.findall("P.thon", "Python, Pithon, Pethon"))  # ['Python', 'Pithon', 'Pethon']# ^ 匹配开头print(re.findall("^Python", "Python is great"))  # ['Python']print(re.findall("^Python", "I love Python"))    # []# $ 匹配结尾print(re.findall("Python$", "I love Python"))    # ['Python']print(re.findall("Python$", "Python is great"))  # []# * + ?print(re.findall("Py*hon", "Pyhon Pyyhon Pyyyhon"))  # ['Pyhon', 'Pyyhon', 'Pyyyon']print(re.findall("Py+hon", "Pyhon Pyyhon"))          # ['Pyhon', 'Pyyhon']print(re.findall("Py?hon", "Pyhon Pyyhon"))          # ['Pyhon']

1.3 字符集

使用方括号[]定义字符集：

import re# [abc] 匹配任意一个a、b或cprint(re.findall("[abc]", "Pythona"))  # ['a']# [^abc] 匹配非a、b、c的字符print(re.findall("[^abc]", "Python"))  # ['P', 'y', 't', 'h', 'o', 'n']# [a-z] 范围匹配print(re.findall("[a-z]", "Python3"))  # ['y', 't', 'h', 'o', 'n']# [0-9] 数字范围print(re.findall("[0-9]+", "abc123def456"))  # ['123', '456']

1.4 概括字符集

代码	含义
\d	匹配数字（等价于[0-9]）
\D	匹配非数字
\w	匹配单词字符（字母、数字、下划线）
\W	匹配非单词字符
\s	匹配空白字符（空格、制表符、换行符）
\S	匹配非空白字符

import retext = "Python3.12 is released on 2023-10-02.\nEmail: example@test.com"# 匹配所有数字print(re.findall(r"\d+", text))  # ['3', '12', '2023', '10', '02']# 匹配所有单词字符print(re.findall(r"\w+",                 text))  # ['Python3', '12', 'is', 'released', 'on', '2023', '10', '02', 'Email', 'example', 'test', 'com']# 匹配空白print(re.findall(r"\s", text))  # [' ', ' ', ' ', ' ', '\n', ' ']

1.5 数量词

数量词用于指定匹配的次数：

量词	含义
{n}	恰好n次
{n,}	至少n次
{n,m}	n到m次

import re# {n} 恰好n次print(re.findall(r"\d{3}", "123 4567 89"))  # ['123', '456']# {n,} 至少n次print(re.findall(r"\d{2,}", "1 12 123 1234"))  # ['12', '123', '1234']# {n,m} n到m次print(re.findall(r"\d{2,4}", "1 12 123 12345 123456"))# ['12', '123', '1234', '1234', '56']

02 贪婪与非贪婪

2.1 贪婪匹配

正则表达式默认使用贪婪匹配，匹配尽可能多的字符：

import retext = "<div>content</div>"# 贪婪匹配print(re.findall(r"<div>.*</div>", text))  # ['<div>content</div>']

2.2 非贪婪匹配

使用?使量词变为非贪婪（最小匹配）：

import retext = "<div>content</div><div>more</div>"# 贪婪匹配print(re.findall(r"<div>.*</div>", text))# ['<div>content</div><div>more</div>']# 非贪婪匹配print(re.findall(r"<div>.*?</div>", text))# ['<div>content</div>', '<div>more</div>']

03 边界匹配

3.1 单词边界

import retext = "The cat in the hat sat on the mat."# \b 单词边界print(re.findall(r"\b\w{3}\b", text))# ['The', 'cat', 'the', 'hat', 'sat', 'the', 'mat']# \B 非单词边界print(re.findall(r"\B\w{3}\B", text))# []

3.2 行边界

import retext = """Line 1Line 2Line 3"""# ^ 每行开头lines = re.findall(r"^Line \d+", text, re.MULTILINE)print(lines)  # ['Line 1', 'Line 2', 'Line 3']# $ 每行结尾ends = re.findall(r"Line \d+$", text, re.MULTILINE)print(ends)  # ['Line 1', 'Line 2', 'Line 3']

04 分组与捕获

4.1 分组

使用圆括号()创建分组：

import retext = "The price is $100, and the discount is $25."# 基本分组print(re.findall(r"\$\d+", text))  # ['$100', '$25']# 捕获分组matches = re.findall(r"(\$)(\d+)", text)print(matches)  # [('$', '100'), ('$', '25')]# 命名分组pattern = r"(?P<currency>\$)(?P<amount>\d+)"matches = re.findall(pattern, text)print(matches)  # [('$', '100'), ('$', '25')]

4.2 后向引用

import retext = "hello hello world world"# 引用前面的分组print(re.sub(r"(\w+) \1", r"\1", text))  # 'hello world'

05 re模块常用函数

5.1 match

从字符串开头匹配：

import repattern = r"\d+"text = "123abc456"match = re.match(pattern, text)print(match.group()) if match else print("No match")  # '123'# match从开头开始no_match = re.match(r"[a-z]+", text)print(no_match) if no_match else print("No match")  # No match

5.2 search

搜索整个字符串，找到第一个匹配：

import repattern = r"\d+"text = "abc123def456"match = re.search(pattern, text)print(match.group()) if match else print("No match")  # '123'

5.3 findall

返回所有匹配的列表：

import repattern = r"\d+"text = "abc123def456ghi789"matches = re.findall(pattern, text)print(matches)  # ['123', '456', '789']

5.4 finditer

返回所有匹配的迭代器：

import repattern = r"\d+"text = "abc123def456"for match in re.finditer(pattern, text):    print(f"Match: {match.group()}, Position: {match.span()}")# Match: 123, Position: (3, 6)# Match: 456, Position: (9, 12)

5.5 sub/subn

替换匹配：

import retext = "The price is $100, and $200."# 替换print(re.sub(r"\$\d+", "XXX", text))  # 'The price is XXX, and XXX.'# 替换并统计次数result, count = re.subn(r"\$\d+", "XXX", text)print(result, count)  # 'The price is XXX, and XXX.' 2# 使用函数替换defreplace_price(match):    price_str = match.group()      amount_str = price_str[1:]    amount = int(amount_str)returnf"${amount * 2}"print(re.sub(r"\$\d+", replace_price, text))# 'The price is $200, and $400.'

5.6 split

使用正则表达式分割：

import retext = "apple, banana; orange. grape"# 多种分隔符parts = re.split(r"[,;.\s]+", text)print(parts)  # ['apple', 'banana', 'orange', 'grape']

06 匹配模式

import retext = "Hello\nWorld"# re.I - 忽略大小写print(re.findall(r"hello", "Hello HELLO hello", re.I))  # ['Hello', 'HELLO', 'hello']# re.S - .匹配包括换行符print(re.findall(r".+", "Line1\nLine2", re.S))# ['Line1\nLine2']# re.M - ^和$匹配每行print(re.findall(r"^\w+", "Line1\nLine2", re.M))# ['Line1', 'Line2']# 组合使用text = """PythonjavascriptJAVA"""matches = re.findall(r"python", text, re.I | re.M)print(matches)  # ['Python']

07 JSON数据格式

7.1 JSON简介

JSON（JavaScript Object Notation）是一种轻量级的数据交换格式：

{"name": "Python","version": 3.12,"is_popular": true,"features": ["easy", "powerful", "versatile"],"author": {"first_name": "Guido","last_name": "van Rossum"  }}

JSON数据类型与Python对应：

JSON类型	Python类型
object	dict
array	list
string	str
number (int)	int
number (real)	float
true/false	True/False
null	None

7.2 JSON序列化（ dumps ）

import json# 基本类型data = {"name": "Python","version": 3.12,"is_popular": True,"creator": None}# 序列化json_str = json.dumps(data, indent=2, ensure_ascii=False)print(json_str)

7.3 JSON反序列化（ loads ）

import jsonjson_str = '{"name": "Python", "version": 3.12}'# 反序列化data = json.loads(json_str)print(data)  # {'name': 'Python', 'version': 3.12}print(data["version"])  # 3.12

7.4 JSON与文件

import json# 写入JSON文件data = {"name": "Python", "version": 3.12}with open("config.json", "w", encoding="utf-8") as f:    json.dump(data, f, indent=2, ensure_ascii=False)# 读取JSON文件with open("config.json", "r", encoding="utf-8") as f:    data = json.load(f)    print(data)

7.5 JSON序列化进阶

import jsonfrom datetime import datetime, date# 自定义日期序列化classDateEncoder(json.JSONEncoder):defdefault(self, obj):if isinstance(obj, (datetime, date)):return obj.isoformat()return super().default(obj)data = {"name": "event","date": datetime.now()}print(json.dumps(data, cls=DateEncoder))# {"name": "event", "date": "2024-01-15T10:30:00.000000"}# 自定义反序列化defdate_decoder(dct):if"date"in dct and isinstance(dct["date"], str):        dct["date"] = datetime.fromisoformat(dct["date"])return dctjson_str = '{"name": "event", "date": "2024-01-15T10:30:00"}'data = json.loads(json_str, object_hook=date_decoder)

08 实战示例

8.1 邮箱验证

import redefvalidate_email(email):    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"return re.match(pattern, email) isnotNone# 测试emails = ["test@example.com", "invalid.email", "@example.com", "test@.com"]for email in emails:    print(f"{email}: {validate_email(email)}")

8.2 日志解析

import refrom datetime import datetimeLOG_PATTERN = r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<message>.*)$"defparse_log_line(line):    match = re.match(LOG_PATTERN, line)if match:return {"timestamp": datetime.strptime(match.group("timestamp"), "%Y-%m-%d %H:%M:%S"),"level": match.group("level"),"message": match.group("message")        }returnNone

8.3 配置文件读取

import jsonimport reclassConfig:def__init__(self, file_path):        self.config = self._load_config(file_path)def_load_config(self, file_path):with open(file_path, 'r') as f:            content = f.read()# 移除注释        content = re.sub(r'#.*$', '', content, flags=re.MULTILINE)return json.loads(content)defget(self, key, default=None):return self.config.get(key, default)# 使用config = Config("app.json")print(config.get("database.host"))

09 小结

本文介绍了Python中的正则表达式和JSON处理：

正则表达式基础：元字符、字符集、概括字符集。
数量词与贪婪：匹配次数的控制。
边界匹配：单词边界和行边界。
分组与捕获：分组、后向引用。
re模块函数：match、search、findall、sub、split。
JSON格式：数据类型对应、序列化与反序列化。
实战应用：邮箱验证、日志解析、配置读取。

正则表达式和JSON是Python开发中最常用的工具之一。掌握正则表达式能够高效处理文本匹配和替换任务，而JSON则是Web开发和数据交换的标准格式。

本文来自网友投稿或网络内容，如有侵犯您的权益请联系我们删除，联系邮箱：wyl860211@qq.com 。

Python 3 入门与进阶(十二):正则表达式与JSON

引言

01 正则表达式基础

1.1 正则表达式简介

1.2 元字符

1.3 字符集

1.4 概括字符集

1.5 数量词

02 贪婪与非贪婪

2.1 贪婪匹配

2.2 非贪婪匹配

03 边界匹配

3.1 单词边界

3.2 行边界

04 分组与捕获

4.1 分组

4.2 后向引用

05 re模块常用函数

5.1 match

5.2 search

5.3 findall

5.4 finditer

5.5 sub/subn

5.6 split

06 匹配模式

07 JSON数据格式

7.1 JSON简介

7.2 JSON序列化（ dumps ）

7.3 JSON反序列化（ loads ）

7.4 JSON与文件

7.5 JSON序列化进阶

08 实战示例

8.1 邮箱验证

8.2 日志解析

8.3 配置文件读取

09 小结

最新文章

热门文章

随机文章

Python 3 入门与进阶(十二):正则表达式与JSON

引言

01 正则表达式基础

1.1 正则表达式简介

1.2 元字符

1.3 字符集

1.4 概括字符集

1.5 数量词

02 贪婪与非贪婪

2.1 贪婪匹配

2.2 非贪婪匹配

03 边界匹配

3.1 单词边界

3.2 行边界

04 分组与捕获

4.1 分组

4.2 后向引用

05 re模块常用函数

5.1 match

5.2 search

5.3 findall

5.4 finditer

5.5 sub/subn

5.6 split

06 匹配模式

07 JSON数据格式

7.1 JSON简介

7.2 JSON序列化（ dumps ）

7.3 JSON反序列化（ loads ）

7.4 JSON与文件

7.5 JSON序列化进阶

08 实战示例

8.1 邮箱验证

8.2 日志解析

8.3 配置文件读取

09 小结

Python 数据分析利器:Pandas常用方法

用 Python 做数据清洗,看这篇就会

最新文章

热门文章

随机文章