Part 12 | Python's Most Useful Built-in Libraries: A Practical Guide to os, datetime, and re

2026-07-04 09:39:31

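This article picks five of the most practical parts of the Python standard library: os/pathlib, datetime, re, collections, and random. Each comes with real-world scenarios and ready-to-use code examples, and none of them requires installing a third-party package.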

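Preface

Front-end developers who pick up Python often assume that every feature requires a third-party package.

In practice, the Python standard library is extensive, and many everyday tasks need nothing more than a single import. That is where Python's well-known motto comes from: "batteries included."

This article covers five of the most useful built-in libraries, all usable without installing anything:

os / pathlib: file system operations
datetime: dates and times
re: regular expressions
collections: specialized data structures
random: random numbers and sampling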

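1. os + pathlib: File System Operations

pathlib was covered in detail in Part 07. This section adds the most common os operations and shows how the two modules work together.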

import os
from pathlib import Path

# Get the current working directory
print(os.getcwd())         # e.g. /home/user/projects/my_app
print(Path.cwd())          # same location, pathlib style

# List directory contents
files = os.listdir(".")           # list of names as strings
files = list(Path(".").iterdir()) # list of Path objects (preferred)

# Check whether a path exists
print(os.path.exists("config.json"))  # True/False
print(Path("config.json").exists())   # same check

# Create directories
os.makedirs("output/2025/reports", exist_ok=True)  # creates parent directories as needed
Path("output/2025/reports").mkdir(parents=True, exist_ok=True)  # same

# Rename or move a file
os.rename("old_name.txt", "new_name.txt")

# Delete a file
os.remove("temp.txt")      # raises FileNotFoundError if the file does not exist

# Get file metadata
stat = os.stat("config.json")
print(stat.st_size)         # file size in bytes
print(stat.st_mtime)        # last modification time (Unix timestamp)

Practical scenario: finding all files of a given type

from pathlib import Path

def find_files(directory: str, extension: str) -> list[Path]:
    """Recursively find every file under directory with the given extension."""
    return list(Path(directory).rglob(f"*.{extension}"))

# Find all Python files
py_files = find_files(".", "py")
for f in py_files:
    print(f)

# Find all JSON files
json_files = find_files("data", "json")
print(f"Found {len(json_files)} JSON files")
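The recursive search above combines naturally with the file metadata from os.stat. The following is a minimal sketch, not part of the original article: the helper name largest_files and the demo file names are made up for illustration, and the demo runs in a temporary directory so it does not depend on what is on your disk.

```python
import tempfile
from pathlib import Path


def largest_files(directory: str, extension: str, top_n: int = 3) -> list[tuple[Path, int]]:
    """Return the top_n largest files with the given extension as (path, size) pairs."""
    sized = [
        (path, path.stat().st_size)
        for path in Path(directory).rglob(f"*.{extension}")
        if path.is_file()  # skip directories that happen to match the pattern
    ]
    # Sort by size, largest first
    return sorted(sized, key=lambda item: item[1], reverse=True)[:top_n]


# Demo in a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "small.txt").write_text("a")
    (root / "sub" / "big.txt").write_text("a" * 100)
    (root / "ignored.json").write_text("{}")

    result = [(p.name, size) for p, size in largest_files(tmp, "txt")]
    print(result)  # [('big.txt', 100), ('small.txt', 1)]
```

Path.stat() returns the same information as os.stat(), so there is no need to convert the Path back to a string.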

2. datetime: Dates and Times

On the front end, time handling usually means new Date() or dayjs. Python's standard datetime module covers most of the same needs.

2.1 Getting the current time

from datetime import datetime, date, timedelta

now = datetime.now()          # local time, including hours, minutes, and seconds
today = date.today()          # today's date, without a time

print(now)      # e.g. 2025-03-07 14:30:25.123456
print(today)    # e.g. 2025-03-07

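2.2 Formatting output

now = datetime.now()

# strftime: datetime → string (sample outputs assume now is 2025-03-07 14:30)
print(now.strftime("%Y-%m-%d"))             # 2025-03-07
print(now.strftime("%Y年%m月%d日 %H:%M"))   # 2025年03月07日 14:30
print(now.strftime("%Y/%m/%d %I:%M %p"))    # 2025/03/07 02:30 PM

Common format codes:

Code	Meaning	Example
`%Y`	four-digit year	2025
`%m`	two-digit month	03
`%d`	two-digit day of the month	07
`%H`	hour (24-hour clock)	14
`%M`	minute	30
`%S`	second	25
`%A`	full weekday name (English in the default locale)	Friday
`%a`	abbreviated weekday name	Fri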

2.3 Parsing a string into a datetime

# strptime: string → datetime (the inverse of strftime)
date_str = "2025-03-07"
dt = datetime.strptime(date_str, "%Y-%m-%d")
print(type(dt))    # <class 'datetime.datetime'>

# Parse a string that includes a time; the literal characters in the format must match the input
dt2 = datetime.strptime("2025年03月07日 14:30", "%Y年%m月%d日 %H:%M")
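strptime raises ValueError when the input does not match the format, which matters as soon as the strings come from users or from files. Here is one possible way to handle that, as a sketch: the function name parse_datetime and the KNOWN_FORMATS list are assumptions for this example, not something the datetime module provides.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of formats your input might use; adjust to your data
KNOWN_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%Y-%m-%d %H:%M:%S"]


def parse_datetime(text: str) -> Optional[datetime]:
    """Try each known format in turn; return None if none of them fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format, try the next one
    return None


print(parse_datetime("2025-03-07"))           # 2025-03-07 00:00:00
print(parse_datetime("2025/03/07"))           # 2025-03-07 00:00:00
print(parse_datetime("2025-03-07 14:30:25"))  # 2025-03-07 14:30:25
print(parse_datetime("March 7th"))            # None
```

Returning None keeps the caller in control; raising a custom exception would be an equally reasonable design.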

2.4 Date arithmetic

from datetime import datetime, timedelta

now = datetime.now()

# timedelta: an object representing a duration
one_week = timedelta(weeks=1)
three_days = timedelta(days=3)
two_hours = timedelta(hours=2)

next_week = now + one_week
three_days_ago = now - three_days

# Difference between two points in time
start = datetime(2025, 1, 1)
end = datetime(2025, 3, 7)
diff = end - start
print(diff.days)       # 65 (whole days)
print(diff.seconds)    # 0 (leftover seconds beyond the whole days, not the total)
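The .seconds attribute is a common source of bugs because it only holds the part of the duration that is left after the whole days are removed. A short sketch with made-up times shows the difference from total_seconds():

```python
from datetime import datetime

start = datetime(2025, 3, 7, 9, 0, 0)
end = datetime(2025, 3, 8, 10, 30, 0)
diff = end - start

print(diff)                  # 1 day, 1:30:00
print(diff.days)             # 1
print(diff.seconds)          # 5400 (only the 1h30m beyond the whole day)
print(diff.total_seconds())  # 91800.0 (the whole span: 86400 + 5400)

# Convert the whole span to hours
hours = diff.total_seconds() / 3600
print(hours)                 # 25.5
```

When you need "how long did this take", total_seconds() is almost always the one to use.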

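2.5 Converting to and from timestamps

import time
from datetime import datetime

# Current Unix timestamp
timestamp = time.time()   # float, e.g. 1741341025.123

# timestamp → datetime (in the local time zone)
dt = datetime.fromtimestamp(timestamp)

# datetime → timestamp
ts = dt.timestamp()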
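fromtimestamp() without a time zone gives a naive datetime in the machine's local time, so the same timestamp prints differently on servers in different regions. If that matters, one option is to pass an explicit time zone. The sketch below uses a fixed example timestamp so the output is reproducible:

```python
from datetime import datetime, timezone

timestamp = 1741341025  # a fixed example value, in seconds since the Unix epoch

# Naive conversion: interpreted in the machine's local time zone
local_dt = datetime.fromtimestamp(timestamp)

# Aware conversion: explicitly UTC, identical on every machine
utc_dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(utc_dt)              # 2025-03-07 09:50:25+00:00
print(utc_dt.isoformat())  # 2025-03-07T09:50:25+00:00

# Round trip back to the same timestamp
print(utc_dt.timestamp())  # 1741341025.0
```

This is roughly what new Date(ms).toISOString() gives you in JavaScript, except that Python timestamps are in seconds rather than milliseconds.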

Practical example: days since a user registered

from datetime import datetime

def days_since_registration(register_date_str: str) -> int:
    """Return the number of days from the registration date to today."""
    register_date = datetime.strptime(register_date_str, "%Y-%m-%d")
    today = datetime.now()
    delta = today - register_date
    return delta.days

print(days_since_registration("2024-09-01"))  # e.g. 187 when run on 2025-03-07

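3. re: Regular Expressions

Front-end developers have almost certainly used regular expressions. Python's regex syntax is largely the same as JavaScript's; the main differences are in how the API is called.

3.1 Core functions

import re

text = "My email is alice@example.com, my backup is backup@gmail.com"

# re.search(): find the first match (returns a Match object, or None if nothing matches)
match = re.search(r"\w+@\w+\.\w+", text)
if match:
    print(match.group())   # alice@example.com

# re.findall(): find every match (returns a list of strings)
emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails)   # ['alice@example.com', 'backup@gmail.com']

# re.sub(): replace every match (like JS str.replace() with a /g regex)
result = re.sub(r"\w+@\w+\.\w+", "***", text)
print(result)   # My email is ***, my backup is ***

# re.match(): match only at the start of the string (unlike search)
# It also returns None on failure, so check the result before calling .group() in real code
m = re.match(r"\d+", "123abc")
print(m.group())   # 123

# re.split(): split on a pattern (like JS str.split() with a regex);
# this one accepts both ASCII and full-width separators
parts = re.split(r"[,，；;]\s*", "apple,banana，cherry; grape")
print(parts)   # ['apple', 'banana', 'cherry', 'grape']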
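The functions above return whole matches. When you need the parts of each match, capture groups do the work, and re.finditer() yields one Match object per hit. A small sketch with a made-up log line:

```python
import re

log = "2025-03-07 ERROR disk full; 2025-03-08 INFO backup done"

# Named groups: (?P<name>...) in Python, (?<name>...) in JavaScript
pattern = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+)"

entries = [m.groupdict() for m in re.finditer(pattern, log)]
print(entries)
# [{'date': '2025-03-07', 'level': 'ERROR'}, {'date': '2025-03-08', 'level': 'INFO'}]

# With groups in the pattern, findall() returns tuples of the groups instead of whole matches
print(re.findall(pattern, log))
# [('2025-03-07', 'ERROR'), ('2025-03-08', 'INFO')]
```

The change in what findall() returns once a pattern contains groups tends to surprise people coming from JavaScript, so it is worth remembering.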

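3.2 Comparison with JavaScript regex

// JavaScript regex
const text = "Email: alice@example.com"
const pattern = /\w+@\w+\.\w+/g

// Find every match
const emails = text.match(pattern)   // ['alice@example.com']

// Replace
const result = text.replace(pattern, "***")

// Test whether there is a match
pattern.test(text)   // true

# Python regex (equivalent)
import re
text = "Email: alice@example.com"
pattern = r"\w+@\w+\.\w+"

emails = re.findall(pattern, text)          # ['alice@example.com']
result = re.sub(pattern, "***", text)       # replace
bool(re.search(pattern, text))              # True

The biggest difference: in JavaScript the regex methods are called on the string or the RegExp object (str.match(), pattern.test()), while in Python they are module-level functions. Mind the argument order: the pattern comes first and the string second, as in re.search(pattern, string).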
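One more difference is easy to miss when porting patterns: in Python 3, \w matches Unicode word characters by default, including Chinese characters, while in JavaScript \w covers only ASCII letters, digits, and the underscore. The sketch below uses a sample string with no separator between a Chinese label and the address to make the difference visible:

```python
import re

text = "邮箱alice@example.com"   # no separator between the Chinese label and the address
pattern = r"\w+@\w+\.\w+"

# Default (Unicode) mode: \w also matches the Chinese characters
print(re.findall(pattern, text))            # ['邮箱alice@example.com']

# re.ASCII restricts \w to [a-zA-Z0-9_], which is how JavaScript treats \w
print(re.findall(pattern, text, re.ASCII))  # ['alice@example.com']
```

If a pattern is copied from front-end code and must behave the same way, passing re.ASCII is the simplest way to line the two up.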

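3.3 Compiling a pattern (for performance)

If the same pattern is used many times, compile it once and reuse the compiled object. The re module already caches recently used patterns, so the speed-up is usually modest, but the compiled object skips the cache lookup and keeps the pattern in one named place:

import re

# Compile once, use many times
# (inside a character class "|" is a literal pipe, so the class is [A-Za-z], not [A-Z|a-z])
email_pattern = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

texts = ["alice@example.com", "invalid-email", "bob@gmail.com"]
valid_emails = [t for t in texts if email_pattern.search(t)]
print(valid_emails)   # ['alice@example.com', 'bob@gmail.com']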
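Note that search() answers "does this string contain an email address", not "is this string an email address". For validation, fullmatch() on the compiled pattern is the stricter choice. A sketch with a simplified pattern (an assumption for this example, not a complete email validator):

```python
import re

# Simplified email pattern for illustration only
email_pattern = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

candidates = ["alice@example.com", "contact: bob@gmail.com", "invalid-email"]

contains_email = [c for c in candidates if email_pattern.search(c)]
is_email = [c for c in candidates if email_pattern.fullmatch(c)]

print(contains_email)  # ['alice@example.com', 'contact: bob@gmail.com']
print(is_email)        # ['alice@example.com']
```

fullmatch() plays the role of wrapping a JavaScript pattern in ^ and $.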

4. collections: Specialized Data Structures

The collections module provides several data structures that go beyond the built-in types.

4.1 Counter

from collections import Counter

# Count how often each element appears in a list
words = ["apple", "banana", "apple", "cherry", "banana", "apple", "date"]
count = Counter(words)

print(count)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1, 'date': 1})

print(count["apple"])     # 3
print(count["missing"])   # 0 (no KeyError for a missing key)

# The N most common elements
print(count.most_common(2))   # [('apple', 3), ('banana', 2)]

# Count characters in a string (ties keep first-seen order, so 'h' comes before ' ')
char_count = Counter("hello world")
print(char_count.most_common(3))  # [('l', 3), ('o', 2), ('h', 1)]

# Add two counters together
c1 = Counter(["a", "b", "a"])
c2 = Counter(["b", "c", "b"])
print(c1 + c2)   # Counter({'b': 3, 'a': 2, 'c': 1})
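Counter supports more arithmetic than addition. The sketch below, with made-up inventory numbers, shows how the "-" operator differs from the in-place subtract() method, plus update() for adding more observations:

```python
from collections import Counter

stock = Counter(apple=5, banana=2)
sold = Counter(apple=3, banana=4)

# Subtraction with "-" drops zero and negative results
print(stock - sold)          # Counter({'apple': 2})

# subtract() works in place and keeps negative counts
remaining = stock.copy()
remaining.subtract(sold)
print(remaining)             # Counter({'apple': 2, 'banana': -2})

# update() adds counts from another iterable or mapping
stock.update(["banana", "cherry"])
print(stock)                 # Counter({'apple': 5, 'banana': 3, 'cherry': 1})

# Total number of items counted
print(sum(stock.values()))   # 9
```

Which subtraction to use depends on whether a negative count is meaningful in your data (an oversold item) or just noise.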

Practical scenario: word frequency in an article

from collections import Counter
import re

def word_frequency(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most frequent English words in text as (word, count) pairs."""
    words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    counter = Counter(words)
    return counter.most_common(top_n)
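In practice the raw top words are dominated by "the", "and", and similar filler. A possible variation, sketched here, filters those out before counting. The STOP_WORDS set and the sample sentence are made up for illustration; real projects would use a fuller list.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real projects use a fuller one
STOP_WORDS = {"the", "a", "is", "and", "of"}


def word_frequency(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Count the most frequent English words, ignoring a few stop words."""
    words = re.findall(r"\b[a-zA-Z]+\b", text.lower())
    counter = Counter(w for w in words if w not in STOP_WORDS)
    return counter.most_common(top_n)


article = "The cat and the dog. The cat is a friend of the dog and the bird."
print(word_frequency(article, top_n=3))
# [('cat', 2), ('dog', 2), ('friend', 1)]
```

Words with equal counts come back in the order they first appeared, which is why 'cat' precedes 'dog' here.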

4.2 defaultdict: a dict with default values

from collections import defaultdict

# A plain dict raises KeyError when you access a missing key
# A defaultdict creates a default value for the missing key automatically

# With int, the default value is 0
word_count = defaultdict(int)
for word in ["apple", "banana", "apple", "cherry"]:
    word_count[word] += 1   # no need to check whether the key exists first
print(dict(word_count))   # {'apple': 2, 'banana': 1, 'cherry': 1}

# With list, the default value is an empty list (handy for grouping)
grouped = defaultdict(list)
students = [("Alice", "math"), ("Bob", "Chinese"), ("Alice", "English"), ("Bob", "math")]
for name, subject in students:
    grouped[name].append(subject)   # no need to check whether the key exists first

print(dict(grouped))
# {'Alice': ['math', 'English'], 'Bob': ['Chinese', 'math']}
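The same idea extends to two levels of grouping. The sketch below, using made-up sales records, builds a nested total both with a plain dict and setdefault() and with a nested defaultdict, so the two styles can be compared side by side:

```python
from collections import defaultdict

# Made-up records: (region, product, quantity)
sales = [("north", "apple", 3), ("south", "apple", 1), ("north", "pear", 2), ("north", "apple", 4)]

# Plain dict: setdefault() creates the inner dict on first access
plain: dict[str, dict[str, int]] = {}
for region, product, qty in sales:
    inner = plain.setdefault(region, {})
    inner[product] = inner.get(product, 0) + qty

# Nested defaultdict: the factory lambda builds a defaultdict(int) per region
nested = defaultdict(lambda: defaultdict(int))
for region, product, qty in sales:
    nested[region][product] += qty

# Convert to plain dicts for printing and comparison
as_dict = {region: dict(products) for region, products in nested.items()}
print(as_dict)
# {'north': {'apple': 7, 'pear': 2}, 'south': {'apple': 1}}
print(as_dict == plain)  # True
```

One caveat worth knowing: merely reading a missing key from a defaultdict inserts it, so use `key in d` or d.get(key) when you only want to check.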

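4.3 deque: double-ended queue

from collections import deque

# O(1) appends and pops at both ends (list.insert(0, x) and list.pop(0) are O(n))
queue = deque([1, 2, 3])

queue.append(4)      # add on the right: [1, 2, 3, 4]
queue.appendleft(0)  # add on the left: [0, 1, 2, 3, 4]
queue.pop()          # remove and return from the right: 4
queue.popleft()      # remove and return from the left: 0

# Practical scenario: keep only the most recent N records
recent_logs = deque(maxlen=5)   # holds at most 5 items
for i in range(10):
    recent_logs.append(f"log {i}")

print(list(recent_logs))   # ['log 5', 'log 6', 'log 7', 'log 8', 'log 9']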
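The maxlen behavior makes deque a natural fit for sliding windows. As a sketch (the function name moving_average is made up for this example), here is a running average over the last few values, followed by rotate(), which shifts items around the ends:

```python
from collections import deque


def moving_average(values, window: int):
    """Yield the average of the last `window` values seen so far."""
    recent = deque(maxlen=window)   # old values fall off the left automatically
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


averages = list(moving_average([10, 20, 30, 40, 50], window=3))
print(averages)   # [10.0, 15.0, 20.0, 30.0, 40.0]

# rotate(n) moves the last n items to the front
d = deque([1, 2, 3, 4, 5])
d.rotate(2)
print(list(d))    # [4, 5, 1, 2, 3]
```

For very large windows, keeping a running sum instead of calling sum() each time would be faster; this version favors readability.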

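5. random: Random Numbers and Sampling

import random

# Random integer, both endpoints included
n = random.randint(1, 100)   # an integer from 1 to 100

# Random float in the half-open range [0.0, 1.0)
f = random.random()

# Random float within a given range
f2 = random.uniform(1.5, 5.5)

# Pick one item from a list
items = ["apple", "banana", "cherry", "grape"]
choice = random.choice(items)

# Pick N distinct items (without replacement)
sample = random.sample(items, 2)   # e.g. ['apple', 'cherry'] (varies per run)

# Pick N items with replacement
choices = random.choices(items, k=3)   # may contain duplicates

# Shuffle a list in place
random.shuffle(items)
print(items)   # order is now shuffled

# Set the random seed (for reproducibility, e.g. in tests)
random.seed(42)
print(random.randint(1, 100))   # same result on every run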
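Two features of the module are worth a short illustration: random.choices() accepts weights, and random.Random gives you a private generator whose seed does not affect the rest of the program. The prize names and weights below are made up for the example.

```python
import random

# A private generator: seeding it does not affect the module-level functions
rng = random.Random(42)

# Weighted sampling with replacement: "common" is 8 times as likely as "rare"
prizes = ["common", "uncommon", "rare"]
draws = rng.choices(prizes, weights=[8, 3, 1], k=1000)
print(len(draws))                                    # 1000
print(draws.count("common") > draws.count("rare"))   # True

# Same seed, same sequence
a = random.Random(123)
b = random.Random(123)
rolls_a = [a.randint(1, 6) for _ in range(5)]
rolls_b = [b.randint(1, 6) for _ in range(5)]
print(rolls_a == rolls_b)                            # True
```

A private generator is usually a better fit for tests than random.seed(), because it cannot be disturbed by other code that also uses the random module.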

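Practical scenario: generating a random password

import random
import string

def generate_password(length: int = 12) -> str:
    """Generate a random password with upper/lowercase letters, digits, and symbols.

    Note: the random module is not cryptographically secure. It is fine for
    demos and test data; for real credentials, use the secrets module.
    """
    if length < 4:
        raise ValueError("length must be at least 4")
    characters = (
        string.ascii_uppercase +   # A-Z
        string.ascii_lowercase +   # a-z
        string.digits +            # 0-9
        "!@#$%^&*"                 # special characters
    )
    # Make sure each character class appears at least once
    password = [
        random.choice(string.ascii_uppercase),
        random.choice(string.ascii_lowercase),
        random.choice(string.digits),
        random.choice("!@#$%^&*"),
    ]
    # Fill the remaining length
    password += random.choices(characters, k=length - 4)
    # Shuffle so the guaranteed characters are not always at the front
    random.shuffle(password)
    return "".join(password)

print(generate_password(16))   # e.g. aK3#mZp9!wRn5Qx2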
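For passwords that protect something real, the standard library's secrets module is the better tool, since it draws from the operating system's cryptographic random source. The sketch below swaps random for secrets; the function name generate_secure_password and the set of special characters are choices made for this example.

```python
import secrets
import string

SPECIALS = "!@#$%^&*"
ALPHABET = string.ascii_letters + string.digits + SPECIALS


def generate_secure_password(length: int = 12) -> str:
    """Generate a password from the OS's cryptographic random source."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit one character of each kind")
    while True:
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is present
        if (any(c.isupper() for c in password)
                and any(c.islower() for c in password)
                and any(c.isdigit() for c in password)
                and any(c in SPECIALS for c in password)):
            return password


pw = generate_secure_password(16)
print(len(pw))   # 16
```

Generating the whole password and retrying avoids the shuffle step, at the cost of an occasional extra loop iteration.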

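Summary

Module	Main purpose	Most-used features
`os`	interacting with the operating system	`os.listdir()`, `os.makedirs()`, `os.rename()`
`pathlib`	path operations	`Path.glob()`, `Path.rglob()`, `Path.mkdir()`
`datetime`	dates and times	`datetime.now()`, `strftime()`, `strptime()`, `timedelta`
`re`	regular expressions	`re.findall()`, `re.sub()`, `re.search()`
`collections`	specialized data structures	`Counter`, `defaultdict`, `deque`
`random`	random numbers	`random.choice()`, `random.sample()`, `random.shuffle()`

Three scenarios that come up all the time:

Parsing timestamps in log files → datetime.strptime()
Counting word frequencies in text → Counter
Extracting every matching string from a config file → re.findall()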
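These three tools often show up together. As a closing sketch, the snippet below parses a few log lines with re, converts the timestamps with datetime.strptime(), and tallies the levels with Counter. The log content and its format are invented for the example; real logs will need a pattern adapted to their layout.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log lines, made up for this example
LOG = """\
2025-03-07 14:30:25 ERROR database timeout
2025-03-07 14:31:02 INFO retry succeeded
2025-03-07 15:02:47 ERROR database timeout
2025-03-08 09:15:00 WARNING disk usage 91%
"""

# One match per line: timestamp, level, message
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$", re.MULTILINE)

records = [
    (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), level, message)
    for ts, level, message in LINE.findall(LOG)
]

levels = Counter(level for _, level, _ in records)
print(levels.most_common())   # [('ERROR', 2), ('INFO', 1), ('WARNING', 1)]

# Time between the first and the last entry
span = records[-1][0] - records[0][0]
print(span)                   # 18:44:35
```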

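Coming Up Next

Part 13: Python Network Requests in Practice: A Complete Guide to the requests Library

Stage two is where the hands-on work begins. The next article covers the requests library, Python's counterpart to the fetch API: GET, POST, request headers, timeouts, and exception handling, walking step by step through a call to a real public API.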
