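Python Data Analysis: Time Differences (Timedelta)
1. Core Concepts Overview
A Timedelta represents the difference between two points in time and is a key concept in time series analysis: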
- Timedelta
- to_timedelta()
- Timedelta arithmetic
- Frequency conversion
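Key parameters
- days, hours, minutes, seconds: keyword arguments of pd.Timedelta that set each component of the duration
- unit: the unit applied to a numeric input in pd.Timedelta and pd.to_timedelta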
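Of the parameters listed above, unit is the one that most often causes confusion, because a bare number has no meaning as a duration until a unit is attached. The following is a minimal sketch, assuming pandas is installed; the variable names are illustrative only.

```python
import pandas as pd

# A bare number needs a unit before it can become a duration.
ninety_minutes = pd.Timedelta(90, unit="min")

# to_timedelta applies the same unit to every element of a list.
durations = pd.to_timedelta([30, 90, 150], unit="s")

# Keyword arguments describe the same duration component by component.
same_ninety = pd.Timedelta(hours=1, minutes=30)

print(ninety_minutes)
print(durations)
print(ninety_minutes == same_ninety)  # True
```

Both spellings produce the same object, so the choice is mostly about whether the input arrives as a number or as separate components.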
2. Example Code
2.1 Preparing the Data
In [1]:
import pandas as pd
from datetime import datetime, timedelta
print("Pandas Timedelta feature demo")
print("=" * 40)

Pandas Timedelta feature demo
========================================
2.2 Creating a Timedelta
A Timedelta can be created in several ways.
In [2]:
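# From strings
td1 = pd.Timedelta('2 days')
td2 = pd.Timedelta('1 days 2 hours 30 minutes')
td3 = pd.Timedelta('3W')  # 3 weeks
print("Creating Timedelta from strings:")
print(f"1 day 2 hours 30 minutes: {td2}")
# From keyword arguments
td4 = pd.Timedelta(days=5, hours=3, minutes=30)
td5 = pd.Timedelta(weeks=2, days=1)
print(f"5 days 3 hours 30 minutes: {td4}")
# From a list of strings, which yields a TimedeltaIndex
td_list = pd.to_timedelta(['1 days', '2 days', '3 days', '4 days'])
print(td_list)

Creating Timedelta from strings:
1 day 2 hours 30 minutes: 1 days 02:30:00
5 days 3 hours 30 minutes: 5 days 03:30:00
TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)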
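pandas Timedelta objects also interoperate with the standard library's datetime.timedelta, which the setup cell imports. The sketch below shows a round trip between the two types, assuming pandas is installed; the variable names are illustrative.

```python
from datetime import timedelta
import pandas as pd

py_td = timedelta(days=1, hours=6)

# pd.Timedelta accepts a datetime.timedelta directly.
pd_td = pd.Timedelta(py_td)

# to_pytimedelta() converts back to the standard-library type.
# Note that datetime.timedelta only stores microsecond precision.
back = pd_td.to_pytimedelta()

print(pd_td)           # 1 days 06:00:00
print(back == py_td)   # True
print(pd_td == py_td)  # True
```

Converting back is mainly useful when handing durations to code that expects the standard-library type.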
2.3 Timedelta Attributes
Access the individual components of a time difference.
In [3]:
td = pd.Timedelta(days=5, hours=3, minutes=30, seconds=45)
print(f"Timedelta: {td}")
print(f"Total seconds: {td.total_seconds()}")
print(f"Total hours: {td.total_seconds() / 3600:.2f}")
print(f"components: {td.components}")
print(f"Days: {td.components.days}")
print(f"Hours: {td.components.hours}")
print(f"Minutes: {td.components.minutes}")
print(f"Seconds: {td.components.seconds}")
print(f"to_timedelta64: {td.to_timedelta64()}")  # numpy timedelta64
print(f"value: {td.value}")  # total length in nanoseconds

Timedelta: 5 days 03:30:45
Total seconds: 444645.0
Total hours: 123.51
components: Components(days=5, hours=3, minutes=30, seconds=45, milliseconds=0, microseconds=0, nanoseconds=0)
Days: 5
Hours: 3
Minutes: 30
Seconds: 45
to_timedelta64: 444645000000000 nanoseconds
value: 444645000000000
2.4 Timedelta Arithmetic
Timedelta objects support the usual arithmetic operations.
In [4]:
td1 = pd.Timedelta(days=2)
td2 = pd.Timedelta(days=1, hours=12)
print(f"td1 + td2: {td1 + td2}")
print(f"td1 - td2: {td1 - td2}")
print(f"td1 * 2: {td1 * 2}")
print(f"td1 / 2: {td1 / 2}")
print(f"td1 / td2: {td1 / td2:.2f}")  # ratio of two durations, a plain float
td_neg = pd.Timedelta(days=-3)
print(f"abs: {abs(td_neg)}")

td1 + td2: 3 days 12:00:00
td1 - td2: 0 days 12:00:00
td1 * 2: 4 days 00:00:00
td1 / 2: 1 days 00:00:00
td1 / td2: 1.33
abs: 3 days 00:00:00
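Beyond the operations shown above, Timedelta also supports floor division, modulo, and comparison, which help when splitting a duration into whole units plus a remainder. The following is a small sketch under the assumption that pandas is installed; the names total and one_day are illustrative.

```python
import pandas as pd

total = pd.Timedelta(hours=50, minutes=20)
one_day = pd.Timedelta(days=1)

whole_days = total // one_day   # integer count of full days
remainder = total % one_day     # what is left after removing full days
both = divmod(total, one_day)   # (whole_days, remainder) in one call

print(whole_days)       # 2
print(remainder)        # 0 days 02:20:00
print(total > one_day)  # True
```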
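2.5 Timestamp and Timedelta Arithmetic
Adding a Timedelta to a Timestamp, or subtracting one from it, yields a new Timestamp. Subtracting one Timestamp from another yields a Timedelta.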
In [5]:
ts = pd.Timestamp('2024-03-15 10:00:00')
td = pd.Timedelta(days=3, hours=5)
print(f"Original timestamp: {ts}")
print(f"ts + td: {ts + td}")
print(f"ts - td: {ts - td}")
ts1 = pd.Timestamp('2024-03-15')
ts2 = pd.Timestamp('2024-03-20')
diff = ts2 - ts1  # the difference of two Timestamps is a Timedelta
print(f"ts2 - ts1: {diff}")
print(f"Days apart: {diff.days}")

Original timestamp: 2024-03-15 10:00:00
ts + td: 2024-03-18 15:00:00
ts - td: 2024-03-12 05:00:00
ts2 - ts1: 5 days 00:00:00
Days apart: 5
2.6 Timedelta in a Series
Work with a Series that holds time differences.
In [6]:
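# Sample values for illustration only; the original sample data for this cell was not preserved
td_series = pd.Series(pd.to_timedelta(['1 days', '2 days 12 hours', '3 hours 30 minutes']))
print("Timedelta Series:")
print(td_series)
seconds = td_series.dt.total_seconds()  # total seconds per element
hours = td_series.dt.total_seconds() / 3600  # total hours per element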
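The .dt accessor exposes more than total_seconds(). As a rough sketch, assuming pandas is installed and using made-up sample durations, the whole-day part and the per-component breakdown can be read for every element at once:

```python
import pandas as pd

# Made-up sample durations for illustration.
durations = pd.Series(
    pd.to_timedelta(["1 days 06:00:00", "0 days 02:30:00", "3 days 12:15:00"])
)

# Whole days per element (the fractional part is dropped, not rounded).
print(durations.dt.days)

# A DataFrame with one column per component: days, hours, minutes, ...
print(durations.dt.components)

# Total length of each element in hours, as floats.
hours = durations.dt.total_seconds() / 3600
print(hours)
```

The distinction matters in practice: dt.days answers "how many full days", while total_seconds() answers "how long in total".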
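2.7 Practical Application: Computing Time Intervals
Compute the time elapsed between two events, here an order and its delivery.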
In [7]:
orders = pd.DataFrame({
    'order_id': ['A001', 'A002', 'A003', 'A004', 'A005'],
    'order_time': pd.to_datetime(['2024-03-15 09:00', '2024-03-15 10:30', '2024-03-15 14:00',
                                  '2024-03-15 16:30', '2024-03-16 09:00']),
    'delivery_time': pd.to_datetime(['2024-03-15 11:30', '2024-03-15 13:00', '2024-03-15 17:00',
                                     '2024-03-15 19:00', '2024-03-16 12:00']),
})
# Subtracting two datetime columns yields a timedelta column
orders['delivery_duration'] = orders['delivery_time'] - orders['order_time']
print(orders)
print(f"Average delivery time: {orders['delivery_duration'].mean()}")
print(f"Shortest delivery time: {orders['delivery_duration'].min()}")
print(f"Longest delivery time: {orders['delivery_duration'].max()}")
# Convert the durations to hours as floats
orders['delivery_hours'] = orders['delivery_duration'].dt.total_seconds() / 3600
print(orders[['order_id', 'delivery_hours']])

  order_id          order_time       delivery_time delivery_duration
0     A001 2024-03-15 09:00:00 2024-03-15 11:30:00   0 days 02:30:00
1     A002 2024-03-15 10:30:00 2024-03-15 13:00:00   0 days 02:30:00
2     A003 2024-03-15 14:00:00 2024-03-15 17:00:00   0 days 03:00:00
3     A004 2024-03-15 16:30:00 2024-03-15 19:00:00   0 days 02:30:00
4     A005 2024-03-16 09:00:00 2024-03-16 12:00:00   0 days 03:00:00
Average delivery time: 0 days 02:42:00
Shortest delivery time: 0 days 02:30:00
Longest delivery time: 0 days 03:00:00
2.8 Practical Application: User Retention Analysis
Compute the interval between each user's consecutive logins.
In [8]:
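logins = pd.DataFrame({
    'user_id': ['U001', 'U001', 'U001', 'U002', 'U002', 'U003', 'U003', 'U003', 'U003'],
    'login_time': pd.to_datetime([
        '2024-03-01', '2024-03-05', '2024-03-15',
        '2024-03-02', '2024-03-10',
        '2024-03-01', '2024-03-03', '2024-03-07', '2024-03-20',
    ]),
})
logins = logins.sort_values(['user_id', 'login_time'])
# diff() within each user gives the gap since that user's previous login (NaT for the first one)
logins['days_since_last'] = logins.groupby('user_id')['login_time'].diff()
print(logins)
avg_interval = logins.groupby('user_id')['days_since_last'].mean()
for user, interval in avg_interval.items():
    # .days keeps only whole days; U003 averages 6 days 08:00:00 and prints as 6
    print(f"{user}: {interval.days} days")

  user_id login_time days_since_last
0    U001 2024-03-01             NaT
1    U001 2024-03-05          4 days
2    U001 2024-03-15         10 days
3    U002 2024-03-02             NaT
4    U002 2024-03-10          8 days
5    U003 2024-03-01             NaT
6    U003 2024-03-03          2 days
7    U003 2024-03-07          4 days
8    U003 2024-03-20         13 days
U001: 7 days
U002: 8 days
U003: 6 days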
2.9 Frequency Conversion
Convert a Timedelta into different time units.
In [9]:
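td = pd.Timedelta(days=2, hours=5, minutes=30, seconds=45)
print(f"Original Timedelta: {td}")
print(f"Total days: {td.total_seconds() / 86400:.4f} days")
print(f"Total hours: {td.total_seconds() / 3600:.4f} hours")
print(f"Total minutes: {td.total_seconds() / 60:.2f} minutes")
print(f"Total seconds: {td.total_seconds()} seconds")
print(f"Total milliseconds: {td.total_seconds() * 1000} milliseconds")
print(f"floor('D'): {td.floor('D')}")  # round down to whole days
print(f"ceil('h'): {td.ceil('h')}")  # round up to whole hours
print(f"round('h'): {td.round('h')}")  # round to the nearest hour

Original Timedelta: 2 days 05:30:45
Total days: 2.2297 days
Total hours: 53.5125 hours
Total minutes: 3210.75 minutes
Total seconds: 192645.0 seconds
Total milliseconds: 192645000.0 milliseconds
floor('D'): 2 days 00:00:00
ceil('h'): 2 days 06:00:00
round('h'): 2 days 06:00:00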
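An alternative to dividing total_seconds() by a hand-written constant is to divide by a Timedelta of the target unit, which keeps the unit visible in the code. This is a sketch of that idiom, assuming pandas is installed; the variable names are illustrative.

```python
import pandas as pd

td = pd.Timedelta(days=2, hours=5, minutes=30, seconds=45)

# Dividing one Timedelta by another yields a plain float.
hours = td / pd.Timedelta(hours=1)
days = td / pd.Timedelta(days=1)
print(f"{hours:.4f} hours, {days:.4f} days")

# The same division works element-wise on a timedelta Series.
s = pd.Series(pd.to_timedelta(["90 min", "36 hours"]))
s_hours = s / pd.Timedelta(hours=1)
print(s_hours)
```

Unlike the .days attribute, this keeps the fractional part, so it is the safer choice when averaging or plotting durations.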
2.10 TimedeltaIndex
Use Timedelta values as an index.
In [10]:
# end is inclusive, so this index runs from 0 days through 10 days
td_index = pd.timedelta_range(start='0 days', end='10 days', freq='1D')
print(td_index)
td_series = pd.Series(range(len(td_index)), index=td_index)
print("\nSeries with a TimedeltaIndex:")
print(td_series)
filtered = td_series['2 days':'5 days']  # label slicing includes both endpoints
print(filtered)
td_range = pd.timedelta_range(start='0 hours', periods=12, freq='2h')
print("\nTimedeltaIndex every 2 hours:")
print(td_range)

TimedeltaIndex([ '0 days',  '1 days',  '2 days',  '3 days',  '4 days',  '5 days',
                 '6 days',  '7 days',  '8 days',  '9 days', '10 days'],
               dtype='timedelta64[ns]', freq='D')

TimedeltaIndex(['0 days 00:00:00', '0 days 02:00:00', '0 days 04:00:00',
                '0 days 06:00:00', '0 days 08:00:00', '0 days 10:00:00',
                '0 days 12:00:00', '0 days 14:00:00', '0 days 16:00:00',
                '0 days 18:00:00', '0 days 20:00:00', '0 days 22:00:00'],
               dtype='timedelta64[ns]', freq='2h')
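3. Summary of Common Use Cases
- Delivery time analysis
- User behavior analysis
- Equipment uptime
- SLA monitoring: check whether service response times fall within the SLA.
- Session duration
- Task scheduling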
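The SLA monitoring case in the list above reduces to comparing a timedelta column against a fixed Timedelta. The sketch below uses hypothetical ticket data and a hypothetical four-hour limit; the column names and values are assumptions for illustration, not taken from a real system.

```python
import pandas as pd

# Hypothetical tickets and a hypothetical 4-hour SLA limit.
tickets = pd.DataFrame({
    "ticket_id": ["T1", "T2", "T3"],
    "opened": pd.to_datetime(
        ["2024-03-15 09:00", "2024-03-15 10:00", "2024-03-15 11:00"]
    ),
    "first_response": pd.to_datetime(
        ["2024-03-15 10:30", "2024-03-15 15:00", "2024-03-15 14:59"]
    ),
})
sla_limit = pd.Timedelta(hours=4)

# Response time is a timedelta column; comparing it to a Timedelta
# gives one boolean per ticket.
tickets["response_time"] = tickets["first_response"] - tickets["opened"]
tickets["within_sla"] = tickets["response_time"] <= sla_limit

print(tickets[["ticket_id", "response_time", "within_sla"]])
print(f"SLA compliance: {tickets['within_sla'].mean():.0%}")
```

The same pattern applies to the delivery and session-duration cases: compute the difference once, then compare or aggregate the resulting column.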