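Data returned by business APIs and web crawlers is often stored as lists of dictionaries, which is messy and hard to analyze directly. This article shows how to split such data quickly into a standard two-dimensional table.

Scenario: parse fields that hold a spec dictionary and a tag list, pull out the spec parameters and product tags, and turn them into separate columns that are easy to filter and group.

Key points: list traversal, dictionary key-value extraction, splitting one field horizontally into several columns, batch normalization.

① Generate test data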
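import pandas as pd

# Build 60 sample records, each with a nested spec dict and a tag list
data_list = []
for i in range(60):
    item = {
        "id": i + 1,
        "title": f"爆款商品{i+1}",
        "spec": {"尺寸": "大号", "材质": "塑料", "等级": "A类"},
        # Rows alternate between two tag lists
        "tag": ["热销", "包邮"] if i % 2 == 0 else ["新品", "折扣"],
    }
    data_list.append(item)

df = pd.DataFrame(data_list)
# Writing .xlsx needs an Excel engine such as openpyxl to be installed
df.to_excel("dict_list_data.xlsx", index=False)
print("Nested list/dict test data generated")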
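The reason the next step parses text can be illustrated without touching a file. This is a minimal sketch that assumes a nested cell is saved to the spreadsheet as its Python text representation (the equivalent of calling str() on it); that is my understanding of how the export behaves, not something the data file itself guarantees. The variable names below are illustrative only.

```python
import ast

spec = {"尺寸": "大号", "材质": "塑料", "等级": "A类"}
tags = ["热销", "包邮"]

# Assumption: the spreadsheet cell holds the text form of the object
spec_text = str(spec)
tags_text = str(tags)

# ast.literal_eval turns that text back into a dict / list.
# It only accepts Python literals, so it does not execute arbitrary code.
spec_back = ast.literal_eval(spec_text)
tags_back = ast.literal_eval(tags_text)

print(type(spec_text).__name__, "->", type(spec_back).__name__)
print(spec_back == spec, tags_back == tags)
```

If the text in a cell were JSON rather than a Python literal (double-quoted keys, true/false/null), json.loads would be the better fit than ast.literal_eval.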
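② Core code
import ast

import pandas as pd

df = pd.read_excel("dict_list_data.xlsx")

# The nested cells come back as strings: parse them into dicts, then split out columns
df["spec"] = df["spec"].apply(ast.literal_eval)
df["尺寸"] = df["spec"].apply(lambda x: x["尺寸"])  # size
df["材质"] = df["spec"].apply(lambda x: x["材质"])  # material

# Parse the tag strings into lists, then join each list into one comma-separated string
df["tag"] = df["tag"].apply(ast.literal_eval)
df["标签"] = df["tag"].apply(lambda x: ",".join(x))

# Simple statistics
print(df[["title", "尺寸", "材质", "标签"]].head())
print("\nProduct count per tag combination:")
print(df["标签"].value_counts())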
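The code above extracts two spec keys by name and counts tag combinations. As an alternative sketch, not the article's method, pandas can expand every spec key at once with pd.json_normalize and count individual tags with DataFrame.explode. The three records below are a hypothetical in-memory sample with the same shape as the test data, so no Excel file is needed; the names records, spec_cols, flat and tag_counts are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with the same structure as the generated test data
records = [
    {"id": 1, "title": "爆款商品1",
     "spec": {"尺寸": "大号", "材质": "塑料", "等级": "A类"},
     "tag": ["热销", "包邮"]},
    {"id": 2, "title": "爆款商品2",
     "spec": {"尺寸": "大号", "材质": "塑料", "等级": "A类"},
     "tag": ["新品", "折扣"]},
    {"id": 3, "title": "爆款商品3",
     "spec": {"尺寸": "大号", "材质": "塑料", "等级": "A类"},
     "tag": ["热销", "包邮"]},
]
df = pd.DataFrame(records)

# Expand every key of the spec dicts into its own column
spec_cols = pd.json_normalize(df["spec"].tolist())
spec_cols.index = df.index  # keep rows aligned with the original frame
flat = pd.concat([df.drop(columns=["spec"]), spec_cols], axis=1)

# One row per individual tag, then count each tag separately
tag_counts = flat.explode("tag")["tag"].value_counts()

print(flat.drop(columns=["tag"]))
print(tag_counts)
```

Expanding all keys avoids writing one lambda per column, and exploding the list answers "how many products carry tag X" rather than "how many products carry this exact combination". When the data comes from the Excel file, the spec and tag columns would still need ast.literal_eval first.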
Results
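Summary
Nested lists and dictionaries are a common shape for non-standard data. Once they are unpacked into flat columns, the data plugs directly into a regular analysis workflow such as filtering, grouping, and counting, which makes this kind of data much quicker to work with.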