从单站到全国,从几百个风暴到几千个候选区——我们如何让算法不“眼花缭乱”?
如果你用过单站雷达数据做风暴识别,你可能会觉得:识别几十个风暴,画几个矩形框,美滋滋。
但当你拿到全国天气雷达拼图3.0(覆盖73°E~135°E, 12°N~54°N,分辨率0.01°×0.01°),一张图就有 4200×6200 ≈ 2600万个网格点。一次体扫识别出的候选区域动辄5000+,过滤后仍然有600+个有效风暴单体。
问题来了:如果每个风暴都画一个框,整张图会变成“蜘蛛网”——框叠框、点压点,根本看不清。而且,全国拼图中的风暴往往是成片出现(比如一条飑线、一个MCS),内部包含许多小单体,如果单独框选,既冗余又难看。
所以,我们需要一套专为全国拼图优化的SCIT后处理流程:聚类、合并、去重、分级显示。
阈值二值化(≥35 dBZ)
8连通域标记
提取每个连通域的外接矩形(像素坐标转经纬度)
这一步与单站完全一样,输出的 storms 列表包含每个单体的矩形框、质心、面积、最大dBZ。
我们定义了两个阈值:
box_area_threshold(如100像素):面积≥此值的风暴直接画黑色矩形框。
min_storm_size(如50像素):面积小于此值的风暴直接丢弃(噪声)。
剩下的小风暴(面积在50~100之间)不画框,而是在质心画一个红色小点。这样既保留了弱对流信息,又不会让图像太乱。
为什么小风暴还要合并?因为全国拼图中经常有成片的小单体(比如阵风锋前的新生对流),如果每个都单独画红点,会形成密密麻麻的“红疹”。我们设定:
cluster_radius_km(如50 km):在此距离内的风暴视为同一簇。
cluster_threshold(如20个):簇内风暴数量达到此值,就将它们合并成一个大的外接矩形,不再画红点。
这样,一片雷暴群会用一个框圈起来,简洁有力。
即使经过聚类,仍然可能产生重叠的矩形(例如一个大风暴内部被识别出多个连通域)。我们使用并查集将所有相交或包含的矩形合并为一个大矩形。同时,任何红点如果落在最终矩形框内部,就会被剔除——避免“框中有框、框中有点”的冗余。
def haversine_distance(lon1, lat1, lon2, lat2):lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])dlon = lon2 - lon1dlat = lat2 - lat1a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2c = 2 * asin(sqrt(a))return EARTH_RADIUS_KM * c
精度足够,且无需额外依赖。
我们使用一份2024年7月31日的全国拼图数据(Z_RADA_C_BABJ_2024xxxxxxxxx_P_DOR_ACHN_CREF_xxxxxx_084800.bin)进行测试:
原始候选区域:5030个
过滤后(面积≥50像素):633个
分级后:
大风暴(≥100像素)直接画框:约120个
小风暴(50~100像素)先聚类(半径50km,阈值20),合并后得到额外约15个大框
剩余未聚类的小风暴画红点:约200个
去重后最终框数量:约130个,红点数量:约180个
下图是最终输出(示意):

与单站雷达图相比:
单站的案例如下:
单站:几十个框,可以直接画每个风暴的精确轮廓。
全国:必须用矩形框(节省计算),且需要聚类合并,否则图像无法阅读。
我们的代码已完全适配拼图3.0格式(二进制头+bz2压缩数据),可直接用于业务。如果你正在从单站转向全国组网,这套后处理逻辑应该能帮你省下不少调试时间。当然解析雷达数据也可以使用相关库,介绍如下:
import osimport bz2import structimport numpy as npimport matplotlib.pyplot as pltfrom scipy.ndimage import label, find_objectsfrom matplotlib.colors import ListedColormap, BoundaryNormimport cartopy.crs as ccrsimport cartopy.feature as cfeaturefrom collections import defaultdictfrom math import radians, sin, cos, sqrt, asinEARTH_RADIUS_KM = 6371.0def haversine_distance(lon1, lat1, lon2, lat2):lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])dlon = lon2 - lon1dlat = lat2 - lat1a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2c = 2 * asin(sqrt(a))return EARTH_RADIUS_KM * c# ==================== 拼图3.0数据解析模块 ====================# 解析模块代码可以参照博主全国天气雷达数据处理教程完整代码# ==================== SCIT 风暴单体识别 ====================class SCIT_StormIdentifier:def __init__(self, ref_data, lon_grid, lat_grid, dbz_threshold=35, min_storm_size=5):self.ref = ref_dataself.lon = lon_gridself.lat = lat_gridself.thresh = dbz_thresholdself.min_size = min_storm_sizeself.storm_cells = []def run_scit(self):print("="*60)print("【SCIT 风暴单体识别算法】")print(f"阈值: {self.thresh} dBZ, 最小面积: {self.min_size} 像素")print("="*60)binary = np.zeros_like(self.ref, dtype=np.uint8)valid = ~np.isnan(self.ref)binary[valid & (self.ref >= self.thresh)] = 1labeled, num = label(binary, structure=np.ones((3,3)))slices = find_objects(labeled)print(f"候选区域: {num}")self.storm_cells = []for slc in slices:if slc is None:continuey_slc, x_slc = slcy1, y2 = y_slc.start, y_slc.stopx1, x2 = x_slc.start, x_slc.stoparea_pixels = (y2 - y1) * (x2 - x1)if area_pixels < self.min_size:continuelon_rect = [self.lon[y1, x1], self.lon[y1, x2],self.lon[y2, x2], self.lon[y2, x1],self.lon[y1, x1]]lat_rect = [self.lat[y1, x1], self.lat[y1, x2],self.lat[y2, x2], self.lat[y2, x1],self.lat[y1, x1]]center_lon = (lon_rect[0] + lon_rect[2]) / 2.0center_lat = (lat_rect[0] + lat_rect[2]) / 2.0self.storm_cells.append({"bbox_lon": lon_rect,"bbox_lat": lat_rect,"center_lon": center_lon,"center_lat": center_lat,"area_pixels": area_pixels,"max_dbz": np.nanmax(self.ref[y_slc, x_slc])})print(f"✅ 有效风暴单体: {len(self.storm_cells)} 个")return self.storm_cells# ==================== 矩形框几何工具 ====================def boxes_overlap(box1, box2):# box1, box2 格式: [min_lon, max_lon, min_lat, max_lat]lon1_min, lon1_max = box1[0], box1[1]lat1_min, lat1_max = box1[2], box1[3]lon2_min, lon2_max = box2[0], box2[1]lat2_min, lat2_max = box2[2], box2[3]# 检查是否不重叠if lon1_max < lon2_min or lon2_max < lon1_min:return Falseif lat1_max < lat2_min or lat2_max < lat1_min:return Falsereturn Truedef merge_boxes(boxes):rects = []for b in boxes:lons = b["bbox_lon"][:4]lats = b["bbox_lat"][:4]rects.append({"min_lon": min(lons),"max_lon": max(lons),"min_lat": min(lats),"max_lat": max(lats),"original": b})n = len(rects)parent = list(range(n))def find(x):while parent[x] != x:parent[x] = parent[parent[x]]x = parent[x]return xdef union(x, y):rx, ry = find(x), find(y)if rx != ry:parent[ry] = rxfor i in range(n):for j in range(i+1, n):if boxes_overlap([rects[i]["min_lon"], rects[i]["max_lon"], rects[i]["min_lat"], rects[i]["max_lat"]],[rects[j]["min_lon"], rects[j]["max_lon"], rects[j]["min_lat"], rects[j]["max_lat"]]):union(i, j)groups = defaultdict(list)for i in range(n):groups[find(i)].append(rects[i])merged = []for group in groups.values():min_lon = min(r["min_lon"] for r in group)max_lon = max(r["max_lon"] for r in group)min_lat = min(r["min_lat"] for r in group)max_lat = max(r["max_lat"] for r in group)merged_box = {"bbox_lon": [min_lon, max_lon, max_lon, min_lon, min_lon],"bbox_lat": [min_lat, min_lat, max_lat, max_lat, min_lat],"area_pixels": sum(r["original"]["area_pixels"] for r in group),"max_dbz": max(r["original"]["max_dbz"] for r in group)}merged.append(merged_box)return mergeddef point_in_box(lon, lat, box):lons = box["bbox_lon"][:4]lats = box["bbox_lat"][:4]min_lon = min(lons)max_lon = max(lons)min_lat = min(lats)max_lat = max(lats)return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat# ==================== 风暴聚类与分级处理 ====================def merge_and_classify_storms(storms, cluster_radius_km=20.0, cluster_threshold=10, box_area_threshold=100):if not storms:return [], []large_storms = [s for s in storms if s["area_pixels"] >= box_area_threshold]small_storms = [s for s in storms if s["area_pixels"] < box_area_threshold]n = len(small_storms)boxes_from_clusters = []points = []if n > 0:def calc_dist(s1, s2):return haversine_distance(s1["center_lon"], s1["center_lat"],s2["center_lon"], s2["center_lat"])adj = np.zeros((n, n), dtype=bool)for i in range(n):for j in range(i+1, n):if calc_dist(small_storms[i], small_storms[j]) <= cluster_radius_km:adj[i, j] = adj[j, i] = Trueparent = list(range(n))def find(x):while parent[x] != x:parent[x] = parent[parent[x]]x = parent[x]return xdef union(x, y):rx, ry = find(x), find(y)if rx != ry:parent[ry] = rxfor i in range(n):for j in range(i+1, n):if adj[i, j]:union(i, j)clusters = defaultdict(list)for i in range(n):clusters[find(i)].append(i)for idx_list in clusters.values():if len(idx_list) >= cluster_threshold:lons = []lats = []for i in idx_list:storm = small_storms[i]lons.extend(storm["bbox_lon"])lats.extend(storm["bbox_lat"])if lons:min_lon = min(lons)max_lon = max(lons)min_lat = min(lats)max_lat = max(lats)merged_bbox_lon = [min_lon, max_lon, max_lon, min_lon, min_lon]merged_bbox_lat = [min_lat, min_lat, max_lat, max_lat, min_lat]boxes_from_clusters.append({"bbox_lon": merged_bbox_lon,"bbox_lat": merged_bbox_lat,"area_pixels": sum(small_storms[i]["area_pixels"] for i in idx_list),"max_dbz": max(small_storms[i]["max_dbz"] for i in idx_list)})else:for i in idx_list:points.append(small_storms[i])all_boxes = large_storms + boxes_from_clustersif len(all_boxes) > 1:merged_boxes = merge_boxes(all_boxes)else:merged_boxes = all_boxesfiltered_points = []for pt in points:inside = any(point_in_box(pt["center_lon"], pt["center_lat"], box) for box in merged_boxes)if not inside:filtered_points.append(pt)return merged_boxes, filtered_points# ==================== 绘图函数 ====================def plot_scit_result(ref, lon, lat, storms, output_path,box_area_threshold=100, cluster_radius_km=20.0, cluster_threshold=10):print("\n【开始绘图】")plt.rcParams['font.sans-serif'] = ['SimHei']plt.rcParams['axes.unicode_minus'] = Falseproj = ccrs.PlateCarree()fig, ax = plt.subplots(figsize=(10, 10), subplot_kw={'projection': proj})ax.set_facecolor('white')levels = [0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]colors = [(1,1,1,0), (0,161/255,247/255), (1/255,236/255,237/255),(0,216/255,0), (1/255,144/255,0), (255/255,255/255,0),(231/255,192/255,0), (255/255,144/255,0), (255/255,0,0),(214/255,0,0), (192/255,0,0), (255/255,0,240/255),(150/255,0,180/255), (173/255,144/255,240/255)]cmap = ListedColormap(colors)norm = BoundaryNorm(levels, cmap.N, clip=True)cf = ax.contourf(lon, lat, ref, levels=levels, cmap=cmap, norm=norm,transform=proj, antialiased=True)boxes, points = merge_and_classify_storms(storms, cluster_radius_km, cluster_threshold, box_area_threshold)for box in boxes:ax.plot(box["bbox_lon"], box["bbox_lat"], 'k-', linewidth=0.8, transform=proj, zorder=5)for pt in points:ax.plot(pt["center_lon"], pt["center_lat"], 'ro', markersize=1, transform=proj, zorder=5)print(f"绘图统计: 矩形框 {len(boxes)} 个, 红点 {len(points)} 个")ax.add_feature(cfeature.BORDERS.with_scale('50m'), linewidth=0.4, color='black', alpha=0.7)ax.add_feature(cfeature.STATES.with_scale('50m'), linewidth=0.3, color='dimgray', alpha=0.6)gl = ax.gridlines(crs=proj, draw_labels=True, linewidth=0.5, color='gray', alpha=0.5, linestyle='--')gl.top_labels = Falsegl.right_labels = Falsegl.xlabel_style = {'size': 11}gl.ylabel_style = {'size': 11}cbar = plt.colorbar(cf, ax=ax, shrink=0.7, pad=0.02)cbar.set_label('Reflectivity (dBZ)', fontsize=12, fontweight='bold')ax.set_title('SCIT Storm Cell Identification\nMerged boxes + filtered red dots', fontsize=14, fontweight='bold', pad=20)plt.savefig(output_path, dpi=300, bbox_inches='tight', facecolor='white', edgecolor='none')plt.close()print(f"✅ 绘图完成: {output_path}")# ==================== 主程序 ====================if __name__ == "__main__":# 请修改为您的实际文件路径radar_file = r"Z_RADA_C_BABJ_2xxxxxxxxx_P_DOR_ACHN_CREF_xxxxxx_084800.bin"output_img = "SCIT_PUZZLE3_1.png"ref, lon, lat = read_puzzle3_data(radar_file)if ref is None:print("数据读取失败,程序退出")exit(1)scit = SCIT_StormIdentifier(ref, lon, lat, dbz_threshold=35, min_storm_size=50)storms = scit.run_scit()plot_scit_result(ref, lon, lat, storms, output_img,box_area_threshold=100,cluster_radius_km=50.0,cluster_threshold=20)
