1. Opening: From "What Is This" to "What Is Where"
Over the past two days we covered feature extraction and object detection.
But feature extraction can only tell us "there is a corner here," and object detection can only tell us "there is a cat here." If we want to know where the cat's contour runs, or where sky ends and road begins, we need a finer-grained kind of understanding: image segmentation.
Image segmentation = partitioning an image into regions that carry semantic meaning.
If classification answers "what is this" and detection answers "where is it (a box)," then segmentation answers "which pixels does it occupy."
2. The Three Levels of Image Segmentation
2.1 Semantic Segmentation
Task: assign a class label to every pixel in the image.
Output: a "label map" the same size as the input (one integer per pixel).
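Concretely, such a label map is nothing more than an integer array with the image's height and width; a tiny NumPy sketch (the class IDs here are invented purely for illustration):

```python
import numpy as np

# A hypothetical 4x6 "image" segmented into two classes:
# 1 = cat, 2 = grass (IDs chosen for illustration only)
label_map = np.array([
    [2, 2, 2, 2, 2, 2],
    [2, 1, 1, 2, 2, 2],
    [2, 1, 1, 1, 2, 2],
    [2, 2, 2, 2, 2, 2],
], dtype=np.int64)

print(label_map.shape)       # (4, 6): same spatial size as the input image
print(np.unique(label_map))  # [1 2]: the classes present in the scene
```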
2.2 Instance Segmentation
Task: not only segment objects, but also tell individual instances apart.
Output: one mask per instance.
2.3 Panoptic Segmentation
Task: the unification of semantic and instance segmentation.
A comparison of the three:
Original image: two cats on a lawn
Semantic segmentation: all cat pixels = cat, lawn pixels = grass
Instance segmentation: mask for cat A, mask for cat B, lawn pixels = grass
Panoptic segmentation: mask for cat A, mask for cat B, lawn pixels = grass (stuff)
3. Traditional Segmentation Methods (Before Deep Learning)
Before deep learning took over, image segmentation relied on hand-crafted features and classical algorithms. These methods remain useful in certain scenarios today.
3.1 Thresholding
The simplest segmentation method: partition pixels by gray level.
```python
import cv2
import numpy as np

# Read the image as grayscale
img = cv2.imread('coins.jpg', cv2.IMREAD_GRAYSCALE)

# Global threshold
ret, thresh1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Adaptive threshold (accounts for local brightness variation)
thresh2 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)

# Otsu's binarization (picks the optimal threshold automatically)
ret2, thresh3 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imshow('Original', img)
cv2.imshow('Global', thresh1)
cv2.imshow('Adaptive', thresh2)
cv2.imshow('Otsu', thresh3)
cv2.waitKey(0)
```
Best suited to: images with strong foreground/background contrast (e.g. document scans, cell counting).
3.2 Watershed
Treat the image as a topographic map, with gray level as elevation; water is simulated flooding upward from the low points, and segmentation boundaries form along the ridges.
```python
import cv2
import numpy as np

# Read the image and binarize it
img = cv2.imread('coins.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Remove noise
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

# Sure background region
sure_bg = cv2.dilate(opening, kernel, iterations=3)

# Sure foreground region
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)

# Unknown region (neither sure foreground nor sure background)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

# Label connected components
ret, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1          # shift so sure background is 1, not 0
markers[unknown == 255] = 0    # mark the unknown region with 0

# Apply the watershed
markers = cv2.watershed(img, markers)
img[markers == -1] = [0, 0, 255]  # draw boundaries in red

cv2.imshow('Watershed', img)
cv2.waitKey(0)
```
Best suited to: separating objects that touch each other (e.g. coins, cells).
3.3 Region Growing
Start from a seed point and iteratively merge similar neighboring pixels.
```python
from collections import deque

import numpy as np

def region_growing(img, seed, threshold=10):
    """Simple region growing on a grayscale image (BFS over the 4-neighborhood)."""
    h, w = img.shape
    segmented = np.zeros((h, w), dtype=np.uint8)
    visited = np.zeros((h, w), dtype=bool)
    seed_value = img[seed]
    queue = deque([seed])  # deque gives O(1) pops; a list's pop(0) is O(n)
    visited[seed] = True

    while queue:
        x, y = queue.popleft()
        segmented[x, y] = 255
        # Examine the 4-neighborhood
        for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            nx, ny = x + dx, y + dy
            if 0 <= nx < h and 0 <= ny < w and not visited[nx, ny]:
                if abs(int(img[nx, ny]) - int(seed_value)) < threshold:
                    visited[nx, ny] = True
                    queue.append((nx, ny))

    return segmented
```
Best suited to: interactive segmentation, and extracting a specific organ from medical images.
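As a quick sanity check, the function above can be run on a synthetic image containing one bright square (the image, seed, and threshold are made up for illustration; the function body is repeated so the snippet runs on its own):

```python
from collections import deque

import numpy as np

def region_growing(img, seed, threshold=10):
    """Same algorithm as above: BFS from the seed over similar 4-neighbors."""
    h, w = img.shape
    segmented = np.zeros((h, w), dtype=np.uint8)
    visited = np.zeros((h, w), dtype=bool)
    seed_value = img[seed]
    queue = deque([seed])
    visited[seed] = True
    while queue:
        x, y = queue.popleft()
        segmented[x, y] = 255
        for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            nx, ny = x + dx, y + dy
            if 0 <= nx < h and 0 <= ny < w and not visited[nx, ny]:
                if abs(int(img[nx, ny]) - int(seed_value)) < threshold:
                    visited[nx, ny] = True
                    queue.append((nx, ny))
    return segmented

# A 20x20 dark image with a 10x10 bright square
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 5:15] = 200

mask = region_growing(img, seed=(10, 10), threshold=10)
print((mask == 255).sum())  # 100: exactly the bright square, nothing else
```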
4. The Deep Learning Revolution in Image Segmentation
The Fully Convolutional Network (FCN), introduced in 2015, marked segmentation's entry into the deep learning era.
4.1 Fully Convolutional Networks (FCN): End-to-End Pixel Prediction
Core idea: replace the final fully connected layers of a classification network with convolutions, so the output is a "heat map" matching the input's spatial size.
Key innovations:
Upsampling: transposed convolution (a.k.a. deconvolution) restores the spatial resolution
Skip connections: fuse deep semantic information with shallow detail information
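The earlier claim that a fully connected classifier can be reinterpreted as a 1×1 convolution is easy to verify with NumPy (the shapes and weights below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out, H, W = 8, 5, 4, 4
feat = rng.standard_normal((C_in, H, W))   # a conv feature map
W_fc = rng.standard_normal((C_out, C_in))  # weights of a "fully connected" classifier

# Apply the FC layer independently at every spatial position...
fc_everywhere = np.empty((C_out, H, W))
for i in range(H):
    for j in range(W):
        fc_everywhere[:, i, j] = W_fc @ feat[:, i, j]

# ...which is exactly a 1x1 convolution with the same weights:
conv1x1 = np.einsum('oc,chw->ohw', W_fc, feat)

assert np.allclose(fc_everywhere, conv1x1)
print(conv1x1.shape)  # (5, 4, 4): a coarse per-class "heat map"
```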
```python
import torch
import torch.nn as nn

class FCN8s(nn.Module):
    """Simplified FCN-8s."""
    def __init__(self, num_classes):
        super().__init__()
        # Encoder (the first VGG blocks)
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # ... more layers

        # Upsampling and skip connections. This simplified encoder downsamples
        # 8x in total, so a 2x followed by a 4x upsampling restores full size.
        self.score_pool3 = nn.Conv2d(256, num_classes, 1)
        self.upscore2 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.upscore4 = nn.ConvTranspose2d(num_classes, num_classes, 8, stride=4, padding=2)

    def forward(self, x):
        h = self.conv1(x)
        h = self.conv2(h)
        h = self.conv3(h)
        # Upsample with skip connections
        score = self.score_pool3(h)
        upscore = self.upscore2(score)
        # ... fuse more layers
        out = self.upscore4(upscore)
        return out
```
4.2 U-Net: The King of Medical Image Segmentation
Architecture: a symmetric encoder-decoder with skip connections, which lets the network exploit deep semantics and shallow detail at the same time.
Why does U-Net do so well on medical images?
Medical images often have blurry boundaries, so detail information is essential
Datasets are small, and U-Net's moderate parameter count makes it hard to overfit
```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """(Conv => BN => ReLU) * 2"""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)

class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        # Encoder
        self.enc1 = DoubleConv(in_channels, 64)
        self.pool1 = nn.MaxPool2d(2)
        self.enc2 = DoubleConv(64, 128)
        self.pool2 = nn.MaxPool2d(2)
        self.enc3 = DoubleConv(128, 256)
        self.pool3 = nn.MaxPool2d(2)
        self.enc4 = DoubleConv(256, 512)
        self.pool4 = nn.MaxPool2d(2)

        # Bottleneck
        self.bottleneck = DoubleConv(512, 1024)

        # Decoder
        self.up4 = nn.ConvTranspose2d(1024, 512, 2, stride=2)
        self.dec4 = DoubleConv(1024, 512)
        self.up3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.dec3 = DoubleConv(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = DoubleConv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = DoubleConv(128, 64)

        self.out = nn.Conv2d(64, out_channels, 1)

    def forward(self, x):
        # Encode
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool1(e1))
        e3 = self.enc3(self.pool2(e2))
        e4 = self.enc4(self.pool3(e3))

        # Bottleneck
        b = self.bottleneck(self.pool4(e4))

        # Decode + skip connections
        d4 = self.up4(b)
        d4 = torch.cat([d4, e4], dim=1)
        d4 = self.dec4(d4)
        d3 = self.up3(d4)
        d3 = torch.cat([d3, e3], dim=1)
        d3 = self.dec3(d3)
        d2 = self.up2(d3)
        d2 = torch.cat([d2, e2], dim=1)
        d2 = self.dec2(d2)
        d1 = self.up1(d2)
        d1 = torch.cat([d1, e1], dim=1)
        d1 = self.dec1(d1)

        return self.out(d1)
```
4.3 The DeepLab Family: The Power of Atrous Convolution
DeepLab uses atrous (dilated) convolution to enlarge the receptive field while preserving feature-map resolution, avoiding the information loss that downsampling causes.
DeepLab v3+ combines Atrous Spatial Pyramid Pooling (ASPP) with an encoder-decoder structure and is still one of the strongest semantic segmentation architectures.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 1)
        self.conv2 = nn.Conv2d(in_ch, out_ch, 3, padding=rates[0], dilation=rates[0])
        self.conv3 = nn.Conv2d(in_ch, out_ch, 3, padding=rates[1], dilation=rates[1])
        self.conv4 = nn.Conv2d(in_ch, out_ch, 3, padding=rates[2], dilation=rates[2])
        # Image-level pooling branch
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1)
        )
        self.final = nn.Conv2d(out_ch * 5, out_ch, 1)

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x)
        x3 = self.conv3(x)
        x4 = self.conv4(x)
        # Upsample the pooled branch back to the input's spatial size
        # (a fixed scale_factor would only work for one input resolution)
        x5 = F.interpolate(self.pool(x), size=x.shape[2:],
                           mode='bilinear', align_corners=True)
        return self.final(torch.cat([x1, x2, x3, x4, x5], dim=1))
```
4.4 Mask R-CNN: The Benchmark for Instance Segmentation
Mask R-CNN = Faster R-CNN (object detection) + a parallel branch that predicts masks.
Core innovations:
RoIAlign: replaces RoIPool's coordinate rounding with bilinear sampling, preserving pixel-level alignment
Mask branch: a small FCN runs alongside the box/class heads and predicts a binary mask for each RoI
Decoupled mask and class prediction: one binary mask per class, with the classification head deciding which mask to report
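The essential trick in RoIAlign is sampling the feature map at non-integer coordinates via bilinear interpolation instead of rounding them (as RoIPool does). A minimal NumPy sketch of that sampling step:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a non-integer (y, x) position."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] +
            (1 - wy) * wx       * feat[y0, x1] +
            wy       * (1 - wx) * feat[y1, x0] +
            wy       * wx       * feat[y1, x1])

feat = np.arange(16, dtype=np.float64).reshape(4, 4)
# Halfway between feat[1,1] = 5 and feat[1,2] = 6:
print(bilinear_sample(feat, 1.0, 1.5))  # 5.5
```

RoIPool would instead snap (1.0, 1.5) to an integer cell, losing sub-pixel alignment; that misalignment is what hurts mask quality.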
5. Hands-On Project: Cell Nucleus Segmentation with U-Net
We will use U-Net to segment cell nuclei in microscopy images.
5.1 Preparing the Dataset
We use Kaggle's 2018 Data Science Bowl dataset (nucleus segmentation).
```python
import os
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class NucleiDataset(Dataset):
    """Nucleus segmentation dataset."""
    def __init__(self, img_dir, mask_dir, transform=None):
        self.img_dir = img_dir
        self.mask_dir = mask_dir
        self.transform = transform
        self.images = sorted(os.listdir(img_dir))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_name = self.images[idx]
        img_path = os.path.join(self.img_dir, img_name)
        mask_path = os.path.join(self.mask_dir, img_name)  # assumes masks share filenames

        image = cv2.imread(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)

        # Normalize
        image = image.astype(np.float32) / 255.0
        mask = mask.astype(np.float32) / 255.0  # binary mask

        # To tensors (note: batching requires resizing images to a common size first)
        image = torch.from_numpy(image).permute(2, 0, 1)
        mask = torch.from_numpy(mask).unsqueeze(0)  # (1, H, W)
        return image, mask
```
5.2 Loss Function: Dice Loss + BCE
For medical image segmentation the Dice coefficient is the usual evaluation metric, and the corresponding loss is simply 1 - Dice.
```python
import torch
import torch.nn as nn

class DiceBCELoss(nn.Module):
    """Dice loss + binary cross-entropy."""
    def __init__(self, smooth=1e-6):
        super().__init__()
        self.smooth = smooth
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, pred, target):
        # BCEWithLogitsLoss expects raw logits, so compute it BEFORE the sigmoid
        # (applying sigmoid first would make it sigmoid twice)
        bce_loss = self.bce(pred, target)

        # Dice loss is computed on probabilities
        pred = torch.sigmoid(pred)
        intersection = (pred * target).sum(dim=(2, 3))
        union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
        dice = (2.0 * intersection + self.smooth) / (union + self.smooth)
        dice_loss = 1 - dice.mean()

        return dice_loss + bce_loss
```
5.3 Training Loop
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = UNet(in_channels=3, out_channels=1).to(device)
criterion = DiceBCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Data loading
train_loader = DataLoader(NucleiDataset('train/images', 'train/masks'),
                          batch_size=4, shuffle=True, num_workers=2)

for epoch in range(20):
    model.train()
    total_loss = 0
    for images, masks in train_loader:
        images, masks = images.to(device), masks.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, masks)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch+1}: Loss = {total_loss/len(train_loader):.4f}")
```
5.4 Prediction and Visualization
```python
import matplotlib.pyplot as plt

model.eval()
with torch.no_grad():
    # val_loader is built like train_loader, from the validation split
    for i, (images, masks) in enumerate(val_loader):
        images = images.to(device)
        outputs = model(images)
        preds = torch.sigmoid(outputs).cpu().numpy()

        # Binarize at a 0.5 threshold
        preds = (preds > 0.5).astype(np.uint8)

        # Show input, ground truth, and prediction side by side
        fig, axes = plt.subplots(1, 3)
        axes[0].imshow(images[0].permute(1, 2, 0).cpu())
        axes[0].set_title('Input')
        axes[1].imshow(masks[0][0], cmap='gray')
        axes[1].set_title('Ground Truth')
        axes[2].imshow(preds[0][0], cmap='gray')
        axes[2].set_title('Prediction')
        plt.show()

        if i > 2:
            break
```
6. Evaluation Metrics
6.1 IoU (Intersection over Union)
IoU = (predicted mask ∩ ground-truth mask) / (predicted mask ∪ ground-truth mask)
```python
def iou_score(pred, target):
    """IoU for binary masks (boolean arrays)."""
    intersection = (pred & target).sum()
    union = (pred | target).sum()
    if union == 0:
        return 1.0  # both masks empty: a perfect match by convention
    return intersection / union
```
6.2 The Dice Coefficient
Dice = 2 |A ∩ B| / (|A| + |B|)
It is equivalent to the F1 score and is the more common choice in medical imaging.
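The Dice-F1 equivalence is easy to verify numerically on a toy pair of binary masks (the mask values below are invented):

```python
import numpy as np

pred   = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
target = np.array([1, 1, 0, 0, 1, 0, 1, 0], dtype=bool)

tp = (pred & target).sum()                       # true positives: 3
dice = 2 * tp / (pred.sum() + target.sum())      # 2|A ∩ B| / (|A| + |B|)

precision = tp / pred.sum()                      # TP / (TP + FP)
recall = tp / target.sum()                       # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(dice, f1)  # 0.75 0.75: identical
```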
6.3 Pixel Accuracy
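Pixel accuracy (PA) is simply the fraction of pixels labeled correctly. A toy example (values invented) shows how class imbalance inflates it:

```python
import numpy as np

# A 10x10 scene: 95 background pixels (False) and a 5-pixel object (True)
target = np.zeros((10, 10), dtype=bool)
target[0, :5] = True

# A useless model that predicts "background" everywhere
pred = np.zeros((10, 10), dtype=bool)

pixel_accuracy = (pred == target).mean()             # 0.95: looks great
iou = (pred & target).sum() / (pred | target).sum()  # 0.0: reveals the failure

print(pixel_accuracy, iou)
```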
Caution: under class imbalance (e.g. background covering 95% of the pixels), pixel accuracy can be deceptively high, so it should always be read alongside IoU.

7. Common Datasets
- PASCAL VOC 2012
- MS COCO
- Cityscapes
- ADE20K
- KITTI
- ISIC
Summary: From Pixels to Understanding
Image segmentation is the key step that takes computer vision from "seeing" to "understanding." It gives every pixel a meaning, letting machines truly grasp how a scene is put together.
Today we went from traditional methods to deep learning, and from theory to practice, training a U-Net with our own hands. These techniques are widely deployed in:
Autonomous driving: segmenting roads, vehicles, pedestrians
Medical imaging: tumor segmentation, organ extraction
Satellite remote sensing: land-cover classification
Industrial inspection: defect detection