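Have you ever given much thought to the health of the disks in your Linux machines? My guess is that most readers haven't. When a disk fails, it can feel like a bomb going off. That's an exaggeration, of course, but it's a good excuse to share a command for monitoring disk health: smartctl. This post walks through its basic usage so that a failing disk is less likely to catch you by surprise.

smartctl is the core tool of the smartmontools package and is used to read and manage a disk's SMART information. SMART (Self-Monitoring, Analysis and Reporting Technology) is built into modern drives: it continuously monitors parameters such as temperature, reallocated sector count, and seek error rate, and raises a warning when it detects a potential problem. Because it can give early warning of many failures, it is an essential part of health monitoring for servers and any system that stores important data.

SMART tracks many attributes, and the ones worth watching most closely are labeled slightly differently from vendor to vendor.

Install smartmontools with your distribution's package manager. On Debian/Ubuntu:

    sudo apt update
    sudo apt install smartmontools

On Fedora/RHEL:

    sudo dnf install smartmontools

On Arch Linux:

    sudo pacman -S smartmontools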
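After installing, it can be handy to confirm that smartctl is actually on the PATH before a script relies on it. A minimal sketch, assuming a POSIX shell; the helper name have_cmd is made up for this example:

```shell
# Return 0 if the named command is available on PATH, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Usage (prints the version banner when smartctl is installed):
#   have_cmd smartctl && smartctl --version | head -n 1
```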
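The general syntax is:

    smartctl [OPTIONS] [DEVICE]

Commonly used options:

| Option | Description |
|---|---|
| -a, --all | Show all SMART information for the device |
| -H, --health | Show the overall health self-assessment result |
| -i, --info | Show device identity information (model, serial number, firmware, capacity) |
| -l, --log=TYPE | Show a device log of the given TYPE, such as error or selftest |
| -c, --capabilities | Show the device's SMART capabilities |
| -t, --test=TEST | Start a self-test: short, long, or conveyance |
| -X, --abort | Abort a self-test that is in progress |
| -s, --smart=VALUE | Enable or disable SMART on the device (on / off) |
| -o, --offlineauto=VALUE | Enable or disable automatic offline testing (on / off) |
| -n, --nocheck=POWERMODE | Skip the check if the device is in the given low-power mode (for example standby), so a sleeping disk is not spun up |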
Device information, as printed by smartctl -i /dev/sda, looks like this:

    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Caviar Black
    Device Model:     WDC WD1002FAEX-00Y9A0
    Serial Number:    WD-WMC300000000
    LU WWN Device Id: 5 0014ee 200000000
    Firmware Version: 05.01D05
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ATA8-ACS (minor revision not indicated)
    SATA Version is:  SATA 3.0, 6.0 Gb/s
    Local Time is:    Tue Mar 11 10:30:00 CST 2025
    SMART support is: Available - device has SMART capability
    SMART support is: Enabled
The full SMART data, as printed by smartctl -a /dev/sda, adds the health result and the attribute table:

    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Caviar Black
    Device Model:     WDC WD1002FAEX-00Y9A0
    Serial Number:    WD-WMC300000000
    Firmware Version: 05.01D05
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x0027   100   253   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       32765
     10 Spin_Retry_Count        0x0028   100   100   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1234
    187 Reported_Uncorrect      0x0022   200   200   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   200   200   000    Old_age   Always       -       0
    194 Temperature_Celsius     0x0022   108   108   000    Old_age   Always       -       38
    196 Reallocation_Event_Ct   0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0022   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0028   200   200   000    Old_age   Always       -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
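The attribute table is whitespace-separated, so individual values can be pulled out with awk. The sketch below assumes the ten-column ATA layout shown in the sample output (ID# through RAW_VALUE); the function names are made up for this example, and drives that print a different layout would need a different parser.

```shell
# Print the RAW_VALUE of the attribute with the given ID.
# Reads `smartctl -A` (or `-a`) output on stdin.
attr_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}

# Print the names of attributes whose normalized VALUE is at or below a
# non-zero THRESH, which is the condition smartctl treats as failed.
attrs_at_threshold() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $2 }'
}

# Usage:
#   smartctl -A /dev/sda | attr_raw 194          # temperature raw value
#   smartctl -A /dev/sda | attrs_at_threshold    # empty output means none
```

A threshold of 000 means the vendor defined no failure threshold for that attribute, which is why such rows are skipped.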
For a quick health check, smartctl -H /dev/sda prints only the overall result:

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
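⚠️ Note: if the result is FAILED, the disk already has a serious problem and you should back up your data immediately!

SMART can be switched on or off per device:

    # Enable SMART
    smartctl -s on /dev/sda
    # Disable SMART
    smartctl -s off /dev/sda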
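Scripts do not have to parse the PASSED/FAILED text: smartctl also reports problems through its exit status, which is a bitmask documented in the smartctl man page. The sketch below decodes that bitmask; the function name and the wording of the messages are my own paraphrase, not smartctl output.

```shell
# Print one line per bit set in a smartctl exit status.
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed or device is in a low-power mode" \
        "a SMART or other ATA command failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "some attributes were at or below threshold in the past" \
        "device error log contains records" \
        "device self-test log contains errors"
    do
        # Test bit number $bit of the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Usage:
#   smartctl -H /dev/sda >/dev/null; explain_smartctl_status $?
```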
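Run a short self-test:

    smartctl -t short /dev/sda

Run a long (extended) self-test:

    smartctl -t long /dev/sda

The long test takes anywhere from about 30 minutes to several hours, depending on the disk's capacity, so be patient.

Run a conveyance self-test (ATA only, intended to detect damage incurred in transport):

    smartctl -t conveyance /dev/sda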
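Rather than guessing how long a self-test will take, you can read the drive's own estimate from the capabilities output of smartctl -c. The sketch below assumes the usual ATA layout, in which a line such as "Extended self-test routine" is followed by a "recommended polling time: ( N) minutes." line; the exact wording may differ between smartmontools versions and device types, and the function name is made up for this example.

```shell
# Print the recommended polling time in minutes for a self-test kind
# (Short, Extended or Conveyance). Reads `smartctl -c` output on stdin.
polling_minutes() {
    awk -v kind="$1" '
        # Remember that the next line belongs to the requested test kind.
        index($0, kind " self-test routine") == 1 { want = 1; next }
        # Strip everything except digits from the polling-time line.
        want && /recommended polling time/ {
            gsub(/[^0-9]/, "")
            print
            exit
        }
        { want = 0 }
    '
}

# Usage:
#   smartctl -c /dev/sda | polling_minutes Extended
```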
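View the self-test log:

    smartctl -l selftest /dev/sda

View the error log:

    smartctl -l error /dev/sda

On a disk with no recorded errors, the error log looks like this:

    SMART Error Log Version: 1
    No Errors Logged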
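In a script, the clean-log case can be detected by looking for the "No Errors Logged" line shown in the sample output. This is only a sketch: it assumes that exact phrase is printed, and the function name is made up for this example.

```shell
# Return 0 if the error log on stdin reports no errors, 1 otherwise.
# Reads `smartctl -l error` output.
error_log_is_clean() {
    grep -q "No Errors Logged"
}

# Usage:
#   smartctl -l error /dev/sda | error_log_is_clean || echo "errors logged on /dev/sda"
```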
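If errors have been logged, the output shows the details of each one, which helps in diagnosing the problem.

smartctl also works with NVMe solid-state drives:

    # Show NVMe device information
    smartctl -i /dev/nvme0n1
    # Show NVMe health status
    smartctl -H /dev/nvme0n1
    # Show the NVMe error log
    smartctl -l error /dev/nvme0n1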
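NVMe drives do not report the numbered ATA attribute table; smartctl -A prints "Name: value" health fields for them instead, such as Percentage Used and Available Spare. The sketch below extracts one field by name. The field names are assumptions based on typical smartctl NVMe output, and the function name is made up for this example.

```shell
# Print the value of a named NVMe health field.
# Reads `smartctl -A /dev/nvme0n1` output on stdin.
nvme_field() {
    awk -F': *' -v name="$1" '$1 == name { print $2; exit }'
}

# Usage:
#   smartctl -A /dev/nvme0n1 | nvme_field "Percentage Used"
```

Matching on the whole field name keeps "Available Spare" from also matching "Available Spare Threshold".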
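A quick reference of the most common commands:

    # 1. Show basic device information
    smartctl -i /dev/sda
    # 2. Show health status (the most commonly used)
    smartctl -H /dev/sda
    # 3. Show the full SMART data
    smartctl -a /dev/sda
    # 4. Run a short self-test
    smartctl -t short /dev/sda
    # 5. Run a long self-test
    smartctl -t long /dev/sda
    # 6. Show the self-test log
    smartctl -l selftest /dev/sda
    # 7. Show the error log
    smartctl -l error /dev/sda
    # 8. Show device capabilities
    smartctl -c /dev/sda
    # 9. Enable SMART
    smartctl -s on /dev/sda
    # 10. Enable automatic offline testing (recommended)
    smartctl -o on /dev/sda
    # 11. Check every disk (handy in scripts); smartctl takes one device per call, so loop
    for dev in /dev/sd[a-z]; do smartctl -H "$dev"; done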
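A /dev/sd[a-z] glob misses NVMe devices and anything beyond /dev/sdz. As an alternative sketch, smartctl --scan lists the devices smartctl itself can see, one per line, with the device path in the first column; the helper below (a made-up name) keeps just that column and leaves device-type detection to smartctl.

```shell
# Print the device path from each line of `smartctl --scan` output on stdin.
scan_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

# Usage:
#   smartctl --scan | scan_devices | while read -r dev; do
#       echo "== $dev"
#       smartctl -H "$dev"
#   done
```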
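A small script that ties these commands together:

    #!/bin/bash
    # Check a disk's SMART health status and print key attributes
    DEVICE="/dev/sda"

    # Make sure SMART is enabled
    if ! smartctl -i "$DEVICE" | grep -Eq "SMART support is: +Enabled"; then
        echo "Error: SMART is not enabled on $DEVICE!"
        exit 1
    fi

    # Print the overall health status
    echo "=== Disk health status ==="
    smartctl -H "$DEVICE" | grep "SMART overall-health"

    # Print key SMART attributes; the pattern is anchored to the ID column
    # so that digits elsewhere on a line (flags, raw values) do not match
    printf '\n=== Key SMART attributes ===\n'
    smartctl -A "$DEVICE" | grep -E "^ *(5|9|193|194|197|198) "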
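For servers running important services, check disk status with smartctl regularly (for example, weekly), or combine it with a cron job to automate the monitoring.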
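One way the cron approach might look is sketched below. The script path, log file, and schedule are all hypothetical, and the function assumes the "overall-health" result line shown earlier in this post.

```shell
# Hypothetical weekly check, saved as e.g. /usr/local/sbin/smart-weekly.sh
# and scheduled from root's crontab (every Sunday at 03:00):
#   0 3 * * 0 /usr/local/sbin/smart-weekly.sh

# Print a one-line report for the device; return non-zero unless it PASSED.
report_health() {
    dev=$1
    result=$(smartctl -H "$dev" 2>/dev/null | awk -F': *' '/overall-health/ { print $2 }')
    echo "$(date '+%F %T') $dev ${result:-UNKNOWN}"
    [ "$result" = "PASSED" ]
}

# Example body for the script: append to a log, warn via syslog on failure.
#   report_health /dev/sda >> /var/log/smart-weekly.log ||
#       logger -p daemon.warning "SMART check did not pass on /dev/sda"
```

Returning a non-zero status for anything other than PASSED means an unreadable or missing device is also treated as a problem rather than silently ignored.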