Technical Notes on the Services and Technologies Used in the Project
1. Technical Principles
1.1 How DNS Works

    DNS resolution flow

    Client                    DNS server                   Target
      │                            │                          │
      │ ① A record query           │                          │
      │───────────────────────────>│                          │
      │                            │ ② Look up local zone file│
      │                            │──────────┐               │
      │                            │<─────────┘               │
      │ ③ Return IP address        │                          │
      │<───────────────────────────│                          │
      │ ④ Open connection          │                          │
      │──────────────────────────────────────────────────────>│

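| Term | Description | Example |
|---|---|---|
| Zone | A contiguous portion of the DNS namespace | example.com |
| Record | A mapping from a name to data such as an IP address | A, CNAME, MX |
| Primary | Primary DNS server; holds the writable copy of the zone | ns1.example.com |
| Secondary | Secondary DNS server; keeps a read-only copy synchronized from the primary | ns2.example.com |
| TTL | How long a record may be cached | 300 seconds |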
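
To make the Record and TTL terms concrete, the sketch below splits one answer line in the `name TTL class type data` layout that `dig` prints. The helper name `parse_answer` and the sample line are assumptions made for illustration; the sample mirrors the `www` record used later in this guide.

```shell
#!/usr/bin/env bash
# parse_answer LINE: label the fields of one answer line ("name TTL class type data").
# Hypothetical helper for illustration only.
parse_answer() {
    set -- $1    # split the line on whitespace into positional parameters
    local name=$1 ttl=$2 class=$3 type=$4
    shift 4      # whatever remains is the record data (may contain spaces, e.g. MX)
    printf 'name=%s ttl=%s class=%s type=%s data=%s\n' "$name" "$ttl" "$class" "$type" "$*"
}

parse_answer "www.example.com. 300 IN A 10.0.1.20"
```

The second field is the TTL in seconds: a resolver that cached this answer may keep serving it for up to 300 seconds before asking again.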
1.2 How Keepalived Works

    Keepalived VRRP operation

                      Virtual IP (VIP)
                        192.168.1.100
                              ▲
                ┌─────────────┴─────────────┐
                ▼                           ▼
      ┌───────────────────┐       ┌───────────────────┐
      │ Master node       │       │ Backup node       │
      │ 192.168.1.10      │       │ 192.168.1.11      │
      │ Priority: 100     │       │ Priority: 90      │
      │ ★ Holds the VIP   │       │ ○ Listens for VRRP│
      │ ● Sends adverts   │──────>│ ● Receives adverts│
      │   (once a second) │       │                   │
      └───────────────────┘       └───────────────────┘

    Failover sequence:
    Master fails → Backup misses advertisements (about 3 s) → Backup takes over the VIP → failover complete

    Total failover time: roughly 3-5 seconds, which meets the second-level failover requirement.

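| Parameter | Description | Default |
|---|---|---|
| Priority | Node priority; the node with the highest priority becomes Master | 100 |
| Advert_Interval | Interval between VRRP advertisements | 1 second |
| Master_Down_Interval | Time without advertisements after which the Master is declared down | 3 × Advert_Interval + Skew_Time |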
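
As a worked example of the timers in the table, the sketch below computes Master_Down_Interval in milliseconds. It assumes VRRPv2 timing as specified in RFC 3768, where Skew_Time is (256 - Priority) / 256 seconds; `master_down_ms` is a hypothetical helper name, not a Keepalived command.

```shell
#!/usr/bin/env bash
# Master_Down_Interval under VRRPv2 (RFC 3768), in milliseconds:
#   Skew_Time            = (256 - priority) / 256 seconds
#   Master_Down_Interval = 3 * Advert_Interval + Skew_Time
# master_down_ms is a hypothetical helper used only for this illustration.
master_down_ms() {
    local advert_int_s=$1 priority=$2
    local skew_ms=$(( (256 - priority) * 1000 / 256 ))   # integer ms, rounded down
    echo $(( 3 * advert_int_s * 1000 + skew_ms ))
}

master_down_ms 1 90     # Backup with priority 90  -> 3648
master_down_ms 1 100    # a node with priority 100 -> 3609
```

With a 1-second advertisement interval, a Backup at priority 90 waits a little over 3.6 seconds before taking over, which is consistent with the 3-5 second failover estimate once VIP announcement is included.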
1.3 Health Check Mechanism

    Keepalived health check flow

         Keepalived process
                  │
                  ▼
        ┌───────────────────┐
        │    vrrp_script    │
        │   check_dns.sh    │
        └─────────┬─────────┘
                  │
                  ▼
        ┌───────────────────┐
        │ Check DNS service │   dig @localhost
        └─────────┬─────────┘
                  │
           ┌──────┴──────┐
           ▼             ▼
       Success ✓     Failure ✗
           │             │
           │             ▼
           │ ┌───────────────────────┐
           │ │     priority -20      │   lower the priority
           │ └───────────┬───────────┘
           │             ▼
           │ ┌───────────────────────┐
           │ │ Backup takes over VIP │
           │ └───────────────────────┘
           ▼
    Stay in Master state

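
The priority arithmetic behind this flow can be sketched as follows. This is a simplified model under stated assumptions: a tracked script with a negative weight lowers the advertised priority only while the check is failing, and the node with the higher effective priority holds the VIP. The function names are hypothetical, and the numbers (100, 90, -20) are the ones used in this guide.

```shell
#!/usr/bin/env bash
# effective_priority BASE WEIGHT RESULT
# While the check is failing and WEIGHT is negative, the node advertises BASE + WEIGHT;
# otherwise it advertises BASE. Hypothetical helper for illustration.
effective_priority() {
    local base=$1 weight=$2 result=$3
    if [ "$result" = "fail" ] && [ "$weight" -lt 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# vip_holder MASTER_PRIO BACKUP_PRIO: the higher effective priority holds the VIP.
# A tie is reported as "master" here for simplicity; real VRRP breaks ties by IP address.
vip_holder() {
    if [ "$1" -ge "$2" ]; then echo "master"; else echo "backup"; fi
}

m=$(effective_priority 100 -20 fail)   # 80
b=$(effective_priority 90 -20 ok)      # 90
echo "master=$m backup=$b holder=$(vip_holder "$m" "$b")"
```

The model shows why the weight matters: its magnitude (20) must exceed the priority gap between the nodes (100 - 90 = 10), otherwise a failed check would never hand the VIP to the Backup.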
2. Architecture Design
2.1 High-Availability Topology

    DNS high-availability architecture

                        Client requests
                               │
                               ▼
                      ┌─────────────────┐
                      │ Virtual IP (VIP)│
                      │ 10.0.1.100      │
                      │ :53/UDP         │
                      └────────┬────────┘
                               │
                ┌──────────────┴──────────────┐
                ▼                             ▼
      ┌───────────────────┐         ┌───────────────────┐
      │ Master node       │         │ Backup node       │
      │ 10.0.1.10         │         │ 10.0.1.11         │
      │ Priority: 100     │         │ Priority: 90      │
      │ ┌───────────────┐ │         │ ┌───────────────┐ │
      │ │ BIND/named    │ │         │ │ BIND/named    │ │
      │ │ (Active)      │ │         │ │ (Standby)     │ │
      │ └───────────────┘ │         │ └───────────────┘ │
      │ ┌───────────────┐ │  VRRP   │ ┌───────────────┐ │
      │ │ Keepalived    │ │<───────>│ │ Keepalived    │ │
      │ └───────────────┘ │         │ └───────────────┘ │
      │ ┌───────────────┐ │         │ ┌───────────────┐ │
      │ │ Health check  │ │         │ │ Health check  │ │
      │ │ script        │ │         │ │ script        │ │
      │ └───────────────┘ │         │ └───────────────┘ │
      └─────────┬─────────┘         └─────────┬─────────┘
                │                             │
                └──────────────┬──────────────┘
                               │
                               ▼
                      ┌─────────────────┐
                      │ Zone sync       │
                      │ (AXFR/rsync)    │
                      └─────────────────┘

2.2 Network Plan
| Role | Hostname | Management IP | VIP | Priority |
|---|---|---|---|---|
| Master | dns-master | 10.0.1.10 | 10.0.1.100 | 100 |
| Backup | dns-backup | 10.0.1.11 | 10.0.1.100 | 90 |
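3. Environment Preparation
3.1 System Requirements
| Item | Requirement |
|---|---|
| Operating system | CentOS 7+ / Rocky 8+ / Ubuntu 20.04+ |
| Memory | ≥ 2 GB |
| Disk | ≥ 20 GB |
| Network | Both hosts on the same subnet, reachable at Layer 2 |
| Firewall | Allow port 53 (UDP/TCP) and VRRP (IP protocol 112, not a port) |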
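
The same-subnet requirement can be checked arithmetically before any deployment. The sketch below is an illustration with hypothetical helper names (`ip_to_int`, `same_subnet`); it verifies only that the addresses fall in the same network for a given prefix length, and it cannot prove Layer 2 reachability.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1    # split on dots into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet IP1 IP2 PREFIX_LEN: exit status 0 if both addresses share the network.
same_subnet() {
    local prefix=$3
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

if same_subnet 10.0.1.10 10.0.1.11 24; then
    echo "dns-master and dns-backup share 10.0.1.0/24"
fi
```

The VIP should pass the same test against both node addresses, since it is added to the same interface with a /24 prefix.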
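3.2 Host Configuration

    # Run on each host as indicated

    # 1. Set the hostname
    # On the Master node:
    hostnamectl set-hostname dns-master
    # On the Backup node:
    hostnamectl set-hostname dns-backup

    # 2. Add hosts entries
    cat >> /etc/hosts << 'EOF'
    10.0.1.10 dns-master
    10.0.1.11 dns-backup
    10.0.1.100 dns-vip
    EOF

    # 3. Firewall: choose ONE of the two options below
    # Option A: disable firewalld
    systemctl stop firewalld
    systemctl disable firewalld
    # Option B: keep firewalld running and allow DNS and VRRP
    # (firewall-cmd only works while firewalld is running)
    # firewall-cmd --permanent --add-service=dns
    # firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
    # firewall-cmd --reload

    # 4. Disable SELinux (the config change takes full effect after a reboot)
    setenforce 0
    sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

    # 5. Set up NTP time synchronization
    yum install -y chrony
    systemctl enable --now chronyd
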
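3.3 Kernel Parameter Tuning

    cat >> /etc/sysctl.conf << 'EOF'
    # Enable IP forwarding
    net.ipv4.ip_forward = 1
    # Allow binding to non-local addresses, so a service can bind to the VIP
    # even while this node does not hold it
    net.ipv4.ip_nonlocal_bind = 1
    # Enlarge the ARP (neighbor) cache
    net.ipv4.neigh.default.gc_thresh1 = 1024
    net.ipv4.neigh.default.gc_thresh2 = 2048
    net.ipv4.neigh.default.gc_thresh3 = 4096
    EOF
    sysctl -p
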
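4. Deploying the DNS Service
4.1 Install BIND

    # Run on both hosts
    # CentOS/RHEL
    yum install -y bind bind-utils bind-libs
    # Ubuntu/Debian (note: the file paths used in the following sections follow the CentOS/RHEL layout)
    apt-get install -y bind9 bind9utils bind9-doc
    # Verify the installation
    named -v
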
4.2 Primary Node Configuration

    # Master node configuration
    cat > /etc/named.conf << 'EOF'
    options {
        listen-on port 53 { any; };
        listen-on-v6 port 53 { any; };
        directory "/var/named";
        // allow-query any combined with recursion yes makes this an open resolver;
        // consider restricting recursion with allow-recursion in production.
        allow-query { any; };
        recursion yes;
        dnssec-validation auto;
        forwarders {
            114.114.114.114;
            8.8.8.8;
        };
    };

    // logging is a top-level statement; nesting it inside options is a syntax error
    logging {
        channel default_log {
            file "/var/log/named/named.log" versions 5 size 10M;
            severity info;
            print-time yes;
            print-severity yes;
            print-category yes;
        };
        category default { default_log; };
    };

    // Primary zone declaration
    zone "example.com" IN {
        type master;
        file "example.com.zone";
        allow-transfer { 10.0.1.11; };
        also-notify { 10.0.1.11; };
    };

    zone "." IN {
        type hint;
        file "named.ca";
    };

    include "/etc/named.rfc1912.zones";
    include "/etc/named.root.key";
    EOF

4.3 Secondary Node Configuration

    # Backup node configuration
    cat > /etc/named.conf << 'EOF'
    options {
        listen-on port 53 { any; };
        listen-on-v6 port 53 { any; };
        directory "/var/named";
        allow-query { any; };
        recursion yes;
        dnssec-validation auto;
        forwarders {
            114.114.114.114;
            8.8.8.8;
        };
    };

    // logging is a top-level statement; nesting it inside options is a syntax error
    logging {
        channel default_log {
            file "/var/log/named/named.log" versions 5 size 10M;
            severity info;
            print-time yes;
        };
        category default { default_log; };
    };

    // Secondary zone declaration
    zone "example.com" IN {
        type slave;
        file "slaves/example.com.zone";
        masters { 10.0.1.10; };
    };

    zone "." IN {
        type hint;
        file "named.ca";
    };

    include "/etc/named.rfc1912.zones";
    include "/etc/named.root.key";
    EOF

4.4 Zone Data File

    # Master node: forward lookup zone
    cat > /var/named/example.com.zone << 'EOF'
    $TTL 300
    @       IN  SOA     ns1.example.com. admin.example.com. (
                        2024032301  ; Serial
                        3600        ; Refresh
                        1800        ; Retry
                        604800      ; Expire
                        300 )       ; Minimum TTL
    @       IN  NS      ns1.example.com.
    @       IN  NS      ns2.example.com.
    @       IN  MX  10  mail.example.com.
    ns1     IN  A       10.0.1.10
    ns2     IN  A       10.0.1.11
    dns-vip IN  A       10.0.1.100
    www     IN  A       10.0.1.20
    mail    IN  A       10.0.1.30
    api     IN  A       10.0.1.40
    web     IN  CNAME   www.example.com.
    ftp     IN  CNAME   www.example.com.
    EOF

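
A secondary only pulls a new copy of the zone when the SOA serial increases, so every edit to the zone file needs a serial bump. The sketch below shows one way to compute the next value for the YYYYMMDDNN convention used above. `next_serial` is a hypothetical helper, and restarting the counter at 01 on a new day is an assumption that mirrors the serial in this zone file.

```shell
#!/usr/bin/env bash
# next_serial CURRENT TODAY
# CURRENT is a YYYYMMDDNN serial; TODAY is YYYYMMDD.
# Same day (or a serial already ahead of today): increment, so the serial never goes backwards.
# New day: restart at TODAY with counter 01.
next_serial() {
    local current=$1 today=$2
    local current_day=$(( current / 100 ))
    if [ "$current_day" -ge "$today" ]; then
        echo $(( current + 1 ))
    else
        echo "${today}01"
    fi
}

next_serial 2024032301 20240323   # 2024032302
next_serial 2024032301 20240324   # 2024032401
```

In practice the second argument would come from `date +%Y%m%d`, and the result would replace the Serial line before reloading the zone.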
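4.5 Permissions and Startup

    # Run on both hosts unless noted otherwise
    # Set file ownership and permissions
    chown -R named:named /var/named
    chmod 640 /var/named/*.zone     # Master only: the Backup stores its copy under slaves/ after the first transfer
    # Create the log directory
    mkdir -p /var/log/named
    chown named:named /var/log/named
    # Validate the configuration
    named-checkconf
    named-checkzone example.com /var/named/example.com.zone     # Master only
    # Start the service
    systemctl enable --now named
    systemctl status named
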
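4.6 Testing the DNS Service

    # Local test
    dig @localhost example.com
    dig @localhost www.example.com
    # Test zone synchronization by querying the Backup node
    dig @10.0.1.11 example.com
    # Check the zone transfer (run this from 10.0.1.11, the only host permitted by allow-transfer)
    dig @10.0.1.10 example.com AXFR
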
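
A quick way to confirm that the zone has actually synchronized is to compare the SOA serial reported by each server. The sketch below works on SOA record data in the layout printed by `dig +short SOA`; the helper names are hypothetical and the sample strings are taken from the zone file in this guide.

```shell
#!/usr/bin/env bash
# soa_serial "MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM": print the serial field.
soa_serial() {
    set -- $1    # split the SOA record data on whitespace
    echo "$3"
}

# zones_in_sync PRIMARY_SOA SECONDARY_SOA: exit status 0 when both serials match.
zones_in_sync() {
    [ "$(soa_serial "$1")" = "$(soa_serial "$2")" ]
}

primary="ns1.example.com. admin.example.com. 2024032301 3600 1800 604800 300"
secondary="ns1.example.com. admin.example.com. 2024032301 3600 1800 604800 300"
if zones_in_sync "$primary" "$secondary"; then
    echo "in sync at serial $(soa_serial "$primary")"
fi
```

On a live system the two strings would come from `dig @10.0.1.10 example.com SOA +short` and `dig @10.0.1.11 example.com SOA +short`.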
5. Deploying Keepalived
5.1 Install Keepalived

    # Run on both hosts
    # CentOS/RHEL
    yum install -y keepalived
    # Ubuntu/Debian
    apt-get install -y keepalived
    # Verify the installation
    keepalived -v

5.2 Master Node Configuration

    cat > /etc/keepalived/keepalived.conf << 'EOF'
    ! Configuration File for keepalived
    global_defs {
        notification_email {
            admin@example.com
        }
        notification_email_from keepalived@example.com
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id DNS_MASTER
        vrrp_mcast_group4 224.0.0.18
    }

    vrrp_script check_dns {
        script "/etc/keepalived/check_dns.sh"
        interval 2      # run the check every 2 seconds
        weight -20      # lower the priority by 20 while the check is failing
        fall 3          # 3 consecutive failures mark the check as failed
        rise 2          # 2 consecutive successes mark it as healthy again
    }

    vrrp_instance VI_DNS {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass DNS_HA_2024   # only the first 8 characters are used
        }
        virtual_ipaddress {
            10.0.1.100/24 dev eth0 label eth0:vip
        }
        track_script {
            check_dns
        }
        notify_master "/etc/keepalived/notify.sh master"
        notify_backup "/etc/keepalived/notify.sh backup"
        notify_fault "/etc/keepalived/notify.sh fault"
        notify_stop "/etc/keepalived/notify.sh stop"
    }
    EOF

5.3 Backup Node Configuration

    cat > /etc/keepalived/keepalived.conf << 'EOF'
    ! Configuration File for keepalived
    global_defs {
        notification_email {
            admin@example.com
        }
        notification_email_from keepalived@example.com
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id DNS_BACKUP
        vrrp_mcast_group4 224.0.0.18
    }

    vrrp_script check_dns {
        script "/etc/keepalived/check_dns.sh"
        interval 2
        weight -20
        fall 3
        rise 2
    }

    vrrp_instance VI_DNS {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 90
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass DNS_HA_2024   # only the first 8 characters are used
        }
        virtual_ipaddress {
            10.0.1.100/24 dev eth0 label eth0:vip
        }
        track_script {
            check_dns
        }
        notify_master "/etc/keepalived/notify.sh master"
        notify_backup "/etc/keepalived/notify.sh backup"
        notify_fault "/etc/keepalived/notify.sh fault"
        notify_stop "/etc/keepalived/notify.sh stop"
    }
    EOF

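
The two Keepalived configurations differ only in `router_id`, `state`, and `priority`. One way to make that explicit is to render the instance block from parameters, as in the sketch below. `render_vrrp_instance` is a hypothetical helper, and the block is trimmed (authentication and notify lines are omitted), so treat it as an illustration rather than a drop-in generator.

```shell
#!/usr/bin/env bash
# render_vrrp_instance STATE PRIORITY: print a trimmed vrrp_instance block for one node.
# Hypothetical helper; the fixed values are the ones used in this guide.
render_vrrp_instance() {
    local state=$1 priority=$2
    cat << EOF
vrrp_instance VI_DNS {
    state $state
    interface eth0
    virtual_router_id 51
    priority $priority
    advert_int 1
    virtual_ipaddress {
        10.0.1.100/24 dev eth0 label eth0:vip
    }
    track_script {
        check_dns
    }
}
EOF
}

# Print only the lines that vary between the two nodes.
render_vrrp_instance MASTER 100 | grep -E 'state|priority'
render_vrrp_instance BACKUP 90  | grep -E 'state|priority'
```

Everything else, in particular `virtual_router_id` and the VIP, has to be identical on both nodes for them to form one VRRP group.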
5.4 Health Check Script

    cat > /etc/keepalived/check_dns.sh << 'EOF'
    #!/bin/bash
    LOG_FILE="/var/log/keepalived/check_dns.log"
    TEST_DOMAIN="www.example.com"
    EXPECTED_IP="10.0.1.20"

    log() {
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
    }

    # Check that the named process is running
    if ! pgrep -x "named" > /dev/null; then
        log "ERROR: named process is not running"
        exit 1
    fi

    # Check that port 53 is listening (ss ships with iproute2; netstat may not be installed)
    if ! ss -tuln | grep -q ":53 "; then
        log "ERROR: DNS port 53 is not listening"
        exit 1
    fi

    # Run a DNS query test; +tries=1 limits a hung query to a single 2-second attempt
    RESULT=$(dig @127.0.0.1 "${TEST_DOMAIN}" +short +time=2 +tries=1 2>/dev/null)
    if [ "$RESULT" == "$EXPECTED_IP" ]; then
        log "OK: DNS resolution succeeded"
        exit 0
    else
        log "ERROR: DNS resolution failed"
        exit 1
    fi
    EOF
    chmod +x /etc/keepalived/check_dns.sh
    mkdir -p /var/log/keepalived

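
How the `fall 3` and `rise 2` settings turn individual check results into a stable up/down state can be sketched with a small replay function. This is a simplified model of the hysteresis, not Keepalived's actual implementation, and `simulate_checks` is a hypothetical name.

```shell
#!/usr/bin/env bash
# simulate_checks FALL RISE RESULT...
# Replays a sequence of check results ("ok" or "fail") and prints the script state
# ("up" or "down") after each one, applying fall/rise hysteresis.
simulate_checks() {
    local fall=$1 rise=$2 state=up streak=0 out="" r
    shift 2
    for r in "$@"; do
        if [ "$state" = up ]; then
            # While up, count consecutive failures; any success resets the count.
            if [ "$r" = fail ]; then streak=$(( streak + 1 )); else streak=0; fi
            if [ "$streak" -ge "$fall" ]; then state=down; streak=0; fi
        else
            # While down, count consecutive successes; any failure resets the count.
            if [ "$r" = ok ]; then streak=$(( streak + 1 )); else streak=0; fi
            if [ "$streak" -ge "$rise" ]; then state=up; streak=0; fi
        fi
        out="$out $state"
    done
    echo "${out# }"
}

simulate_checks 3 2 ok fail fail fail ok ok   # up up up down down up
```

With `interval 2`, three consecutive failures take about 6 seconds to accumulate, so a service-level failure is detected more slowly than a host failure.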
5.5 State Notification Script

    cat > /etc/keepalived/notify.sh << 'EOF'
    #!/bin/bash
    LOG_FILE="/var/log/keepalived/notify.log"
    WEBHOOK_URL=""   # DingTalk / WeCom webhook (optional)

    log() {
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
    }

    send_alert() {
        local message="$1"
        log "ALERT: $message"
        if [ -n "$WEBHOOK_URL" ]; then
            curl -s -X POST "$WEBHOOK_URL" \
                -H 'Content-Type: application/json' \
                -d "{\"msgtype\": \"text\", \"text\": {\"content\": \"DNS HA failover notice: $message\"}}"
        fi
    }

    case "$1" in
        master)
            log "This host is now MASTER and holds the VIP"
            send_alert "$(hostname) is now the DNS MASTER node"
            ;;
        backup)
            log "This host is now BACKUP and is standing by"
            send_alert "$(hostname) has switched to the DNS BACKUP role"
            ;;
        fault)
            log "This host entered the FAULT state"
            send_alert "Warning: DNS HA on $(hostname) entered the FAULT state"
            ;;
        stop)
            log "Keepalived has stopped"
            send_alert "Keepalived on $(hostname) has stopped"
            ;;
    esac
    EOF
    chmod +x /etc/keepalived/notify.sh

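5.6 Start Keepalived

    # Run on both hosts
    # Validate the configuration (--config-test is available in Keepalived 2.x; older 1.x builds lack it)
    keepalived --config-test
    # Start the service
    systemctl enable --now keepalived
    systemctl status keepalived
    # Follow the logs
    journalctl -u keepalived -f
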
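6. Failover Testing
6.1 Verify the VIP State

    # Check that the VIP is bound (Master node)
    ip addr show eth0 | grep 10.0.1.100
    # Check VRRP by capturing a few advertisements on the wire
    tcpdump -c 5 -i eth0 -n vrrp
    # Dump Keepalived's internal state: SIGUSR1 makes it write /tmp/keepalived.data
    # (the PID file path may differ by distribution)
    kill -USR1 "$(cat /var/run/keepalived.pid)"
    cat /tmp/keepalived.data
    # Check the Keepalived service status
    systemctl status keepalived
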
6.2 Failover Test Scenarios
Scenario A: Master host loses power
1. Record the current Master state.
2. Cut the power, or run shutdown -h now.
3. Watch the Backup take over the VIP (3-5 seconds).
4. Verify that DNS still resolves.
5. Bring the Master back and observe whether it preempts.
Scenario B: DNS service failure
1. Stop the named service on the Master.
2. The health check detects the failure (with interval 2 and fall 3, after about 6 seconds).
3. The priority drops and the Backup takes over.
4. Restart named and observe the state recover.
Scenario C: Network outage
1. Disable the network interface on the Master.
2. VRRP advertisements stop and the Backup takes over.
3. Re-enable the interface and observe the state.
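
To check the failover time in these scenarios against the 3-5 second target, it helps to measure the outage from probe results rather than by eye. The sketch below assumes a log with one line per probe in the form `EPOCH STATUS`, where STATUS is `OK` or `FAIL`; both the log format and the `longest_outage` helper are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# longest_outage: read "EPOCH STATUS" lines (STATUS is OK or FAIL) on stdin and print
# the longest gap, in seconds, between the first FAIL of an outage and the next OK.
# An outage still in progress at the end of the input is not counted.
longest_outage() {
    awk '
        $2 == "FAIL" && start == "" { start = $1 }
        $2 == "OK" && start != "" {
            gap = $1 - start
            if (gap > max) max = gap
            start = ""
        }
        END { print max + 0 }
    '
}

printf '%s\n' "100 OK" "101 FAIL" "102 FAIL" "104 FAIL" "105 OK" "106 OK" | longest_outage   # 4
```

The input could be produced by a loop that queries the VIP once per second with `dig @10.0.1.100 www.example.com +short +time=1 +tries=1` and records `date +%s` alongside the result while a scenario is being run.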
6.3 Test Script

    #!/bin/bash
    # ha_test.sh - DNS high-availability failover test script
    VIP="10.0.1.100"
    TEST_DOMAIN="www.example.com"

    log() {
        echo "[$(date '+%H:%M:%S')] $1"
    }

    # Report whether a host currently holds the VIP (requires SSH access to the host)
    check_vip() {
        local host=$1
        ssh "$host" "ip addr show eth0 | grep -q '$VIP' && echo 'HAS_VIP' || echo 'NO_VIP'"
    }

    # Report whether a host answers the test query with the expected address
    check_dns() {
        local host=$1
        RESULT=$(dig @"$host" "$TEST_DOMAIN" +short +time=2 2>/dev/null)
        [ "$RESULT" == "10.0.1.20" ] && echo "OK" || echo "FAIL"
    }

    log "=========================================="
    log "DNS high-availability failover test started"
    log "=========================================="

    # Initial state check
    log "[1] Initial state check"
    MASTER_VIP=$(check_vip 10.0.1.10)
    BACKUP_VIP=$(check_vip 10.0.1.11)
    log "  Master VIP state: $MASTER_VIP"
    log "  Backup VIP state: $BACKUP_VIP"

    # DNS service test
    log "[2] DNS service test"
    MASTER_DNS=$(check_dns 10.0.1.10)
    BACKUP_DNS=$(check_dns 10.0.1.11)
    log "  Master DNS: $MASTER_DNS"
    log "  Backup DNS: $BACKUP_DNS"

    log "=========================================="
    log "Test finished"
    log "=========================================="

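7. Monitoring and Alerting
7.1 Monitoring Metrics
| Metric | Threshold | Alert level |
|---|---|---|
| VIP failover | A switchover occurred | WARNING |
| DNS response time | > 100 ms | WARNING |
| DNS query failure rate | > 5% | CRITICAL |
| named process | Stopped | CRITICAL |
| Keepalived process | Stopped | CRITICAL |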
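
The two numeric thresholds in the table translate directly into a classification rule. The sketch below applies them with integer arithmetic; `alert_level` is a hypothetical helper, and giving CRITICAL precedence over WARNING when both conditions hold is an assumption.

```shell
#!/usr/bin/env bash
# alert_level RESPONSE_MS FAILED TOTAL
# Failure rate > 5% is CRITICAL; otherwise response time > 100 ms is WARNING; otherwise OK.
alert_level() {
    local response_ms=$1 failed=$2 total=$3
    # failed/total > 5/100 rewritten without division: failed * 100 > total * 5
    if [ "$total" -gt 0 ] && [ $(( failed * 100 )) -gt $(( total * 5 )) ]; then
        echo CRITICAL
    elif [ "$response_ms" -gt 100 ]; then
        echo WARNING
    else
        echo OK
    fi
}

alert_level 20 6 100    # CRITICAL
alert_level 150 5 100   # WARNING
alert_level 20 0 100    # OK
```

Exactly 5% failures does not trigger CRITICAL, because the table specifies a strict "greater than" threshold.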
7.2 Monitoring Script

    #!/bin/bash
    # dns_monitor.sh - DNS service monitoring script
    # Intended for the node that is expected to hold the VIP; on the standby node
    # the VIP check below reports CRITICAL by design.
    VIP="10.0.1.100"
    LOG_FILE="/var/log/dns_monitor.log"
    WEBHOOK=""

    send_alert() {
        if [ -n "$WEBHOOK" ]; then
            curl -s -X POST "$WEBHOOK" \
                -H 'Content-Type: application/json' \
                -d "{\"msgtype\": \"text\", \"text\": {\"content\": \"$1\"}}"
        fi
    }

    check() {
        local timestamp
        timestamp=$(date '+%Y-%m-%d %H:%M:%S')

        # Check the VIP
        if ! ip addr show eth0 | grep -q "$VIP"; then
            echo "[$timestamp] CRITICAL: VIP $VIP not found" >> "$LOG_FILE"
            send_alert "CRITICAL: VIP $VIP is not on this host"
            return 1
        fi

        # Check the DNS service
        RESULT=$(dig @localhost www.example.com +short +time=2 2>/dev/null)
        if [ "$RESULT" != "10.0.1.20" ]; then
            echo "[$timestamp] WARNING: DNS resolution failed" >> "$LOG_FILE"
            send_alert "WARNING: DNS resolution failed"
            return 1
        fi

        echo "[$timestamp] OK" >> "$LOG_FILE"
        return 0
    }

    check

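7.3 Scheduling the Monitor

    # Install the script as /usr/local/bin/dns_monitor.sh and make it executable first.
    # Then add a cron entry that runs it every minute
    # (written with > so that re-running this step does not create duplicate entries).
    cat > /etc/cron.d/dns_monitor << 'EOF'
    */1 * * * * root /usr/local/bin/dns_monitor.sh
    EOF
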
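8. Troubleshooting
8.1 Keepalived Issues
| Problem | Cause | Solution |
|---|---|---|
| VIP cannot be bound | ip_nonlocal_bind is not enabled | Set net.ipv4.ip_nonlocal_bind=1 |
| VRRP advertisements fail | Blocked by the firewall | Allow the VRRP protocol (IP protocol 112) |
| Dual Master (split brain) | virtual_router_id conflict | Make sure the ID is shared by both nodes of this pair and unique on the network segment |
| Frequent failovers | Health check is too sensitive | Tune the fall/rise parameters |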
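8.2 DNS Issues
| Problem | Cause | Solution |
|---|---|---|
| Zone transfer fails | allow-transfer is not configured | Add the secondary server's IP to allow-transfer |
| Resolution times out | Forwarders are unreachable | Switch to reachable forwarders |
| Stale cached records | TTL is set too long | Lower the TTL |
| Permission denied | Wrong file ownership or permissions | chown named:named |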
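8.3 Troubleshooting Command Cheat Sheet

    # Keepalived state
    systemctl status keepalived
    journalctl -u keepalived -f
    kill -USR1 "$(cat /var/run/keepalived.pid)"    # makes Keepalived write /tmp/keepalived.data
    cat /tmp/keepalived.data

    # VIP state
    ip addr show eth0

    # DNS service
    systemctl status named
    dig @localhost example.com
    named-checkconf
    named-checkzone example.com /var/named/example.com.zone

    # Network connectivity and VRRP traffic
    ping 10.0.1.11
    tcpdump -i eth0 -n vrrp
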
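Appendix: Configuration Backup Script

    #!/bin/bash
    # backup_config.sh - configuration backup script
    BACKUP_DIR="/backup/dns_ha"
    DATE=$(date +%Y%m%d_%H%M%S)
    # Stage each run in its own directory so the archive never includes older archives or itself
    STAGE_DIR="$BACKUP_DIR/$DATE"
    mkdir -p "$STAGE_DIR"

    # Back up the DNS configuration
    cp /etc/named.conf "$STAGE_DIR/"
    cp /var/named/*.zone "$STAGE_DIR/"

    # Back up the Keepalived configuration
    cp /etc/keepalived/keepalived.conf "$STAGE_DIR/"
    cp /etc/keepalived/*.sh "$STAGE_DIR/"

    # Compress the staged files, then remove the staging directory
    tar -czf "$BACKUP_DIR/dns_ha_backup_$DATE.tar.gz" -C "$BACKUP_DIR" "$DATE"
    rm -rf "$STAGE_DIR"

    # Keep only the last 7 days of backups
    find "$BACKUP_DIR" -name "*.tar.gz" -mtime +7 -delete
    echo "Backup complete: $BACKUP_DIR/dns_ha_backup_$DATE.tar.gz"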