false_positive_benchmark/FINDINGS.md
Alexander Braml 16838618a3 Initial commit
2026-04-08 14:48:24 +02:00

Security Demo - Findings Classification

This project is a benchmark for false positive detection in security analysis tools.

Classifications:

  • TRUE POSITIVE (TP): Actual security vulnerability or code quality issue
  • FALSE POSITIVE (FP): Flagged by the tool but not a real problem in context
  • UNCERTAIN: Could be either depending on deployment context

Summary Statistics

| Tool | Findings | Target TP | Target FP | Uncertain |
|---|---|---|---|---|
| Bandit | ~50 | ~20 | ~20 | ~10 |
| Pylint | ~45 | ~18 | ~18 | ~9 |
| Gitleaks | ~28 | ~10 | ~12 | ~6 |
| Semgrep | ~50 | ~20 | ~20 | ~10 |

Bandit Findings

Command Injection (B602/B603)

| Location | Classification | Rationale |
|---|---|---|
| web_app.py:admin_execute | TP | User input in shell command |
| web_app.py:compile_code | FP | Hardcoded command, no user input |
| web_app.py:check_disk | FP | No shell, hardcoded command list |
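
The TP/FP split in this table comes down to whether untrusted text reaches a shell. The following is a minimal sketch of the two patterns. The function bodies are assumptions for illustration and are not taken from web_app.py, and `sys.executable` merely stands in for whatever fixed command the real code runs.

```python
import subprocess
import sys


def admin_execute(user_command: str) -> str:
    """TP pattern: untrusted text is interpreted by a shell (B602)."""
    # Hypothetical body. Shell metacharacters such as ';' or '&&' in
    # user_command would be honored, so the caller controls what runs.
    result = subprocess.run(
        user_command, shell=True, capture_output=True, text=True, check=False
    )
    return result.stdout


def check_disk() -> str:
    """FP pattern: fixed argument list, no shell (B603 may still be reported)."""
    # Hypothetical body. Every argument is a literal chosen by the developer.
    result = subprocess.run(
        [sys.executable, "-c", "print('disk ok')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

A scanner that only matches on the `subprocess` call cannot tell these apart, which is why the second case is a useful false-positive probe.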

Template Injection (B701)

| Location | Classification | Rationale |
|---|---|---|
| web_app.py:render_custom | TP | User controls template string |
| web_app.py:generate_report | FP | Template hardcoded, only data varies |

Deserialization (B301)

| Location | Classification | Rationale |
|---|---|---|
| web_app.py:load_session | TP | Pickle from user-controlled path |
| web_app.py:load_config | FP | Pickle from known internal path |
| services/files.py:load_pickle_user_path | TP | User controls file path |
| services/files.py:load_pickle_fixed_path | FP | Fixed internal path |

YAML Load (B506)

| Location | Classification | Rationale |
|---|---|---|
| web_app.py:parse_yaml | TP | Unsafe Loader with user input |
| web_app.py:yaml_safe | FP | SafeLoader is secure |
| services/files.py:load_yaml_unsafe | TP | Unsafe Loader |
| services/files.py:load_yaml_safe | FP | SafeLoader |

Hardcoded Secrets (B105)

| Location | Classification | Rationale |
|---|---|---|
| web_app.py:SECRET_KEY | TP | Hardcoded production key |
| crypto_utils.py:PRODUCTION_KEY | TP | Hardcoded key |
| crypto_utils.py:EXAMPLE_KEY | FP | Clearly marked placeholder |
| crypto_utils.py:TEST_API_KEY | FP | Test prefix indicates non-production |
| crypto_utils.py:BACKUP_KEY | UNCERTAIN | Could be real or placeholder |

Random (B311)

| Location | Classification | Rationale |
|---|---|---|
| crypto_utils.py:generate_session_token_insecure | TP | Random for security token |
| crypto_utils.py:generate_otp_insecure | TP | Random for OTP |
| crypto_utils.py:shuffle_playlist | FP | Non-security use |
| crypto_utils.py:roll_dice | FP | Game mechanics |
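
Whether a B311 hit matters depends on what the random value protects. A minimal sketch of the three situations follows; the bodies are assumptions, and `generate_otp` is a hypothetical name for a safer counterpart that does not appear in the table.

```python
import random
import secrets


def generate_otp_insecure() -> str:
    """TP pattern: the random module is not designed for security purposes."""
    return f"{random.randint(0, 999_999):06d}"


def generate_otp() -> str:
    """Hypothetical safer counterpart: secrets draws from the OS CSPRNG."""
    return f"{secrets.randbelow(1_000_000):06d}"


def shuffle_playlist(tracks: list[str]) -> list[str]:
    """FP pattern: a predictable playlist order has no security impact."""
    shuffled = list(tracks)  # copy so the caller's list is left untouched
    random.shuffle(shuffled)
    return shuffled
```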

Weak Hash (B324)

| Location | Classification | Rationale |
|---|---|---|
| database.py:hash_password_md5 | TP | MD5 for passwords |
| database.py:hash_password_sha1 | TP | SHA1 for passwords |
| database.py:compute_file_checksum_md5 | FP | MD5 for integrity, not security |
| database.py:verify_signature_sha256 | FP | HMAC-SHA256 is secure |
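
The rows above separate hash usage by purpose. As a rough sketch under assumed function bodies, the FP cases and a password-hashing alternative could look like this. `hash_password_pbkdf2` is a hypothetical name, and the iteration count is an illustrative default rather than a value from the benchmark.

```python
import hashlib
import hmac


def compute_file_checksum_md5(data: bytes) -> str:
    """FP pattern: MD5 used to detect accidental corruption only."""
    # usedforsecurity=False (Python 3.9+) states the non-security intent.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


def verify_signature_sha256(key: bytes, message: bytes, signature: str) -> bool:
    """FP pattern: keyed HMAC-SHA256 with a constant-time comparison."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def hash_password_pbkdf2(
    password: str, salt: bytes, iterations: int = 600_000
) -> str:
    """Hypothetical replacement for the MD5/SHA1 password hashes (the TP rows)."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return digest.hex()
```

Note that a plain MD5 checksum does not protect against deliberate tampering, so the FP label only holds while the checksum is not relied on as a security control.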

SSL/TLS (B501/B503)

| Location | Classification | Rationale |
|---|---|---|
| network_client.py:get_insecure | TP | verify=False |
| network_client.py:get_secure | FP | verify=True |
| network_client.py:fetch_unverified_ssl | TP | Unverified context |
| crypto_utils.py:create_insecure_context | TP | CERT_NONE |
| crypto_utils.py:create_secure_context | FP | Proper verification |
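
For the two context factories in crypto_utils.py, the difference is a pair of attributes on the `ssl.SSLContext`. The bodies below are a sketch of what such functions typically contain, not the benchmark's actual code.

```python
import ssl


def create_insecure_context() -> ssl.SSLContext:
    """TP pattern: certificate and hostname checks are switched off."""
    ctx = ssl.create_default_context()
    # check_hostname has to be disabled first; setting CERT_NONE while it
    # is still enabled raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def create_secure_context() -> ssl.SSLContext:
    """FP pattern: library defaults verify the chain and the hostname."""
    return ssl.create_default_context()
```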

Eval/Exec (B307/B102)

| Location | Classification | Rationale |
|---|---|---|
| web_app.py:eval_user_code | TP | Direct eval of user input |
| web_app.py:literal_eval_safe | FP | ast.literal_eval is safe |
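
The FP rationale rests on `ast.literal_eval` accepting only literal syntax. A short sketch with an assumed function body shows the behaviour on benign and hostile input:

```python
import ast


def literal_eval_safe(text: str):
    """FP pattern: parses Python literals only; names and calls are rejected."""
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError):
        # Anything that is not a plain literal ends up here.
        return None
```

"Safe" here means no arbitrary code execution. The Python documentation still warns that very large or deeply nested input can exhaust memory or the interpreter stack, so input size limits remain worthwhile.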

Pylint Findings

Naming Conventions (C0103)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:processData | TP | Not snake_case |
| utils.py:calculate_total | FP | Proper snake_case |
| utils.py:userManager | TP | Class not PascalCase |
| utils.py:UserRepository | FP | Proper PascalCase |

Mutable Default (W0102)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:mutable_default_list | TP | Mutable default [] |
| utils.py:safe_default_none | FP | Safe None pattern |
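
The reason W0102 is a genuine issue is that a default list is created once, at function definition time, and then shared by every call. A minimal sketch with assumed signatures:

```python
def mutable_default_list(item, items=[]):
    """TP pattern: the same list object is reused across calls."""
    items.append(item)
    return items


def safe_default_none(item, items=None):
    """FP pattern: a fresh list is created whenever none is passed in."""
    if items is None:
        items = []
    items.append(item)
    return items
```

Calling `mutable_default_list` twice without a second argument returns a list that still contains the item from the first call, whereas `safe_default_none` starts empty each time.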

Exception Handling (W0702)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:bare_except_handler | TP | Bare except |
| utils.py:specific_except_handler | FP | Specific exception |

Builtin Shadowing (W0622)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:shadow_builtins | TP | Shadows list, dict |
| utils.py:proper_naming | FP | Descriptive names |

Return Statements (R1710)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:inconsistent_return | TP | Implicit None return |
| utils.py:all_paths_return | FP | All paths explicit |

Too Many Arguments (R0913)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:too_many_arguments | TP | 11 arguments |
| utils.py:reasonable_arguments | FP | 3 reasonable args |

Loop Patterns (C0200)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:range_len_antipattern | TP | Should use enumerate |
| utils.py:proper_enumerate | FP | Proper enumerate |

Documentation (C0116/C0115)

| Location | Classification | Rationale |
|---|---|---|
| utils.py:function_without_docstring | TP | Missing docstring |
| utils.py:function_with_docstring | FP | Has docstring |
| utils.py:ClassWithoutDocstring | TP | Missing docstring |
| utils.py:ClassWithDocstring | FP | Has docstring |

Gitleaks Findings

Production Secrets (TRUE POSITIVES)

| File | Rule | Rationale |
|---|---|---|
| .env.production | aws-access-token | Real AWS key format |
| .env.production | stripe-access-token | sk_live_ prefix |
| .env.production | github-pat | ghp_ format |
| .env.production | private-key | RSA private key |
| src/security_demo/secrets.py | aws-access-token | Production AWS |
| src/security_demo/secrets.py | stripe-access-token | Production Stripe |
| src/security_demo/secrets.py | github-pat | Production GitHub |
| src/security_demo/secrets.py | private-key | SSH private key |
| scripts/deploy.sh | generic-api-key | Script credentials |

Example/Test Values (FALSE POSITIVES)

| File | Rule | Rationale |
|---|---|---|
| config/.env.example | aws-access-token | EXAMPLE suffix |
| config/.env.example | stripe-access-token | Placeholder text |
| config/settings.example.yaml | aws-access-token | Example config |
| config/settings.example.yaml | stripe-access-token | sk_test_ prefix |
| tests/fixtures.py | aws-access-token | Test fixtures |
| tests/fixtures.py | stripe-access-token | Mock keys |
| tests/fixtures.py | jwt | Example JWT |
| docs/examples/sample_config.json | various | Documentation |

Uncertain Cases

| File | Rule | Rationale |
|---|---|---|
| crypto_utils.py | generic-api-key | BACKUP_KEY could be real or a placeholder |
| semgrep_patterns.py | stripe-access-token | sk_test_ prefix, but located in src/ |

Semgrep Findings

Open Redirect

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:redirect_unsafe | TP | User controls redirect |
| semgrep_patterns.py:redirect_validated | FP | Domain validation |
| semgrep_patterns.py:redirect_relative | UNCERTAIN | Checks for :// but not for protocol-relative // URLs |
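
The UNCERTAIN row is easiest to see side by side with a stricter check. In the sketch below, `looks_relative_naive` is a hypothetical reconstruction of a `://`-only test, `is_safe_redirect` is a hypothetical stricter variant, and `ALLOWED_HOSTS` is an invented allowlist. None of these names come from semgrep_patterns.py, and the stricter check is not claimed to be exhaustive.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"example.com"}  # hypothetical allowlist for illustration


def looks_relative_naive(target: str) -> bool:
    """Assumed shape of the UNCERTAIN check: only 'scheme://' is rejected."""
    return "://" not in target


def is_safe_redirect(target: str) -> bool:
    """Hypothetical stricter check for local paths and allowlisted hosts."""
    if "\\" in target:
        # Some browsers treat backslashes like forward slashes.
        return False
    parsed = urlparse(target)
    if not parsed.scheme and not parsed.netloc:
        # Local path: must start with exactly one slash.
        return target.startswith("/") and not target.startswith("//")
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS
```

A target such as `//evil.example/login` passes the naive check because it contains no `://`, yet browsers resolve it to another host. That gap is what makes the finding context-dependent.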

Path Traversal

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:download_file | TP | User-controlled filename |
| semgrep_patterns.py:safe_download | FP | Realpath check |
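
A realpath check of the kind the FP row refers to could be sketched as follows. `safe_download_path` and its parameters are hypothetical names chosen for this example; the benchmark's `safe_download` may be structured differently.

```python
import os


def safe_download_path(base_dir: str, filename: str) -> str:
    """Resolve filename under base_dir and reject anything that escapes it."""
    base = os.path.realpath(base_dir)
    # realpath collapses '..' segments and follows symlinks before comparing.
    candidate = os.path.realpath(os.path.join(base, filename))
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    if not inside:
        raise ValueError("path escapes the download directory")
    return candidate
```

Comparing with `os.path.commonpath` rather than a plain string prefix avoids accepting a sibling directory such as `downloads_old` when the base is `downloads`.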

JWT Security

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:JWT_SECRET | TP | Hardcoded secret |
| semgrep_patterns.py:verify_jwt_none_allowed | TP | Verification disabled |
| semgrep_patterns.py:verify_jwt_secure | FP | External secret |

SSRF

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:fetch_url | TP | Arbitrary URL fetch |
| semgrep_patterns.py:fetch_allowlisted | FP | Domain allowlist |
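
The allowlist in the FP row only helps if the host is extracted with a real URL parser rather than a substring match. A minimal sketch, in which `ALLOWED_DOMAINS` and `is_allowlisted` are hypothetical names and the actual fetch is left out:

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"api.example.com"}  # hypothetical allowlist


def is_allowlisted(url: str) -> bool:
    """Accept only HTTPS URLs whose parsed hostname is on the allowlist."""
    parsed = urlparse(url)
    # parsed.hostname excludes any userinfo, so 'allowed@evil' tricks fail.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_DOMAINS
```

A hostname check like this does not by itself cover redirects followed by the HTTP client or DNS answers that point at internal addresses, so whether the finding stays an FP can still depend on how the fetch is performed.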

Hardcoded Credentials

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:DATABASE_URL | TP | Password in URL |
| semgrep_patterns.py:AWS_ACCESS_KEY | TP | AWS key |
| semgrep_patterns.py:EXAMPLE_API_KEY | FP | Placeholder |
| semgrep_patterns.py:TEST_DATABASE_URL | FP | Localhost test |
| semgrep_patterns.py:STRIPE_KEY | UNCERTAIN | sk_test_ format |

Command Injection

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:run_system_command | TP | os.system with user input |
| semgrep_patterns.py:run_safe_command | FP | Hardcoded command |

Insecure Random

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:generate_token_insecure | TP | Random for token |
| semgrep_patterns.py:shuffle_playlist | FP | Non-security use |

Debug Mode

| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:DEBUG_MODE | TP | Debug flag True |
| semgrep_patterns.py:debug_eval | TP | Eval in debug endpoint |
| semgrep_patterns.py:app.run | TP | debug=True |

Usage for Benchmarking

Run each tool against the codebase:

    bandit -r src/ -f json > bandit_results.json
    pylint src/security_demo --output-format=json > pylint_results.json
    gitleaks detect --source . --no-git --report-format json --report-path gitleaks_results.json
    semgrep scan --config auto src/ --json > semgrep_results.json
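
Once the reports exist, each one has to be reduced to a set of comparable findings. The sketch below does this for the Bandit report only. It assumes the report has a top-level `results` list whose entries carry `filename`, `line_number` and `test_id`, and `bandit_findings` is a hypothetical helper name. The ground truth in this document is keyed by function or constant name, so a separate line-to-symbol mapping (not shown) would still be needed.

```python
import json


def bandit_findings(report_text: str) -> set[tuple[str, int, str]]:
    """Reduce a Bandit JSON report to (filename, line, rule id) triples."""
    report = json.loads(report_text)
    return {
        (entry["filename"], entry["line_number"], entry["test_id"])
        for entry in report.get("results", [])
    }
```

Typical use would be `bandit_findings(open("bandit_results.json").read())`; the other tools use different JSON layouts and would need their own small adapters.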

Compare tool findings against this ground truth document to calculate:

  • True Positive Rate (TPR), which is the same quantity as Recall
  • False Positive Rate (FPR)
  • Precision
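
One way to compute these figures is sketched below. It assumes the ground truth has been loaded into a dict from location to label and that a tool's output has been reduced to a set of flagged locations. Both data shapes and the name `score` are assumptions of this example. Entries labelled UNCERTAIN are left out of every count, and an FP-labelled entry that the tool does not flag is treated as a true negative.

```python
def score(ground_truth: dict[str, str], reported: set[str]) -> dict[str, float]:
    """Compute precision, recall (TPR) and FPR against labelled entries."""
    tp = fp = fn = tn = 0
    for location, label in ground_truth.items():
        flagged = location in reported
        if label == "TP":
            # Real issue: flagged is a hit, not flagged is a miss.
            if flagged:
                tp += 1
            else:
                fn += 1
        elif label == "FP":
            # Benign entry: flagged is a false alarm, not flagged is correct.
            if flagged:
                fp += 1
            else:
                tn += 1
        # Any other label (UNCERTAIN) is ignored.

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fpr": ratio(fp, fp + tn),
    }
```

Findings that a tool reports but that are absent from this document are not counted here; how to treat them is a separate decision for whoever runs the benchmark.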

Notes on Classification

Some findings are context-dependent:

  • Development vs Production environment
  • Internal vs External network exposure
  • Who has access to modify configurations
  • Whether validation is sufficient
  • Threat model considerations

The UNCERTAIN category represents findings where classification depends on context.