alexander/false_positive_benchmark

Alexander Braml 16838618a3 Initial commit

2026-04-08 14:48:24 +02:00

Security Demo - Findings Classification

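This project is a benchmark for false positive detection in security analysis tools. It provides ground-truth classifications for the findings that Bandit, Pylint, Gitleaks, and Semgrep report against the demo codebase.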

Classifications:

TRUE POSITIVE (TP): Actual security vulnerability or code quality issue
FALSE POSITIVE (FP): Flagged by the tool but not a real problem in context
UNCERTAIN: Could be either depending on deployment context

Summary Statistics

Tool	Findings	Target TP	Target FP	Uncertain
Bandit	~50	~20	~20	~10
Pylint	~45	~18	~18	~9
Gitleaks	~28	~10	~12	~6
Semgrep	~50	~20	~20	~10

Bandit Findings

Command Injection (B602/B603)

Location	Classification	Rationale
web_app.py:admin_execute	TP	User input in shell command
web_app.py:compile_code	FP	Hardcoded command, no user input
web_app.py:check_disk	FP	No shell, hardcoded command list
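
To illustrate why the rows above split the way they do, here is a minimal sketch of the two patterns. The function names and bodies are hypothetical stand-ins, not the benchmark's actual web_app.py code, and the first function assumes a POSIX shell.

```python
import subprocess
import sys


def run_user_command(user_input: str) -> str:
    """TP pattern: user input is interpolated into a shell command line."""
    # With shell=True, metacharacters such as ';' start additional commands.
    completed = subprocess.run(
        f"echo {user_input}",
        shell=True,
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout


def run_fixed_command() -> str:
    """FP pattern: hardcoded argument list, no shell, no user input."""
    # Hypothetical placeholder command; every argument is a constant.
    completed = subprocess.run(
        [sys.executable, "-c", "print('disk check placeholder')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Passing `"hello; echo injected"` to `run_user_command` runs a second `echo`, which is the behaviour a TP classification is meant to capture. The fixed command may still be reported by a scanner, but nothing in it is attacker-controlled.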

Template Injection (B701)

Location	Classification	Rationale
web_app.py:render_custom	TP	User controls template string
web_app.py:generate_report	FP	Template hardcoded, only data varies

Deserialization (B301)

Location	Classification	Rationale
web_app.py:load_session	TP	Pickle from user-controlled path
web_app.py:load_config	FP	Pickle from known internal path
services/files.py:load_pickle_user_path	TP	User controls file path
services/files.py:load_pickle_fixed_path	FP	Fixed internal path

YAML Load (B506)

Location	Classification	Rationale
web_app.py:parse_yaml	TP	Unsafe Loader with user input
web_app.py:yaml_safe	FP	SafeLoader is secure
services/files.py:load_yaml_unsafe	TP	Unsafe Loader
services/files.py:load_yaml_safe	FP	SafeLoader

Hardcoded Secrets (B105)

Location	Classification	Rationale
web_app.py:SECRET_KEY	TP	Hardcoded production key
crypto_utils.py:PRODUCTION_KEY	TP	Hardcoded key
crypto_utils.py:EXAMPLE_KEY	FP	Clearly marked placeholder
crypto_utils.py:TEST_API_KEY	FP	Test prefix indicates non-production
crypto_utils.py:BACKUP_KEY	UNCERTAIN	Could be real or placeholder

Random (B311)

Location	Classification	Rationale
crypto_utils.py:generate_session_token_insecure	TP	Random for security token
crypto_utils.py:generate_otp_insecure	TP	Random for OTP
crypto_utils.py:shuffle_playlist	FP	Non-security use
crypto_utils.py:roll_dice	FP	Game mechanics
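
The distinction in this table comes down to whether the output must be unpredictable to an attacker. The sketch below reuses names from the table, but the bodies are assumptions written for illustration, and `generate_session_token_secure` is a hypothetical counterpart that does not appear in the benchmark.

```python
import random
import secrets

HEX_ALPHABET = "0123456789abcdef"


def generate_session_token_insecure(length: int = 32) -> str:
    """TP pattern: the random module is a deterministic PRNG."""
    return "".join(random.choice(HEX_ALPHABET) for _ in range(length))


def generate_session_token_secure(nbytes: int = 16) -> str:
    """Hypothetical fix: secrets draws from the OS CSPRNG."""
    return secrets.token_hex(nbytes)


def shuffle_playlist(tracks: list) -> list:
    """FP pattern: predictability has no security impact here."""
    shuffled = list(tracks)  # copy so the caller's list is untouched
    random.shuffle(shuffled)
    return shuffled
```

Seeding `random` with the same value before two calls to `generate_session_token_insecure` yields the same token twice, which is why the module is unsuitable for session tokens and OTPs but acceptable for a playlist shuffle.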

Weak Hash (B324)

Location	Classification	Rationale
database.py:hash_password_md5	TP	MD5 for passwords
database.py:hash_password_sha1	TP	SHA1 for passwords
database.py:compute_file_checksum_md5	FP	MD5 as a corruption checksum, not a security control
database.py:verify_signature_sha256	FP	HMAC-SHA256 is secure
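
A minimal sketch of the usage patterns behind these rows follows. The function names mirror the table, but the bodies are assumptions, and `hash_password_pbkdf2` is a hypothetical replacement that is not part of the benchmark. The `usedforsecurity` argument requires Python 3.9 or later.

```python
import hashlib
import hmac


def compute_file_checksum_md5(data: bytes) -> str:
    """FP pattern: MD5 used to detect accidental corruption."""
    # usedforsecurity=False documents the non-security intent.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


def hash_password_md5(password: str) -> str:
    """TP pattern: a fast, unsalted digest is unsuitable for passwords."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password_pbkdf2(
    password: str, salt: bytes, iterations: int = 600_000
) -> bytes:
    """Hypothetical fix: a salted, deliberately slow key derivation."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )


def verify_signature_sha256(key: bytes, message: bytes, signature: str) -> bool:
    """FP pattern: HMAC-SHA256 checked with a constant-time comparison."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The same hash function can land on either side of the table: what matters for classification is whether the digest protects a secret or merely fingerprints data.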

SSL/TLS (B501/B503)

Location	Classification	Rationale
network_client.py:get_insecure	TP	verify=False
network_client.py:get_secure	FP	verify=True
network_client.py:fetch_unverified_ssl	TP	Unverified context
crypto_utils.py:create_insecure_context	TP	CERT_NONE
crypto_utils.py:create_secure_context	FP	Proper verification

Eval/Exec (B307/B102)

Location	Classification	Rationale
web_app.py:eval_user_code	TP	Direct eval of user input
web_app.py:literal_eval_safe	FP	ast.literal_eval accepts literals only and executes no code
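
The contrast between the two rows can be sketched as follows. `is_plain_literal` is a hypothetical helper introduced here for illustration; it is not part of the benchmark code.

```python
import ast


def is_plain_literal(text: str) -> bool:
    """Return True if text is a Python literal that ast.literal_eval accepts."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # Anything other than a literal (calls, names, attribute access)
        # is rejected instead of being executed.
        return False
    return True
```

A string such as `"__import__('os').getcwd()"` would run under `eval`, while `ast.literal_eval` rejects it because a function call is not a literal. Very large or deeply nested input can still exhaust memory or the recursion limit, so input size limits remain advisable.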

Pylint Findings

Naming Conventions (C0103)

Location	Classification	Rationale
utils.py:processData	TP	Not snake_case
utils.py:calculate_total	FP	Proper snake_case
utils.py:userManager	TP	Class not PascalCase
utils.py:UserRepository	FP	Proper PascalCase

Mutable Default (W0102)

Location	Classification	Rationale
utils.py:mutable_default_list	TP	Mutable default []
utils.py:safe_default_none	FP	Safe None pattern
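
For readers unfamiliar with W0102, the sketch below shows the behaviour that makes the first row a true positive. The function names come from the table; the bodies are assumed for illustration.

```python
def mutable_default_list(item, items=[]):
    """TP pattern: the default list is created once and shared across calls."""
    items.append(item)
    return items


def safe_default_none(item, items=None):
    """FP pattern: a fresh list is created on each call when none is passed."""
    if items is None:
        items = []
    items.append(item)
    return items
```

Calling `mutable_default_list(1)` and then `mutable_default_list(2)` returns `[1, 2]` on the second call, because both calls append to the same default object. `safe_default_none` returns `[2]` in the same situation.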

Exception Handling (W0702)

Location	Classification	Rationale
utils.py:bare_except_handler	TP	Bare except
utils.py:specific_except_handler	FP	Specific exception

Builtin Shadowing (W0622)

Location	Classification	Rationale
utils.py:shadow_builtins	TP	Shadows list, dict
utils.py:proper_naming	FP	Descriptive names

Return Statements (R1710)

Location	Classification	Rationale
utils.py:inconsistent_return	TP	Implicit None return
utils.py:all_paths_return	FP	All paths explicit

Too Many Arguments (R0913)

Location	Classification	Rationale
utils.py:too_many_arguments	TP	11 arguments
utils.py:reasonable_arguments	FP	3 reasonable args

Loop Patterns (C0200)

Location	Classification	Rationale
utils.py:range_len_antipattern	TP	Should use enumerate
utils.py:proper_enumerate	FP	Proper enumerate

Documentation (C0116/C0115)

Location	Classification	Rationale
utils.py:function_without_docstring	TP	Missing docstring
utils.py:function_with_docstring	FP	Has docstring
utils.py:ClassWithoutDocstring	TP	Missing docstring
utils.py:ClassWithDocstring	FP	Has docstring

Gitleaks Findings

Production Secrets (TRUE POSITIVES)

File	Rule	Rationale
.env.production	aws-access-token	Real AWS key format
.env.production	stripe-access-token	sk_live_ prefix
.env.production	github-pat	ghp_ format
.env.production	private-key	RSA private key
src/security_demo/secrets.py	aws-access-token	Production AWS
src/security_demo/secrets.py	stripe-access-token	Production Stripe
src/security_demo/secrets.py	github-pat	Production GitHub
src/security_demo/secrets.py	private-key	SSH private key
scripts/deploy.sh	generic-api-key	Script credentials

Example/Test Values (FALSE POSITIVES)

File	Rule	Rationale
config/.env.example	aws-access-token	EXAMPLE suffix
config/.env.example	stripe-access-token	Placeholder text
config/settings.example.yaml	aws-access-token	Example config
config/settings.example.yaml	stripe-access-token	sk_test_ prefix
tests/fixtures.py	aws-access-token	Test fixtures
tests/fixtures.py	stripe-access-token	Mock keys
tests/fixtures.py	jwt	Example JWT
docs/examples/sample_config.json	various	Documentation

Uncertain Cases

File	Rule	Rationale
crypto_utils.py	generic-api-key	BACKUP_KEY - real or fake?
semgrep_patterns.py	stripe-access-token	sk_test_ but in src/

Semgrep Findings

Open Redirect

Location	Classification	Rationale
semgrep_patterns.py:redirect_unsafe	TP	User controls redirect
semgrep_patterns.py:redirect_validated	FP	Domain validation
semgrep_patterns.py:redirect_relative	UNCERTAIN	Rejects :// but not protocol-relative // URLs
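
The UNCERTAIN row for redirect_relative is easier to judge with a concrete case. The sketch below assumes the check is a plain substring test for `://`, as the rationale suggests; both function names are hypothetical and the stricter variant is only one possible approach.

```python
from urllib.parse import urlparse


def is_relative_naive(target: str) -> bool:
    """Assumed form of the existing check: reject only targets with '://'."""
    return "://" not in target


def is_relative_strict(target: str) -> bool:
    """Stricter sketch: also reject protocol-relative and backslash forms."""
    if "\\" in target:
        # Some browsers treat backslashes like forward slashes.
        return False
    parsed = urlparse(target)
    return parsed.scheme == "" and parsed.netloc == "" and target.startswith("/")
```

A target of `//evil.example/login` passes the naive check but is parsed with `evil.example` as its network location, so a browser would follow it to another host. Whether that matters in practice depends on how the redirect target reaches the function, which is why the row is left as UNCERTAIN.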

Path Traversal

Location	Classification	Rationale
semgrep_patterns.py:download_file	TP	User-controlled filename
semgrep_patterns.py:safe_download	FP	Realpath check
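
A realpath check of the kind referred to above might look like the following sketch. The helper name and the `/srv/downloads` base directory in the usage note are hypothetical; the benchmark's actual implementation may differ.

```python
import os


def is_within_base(base_dir: str, filename: str) -> bool:
    """Return True if filename resolves to a path inside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses '..' segments and resolves symlinks before comparing.
    candidate = os.path.realpath(os.path.join(base, filename))
    # commonpath compares whole path components, so a sibling directory
    # such as '/srv/downloads-evil' does not match '/srv/downloads'.
    return os.path.commonpath([base, candidate]) == base
```

With a base of `/srv/downloads`, the name `report.pdf` is accepted while `../../etc/passwd` and the absolute path `/etc/passwd` are rejected. Comparing path components rather than string prefixes avoids accepting look-alike sibling directories.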

JWT Security

Location	Classification	Rationale
semgrep_patterns.py:JWT_SECRET	TP	Hardcoded secret
semgrep_patterns.py:verify_jwt_none_allowed	TP	Verification disabled
semgrep_patterns.py:verify_jwt_secure	FP	External secret

SSRF

Location	Classification	Rationale
semgrep_patterns.py:fetch_url	TP	Arbitrary URL fetch
semgrep_patterns.py:fetch_allowlisted	FP	Domain allowlist

Hardcoded Credentials

Location	Classification	Rationale
semgrep_patterns.py:DATABASE_URL	TP	Password in URL
semgrep_patterns.py:AWS_ACCESS_KEY	TP	AWS key
semgrep_patterns.py:EXAMPLE_API_KEY	FP	Placeholder
semgrep_patterns.py:TEST_DATABASE_URL	FP	Localhost test
semgrep_patterns.py:STRIPE_KEY	UNCERTAIN	sk_test_ format

Command Injection

Location	Classification	Rationale
semgrep_patterns.py:run_system_command	TP	os.system with user input
semgrep_patterns.py:run_safe_command	FP	Hardcoded command

Insecure Random

Location	Classification	Rationale
semgrep_patterns.py:generate_token_insecure	TP	Random for token
semgrep_patterns.py:shuffle_playlist	FP	Non-security use

Debug Mode

Location	Classification	Rationale
semgrep_patterns.py:DEBUG_MODE	TP	Debug flag True
semgrep_patterns.py:debug_eval	TP	Eval in debug endpoint
semgrep_patterns.py:app.run	TP	debug=True

Usage for Benchmarking

Run each tool against the codebase:

bandit -r src/ -f json > bandit_results.json
pylint src/security_demo --output-format=json > pylint_results.json
gitleaks detect --source . --no-git --report-format json --report-path gitleaks_results.json
semgrep scan --config auto src/ --json > semgrep_results.json
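
The tools report findings by file and line number, while this document keys entries as file:function. One way to bridge the two, sketched here under the assumption that the source file is available to parse, is to look up the innermost function or class that contains a reported line. `enclosing_definition` is a hypothetical helper and requires Python 3.8 or later for `end_lineno`.

```python
import ast
from typing import Optional


def enclosing_definition(source: str, line: int) -> Optional[str]:
    """Return the name of the innermost function or class containing line."""
    best_name = None
    best_span = None
    for node in ast.walk(ast.parse(source)):
        if not isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        ):
            continue
        if node.lineno <= line <= node.end_lineno:
            span = node.end_lineno - node.lineno
            # Prefer the smallest enclosing span (the innermost definition).
            if best_span is None or span < best_span:
                best_name, best_span = node.name, span
    return best_name
```

Module-level constants such as SECRET_KEY are not inside any function, so this sketch returns `None` for them; matching those entries would need separate handling of assignment targets.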

Compare tool findings against this ground truth document to calculate:

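True Positive Rate (TPR, equal to Recall): flagged TP entries / all TP entries
False Positive Rate (FPR): flagged FP entries / all FP entries
Precision: flagged TP entries / all flagged TP and FP entries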
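
The metrics listed above can be computed with a short script. The sketch below is one possible scoring scheme, not the benchmark's own: it assumes the ground truth has been loaded as a mapping from location to label, that the tool output has been reduced to a set of flagged locations, and that UNCERTAIN entries are excluded from every count. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # ground-truth TP entries the tool flagged
    fp: int  # ground-truth FP entries the tool flagged
    fn: int  # ground-truth TP entries the tool missed
    tn: int  # ground-truth FP entries the tool did not flag

    @property
    def recall(self) -> float:  # same quantity as TPR
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def fpr(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)

    @property
    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)


def score(
    ground_truth: Mapping[str, str], flagged: AbstractSet[str]
) -> ConfusionCounts:
    """Count outcomes, skipping UNCERTAIN entries (an assumption)."""
    tp = fp = fn = tn = 0
    for location, label in ground_truth.items():
        hit = location in flagged
        if label == "TP":
            tp, fn = tp + hit, fn + (not hit)
        elif label == "FP":
            fp, tn = fp + hit, tn + (not hit)
    return ConfusionCounts(tp=tp, fp=fp, fn=fn, tn=tn)
```

Reporting UNCERTAIN entries separately, rather than folding them into either class, keeps the headline numbers independent of deployment assumptions.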

Notes on Classification

Some findings are context-dependent:

Development vs Production environment
Internal vs External network exposure
Who has access to modify configurations
Whether validation is sufficient
Threat model considerations

The UNCERTAIN category represents findings where classification depends on context.
