Security Demo - Findings Classification
This project is a benchmark for measuring the false positives produced by security analysis tools. This document records the ground-truth classification of each expected finding.
Classifications:
- TRUE POSITIVE (TP): Actual security vulnerability or code quality issue
- FALSE POSITIVE (FP): Flagged by the tool but not a real problem in context
- UNCERTAIN: Could be either depending on deployment context
Summary Statistics
| Tool | Findings | Target TP | Target FP | Uncertain |
|---|---|---|---|---|
| Bandit | ~50 | ~20 | ~20 | ~10 |
| Pylint | ~45 | ~18 | ~18 | ~9 |
| Gitleaks | ~28 | ~10 | ~12 | ~6 |
| Semgrep | ~50 | ~20 | ~20 | ~10 |
Bandit Findings
Command Injection (B602/B603)
| Location | Classification | Rationale |
|---|---|---|
| web_app.py:admin_execute | TP | User input in shell command |
| web_app.py:compile_code | FP | Hardcoded command, no user input |
| web_app.py:check_disk | FP | No shell, hardcoded command list |
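Below is a minimal sketch of the distinction these rows draw. The function names are hypothetical, not the project's code.

- **TP shape:** user input is interpolated into a string that a shell will parse.
- **FP shape:** a fixed argument list is passed with no shell, so metacharacters such as `;` arrive as plain text.

The current Python interpreter stands in for a real command so the demo runs anywhere.

```python
import subprocess
import sys


def build_shell_command(user_value: str) -> str:
    # TP shape (never executed here): with shell=True, a value such as
    # "x; rm -rf ~" would be parsed by the shell as two separate commands.
    return f"ls -l {user_value}"


def run_without_shell(user_value: str) -> str:
    # FP shape: fixed argv list and shell=False (the default). The OS hands
    # user_value to the child as one literal argument; ';' is just a character.
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1:])", user_value]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(build_shell_command("x; echo injected"))  # ls -l x; echo injected
    print(run_without_shell("x; echo injected"))    # ['x; echo injected']
```

Bandit's B603 check still reports the list form, at low severity, so a row like check_disk is listed here as an FP rather than producing no finding at all.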
Template Injection (B701)
| Location | Classification | Rationale |
|---|---|---|
| web_app.py:render_custom | TP | User controls template string |
| web_app.py:generate_report | FP | Template hardcoded, only data varies |
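Jinja2 is not needed to see why these two rows differ. The sketch below substitutes Python's built-in `str.format` for Jinja2, because it shows the same weakness in miniature: when the caller controls the template, replacement fields can walk the attributes of any object passed in. `Settings`, `render_user_template` and `render_fixed_template` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical configuration object that happens to be in scope.
    secret_key: str = "not-a-real-secret"


SETTINGS = Settings()


def render_user_template(template: str, name: str) -> str:
    # TP shape: the template string itself is user input, so a field like
    # "{settings.secret_key}" reads attributes of the objects passed in.
    return template.format(name=name, settings=SETTINGS)


def render_fixed_template(name: str) -> str:
    # FP shape: the template is a literal. User data only fills a slot and
    # is never re-parsed as template syntax.
    return "Report for {name}".format(name=name)
```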
Deserialization (B301)
| Location | Classification | Rationale |
|---|---|---|
| web_app.py:load_session | TP | Pickle from user-controlled path |
| web_app.py:load_config | FP | Pickle from known internal path |
| services/files.py:load_pickle_user_path | TP | User controls file path |
| services/files.py:load_pickle_fixed_path | FP | Fixed internal path |
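Loading pickle data is a TP whenever an attacker can influence the bytes. The reason is that unpickling calls whatever callable the stream names through `__reduce__`. This sketch uses the harmless `os.getcwd` as that callable, where a real payload would name something like `os.system`. The `Payload` class is hypothetical.

```python
import os
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle "to rebuild me, call os.getcwd()". An attacker-crafted
        # file can name any importable callable here.
        return (os.getcwd, ())


data = pickle.dumps(Payload())

# Loading the bytes runs os.getcwd(); no Payload object is ever created.
restored = pickle.loads(data)
print(restored == os.getcwd())  # True
```

A fixed internal path (the FP rows) narrows who can write those bytes, but it does not make `pickle` itself safe.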
YAML Load (B506)
| Location | Classification | Rationale |
|---|---|---|
| web_app.py:parse_yaml | TP | Unsafe Loader with user input |
| web_app.py:yaml_safe | FP | SafeLoader is secure |
| services/files.py:load_yaml_unsafe | TP | Unsafe Loader |
| services/files.py:load_yaml_safe | FP | SafeLoader |
Hardcoded Secrets (B105)
| Location | Classification | Rationale |
|---|---|---|
| web_app.py:SECRET_KEY | TP | Hardcoded production key |
| crypto_utils.py:PRODUCTION_KEY | TP | Hardcoded key |
| crypto_utils.py:EXAMPLE_KEY | FP | Clearly marked placeholder |
| crypto_utils.py:TEST_API_KEY | FP | Test prefix indicates non-production |
| crypto_utils.py:BACKUP_KEY | UNCERTAIN | Could be real or placeholder |
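The usual remediation for the TP rows is to keep the key out of source control and read it at startup. This is a minimal sketch, assuming the key arrives through an environment variable. The variable name `SECRET_KEY` and the helper `load_secret_key` are hypothetical.

```python
import os
import secrets


def load_secret_key(env_var: str = "SECRET_KEY") -> str:
    # Fail loudly instead of falling back to a hardcoded default, which
    # would put a secret back into the source.
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value


if __name__ == "__main__":
    # Local testing only: generate a throwaway key at runtime, never commit one.
    os.environ.setdefault("SECRET_KEY", secrets.token_hex(32))
    key = load_secret_key()
```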
Random (B311)
| Location | Classification | Rationale |
|---|---|---|
| crypto_utils.py:generate_session_token_insecure | TP | Random for security token |
| crypto_utils.py:generate_otp_insecure | TP | Random for OTP |
| crypto_utils.py:shuffle_playlist | FP | Non-security use |
| crypto_utils.py:roll_dice | FP | Game mechanics |
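These rows follow the rule stated in Python's documentation: the `random` module is not meant for security purposes, and the `secrets` module exists for tokens and one-time codes. Below is a sketch of the secure counterparts next to a legitimate `random` use. The function names are hypothetical.

```python
import random
import secrets
import string


def generate_session_token() -> str:
    # Security use: draw from the OS CSPRNG via the secrets module.
    return secrets.token_urlsafe(32)


def generate_otp(digits: int = 6) -> str:
    # Security use: each digit comes from secrets.choice, not random.choice.
    return "".join(secrets.choice(string.digits) for _ in range(digits))


def roll_die(sides: int = 6) -> int:
    # Non-security use (games, shuffling): random is appropriate here, but
    # B311 still flags it, which is what makes such a finding an FP.
    return random.randint(1, sides)
```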
Weak Hash (B324)
| Location | Classification | Rationale |
|---|---|---|
| database.py:hash_password_md5 | TP | MD5 for passwords |
| database.py:hash_password_sha1 | TP | SHA1 for passwords |
| database.py:compute_file_checksum_md5 | FP | MD5 as a corruption checksum, not a security control |
| database.py:verify_signature_sha256 | FP | HMAC-SHA256 is secure |
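The two kinds of row call for different constructions:

- **Password rows:** the remedy is a salted, deliberately slow key-derivation function, not merely a longer hash.
- **HMAC row:** keyed SHA-256 combined with a constant-time comparison is a sound construction.

Below is a standard-library sketch with hypothetical function names. The iteration count is an illustrative assumption, not a value taken from the project.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    # Random per-password salt plus PBKDF2-HMAC-SHA256 instead of md5/sha1.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest avoids leaking where the first mismatching byte is.
    return hmac.compare_digest(digest, expected)


def sign(message: bytes, key: bytes) -> str:
    # Keyed HMAC-SHA256, the construction the FP row describes.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_signature(message: bytes, key: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(message, key), signature)
```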
SSL/TLS (B501/B503)
| Location | Classification | Rationale |
|---|---|---|
| network_client.py:get_insecure | TP | verify=False |
| network_client.py:get_secure | FP | verify=True |
| network_client.py:fetch_unverified_ssl | TP | Unverified context |
| crypto_utils.py:create_insecure_context | TP | CERT_NONE |
| crypto_utils.py:create_secure_context | FP | Proper verification |
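The difference between the TP and FP rows shows up directly on the context object. This sketch uses only the standard `ssl` module and makes no network connection. The function names are hypothetical.

```python
import ssl


def make_verifying_context() -> ssl.SSLContext:
    # Loads the system CA store, requires a valid certificate, checks hostnames.
    return ssl.create_default_context()


def make_insecure_context() -> ssl.SSLContext:
    # Mirrors the TP rows: hostname checking off and CERT_NONE, so any
    # certificate, including an attacker's, is accepted.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Passing `verify=False` to `requests` switches off the same certificate checks inside that library.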
Eval/Exec (B307/B102)
| Location | Classification | Rationale |
|---|---|---|
| web_app.py:eval_user_code | TP | Direct eval of user input |
| web_app.py:literal_eval_safe | FP | ast.literal_eval is safe |
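`ast.literal_eval` accepts only Python literal syntax: strings, numbers, tuples, lists, dicts, sets, booleans and `None`. Expressions that would run code under `eval` are therefore rejected with an exception. Below is a short demonstration using a hypothetical helper, `parse_user_value`.

```python
import ast


def parse_user_value(text: str):
    # Literals only: no names, calls, or attribute access are evaluated.
    return ast.literal_eval(text)


print(parse_user_value("[1, 2, {'a': 3}]"))  # [1, 2, {'a': 3}]

try:
    # Under eval() this string would import os and run a command.
    parse_user_value("__import__('os').system('id')")
except ValueError as exc:
    print("rejected:", type(exc).__name__)  # rejected: ValueError
```

Python's documentation still cautions that very large or deeply nested input can crash the interpreter. Capping input length remains sensible.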
Pylint Findings
Naming Conventions (C0103)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:processData | TP | Not snake_case |
| utils.py:calculate_total | FP | Proper snake_case |
| utils.py:userManager | TP | Class not PascalCase |
| utils.py:UserRepository | FP | Proper PascalCase |
Mutable Default (W0102)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:mutable_default_list | TP | Mutable default [] |
| utils.py:safe_default_none | FP | Safe None pattern |
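W0102 is a genuine bug rather than a style nit. The default list is created once, when `def` runs, and is then shared by every call that omits the argument. The function names below are hypothetical.

```python
def append_bad(item, bucket=[]):
    # The same list object is reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # The None sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_bad(1), append_bad(2))    # [1, 2] [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```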
Exception Handling (W0702)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:bare_except_handler | TP | Bare except |
| utils.py:specific_except_handler | FP | Specific exception |
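A bare `except:` catches every `BaseException`, including `SystemExit` and `KeyboardInterrupt`. Catching a specific exception lets those propagate. The helpers below are hypothetical, and `sys.exit` stands in for any shutdown signal.

```python
import sys


def parse_port_bare(value: str) -> int:
    try:
        if value == "quit":
            sys.exit(1)  # stands in for any shutdown signal
        return int(value)
    except:  # pylint: disable=bare-except  (intentional, to show the problem)
        return -1  # SystemExit is silently swallowed here


def parse_port_specific(value: str) -> int:
    try:
        if value == "quit":
            sys.exit(1)
        return int(value)
    except ValueError:
        return -1  # only the expected parse failure is handled
```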
Builtin Shadowing (W0622)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:shadow_builtins | TP | Shadows list, dict |
| utils.py:proper_naming | FP | Descriptive names |
Return Statements (R1710)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:inconsistent_return | TP | Implicit None return |
| utils.py:all_paths_return | FP | All paths explicit |
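R1710 fires when some paths return a value and others fall off the end of the function. The paths that fall off return `None` implicitly, so callers receive `None` where they expected a number. Below is a hypothetical example.

```python
def discount_implicit(total: float):
    if total > 100:
        return total * 0.9
    # Falls through: returns None for totals of 100 or less.


def discount_explicit(total: float) -> float:
    if total > 100:
        return total * 0.9
    return total  # every path returns a value


print(discount_implicit(50))  # None
print(discount_explicit(50))  # 50
```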
Too Many Arguments (R0913)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:too_many_arguments | TP | 11 arguments |
| utils.py:reasonable_arguments | FP | 3 reasonable args |
Loop Patterns (C0200)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:range_len_antipattern | TP | Should use enumerate |
| utils.py:proper_enumerate | FP | Proper enumerate |
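The C0200 rewrite is mechanical. `enumerate` yields the index and the element together, so the loop no longer needs to look the element up by index. Below is a hypothetical example.

```python
names = ["alpha", "beta", "gamma"]

# Flagged by C0200: the index is used only to look the element back up.
labels_range = []
for i in range(len(names)):
    labels_range.append(f"{i}: {names[i]}")

# Preferred form.
labels_enum = [f"{i}: {name}" for i, name in enumerate(names)]

print(labels_enum == labels_range)  # True
```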
Documentation (C0116/C0115)
| Location | Classification | Rationale |
|---|---|---|
| utils.py:function_without_docstring | TP | Missing docstring |
| utils.py:function_with_docstring | FP | Has docstring |
| utils.py:ClassWithoutDocstring | TP | Missing docstring |
| utils.py:ClassWithDocstring | FP | Has docstring |
Gitleaks Findings
Production Secrets (TRUE POSITIVES)
| File | Rule | Rationale |
|---|---|---|
| .env.production | aws-access-token | Real AWS key format |
| .env.production | stripe-access-token | sk_live_ prefix |
| .env.production | github-pat | ghp_ format |
| .env.production | private-key | RSA private key |
| src/security_demo/secrets.py | aws-access-token | Production AWS |
| src/security_demo/secrets.py | stripe-access-token | Production Stripe |
| src/security_demo/secrets.py | github-pat | Production GitHub |
| src/security_demo/secrets.py | private-key | SSH private key |
| scripts/deploy.sh | generic-api-key | Script credentials |
Example/Test Values (FALSE POSITIVES)
| File | Rule | Rationale |
|---|---|---|
| config/.env.example | aws-access-token | EXAMPLE suffix |
| config/.env.example | stripe-access-token | Placeholder text |
| config/settings.example.yaml | aws-access-token | Example config |
| config/settings.example.yaml | stripe-access-token | sk_test_ prefix |
| tests/fixtures.py | aws-access-token | Test fixtures |
| tests/fixtures.py | stripe-access-token | Mock keys |
| tests/fixtures.py | jwt | Example JWT |
| docs/examples/sample_config.json | various | Documentation |
Uncertain Cases
| File | Rule | Rationale |
|---|---|---|
| crypto_utils.py | generic-api-key | BACKUP_KEY: real or placeholder? |
| semgrep_patterns.py | stripe-access-token | sk_test_ but in src/ |
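The rationales in the three Gitleaks tables rest on a few recurring signals:

- Stripe's `sk_live_` versus `sk_test_` prefixes.
- An `EXAMPLE` marker in the value.
- Whether the file lives under `tests/` or `docs/`, or is an `.example` template.

Below is a rough triage sketch that encodes those signals. The rules are heuristics drawn from this document, not Gitleaks behaviour, and `triage` is a hypothetical helper. Anything it does not recognise is treated as real until a human reviews it.

```python
from pathlib import PurePosixPath

NON_PROD_DIRS = {"tests", "docs"}


def triage(path: str, value: str) -> str:
    """Heuristic TP / FP / UNCERTAIN label for one secret finding."""
    p = PurePosixPath(path)
    if NON_PROD_DIRS & set(p.parts) or ".example" in p.name:
        return "FP"  # fixtures, documentation, and *.example templates
    if "EXAMPLE" in value:
        return "FP"  # e.g. AWS's documented AKIAIOSFODNN7EXAMPLE key
    if value.startswith("sk_test_"):
        # Test-mode key, but shipped in application source: needs a human.
        return "UNCERTAIN" if "src" in p.parts else "FP"
    # Everything else (sk_live_, ghp_, private keys) is treated as real.
    return "TP"
```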
Semgrep Findings
Open Redirect
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:redirect_unsafe | TP | User controls redirect |
| semgrep_patterns.py:redirect_validated | FP | Domain validation |
| semgrep_patterns.py:redirect_relative | UNCERTAIN | Blocks :// but not protocol-relative // URLs |
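The UNCERTAIN row turns on a known gap. A check that rejects `://` still admits protocol-relative URLs such as `//evil.example`, which browsers resolve against the current scheme, sending the user off-site. Below is a sketch of both checks using `urllib.parse`. The function names and `ALLOWED_HOSTS` are hypothetical.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"app.example.com"}  # hypothetical allowlist


def naive_is_local(target: str) -> bool:
    # Mirrors the UNCERTAIN pattern: only explicit schemes are blocked.
    return "://" not in target


def is_safe_redirect(target: str) -> bool:
    # Browsers treat '\' like '/' in http(s) URLs, so normalise first.
    parsed = urlparse(target.replace("\\", "/"))
    if parsed.scheme or parsed.netloc:
        # Absolute or protocol-relative: only known hosts over https.
        return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
    # Relative path: must begin with exactly one '/'.
    return target.startswith("/") and not target.startswith("//")
```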
Path Traversal
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:download_file | TP | User-controlled filename |
| semgrep_patterns.py:safe_download | FP | Realpath check |
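A realpath check first resolves symlinks and `..` segments, and only then asks whether the result still sits inside the base directory. Below is a standard-library sketch. `BASE_DIR` and `resolve_download` are hypothetical. The containment test uses `os.path.commonpath` because a plain string-prefix check would also accept a sibling directory such as `/srv/files-private`.

```python
import os

BASE_DIR = os.path.realpath("/srv/files")  # hypothetical download root


def resolve_download(filename: str) -> str:
    # Resolve '..' and symlinks before checking containment. An absolute
    # filename makes os.path.join discard BASE_DIR, which the check catches.
    candidate = os.path.realpath(os.path.join(BASE_DIR, filename))
    if os.path.commonpath([BASE_DIR, candidate]) != BASE_DIR:
        raise PermissionError(f"path escapes base directory: {filename!r}")
    return candidate
```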
JWT Security
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:JWT_SECRET | TP | Hardcoded secret |
| semgrep_patterns.py:verify_jwt_none_allowed | TP | Verification disabled |
| semgrep_patterns.py:verify_jwt_secure | FP | External secret |
SSRF
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:fetch_url | TP | Arbitrary URL fetch |
| semgrep_patterns.py:fetch_allowlisted | FP | Domain allowlist |
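An allowlist for outbound fetches should parse the URL and compare the exact hostname, rather than search the URL string for an allowed domain. Below is a sketch with a hypothetical name and domain. It makes no network request, and it does not handle DNS rebinding or redirects, which a complete SSRF defence must also address.

```python
from urllib.parse import urlparse

ALLOWED_FETCH_HOSTS = {"api.partner.example"}  # hypothetical allowlist


def is_allowed_fetch(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        return False  # blocks file://, gopher://, and similar schemes
    # .hostname is lower-cased and excludes any user:pass@ prefix and :port.
    return parsed.hostname in ALLOWED_FETCH_HOSTS
```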
Hardcoded Credentials
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:DATABASE_URL | TP | Password in URL |
| semgrep_patterns.py:AWS_ACCESS_KEY | TP | AWS key |
| semgrep_patterns.py:EXAMPLE_API_KEY | FP | Placeholder |
| semgrep_patterns.py:TEST_DATABASE_URL | FP | Localhost test |
| semgrep_patterns.py:STRIPE_KEY | UNCERTAIN | sk_test_ format |
Command Injection
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:run_system_command | TP | os.system with user input |
| semgrep_patterns.py:run_safe_command | FP | Hardcoded command |
Insecure Random
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:generate_token_insecure | TP | Random for token |
| semgrep_patterns.py:shuffle_playlist | FP | Non-security use |
Debug Mode
| Location | Classification | Rationale |
|---|---|---|
| semgrep_patterns.py:DEBUG_MODE | TP | Debug flag True |
| semgrep_patterns.py:debug_eval | TP | Eval in debug endpoint |
| semgrep_patterns.py:app.run | TP | debug=True |
Usage for Benchmarking
Run each tool against the codebase, then compare its findings against this ground-truth document to calculate:
- True Positive Rate (TPR), which is the same quantity as recall
- False Positive Rate (FPR)
- Precision
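Below is a minimal sketch of the calculation. It assumes each ground-truth row has been reduced to a location-to-label mapping and each tool run to a set of flagged locations; both representations are hypothetical. Under this scheme:

- `UNCERTAIN` rows are excluded from every metric.
- FP-labelled rows serve as the negatives, so a tool that stays silent on one earns a true negative.
- Flagged locations that are absent from the ground truth are ignored.

The sample labels come from the Bandit command-injection table; the flagged set is illustrative input, not a real tool result.

```python
def score(ground_truth: dict[str, str], flagged: set[str]) -> dict[str, float]:
    """ground_truth maps location -> "TP" | "FP" | "UNCERTAIN"."""
    positives = {loc for loc, label in ground_truth.items() if label == "TP"}
    negatives = {loc for loc, label in ground_truth.items() if label == "FP"}

    tp = len(positives & flagged)  # real issues the tool reported
    fn = len(positives - flagged)  # real issues it missed
    fp = len(negatives & flagged)  # decoys it reported anyway
    tn = len(negatives - flagged)  # decoys it correctly ignored

    tpr = tp / (tp + fn) if positives else 0.0  # identical to recall
    fpr = fp / (fp + tn) if negatives else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tpr": tpr, "recall": tpr, "fpr": fpr, "precision": precision}


truth = {
    "web_app.py:admin_execute": "TP",
    "web_app.py:compile_code": "FP",
    "web_app.py:check_disk": "FP",
    "crypto_utils.py:BACKUP_KEY": "UNCERTAIN",
}
print(score(truth, {"web_app.py:admin_execute", "web_app.py:compile_code"}))
# {'tpr': 1.0, 'recall': 1.0, 'fpr': 0.5, 'precision': 0.5}
```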
Notes on Classification
Some findings are context-dependent:
- Development vs Production environment
- Internal vs External network exposure
- Who has access to modify configurations
- Whether validation is sufficient
- Threat model considerations
The UNCERTAIN category represents findings where classification depends on context.