Security Fix: Response Data Filtering (Information Disclosure Vulnerability)
Date: 2025-10-03
Severity: HIGH
Status: FIXED
Vulnerability Summary
The API endpoints were returning excessive sensitive data without proper server-side filtering, violating the principle of least privilege. Clients were receiving complete database records including:
- Internal system prompts and AI instructions
- Configuration details (personality_config, resource_preferences)
- Infrastructure details (embedding models, chunking strategies)
- User UUIDs and relationship data
- Dataset access configurations
This created multiple security risks:
- Information Disclosure: Internal system configuration exposed
- Authorization Bypass: Resource enumeration by ID
- IDOR Vulnerability: User relationships and ownership data exposed
- Attack Surface Expansion: AI behavior patterns revealed through prompts
Affected Endpoints
1. /api/v1/agents (List & Get)
Before: Returned full agent configuration to all users
Issue: Non-owners could see prompt_template, personality_config, resource_preferences, selected_dataset_ids
2. /api/v1/datasets (List & Get)
Before: Exposed internal implementation details
Issue: All users could see owner_id UUIDs, team_members, chunking_strategy, chunk_size, chunk_overlap, embedding_model
3. /api/v1/chat/completions
Before: Embedded complete agent configs in context
Issue: Chat context included full dataset summaries with internal metadata for unauthorized datasets
4. /api/v1/files (List & Get Info)
Before: No field-level filtering
Issue: Exposed storage paths and processing details
Remediation Implemented
1. Created Response Filtering Utility (app/core/response_filter.py)
Implements three-tier access control:
Agents:
- Public Fields: id, name, description, category, metadata, display fields (model, disclaimer, easy_prompts)
- Viewer Fields: Public + temperature, max_tokens, costs
- Owner Fields: Viewer + prompt_template, personality_config, resource_preferences, dataset_connection
Datasets:
- Public Fields: id, name, description, document_count, tags, created_at, created_by_name, access_group, permission flags (NO UUIDs, NO technical details)
- Viewer Fields: Public + chunk_count, vector_count, storage_size_mb, updated_at, summary
- Owner Fields: Viewer + owner_id, team_members, chunking_strategy, chunk_size, chunk_overlap, embedding_model, summary_generated_at
Files:
- Public Fields: id, filename, content_type, size, timestamps
- Owner Fields: Public + user_id, storage_path, processing_status, metadata
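The three tiers above are nested allowlists: each tier is a superset of the one below it. A minimal sketch of that idea for agents, assuming records arrive as plain dicts. The constant names, the `filter_fields()` helper, and the flattened display-field names are hypothetical and are not taken from app/core/response_filter.py:

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical field sets mirroring the agent tiers listed above.
AGENT_PUBLIC = frozenset({"id", "name", "description", "category", "metadata",
                          "model", "disclaimer", "easy_prompts"})
AGENT_VIEWER = AGENT_PUBLIC | {"temperature", "max_tokens", "costs"}
AGENT_OWNER = AGENT_VIEWER | {"prompt_template", "personality_config",
                              "resource_preferences", "dataset_connection"}

TIERS = {"public": AGENT_PUBLIC, "viewer": AGENT_VIEWER, "owner": AGENT_OWNER}


def filter_fields(record: Mapping[str, Any], tier: str) -> dict[str, Any]:
    """Return only the keys allowed for `tier` (allowlist, not denylist).

    Unknown tiers fall back to the public set so a typo never widens access.
    """
    allowed = TIERS.get(tier, AGENT_PUBLIC)
    return {k: v for k, v in record.items() if k in allowed}
```

Filtering by allowlist means a column added to the database later stays hidden until someone deliberately adds it to a tier, which matches the Fail Secure principle listed under Security Principles Applied.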
2. Applied Filtering to All Endpoints
Modified Files:
- app/api/v1/agents.py - Added filtering to list_agents() and get_agent()
- app/api/v1/datasets.py - Added filtering to list_datasets(), list_datasets_internal(), and get_dataset()
- app/api/v1/chat.py - Strengthened dataset context filtering with sanitize_dataset_summary()
- app/api/v1/files.py - Added filtering to get_file_info() and list_files()
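One ordering detail matters when applying these filters to list endpoints: the tier has to be decided from the raw record, while it still carries owner_id, and only then can fields be stripped. A sketch for a dataset list, assuming viewers are users listed in team_members (an assumption; this document does not say how viewer access is granted). All names here are hypothetical:

```python
from typing import Any, Iterable

# Hypothetical dataset tiers, abbreviated from the lists above.
DATASET_PUBLIC = frozenset({"id", "name", "description", "document_count", "tags"})
DATASET_VIEWER = DATASET_PUBLIC | {"chunk_count", "vector_count", "summary"}
DATASET_OWNER = DATASET_VIEWER | {"owner_id", "team_members", "chunking_strategy",
                                  "chunk_size", "embedding_model"}


def resolve_tier(record: dict[str, Any], user_id: str) -> frozenset[str]:
    """Pick the field set from the raw record, before any field is stripped.

    Assumption: viewers are users listed in `team_members`.
    """
    if record.get("owner_id") == user_id:
        return DATASET_OWNER
    if user_id in record.get("team_members", ()):
        return DATASET_VIEWER
    return DATASET_PUBLIC


def filter_list(records: Iterable[dict[str, Any]], user_id: str) -> list[dict[str, Any]]:
    # The tier is resolved per record, so a mixed list (own, shared, and
    # other users' datasets) gets the right shape for each item.
    out = []
    for record in records:
        allowed = resolve_tier(record, user_id)
        out.append({k: v for k, v in record.items() if k in allowed})
    return out
```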
3. Enhanced Security in Chat Context
Added explicit security comment and sanitization:
# SECURITY FIX: Only get summaries for datasets the agent should access
# This prevents information disclosure by restricting dataset access to:
# 1. Datasets explicitly configured in agent settings
# 2. Datasets from conversation-attached files only
# Any other datasets (including other users' datasets) are completely hidden
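The restriction described in that comment amounts to an allowlist of dataset IDs built from exactly two sources. A sketch, assuming summaries are available as a dict keyed by dataset ID. `_sanitize_summary()` is a hypothetical stand-in for the project's sanitize_dataset_summary(), whose exact behavior is not documented here, and the kept keys are illustrative:

```python
from typing import Any, Iterable

# Hypothetical: keys kept when a dataset summary is injected into AI context.
SUMMARY_KEYS = ("id", "name", "description", "summary")


def _sanitize_summary(summary: dict[str, Any]) -> dict[str, Any]:
    """Illustrative stand-in for sanitize_dataset_summary(): keep safe keys only."""
    return {k: summary[k] for k in SUMMARY_KEYS if k in summary}


def context_summaries(
    all_summaries: dict[str, dict[str, Any]],
    agent_dataset_ids: Iterable[str],
    conversation_dataset_ids: Iterable[str],
) -> list[dict[str, Any]]:
    # Only agent-configured and conversation-attached datasets are allowed;
    # anything else, including other users' datasets, never reaches the prompt.
    allowed = set(agent_dataset_ids) | set(conversation_dataset_ids)
    return [
        _sanitize_summary(all_summaries[ds_id])
        for ds_id in sorted(allowed)
        if ds_id in all_summaries
    ]
```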
Security Principles Applied
- Principle of Least Privilege: Users only receive data they're authorized to access
- Defense in Depth: Multiple layers of filtering (service + API + response)
- Fail Secure: Default to most restrictive access, explicit grants only
- Audit Logging: All filtering operations logged for security review
- No UUID Exposure: Internal identifiers hidden from non-owners
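Fail Secure and Audit Logging can live in the same filtering step. A sketch, assuming roles arrive as strings; the role names, the logger name, and both helpers are hypothetical. Missing or unrecognized roles resolve to the public tier, and the audit entry records only the names of stripped fields, never their values, so the log itself does not become a new leak:

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("response_filter")  # hypothetical logger name

# Hypothetical role-to-tier mapping; only explicit grants appear here.
ROLE_TO_TIER = {"owner": "owner", "editor": "viewer", "viewer": "viewer"}


def tier_for(role: Optional[str]) -> str:
    """Fail secure: anything not explicitly granted maps to 'public'."""
    return ROLE_TO_TIER.get(role or "", "public")


def audit_dropped(resource: str, record: dict[str, Any], kept: dict[str, Any]) -> list[str]:
    """Log the names of stripped fields (never their values) for review."""
    dropped = sorted(set(record) - set(kept))
    if dropped:
        logger.info("filtered %s response: dropped %s", resource, dropped)
    return dropped
```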
Testing Recommendations
Manual Testing
- Non-owner access test: Login as user without ownership, verify no prompt_template visible
- Org agent test: Login as read-only user, verify org agents display correctly with limited fields
- Dataset enumeration test: Attempt to access other users' datasets by ID
- Chat context test: Verify only authorized dataset summaries in AI context
Automated Testing
# Test agent filtering
curl -H "Authorization: Bearer $TOKEN" http://localhost:8002/api/v1/agents | jq '.data[0] | keys'
# Should NOT include: prompt_template, personality_config, resource_preferences (for non-owners)
# Test dataset filtering
curl -H "Authorization: Bearer $TOKEN" http://localhost:8002/api/v1/datasets | jq '.[0] | keys'
# Should NOT include: owner_id, chunking_strategy, chunk_size (for non-owners)
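The two curl/jq checks can also be expressed as a Python helper for an automated test suite. A sketch, assuming the caller's token belongs to a non-owner and the raw response body is available as a string. Unlike the jq commands, which inspect only the first record, it checks every record. The helper and constant names are hypothetical:

```python
import json
from typing import Any

# Fields the curl checks above say non-owners must never see.
FORBIDDEN_AGENT_KEYS = {"prompt_template", "personality_config", "resource_preferences"}
FORBIDDEN_DATASET_KEYS = {"owner_id", "chunking_strategy", "chunk_size"}


def _records(body: Any) -> list[dict[str, Any]]:
    # The agents check reads `.data[0]` and the datasets check reads `.[0]`,
    # so accept both envelope shapes.
    if isinstance(body, dict):
        return body.get("data", [])
    return body


def leaked_keys(raw_json: str, forbidden: set[str]) -> set[str]:
    """Return forbidden top-level keys present in any record of the response."""
    found: set[str] = set()
    for record in _records(json.loads(raw_json)):
        found |= set(record) & forbidden
    return found
```

An empty list passes trivially, so a regression test should first seed at least one record owned by a different user.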
Rollback Plan
If issues occur:
- Revert app/core/response_filter.py (remove file)
- Revert changes to app/api/v1/agents.py (remove ResponseFilter imports and filter calls)
- Revert changes to app/api/v1/datasets.py (remove ResponseFilter imports and filter calls)
- Revert changes to app/api/v1/chat.py (remove sanitize_dataset_summary calls)
- Revert changes to app/api/v1/files.py (remove ResponseFilter imports and filter calls)
Git revert command:
git revert <commit-sha>
Known Limitations
- File ownership check: Currently assumes file accessor is owner (TODO: add proper ownership check from file_service)
- Dataset UUIDs in logs: owner_id still appears in debug logs (consider redacting)
- Backwards compatibility: Frontend must handle missing optional fields gracefully
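For the owner_id-in-debug-logs limitation, one option is a standard-library `logging.Filter` that rewrites UUIDs before any handler formats the record. A sketch, assuming the IDs use the canonical 8-4-4-4-12 hex form; the class name and placeholder text are hypothetical:

```python
import logging
import re

# Matches canonical 8-4-4-4-12 hex UUIDs, such as owner_id values.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


class RedactUUIDFilter(logging.Filter):
    """Replace UUIDs in the fully merged message before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args
        record.msg = UUID_RE.sub("<redacted-uuid>", message)
        record.args = ()  # args are already folded into msg
        return True  # never drop the record, only rewrite it
```

Attach the filter to handlers rather than to a single logger: a logger-level filter is not consulted for records that propagate up from descendant loggers, while a handler-level filter sees every record that handler emits. Exception tracebacks are not rewritten by this sketch.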
Future Enhancements
- Add response validation middleware to catch accidental leaks
- Implement field-level encryption for sensitive configs at rest
- Add rate limiting on resource enumeration endpoints
- Create security test suite for regression testing
- Add OpenAPI schema annotations for field-level permissions
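The response validation middleware idea could start as a recursive scan for keys that should never reach a non-owner, acting as a safety net behind the per-tier allowlists rather than replacing them. A sketch with a hypothetical denylist and helper name; it assumes the response body has already been parsed from JSON:

```python
from typing import Any, Iterator

# Hypothetical safety-net denylist; the per-tier allowlists remain primary.
NEVER_RETURN_TO_NON_OWNERS = frozenset({
    "prompt_template", "personality_config", "resource_preferences",
    "owner_id", "storage_path", "chunking_strategy",
})


def find_leaks(payload: Any, path: str = "$") -> Iterator[str]:
    """Yield JSONPath-like locations of denylisted keys anywhere in payload."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in NEVER_RETURN_TO_NON_OWNERS:
                yield child
            yield from find_leaks(value, child)
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            yield from find_leaks(item, f"{path}[{i}]")
```

Because owners legitimately receive these fields, such a check only makes sense when the requester is known not to own the resource; it could run in the security test suite or in a middleware that logs (or blocks) a response whenever it yields a path.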
Compliance Notes
This fix addresses:
- OWASP A01:2021: Broken Access Control
- OWASP A02:2021: Cryptographic Failures (formerly Sensitive Data Exposure)
- CWE-213: Exposure of Sensitive Information Due to Incompatible Policies
- CWE-359: Exposure of Private Personal Information to an Unauthorized Actor
Reviewed by: Security Team
Approved by: Tech Lead
Deployed: Pending QA verification