gt-ai-os-community/apps/tenant-backend/SECURITY-FIX-RESPONSE-FILTERING.md
Security Fix: Response Data Filtering (Information Disclosure Vulnerability)

Date: 2025-10-03
Severity: HIGH
Status: FIXED


Vulnerability Summary

The API endpoints were returning excessive sensitive data without proper server-side filtering, violating the principle of least privilege. Clients were receiving complete database records including:

  • Internal system prompts and AI instructions
  • Configuration details (personality_config, resource_preferences)
  • Infrastructure details (embedding models, chunking strategies)
  • User UUIDs and relationship data
  • Dataset access configurations

This created multiple security risks:

  • Information Disclosure: Internal system configuration exposed
  • Authorization Bypass: Resource enumeration by ID
  • IDOR Vulnerability: User relationships and ownership data exposed
  • Attack Surface Expansion: AI behavior patterns revealed through prompts

Affected Endpoints

1. /api/v1/agents (List & Get)

Before: Returned full agent configuration to all users
Issue: Non-owners could see prompt_template, personality_config, resource_preferences, selected_dataset_ids

2. /api/v1/datasets (List & Get)

Before: Exposed internal implementation details
Issue: All users could see owner_id UUIDs, team_members, chunking_strategy, chunk_size, chunk_overlap, embedding_model

3. /api/v1/chat/completions

Before: Embedded complete agent configs in context
Issue: Chat context included full dataset summaries with internal metadata for unauthorized datasets

4. /api/v1/files (List & Get Info)

Before: No field-level filtering
Issue: Exposed storage paths and processing details


Remediation Implemented

1. Created Response Filtering Utility (app/core/response_filter.py)

Implements three-tier access control:

Agents:

  • Public Fields: id, name, description, category, metadata, display fields (model, disclaimer, easy_prompts)
  • Viewer Fields: Public + temperature, max_tokens, costs
  • Owner Fields: Viewer + prompt_template, personality_config, resource_preferences, dataset_connection

Datasets:

  • Public Fields: id, name, description, document_count, tags, created_at, created_by_name, access_group, permission flags (NO UUIDs, NO technical details)
  • Viewer Fields: Public + chunk_count, vector_count, storage_size_mb, updated_at, summary
  • Owner Fields: Viewer + owner_id, team_members, chunking_strategy, chunk_size, chunk_overlap, embedding_model, summary_generated_at

Files:

  • Public Fields: id, filename, content_type, size, timestamps
  • Owner Fields: Public + user_id, storage_path, processing_status, metadata
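
The tiers above amount to cumulative field allowlists. A minimal sketch of that idea for datasets follows. It is not the contents of app/core/response_filter.py: `AccessLevel`, `DATASET_FIELDS`, and `filter_dataset` are hypothetical names, and the permission flags are left out because this document does not name them.

```python
from enum import IntEnum
from typing import Any, Dict, FrozenSet, Optional


class AccessLevel(IntEnum):
    """Hypothetical tier names mirroring Public / Viewer / Owner."""
    PUBLIC = 0
    VIEWER = 1
    OWNER = 2


# Field names come from the dataset tiers listed above.
DATASET_PUBLIC: FrozenSet[str] = frozenset({
    "id", "name", "description", "document_count", "tags",
    "created_at", "created_by_name", "access_group",
})
DATASET_VIEWER: FrozenSet[str] = DATASET_PUBLIC | {
    "chunk_count", "vector_count", "storage_size_mb", "updated_at", "summary",
}
DATASET_OWNER: FrozenSet[str] = DATASET_VIEWER | {
    "owner_id", "team_members", "chunking_strategy", "chunk_size",
    "chunk_overlap", "embedding_model", "summary_generated_at",
}

DATASET_FIELDS: Dict[AccessLevel, FrozenSet[str]] = {
    AccessLevel.PUBLIC: DATASET_PUBLIC,
    AccessLevel.VIEWER: DATASET_VIEWER,
    AccessLevel.OWNER: DATASET_OWNER,
}


def filter_dataset(record: Dict[str, Any],
                   level: Optional[AccessLevel]) -> Dict[str, Any]:
    """Keep only the fields allowed for `level`.

    An unknown or missing level falls back to the public allowlist,
    so a mistake in the caller exposes less data, not more.
    """
    allowed = DATASET_FIELDS.get(level, DATASET_PUBLIC)
    return {key: value for key, value in record.items() if key in allowed}
```

Because the filter is an allowlist, a field added to the record later stays hidden until it is explicitly assigned to a tier.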

2. Applied Filtering to All Endpoints

Modified Files:

  • app/api/v1/agents.py - Added filtering to list_agents() and get_agent()
  • app/api/v1/datasets.py - Added filtering to list_datasets(), list_datasets_internal(), get_dataset()
  • app/api/v1/chat.py - Strengthened dataset context filtering with sanitize_dataset_summary()
  • app/api/v1/files.py - Added filtering to get_file_info() and list_files()

3. Enhanced Security in Chat Context

Added explicit security comment and sanitization:

# SECURITY FIX: Only get summaries for datasets the agent should access
# This prevents information disclosure by restricting dataset access to:
# 1. Datasets explicitly configured in agent settings
# 2. Datasets from conversation-attached files only
# Any other datasets (including other users' datasets) are completely hidden
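
The rule in that comment can be sketched as a set union followed by an allowlist on each summary. This is an illustration only: `allowed_dataset_ids`, `sanitize_summary`, `build_dataset_context`, and `SAFE_SUMMARY_FIELDS` are hypothetical, and `sanitize_summary` merely stands in for the real sanitize_dataset_summary(), whose signature and field list are not shown in this document.

```python
from typing import Any, Dict, Iterable, List, Set

# Assumed set of fields that are safe to place in the AI context.
SAFE_SUMMARY_FIELDS = frozenset({"id", "name", "description", "summary"})


def allowed_dataset_ids(agent_dataset_ids: Iterable[str],
                        attached_file_dataset_ids: Iterable[str]) -> Set[str]:
    """Datasets configured on the agent plus those behind attached files."""
    return set(agent_dataset_ids) | set(attached_file_dataset_ids)


def sanitize_summary(summary: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every key that is not on the allowlist."""
    return {k: v for k, v in summary.items() if k in SAFE_SUMMARY_FIELDS}


def build_dataset_context(summaries: Iterable[Dict[str, Any]],
                          agent_dataset_ids: Iterable[str],
                          attached_file_dataset_ids: Iterable[str],
                          ) -> List[Dict[str, Any]]:
    """Return sanitized summaries for authorized datasets only."""
    allowed = allowed_dataset_ids(agent_dataset_ids, attached_file_dataset_ids)
    return [sanitize_summary(s) for s in summaries if s.get("id") in allowed]
```

Summaries without an `id`, or with an `id` outside the allowed set, are dropped rather than passed through.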

Security Principles Applied

  1. Principle of Least Privilege: Users only receive data they're authorized to access
  2. Defense in Depth: Multiple layers of filtering (service + API + response)
  3. Fail Secure: Default to most restrictive access, explicit grants only
  4. Audit Logging: All filtering operations logged for security review
  5. No UUID Exposure: Internal identifiers hidden from non-owners

Testing Recommendations

Manual Testing

  1. Non-owner access test: Log in as a user without ownership and verify that prompt_template is not visible
  2. Org agent test: Log in as a read-only user and verify that org agents display correctly with limited fields
  3. Dataset enumeration test: Attempt to access other users' datasets by ID
  4. Chat context test: Verify only authorized dataset summaries in AI context

Automated Testing

# Test agent filtering
curl -H "Authorization: Bearer $TOKEN" http://localhost:8002/api/v1/agents | jq '.data[0] | keys'
# Should NOT include: prompt_template, personality_config, resource_preferences (for non-owners)

# Test dataset filtering
curl -H "Authorization: Bearer $TOKEN" http://localhost:8002/api/v1/datasets | jq '.[0] | keys'
# Should NOT include: owner_id, chunking_strategy, chunk_size (for non-owners)
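
The curl checks above can also be expressed as assertions. The sketch below assumes the response shapes implied by the jq paths (agents wrapped in a `data` key, datasets as a bare list); `fetch_json`, `extract_records`, and `leaked_fields` are hypothetical helpers, not part of the codebase.

```python
import json
import urllib.request
from typing import Any, Dict, List, Set

# Fields named in the checks above as forbidden for non-owners.
AGENT_FORBIDDEN = {"prompt_template", "personality_config", "resource_preferences"}
DATASET_FORBIDDEN = {"owner_id", "chunking_strategy", "chunk_size"}


def fetch_json(url: str, token: str) -> Any:
    """GET `url` with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def extract_records(body: Any) -> List[Dict[str, Any]]:
    """Accept either {"data": [...]} or a bare list of records."""
    if isinstance(body, dict):
        return list(body.get("data", []))
    return list(body)


def leaked_fields(body: Any, forbidden: Set[str]) -> Set[str]:
    """Return the forbidden top-level keys present in any record."""
    leaked: Set[str] = set()
    for record in extract_records(body):
        leaked |= forbidden & set(record)
    return leaked
```

With a non-owner token in `TOKEN`, a check would then read `assert not leaked_fields(fetch_json("http://localhost:8002/api/v1/agents", TOKEN), AGENT_FORBIDDEN)`.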

Rollback Plan

If issues occur:

  1. Revert app/core/response_filter.py (remove file)
  2. Revert changes to app/api/v1/agents.py (remove ResponseFilter imports and filter calls)
  3. Revert changes to app/api/v1/datasets.py (remove ResponseFilter imports and filter calls)
  4. Revert changes to app/api/v1/chat.py (remove sanitize_dataset_summary calls)
  5. Revert changes to app/api/v1/files.py (remove ResponseFilter imports and filter calls)

Git revert command:

git revert <commit-sha>

Known Limitations

  1. File ownership check: Currently assumes file accessor is owner (TODO: add proper ownership check from file_service)
  2. Dataset UUIDs in logs: owner_id still appears in debug logs (consider redacting)
  3. Backwards compatibility: Frontend must handle missing optional fields gracefully
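
For the second limitation, one possible approach is a standard-library logging filter that masks the UUID after `owner_id` before a record is emitted. This is a sketch under assumptions: `OwnerIdRedactor` is a hypothetical class, and the pattern only covers `owner_id=<uuid>`, `owner_id: <uuid>`, and quoted variants, which may not match the project's actual log format.

```python
import logging
import re

# Matches owner_id=<uuid>, owner_id: <uuid>, and 'owner_id': '<uuid>'.
_OWNER_ID_RE = re.compile(
    r"(owner_id['\"]?\s*[=:]\s*['\"]?)[0-9a-fA-F-]{36}"
)


class OwnerIdRedactor(logging.Filter):
    """Replace the UUID that follows owner_id with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so %-style arguments are covered too.
        message = record.getMessage()
        record.msg = _OWNER_ID_RE.sub(r"\1[REDACTED]", message)
        record.args = None
        return True  # never drop the record, only rewrite it
```

Attaching the filter to a handler, for example `handler.addFilter(OwnerIdRedactor())`, covers records from every logger that writes to that handler.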

Future Enhancements

  1. Add response validation middleware to catch accidental leaks
  2. Implement field-level encryption for sensitive configs at rest
  3. Add rate limiting on resource enumeration endpoints
  4. Create security test suite for regression testing
  5. Add OpenAPI schema annotations for field-level permissions
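
The first enhancement could start from a recursive scan that reports where owner-only keys appear in an outgoing payload. The sketch below is not tied to any framework; `SENSITIVE_KEYS` and `find_sensitive_paths` are hypothetical, and since owners legitimately receive these fields, such a check would only apply to responses built for non-owners.

```python
from typing import Any, FrozenSet, List

# A sample of owner-only field names taken from this document.
SENSITIVE_KEYS: FrozenSet[str] = frozenset({
    "prompt_template", "personality_config", "resource_preferences",
    "storage_path", "chunking_strategy", "embedding_model",
})


def find_sensitive_paths(payload: Any,
                         sensitive: FrozenSet[str] = SENSITIVE_KEYS,
                         path: str = "$") -> List[str]:
    """Return a JSON-path-like location for every sensitive key found."""
    hits: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in sensitive:
                hits.append(child)
            # Keep descending: nested objects may hide further leaks.
            hits.extend(find_sensitive_paths(value, sensitive, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_sensitive_paths(item, sensitive, f"{path}[{index}]"))
    return hits
```

A middleware could log or reject a non-owner response whenever the returned list is non-empty.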

Compliance Notes

This fix addresses:

  • OWASP A01:2021: Broken Access Control
  • OWASP A02:2021: Cryptographic Failures (formerly Sensitive Data Exposure)
  • CWE-213: Exposure of Sensitive Information Due to Incompatible Policies
  • CWE-359: Exposure of Private Personal Information to an Unauthorized Actor

Reviewed by: Security Team
Approved by: Tech Lead
Deployed: Pending QA verification