
HackWeasel b9dfb86260 GT AI OS Community Edition v2.0.33

Security hardening release addressing CodeQL and Dependabot alerts:

- Fix stack trace exposure in error responses
- Add SSRF protection with DNS resolution checking
- Implement proper URL hostname validation (replaces substring matching)
- Add centralized path sanitization to prevent path traversal
- Fix ReDoS vulnerability in email validation regex
- Improve HTML sanitization in validation utilities
- Fix capability wildcard matching in auth utilities
- Update glob dependency to address CVE
- Add CodeQL suppression comments for verified false positives

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-12 17:04:45 -05:00


Security Fix: Response Data Filtering (Information Disclosure Vulnerability)

Date: 2025-10-03
Severity: HIGH
Status: FIXED

Vulnerability Summary

The API endpoints were returning excessive sensitive data without proper server-side filtering, violating the principle of least privilege. Clients were receiving complete database records including:

Internal system prompts and AI instructions
Configuration details (personality_config, resource_preferences)
Infrastructure details (embedding models, chunking strategies)
User UUIDs and relationship data
Dataset access configurations

This created multiple security risks:

Information Disclosure: Internal system configuration exposed
Authorization Bypass: Resource enumeration by ID
IDOR Vulnerability: User relationships and ownership data exposed
Attack Surface Expansion: AI behavior patterns revealed through prompts

Affected Endpoints

1. `/api/v1/agents` (List & Get)

Before: Returned full agent configuration to all users
Issue: Non-owners could see prompt_template, personality_config, resource_preferences, selected_dataset_ids

2. `/api/v1/datasets` (List & Get)

Before: Exposed internal implementation details
Issue: All users could see owner_id UUIDs, team_members, chunking_strategy, chunk_size, chunk_overlap, embedding_model

3. `/api/v1/chat/completions`

Before: Embedded complete agent configs in context
Issue: Chat context included full dataset summaries with internal metadata for unauthorized datasets

4. `/api/v1/files` (List & Get Info)

Before: No field-level filtering
Issue: Exposed storage paths and processing details

Remediation Implemented

1. Created Response Filtering Utility (`app/core/response_filter.py`)

Implements tiered access control: three tiers (public, viewer, owner) for agents and datasets, and two tiers (public, owner) for files.

Agents:

Public Fields: id, name, description, category, metadata, display fields (model, disclaimer, easy_prompts)
Viewer Fields: Public + temperature, max_tokens, costs
Owner Fields: Viewer + prompt_template, personality_config, resource_preferences, dataset_connection

Datasets:

Public Fields: id, name, description, document_count, tags, created_at, created_by_name, access_group, permission flags (NO UUIDs, NO technical details)
Viewer Fields: Public + chunk_count, vector_count, storage_size_mb, updated_at, summary
Owner Fields: Viewer + owner_id, team_members, chunking_strategy, chunk_size, chunk_overlap, embedding_model, summary_generated_at

Files:

Public Fields: id, filename, content_type, size, timestamps
Owner Fields: Public + user_id, storage_path, processing_status, metadata
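
The dataset tiers above can be sketched as cumulative field allowlists. This is a minimal illustration, not the contents of `app/core/response_filter.py`: the names `AccessLevel`, `DATASET_FIELDS`, and `filter_dataset` are hypothetical, and the unnamed "permission flags" are omitted because the list above does not name them. Any access level the filter does not recognise falls back to the public tier, in line with the fail-secure principle.

```python
from enum import IntEnum
from typing import Any, Mapping, Optional


class AccessLevel(IntEnum):
    """Hypothetical access tiers; the real utility may model these differently."""
    PUBLIC = 0
    VIEWER = 1
    OWNER = 2


# Field names are taken from the dataset tiers listed above.
DATASET_PUBLIC = frozenset({
    "id", "name", "description", "document_count", "tags",
    "created_at", "created_by_name", "access_group",
})
DATASET_VIEWER = DATASET_PUBLIC | {
    "chunk_count", "vector_count", "storage_size_mb", "updated_at", "summary",
}
DATASET_OWNER = DATASET_VIEWER | {
    "owner_id", "team_members", "chunking_strategy", "chunk_size",
    "chunk_overlap", "embedding_model", "summary_generated_at",
}

DATASET_FIELDS = {
    AccessLevel.PUBLIC: DATASET_PUBLIC,
    AccessLevel.VIEWER: DATASET_VIEWER,
    AccessLevel.OWNER: DATASET_OWNER,
}


def filter_dataset(record: Mapping[str, Any],
                   level: Optional[AccessLevel]) -> dict[str, Any]:
    """Return only the fields allowed at `level`.

    Unknown or missing levels get the public allowlist (fail secure).
    Fields not on any allowlist are dropped for every caller.
    """
    allowed = DATASET_FIELDS.get(level, DATASET_PUBLIC)
    return {key: value for key, value in record.items() if key in allowed}
```

Because the filter is an allowlist, a column added to the database later stays hidden until someone assigns it to a tier, which is the safer default for this class of bug.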

2. Applied Filtering to All Endpoints

Modified Files:

app/api/v1/agents.py - Added filtering to list_agents() and get_agent()
app/api/v1/datasets.py - Added filtering to list_datasets(), list_datasets_internal(), get_dataset()
app/api/v1/chat.py - Strengthened dataset context filtering with sanitize_dataset_summary()
app/api/v1/files.py - Added filtering to get_file_info() and list_files()

3. Enhanced Security in Chat Context

Added explicit security comment and sanitization:

# SECURITY FIX: Only get summaries for datasets the agent should access
# This prevents information disclosure by restricting dataset access to:
# 1. Datasets explicitly configured in agent settings
# 2. Datasets from conversation-attached files only
# Any other datasets (including other users' datasets) are completely hidden
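
The rule described in that comment can be sketched as follows. This is an assumption-laden illustration, not the code in `app/api/v1/chat.py`: the function names, the `SAFE_SUMMARY_FIELDS` set, and the shape of the dataset records are all hypothetical, and `sanitize_dataset_summary_sketch` is only a stand-in for the real `sanitize_dataset_summary()`.

```python
from typing import Any, Iterable, Mapping

# Assumed set of fields that are safe to place in the AI context.
SAFE_SUMMARY_FIELDS = frozenset({"id", "name", "description", "summary"})


def allowed_dataset_ids(agent_dataset_ids: Iterable[str],
                        attached_file_dataset_ids: Iterable[str]) -> set[str]:
    """Union of the two permitted sources: agent settings and attached files."""
    return set(agent_dataset_ids) | set(attached_file_dataset_ids)


def sanitize_dataset_summary_sketch(dataset: Mapping[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted fields; internal metadata is dropped."""
    return {k: v for k, v in dataset.items() if k in SAFE_SUMMARY_FIELDS}


def build_dataset_context(datasets: Iterable[Mapping[str, Any]],
                          agent_dataset_ids: Iterable[str],
                          attached_file_dataset_ids: Iterable[str]) -> list[dict[str, Any]]:
    """Return sanitized summaries for permitted datasets only.

    Datasets outside the allowed set are skipped entirely, so neither
    their contents nor their existence reaches the chat context.
    """
    allowed = allowed_dataset_ids(agent_dataset_ids, attached_file_dataset_ids)
    return [
        sanitize_dataset_summary_sketch(d)
        for d in datasets
        if d.get("id") in allowed
    ]
```

With empty agent settings and no attached files, the allowed set is empty and the context contains no dataset summaries at all.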

Security Principles Applied

Principle of Least Privilege: Users only receive data they're authorized to access
Defense in Depth: Multiple layers of filtering (service + API + response)
Fail Secure: Default to most restrictive access, explicit grants only
Audit Logging: All filtering operations logged for security review
No UUID Exposure: Internal identifiers hidden from non-owners

Testing Recommendations

Manual Testing

Non-owner access test: Log in as a user without ownership, verify no prompt_template is visible
Org agent test: Log in as a read-only user, verify org agents display correctly with limited fields
Dataset enumeration test: Attempt to access other users' datasets by ID
Chat context test: Verify only authorized dataset summaries in AI context

Automated Testing

# Test agent filtering
curl -H "Authorization: Bearer $TOKEN" http://localhost:8002/api/v1/agents | jq '.data[0] | keys'
# Should NOT include: prompt_template, personality_config, resource_preferences (for non-owners)

# Test dataset filtering
curl -H "Authorization: Bearer $TOKEN" http://localhost:8002/api/v1/datasets | jq '.[0] | keys'
# Should NOT include: owner_id, chunking_strategy, chunk_size (for non-owners)

Rollback Plan

If issues occur:

Revert app/core/response_filter.py (remove file)
Revert changes to app/api/v1/agents.py (remove ResponseFilter imports and filter calls)
Revert changes to app/api/v1/datasets.py (remove ResponseFilter imports and filter calls)
Revert changes to app/api/v1/chat.py (remove sanitize_dataset_summary calls)
Revert changes to app/api/v1/files.py (remove ResponseFilter imports and filter calls)

Git revert command:

git revert <commit-sha>

Known Limitations

File ownership check: Currently assumes file accessor is owner (TODO: add proper ownership check from file_service)
Dataset UUIDs in logs: owner_id still appears in debug logs (consider redacting)
Backwards compatibility: Frontend must handle missing optional fields gracefully
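
One way to approach the log-redaction limitation is a standard-library `logging.Filter` that masks UUID-shaped strings before a record is emitted. This is a sketch under assumptions: the class name and placeholder text are hypothetical, and it redacts every UUID in a message, not only `owner_id` values, which may be broader than wanted.

```python
import logging
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


class RedactUUIDFilter(logging.Filter):
    """Replace UUID-shaped substrings in log messages with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, so UUIDs passed
        # as %-style arguments are redacted as well.
        rendered = record.getMessage()
        record.msg = UUID_RE.sub("[REDACTED-UUID]", rendered)
        record.args = ()
        return True  # never drop the record, only rewrite it
```

Attaching the filter to a handler with `handler.addFilter(RedactUUIDFilter())` applies it to every record that handler emits, regardless of which logger produced it.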

Future Enhancements

Add response validation middleware to catch accidental leaks
Implement field-level encryption for sensitive configs at rest
Add rate limiting on resource enumeration endpoints
Create security test suite for regression testing
Add OpenAPI schema annotations for field-level permissions
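
The first enhancement, catching accidental leaks, could start from a recursive scan of the decoded JSON payload for field names that must never reach non-owners. The function below is a hypothetical sketch rather than existing project code; it could be called from a response middleware or from the regression test suite.

```python
from typing import Any, Iterable


def find_leaked_fields(payload: Any,
                       forbidden: Iterable[str],
                       path: str = "$") -> list[str]:
    """Return the paths of every forbidden key found anywhere in `payload`.

    Walks nested dicts and lists; scalar values are ignored.
    """
    forbidden_set = frozenset(forbidden)
    leaks: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child_path = f"{path}.{key}"
            if key in forbidden_set:
                leaks.append(child_path)
            leaks.extend(find_leaked_fields(value, forbidden_set, child_path))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            leaks.extend(
                find_leaked_fields(item, forbidden_set, f"{path}[{index}]")
            )
    return leaks
```

A non-empty result for a non-owner response would indicate a filtering regression; the reported paths show where in the payload the field appeared.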

Compliance Notes

This fix addresses:

OWASP A01:2021: Broken Access Control
OWASP A02:2021: Cryptographic Failures (the category previously named Sensitive Data Exposure)
CWE-213: Exposure of Sensitive Information Due to Incompatible Policies
CWE-359: Exposure of Private Personal Information to an Unauthorized Actor

Reviewed by: Security Team
Approved by: Tech Lead
Deployed: Pending QA verification
