[HOTFIX] Raise URLValidator max_length to 8192 for long S3 presigned URLs#2111
[HOTFIX] Raise URLValidator max_length to 8192 for long S3 presigned URLs#2111ritwik-g merged 1 commit intov0.176.4-hotfixZipstack/unstract:v0.176.4-hotfixfrom hotfix/presigned-url-max-lengthZipstack/unstract:hotfix/presigned-url-max-lengthCopy head branch name to clipboard
Conversation
|
|
| Filename | Overview |
|---|---|
| backend/backend/settings/base.py | Adds a global class-level monkey-patch raising URLValidator.max_length to 8192; effective only on Django 5.0+ but pyproject.toml pins django==4.2.30, creating a discrepancy that could leave the production fix inoperative |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Client
participant DRF as DRF Serializer
participant UV as URLValidator (global)
participant CV as Custom _validate_presigned_url
Client->>DRF: "POST /api/deployment (presigned_urls=[long_s3_url])"
DRF->>UV: URLField validation (child of ListField)
Note over UV: Before fix: max_length=2048 → 2094-char URL fails
Note over UV: After fix: max_length=8192 → passes
UV-->>DRF: valid / ValidationError("Enter a valid URL.")
DRF->>CV: validate_presigned_urls()
CV->>CV: Check HTTPS scheme
CV->>CV: Check .amazonaws.com S3 host
CV-->>DRF: valid / ValidationError (scheme/host)
DRF-->>Client: 200 OK / 400 Bad Request
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Client
participant DRF as DRF Serializer
participant UV as URLValidator (global)
participant CV as Custom _validate_presigned_url
Client->>DRF: "POST /api/deployment (presigned_urls=[long_s3_url])"
DRF->>UV: URLField validation (child of ListField)
Note over UV: Before fix: max_length=2048 → 2094-char URL fails
Note over UV: After fix: max_length=8192 → passes
UV-->>DRF: valid / ValidationError("Enter a valid URL.")
DRF->>CV: validate_presigned_urls()
CV->>CV: Check HTTPS scheme
CV->>CV: Check .amazonaws.com S3 host
CV-->>DRF: valid / ValidationError (scheme/host)
DRF-->>Client: 200 OK / 400 Bad Request
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
backend/backend/settings/base.py:27
**Dependency spec doesn't match targeted Django version**
`pyproject.toml` pins `django==4.2.30`, yet the inline comment on this very line says "No-op on Django 4.2.x (no such attribute is checked)." If production is actually running 4.2.30, the monkey-patch has no effect and the reported validation failure remains unfixed. The `pyproject.toml` should be updated to reflect the Django version that is genuinely deployed, so that the hotfix actually takes effect and future installs reproduce the prod configuration.
### Issue 2 of 2
backend/backend/settings/base.py:27
**Global class mutation widens validation for all URLField serializers**
Mutating `URLValidator.max_length` at class level raises the length ceiling for every `URLField` in the application — including the webhook `url` fields in `notification_v2/internal_serializers.py` — not just `presigned_urls`. Model-backed `URLField`s are protected by Django's `MaxLengthValidator(200)` that runs first, but bare DRF serializer `URLField()`s (like the webhook fields) lose the previous 2048-char guard. A more surgical alternative is to override the validator on the specific field. For a hotfix the global approach is pragmatic, but it's worth tracking a follow-up to scope it properly.
Reviews (1): Last reviewed commit: "[HOTFIX] Raise URLValidator max_length t..." | Re-trigger Greptile
| # temporary/STS credentials (carrying X-Amz-Security-Token) routinely exceed this, | ||
| # causing "Enter a valid URL." on the API deployment `presigned_urls` field. | ||
| # Raise the cap globally. No-op on Django 4.2.x (no such attribute is checked). | ||
| URLValidator.max_length = 8192 |
There was a problem hiding this comment.
Dependency spec doesn't match targeted Django version
pyproject.toml pins django==4.2.30, yet the inline comment on this very line says "No-op on Django 4.2.x (no such attribute is checked)." If production is actually running 4.2.30, the monkey-patch has no effect and the reported validation failure remains unfixed. The pyproject.toml should be updated to reflect the Django version that is genuinely deployed, so that the hotfix actually takes effect and future installs reproduce the prod configuration.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/backend/settings/base.py
Line: 27
Comment:
**Dependency spec doesn't match targeted Django version**
`pyproject.toml` pins `django==4.2.30`, yet the inline comment on this very line says "No-op on Django 4.2.x (no such attribute is checked)." If production is actually running 4.2.30, the monkey-patch has no effect and the reported validation failure remains unfixed. The `pyproject.toml` should be updated to reflect the Django version that is genuinely deployed, so that the hotfix actually takes effect and future installs reproduce the prod configuration.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| # temporary/STS credentials (carrying X-Amz-Security-Token) routinely exceed this, | ||
| # causing "Enter a valid URL." on the API deployment `presigned_urls` field. | ||
| # Raise the cap globally. No-op on Django 4.2.x (no such attribute is checked). | ||
| URLValidator.max_length = 8192 |
There was a problem hiding this comment.
Global class mutation widens validation for all URLField serializers
Mutating URLValidator.max_length at class level raises the length ceiling for every URLField in the application — including the webhook url fields in notification_v2/internal_serializers.py — not just presigned_urls. Model-backed URLFields are protected by Django's MaxLengthValidator(200) that runs first, but bare DRF serializer URLField()s (like the webhook fields) lose the previous 2048-char guard. A more surgical alternative is to override the validator on the specific field. For a hotfix the global approach is pragmatic, but it's worth tracking a follow-up to scope it properly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/backend/settings/base.py
Line: 27
Comment:
**Global class mutation widens validation for all URLField serializers**
Mutating `URLValidator.max_length` at class level raises the length ceiling for every `URLField` in the application — including the webhook `url` fields in `notification_v2/internal_serializers.py` — not just `presigned_urls`. Model-backed `URLField`s are protected by Django's `MaxLengthValidator(200)` that runs first, but bare DRF serializer `URLField()`s (like the webhook fields) lose the previous 2048-char guard. A more surgical alternative is to override the validator on the specific field. For a hotfix the global approach is pragmatic, but it's worth tracking a follow-up to scope it properly.
How can I resolve this? If you propose a fix, please make it concise.* UN-3621 [HOTFIX] Structure tool no longer crashes when single-pass extraction returns a non-object output (#2110) UN-3621 [FIX] Guard structure pipeline against non-dict single-pass output Single-pass extraction can return a top-level JSON array (e.g. a truncated/ runaway LLM response that hit its output-token cap). The parsed `output` is then a list, and _handle_structure_pipeline called `.values()` on it unconditionally, raising an opaque `AttributeError: 'list' object has no attribute 'values'` that failed the file with no actionable signal. Guard the output shape: if it isn't a dict, return a clear ExecutionResult failure naming the likely cause instead of crashing, and stop the malformed payload from flowing downstream as a success. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [HOTFIX] Raise URLValidator max_length to 8192 for long S3 presigned URLs (#2111) --------- Co-authored-by: Kirtiman Mishra <110175055+kirtimanmishrazipstack@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Hotfix (Scenario 2: OSS-only) — target OSS `v0.176.4` → release `v0.176.5`
Problem
API deployment requests with S3 pre-signed URLs fail validation with:
```json
{"code":"invalid","detail":"Enter a valid URL.","attr":"presigned_urls.0"}
```
Django 5.0+ caps `URLValidator` at 2048 characters. Pre-signed URLs signed with temporary/STS credentials carry a large `X-Amz-Security-Token` (~1.5 KB), pushing the URL past 2048 chars, so DRF's `URLField` rejects them before they're ever parsed. Confirmed live against prod: the same URL trimmed under 2048 passes validation; at 2094 chars it fails.
Fix
Raise the global `URLValidator.max_length` to 8192 in settings. Scheme/host validation is unchanged; only the arbitrary length ceiling is lifted. No-op on Django 4.2.x (attribute not checked there). No DB/migration impact — the URL is downloaded transiently and never persisted.
Files
Follow-up (per Hotfix Deployment Guide)