@wisechengyi (Collaborator) commented Dec 26, 2025

Restrict Dependency Cache Busting to HTTP/HTTPS URLs Only

Problem

The ?nocache cache-busting feature is being applied to all URL-based dependencies, including S3 URLs. This causes issues when the dependency URLs are passed to Spark via spark.jars, as Spark attempts to download URLs like:

s3://<bucket>/XYZ..jar?nocache

S3 (and other non-HTTP protocols) don't support the ?nocache query parameter, leading to errors when Spark tries to resolve these dependencies.

Root Cause

When a user disables caching for a dependency, the frontend appends ?nocache as a sentinel value to the dependency string. This was originally designed for HTTP-based artifact servers (Artifactory, Nexus) where the same URL can point to different file versions during development. However, the implementation didn't restrict this to HTTP/HTTPS URLs only, allowing it to be applied to S3, file://, and other protocols where it's invalid.

The ?nocache suffix propagates through the system:

  1. Frontend adds ?nocache to the dependency string
  2. Backend uses this for cache control when downloading to Polynote's local cache
  3. The URL with ?nocache is passed to Spark via spark.jars
  4. Spark fails to download from S3 because ?nocache makes the URL invalid
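
The propagation above can be sketched roughly as follows. This is an illustration of the pre-fix behavior only: `appendNoCacheSentinel`, `toSparkJars`, and the sample URLs are hypothetical names and values, not Polynote's actual code. The only external fact relied on is that `spark.jars` takes a comma-separated list.

```typescript
// Hypothetical illustration of the pre-fix behavior; not Polynote's actual code.
const NO_CACHE_SUFFIX = "?nocache";

// Pre-fix: the sentinel is appended regardless of the URL scheme.
function appendNoCacheSentinel(dep: string, cacheDisabled: boolean): string {
    return cacheDisabled ? dep + NO_CACHE_SUFFIX : dep;
}

// Assumed for illustration: the saved dependency strings are joined
// unchanged into the comma-separated value handed to spark.jars.
function toSparkJars(deps: string[]): string {
    return deps.join(",");
}

const sparkJars = toSparkJars([
    appendNoCacheSentinel("https://example.com/libs/a.jar", true),
    appendNoCacheSentinel("s3://bucket/path/file.jar", true),
]);

// The S3 entry now carries ?nocache, which is the URL Spark cannot resolve.
console.log(sparkJars);
```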

Solution

Changes Made

  1. Frontend UI (notebookconfig.ts):

    • Added isHttpUrl() private method to the Dependencies class to check if a dependency is an HTTP/HTTPS URL
    • Hide the "Advanced Options" (...) button for non-HTTP URLs (S3, file://, etc.)
    • Only append ?nocache for HTTP/HTTPS URLs and pip dependencies in the conf getter
    • Updated help text to clarify: "Applicable to HTTP/HTTPS URL or pip dependencies only"
  2. Behavior:

    • HTTP/HTTPS URLs: Cache option visible and functional (e.g., Artifactory, Nexus)
    • Pip dependencies: Cache option visible and functional (recreates virtualenv)
    • S3 URLs: Cache option hidden, ?nocache never added
    • file:// URLs: Cache option hidden, ?nocache never added
    • Other protocols: Cache option hidden, ?nocache never added
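
The behavior listed above amounts to one predicate that gates both the button and the suffix. The sketch below is a minimal standalone version under stated assumptions: `DependencyKind`, `supportsCacheOption`, and `toConfString` are hypothetical names, not identifiers from notebookconfig.ts, and `isHttpUrl` is written here as a free function instead of a private method.

```typescript
// Sketch only: DependencyKind, supportsCacheOption and toConfString are
// hypothetical names, not identifiers from notebookconfig.ts.
type DependencyKind = "jvm" | "pip";

// Standalone version of the HTTP/HTTPS check described in this PR.
function isHttpUrl(dep: string): boolean {
    try {
        const protocol = new URL(dep).protocol;
        return protocol === "http:" || protocol === "https:";
    } catch {
        return false;
    }
}

// Whether the "Advanced Options" (...) button should be shown.
function supportsCacheOption(dep: string, kind: DependencyKind): boolean {
    return kind === "pip" || isHttpUrl(dep);
}

// The string saved into the notebook configuration: ?nocache is appended
// only when caching is disabled AND the dependency supports the option.
function toConfString(dep: string, kind: DependencyKind, cacheDisabled: boolean): string {
    return cacheDisabled && supportsCacheOption(dep, kind) ? `${dep}?nocache` : dep;
}
```

Using the same predicate for visibility and for serialization means a stale "don't cache" flag on an S3 entry cannot leak into the saved string.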

Why This Approach?

This solution prevents the problem at the source rather than stripping ?nocache in the backend:

  • ✅ Cleaner separation of concerns
  • ✅ UI accurately reflects what's supported
  • ✅ No invalid URLs are ever created
  • ✅ Backward compatible - existing notebooks with HTTP URLs continue to work
  • ✅ Aligns with the original design intent (HTTP artifact servers)

Technical Details

The check uses JavaScript's URL constructor to parse the dependency string:

private isHttpUrl(dep: string): boolean {
    try {
        const url = new URL(dep);
        return url.protocol === 'http:' || url.protocol === 'https:';
    } catch {
        return false;
    }
}

This safely handles:

  • Valid HTTP/HTTPS URLs → returns true
  • S3/file/other URLs → returns false
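  • Maven coordinates (not URLs) → returns false (the text before the first colon either parses as a non-HTTP scheme, or the string fails to parse and the exception is caught)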
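
To see how the URL constructor treats each kind of input, a small probe like the one below can be run in Node or a browser console. `parsedProtocol` and the sample list are hypothetical helpers for illustration, not part of this PR.

```typescript
// Returns the protocol reported by the URL constructor,
// or null if the constructor throws for this input.
function parsedProtocol(dep: string): string | null {
    try {
        return new URL(dep).protocol;
    } catch {
        return null;
    }
}

const samples: string[] = [
    "https://repo1.maven.org/maven2/com/google/guava/guava/31.0-jre/guava-31.0-jre.jar",
    "s3://bucket/path/file.jar",
    "file:///tmp/file.jar",
    "org.apache.commons:commons-lang3:3.12.0",
    "requests",
];

for (const dep of samples) {
    console.log(dep, "->", parsedProtocol(dep));
}
```

A coordinate whose group ID is a syntactically valid scheme may parse instead of throwing. The check still returns false in that case, because the reported protocol is neither `http:` nor `https:`.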

Testing

Manual Testing Steps

  1. Build and run Polynote:

    cd polynote-frontend && npm run dist && cd ..
    sbt dist
    cd target/dist/polynote && ./polynote.py
  2. Create a new notebook and open Configuration → Dependencies

  3. Test S3 URL (cache option should be hidden):

    s3://bucket/path/file.jar
    
    • Verify the ... button is not visible
    • Save and check the .ipynb file: ?nocache should not be appended
  4. Test HTTP URL (cache option should be visible):

    https://repo1.maven.org/maven2/com/google/guava/guava/31.0-jre/guava-31.0-jre.jar
    
    • Verify the ... button is visible
    • Click it, select "Don't cache"
    • Save and check the .ipynb file: ?nocache should be appended
  5. Test pip dependency (cache option should be visible):

    requests
    
    • Switch to pip type
    • Verify the ... button is visible
  6. Test Maven coordinates (cache option should be hidden):

    org.apache.commons:commons-lang3:3.12.0
    
    • Verify the ... button is not visible

Expected Behavior

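| Dependency Type  | Cache Option Visible? | ?nocache Added?      |
|------------------|-----------------------|----------------------|
| HTTP/HTTPS URL   | ✅ Yes                | ✅ Yes (if disabled) |
| S3 URL           | ❌ No                 | ❌ No                |
| file:// URL      | ❌ No                 | ❌ No                |
| Pip package      | ✅ Yes                | ✅ Yes (if disabled) |
| Maven coordinate | ❌ No                 | ❌ No                |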

Backward Compatibility

  • ✅ Existing notebooks with HTTP URLs and ?nocache continue to work
  • ✅ Existing notebooks with S3 URLs (without ?nocache) continue to work
  • ⚠️ If any notebooks currently have S3 URLs with ?nocache (invalid state), they will be saved without ?nocache on next save
  • ✅ No data model changes required
  • ✅ No backend changes required
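
To find notebooks in the invalid state mentioned above before they are re-saved, something like the following sketch could be applied to a notebook's JVM dependency strings. `findInvalidNoCache` and `stripInvalidNoCache` are hypothetical helpers, not part of Polynote, and the sketch assumes pip dependencies are excluded from the input, since `?nocache` remains valid for them.

```typescript
const NOCACHE = "?nocache";

// Standalone version of the HTTP/HTTPS check described in this PR.
function isHttpUrl(dep: string): boolean {
    try {
        const protocol = new URL(dep).protocol;
        return protocol === "http:" || protocol === "https:";
    } catch {
        return false;
    }
}

// Hypothetical helper: JVM dependency strings that carry ?nocache
// but are not HTTP/HTTPS URLs (the invalid state).
function findInvalidNoCache(jvmDeps: string[]): string[] {
    return jvmDeps.filter(dep => dep.endsWith(NOCACHE) && !isHttpUrl(dep));
}

// Hypothetical helper: drop the suffix only where it is invalid.
function stripInvalidNoCache(dep: string): string {
    return dep.endsWith(NOCACHE) && !isHttpUrl(dep)
        ? dep.slice(0, -NOCACHE.length)
        : dep;
}
```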

Effect

image

When the URL is HTTP/HTTPS, the option to choose between caching and no caching is available.

For non-HTTP URLs, the option is not shown.

@wisechengyi changed the title from "Allow NoCache only for http url artifacts" to "Allow ?nocache only for http url artifacts" on Dec 26, 2025
@wisechengyi marked this pull request as ready for review on December 27, 2025 00:11

@mjren23 (Collaborator) left a comment

LGTM, thanks Yi - maybe worth an updated line in the docs as well about only applying the config to HTTP(S) URLs?

@Rui-L (Collaborator) left a comment

Thanks for taking care of this!

@wisechengyi merged commit 53e5cb0 into master on Jan 26, 2026
8 of 9 checks passed
@wisechengyi deleted the cacheforhttponly branch on January 26, 2026 23:15