Bump Spark to 3.3 #1478

Merged
wisechengyi merged 5 commits into master (polynote/polynote:master) from bumpsparkto3.3 (polynote/polynote:bumpsparkto3.3)
Nov 17, 2025

Conversation

@wisechengyi (Collaborator) commented Nov 5, 2025

Summary

Upgrades Polynote to use Apache Spark 3.3.4 (from 3.1.2/3.2.1), moving to Hadoop 3.x binaries and fixing critical CI issues.

Changes

1. Spark Version Upgrade

  • Scala 2.12: 3.1.2 → 3.3.4
  • Scala 2.13: 3.2.1 → 3.3.4
  • Both Scala versions now use the same Spark version (3.3.4)
  • Updated to Hadoop 3.x binaries for improved cloud storage support

2. Binary Distribution Updates

  • From: spark-X.Y.Z-bin-hadoop2.7 (Hadoop 2.7)
  • To:
    • Scala 2.12: spark-3.3.4-bin-hadoop3
    • Scala 2.13: spark-3.3.4-bin-hadoop3-scala2.13
  • Uploaded the distributions to GitHub Release 0.6.1 because downloads from the Spark archive site were being throttled
  • Updated checksum map to use full package filenames instead of version numbers
  • Added checksums for both Scala 2.12 and 2.13 variants
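
To illustrate why the checksum map is keyed by full package filename: with both Scala variants on Spark 3.3.4, a map keyed by version alone would have a single `3.3.4` entry for two different archives. The following is a minimal Python sketch of that idea, not the project's actual sbt code. The `.tgz` extension, the SHA-512 algorithm, the helper names, and the placeholder archive bytes are all assumptions made for the example; the digests are computed from dummy data and are not real Spark checksums.

```python
import hashlib


def spark_dist_name(spark_version: str, scala_binary_version: str) -> str:
    """Build the distribution name following the naming listed above."""
    base = f"spark-{spark_version}-bin-hadoop3"
    # Scala 2.12 uses the plain name; other versions carry an explicit suffix.
    if scala_binary_version == "2.12":
        return base
    return f"{base}-scala{scala_binary_version}"


def verify_archive(filename: str, data: bytes, checksums: dict) -> bool:
    """Compare the archive's digest with the entry keyed by its filename."""
    expected = checksums.get(filename)
    if expected is None:
        raise KeyError(f"no checksum recorded for {filename}")
    return hashlib.sha512(data).hexdigest() == expected


# Placeholder archive contents (hypothetical, NOT real Spark archives).
fake_212 = b"placeholder bytes for the Scala 2.12 archive"
fake_213 = b"placeholder bytes for the Scala 2.13 archive"

# Keyed by full filename, so the two 3.3.4 variants do not collide.
CHECKSUMS = {
    spark_dist_name("3.3.4", "2.12") + ".tgz": hashlib.sha512(fake_212).hexdigest(),
    spark_dist_name("3.3.4", "2.13") + ".tgz": hashlib.sha512(fake_213).hexdigest(),
}
```

Checking the 2.13 bytes against the 2.12 filename fails in this sketch, which is the mix-up a version-keyed map could not detect.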

3. Docker Image Updates

  • Removed Spark 2.4 Docker images
  • Removed Spark 3.1 Docker images
  • Added Spark 3.3 Docker images: polynote:${VERSION}-${SCALA_VERSION}-spark3.3
  • Updated latest tag to point to Spark 3.3 images
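
For reference, the tag pattern above expands as in this small Python sketch. The version `0.0.0` is a placeholder, and passing the Scala binary version (`2.12`, `2.13`) as `SCALA_VERSION` is an assumption; the workflow may use a different form.

```python
def image_tag(version: str, scala_version: str) -> str:
    """Expand polynote:${VERSION}-${SCALA_VERSION}-spark3.3 for given values."""
    return f"polynote:{version}-{scala_version}-spark3.3"


# Hypothetical version; one tag per Scala variant.
tags = [image_tag("0.0.0", scala) for scala in ("2.12", "2.13")]
```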

4. Code Compatibility Fixes

  • Updated SQL parser listener class names for Spark 3.3:
    • SqlBaseBaseListener → SqlBaseParserBaseListener
    • SqlBaseBaseVisitor → SqlBaseParserBaseVisitor
  • These class names changed in Spark's internal catalyst parser between 3.1/3.2 and 3.3

5. CI/Build Improvements

Fixed Scala version handling:

  • Changed build script from sbt "set scalaVersion := ..." to sbt ++${SCALA_VERSION}
  • This ensures all subprojects (including polynote-spark) use the correct Scala version
  • Previously, only the root project changed versions, causing Scala 2.13 builds to download wrong Spark binaries
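
The failure mode described above can be pictured with a toy Python model. This is only an illustration of the described behaviour, not sbt itself: the `Build` class and its methods are hypothetical, and the real sbt semantics of `set` and `++` have more nuance (scopes, `crossScalaVersions`) than this shows.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    # Project name -> Scala binary version; a toy stand-in for sbt settings.
    scala_versions: dict = field(
        default_factory=lambda: {"root": "2.12", "polynote-spark": "2.12"}
    )

    def set_scala_version(self, version: str, project: str = "root") -> None:
        """Models `set scalaVersion := ...`: only one project changes."""
        self.scala_versions[project] = version

    def switch_all(self, version: str) -> None:
        """Models `++<version>`: every project changes."""
        for name in self.scala_versions:
            self.scala_versions[name] = version


def spark_dist_for(build: Build, project: str = "polynote-spark") -> str:
    """Pick the Spark distribution from the subproject's own Scala version."""
    scala = build.scala_versions[project]
    base = "spark-3.3.4-bin-hadoop3"
    return base if scala == "2.12" else f"{base}-scala{scala}"


old_style = Build()
old_style.set_scala_version("2.13")   # root changes, polynote-spark does not

new_style = Build()
new_style.switch_all("2.13")          # all projects change together
```

In this model `spark_dist_for(old_style)` still selects the Scala 2.12 distribution, while `spark_dist_for(new_style)` selects the `-scala2.13` one.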

Added job timeouts:

  • Added 30-minute timeout
  • Prevents jobs from hanging for the default 6-hour timeout
  • Provides faster feedback when downloads fail

Compatibility

✅ What's Compatible

Spark 3.3.4 is compatible with Spark 3.x APIs:

  • Core Spark SQL/DataFrame APIs remain stable
  • Most user code written for Spark 3.0-3.2 will work without changes
  • Polynote notebooks using Spark 3.1/3.2 should work with 3.3.4, but this is no longer verified

⚠️ Breaking Changes

Not compatible with Spark 2.x

Build improvements:

  • Using sbt ++ ensures consistent Scala version across all projects
  • Fixes issue where Scala 2.13 builds downloaded wrong Spark distribution
  • 30-minute timeout prevents stuck CI jobs

This was referenced Nov 5, 2025
@wisechengyi (Collaborator, Author) commented Nov 5, 2025

Copilot AI left a comment

Pull Request Overview

This PR upgrades Apache Spark from version 3.1 to 3.3, requiring updates to the Spark SQL parser API which changed class names between versions. The changes update class references from SqlBaseBaseVisitor/SqlBaseBaseListener to SqlBaseParserBaseVisitor and migrate from a listener pattern to a visitor pattern for parsing table identifiers.

Key changes:

  • Updated Spark SQL parser class references to match Spark 3.3 API
  • Migrated table identifier extraction from listener pattern to visitor pattern
  • Updated Docker build configuration to use Spark 3.3.4
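
As a rough picture of the listener-to-visitor migration mentioned above, here is a small Python sketch on a toy parse tree. It does not use ANTLR or Spark's generated parser classes; `Node`, `Listener`, `Visitor`, `walk`, and the node kinds are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                     # e.g. "query", "tableIdentifier"
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class Listener:
    """Listener style: a walker drives traversal and fires callbacks,
    so results accumulate in mutable state on the listener."""

    def __init__(self) -> None:
        self.tables: List[str] = []

    def enter(self, node: Node) -> None:
        if node.kind == "tableIdentifier":
            self.tables.append(node.text)


def walk(listener: Listener, node: Node) -> None:
    """Depth-first walk that calls the listener on every node."""
    listener.enter(node)
    for child in node.children:
        walk(listener, child)


class Visitor:
    """Visitor style: each visit returns a value and the visitor itself
    decides how to descend and how to combine child results."""

    def visit(self, node: Node) -> List[str]:
        found = [node.text] if node.kind == "tableIdentifier" else []
        for child in node.children:
            found.extend(self.visit(child))
        return found


tree = Node("query", children=[
    Node("select", "SELECT *"),
    Node("from", children=[Node("tableIdentifier", "db.events")]),
    Node("join", children=[Node("tableIdentifier", "users")]),
])
```

Both styles collect the same identifiers from `tree`; the visitor returns them directly from `Visitor().visit(tree)`, whereas the listener needs a separate `walk` call followed by reading `listener.tables`.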

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| polynote-spark/src/main/scala/polynote/kernel/interpreter/sql/SparkSqlInterpreter.scala | Updated class name from SqlBaseBaseVisitor to SqlBaseParserBaseVisitor to match Spark 3.3 API |
| polynote-spark/src/main/scala/polynote/kernel/interpreter/sql/Parser.scala | Migrated from listener-based to visitor-based pattern for extracting table identifiers |
| .github/workflows/dist.yml | Updated Docker build to use Spark 3.3.4 instead of previous versions |

.github/workflows/dist.yml (review thread resolved)

@jonathanindig (Collaborator)

Thanks for the PR @wisechengyi ! Could you please add a description and whether or not these changes are compatible with older Spark 3.x versions?

If they aren't, can you please update the changelog (it looks like we haven't done a great job of keeping that up to date, unfortunately)? We'll also likely want to bump the minor version when we release.

@wisechengyi wisechengyi force-pushed the testextract branch 2 times, most recently from f4dbb79 to 695e42c November 7, 2025 18:45
@wisechengyi wisechengyi changed the base branch from testextract to graphite-base/1478 November 7, 2025 19:35
@graphite-app graphite-app bot changed the base branch from graphite-base/1478 to master November 7, 2025 19:35
@wisechengyi wisechengyi marked this pull request as draft November 12, 2025 02:01
@wisechengyi wisechengyi marked this pull request as ready for review November 14, 2025 02:58
@jonathanindig (Collaborator) left a comment

LGTM, much lower-touch than the previous change 🎉

@wisechengyi wisechengyi merged commit 24463d8 into master Nov 17, 2025
4 checks passed
@wisechengyi wisechengyi deleted the bumpsparkto3.3 branch November 17, 2025 18:52