Bump Spark to 3.3 #1478
Pull Request Overview
This PR upgrades Apache Spark from version 3.1 to 3.3, requiring updates to the Spark SQL parser API which changed class names between versions. The changes update class references from SqlBaseBaseVisitor/SqlBaseBaseListener to SqlBaseParserBaseVisitor and migrate from a listener pattern to a visitor pattern for parsing table identifiers.
Key changes:
- Updated Spark SQL parser class references to match Spark 3.3 API
- Migrated table identifier extraction from listener pattern to visitor pattern
- Updated Docker build configuration to use Spark 3.3.4
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| polynote-spark/src/main/scala/polynote/kernel/interpreter/sql/SparkSqlInterpreter.scala | Updated class name from SqlBaseBaseVisitor to SqlBaseParserBaseVisitor to match Spark 3.3 API |
| polynote-spark/src/main/scala/polynote/kernel/interpreter/sql/Parser.scala | Migrated from listener-based to visitor-based pattern for extracting table identifiers |
| .github/workflows/dist.yml | Updated Docker build to use Spark 3.3.4 instead of previous versions |
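The listener-to-visitor migration described for Parser.scala can be illustrated with a small self-contained sketch. It does not use Spark's generated parser classes: `Node`, `TableIdentifier`, `Query`, `Listener`, and `Visitor` below are hypothetical stand-ins for the ANTLR-generated contexts and base classes, and the actual code in this PR may be structured differently.

```scala
object VisitorSketch {
  // Toy parse tree standing in for ANTLR-generated contexts (hypothetical types).
  sealed trait Node
  final case class TableIdentifier(db: Option[String], table: String) extends Node
  final case class Query(children: List[Node]) extends Node

  private def qualifiedName(id: TableIdentifier): String =
    (id.db.toList :+ id.table).mkString(".")

  // Listener style: callbacks return Unit, so results live in mutable state.
  trait Listener {
    def enterTableIdentifier(id: TableIdentifier): Unit
  }

  def walk(node: Node, listener: Listener): Unit = node match {
    case id: TableIdentifier => listener.enterTableIdentifier(id)
    case Query(children)     => children.foreach(walk(_, listener))
  }

  def tablesViaListener(root: Node): List[String] = {
    val found = scala.collection.mutable.ListBuffer.empty[String]
    walk(root, new Listener {
      def enterTableIdentifier(id: TableIdentifier): Unit = {
        found += qualifiedName(id)
        ()
      }
    })
    found.toList
  }

  // Visitor style: every visit returns a value, combined on the way back up.
  trait Visitor[T] {
    def visitTableIdentifier(id: TableIdentifier): T
    def aggregate(results: List[T]): T

    def visit(node: Node): T = node match {
      case id: TableIdentifier => visitTableIdentifier(id)
      case Query(children)     => aggregate(children.map(visit))
    }
  }

  object TableNames extends Visitor[List[String]] {
    def visitTableIdentifier(id: TableIdentifier): List[String] =
      List(qualifiedName(id))
    def aggregate(results: List[List[String]]): List[String] =
      results.flatten
  }
}
```

Both `VisitorSketch.tablesViaListener(tree)` and `VisitorSketch.TableNames.visit(tree)` collect the same table names in traversal order. The visitor version needs no mutable buffer because each visit method returns its result directly.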
Thanks for the PR, @wisechengyi! Could you please add a description and say whether these changes are compatible with older Spark 3.x versions? If they aren't, can you please update the changelog (looks like we haven't done a great job of keeping that up to date, unfortunately)? And we'll likely want to bump the minor version when we release.
jonathanindig left a comment
LGTM, much lower-touch than the previous change 🎉
Summary
Upgrades Polynote to use Apache Spark 3.3.4 (from 3.1.2/3.2.1), moving to Hadoop 3.x binaries and fixing critical CI issues.
Changes
1. Spark Version Upgrade
- 3.1.2 → 3.3.4
- 3.2.1 → 3.3.4
2. Binary Distribution Updates
- spark-X.Y.Z-bin-hadoop2.7 (Hadoop 2.7)
- spark-3.3.4-bin-hadoop3
- spark-3.3.4-bin-hadoop3-scala2.13
3. Docker Image Updates
- polynote:${VERSION}-${SCALA_VERSION}-spark3.3
- latest tag to point to Spark 3.3 images
4. Code Compatibility Fixes
- SqlBaseBaseListener → SqlBaseParserBaseListener
- SqlBaseBaseVisitor → SqlBaseParserBaseVisitor
5. CI/Build Improvements
Fixed Scala version handling:
- sbt "set scalaVersion := ..." to sbt ++${SCALA_VERSION}
- (... polynote-spark) use the correct Scala version
Added job timeouts:
Compatibility
✅ What's Compatible
Spark 3.3.4 is compatible with Spark 3.x APIs:
Not compatible with Spark 2.x
Build improvements:
sbt ++ ensures a consistent Scala version across all projects
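On the parser class rename listed under Code Compatibility Fixes: one way to check which generation of the Spark SQL parser classes a given classpath provides is a small reflection probe. This is only a diagnostic sketch and not part of this PR. The `ParserClassProbe` object and `firstLoadable` helper are hypothetical names, and the `org.apache.spark.sql.catalyst.parser` package prefix is an assumption about where the generated classes live.

```scala
import scala.util.Try

object ParserClassProbe {
  // Candidate class names, newest first. The package prefix is an assumption.
  val candidates: Seq[String] = Seq(
    "org.apache.spark.sql.catalyst.parser.SqlBaseParserBaseVisitor", // name after the rename
    "org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor"        // name before the rename
  )

  // Returns the first class name that can be loaded, without initializing the class.
  def firstLoadable(names: Seq[String]): Option[String] =
    names.find { name =>
      Try(Class.forName(name, false, getClass.getClassLoader)).isSuccess
    }

  def main(args: Array[String]): Unit =
    println(
      firstLoadable(candidates)
        .getOrElse("no Spark SQL parser base visitor found on the classpath")
    )
}
```

Run against the Spark jars of interest, this reports which of the two names is available. Without Spark on the classpath it falls through to the fallback message.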