Spark 3.5 Support #1468
Closed
wisechengyi wants to merge 6 commits into polynote:master
Conversation
wisechengyi added 3 commits on October 27, 2025 14:23

wisechengyi commented on Oct 28, 2025
Comment on lines +281 to +302
```scala
// Use reflection to support both Spark 3.3 (RowEncoder.apply) and 3.5+ (Encoders.row)
// In Spark 3.5+, RowEncoder.apply was removed and we need to use Encoders.row instead
// In Spark 3.3, RowEncoder.apply returns ExpressionEncoder directly
val expressionEncoder = try {
  // Try Spark 3.5+ API: Encoders.row(schema).asInstanceOf[ExpressionEncoder]
  val encodersClass = Class.forName("org.apache.spark.sql.Encoders$")
  val encodersModule = encodersClass.getField("MODULE$").get(null)
  val rowMethod = encodersClass.getMethod("row", classOf[org.apache.spark.sql.types.StructType])
  val encoder = rowMethod.invoke(encodersModule, prototype.schema)
  encoder.asInstanceOf[org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[Row]]
} catch {
  case _: ClassCastException | _: NoSuchMethodException =>
    // Fall back to Spark 3.3 API: RowEncoder.apply(schema)
    val method = RowEncoder.getClass.getMethod("apply", classOf[org.apache.spark.sql.types.StructType])
    method.invoke(RowEncoder, prototype.schema).asInstanceOf[org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[Row]]
}
val boundEncoder = expressionEncoder.resolveAndBind()
val serializerMethod = boundEncoder.getClass.getMethod("createSerializer")
val serializer = serializerMethod.invoke(boundEncoder)
// The serializer converts Row to InternalRow
val applyMethod = serializer.getClass.getMethod("apply", classOf[Object])
(row: Row) => applyMethod.invoke(serializer, row.asInstanceOf[Object]).asInstanceOf[org.apache.spark.sql.catalyst.InternalRow]
```
Collaborator
Author
This is the hack of using reflection in both 3.3 and 3.5 to keep the code compatible with both at build time and at run time.
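The try-the-new-API, fall-back-to-the-old reflection pattern above can be sketched without Spark on the classpath. In this minimal stand-in, `String.strip` (Java 11+) plays the role of `Encoders.row` and `String.trim` the role of `RowEncoder.apply`; the method names are illustrative only, not Spark APIs:

```scala
object ReflectiveFallback {
  // Prefer a newer method via reflection; fall back to an older one if it is absent.
  // Mirrors the PR's pattern: try the newer entry point, catch NoSuchMethodException,
  // then use the older entry point. Plain JDK methods stand in for Spark's here.
  def trimCompat(s: String): String = {
    val method =
      try s.getClass.getMethod("strip") // "new" API (Java 11+)
      catch {
        case _: NoSuchMethodException =>
          s.getClass.getMethod("trim")  // fallback "old" API
      }
    method.invoke(s).asInstanceOf[String]
  }
}
```

Because the lookup happens at run time, the same compiled binary works whichever of the two methods the runtime actually provides, which is exactly the property the PR needs across Spark 3.3 and 3.5.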
wisechengyi commented on Oct 28, 2025
Comment on lines +35 to +51
```scala
// Visitor to extract table identifiers
val tableIdentifierVisitor = new SqlBaseParserBaseVisitor[Unit] {
  override def visitTableIdentifier(ctx: SqlBaseParser.TableIdentifierContext): Unit = {
    val db = Option(ctx.db).map(_.getText).filter(_.nonEmpty)
    val name = Option(ctx.table).map(_.getText).getOrElse("")
    tableIdentifiers += Parser.TableIdentifier(db, name)
    super.visitTableIdentifier(ctx)
  }
  // …
  override def defaultResult(): Unit = ()
  override def aggregateResult(aggregate: Unit, nextResult: Unit): Unit = ()
}
// …
try {
  val statement = parser.singleStatement()
  // Visit the parse tree to extract table identifiers
  tableIdentifierVisitor.visit(statement)
```
Collaborator
Author
claude:
Summary: Spark 3.3+ SQL Parser Compatibility Fix
Problem Identified
In Spark 3.3+, the ANTLR grammar was split from SqlBase.g4 into separate SqlBaseParser.g4 and SqlBaseLexer.g4 files, and the build configuration changed to generate only visitor classes (not listener classes).
API Changes:
- Spark < 3.3: SqlBaseBaseListener and SqlBaseBaseVisitor
- Spark 3.3+: SqlBaseParserBaseListener (not generated) and SqlBaseParserBaseVisitor
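Independent of ANTLR, the visitor contract the fix relies on (a `defaultResult` for nodes with nothing to report, and an `aggregateResult` that combines child results) can be sketched in plain Scala. The `Node`/`Table`/`Stmt` types below are hypothetical stand-ins for the generated parse-tree classes, not Spark or ANTLR types:

```scala
// Minimal stand-in parse tree (hypothetical, no ANTLR dependency)
sealed trait Node
case class Table(db: Option[String], name: String) extends Node
case class Stmt(children: List[Node]) extends Node

// Collects table identifiers the way the PR's SqlBaseParserBaseVisitor subclass does:
// each visit returns a result; defaultResult/aggregateResult define how results combine.
class TableCollector {
  val tables = scala.collection.mutable.ListBuffer.empty[(Option[String], String)]
  def visit(n: Node): Unit = n match {
    case Table(db, name) =>
      tables += ((db, name))
    case Stmt(cs) =>
      // Fold children through aggregateResult, starting from defaultResult
      cs.foldLeft(defaultResult())((acc, c) => aggregateResult(acc, visit(c)))
  }
  def defaultResult(): Unit = ()
  def aggregateResult(aggregate: Unit, nextResult: Unit): Unit = ()
}
```

With a `Unit` result type the aggregation is trivial, which is why the real visitor only overrides `visitTableIdentifier` and lets side effects on the buffer do the collecting.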
Collaborator
Author
close in favor of #1479
Add Support for Spark 3.3.4 and 3.5.7
This PR adds support for Apache Spark 3.3.4 and 3.5.7 while removing support for older versions (2.4.5, 3.1.2, 3.2.1).
Key Changes
1. Build Configuration (build.sbt)
- Updated the `sparkVersions` map to support Spark 3.3.4 and 3.5.7 for both Scala 2.12 and 2.13 (`SPARK_VERSION` env var)

2. CI Workflows (.github/workflows/)
Split CI into 4 separate workflows for better isolation and clarity:
- `ci-backend-2.12-spark3.3.yml` - Tests Scala 2.12 + Spark 3.3.4
- `ci-backend-2.12-spark3.5.yml` - Tests Scala 2.12 + Spark 3.5.7
- `ci-backend-2.13-spark3.3.yml` - Tests Scala 2.13 + Spark 3.3.4
- `ci-backend-2.13-spark3.5.yml` - Tests Scala 2.13 + Spark 3.5.7

Additional improvements:

- Replaced `pip` with `uv` for faster Python dependency installation (test time went from ~7:15m -> ~5:45m)
- Updated `dist.yml` to build Docker images for Spark 3.3 and 3.5 (removed 2.4 and 3.1)

3. Spark SQL Parser Compatibility
Files: `polynote-spark/src/main/scala/polynote/kernel/interpreter/sql/`

Changes in `Parser.scala`:
- `SqlBaseBaseListener` → `SqlBaseParserBaseVisitor`

Changes in `SparkSqlInterpreter.scala`:
- `SqlBaseBaseVisitor` → `SqlBaseParserBaseVisitor`

Reason: Spark 3.3+ renamed the parser classes and only generates visitors (not listeners) in the ANTLR build configuration.
4. Row Encoder Compatibility
File: `polynote-spark-runtime/src/main/scala/polynote/runtime/spark/reprs/SparkReprsOf.scala`

Implemented a reflection-based solution to support both Spark versions at runtime.

Reason: Spark 3.5+ removed `RowEncoder.apply()` and replaced it with `Encoders.row()`. Using reflection allows the same compiled binary to work with both versions.

5. Documentation (docs-site/docs/docs/installation.md)
Updated to reflect currently supported Spark versions:
Compatibility

- `SPARK_VERSION` environment variable

Testing
Each Scala/Spark version combination is tested independently in CI:
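For illustration, one of the four per-combination workflows could look like the minimal sketch below. The workflow filename and Spark/Scala versions come from the PR description; the job name, runner image, and step details are assumptions, not the PR's actual file contents:

```yaml
# Hypothetical sketch of ci-backend-2.13-spark3.5.yml (step details assumed)
name: CI Backend (Scala 2.13, Spark 3.5)
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      SPARK_VERSION: "3.5.7"   # selects the Spark version in build.sbt
    steps:
      - uses: actions/checkout@v4
      - name: Run backend tests
        run: sbt "++2.13" test  # assumed invocation
```

Keeping one workflow per Scala/Spark pair means a failure in one combination surfaces as its own status check instead of failing a shared matrix job.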