Spark 3.5 Support#1468

Closed
wisechengyi wants to merge 6 commits into polynote:master from wisechengyi:spark3.5

Conversation

wisechengyi (Collaborator) commented Oct 27, 2025

Add Support for Spark 3.3.4 and 3.5.7

This PR adds support for Apache Spark 3.3.4 and 3.5.7 while removing support for older versions (2.4.5, 3.1.2, 3.2.1).

Key Changes

1. Build Configuration (build.sbt)

  • Updated sparkVersions map to support Spark 3.3.4 and 3.5.7 for both Scala 2.12 and 2.13
  • Default Spark version is now 3.5.7 (can be overridden with SPARK_VERSION env var)
  • Updated SHA512 checksums for new Spark versions
  • Changed from hadoop2.7 to hadoop3 for Spark 3.3+
val sparkVersions = Map(
  "2.12" -> Seq("3.3.4", "3.5.7"),
  "2.13" -> Seq("3.3.4", "3.5.7")
)
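A hedged sketch of how the default-version selection described above might look in build.sbt (only `sparkVersions` and `SPARK_VERSION` come from this PR; `scalaBinary` and the exact expression are illustrative):

```scala
// Illustrative sbt fragment, not the PR's actual build.sbt: use SPARK_VERSION
// if set, otherwise default to the newest entry for the Scala binary version.
val scalaBinary  = "2.13" // hypothetical name for the Scala binary-version setting
val sparkVersion = sys.env.getOrElse("SPARK_VERSION", sparkVersions(scalaBinary).last) // "3.5.7" by default
```

Running the build with SPARK_VERSION=3.3.4 in the environment would then select the older supported line.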

2. CI Workflows (.github/workflows/)

Split CI into 4 separate workflows for better isolation and clarity:

  • ci-backend-2.12-spark3.3.yml - Tests Scala 2.12 + Spark 3.3.4
  • ci-backend-2.12-spark3.5.yml - Tests Scala 2.12 + Spark 3.5.7
  • ci-backend-2.13-spark3.3.yml - Tests Scala 2.13 + Spark 3.3.4
  • ci-backend-2.13-spark3.5.yml - Tests Scala 2.13 + Spark 3.5.7

Additional improvements:

  • Replaced pip with uv for faster Python dependency installation (CI test time dropped from about 7:15 to about 5:45)
  • Updated dist.yml to build Docker images for Spark 3.3 and 3.5 (removed 2.4 and 3.1)

3. Spark SQL Parser Compatibility

Files: polynote-spark/src/main/scala/polynote/kernel/interpreter/sql/

Changes in Parser.scala:

  • Changed from listener pattern to visitor pattern due to Spark 3.3+ ANTLR changes
  • Updated class name: SqlBaseBaseListener → SqlBaseParserBaseVisitor

Changes in SparkSqlInterpreter.scala:

  • Updated import: SqlBaseBaseVisitor → SqlBaseParserBaseVisitor

Reason: Spark 3.3+ renamed parser classes and only generates visitors (not listeners) in the ANTLR build configuration.
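The rename can be pictured with a before/after sketch (class and method names follow ANTLR's generated-code conventions as described above; the surrounding code is illustrative, not taken from the PR):

```scala
// Before (Spark < 3.3): listener pattern, reacting to enter/exit callbacks.
// class TableListener extends SqlBaseBaseListener {
//   override def enterTableIdentifier(ctx: SqlBaseParser.TableIdentifierContext): Unit = ...
// }
//
// After (Spark 3.3+): visitor pattern, walking the parse tree explicitly.
// class TableVisitor extends SqlBaseParserBaseVisitor[Unit] {
//   override def visitTableIdentifier(ctx: SqlBaseParser.TableIdentifierContext): Unit = ...
// }
```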

4. Row Encoder Compatibility

File: polynote-spark-runtime/src/main/scala/polynote/runtime/spark/reprs/SparkReprsOf.scala

Implemented reflection-based solution to support both Spark versions at runtime:

// Spark 3.5+: Uses Encoders.row(schema)
// Spark 3.3:  Uses RowEncoder.apply(schema)
val expressionEncoder = try {
  val encodersClass = Class.forName("org.apache.spark.sql.Encoders$")
  val encodersModule = encodersClass.getField("MODULE$").get(null)
  val rowMethod = encodersClass.getMethod("row", classOf[StructType])
  val encoder = rowMethod.invoke(encodersModule, prototype.schema)
  encoder.asInstanceOf[ExpressionEncoder[Row]]
} catch {
  case _: ClassCastException | _: NoSuchMethodException =>
    // Fall back to Spark 3.3 API
    val method = RowEncoder.getClass.getMethod("apply", classOf[StructType])
    method.invoke(RowEncoder, prototype.schema).asInstanceOf[ExpressionEncoder[Row]]
}

Reason: Spark 3.5+ removed RowEncoder.apply() and replaced it with Encoders.row(). Using reflection allows the same compiled binary to work with both versions.
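The same try-the-new-API-then-fall-back shape can be shown without Spark at all. `NewApi` and `OldApi` below are hypothetical stand-ins for `Encoders` and `RowEncoder`, and `encode` is an illustrative helper, not code from the PR:

```scala
// Spark-free sketch of the reflection fallback: try a `row` method on the
// module first, and fall back to `apply` when `row` does not exist.
object NewApi { def row(schema: String): String   = s"new:$schema" } // stands in for Encoders.row
object OldApi { def apply(schema: String): String = s"old:$schema" } // stands in for RowEncoder.apply

def encode(module: AnyRef, schema: String): String =
  try {
    val m = module.getClass.getMethod("row", classOf[String])
    m.invoke(module, schema).asInstanceOf[String]
  } catch {
    case _: NoSuchMethodException =>
      val m = module.getClass.getMethod("apply", classOf[String])
      m.invoke(module, schema).asInstanceOf[String]
  }

// encode(NewApi, "schema") takes the first branch; encode(OldApi, "schema")
// triggers NoSuchMethodException and falls back to the `apply` lookup.
```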

5. Documentation (docs-site/docs/docs/installation.md)

Updated to reflect currently supported Spark versions:

Currently, Polynote supports the following Spark versions:

  • Scala 2.12: Spark 3.3.4, 3.5.7 (default: 3.5.7)
  • Scala 2.13: Spark 3.3.4, 3.5.7 (default: 3.5.7)

You can override the default Spark version by setting the SPARK_VERSION environment variable when building.

Compatibility

  • ✅ Both Spark versions supported in the same build using runtime reflection
  • ✅ Users can choose their Spark version at build time via SPARK_VERSION environment variable
  • ✅ Default version is 3.5.7 for both Scala 2.12 and 2.13
  • ✅ Backward compatible runtime behavior - same binary works with both Spark 3.3.4 and 3.5.7

Testing

Each Scala/Spark version combination is tested independently in CI:

  • Scala 2.12 with Spark 3.3.4
  • Scala 2.12 with Spark 3.5.7
  • Scala 2.13 with Spark 3.3.4
  • Scala 2.13 with Spark 3.5.7

@wisechengyi wisechengyi marked this pull request as ready for review October 27, 2025 19:29
@wisechengyi wisechengyi marked this pull request as draft October 27, 2025 19:32
Comment on lines +281 to +302
// Use reflection to support both Spark 3.3 (RowEncoder.apply) and 3.5+ (Encoders.row).
// In Spark 3.5+, RowEncoder.apply was removed in favor of Encoders.row.
// In Spark 3.3, RowEncoder.apply returns an ExpressionEncoder directly.
val expressionEncoder = try {
  // Try Spark 3.5+ API: Encoders.row(schema)
  val encodersClass = Class.forName("org.apache.spark.sql.Encoders$")
  val encodersModule = encodersClass.getField("MODULE$").get(null)
  val rowMethod = encodersClass.getMethod("row", classOf[org.apache.spark.sql.types.StructType])
  val encoder = rowMethod.invoke(encodersModule, prototype.schema)
  encoder.asInstanceOf[org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[Row]]
} catch {
  case _: ClassCastException | _: NoSuchMethodException =>
    // Fall back to Spark 3.3 API: RowEncoder.apply(schema)
    val method = RowEncoder.getClass.getMethod("apply", classOf[org.apache.spark.sql.types.StructType])
    method.invoke(RowEncoder, prototype.schema).asInstanceOf[org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[Row]]
}
val boundEncoder = expressionEncoder.resolveAndBind()
val serializerMethod = boundEncoder.getClass.getMethod("createSerializer")
val serializer = serializerMethod.invoke(boundEncoder)
// The serializer converts a Row to an InternalRow
val applyMethod = serializer.getClass.getMethod("apply", classOf[Object])
(row: Row) => applyMethod.invoke(serializer, row.asInstanceOf[Object]).asInstanceOf[org.apache.spark.sql.catalyst.InternalRow]
This is the reflection-based workaround that keeps the same code compatible with both 3.3 and 3.5 at build time and run time.

Comment on lines +35 to +51
// Visitor to extract table identifiers
val tableIdentifierVisitor = new SqlBaseParserBaseVisitor[Unit] {
  override def visitTableIdentifier(ctx: SqlBaseParser.TableIdentifierContext): Unit = {
    val db = Option(ctx.db).map(_.getText).filter(_.nonEmpty)
    val name = Option(ctx.table).map(_.getText).getOrElse("")
    tableIdentifiers += Parser.TableIdentifier(db, name)
    super.visitTableIdentifier(ctx)
  }

  override def defaultResult(): Unit = ()
  override def aggregateResult(aggregate: Unit, nextResult: Unit): Unit = ()
}

try {
  val statement = parser.singleStatement()
  // Visit the parse tree to extract table identifiers
  tableIdentifierVisitor.visit(statement)
wisechengyi commented Oct 28, 2025
claude:

  Summary: Spark 3.3+ SQL Parser Compatibility Fix

  Problem Identified

  In Spark 3.3+, the ANTLR grammar was split from SqlBase.g4 into separate SqlBaseParser.g4 and SqlBaseLexer.g4 files, and the build configuration changed to generate only visitor classes (not listener classes).

  API Changes:
  - Spark < 3.3: SqlBaseBaseListener and SqlBaseBaseVisitor
  - Spark 3.3+: SqlBaseParserBaseListener (not generated) and SqlBaseParserBaseVisitor

@wisechengyi wisechengyi changed the title Spark3.5 Spark 3.5 Support Oct 28, 2025
@wisechengyi wisechengyi marked this pull request as ready for review October 28, 2025 04:21
@wisechengyi wisechengyi marked this pull request as draft November 3, 2025 05:41
wisechengyi (Collaborator, Author):
Closed in favor of #1479.
