Conversation

@wisechengyi wisechengyi commented Nov 5, 2025

Summary

Add Spark 3.5 support alongside existing Spark 3.3 support with automatic runtime selection based on configuration.

Users can now choose between Spark 3.3 and Spark 3.5 by setting the spark_version field in their notebook's property set configuration. The appropriate runtime will be automatically selected without requiring separate builds or installations.

Spark Version Support

This PR adds full support for Apache Spark 3.5 while maintaining backward compatibility with Spark 3.3.

Configuration

Users can specify their preferred Spark version in the configuration YAML. Each property set represents a single Spark version, with version_configs specifying settings per Scala version:

spark:
  property_sets:
    - name: BDP / Spark 3.5
      properties:
        spark.driver.memory: 4g
        spark.executor.memory: 2g
      version_configs:
        - version_number: "2.12"
          spark_version: "3.5"
          spark_submit_args: "--cluster h2query --ver 3.5 --deploy-mode client"

    - name: BDP / Spark 3.3
      properties:
        spark.driver.memory: 2g
        spark.executor.memory: 1g
      version_configs:
        - version_number: "2.12"
          spark_version: "3.3"
          spark_submit_args: "--cluster h2query --ver 3.3 --deploy-mode client"

The system automatically loads the correct Spark runtime JAR based on the notebook's selected property set template. If no template is selected, or spark_version is not specified, it defaults to Spark 3.3 for backward compatibility.
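
For illustration, the lookup amounts to something like the following (a minimal sketch with a hypothetical helper; the actual logic lives in the kernel and operates on the real config types):

object RuntimeJarSelection {
  // Build the runtime JAR path from the Scala binary version and the
  // spark_version found in the selected template's version_configs (if any).
  def runtimeJarPath(scalaVersion: String, sparkVersion: Option[String]): String = {
    val spark = sparkVersion.getOrElse("3.3") // fall back to Spark 3.3 for backward compatibility
    s"deps/$scalaVersion/spark-$spark/polynote-spark-runtime.jar"
  }
}

// e.g. RuntimeJarSelection.runtimeJarPath("2.12", Some("3.5"))
//      returns "deps/2.12/spark-3.5/polynote-spark-runtime.jar"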

Compatibility

  • Spark 3.3.4 - Fully supported (existing)
  • Spark 3.5.7 - Fully supported (new)
  • Scala 2.12 & 2.13 - Both supported with each Spark version

Implementation Details

Technical changes

Configuration Structure

  • Moved spark_version from SparkPropertySet into ScalaVersionConfig (under version_configs); the resulting shape is sketched after this list
  • Each property set represents a single Spark version, indicated by its name (e.g. "BDP / Spark 3.5")
  • version_configs holds Scala-version-specific settings, all sharing the same spark_version
  • Simplified the runtime JAR selection logic: look up the notebook's template, then extract spark_version from its version_configs
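
A rough sketch of the resulting configuration model (field names follow the YAML keys above; the exact case classes in the codebase may differ):

case class ScalaVersionConfig(
  versionNumber: String,                       // Scala binary version, e.g. "2.12"
  properties: Map[String, String] = Map.empty, // per-version Spark properties
  sparkSubmitArgs: Option[String] = None,
  sparkVersion: String = "3.3"                 // single Spark version, defaulting to 3.3
)

case class SparkPropertySet(
  name: String,                                // e.g. "BDP / Spark 3.5"
  properties: Map[String, String] = Map.empty,
  versionConfigs: Option[List[ScalaVersionConfig]] = None
)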

Build & Runtime

  • Added version-specific source directories for Spark API compatibility (spark_3.3/, spark_3.5/)
  • Created a SparkVersionCompat abstraction layer for RowEncoder API differences (sketched after this list)
  • Modified build system to cross-compile for multiple Spark versions
  • Consolidated CI workflows into matrix strategy (Scala 2.12/2.13 × Spark 3.3.4/3.5.7)
  • Runtime JAR selection: deps/{scala_version}/spark-{spark_version}/polynote-spark-runtime.jar
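
For illustration, the compat layer can be as small as one method implemented once per version-specific source tree (a sketch only; the method name rowEncoderFor is an assumption, and the real SparkVersionCompat may cover more than RowEncoder):

// spark_3.3/ source tree: Spark 3.3 still provides RowEncoder.apply(StructType)
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import org.apache.spark.sql.types.StructType

object SparkVersionCompat {
  def rowEncoderFor(schema: StructType): ExpressionEncoder[Row] = RowEncoder(schema)
}

// spark_3.5/ source tree: RowEncoder.apply(StructType) is no longer available in 3.5,
// so the ExpressionEncoder is built from the agnostic encoder instead
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import org.apache.spark.sql.types.StructType

object SparkVersionCompat {
  def rowEncoderFor(schema: StructType): ExpressionEncoder[Row] =
    ExpressionEncoder(RowEncoder.encoderFor(schema))
}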

Test Plan

  • ✅ CI matrix builds for all Scala/Spark combinations
  • ✅ RowEncoder compatibility tests across Spark versions
  • ✅ Configuration parsing and validation tests
  • ✅ Runtime JAR selection based on version_configs


@wisechengyi wisechengyi force-pushed the spark3.5_2 branch 2 times, most recently from 7dddda2 to 1f172aa on November 11, 2025 at 19:45
@wisechengyi wisechengyi mentioned this pull request Nov 14, 2025
@wisechengyi wisechengyi force-pushed the jdk17 branch 2 times, most recently from f0ee547 to 65b6b10 on November 14, 2025 at 03:14
@wisechengyi wisechengyi changed the title cross 3.5 Spark 3.5 Support Nov 15, 2025
@wisechengyi wisechengyi marked this pull request as ready for review November 17, 2025 05:56
build.sbt (review thread resolved)
Yi Cheng added 3 commits November 17, 2025 19:56
  We need to:

  1. Build multiple runtime JARs - one for each Spark version combination
  2. Organize them in the distribution by Scala AND Spark version:
  deps/2.12/spark-3.3/polynote-spark-runtime.jar
  deps/2.12/spark-3.5/polynote-spark-runtime.jar
  deps/2.13/spark-3.3/polynote-spark-runtime.jar
  deps/2.13/spark-3.5/polynote-spark-runtime.jar
  3. Modify the kernel to select the right runtime JAR based on:
    - Scala version (from compiler)
    - Spark version (from notebook config, dependencies, or SPARK_HOME)

  Would you like me to implement this approach? It will require:
  - Updating build.sbt to build multiple runtime JARs
  - Updating the dist task to organize JARs by version
  - Modifying LocalSparkKernel.scala to select the correct runtime JAR
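
A minimal build.sbt-style sketch of that layout (the setting and task keys here are assumptions, not the actual build definition in this PR):

lazy val sparkVersion  = settingKey[String]("Target Spark version, e.g. 3.3 or 3.5")
lazy val runtimeJarOut = taskKey[File]("Destination of the runtime JAR in the dist layout")

sparkVersion := sys.props.getOrElse("spark.version", "3.3")

runtimeJarOut := {
  // deps/{scala_version}/spark-{spark_version}/polynote-spark-runtime.jar
  target.value / "dist" / "deps" / scalaBinaryVersion.value /
    s"spark-${sparkVersion.value}" / "polynote-spark-runtime.jar"
}
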
@graphite-app graphite-app bot changed the base branch from graphite-base/1479 to master November 17, 2025 19:56
@polynote polynote deleted a comment from Copilot AI Nov 18, 2025
Review thread on the ScalaVersionConfig test fixtures (the comment is anchored on the line that now passes sparkVersion explicitly):

  versionConfigs = Some(List(
    ScalaVersionConfig("some version", Map("arbitrary.spark.args" -> "anything"), sparkSubmitArgs = Some("some args"), sparkVersion = "3.3"),
    ScalaVersionConfig("some version 2", Map("arbitrary.spark.args" -> "anything else"), sparkSubmitArgs = Some("different submit args"))
Collaborator

Should we pull out "3.3" into a constant somewhere so it'll be easier to bump the default version value in the future?
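
For example, something along these lines (hypothetical object and field names):

// Hypothetical sketch of the suggestion: keep the default Spark version in one place
object SparkDefaults {
  val DefaultSparkVersion: String = "3.3"
}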

Review thread on the frontend ScalaVersionConfig constructor (the comment is anchored on the updated line, which adds sparkVersion with a default of "3.3"):

  // before
  constructor(readonly versionName: string, readonly versionProperties: Record<string, string>, readonly sparkSubmitArgs?: string) {
  // after
  constructor(readonly versionName: string, readonly versionProperties: Record<string, string>, readonly sparkSubmitArgs?: string, readonly sparkVersion: string = "3.3") {
Collaborator

@jonathanindig Nov 19, 2025

Are you planning on adding support to select the Spark version in the UI?

It should be pretty mechanical - basically just modify

class ScalaSparkConf extends Disposable {
    readonly el: TagElement<"div">;
    private container: TagElement<"div">;
    private templateEl: DropdownElement;
    private scalaVersionInput: DropdownElement;
    private notebookSparkTemplates: SparkPropertySet[]; // keep our own state to handle templates that don't exist on the server

    constructor(configState: StateView<NotebookConfig>, private allTemplatesHandler: StateView<SparkPropertySet[]>, stateHandler: StateHandler<NBConfig>) {
        super()
        this.templateEl = dropdown([], Object.fromEntries([["", "None"]]));
        this.container = div(['spark-config-list'], []);
        this.notebookSparkTemplates = [];

        const sparkConfHandler = configState.view("sparkConfig");
        const templateHandler = configState.view("sparkTemplate");
        const scalaVersionHandler = configState.view("scalaVersion");

        // TODO: this could come from the server
        const availableScalaVersions = [
            {key: "2.11", value: "2.11"},
            {key: "2.12", value: "2.12"},
            {key: "2.13", value: "2.13"}
        ];

        this.el = div(['notebook-spark-config', 'notebook-config-section', 'open'], [
            h3([], ['Scala and Spark configuration']).click(() => stateHandler.updateField('openSpark', openDependencies => setValue(!openDependencies))),
            div(['notebook-config-section-content'], [
                para([], ['Set the Scala and Spark configuration for this notebook here. Please note that it is possible that your environment may override some of these settings at runtime :(']),
                div(['notebook-config-row'], [h4([], ['Spark template:']), this.templateEl, h4([], ['Spark properties:']), this.container]),
                h4([], 'Scala version:'),
                para([], ['If you have selected a Spark template, this will read from the associated versionConfigs key to display compatible Scala versions.']),
                this.scalaVersionInput = dropdown(['scala-version'], {}),
            ])
        ])

        const setConf = (conf: Record<string, string> | undefined) => {
            this.container.innerHTML = "";
            if (conf && Object.keys(conf).length > 0) {
                Object.entries(conf).forEach(([key, val]) => {
                    this.addConf({key, val})
                })
            } else {
                this.addConf()
            }
        }
        setConf(sparkConfHandler.state)
        sparkConfHandler.addObserver(conf => setConf(conf)).disposeWith(this)

        // populate the templates element.
        const updatedTemplates = (templates: SparkPropertySet[]) => {
            this.notebookSparkTemplates = this.notebookSparkTemplates.concat(templates);
            templates.forEach(tmpl => {
                this.templateEl.addValue(tmpl.name, tmpl.name)
            })
        }
        updatedTemplates(allTemplatesHandler.state)
        allTemplatesHandler.addObserver(templates => updatedTemplates(templates)).disposeWith(this)

        // watch for changes in the config's template
        const setTemplate = (template: SparkPropertySet | undefined) => {
            if (template && !this.notebookSparkTemplates.some(el => el.name === template.name)) {
                // if we don't recognize the template defined in the config, add it to this notebook only
                this.templateEl.addValue(template.name, template.name);
                this.notebookSparkTemplates.push(template);
            }
            this.templateEl.setSelectedValue(template?.name ?? "")
        }
        setTemplate(templateHandler.state)
        templateHandler.addObserver(template => {
            setTemplate(template);
            populateScalaVersions(template); // update displayed Scala versions
        }).disposeWith(this)

        stateHandler.view('openSpark').addObserver(open => {
            toggleConfigVisibility(open, this.el);
        }).disposeWith(this)

        // list the Scala versions coming from the version configurations associated with the selected Spark template
        const populateScalaVersions = (selectedTemplate: SparkPropertySet | undefined) => {
            this.scalaVersionInput.clearValues();
            const scalaVersions = (selectedTemplate?.versionConfigs) ?
                selectedTemplate?.versionConfigs.map(versionConfig => ({key: versionConfig.versionName, value: versionConfig.versionName})) :
                [{key: "Default", value: "Default"}, ...availableScalaVersions];
            scalaVersions.forEach(version => this.scalaVersionInput.addValue(version.key, version.value));
            this.scalaVersionInput.setSelectedValue(scalaVersionHandler.state || "");
        }

        // update displayed Scala versions when a different Spark template is selected
        this.templateEl.onSelect((newValue) => {
            const newSparkTemplate = this.notebookSparkTemplates.find(tmpl => tmpl.name === newValue);
            populateScalaVersions(newSparkTemplate);
        });
        scalaVersionHandler.addObserver(version => {
            this.scalaVersionInput.setSelectedValue(version || "")
        }).disposeWith(this);
    }

    private addConf(item?: {key: string, val: string}) {
        const data = item ?? {key: "", val: ""}
        const key = textbox(['spark-config-key'], 'key', data.key).change(() => {
            data.key = key.value.trim()
        })
        const val = textbox(['spark-config-val'], 'val', data.val).change(() => {
            data.val = val.value.trim()
        })
        const remove = iconButton(['remove'], 'Remove', 'minus-circle-red', 'Remove').click(evt => {
            this.container.removeChild(row);
            if (this.container.children.length === 0) this.addConf()
        })
        const add = iconButton(['add'], 'Add', 'plus-circle', 'Add').click(evt => {
            this.addConf()
        })
        const row = Object.assign(
            div(['exclusion-row', 'notebook-config-row'], [key, val, remove, add]),
            { data });
        this.container.appendChild(row)
    }

    get conf(): Record<string, string> {
        return Array.from(this.container.children).reduce<Record<string, string>>((acc, row: HTMLDivElement & {data: {key: string, val: string}}) => {
            if (row.data.key) acc[row.data.key] = row.data.val
            return acc
        }, {})
    }

    get template(): SparkPropertySet | undefined {
        const name = this.templateEl.options[this.templateEl.selectedIndex].value;
        return this.notebookSparkTemplates.find(tmpl => tmpl.name === name);
    }

    get scalaVersion(): string | undefined {
        return this.scalaVersionInput.getSelectedValue() || undefined;
    }
}
and create a new sparkVersionInput field following the existing scalaVersionInput (and associated scala version references)

Collaborator Author

Thanks for the pointer! This part fixes the UI: since spark_version was added to version_configs, the UI would fail to load the config without it.

If needed, I can follow up with UI support for selecting the Spark version. For now, choosing a preset template should suffice.

Collaborator

@jonathanindig left a comment

Thanks @wisechengyi, looks good to me once it passes final beta testing :)

@wisechengyi wisechengyi merged commit ba0ea1e into master Nov 20, 2025
9 of 10 checks passed
@wisechengyi wisechengyi deleted the spark3.5_2 branch November 20, 2025 21:42