Spark 3.5 Support #1479
Conversation
We need to:
1. Build multiple runtime JARs - one for each Spark version combination
2. Organize them in the distribution by Scala AND Spark version:
deps/2.12/spark-3.3/polynote-spark-runtime.jar
deps/2.12/spark-3.5/polynote-spark-runtime.jar
deps/2.13/spark-3.3/polynote-spark-runtime.jar
deps/2.13/spark-3.5/polynote-spark-runtime.jar
3. Modify the kernel to select the right runtime JAR based on:
- Scala version (from compiler)
- Spark version (from notebook config, dependencies, or SPARK_HOME)
Would you like me to implement this approach? It will require:
- Updating build.sbt to build multiple runtime JARs
- Updating the dist task to organize JARs by version
- Modifying LocalSparkKernel.scala to select the correct runtime JAR
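A minimal sketch of what that kernel-side selection could look like, assuming a small helper alongside LocalSparkKernel.scala; the object, method, and argument names here are illustrative, not the actual implementation:

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper: resolve the runtime JAR under deps/<scala_version>/spark-<spark_version>/,
// falling back to Spark 3.3 when no version can be determined (the backward-compatible default).
object SparkRuntimeJar {
  val DefaultSparkVersion = "3.3"

  def resolve(depsDir: Path, scalaBinaryVersion: String, sparkVersion: Option[String]): Path = {
    val spark = sparkVersion.getOrElse(DefaultSparkVersion)
    val jar = depsDir
      .resolve(scalaBinaryVersion)
      .resolve(s"spark-$spark")
      .resolve("polynote-spark-runtime.jar")
    require(Files.exists(jar), s"No runtime JAR for Scala $scalaBinaryVersion / Spark $spark at $jar")
    jar
  }
}

// e.g. SparkRuntimeJar.resolve(Paths.get("deps"), "2.13", Some("3.5"))
//      => deps/2.13/spark-3.5/polynote-spark-runtime.jar
```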
This reverts commit 4bd5c7f.
versionConfigs = Some(List(
  ScalaVersionConfig("some version", Map("arbitrary.spark.args" -> "anything"), sparkSubmitArgs = Some("some args")),
  ScalaVersionConfig("some version 2", Map("arbitrary.spark.args" -> "anything else"), sparkSubmitArgs = Some("different submit args"))
  ScalaVersionConfig("some version", Map("arbitrary.spark.args" -> "anything"), sparkSubmitArgs = Some("some args"), sparkVersion = "3.3"),
Should we pull out "3.3" into a constant somewhere so it'll be easier to bump the default version value in the future?
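For example, a minimal sketch of the idea (the object and constant names here are illustrative, not what the PR actually does):

```scala
// Keep the fallback Spark version in one place so bumping the default is a one-line change.
object SparkDefaults {
  val DefaultSparkVersion: String = "3.3"
}

// Call sites (and the frontend's default for sparkVersion) would then reference
// SparkDefaults.DefaultSparkVersion instead of repeating the literal "3.3".
```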
constructor(readonly versionName: string, readonly versionProperties: Record<string, string>, readonly sparkSubmitArgs?: string) {
constructor(readonly versionName: string, readonly versionProperties: Record<string, string>, readonly sparkSubmitArgs?: string, readonly sparkVersion: string = "3.3") {
Are you planning on adding support to select the Spark version in the UI?
It should be pretty mechanical - basically just modify
polynote/polynote-frontend/polynote/ui/component/notebook/notebookconfig.ts
Lines 446 to 588 in 0f4a417
class ScalaSparkConf extends Disposable {
    readonly el: TagElement<"div">;
    private container: TagElement<"div">;
    private templateEl: DropdownElement;
    private scalaVersionInput: DropdownElement;
    private notebookSparkTemplates: SparkPropertySet[]; // keep our own state to handle templates that don't exist on the server

    constructor(configState: StateView<NotebookConfig>, private allTemplatesHandler: StateView<SparkPropertySet[]>, stateHandler: StateHandler<NBConfig>) {
        super()
        this.templateEl = dropdown([], Object.fromEntries([["", "None"]]), );
        this.container = div(['spark-config-list'], []);
        this.notebookSparkTemplates = [];

        const sparkConfHandler = configState.view("sparkConfig");
        const templateHandler = configState.view("sparkTemplate");
        const scalaVersionHandler = configState.view("scalaVersion");

        // TODO: this could come from the server
        const availableScalaVersions = [
            {key: "2.11", value: "2.11"},
            {key: "2.12", value: "2.12"},
            {key: "2.13", value: "2.13"}
        ];

        this.el = div(['notebook-spark-config', 'notebook-config-section', 'open'], [
            h3([], ['Scala and Spark configuration']).click(() => stateHandler.updateField('openSpark', openDependencies => setValue(!openDependencies))),,
            div(['notebook-config-section-content'], [
                para([], ['Set the Scala and Spark configuration for this notebook here. Please note that it is possible that your environment may override some of these settings at runtime :(']),
                div(['notebook-config-row'], [h4([], ['Spark template:']), this.templateEl, h4([], ['Spark properties:']), this.container]),
                h4([], 'Scala version:'),
                para([], ['If you have selected a Spark template, this will read from the associated versionConfigs key to display compatible Scala versions.']),
                this.scalaVersionInput = dropdown(['scala-version'], {}),
            ])
        ])

        const setConf = (conf: Record<string, string> | undefined) => {
            this.container.innerHTML = "";
            if (conf && Object.keys(conf).length > 0) {
                Object.entries(conf).forEach(([key, val]) => {
                    this.addConf({key, val})
                })
            } else {
                this.addConf()
            }
        }
        setConf(sparkConfHandler.state)
        sparkConfHandler.addObserver(conf => setConf(conf)).disposeWith(this)

        // populate the templates element.
        const updatedTemplates = (templates: SparkPropertySet[]) => {
            this.notebookSparkTemplates = this.notebookSparkTemplates.concat(templates);
            templates.forEach(tmpl => {
                this.templateEl.addValue(tmpl.name, tmpl.name)
            })
        }
        updatedTemplates(allTemplatesHandler.state)
        allTemplatesHandler.addObserver(templates => updatedTemplates(templates)).disposeWith(this)

        // watch for changes in the config's template
        const setTemplate = (template: SparkPropertySet | undefined) => {
            if (template && !this.notebookSparkTemplates.some(el => el.name === template.name)) {
                // if we don't recognize the template defined in the config, add it to this notebook only
                this.templateEl.addValue(template.name, template.name);
                this.notebookSparkTemplates.push(template);
            }
            this.templateEl.setSelectedValue(template?.name ?? "")
        }
        setTemplate(templateHandler.state)
        templateHandler.addObserver(template => {
            setTemplate(template);
            populateScalaVersions(template); // update displayed Scala versions
        }).disposeWith(this)

        stateHandler.view('openSpark').addObserver(open => {
            toggleConfigVisibility(open, this.el);
        }).disposeWith(this)

        // list the Scala versions coming from the version configurations associated with the selected Spark template
        const populateScalaVersions = (selectedTemplate: SparkPropertySet | undefined) => {
            this.scalaVersionInput.clearValues();
            const scalaVersions = (selectedTemplate?.versionConfigs) ?
                selectedTemplate?.versionConfigs.map(versionConfig => ({key: versionConfig.versionName, value: versionConfig.versionName})) :
                [{key: "Default", value: "Default"}, ...availableScalaVersions];
            scalaVersions.forEach(version => this.scalaVersionInput.addValue(version.key, version.value));
            this.scalaVersionInput.setSelectedValue(scalaVersionHandler.state || "");
        }

        // update displayed Scala versions when a different Spark template is selected
        this.templateEl.onSelect((newValue) => {
            const newSparkTemplate = this.notebookSparkTemplates.find(tmpl => tmpl.name === newValue);
            populateScalaVersions(newSparkTemplate);
        });

        scalaVersionHandler.addObserver(version => {
            this.scalaVersionInput.setSelectedValue(version || "")
        }).disposeWith(this);
    }

    private addConf(item?: {key: string, val: string}) {
        const data = item ?? {key: "", val: ""}
        const key = textbox(['spark-config-key'], 'key', data.key).change(() => {
            data.key = key.value.trim()
        })
        const val = textbox(['spark-config-val'], 'val', data.val).change(() => {
            data.val = val.value.trim()
        })
        const remove = iconButton(['remove'], 'Remove', 'minus-circle-red', 'Remove').click(evt => {
            this.container.removeChild(row);
            if (this.container.children.length === 0) this.addConf()
        })
        const add = iconButton(['add'], 'Add', 'plus-circle', 'Add').click(evt => {
            this.addConf()
        })
        const row = Object.assign(
            div(['exclusion-row', 'notebook-config-row'], [key, val, remove, add]),
            { data });
        this.container.appendChild(row)
    }

    get conf(): Record<string, string> {
        return Array.from(this.container.children).reduce<Record<string, string>>((acc, row: HTMLDivElement & {data: {key: string, val: string}}) => {
            if (row.data.key) acc[row.data.key] = row.data.val
            return acc
        }, {})
    }

    get template(): SparkPropertySet | undefined {
        const name = this.templateEl.options[this.templateEl.selectedIndex].value;
        return this.notebookSparkTemplates.find(tmpl => tmpl.name === name);
    }

    get scalaVersion(): string | undefined {
        return this.scalaVersionInput.getSelectedValue() || undefined;
    }
}
to add a sparkVersionInput field following the existing scalaVersionInput (and the associated Scala version references).
Thanks for the pointer! This part fixes the UI: since we added spark_version to version_configs, the UI would fail to load the config without it.
If needed, I can follow up on Spark version selection in the UI. At the moment, using the preset template would probably suffice.
jonathanindig left a comment
Thanks @wisechengyi, looks good to me once it passes final beta testing :)
Summary
Add Spark 3.5 support alongside existing Spark 3.3 support with automatic runtime selection based on configuration.
Users can now choose between Spark 3.3 and Spark 3.5 by setting the spark_version field in their notebook's property set configuration. The appropriate runtime will be automatically selected, without requiring separate builds or installations.

Spark Version Support
This PR adds full support for Apache Spark 3.5 while maintaining backward compatibility with Spark 3.3.
Configuration
Users can specify their preferred Spark version in the configuration YAML. Each property set represents a single Spark version, with version_configs specifying settings per Scala version. The system will automatically load the correct Spark runtime JAR based on the notebook's selected property set template. If no template is selected or spark_version is not specified, it defaults to Spark 3.3 for backward compatibility.
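As a rough illustration of the shape this configuration takes, here is a sketch based on the ScalaVersionConfig constructor shown in the review comments above; the snake_case YAML keys (spark_version, version_configs) map onto these fields, and the real classes in Polynote's config code may differ in detail:

```scala
object ExampleSparkTemplate {
  // Illustrative only: mirrors the ScalaVersionConfig shape used in this PR's tests
  // (versionName, versionProperties, sparkSubmitArgs, sparkVersion).
  case class ScalaVersionConfig(
    versionName: String,                    // Scala version, e.g. "2.12" or "2.13"
    versionProperties: Map[String, String], // Spark properties for that Scala version
    sparkSubmitArgs: Option[String] = None,
    sparkVersion: String = "3.3"            // new in this PR; "3.3" or "3.5", defaults to 3.3
  )

  // One property-set template pinned to Spark 3.5, with per-Scala-version settings:
  val versionConfigs: List[ScalaVersionConfig] = List(
    ScalaVersionConfig("2.12", Map("spark.executor.memory" -> "4g"), sparkVersion = "3.5"),
    ScalaVersionConfig("2.13", Map("spark.executor.memory" -> "4g"), sparkVersion = "3.5")
  )
}
```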
Compatibility

Implementation Details
Technical changes (click to expand)
Configuration Structure
- Moved spark_version from SparkPropertySet to ScalaVersionConfig (under version_configs)
- Each version_configs entry contains Scala-version-specific settings, all with the same spark_version
- Runtime selection reads spark_version from version_configs
Build & Runtime
- Version-specific source directories (spark_3.3/, spark_3.5/); see the build sketch below
- SparkVersionCompat abstraction layer for RowEncoder API differences
- Runtime JARs laid out as deps/{scala_version}/spark-{spark_version}/polynote-spark-runtime.jar

Test Plan
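As a sketch of how the build side described under Build & Runtime above could be wired in sbt (illustrative only, not the PR's actual build.sbt; the sparkBinaryVersion key, the system property, and the project name are assumptions):

```scala
// build.sbt fragment (illustrative): compile version-specific sources and place the packaged
// runtime JAR under deps/<scala_version>/spark-<spark_version>/.
val sparkBinaryVersion = settingKey[String]("Spark binary version to compile against")

lazy val `polynote-spark-runtime` = (project in file("polynote-spark-runtime"))
  .settings(
    sparkBinaryVersion := sys.props.getOrElse("spark.version", "3.3"),
    // pick up src/main/spark_3.3 or src/main/spark_3.5 in addition to the shared sources
    Compile / unmanagedSourceDirectories +=
      (Compile / sourceDirectory).value / s"spark_${sparkBinaryVersion.value}",
    // emit the JAR into the per-version layout used by the dist task
    Compile / packageBin / artifactPath :=
      target.value / "deps" / scalaBinaryVersion.value /
        s"spark-${sparkBinaryVersion.value}" / "polynote-spark-runtime.jar"
  )
```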