Tags: polynote/polynote
# Spark 3.5 Support (#1479)

## Summary

**Add Spark 3.5 support alongside the existing Spark 3.3 support**, with automatic runtime selection based on configuration. Users can now choose between Spark 3.3 and Spark 3.5 by setting the `spark_version` field in their notebook's property set configuration. The appropriate runtime is selected automatically, without requiring separate builds or installations.

## Spark Version Support

This PR adds full support for **Apache Spark 3.5** while maintaining backward compatibility with **Spark 3.3**.

### Configuration

Users can specify their preferred Spark version in the configuration YAML. Each property set represents a single Spark version, with `version_configs` specifying settings per Scala version:

```yaml
spark:
  property_sets:
    - name: BDP / Spark 3.5
      properties:
        spark.driver.memory: 4g
        spark.executor.memory: 2g
      version_configs:
        - version_number: "2.12"
          spark_version: "3.5"
          spark_submit_args: "--cluster h2query --ver 3.5 --deploy-mode client"
    - name: BDP / Spark 3.3
      properties:
        spark.driver.memory: 2g
        spark.executor.memory: 1g
      version_configs:
        - version_number: "2.12"
          spark_version: "3.3"
          spark_submit_args: "--cluster h2query --ver 3.3 --deploy-mode client"
```

The system automatically loads the correct Spark runtime JAR based on the notebook's selected property set template. If no template is selected, or `spark_version` is not specified, it defaults to Spark 3.3 for backward compatibility.

### Compatibility

- **Spark 3.3.4** - Fully supported (existing)
- **Spark 3.5.7** - Fully supported (new)
- **Scala 2.12 & 2.13** - Both supported with each Spark version

## Implementation Details

<details>
<summary>Technical changes (click to expand)</summary>

### Configuration Structure

- Moved `spark_version` from `SparkPropertySet` to `ScalaVersionConfig` (under `version_configs`)
- Each property set represents a single Spark version (indicated by its name, e.g. "BDP / Spark 3.5")
- `version_configs` contains Scala-version-specific settings, all with the same `spark_version`
- Simplified the runtime JAR selection logic: look up the notebook's template, then extract `spark_version` from its `version_configs`

### Build & Runtime

- Added version-specific source directories for Spark API compatibility (`spark_3.3/`, `spark_3.5/`)
- Created a `SparkVersionCompat` abstraction layer for RowEncoder API differences (see the sketch after this summary)
- Modified the build system to cross-compile for multiple Spark versions
- Consolidated CI workflows into a matrix strategy (Scala 2.12/2.13 × Spark 3.3.4/3.5.7)
- Runtime JAR selection: `deps/{scala_version}/spark-{spark_version}/polynote-spark-runtime.jar`

</details>

## Test Plan

- ✅ CI matrix builds for all Scala/Spark combinations
- ✅ RowEncoder compatibility tests across Spark versions
- ✅ Configuration parsing and validation tests
- ✅ Runtime JAR selection based on `version_configs`
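For illustration, here is a minimal sketch of how the `SparkVersionCompat` layer mentioned above could paper over the RowEncoder differences, with one copy of the object per version-specific source directory. The method name `rowEncoder`, the file placement, and the exact calls used (`RowEncoder.apply` on 3.3 vs. `Encoders.row` on 3.5) are assumptions for illustration, not the PR's actual implementation.

```scala
// Hypothetical file under the spark_3.3/ source directory, compiled only against Spark 3.3.x.
import org.apache.spark.sql.{Encoder, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.StructType

object SparkVersionCompat {
  // Spark 3.3 exposes RowEncoder.apply(schema), which returns an ExpressionEncoder[Row].
  def rowEncoder(schema: StructType): Encoder[Row] = RowEncoder(schema)
}
```

```scala
// Hypothetical file under the spark_3.5/ source directory, compiled only against Spark 3.5.x.
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.types.StructType

object SparkVersionCompat {
  // Spark 3.5 reworked the catalyst RowEncoder API; Encoders.row(schema), added in 3.5.0,
  // is a public way to obtain a Row encoder for a given schema.
  def rowEncoder(schema: StructType): Encoder[Row] = Encoders.row(schema)
}
```

Version-agnostic code would then call `SparkVersionCompat.rowEncoder(schema)` everywhere, and the build picks whichever source directory matches the configured Spark version.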
# Extract test script (#1476)

A later goal is to have a single command that runs the tests locally as well.

## Problem

**Duplicated build logic across CI workflows**: the build and test steps were duplicated across the `ci-backend-2.12.yml` and `ci-backend-2.13.yml` workflows as identical multi-line bash scripts. This violates DRY principles, makes maintenance difficult, and relies on the slower `pip` for Python dependency installation.

## Solution

### Extracted build logic into a reusable script

Created `.github/scripts/build-and-test.sh`:

- A single parameterized script that takes the Scala version as an argument
- Replaces the duplicated inline bash in both workflow files
- Uses `uv` instead of `pip` for significantly faster Python dependency installation
- Creates and activates a Python virtual environment in `.venv` for better isolation
- Auto-installs `uv` if it is not present
- Includes proper error handling via `set -euo pipefail`
- Can be tested locally: `./build-and-test.sh 2.12.12`

### Simplified CI workflows

Both `ci-backend-2.12.yml` and `ci-backend-2.13.yml` now use the same script:

```yaml
- name: Build
  run: ./.github/scripts/build-and-test.sh <scala-version>
```

## Benefits

- DRY code: build logic is defined once and reused by both the Scala 2.12 and 2.13 workflows
- Faster CI: `uv` installs packages 10-100x faster than `pip` **(eyeballed: total runtime down from ~8 min to ~6 min)**
- Easier maintenance: changes to the build process only need to be made in one place
- Better debuggability: the script can be run locally to reproduce CI issues
- Better isolation: Python dependencies live in a virtual environment