Replace SQLAlchemy with ibis-framework for all database operations#66

Draft

daniel-thom wants to merge 48 commits into NatLabRockies/chronify:main from NatLabRockies/chronify:feat/replace-sqlalchemy-with-ibis

Conversation

Collaborator

@daniel-thom daniel-thom commented Apr 8, 2026

Replace the SQLAlchemy ORM with ibis-framework to provide a cleaner multi-backend abstraction for DuckDB, SQLite, and Spark. This is a clean API break: engine->backend, Connection params removed, ibis expressions replace SQLAlchemy select/join chains.

Key changes:

  • New src/chronify/ibis/ module with IbisBackend ABC and DuckDB/SQLite/Spark implementations
  • Remove SQLAlchemy, pyhive vendor code, Hive support, and related dependencies
  • Migrate all source modules (store, mappers, checker, converters) to ibis API
  • Migrate all tests to use backend fixtures instead of engine fixtures
  • Add ibis-framework[duckdb,sqlite] dependency, pyspark == 4.0.0 for spark extra
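The shape of the new abstraction can be pictured with a minimal stdlib sketch. The names `IbisBackend` and `make_backend` come from the PR; everything else here (the `read_query` body, using `ValueError` in place of chronify's `InvalidParameter`, the SQLite stand-in) is illustrative, not chronify's actual code:

```python
from abc import ABC, abstractmethod


class IbisBackend(ABC):
    """Minimal stand-in for the backend ABC described above."""

    @abstractmethod
    def read_query(self, query: str):
        """Run a query and return its rows."""


class SQLiteBackend(IbisBackend):
    def __init__(self) -> None:
        import sqlite3

        self._con = sqlite3.connect(":memory:")

    def read_query(self, query: str):
        return self._con.execute(query).fetchall()


def make_backend(name: str) -> IbisBackend:
    # Per a later commit, the real factory raises InvalidParameter for
    # unknown names; ValueError stands in for it here.
    backends = {"sqlite": SQLiteBackend}
    if name not in backends:
        raise ValueError(f"unknown backend: {name}")
    return backends[name]()
```

Callers then hold one `IbisBackend` object instead of a `sqlalchemy.Engine`, and each concrete backend hides its engine-specific quirks behind the shared interface.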

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates Chronify’s database layer from SQLAlchemy/Hive-specific code to an ibis-based backend abstraction, aiming to support multiple execution engines (DuckDB, SQLite, Spark) with a unified API.

Changes:

  • Introduces src/chronify/ibis/ with backend implementations and I/O helpers (read_query, write_table, Parquet helpers).
  • Refactors mappers/checkers/time zone utilities and tests to use IbisBackend instead of sqlalchemy.Engine/MetaData.
  • Removes SQLAlchemy + Hive/pyhive vendor code and updates project dependencies accordingly.

Reviewed changes

Copilot reviewed 53 out of 56 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/test_time_zone_localizer.py Switches timezone localization tests from SQLAlchemy engine fixtures to IbisBackend fixtures.
tests/test_time_zone_converter.py Switches timezone conversion tests to the ibis backend API.
tests/test_time_series_checker.py Updates timestamp-checker tests to call check_timestamps(backend, table_name, schema).
tests/test_models.py Updates dtype tests to use ibis datatypes (ibis.expr.datatypes).
tests/test_mapper_representative_time_to_datetime.py Refactors representative-time mapper tests to use read_query/write_table.
tests/test_mapper_index_time_to_datetime.py Refactors index-time mapper tests to use ibis backend querying.
tests/test_mapper_datetime_to_datetime.py Updates datetime-to-datetime mapping tests and write-table duplicate-config behavior expectations.
tests/test_mapper_column_representative_to_datetime.py Updates Store fixture + mapping assertions to use ibis querying instead of SQLAlchemy.
tests/test_csv_parser.py Updates parser tests to construct Store(backend=...).
tests/test_checker_representative_time.py Updates checker tests to use ibis backend and new check_timestamps signature.
tests/conftest.py Replaces SQLAlchemy engine fixtures with ibis backend fixtures (iter_backends, make_backend).
src/chronify/utils/sqlalchemy_view.py Removes SQLAlchemy view DDL utilities.
src/chronify/utils/sqlalchemy_table.py Removes SQLAlchemy “CREATE TABLE AS SELECT” utilities and SQLite timestamp workaround.
src/chronify/time_zone_localizer.py Refactors timezone localization to backend-driven querying + mapping.
src/chronify/time_zone_converter.py Refactors timezone conversion to backend-driven querying + mapping.
src/chronify/time_series_mapper.py Updates public mapping entrypoint to accept IbisBackend.
src/chronify/time_series_mapper_representative.py Refactors mapper to fetch distinct time zones via ibis expressions.
src/chronify/time_series_mapper_index_time.py Refactors index-time mapper to fetch distinct time zones via ibis expressions.
src/chronify/time_series_mapper_datetime.py Refactors datetime mapper base wiring to use IbisBackend.
src/chronify/time_series_mapper_column_representative_to_datetime.py Replaces SQLAlchemy-based intermediate table creation with ibis table creation + joins.
src/chronify/time_series_mapper_base.py Reimplements mapping application as ibis join/aggregate + Parquet output support.
src/chronify/time_series_checker.py Refactors timestamp checks to use backend SQL execution and ibis selects.
src/chronify/sqlalchemy/functions.py Removes SQLAlchemy read/write optimization helpers.
src/chronify/schema_manager.py Refactors schema persistence to a backend-managed table and DataFrame inserts.
src/chronify/models.py Replaces SQLAlchemy dtype handling with ibis dtype handling and new conversion helpers.
src/chronify/ibis/base.py Adds IbisBackend ABC and common helpers (raw SQL exec, “transaction” cleanup).
src/chronify/ibis/__init__.py Exposes backends and make_backend factory.
src/chronify/ibis/duckdb_backend.py Implements DuckDB backend wrapper.
src/chronify/ibis/sqlite_backend.py Implements SQLite backend wrapper (including insert + Parquet handling).
src/chronify/ibis/spark_backend.py Implements Spark backend wrapper and Spark-specific datetime preparation.
src/chronify/ibis/functions.py Adds backend-agnostic read/write utilities (table/query, parquet, datetime conversions).
src/chronify/ibis/types.py Adds DuckDB/ibis type conversion and dataframe schema inference helpers.
src/chronify/hive_functions.py Removes Hive materialized-view workaround.
src/chronify/csv_io.py Updates CSV dtype mapping to use ibis/duckdb conversion.
src/chronify/__init__.py Removes pyhive/TCLIService module injection.
src/chronify/_vendor/kyuubi/* Removes vendored pyhive/kyuubi code.
pyproject.toml Drops SQLAlchemy/pyhive deps and adds ibis + pyspark extra.


- Fix create_view_from_parquet to return ObjectType so callers drop the
  correct object type (SQLite creates a table, not a view)
- Handle directory paths in DuckDB create_view_from_parquet for
  partitioned parquet datasets
- Add unique index on schemas table name column to prevent duplicates
- Escape single quotes in schema remove_schema to prevent SQL injection
- Add dispose() teardown to test fixtures to prevent resource leaks
- Use explicit column ordering in DuckDB insert to prevent column
  mismatch when DataFrame column order differs from table
- Clean up schema on failed ingestion rollback in all Store methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
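The explicit-column-ordering fix above can be reduced to a stdlib sketch against SQLite (the function name is hypothetical, and chronify's DuckDB path differs in detail): read the table's own column order and name the columns in the INSERT, so rows whose key order differs from the table's still land in the right columns.

```python
import sqlite3


def insert_in_table_order(con: sqlite3.Connection, table: str, rows: list[dict]) -> None:
    # PRAGMA table_info yields (cid, name, type, ...); index 1 is the name.
    cols = [info[1] for info in con.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    # Naming the columns explicitly prevents a silent positional mismatch
    # when the incoming row ordering differs from the table's ordering.
    con.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in cols) for row in rows],
    )
```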
@daniel-thom daniel-thom requested a review from Copilot April 9, 2026 16:47
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 56 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

  • src/chronify/time_series_mapper_column_representative_to_datetime.py:1 — This intermediate mapping table previously used replace semantics, but now if_exists="fail" will error if a prior run crashed (or a previous test left the table behind). That can make retries flaky. Consider switching to if_exists="replace" or explicitly dropping the mapping table before creating it so the workflow is robust to partial failures.
  • tests/conftest.py:1 — This fixture still uses the name iter_stores_by_engine even though it now iterates over backends. Renaming it (and any references) to something like iter_stores_by_backend would make the updated abstraction clearer and reduce confusion for future contributors.


Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 55 out of 58 changed files in this pull request and generated 6 comments.




codecov-commenter commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 85.54965% with 163 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.39%. Comparing base (b5d1cdc) to head (649b801).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/chronify/ibis/base.py 84.07% 36 Missing ⚠️
src/chronify/ibis/sqlite_backend.py 82.41% 32 Missing ⚠️
src/chronify/ibis/spark_backend.py 84.84% 20 Missing ⚠️
src/chronify/ibis/duckdb_backend.py 79.12% 19 Missing ⚠️
src/chronify/schema_manager.py 74.00% 13 Missing ⚠️
src/chronify/time_series_mapper_base.py 86.73% 13 Missing ⚠️
src/chronify/store.py 93.33% 10 Missing ⚠️
...series_mapper_column_representative_to_datetime.py 73.68% 10 Missing ⚠️
src/chronify/time_series_checker.py 89.79% 5 Missing ⚠️
src/chronify/models.py 81.25% 3 Missing ⚠️
... and 1 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #66       +/-   ##
===========================================
+ Coverage   52.41%   89.39%   +36.97%     
===========================================
  Files          58       37       -21     
  Lines       12936     3065     -9871     
===========================================
- Hits         6781     2740     -4041     
+ Misses       6155      325     -5830     


daniel-thom and others added 2 commits April 12, 2026 09:31
- Allow injecting an existing ibis connection into DuckDB/SQLite backends,
  mirroring SparkBackend(session=...); external connections are not
  disposed by the backend.
- Remove IbisBackend.reconnect; replace with backend-native backup():
  DuckDB uses disconnect+copy+reconnect internally, SQLite uses the
  sqlite3 online backup API, Spark raises InvalidOperation.
- Validate insert column sets on DuckDB/SQLite/Spark and raise
  InvalidParameter on mismatch instead of silent reindex.
- DuckDB write_parquet uses COPY (FORMAT PARQUET) for the unpartitioned
  path instead of materializing to pandas.
- SparkBackend.dispose always disconnects the ibis connection and stops
  the session only when owned; factor temp-view handling into a helper.
- Collapse duplicated _write_to_{duckdb,sqlite,spark} into a single
  _apply_if_exists helper.
- ObjectType is a StrEnum; make_backend raises InvalidParameter for
  unknown names; schema_manager imports moved to top.
- Restore docstrings on convert/localize time zone methods and
  read_raw_query.
- Relax pyspark pin to >=4.0,<5.
- CI: install package editable and pass --cov=chronify so Codecov can
  map coverage paths back to repo sources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
daniel-thom and others added 3 commits April 13, 2026 11:00
Reinstates the full Parameters / Raises / Examples docstrings that were
reduced to one-liners during the SQLAlchemy→ibis migration. Drops
references to the removed connection and scratch_dir parameters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert null-consistency and time-array count checks in
TimeSeriesChecker from f-string SQL to ibis filter/group_by/join
expressions, and replace duckdb.sql() in csv_io with rel.project().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@daniel-thom daniel-thom requested a review from Copilot April 27, 2026 14:26
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 73 changed files in this pull request and generated 3 comments.



daniel-thom and others added 7 commits April 27, 2026 09:17
ibis-sqlite's create_table calls con.commit() internally, which silently
terminated any outer BEGIN — DDL inside transaction() blocks survived
rollback. Wrap the underlying sqlite3.Connection in a no-commit proxy for
the duration of the transaction so DDL is now genuinely covered by
rollback.

Drop the unused `created` cleanup list from transaction(); no caller
populated it. transaction() is now a thin DB BEGIN/COMMIT/ROLLBACK
wrapper, with semantics documented per backend.

Wrap apply_mapping and _intermediate_mapping_ymdp_to_ymdh in
transaction() so DuckDB and SQLite get atomic rollback of the multi-step
DDL for free. Spark falls back to manual cleanup. The output_file
parquet (not transactional) is unlinked on failure.

Drop the redundant DuckDB and Spark write_parquet overrides — the base
class already delegates to ibis's to_parquet, which uses server-side
COPY/native partitioning.

Switch the quick_start doc example from pandas-style ibis indexing to
filter()/select(), matching the codebase idiom.

Add backend-parametric regression tests for transactional DDL on both
DuckDB and SQLite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
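The no-commit proxy trick can be demonstrated in isolation (the class name is hypothetical and ibis is not involved in this reduced demo; it only shows why swallowing `commit()` keeps DDL inside the outer BEGIN):

```python
import sqlite3


class NoCommitProxy:
    """Forwards everything to the wrapped sqlite3 connection except
    commit(), which is swallowed so an outer BEGIN stays open."""

    def __init__(self, con: sqlite3.Connection) -> None:
        self._con = con

    def commit(self) -> None:
        # A library's internal commit() no longer terminates the
        # transaction; only the transaction() wrapper decides that.
        pass

    def __getattr__(self, name):
        return getattr(self._con, name)
```

Because SQLite's DDL is itself transactional, a CREATE TABLE executed through the proxy is rolled back together with everything else when the outer transaction fails.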
Reviewer-flagged: ``apply_mapping`` was deleting the existing
``output_path`` before renaming the staged output into place, which
opened a window where a failure or process crash left the user with
neither the old output nor the new one — contradicting the docstring's
"prior on-disk state is restored" claim.

Replace the delete-then-rename sequence with a backup-rename-replace
pattern: rename the existing target aside to a sibling
``.<name>.backup.<uid>`` path, rename staging into the target, then
delete the backup. If the second rename fails, restore the backup so
the original is preserved. Each ``Path.replace`` call is atomic, and
the user's data is never simultaneously absent from both target and
backup.

Add ``test_map_table_preserves_existing_parquet_on_promotion_failure``,
which monkey-patches ``Path.replace`` to fail on the staging→target
call and asserts the pre-existing parquet is preserved with no debris
left over.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reviewer-flagged: ``delete_if_exists(backup_path)`` ran inside the same
try block as the mapping work, so any failure cleaning up the backup
caused ``apply_mapping`` to re-raise. ``Store.map_table_time_config``
then skipped its ``add_schema`` call and the user was left with the new
parquet on disk but no schema registration — the store believed the
mapping had failed.

Wrap the backup cleanup in try/except + warning. The promotion already
succeeded by this point and the new output is observable; a leftover
backup is cosmetic debris that the user can remove manually.

Add ``test_map_table_succeeds_when_backup_cleanup_fails`` which
monkey-patches ``delete_if_exists`` to fail for backup paths only and
asserts the mapping completes, the new content is in place, and the
schema is registered — matching the reviewer's exact reproducer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three correctness improvements to transaction handling, motivated by a
review of the branch:

- Cleanup paths (Store.ingest_*, apply_mapping,
  _intermediate_mapping_ymdp_to_ymdh) no longer mask the original error.
  A connectivity failure on Spark inside drop_table during rollback
  previously replaced the user-visible InvalidTable, leaving callers
  debugging the wrong thing. Each cleanup step is now individually
  guarded; failures are logged via logger.exception and swallowed so the
  bare raise propagates the cause. Adds
  test_ingest_cleanup_failure_does_not_mask_original_error as a
  regression test.

- DuckDBBackend and SparkBackend now initialize self._in_transaction in
  __init__ to match SQLiteBackend. The class-level default on
  IbisBackend is dropped in favor of a docstring contract that
  subclasses must initialize the instance attribute, eliminating the
  attribute-shadowing trick.

- Docstrings on ingest_tables, ingest_from_csvs, and
  ingest_pivoted_tables now spell out that Spark cannot roll back
  partial appends to a pre-existing table; only the new-table case is
  fully cleaned up on failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
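The individually-guarded-cleanup shape from the first bullet can be sketched generically (the helper name is hypothetical; chronify inlines this per call site rather than using a shared helper):

```python
import logging

logger = logging.getLogger(__name__)


def run_with_cleanup(work, cleanup_steps):
    """Run work(); on failure run each cleanup step individually guarded,
    then re-raise the ORIGINAL exception with a bare raise."""
    try:
        return work()
    except Exception:
        for step in cleanup_steps:
            try:
                step()
            except Exception:
                # Log and swallow: a failing cleanup must not replace
                # the user-visible cause of the failure.
                logger.exception("cleanup step failed")
        raise
```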
Removing the class-level default in 6af444c left mypy unable to infer
the type of ``self._in_transaction`` when read from the base class's
``transaction()`` method. Add a bare ``_in_transaction: bool`` annotation
on ``IbisBackend`` to declare the attribute's type without giving it a
value (subclasses still must initialize it in ``__init__``), and drop
the now-stale ``# type: ignore[assignment]`` on the SQLite proxy swap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ibis-pyspark's ``to_parquet`` forwards ``partition_by`` as a kwarg to
``df.write.format('parquet').save(path, **kwargs)``. PySpark's writer
expects camel-cased ``partitionBy``; the snake_case kwarg is silently
dropped as an unknown option, so partitioned writes fell out as a
single unpartitioned directory. (DuckDB's native ``to_parquet`` accepts
``partition_by`` directly, so this only broke on Spark.)

Override ``write_parquet`` on ``SparkBackend`` to call
``df.write.partitionBy(*cols).parquet(path)`` directly when partition
columns are given, falling back to the ibis path otherwise.

Fixes ``test_spark_write_parquet_partitioned``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@daniel-thom daniel-thom requested a review from Copilot April 27, 2026 19:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 73 changed files in this pull request and generated 3 comments.



daniel-thom and others added 2 commits April 27, 2026 14:03
pyarrow 24 added a ``py.typed`` marker, so mypy now uses the bundled
type info instead of treating pyarrow as ``Any``. The compute kernels
(``assume_timezone``, etc.) are registered dynamically at import time
and aren't declared in the stubs, so mypy reports them as missing
attributes even though they exist at runtime. CI installs pyarrow 24
fresh and trips this; local mypy was clean only because the existing
environment still had pyarrow 23.

Add a targeted ``# type: ignore[attr-defined]`` on the single call site
with a comment explaining the cause.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- TimeZoneLocalizerByColumn._get_time_zones() previously rejected only
  the *mixed* case of the literal 'None' string (the get_tzname(None)
  sentinel) plus real time zones. The all-'None' case slipped through
  and crashed with ZoneInfoNotFoundError on ZoneInfo('None'). Tighten
  the check to reject 'None' whenever it appears, with a message
  pointing the user to localize_time_zone(None) for tz-naive rows. Add
  test_localize_time_zone_by_column_rejects_none_sentinel as a
  regression test.

- SQLiteBackend.insert() now wraps con.cursor() in
  contextlib.closing so the cursor is deterministically closed even on
  executemany failure or non-CPython runtimes. (Cursor close is
  independent of transaction state — commit still happens on the
  connection afterwards via _commit_if_needed.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
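The `contextlib.closing` change in the second bullet reduces to this shape (the function name and fixed two-column statement are illustrative, not chronify's actual insert path):

```python
import contextlib
import sqlite3


def insert_rows(con: sqlite3.Connection, rows) -> None:
    # closing() guarantees the cursor is closed even if executemany
    # raises, and on runtimes without CPython's refcounting. Cursor
    # close is independent of transaction state, so the commit still
    # happens on the connection afterwards.
    with contextlib.closing(con.cursor()) as cur:
        cur.executemany("INSERT INTO t VALUES (?, ?)", rows)
    con.commit()
```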
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 73 changed files in this pull request and generated no new comments.



daniel-thom and others added 2 commits May 7, 2026 13:03
- Restore Numpy-style Parameters/Returns docstrings on
  localize_time_zone() and localize_time_zone_by_column() (per
  Daniel's PR comments at time_zone_localizer.py:37 and :53), with
  scratch_dir / connection references stripped — those parameters
  were removed during the SQLAlchemy → ibis migration.
- Move 'import pandas as pd' from inside
  test_list_distinct_timestamps_from_dataframe_not_implemented to the
  top of test_annual_time_range_generator.py (Daniel's comment at :33).
- Drop unused 'DatetimeRanges' alias in ibis/base.py — it shadowed the
  one exported from time_configs and was never referenced.
- Drop a stale comment in spark_backend.delete_rows; the parameterized
  SQL behavior is now self-evident from the surrounding code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous run failed in CI because codecov-action v4.2.0 crashed
during its uploader binary fetch (`Could not pull latest version
information: SyntaxError: Unexpected token '<'` followed by a
`write EPIPE` from the gpg verification subprocess). That failure
mode is not caught by `fail_ci_if_error: false` on v4, and was
fixed in the v5 series (which uses the Codecov CLI directly).

Also pin `files: coverage.xml` to skip the auto-discovery walk —
we already produce that exact path with
`pytest --cov-report=xml:coverage.xml`.

Sticking on v5 (not v6) so the action keeps running on Node.js 20;
the runners don't default to Node 24 until 2026-06-02.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@daniel-thom daniel-thom requested a review from Copilot May 7, 2026 21:44
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 73 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (5)

  • src/chronify/schema_manager.py:1 — LOCATION_ALREADY_EXISTS can occur when Spark’s managed-table directory exists but the table does not. In that case drop_table(self.SCHEMAS_TABLE) may itself fail (table not found) and won’t remove the stale location, re-raising and preventing initialization. Consider making this retry path robust by (a) dropping with an if_exists/force behavior (if supported), and/or (b) adding a backend-level helper to remove the managed table location before retrying (SparkBackend already has _remove_managed_table_location, but SchemaManager can’t rely on a private method).
  • src/chronify/schema_manager.py:1 — This is a check-then-insert sequence without an atomic constraint, so concurrent callers can still insert duplicate schema rows. A concrete fix is to enforce uniqueness at the database level where possible (DuckDB/SQLite: create a UNIQUE index/constraint on name), while keeping the application-level check for Spark; alternatively, wrap the existence check + insert in a backend transaction and re-check / handle an insert error deterministically.
  • src/chronify/time_zone_converter.py:1 — The docstring for this public API was reduced to a short description and no longer documents parameters/returns (it previously did). Please restore a Parameters/Returns section updated for the new backend signature (e.g., backend, removal of engine/metadata/scratch_dir, and how output_file is used) so API consumers have a complete reference.
  • src/chronify/schema_manager.py:1 — The new corruption-detection paths (row name mismatch, duplicate schema rows) and the new duplicate-registration behavior in add_schema don’t appear to be covered by targeted tests in the provided diffs. Adding unit tests that (a) manually insert bad rows into the schemas table and assert InvalidOperation, and (b) assert that registering the same name twice raises InvalidParameter, would help prevent regressions in this critical metadata path.
  • tests/test_time_zone_converter.py:1 — These tests build SQL strings with interpolated identifiers (table_name) and then rely on backend.sql(...). Since the PR’s goal is multi-backend portability, prefer Ibis table expressions (e.g., backend.table(table_name)) for simple reads. This avoids backend-specific SQL parsing/quoting differences and removes identifier-injection hazards even in test code.
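The database-level uniqueness fix suggested in the second bullet (and which an earlier commit in this PR added as "a unique index on schemas table name column") looks like this in plain SQLite; the table and index names here are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE schemas (name TEXT, schema_json TEXT)")
# A unique index turns the racy check-then-insert into a hard guarantee:
# the losing concurrent writer gets an IntegrityError instead of
# silently creating a duplicate row.
con.execute("CREATE UNIQUE INDEX schemas_name_idx ON schemas (name)")
con.execute("INSERT INTO schemas VALUES ('load_data', '{}')")
try:
    con.execute("INSERT INTO schemas VALUES ('load_data', '{}')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```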
