Releases: openproblems-bio/openproblems
v2.0.0
A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.
Most relevant parts of the overall structure:
- `src/tasks`: Benchmarking tasks:
  - `batch_integration`: Batch integration
  - `denoising`: Denoising
  - `dimensionality_reduction`: Dimensionality reduction
  - `match_modalities`: Match modalities
  - `predict_modality`: Predict modality
  - `spatial_decomposition`: Spatial decomposition
  - `spatially_variable_genes`: Spatially variable genes
- `src/datasets`: Components for creating common datasets. Loaders:
  - `cellxgene_census`: Query cells from a CellxGene Census
  - `openproblems_neurips2021_bmmc`: Fetch a dataset from the OpenProblems NeurIPS2021 competition
  - `openproblems_neurips2022_pbmc`: Fetch a dataset from the OpenProblems NeurIPS2022 competition
  - `openproblems_v1`: Fetch a legacy OpenProblems v1 dataset
  - `openproblems_v1_multimodal`: Fetch a legacy OpenProblems v1 multimodal dataset
  - `tenx_vision`: Fetch and convert a 10x Visium dataset
  - `zenodo_spatial`: Fetch and process an AnnData file containing DBiT-seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo.
  - `zenodo_spatial_slidetags`: Download a compressed file containing a gene expression matrix and spatial locations from Zenodo.
- `src/common`: Common components used by all tasks.
  - `check_dataset_schema`: Check whether an h5ad dataset adheres to a dataset schema
  - `check_yaml_schema`: Check whether a YAML file adheres to a JSON schema
  - `comp_tests`: Reusable component unit tests
  - `create_component`: Create a Viash component.
  - `create_task_readme`: Create a README for an OpenProblems task.
  - `extract_metadata`: Extract the `.uns` metadata from an h5ad file.
  - `helper_functions`: Commonly used helper functions in Python or in R.
  - `process_task_results`: Process the raw task results (containing raw logs, unprocessed component configs, and various metrics) into nicely formatted task results.
  - `schemas`: JSON schemas for YAML files in the repository
  - `sync_test_resources`: Synchronise the test resources from s3 to `resources_test`
For more information related to the structure of this repository, see the documentation.
v1.0.0
Note: This changelog was automatically generated from the git log.
New functionality
- Added `cell2location` to the `spatial_decomposition` task.
- Added nearest-neighbor ranking matrix computation to `_utils`.
- Datasets now store nearest-neighbor ranking matrix in `adata.obsm["X_ranking"]`.
- Added support for parsing Nextflow output and generating benchmark results for the website.
- Added `max_samples` parameter to `qlocal`, `qglobal`, `qnn_auc`, `lcmc`, `qnn`, and `continuity` metrics to allow for subsampling of data for faster computation.
- Added new scArches based methods: `scarches_scanvi_xgb_all_genes` and `scarches_scanvi_xgb_hvg`.
- Added `prediction_method` parameter to `_scanvi_scarches` to specify prediction method.
- Added `_pred_xgb` function to perform XGBoost prediction based on latent representations.
- Added `obsm` parameter to `_xgboost` function to allow specifying the embedding space for XGBoost training.
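
The `max_samples` entry above describes subsampling before metric computation. As a rough illustration of that idea only (not the project's code), the sketch below picks at most `max_samples` row indices without replacement and applies the same subset to both the original data and its embedding. The function names `subsample_indices` and `subsample_pair`, and the `seed` argument, are hypothetical.

```python
import random


def subsample_indices(n_obs, max_samples=None, seed=0):
    """Return sorted row indices, at most max_samples of them.

    Hypothetical helper: when max_samples is None or not smaller than
    n_obs, every row is kept; otherwise rows are drawn without replacement.
    """
    if max_samples is None or n_obs <= max_samples:
        return list(range(n_obs))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(rng.sample(range(n_obs), max_samples))


def subsample_pair(high_dim, low_dim, max_samples=None, seed=0):
    """Apply one shared row subset to the data and to its embedding."""
    idx = subsample_indices(len(high_dim), max_samples, seed)
    return [high_dim[i] for i in idx], [low_dim[i] for i in idx]


# Toy data: 10 observations in 3 dimensions and their 2-D embedding.
data = [[float(i), float(i) * 2, float(i) * 3] for i in range(10)]
embedding = [[float(i), -float(i)] for i in range(10)]
data_sub, embedding_sub = subsample_pair(data, embedding, max_samples=4)
```

Using one index set for both inputs matters because the dimensionality reduction metrics compare neighborhoods of the same observations before and after embedding.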
Major changes
- Updated `scvi-tools` to version `0.20` in both Python and R environments.
- Updated datasets to include nearest-neighbor ranking matrix.
- Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation.
- The website update workflow was refactored to use a new workflow using json instead of markdown.
- Updated the website generation process to remove duplicate BibTex entries.
- Added a new `parse_metadata.py` script for generating metadata for the website.
- Added a new function to `openproblems.utils.py` to get the member ID of a task, dataset, method or metric.
- Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets.
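
Several entries in this release refer to a nearest-neighbor ranking matrix. The sketch below shows one way such a matrix could be computed for a small point set: row `i` holds the rank of every point by Euclidean distance from point `i`. Tied distances receive their average rank, which is one common convention and an assumption here; the release notes do not say which convention the project uses. The names `average_ranks` and `ranking_matrix` are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (0-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def ranking_matrix(points):
    """Row i ranks all points by their Euclidean distance from point i."""
    return [average_ranks([math.dist(p, q) for q in points]) for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
R = ranking_matrix(points)
print(R[0])  # [0.0, 1.5, 1.5, 3.0]: the two points at distance 1 are tied
```

This builds the full pairwise distance matrix, so cost grows quadratically with the number of observations.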
Minor changes
- Updated method names to be shorter and more consistent across tasks.
- Improved method summaries for clarity.
- Updated JAX and JAXlib versions to 0.4.6.
- Updated dependencies to support new versions of Snakemake and GitPython.
- Removed code related to "nbt2022-reproducibility" repo and merged it into the main website.
- Updated the schema for benchmark results to include submission time, code version, and resource usage metrics.
- Improved error handling and added logging to the parsing script.
- Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file.
- Updated the workflow to upload the final results to the website's results directory instead of the data directory.
- Removed unnecessary code and refactored the parsing script for better readability.
- Added unit tests for the new parsing script.
- Updated the `run_tests` workflow to skip testing on the `test_website` branch.
- Updated the `run_tests` workflow to skip testing on the `test_process` branch.
- Updated the `create-pull-request` step to set the author for the pull request.
- Updated the `run_tests` workflow to skip testing on pull request reviews.
- Updated the `update_website_content` workflow to update the website on the `main` branch.
- Updated the `main.bib` file to fix a typo.
- Removed extraneous headings from task README files.
- Updated `generate_test_matrix.py` to use the new `openproblems.utils.get_member_id` function.
- Updated the website generation process to copy BibTex files to the correct location.
- Updated the `process_requires` section in `setup.py` to include `gitpython`.
- Updated git commit hash generation for openproblems functions.
- Modified `_xgboost` to allow for specifying `tree_method`.
- Modified `_scanvi_scarches` to consistently use `unlabeled_category`.
- Modified `_scanvi_scarches` to remove unnecessary copying of `labels`.
- Removed `_scanvi_scarches` functions that were redundant with `_scanvi_scarches`.
- Removed unused `_scanvi` functions.
- Modified `_scanvi_scarches` to allow for specifying `prediction_method` and handle `unlabeled_category` consistently.
Documentation
- Improved the documentation of the `auprc` metric.
- Improved the documentation of the `cell2location` methods.
- Document sub-stub task behaviour
Bug fixes
- Fixed an error in `neuralee_default` where the `subsample_genes` argument could be too small.
- Fixed an error in `knn_naive` where the `is_baseline` argument was set to `False`.
- Fixed calculation of ranking matrix in `_utils` to include ties.
- Fixed a bug in `load_tenx_5k_pbmc()` where a warning about non-unique variable names was being raised.
- Removed the unused `_utils.py` file.
- Removed the `X_ranking` entry from the `obsm` attribute of datasets.
- The `_fit()` function in `nn_ranking.py` now subsamples the data if `max_samples` is specified.
- The `nn_ranking` metrics now use subsampling in the `_fit()` function to improve performance.
- Fixed the git hash generation for openproblems functions
- Fixed a warning about `pkg_resources` being deprecated
- Removed unnecessary `fetch-depth: 1` from workflow
- Fixed potential issue in `_scanvi_scarches` where `labels_pred` could be overwritten
- Fixed potential issue in `_pred_xgb` where `num_round` wasn't being used correctly
- Fixed an issue where baseline methods were not being filtered correctly from the benchmark results.
- Fixed an issue where metrics with all NaN values were not being removed from the benchmark results.
- Fixed an issue where some metrics were not being parsed correctly from the Nextflow output.
- Fixed an issue where the "mean_score" field was not being calculated correctly for each method.
- Fixed an issue where the "code_version" field was not being populated correctly for each method.
- Fixed an issue where the "submission_time" field was not being populated correctly for each method.
- Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output.
- Updated the `run_tests` workflow to skip testing on the `test_website` branch.
- Updated the `run_tests` workflow to skip testing on the `test_process` branch.
- Updated the `create-pull-request` step to set the author for the pull request.
- Updated the `run_tests` workflow to skip testing on pull request reviews.
- Updated the `update_website_
Full Changelog: v0.8.0...v1.0.0
v0.8.0
What's Changed
- Fix DR baselines by @scottgigante-immunai in #816
- set adata.uns['is_baseline'] by @scottgigante-immunai in #820
- Copy anndata in metric decorator by @scottgigante-immunai in #819
- Don't recompute X_emb and neighborhood graph for baseline datasets by @danielStrobl in #823
- Changes in destVI code (#826) by @scottgigante-immunai in #827
- Set explicit token permissions by @scottgigante-immunai in #828
- Warnings fix by @scottgigante-immunai in #831
- Harmonize batch integration dataset APIs by @scottgigante-immunai in #834
- new common baselines and cross import by @danielStrobl in #825
- jitter baseline patch by @danielStrobl in #838
- Add reversed norm order for ALRA in Denoising Task by @wes-lewis in #835
Full Changelog: v0.7.4...v0.8.0
v0.7.0
What's Changed
- Fix docker image builds by @scottgigante-immunai in #758
- [Dimensionality reduction] Fix normalization in baselines by @scottgigante-immunai in #760
- downgrade gtfparse and polars by @scottgigante-immunai in #766
- Fix output headers order by @scottgigante-immunai in #769
- Convert references to bib by @scottgigante-immunai in #720
- fix typo in bibliography path by @scottgigante-immunai in #774
- More bibliography typos by @scottgigante in #775
- Pre-normalize dimensionality reduction datasets by @scottgigante-immunai in #768
- Add pymde to dimensionality reduction by @scottgigante-immunai in #767
- Fix flaky R installations in docker build by @scottgigante-immunai in #783
- save initial layer in X for adata_pre by @danielStrobl in #784
- Filter datasets by celltype by @scottgigante-immunai in #770
- Pass raw counts to neuralee by @scottgigante-immunai in #779
- Label projection describe datasets by @mxposed in #776
- Add missing DR references by @rcannood in #782
- Bugfix/lowercase GitHub repo owner by @scottgigante-immunai in #794
- Upgrade isort by @scottgigante-immunai in #795
- Update styler to 1.9.0 by @github-actions in #787
- [auto] Update docker version by @github-actions in #798
- Update bslib to 0.4.2 by @github-actions in #759
- add missing logfc decorator by @dbdimitrov in #796
- Add ALRA preprocessing identical to literature by @wes-lewis in #763
- run CI on PRs only with approving review by @scottgigante-immunai in #804
- add new workflow to add status by @scottgigante-immunai in #805
- Update bioc/scran to 1.26.2 by @github-actions in #799
- Specify PR number by @scottgigante-immunai in #808
- add magic with reverse norm order by @scottgigante-immunai in #797
- Bump pymde from 0.1.15 to 0.1.18 in /docker/openproblems-python-pytorch by @dependabot in #801
- Update scvi-tools requirement from ~=0.16 to ~=0.19 in /docker/openproblems-r-pytorch by @dependabot in #731
- Use graph and embedding metrics for feature and embedding subtask by @danielStrobl in #807
- Fix typo in dimensionality reduction dataset names by @lazappi in #802
- add new dataloaders by @danielStrobl in #792
- rmse -> distance correlation by @scottgigante-immunai in #811
- CPM -> CP10k by @scottgigante-immunai in #812
- change multimodal data integration task name to matching modalities by @LuckyMD in #778
- updated scib version by @danielStrobl in #793
- Daniel strobl hvg conservation fix by @danielStrobl in #785
Full Changelog: v0.6.1...v0.7.0
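
The "CPM -> CP10k" entry in v0.7.0 changes the normalization target. Reading CPM as counts per million and CP10k as counts per 10,000, the two differ only in the total each cell is scaled to. The sketch below illustrates this on a single toy cell; `scale_counts` is a hypothetical helper, not a function from the repository.

```python
def scale_counts(counts, target_sum):
    """Scale one cell's counts so that they sum to target_sum."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros rather than dividing by zero.
        return [0.0] * len(counts)
    return [c * target_sum / total for c in counts]


cell = [5, 15, 30]  # raw counts for three genes, total 50
cp10k = scale_counts(cell, 10_000)
cpm = scale_counts(cell, 1_000_000)
print(cp10k)  # [1000.0, 3000.0, 6000.0]
print(cpm)    # [100000.0, 300000.0, 600000.0]
```

Relative expression within a cell is the same under both targets; only the scale of the values changes, which matters once a log transform with a pseudocount is applied afterwards.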