bio-codes/iscc-bio

iscc-bio - ISCC Processing for Bioimage Data

ISCC Processing for Multi-Dimensional Bioimage Data

Generate ISO 24138:2024 International Standard Content Codes (ISCC) for bioimage data across multiple formats using deterministic IMAGEWALK plane traversal.

Project Status

Version 0.1.0 - Unreleased.

Warning

This package is a proof of concept under active development, and breaking changes may be released at any time.

Overview

iscc-bio bridges bioimage formats with ISCC-CODE processing by implementing the IMAGEWALK specification - a deterministic algorithm for traversing and canonicalizing pixel data from multi-dimensional bioimaging data. This produces consistent, reproducible content identifiers regardless of source format or storage platform.

Documentation: https://bio.iscc.codes

Key Features

  • Format-Agnostic Hashing: Generate reproducible ISCCs at the level of pixel data across OME-TIFF, OME-Zarr, OMERO, CZI, ND2, LIF, and other formats
  • IMAGEWALK Implementation: Deterministic Z→C→T plane traversal with canonical byte representation
  • Multi-Source Support: Process local files (via BioIO), OME-Zarr archives, and OMERO remote servers
  • Memory Efficient: Lazy loading with Dask for processing large multi-dimensional images
  • Multi-Scene Processing: Handle complex multi-scene/multi-series bioimage files
  • Command-Line Tools: CLI commands for code generation, pixel hashing, and view extraction

Installation

Basic Installation

# Using uv (recommended)
uv tool install iscc-bio

# Using pip
pip install iscc-bio

Installation with Format Support

# Install with all bioimage reader plugins
uv tool install "iscc-bio[readers]"

# Install with specific format support
uv tool install "iscc-bio[czi,nd2,lif]"

# Install with OMERO support
uv tool install "iscc-bio[omero]"

# Install everything
uv tool install "iscc-bio[all]"

Available Optional Dependencies

  • readers: All BioIO reader plugins (BioFormats, CZI, OME-TIFF, OME-Zarr, ND2, LIF, etc.)
  • omero: OMERO Blitz gateway for remote server access
  • bioformats: BioFormats reader for broad format support
  • czi, nd2, lif, ome-tiff, ome-zarr-plugin, dv, tifffile: Individual format readers

Quick Start

Experimental CLI scripts

Generate Bioimage Fingerprint

# Generate ISCC-based bioimage fingerprint
iscc-bio biocode myimage.czi

# Output includes:
# - ISCC-SUM hash over normalized pixel content
# - Representative view extraction (~5 views per scene)
# - ISCC-IMAGE codes for each view
# - ISCC-MIXED global descriptor

Pixel Hash (IMAGEWALK)

# Generate reproducible pixel hash using IMAGEWALK
iscc-bio pixhash myimage.ome.tiff

# Works with multiple sources:
iscc-bio pixhash local/file.czi           # Local bioimage file
iscc-bio pixhash data.zarr                # OME-Zarr/NGFF
iscc-bio pixhash --host omero.server.com --iid 123  # OMERO server

Extract Representative Views

# Extract intelligent 2D views for perceptual hashing
iscc-bio views myimage.nd2 --output-dir ./views/

# Extraction strategies:
# - Maximum intensity projections (MIP)
# - Best focus planes
# - Representative sampling
# - Multi-channel composites
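As an illustration of the first strategy listed above, a maximum intensity projection collapses a Z-stack into a single 2D view by keeping the brightest value at each pixel position. The sketch below uses plain nested lists; the function name `max_intensity_projection` is hypothetical and is not taken from the iscc-bio API, whose actual view extraction may work differently.

```python
def max_intensity_projection(z_stack):
    """Collapse a Z-stack (list of 2D planes) into one plane of per-pixel maxima."""
    if not z_stack:
        raise ValueError("z_stack must contain at least one plane")
    height, width = len(z_stack[0]), len(z_stack[0][0])
    return [
        [max(plane[y][x] for plane in z_stack) for x in range(width)]
        for y in range(height)
    ]


# Two 2x2 planes at different focal depths (made-up values).
stack = [
    [[0, 5], [9, 1]],
    [[7, 2], [3, 4]],
]
mip = max_intensity_projection(stack)
print(mip)  # [[7, 5], [9, 4]]
```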

IMAGEWALK Specification

IMAGEWALK is a deterministic algorithm for traversing multi-dimensional bioimage data to produce format-agnostic, reproducible hash digests.

Core Principles

  1. Z→C→T Traversal Order: Planes are processed in deterministic order:

    • Outermost loop: Z dimension (depth/focal plane)
    • Middle loop: C dimension (channel)
    • Innermost loop: T dimension (time)
  2. Canonical Byte Representation: Each 2D plane is:

    • Flattened in row-major order (Y then X)
    • Encoded as big-endian bytes
    • Fed to a hash processor
  3. Multi-Scene Independence: Each scene/series is processed separately, producing one hash per scene
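The canonical byte representation in principle 2 can be sketched with the standard library alone. This is a minimal illustration, assuming unsigned 16-bit pixels held in nested lists; the helper name `canonical_plane_bytes` is hypothetical, and `hashlib.sha1` only stands in for the hash processor, which the principles above do not name.

```python
import hashlib
import struct


def canonical_plane_bytes(plane, fmt="H"):
    """Flatten a 2D plane row-major (Y outer, X inner) and pack it big-endian.

    `fmt` is a struct format character; "H" means unsigned 16-bit (assumed here).
    """
    flat = [value for row in plane for value in row]
    return struct.pack(f">{len(flat)}{fmt}", *flat)


plane = [
    [1, 2, 3],
    [256, 257, 258],
]
data = canonical_plane_bytes(plane)
digest = hashlib.sha1(data).hexdigest()  # stand-in hash processor
print(data.hex())
```

Because the byte order is fixed by the `>` prefix, the packed bytes do not depend on the byte order of the machine that runs the code.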

Example Traversal

For an image with Z=2, C=3, T=2 dimensions (12 total planes):

Plane 1:  z=0, c=0, t=0    Plane 7:  z=1, c=0, t=0
Plane 2:  z=0, c=0, t=1    Plane 8:  z=1, c=0, t=1
Plane 3:  z=0, c=1, t=0    Plane 9:  z=1, c=1, t=0
Plane 4:  z=0, c=1, t=1    Plane 10: z=1, c=1, t=1
Plane 5:  z=0, c=2, t=0    Plane 11: z=1, c=2, t=0
Plane 6:  z=0, c=2, t=1    Plane 12: z=1, c=2, t=1
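The same ordering can be reproduced in a few lines of Python. This is only a sketch of the loop nesting; `imagewalk_order` is a hypothetical helper name, not part of the iscc-bio API.

```python
from itertools import product


def imagewalk_order(size_z, size_c, size_t):
    """Yield (z, c, t) indices with Z outermost, C in the middle, T innermost."""
    # itertools.product varies its rightmost argument fastest, so T is innermost.
    yield from product(range(size_z), range(size_c), range(size_t))


order = list(imagewalk_order(2, 3, 2))
for number, (z, c, t) in enumerate(order, start=1):
    print(f"Plane {number}: z={z}, c={c}, t={t}")
```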

Implementation Modules

  • iw_bioio.py: BioIO-based implementation for local files
  • iw_ngff.py: OME-NGFF/Zarr implementation using ome-zarr-py
  • iw_blitz.py: OMERO Blitz implementation for remote servers

All implementations produce identical hashes for identical pixel data, conforming to the IMAGEWALK specification.
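To see why canonicalization makes the digest independent of storage, consider the same pixels held in two different layouts. The sketch below is illustrative only: `decode_little_endian` is a hypothetical stand-in for a format reader, pixels are assumed to be unsigned 16-bit, and `hashlib.sha1` stands in for the actual hash processor.

```python
import hashlib
import struct


def digest_planes(planes):
    """Hash a sequence of 2D planes (rows of uint16 values) as big-endian bytes."""
    hasher = hashlib.sha1()
    for plane in planes:
        flat = [value for row in plane for value in row]
        hasher.update(struct.pack(f">{len(flat)}H", *flat))
    return hasher.hexdigest()


def decode_little_endian(raw, width):
    """Hypothetical reader: decode little-endian uint16 bytes into rows."""
    values = struct.unpack(f"<{len(raw) // 2}H", raw)
    return [list(values[i:i + width]) for i in range(0, len(values), width)]


pixels = [[1, 2], [300, 65535]]                    # source A: in-memory rows
stored_le = struct.pack("<4H", 1, 2, 300, 65535)   # source B: little-endian blob

digest_a = digest_planes([pixels])
digest_b = digest_planes([decode_little_endian(stored_le, 2)])
print(digest_a == digest_b)  # True: same pixels, same canonical bytes
```

Hashing the raw little-endian blob directly would give a different digest, which is why each reader decodes to pixel values before the canonical encoding is applied.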

Command-Line Interface

biocode - Generate Bioimage Fingerprint

Create comprehensive bioimage fingerprints with ISCC codes:

iscc-bio biocode INPUT [OPTIONS]

Options:
  -o, --output-dir PATH    Save extracted view PNGs
  -n, --max-views INTEGER  Maximum views per scene (default: 5)

pixhash - Normalized Pixel Hash

Generate reproducible SHA1 hashes over normalized pixel data:

iscc-bio pixhash INPUT [OPTIONS]

Options:
  -s, --source [auto|bioio|omero|zarr]  Data source type
  --host TEXT                           OMERO server hostname
  --iid INTEGER                         OMERO image ID

views - Extract Representative Views

Extract intelligent 2D views for perceptual hashing:

iscc-bio views INPUT [OPTIONS]

Options:
  -s, --strategies TEXT    View strategies (mip, best_focus, representative, composite)
  -n, --max-views INTEGER  Maximum views to extract (default: 8)
  -o, --output-dir PATH    Directory to save thumbnails
  --host TEXT              OMERO server hostname
  --iid INTEGER            OMERO image ID

scenes - Extract Scene Thumbnails

Extract thumbnails from all scenes in a multi-scene file:

iscc-bio scenes INPUT

thumb - Extract Thumbnail

Extract a single representative thumbnail from a bioimage:

iscc-bio thumb INPUT

Python API

IMAGEWALK Plane Iteration

from iscc_bio.imagewalk.iw_bioio import iter_planes_bioio
from iscc_bio.imagewalk.iw_ngff import iter_planes_ngff
from iscc_bio.imagewalk.iw_blitz import iter_planes_blitz

# Iterate over planes using BioIO
for plane in iter_planes_bioio("image.czi"):
    print(f"Scene {plane.scene_idx}, Z={plane.z_depth}, "
          f"C={plane.c_channel}, T={plane.t_time}")
    print(f"Shape: {plane.xy_array.shape}, dtype: {plane.xy_array.dtype}")

# Iterate over OME-Zarr planes
for plane in iter_planes_ngff("data.zarr"):
    # Process plane.xy_array (2D numpy array)
    pass

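# Iterate over OMERO planes
from omero.gateway import BlitzGateway

conn = BlitzGateway("user", "pass", host="omero.server.com")
if not conn.connect():
    raise ConnectionError("Could not connect to the OMERO server")
try:
    image = conn.getObject("Image", 123)
    for plane in iter_planes_blitz(image):
        # Process plane.xy_array
        pass
finally:
    # Always release the server session, even if processing fails
    conn.close()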
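Since each iterator yields planes that carry a `scene_idx`, a quick sanity check is to count planes per scene. The helper below is a sketch, not part of iscc-bio; it relies only on the `scene_idx` attribute shown in the example above and uses `SimpleNamespace` objects as stand-ins for real planes.

```python
from collections import Counter
from types import SimpleNamespace


def planes_per_scene(planes):
    """Count planes per scene from any iterable of plane-like objects."""
    return Counter(plane.scene_idx for plane in planes)


# Stand-in planes: 2 scenes with 3 Z-planes each (made-up data).
fake_planes = [
    SimpleNamespace(scene_idx=scene, z_depth=z, c_channel=0, t_time=0)
    for scene in range(2)
    for z in range(3)
]
counts = planes_per_scene(fake_planes)
print(dict(counts))  # {0: 3, 1: 3}

# With the real package this would presumably be called as:
# planes_per_scene(iter_planes_bioio("image.czi"))
```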

Generate Biocode

from iscc_bio.biocode import generate_biocode, format_output

# Generate bioimage fingerprints
fingerprints = generate_biocode("image.nd2", max_views=5)

# Format output
output = format_output(fingerprints, "image.nd2")
print(output)

Pixel Hashing

from iscc_bio.pixhash import pixhash_bioio, pixhash_zarr, pixhash_omero

# Generate pixel hash (returns list of hashes, one per scene)
hashes = pixhash_bioio("image.lif")
print(hashes[0])  # Hash for first scene

# OME-Zarr
hashes = pixhash_zarr("data.zarr")

# OMERO
hashes = pixhash_omero("omero.server.com", image_id=123)
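Because each function returns one hash per scene, two copies of the same image in different formats can be compared scene by scene. The helper below is a hypothetical convenience, not part of iscc-bio; it assumes only that the returned hashes support equality comparison.

```python
def compare_scene_hashes(hashes_a, hashes_b):
    """Return one True/False flag per scene; raise if scene counts differ."""
    if len(hashes_a) != len(hashes_b):
        raise ValueError(
            f"scene count mismatch: {len(hashes_a)} vs {len(hashes_b)}"
        )
    return [a == b for a, b in zip(hashes_a, hashes_b)]


# Placeholder values standing in for real per-scene hashes.
flags = compare_scene_hashes(["aa11", "bb22"], ["aa11", "cc33"])
print(flags)  # [True, False]

# With the real package this would presumably look like:
# compare_scene_hashes(pixhash_bioio("image.ome.tiff"), pixhash_zarr("data.zarr"))
```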

Supported Formats

Via the BioIO plugin ecosystem, plus direct OME-Zarr and OMERO access:

  • OME-TIFF/TIFF: Multi-page TIFF with OME-XML metadata
  • OME-Zarr/NGFF: Next-generation file format
  • OMERO: Remote server access via Blitz gateway
  • CZI: Carl Zeiss Image format
  • ND2: Nikon NIS-Elements format
  • LIF: Leica Image File format
  • DV: DeltaVision format
  • BioFormats: 150+ formats via Bio-Formats Java library

Development

Setup Development Environment

# Clone repository
git clone https://github.com/bio-codes/iscc-bio.git
cd iscc-bio

# Install with all dependencies
uv sync --extra all

# Run CLI during development
uv run iscc-bio --help

Development Commands

This project uses poethepoet for task automation:

# Format markdown files
uv run poe format-md

# Format code files
uv run poe format-code

# Build documentation
uv run poe docs-build

# Run all formatting and docs
uv run poe all

Architecture

Core Modules

  • iscc_bio.imagewalk: IMAGEWALK plane traversal implementations

    • iw_bioio.py: BioIO implementation
    • iw_ngff.py: OME-Zarr/NGFF implementation
    • iw_blitz.py: OMERO Blitz implementation
    • models.py: Plane data model
  • iscc_bio.biocode: ISCC bioimage fingerprint generation

  • iscc_bio.pixhash: Normalized pixel hashing across sources

  • iscc_bio.views: Intelligent view extraction strategies

  • iscc_bio.cli: Command-line interface

Design Principles

  1. Lazy Loading: Uses Dask arrays for memory-efficient processing of large images
  2. Format Agnostic: Identical processing logic across all formats via IMAGEWALK
  3. Deterministic: Reproducible hashes across platforms and formats
  4. Modular: Clean separation between traversal, canonicalization, and hashing
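The lazy-loading principle pairs naturally with incremental hashing: planes can be fed to the hasher one at a time, so the full image never has to sit in memory. The sketch below uses a plain generator in place of Dask and `hashlib.sha1` as a stand-in hash; both `plane_chunks` and `streaming_digest` are hypothetical names.

```python
import hashlib


def plane_chunks(n_planes, plane_size):
    """Lazily yield one plane's worth of bytes at a time (placeholder data)."""
    for index in range(n_planes):
        yield bytes([index % 256]) * plane_size


def streaming_digest(chunks):
    """Feed chunks to the hasher one by one; only one chunk is held at a time."""
    hasher = hashlib.sha1()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


streamed = streaming_digest(plane_chunks(4, 1024))
one_shot = hashlib.sha1(b"".join(plane_chunks(4, 1024))).hexdigest()
print(streamed == one_shot)  # True: streaming does not change the digest
```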

Funding

This work was supported through the Open Science Clusters’ Action for Research and Society (OSCARS) European project under grant agreement Nº101129751.

See: BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).

License

Apache License 2.0 - See LICENSE file for details.

Citation

If you use iscc-bio in your research, please cite:

@software{iscc_bio,
  title        = {bio-codes/iscc-bio: ISCC Processing for Bioimage Data},
  author       = {Pan, Titusz},
  year         = 2025,
  url          = {https://github.com/bio-codes/iscc-bio},
  note         = {Supported by OSCARS (Open Science Clusters' Action for Research and Society) under European Commission grant agreement Nº101129751},
  version      = {0.1.0}
}

Related Projects

  • iscc-sum - Fast ISCC Data-Code and Instance-Code hashing
  • iscc-core - ISCC Core Algorithms
  • BioIO - Bioimage reading library
  • OME-Zarr - Next-generation file format implementation
