This project provides a framework for evaluating computer vision models, focusing on the Segment Anything Model 2 (SAM2) released by Meta AI (FAIR). SAM2 extends the original SAM by unifying promptable segmentation for images and videos within a single model, combining a Hiera-based image encoder with a memory bank for efficient video context propagation.
Architecturally, SAM2 builds upon its predecessor, employing powerful image encoders (like the Hiera architecture used in some variants) and a promptable mask decoder. The key innovation enabling efficient video processing is the introduction of a memory bank. This module allows the model to maintain and propagate context (such as object identities and locations) across consecutive video frames, enabling real-time, consistent segmentation without recomputing from scratch for every frame. This makes SAM2 significantly more efficient than previous models, particularly for video-based tasks.
This repository facilitates robustness testing of SAM2. It focuses on systematically evaluating model performance on custom datasets, particularly under challenging conditions or image degradations, using configurable evaluation pipelines and metrics.
- Configurable Evaluation: Run evaluations defined by simple JSON configuration files.
- SAM2 Automatic Mask Generation: Evaluates masks generated by `SamAutomaticMaskGenerator` against ground truth masks.
- Hierarchical Data Support: Handles datasets where multiple versions (e.g., degraded images) exist for each base image.
- Metric Calculation: Calculates Intersection over Union (IoU) and Boundary F1 Score (BF1) to compare predicted masks against ground truth (a minimal sketch of these metrics follows this list).
- Extensible: Designed to potentially incorporate other models or evaluation pipelines in the future.
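The repository's `metrics.py` implements the IoU and BF1 metrics; as a rough sketch of what they compute (not the project's actual code), IoU and a tolerance-based Boundary F1 over binary masks can be written as follows. The boundary-extraction strategy and function names here are illustrative assumptions; the `tolerance` default mirrors the `bf1_tolerance` config parameter described later.

```python
import numpy as np
from scipy.ndimage import binary_dilation


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union between two binary masks (illustrative helper)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 0.0


def boundary_f1(pred: np.ndarray, gt: np.ndarray, tolerance: int = 2) -> float:
    """Boundary F1: precision/recall of boundary pixels within a pixel tolerance."""
    def edge(mask: np.ndarray) -> np.ndarray:
        mask = mask.astype(bool)
        return mask & binary_dilation(~mask)  # mask pixels touching the background

    pred_b, gt_b = edge(pred), edge(gt)
    # Dilate each boundary by `tolerance` pixels to allow near-misses.
    struct = np.ones((2 * tolerance + 1, 2 * tolerance + 1), dtype=bool)
    precision = (pred_b & binary_dilation(gt_b, structure=struct)).sum() / max(pred_b.sum(), 1)
    recall = (gt_b & binary_dilation(pred_b, structure=struct)).sum() / max(gt_b.sum(), 1)
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```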
The initial evaluation dataset used with this pipeline consists of images randomly sampled from the popular COCO (Common Objects in Context) dataset. A key filtering criterion was applied: only images containing a single ground truth object mask were selected.
This simplification was made to streamline the mask matching and comparison process between the model's generated masks (e.g., from SAM2's SamAutomaticMaskGenerator) and the single ground truth object. Future work might involve datasets with multiple objects per image.
The specific images and their corresponding ground truth masks (encoded in COCO RLE format) are referenced within the input JSON file provided to the evaluation pipeline (see config/sam2_eval_config.json for an example structure). This file acts as the central map linking image identifiers/paths to the necessary ground truth data for evaluation.
Before running the pipeline, ensure you have the following prerequisites installed and set up:
- Python: Python 3.8 or higher is recommended. You can check your version with `python --version`.
- Git: Required for cloning the necessary repositories. Check with `git --version`.
- Virtual Environment (Strongly Recommended): Using a virtual environment is crucial to avoid dependency conflicts. Follow these steps before installing other dependencies:
  - Create the environment: Navigate to the project root directory (`SAM2_analysis`) in your terminal and run:

    ```bash
    python -m venv venv
    ```

    (This creates a `venv` directory within your project.)
  - Activate the environment:
    - macOS/Linux (bash/zsh): `source venv/bin/activate`
    - Windows (Command Prompt): `venv\Scripts\activate.bat`
    - Windows (PowerShell): `venv\Scripts\Activate.ps1`

    (Your terminal prompt should change, often showing `(venv)` at the beginning, indicating the environment is active.)
  - Important: Perform all subsequent installation steps (SAM2 library, `requirements.txt`) and script executions (`python main.py ...`) only when the virtual environment is active.
  - Deactivate: When finished working on the project, you can deactivate the environment by simply running:

    ```bash
    deactivate
    ```
- PyTorch: Install PyTorch according to your system (CPU/GPU, CUDA version). Follow the instructions on the official PyTorch website. A GPU is strongly recommended for reasonable performance. Install this after activating your virtual environment.
- SAM2 Library (Manual Installation Required): With your virtual environment active, install the official `sam2` library manually from its repository:

  ```bash
  # Ensure you are in the SAM2_analysis project root and venv is active
  mkdir -p external
  cd external  # Now in the external directory within the project

  # Clone the repository
  git clone https://github.com/facebookresearch/sam2.git

  # Navigate into the directory
  cd sam2

  # Install in editable mode
  pip install -e .

  # Go back to the project root
  cd ../..  # Now back in the SAM2_analysis project root
  ```
- Other Python Dependencies: With your virtual environment active, install the remaining required packages using the `requirements.txt` file:

  ```bash
  pip install -r requirements.txt
  ```
- Hugging Face Authentication (Optional but Recommended): If you use private models or encounter download rate limits, you may need to authenticate with Hugging Face Hub within your activated environment:
  - Option 1 (CLI Login): Run `huggingface-cli login` and follow the prompts.
  - Option 2 (Environment Variable): Set the `HUGGING_FACE_HUB_TOKEN` environment variable with your token.
You can run the evaluation pipeline either locally on your machine or using the provided Google Colab notebook.
Follow these steps precisely to set up the project locally.
1. Prerequisites:
   - Git: Ensure Git is installed (`git --version`).
   - Python: Python 3.8+ recommended (`python --version` or `python3 --version`).
   - PyTorch: A recent version compatible with your system (CPU/GPU) is needed. A GPU is highly recommended. Install it following instructions at pytorch.org. Wait until Step 4 to install it.

2. Clone This Repository:

   ```bash
   git clone <YOUR_REPO_URL> SAM2_analysis  # Replace <YOUR_REPO_URL>
   cd SAM2_analysis
   ```

3. Create and Activate Virtual Environment: (Crucial for managing dependencies)

   ```bash
   # Create environment (in the SAM2_analysis directory)
   python -m venv venv

   # Activate environment
   # macOS/Linux:
   source venv/bin/activate
   # Windows (Git Bash/WSL):
   # source venv/Scripts/activate
   # Windows (Command Prompt):
   # venv\Scripts\activate.bat
   # Windows (PowerShell):
   # venv\Scripts\Activate.ps1
   ```

   (Your terminal prompt should now show `(venv)`.) Keep this environment active for all subsequent steps.

4. Install PyTorch: With the virtual environment active, install PyTorch matching your system (CUDA version, etc.) from pytorch.org. For example:

   ```bash
   # Example (check official site for the command specific to your setup):
   pip install torch torchvision torchaudio
   ```

5. Install SAM2 Library: SAM2 needs to be installed manually. Create an `external` directory if it doesn't exist.

   ```bash
   # Make sure you are in the SAM2_analysis directory and venv is active
   mkdir -p external
   cd external  # Now in the external directory within the project

   # Clone the repository
   git clone https://github.com/facebookresearch/sam2.git

   # Navigate into the directory
   cd sam2

   # Install in editable mode
   pip install -e .

   # Go back to the project root
   cd ../..  # Now back in the SAM2_analysis project root
   ```

6. Install Project Dependencies:

   ```bash
   # Make sure venv is active and you are in the SAM2_analysis root
   pip install -r requirements.txt
   ```

7. Hugging Face Authentication (Optional): If using private models or hitting download limits:

   ```bash
   huggingface-cli login
   ```

   (Follow the prompts to enter your token.)
Alternatively, you can run the evaluation directly in Google Colab using the provided notebook (`sam2_eval_colab.ipynb`).
This notebook handles cloning the repository, installing dependencies (including SAM2), and running the evaluation pipeline within the Colab environment. Follow the instructions within the notebook cells.
There are two main ways to run the SAM2 evaluation:
- Locally (using `main.py`):

  Ensure your virtual environment is active and all prerequisites (including the manual SAM2 installation) are met. Then, run the main script, pointing it to a configuration file:

  ```bash
  # Make sure (venv) is active
  python main.py --config config/sam2_eval_config.json
  ```

  Results will be saved as timestamped CSV files in the `output/` directory, as specified in the config.

- Google Colab (using the notebook):

  Open `sam2_eval_colab.ipynb` in Google Colab and execute the cells sequentially. The notebook guides you through setup and execution.
The evaluation pipeline relies on specific data formats prepared by scripts in the data/data_scripts/ directory:
- `code_degradation.py`: (Optional first step) Samples images/annotations from a source dataset (e.g., COCO), downloads them, applies specified degradations, and organizes them into the `data/images/gt_img/` and `data/images/img_degraded/` structure.
- `build_local_map.py`: Scans the `data/images/gt_img/` and `data/images/img_degraded/` directories to generate the crucial `data/degradation_map.json` file. It extracts image IDs, finds corresponding annotations, converts ground truth masks to RLE format, locates all degraded versions, and compiles the map. This must be run after `code_degradation.py` or after manually placing the image/annotation files.
- `data_utils.py`: Provides utilities for validating the `degradation_map.json` and visualizing the data samples (original images, degraded versions, and ground truth masks). Useful for verifying the data preparation steps.
Refer to the data/README.md file for detailed instructions on how to prepare your own dataset using these scripts.
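If you want a quick sanity check that a generated `data/degradation_map.json` has the shape the pipeline expects (see the Data Format section below), something along these lines works; this snippet is illustrative and is not one of the shipped scripts:

```python
import json

# Illustrative check of the data map; field names follow the "Data Format" section below.
with open("data/degradation_map.json") as f:
    data_map = json.load(f)

for image_id, entry in data_map.items():
    rle = entry["ground_truth_rle"]
    assert "size" in rle and "counts" in rle, f"{image_id}: malformed RLE"
    for version_key, version in entry["versions"].items():
        for field in ("filepath", "level", "degradation_type"):
            assert field in version, f"{image_id}/{version_key}: missing '{field}'"

print(f"OK: validated {len(data_map)} images")
```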
```
SAM2_analysis/
├── config/                   # Pipeline configuration files
│   └── sam2_eval_config.json # Example config for SAM2 evaluation
├── data/                     # Input data (images, annotations)
│   ├── data_scripts/         # Scripts to prepare data (e.g., build_local_map.py)
│   ├── images/               # Base image files
│   │   ├── gt_img/           # Original images and annotations
│   │   └── img_degraded/     # Degraded image versions (subdirs per type)
│   └── degradation_map.json  # Generated JSON map for pipeline input
├── external/                 # Manually installed external libraries (e.g., sam2)
│   └── sam2/                 # Cloned sam2 repository
├── output/                   # Pipeline results (CSV files)
│   └── README.md             # Explanation of output files
├── .gitignore                # Specifies intentionally untracked files
├── main.py                   # Main script to run pipelines
├── metrics.py                # Evaluation metric calculations (mIoU, BF1)
├── pipeline_utils.py         # Utility functions for pipelines (loading, prediction)
├── README.md                 # This file
├── requirements.txt          # Python package dependencies
├── sam2_eval_pipeline.py     # Core SAM2 evaluation pipeline logic
└── venv/                     # Python virtual environment (if created)
```
Pipeline behavior is controlled via JSON files in config/.
Example (config/sam2_eval_config.json):
```json
{
"pipeline_name": "sam2_eval",
"description": "Evaluate SAM2 auto-mask generator on data map",
"data_path": "data/degradation_map.json",
"image_base_dir": "data",
"model_hf_id": "facebook/sam2-hiera-tiny",
"generator_config": {
"points_per_side": 32,
"pred_iou_thresh": 0.88,
"stability_score_thresh": 0.95,
"crop_n_layers": 0,
"min_mask_region_area": 100
},
"iou_threshold": 0.5,
"bf1_tolerance": 2,
"output_dir": "output",
"results_filename_prefix": "results_"
}
```

Key Parameters:
- `pipeline_name` (Required): Must match a key in `PIPELINE_MAP` in `main.py` (e.g., `"sam2_eval"`).
- `data_path`: Path to the JSON file mapping image IDs to GT RLE masks and image file versions.
- `image_base_dir`: The root directory from which file paths inside `data_path` are relative.
- `model_hf_id`: Hugging Face identifier for the SAM2 model variant.
- `generator_config`: Dictionary passed to `SamAutomaticMaskGenerator`. See SAM2 docs for options; a sketch of how these keys are used follows this list.
- `iou_threshold`: IoU threshold used internally for matching predicted masks to the single ground truth mask during evaluation.
- `bf1_tolerance`: Pixel tolerance for the Boundary F1 metric.
- `output_dir`: Directory where the results CSV will be saved.
- `results_filename_prefix`: The output CSV will be named `<prefix><timestamp>.csv`.
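As a rough illustration of how `model_hf_id` and `generator_config` fit together (the real wiring lives in `pipeline_utils.py` / `sam2_eval_pipeline.py`), the `generator_config` keys are forwarded as keyword arguments to the automatic mask generator. The class and `from_pretrained` helper below come from the `sam2` package as I understand it; verify the import path and signature against your installed version.

```python
import json

import numpy as np
from PIL import Image
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator  # verify against your sam2 version

with open("config/sam2_eval_config.json") as f:
    cfg = json.load(f)

# points_per_side, pred_iou_thresh, etc. become keyword arguments of the generator.
generator = SAM2AutomaticMaskGenerator.from_pretrained(
    cfg["model_hf_id"], **cfg["generator_config"]
)

# Hypothetical image path; the pipeline reads real paths from the data map instead.
image = np.array(Image.open("data/images/gt_img/example.jpg").convert("RGB"))
masks = generator.generate(image)  # list of dicts with "segmentation", "predicted_iou", ...
```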
The pipeline utilizes the sam2 library's built-in loading mechanisms (SAM2ImagePredictor.from_pretrained). You configure the model source using the model_hf_id parameter in your configuration file. Two primary methods are supported:
Specify the Hugging Face Hub identifier for the desired SAM2 model. The sam2 library will handle downloading the necessary files (including .pt checkpoints) and loading the model. This is the recommended approach for models hosted on the Hub, like the default.
"model_hf_id": "facebook/sam2-hiera-tiny"This works out of the box if you have internet access. For private models or to potentially avoid rate limiting, ensure you are authenticated with Hugging Face (refer to Prerequisites).
If you have downloaded the model checkpoint (.pt file) or the entire model directory structure expected by the sam2 library, you can provide the absolute or relative path to either the directory or the specific .pt file.
Pointing to a directory (if supported by the sam2 library's loader):
"model_hf_id": "/path/to/local/sam2-hiera-tiny-directory/"(Note: The exact directory structure required depends on the sam2 library's implementation.)
Pointing directly to the checkpoint file:
"model_hf_id": "/path/to/downloaded/sam2_hiera_tiny.pt"This option is useful for offline use or when using custom-trained checkpoints. Ensure the path is correct and accessible from where you run the script.
The pipeline expects a JSON file (specified by data_path) mapping unique image identifiers to their ground truth mask and different image versions.
```json
{
"<image_id_1>": {
"ground_truth_rle": {
"size": [H, W],
"counts": "..."
},
"versions": {
"original": {"filepath": "images/<id_1>.jpg", "level": 0, "degradation_type": "original"},
"gaussian_blur_5": {"filepath": "pic_degraded/gaussian_blur/<id_1>_gaussian_blur_5.jpg", "level": 5, "degradation_type": "gaussian_blur"},
"jpeg_compression_80":{"filepath": "pic_degraded/jpeg_compression/<id_1>_jpeg_compression_80.jpg", "level": 80, "degradation_type": "jpeg_compression"}
}
},
"<image_id_2>": {
}
}
```

- `ground_truth_rle`: Contains the ground truth mask encoded in COCO RLE format. Crucially, this pipeline currently assumes only a single GT object mask per image (see the decoding sketch after this list).
- `versions`: A dictionary where keys are unique identifiers for each version (e.g., `"original"`, `"gaussian_blur_5"`) and values are dictionaries containing:
  - `filepath`: Path to the image file relative to the `image_base_dir` defined in the main config.
  - `level`: A numerical level associated with the degradation/version (e.g., blur radius, compression quality).
  - `degradation_type`: A string describing the version type.
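To inspect a ground-truth mask manually, the RLE entries can be decoded with `pycocotools` (assumed to be available in the environment, since it is the standard library for COCO-format masks); this is an illustrative snippet rather than project code:

```python
import json

from pycocotools import mask as mask_utils

with open("data/degradation_map.json") as f:
    data_map = json.load(f)

entry = next(iter(data_map.values()))
rle = dict(entry["ground_truth_rle"])
if isinstance(rle["counts"], str):
    rle["counts"] = rle["counts"].encode("utf-8")  # pycocotools expects compressed counts as bytes

gt_mask = mask_utils.decode([rle])[:, :, 0]  # H x W array of 0/1
print(gt_mask.shape, int(gt_mask.sum()), "foreground pixels")
```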
The pipeline generates a CSV file in the specified output_dir. The filename includes the results_filename_prefix and a timestamp. It contains evaluation metrics for each image version processed:
- `image_id`: Identifier for the base image.
- `version_key`: Identifier for the specific image version evaluated.
- `level`: Numerical level associated with the version.
- `relative_filepath`: Path to the image file used.
- `num_pred_masks`: Number of masks generated by the model for this image.
- `iou`: Calculated IoU between the best-matching predicted mask and the ground truth.
- `bf1`: Calculated Boundary F1 score for the best-matching pair.
- `sam2_score`: The `predicted_iou` score assigned by SAM2 to the chosen best mask.
- `error`: Any error message encountered during processing for this specific version.
- `status`: Indicates the outcome (e.g., "Success", "Image File Not Found", "No Valid Match").

A short example of loading and summarizing such a results file follows.
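Since the output filename is timestamped, a quick way to inspect the latest run is to glob for it and aggregate the documented columns; this snippet assumes the default `results_` prefix from the example config and is illustrative, not part of the repository:

```python
import glob

import pandas as pd

# Pick the most recent results file produced by the pipeline.
latest = sorted(glob.glob("output/results_*.csv"))[-1]
df = pd.read_csv(latest)

# Mean IoU / BF1 per degradation level, keeping only successful rows.
ok = df[df["status"] == "Success"]
print(ok.groupby("level")[["iou", "bf1"]].mean())
```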
This repository ships with light-weight self-tests embedded directly in a few modules plus a traditional pytest suite. These are meant to give immediate feedback that all components are wired correctly after an install or a code change.
| What to run | What it covers |
|---|---|
| `python metrics.py` | Numerical correctness of mIoU and BF1 |
| `python pipeline_utils.py` | COCO-RLE decode round-trip, JSON data-map loader |
| `python sam2_eval_pipeline.py` | End-to-end pipeline smoke test (model stubbed) |
| `pytest` (from repo root) | Full unit-test suite in `tests/` |
The self-tests execute in <2 s each and require no model download (they monkey-patch heavy functions). Run them whenever you tweak core logic or before opening a Pull Request.
Example:
```bash
# inside your activated venv
python metrics.py
python pipeline_utils.py
python sam2_eval_pipeline.py
pytest            # optional, slower
```

A CI pipeline can run the same commands to guard against regressions.
To add a new evaluation pipeline:
- Create a new Python script (e.g., `my_new_pipeline.py`) containing the main pipeline logic in a function (e.g., `run_my_pipeline(config)`).
- Import your new function in `main.py`.
- Add an entry to the `PIPELINE_MAP` dictionary in `main.py`, mapping a unique string key (e.g., `"my_new_eval"`) to your function (`run_my_pipeline`); a schematic of this dispatch follows the list.
- Create a new configuration file in `config/` specifying `"pipeline_name": "my_new_eval"` and any parameters your new pipeline requires.
- Run using `python main.py --config config/my_new_config.json`.
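Schematically, the dispatch in `main.py` described by these steps looks roughly like the following; treat the module and function names other than those listed above as illustrative assumptions rather than the file's exact contents.

```python
# main.py (schematic sketch, not the actual file)
import argparse
import json

from sam2_eval_pipeline import run_sam2_eval   # existing pipeline entry point (name assumed)
from my_new_pipeline import run_my_pipeline    # your new pipeline

PIPELINE_MAP = {
    "sam2_eval": run_sam2_eval,
    "my_new_eval": run_my_pipeline,            # new entry
}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    args = parser.parse_args()

    with open(args.config) as f:
        config = json.load(f)

    # "pipeline_name" selects which pipeline function receives the config.
    PIPELINE_MAP[config["pipeline_name"]](config)
```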
When the path is prefixed with `"pytorch:"`, the pipeline will (see the sketch after this list):
- Instantiate the SAM2 model architecture from the library code
- Load weights directly from the specified .pt file using PyTorch's loading mechanisms
- Configure any necessary processor components for image preprocessing
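As a rough sketch of that flow, assuming the `sam2` package's `build_sam2` helper and a matching Hydra config name (both should be verified against the sam2 repository for your checkpoint variant), stripping the `pytorch:` prefix and loading the local weights might look like this:

```python
import torch
from sam2.build_sam import build_sam2                    # helper name/signature to verify
from sam2.sam2_image_predictor import SAM2ImagePredictor

model_path = "pytorch:/path/to/downloaded/sam2_hiera_tiny.pt"  # hypothetical config value

if model_path.startswith("pytorch:"):
    ckpt_path = model_path[len("pytorch:"):]
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # The config file must match the checkpoint variant; check the sam2 repo for exact names.
    sam2_model = build_sam2("configs/sam2/sam2_hiera_t.yaml", ckpt_path, device=device)
    predictor = SAM2ImagePredictor(sam2_model)            # handles image preprocessing/prediction
```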