unified-memory

Here are 25 public repositories matching this topic...

hogeheer499-commits / strix-halo-guide

Strix Halo local LLM guide: 63-97 t/s direct MoE on Ryzen AI MAX+ 395 / 128GB unified memory. Setup, model choices, benchmarks, and raw evidence.

Updated May 7, 2026
Python

real-space / tfQMRgpu

A CUDA implementation of the transpose-free Quasi-Minimal Residual method

c library cplusplus fortran gpu cuda complex-numbers gpu-computing sparse-linear-systems multiprecision sparse-matrix linear-solvers sparse-linear-solver iterative-algorithms block-sparsity sparse- unified-memory tfqmr

Updated Sep 2, 2025
C++

parallelArchitect / sparkview

Operator-grade GPU monitor for NVIDIA GPUs with native GB10 / DGX Spark coherent UMA support — PSI pressure, clock detection, ConnectX-7 network layer

python monitoring gpu cuda tui nvidia psi unified-memory gb10 dgx-spark

Updated May 3, 2026
Python

hamtun24 / openuma

Unified Memory Abstraction Layer for AI Inference on AMD APUs and Intel iGPUs

rust machine-learning inference unified-memory llm

Updated Apr 3, 2026
Rust

shumbul / Accelerated-Computing

Fundamentals of Accelerated Computing C/C++ is a course provided by NVIDIA.

cuda nvidia high-performance-computing accelerated-computing unified-memory

Updated Oct 9, 2020
Cuda

raspoli / mlx-serve

Local inference server for Apple Silicon — hot-swaps MLX models (LLM, vision, embeddings, TTS, STT) via OpenAI API

python macos machine-learning text-to-speech embeddings speech-to-text inference-server mlx model-serving fastapi unified-memory apple-silicon openai-api llm local-inference vision-language-model local-llm mlx-lm openai-compatible

Updated Mar 31, 2026
Python

sadopc / unified-db-2

Apple Silicon Unified Memory for GPU-Accelerated Analytics — TPC-H benchmarks across DuckDB, NumPy, and MLX

benchmark numpy gpu-computing mlx tpc-h unified-memory duckdb apple-silicon apple-m4 gpu-analytics

Updated Feb 18, 2026
Python

lintenn / cudaAddVectors-explicit-vs-unified-memory

Performance comparison of two different forms of memory management in CUDA

c performance memory cuda memory-management explicit unified-memory

Updated Oct 3, 2021
Cuda

CINOAdam / nvml-unified-shim

NVML unified memory shim for NVIDIA DGX Spark Grace Blackwell GB10 - enables MAX Engine, PyTorch, and GPU monitoring

machine-learning tensorflow gpu cuda pytorch nvidia nvml arm64 unified-memory max-engine dgx-spark grace-blackwell

Updated Jan 28, 2026
C

ChrisJR035 / Talos-O-Architecture

Talos-O (Omni): A sovereign, embodied agentic organism forged on AMD Strix Halo. Integrating the Chimera Kernel (Linux 7.0), Zero-Copy Introspection, and the Phronesis Engine. Built from First Principles.

zero-copy linux-kernel first-principles linux-kernel-hacking unified-memory embodied-ai unified-memory-parallelism ryzen-ai sovereign-ai strix-halo ryzen-ai-max first-principles-ai rocm-6-2 neo-techne phronesis

Updated Apr 21, 2026
Python

parallelArchitect / cuda-unified-memory-analyzer

NVIDIA GPU Unified Memory diagnostic tool: architecture-aware, measurement-based, PCIe/coherent transport detection

Updated May 4, 2026
Cuda

parallelArchitect / cupti-activity-collector

GB10-aware CUPTI Activity collector — runtime kind detection, phase management, and JSON output for hardware-coherent UMA platforms

cuda nvidia profiling cupti aarch64-linux blackwell unified-memory hardware-coherent-uma

Updated May 3, 2026
Cuda

parallelArchitect / pascal-um-benchmark

Reproducible Pascal GPU Unified Memory benchmark with Nsight and nvprof profiling

benchmark-suite memory-bandwidth page-faults nvprof unified-memory nsight-systems cuda-pascal-unified-memory-gpu nsight-nvprof-pcie-bandwidth cudamallocmanaged cudamemprefetchasync cudapascal

Updated Feb 1, 2026
Python

cloudlinqed / WayInfer

Run LLMs larger than your RAM — native GGUF inference engine with SSD streaming, no GPU required

ai runtime unified-memory ai-inference llm-inference wayos

Updated Apr 2, 2026
C

parallelArchitect / gb10-kernel-probe

Empirical kernel scheduling characterization for NVIDIA GB10 (SM121a). Sweeps GEMM tile configurations, classifies PTX instruction paths, captures hardware telemetry

benchmark gpu cuda nvidia empirical performance-analysis profiling cutlass gemm ptx black-box-testing unified-memory kernel-scheduling nvidia-tools gb10 dgx-spark sm121

Updated May 9, 2026
C++

atakehiro / 3D-U-Net-TFLMS-Keras

3D U-Net with tf.keras for Large-Model-Support or Unified Memory

3d-unet tf-keras large-network unified-memory tf-lms

Updated Aug 9, 2020
Python

sl-badcoder / UVM_benchmark_Extended

Extends the UVM Benchmark to support huge data workloads (16 GiB and more). This required making it overflow-safe and adding dataset creation logic for some applications.

cnn kmeans knn bfs-algorithm unified-memory bayessian-networks

Updated Feb 12, 2026
Roff

parallelArchitect / nvidia-uma-fault-probe

Cycle-accurate UMA fault latency and bandwidth measurement for NVIDIA GPUs. C and PTX. No Python. Pascal (SM 6.0) through Blackwell GB10 (SM 12.1).

pascal cuda nvidia bandwidth aarch64 rtx uma ptx gpu-performance memory-profiling cuda-c unified-memory gb10 gpu-diagnostics dgx-spark rrace-blackwell gpu-proformance fault-latency

Updated Apr 28, 2026
Cuda

parallelArchitect / nvidia-gpu-val

NVIDIA GPU validation: PCIe transport, Unified Memory prefetch, SGEMM compute, drift detection.

Updated Feb 25, 2026
Python

parallelArchitect / gb10-uma-diagnostics

GB10 unified memory diagnostic suite — bandwidth, contention, atomic coherence, CUPTI activity, power and thermal correlation

benchmark gpu cuda nvidia diagnostics cupti unified-memory dgx-spark

Updated May 3, 2026
Cuda
