vision-model

Here are 33 public repositories matching this topic...

sh4den / Montscan

🖨️ Automated scanner document processor with AI-powered naming and WebDAV integration. Receives scans via FTP, extracts text using Vision AI, generates intelligent filenames with Ollama AI, and uploads to your cloud storage.

go golang printer ai scanner nextcloud webdav ftp-server vision-model vision-language-model ollama

Updated Jul 19, 2026
Go

eren23 / openflipbook

Open-source flipbook.page clone — every page is an AI-generated illustration, click anywhere to explore deeper. Next.js + FastAPI + Modal. BYO keys.

typescript nextjs modal self-hosted flipbook infinite-canvas gemini-api fastapi vision-model generative-ai ai-image-generation fal-ai openrouter ltx-video nano-banana byo-keys

Updated Jul 21, 2026
Python

guaardvark / guaardvark

The self-hosted AI workstation. Autonomous screen agents, 3-tier neural routing, parallel agent swarms, video generation, 4K/8K upscaling, RAG, voice interface, 70+ tool execution engine — all running locally on your hardware.

Updated Jul 16, 2026
Python

SpaceinvaderOne / a-eye

Self-hosted AI photo intelligence tool. Uses local vision models via Ollama to describe, tag, rename, and search your photos. No cloud needed.

selfhosted unraid photo-management vision-model local-ai photo-renaming

Updated Apr 5, 2026
Python

Varun-Patkar / ChromePilot

AI-powered browser automation agent using a dual-LLM architecture. The orchestrator (qwen3-vl-32k) creates execution plans from screenshots, while the executor (llama3.1:8b) translates steps into browser actions using an accessibility tree for reliable element selection. Local, private, powered by Ollama.

javascript chrome-extension markdown streaming web-scraping html-parsing browser-extension web-automation conversational-ai privacy-focused vision-model ai-assistant local-ai ollama qwen3-vl

Updated Dec 13, 2025
JavaScript

i-evi / evMLP

evMLP: An Efficient Event-Driven MLP Architecture for Vision

computer-vision backbone mlp-classifier mlp-networks vision-model

Updated Nov 25, 2025
Python

vonhex / a-eye

This is a fork of SpaceInvaderOne's repo that fixes some issues I had with the software until he pulls the changes or fixes them himself. It also allows integration with my gallery app Eyeris. github.com/vonhex/eyeris

fork selfhosted unraid photo-management vision-model local-ai photo-renaming

Updated May 20, 2026
Python

yorha2b-lab / auto-crud-copilot

AI-powered, fully automatic CRUD code generator for front ends (React + Antd), built on large vision models 🚀

react cli crud ai code-generator antd umi copilot low-code vision-model

Updated Jul 27, 2026
JavaScript

fxd20060117 / aidanmu

AI desktop livestream danmaku companion: lets an AI watch your screen and post danmaku quips like a livestream viewer.

python windows ai livestream danmaku desktop-widget vtuber vision-model pyside6 llm

Updated Jul 27, 2026
Python

JordanmFrancis / biotracker

iOS app that ingests bloodwork photos from multiple providers and uses AI to extract and trend lab values over time.

swift ios ai health swiftui vision-model

Updated Apr 8, 2026
Swift

tommyguolin / image-shrink

Compress images before AI vision analysis — save ~80% token costs. Works with Claude Code, Cursor, Codex, and any AI coding agent.

skill cursor image-compression codex vision-model ai-agent llm claude-code token-optimization

Updated May 10, 2026
Python

IRedDragonICY / resonote

Next-gen AI Optical Music Recognition (OMR) platform. Convert sheet music images into playable ABC notation instantly using Google Gemini 3 Pro Vision. Built with React 19, TypeScript, and Tailwind.

react typescript artificial-intelligence omr sheet-music abc-notation abcjs optical-music-recognition gemini-api vision-model generative-ai google-gemini

Updated Apr 21, 2026
TypeScript

Fagan1024 / smart-video-editor

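An AI editing skill that actually watches your footage: it first uses a vision model to understand what each clip shows and which seconds are usable, then decides what to keep, the order, pacing, color grading, and music.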

ffmpeg video-editing short-video xiaohongshu vision-model ai-agent agent-skills claude-skill

Updated Jul 26, 2026
Python

yazanTah / tiktok-slideshow-analyzer

Find a TikTok user's slideshow posts and OCR the caption text off every slide with an AI vision model.

python ocr computer-vision content-analytics tiktok tiktok-scraper tiktok-api vision-model yt-dlp ai-agent llm agents-md

Updated Jul 26, 2026
Python

connerkward / lookdev-auto-skill

lookdev-auto — a Claude Code skill for automated visual tuning: a vision model rates rendered variants in a loop.

design automation lookdev vision-model ai-tools anthropic agent-skills claude-code claude-code-plugin claude-skill claude-code-skill

Updated Jun 17, 2026

ktsu-dev / ImageDescriber

csharp dotnet image-captioning image-search image-hashing cli-tool bulk-processing vision-model llm local-ai ollama llama-vision image-describer

Updated Jul 27, 2026
C#

h1ddenpr0cess20 / ultravision

Fast, resumable batch image captioning for local vision models (LM Studio, Ollama, any OpenAI-compatible) — CLI + FastAPI web studio, multi-format output

python cli ai vision image-captioning batch-processing multimodal fastapi vision-model llm local-llm ollama lm-studio openai-compatible qwen3-vl

Updated Jul 10, 2026
Python

Yvesnihaohaode / fallback-vision

AI Gateway with visual fallback routing, hybrid multi-backend search, and protocol translation for Claude Code & Codex

multi-model mimo vision-model protocol-translation hybrid-search ai-gateway deepseek openai-compatible claude-code visual-fallback

Updated Jun 15, 2026
TypeScript

Rakshath66 / ClipFindr

🔍 A CLIP-powered image similarity finder built with Streamlit — upload a query image and find the most visually similar matches from a gallery using deep visual embeddings.

Updated Jul 27, 2025
Python

mishafyi / hot-dog-or-not

Compare how vision models reason about images — not just their accuracy scores

python machine-learning typescript ai computer-vision nextjs fastapi vision-model openrouter llm-benchmark nvidia-nemotron

Updated Mar 20, 2026
TypeScript
