InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally.
Top 23 Python Machine Learning Projects
-
transformers
Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
HuggingFace Transformers - Library for building custom detectors
-
> A running number also carries data. Before you know it, someone's relying on the ordering or counting on there not being gaps - or counting the gaps to figure out something they shouldn't.
For example, if https://github.com/pytorch/pytorch/issues/111111 can be seen but https://github.com/pytorch/pytorch/issues/111110 can't, someone might infer the existence of a hidden issue relating to a critical security problem.
Whereas if the URL was instead https://github.com/pytorch/pytorch/issues/761500e0-0070-4c0d... that risk would be avoided.
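The inference risk described in this comment can be sketched in a few lines. This is an illustration only: the `visible` list and both helper names are hypothetical, and it does not reflect how GitHub actually assigns or hides issue numbers. It shows that gaps in sequential IDs reveal hidden records, whereas random version 4 UUIDs from Python's standard `uuid` module carry no ordering or count.

```python
import uuid


def hidden_sequential_ids(visible_ids):
    """Return the IDs an outsider can infer exist but cannot see."""
    lo, hi = min(visible_ids), max(visible_ids)
    return sorted(set(range(lo, hi + 1)) - set(visible_ids))


def new_opaque_id():
    """Random (version 4) UUID: no ordering and no gaps to count."""
    return str(uuid.uuid4())


# Hypothetical scenario: these issue numbers load, but 111110 does not.
visible = [111109, 111111, 111112]
print(hidden_sequential_ids(visible))  # → [111110]

# With random identifiers there is no "missing neighbor" to reason about.
opaque_ids = [new_opaque_id() for _ in range(3)]
print(opaque_ids)
```

The trade-off is that opaque identifiers are harder to read aloud, sort, or remember, which is presumably why sequential numbers remain common for public issue trackers.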
-
nn
60+ Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...
-
Project mention: Data Analyst Guide: Mastering Random Forest vs XGBoost: Which Wins for Analytics? | dev.to | 2026-01-05
scikit-learn documentation: https://scikit-learn.org/
-
Keras 3 multi-backend
-
Project mention: Teaching AI to Read Emotions: Science, Challenges, and Innovation Behind Facial Emotion Detection with YOLOv11 on Raspberry Pi | dev.to | 2025-11-23
Ultralytics YOLO Documentation
-
OpenBB-finance / OpenBB
-
Project mention: Show HN: Real-time privacy protection for smart glasses | news.ycombinator.com | 2025-08-11
Did you look at egoblur? It's a lot more effective at face detection than https://github.com/ageitgey/face_recognition. Granted, you'd have to do your own face matching to make exceptions.
-
-
Project mention: Why DETRs are replacing YOLOs for real-time object detection | news.ycombinator.com | 2025-11-22
> The YOLO series is developed and maintained by Ultralytics. All YOLO code and weights are released under the AGPL-3.0 license.
The original author of YOLO and the Darknet framework [1] issued the code under pretty much every license you wish to use [2]. My preferred fork by AlexeyAB is under an equally permissive license [3].
Ultralytics then created their own model under the AGPL-3.0 license [4], which probably would never stand up in a court as they have the model from the likes of YOLOv3 in their source [5].
This entire article is flawed anyway, because they don't state which YOLOv11 model they are using or compare the accuracy. They appear to have just taken the pre-trained models and assumed it's apples-to-apples. They could have at least compared YOLO11n/s/m/l/x,
[1] https://pjreddie.com/darknet/yolo/
[2] https://github.com/pjreddie/darknet
[3] https://github.com/AlexeyAB/darknet
[4] https://github.com/ultralytics/ultralytics
[5] https://github.com/ultralytics/ultralytics/tree/main/ultraly...
-
Project mention: Top Open-Source Data Engineering Tools- Unravelling the Best in 2026 | dev.to | 2025-12-10
Airflow
-
Javelit brings the power of rapid prototyping and interactive web app development to the Java ecosystem, much like Streamlit does for Python. With its simple, loop-based programming model, developers can quickly build data-driven applications without needing extensive frontend knowledge, leveraging familiar Java syntax and the rich JVM ecosystem. The live-reload feature enables instant experimentation and iteration, making it ideal for prototyping AI agents, data visualizations, and interactive tools. By integrating seamlessly with libraries like LangGraph4j combined with both Spring AI and LangChain4j, Javelit empowers Java developers to create engaging user interfaces effortlessly, bridging the gap between backend logic and user-facing applications. Check out the project, try it, and let me know your feedback and ... happy coding!
-
gradio
Build and share delightful machine learning apps, all in Python. Star to support our work!
Project mention: The Ultimate Guide to Building Stunning AI Apps For Beginners - Gradio | dev.to | 2025-11-14
Why Gradio is the New Superpower for Every AI Learner in 2025
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Project mention: Top Open-Source Data Engineering Tools- Unravelling the Best in 2026 | dev.to | 2025-12-10
Ray
-
Project mention: MindsDB Supercharges Google's MCP Toolbox with Unstructured Data Support | dev.to | 2025-12-29
We're happy to announce that we've integrated MindsDB with Google's open-source project, MCP (Model Context Protocol) Toolbox. This will make your AI applications very, very smart. This enhancement expands the Toolbox's reach, especially for organizations grappling with lots of siloed data.
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
-
-
Project mention: Show HN: Plug-and-play Python utils for any computer-vision pipeline | news.ycombinator.com | 2025-07-21
-
paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
Borg Backup - I use it to automatically back up my main hosted Docker services. I have publicly hosted instances of Immich and Paperless-NGX running in Docker containers. I periodically make a backup of their data folders using Borg and store it in a Borg repo. The advantage of storing the backups in a Borg repo is that Borg is a deduplicating archival program. So no matter how many backups you make, they will not take any more space than the first backup, provided nothing has changed. If there is a change, only the changed chunk is backed up, just like git. Also, you can easily encrypt and/or compress while backing up. Restoring a backup is also as easy as running a single Borg command.
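The deduplication behavior described in this comment can be illustrated with a toy content-addressed store. This is a simplified sketch, not Borg's implementation: Borg uses content-defined (variable-size) chunking and adds encryption and compression, whereas this example assumes tiny fixed-size chunks. The `ChunkStore` class and `CHUNK_SIZE` constant are hypothetical names chosen for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example is easy to follow


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes

    def backup(self, data: bytes) -> list:
        """Store data and return its list of chunk digests (the 'archive')."""
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        return digests

    def restore(self, digests) -> bytes:
        """Rebuild the original bytes from an archive's digest list."""
        return b"".join(self.chunks[d] for d in digests)


store = ChunkStore()
first = store.backup(b"AAAABBBBCCCC")
after_first = len(store.chunks)         # 3 unique chunks stored
second = store.backup(b"AAAABBBBCCCC")  # unchanged data
after_second = len(store.chunks)        # still 3: nothing new to store
third = store.backup(b"AAAAXXXXCCCC")   # one chunk changed
after_third = len(store.chunks)         # 4: only the changed chunk is added
print(after_first, after_second, after_third)  # → 3 3 4
```

Each archive is only a list of digests, so every earlier backup stays restorable even though shared chunks are stored once.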
-
qlib
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate the R&D process.
After researching different AI models in Qlib (a quantitative finance platform), here's what I learned:
-
Project mention: Solved: Is there a better way to test subject lines besides random A/B tools? | dev.to | 2025-12-29
Open-Source NLP Libraries: Python libraries like spaCy, NLTK, and Hugging Face Transformers for building custom models.
-
These methods improve efficiency, reduce hallucination, and enhance autonomy. Frameworks such as LangChain and DSPy could integrate many of these strategies, proving their practical value.
-
Python Machine Learning discussion
Python Machine Learning related posts
-
How to Evaluate Your Text-to-SQL Agent in Cortex Analyst Using TruLens
-
Data Analyst Guide: Mastering Random Forest vs XGBoost: Which Wins for Analytics?
-
SynthTS โ Open-source CLI for generating privacy-safe synthetic time-series
-
AWS Sagemaker Notebook Jobs for Accelerating Data Science Experimentation Workflows with Mlflow and Optuna
-
2026-01-04 - Daily Intelligence Recap - Top 5 Signals
-
Build a Deep Learning Library
-
MindsDB Supercharges Google's MCP Toolbox with Unstructured Data Support
-
A note from our sponsor - InfluxDB
www.influxdata.com | 5 Jan 2026
Index
What are some of the best open-source Machine Learning projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | transformers | 154,507 |
| 2 | Pytorch | 96,237 |
| 3 | nn | 65,107 |
| 4 | scikit-learn | 64,474 |
| 5 | Keras | 63,678 |
| 6 | yolov5 | 56,518 |
| 7 | OpenBB | 56,002 |
| 8 | Face Recognition | 55,756 |
| 9 | faceswap | 54,846 |
| 10 | ultralytics | 50,555 |
| 11 | Airflow | 43,710 |
| 12 | streamlit | 42,959 |
| 13 | gradio | 41,151 |
| 14 | DeepSpeed | 41,145 |
| 15 | Ray | 40,583 |
| 16 | MindsDB | 38,177 |
| 17 | Open-Assistant | 37,492 |
| 18 | gym | 36,649 |
| 19 | supervision | 36,247 |
| 20 | paperless-ngx | 35,263 |
| 21 | qlib | 35,136 |
| 22 | spaCy | 33,030 |
| 23 | dspy | 31,171 |