Tracked Repositories

133 open-source AI inference repositories across 61 organizations.

HuggingFace

4 repos·226.3k·100 commits this week

HuggingFace Transformers — state-of-the-art NLP/ML model library

161.3k
67

HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)

33.8k
27

Minimalist Rust ML framework for inference — targets browser WASM and GPU, zero Python dependency

20.4k
6

HuggingFace TGI — LLM serving (archived March 2026, read-only)

10.9k

TensorFlow

3 repos·204.7k·8 commits this week

Industry-standard deep learning framework with XLA compilation backend

195.4k

TensorFlow Serving — high-performance gRPC/REST serving for TF models (multi-version, canary, batching)

6.3k
2

TensorFlow Lite for microcontrollers and embedded devices

2.9k
6

Ollama

1 repo·173.1k·33 commits this week

User-friendly local LLM runner built on llama.cpp

173.1k
33

ggml-org

2 repos·165.0k·176 commits this week

High-performance LLM inference in C/C++ (CPU + GPU)

114.5k
125

High-performance Whisper speech recognition in C/C++

50.4k
51

Open WebUI

1 repo·139.9k·120 commits this week

Self-hosted ChatGPT alternative with built-in RAG, offline-capable

139.9k
120

Meta / PyTorch

3 repos·109.4k·485 commits this week

Primary ML framework; torch.compile + AOTInductor for production inference optimization

100.4k
392

PyTorch's portable execution framework for on-device inference

4.7k
93

TorchServe — production PyTorch model serving (archived August 2025)

4.4k

DeepSeek AI

1 repo·103.7k

Reference inference code for DeepSeek-V3 (671B MoE); includes FP8 training framework

103.7k

vLLM Project

2 repos·81.9k·258 commits this week

Most widely adopted open-source LLM serving engine; PagedAttention, continuous batching

81.9k
250

vLLM community plugin for Intel Gaudi accelerators

40
8

Google AI Edge

16 repos·78.6k·230 commits this week

Cross-platform ML pipeline framework (vision, audio, NLP)

35.5k
6

AI Edge model gallery

23.5k
8

LiteRT for language model inference

5.4k
43

Official Gemma model cookbook — recipes, fine-tuning, deployment guides

3.6k
9

Sample apps using MediaPipe

2.7k

Google's Lite Runtime (successor to TensorFlow Lite)

2.5k
99

Highly optimized neural network operators library (ARM, x86, WASM)

2.4k
42

Model visualization and exploration tool

1.5k

LiteRT integration with PyTorch

1.0k
1

Sample code for LiteRT

334
13

Quantization tooling for AI Edge models

143
2

Web examples for MediaPipe Task APIs

38

Sample models for AI Edge

24

CLI for LiteRT conversion, quantization, compilation, management, running, benchmarking, and visualization workflows

20
5

Curated resources for Google AI Edge software

Evaluation tooling for Google AI Edge

4
2

Nomic AI

1 repo·77.4k

Desktop AI app + SDK for running LLMs locally

77.4k

Apple / ML-Explore

10 repos·58.4k·13 commits this week

Array framework for ML on Apple silicon (Python)

26.6k
7

Example models and applications using MLX

8.7k

Reverse-engineered Apple Neural Engine (ANE) — hardware ops, memory layout, firmware interactions

6.7k

LLM inference and fine-tuning with MLX

5.5k

Tools for converting & running models with Core ML

5.3k

Example apps using MLX Swift

2.6k

Swift bindings for MLX

1.9k

LLM inference in Swift via MLX

539
6

Efficient data loading for MLX

476

C bindings for MLX

209

BerriAI

1 repo·49.2k·62 commits this week

Unified OpenAI-compatible proxy for 100+ LLM providers (vLLM, Ollama, Bedrock, Azure, etc.)

49.2k
62

Oobabooga

1 repo·47.3k·2 commits this week

Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers)

47.3k
2

Mudler (LocalAI)

1 repo·46.7k·81 commits this week

Free, open-source OpenAI drop-in replacement — runs locally, no GPU required

46.7k
81

Exo Explore

1 repo·45.1k·5 commits this week

Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)

45.1k
5

Ray Project

1 repo·42.8k·90 commits this week

Distributed AI compute engine; Ray Serve handles online and async batch inference

42.8k
90

DeepSpeed AI

1 repo·42.5k·7 commits this week

Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)

42.5k
7

Microsoft / ONNX

2 repos·41.7k·81 commits this week

Open Neural Network Exchange format specification

20.9k
24

Microsoft's cross-platform, high-performance ONNX inference engine

20.7k
57

LM-Sys

1 repo·39.5k

LLM serving framework and home of Chatbot Arena

39.5k

JAX (Google DeepMind)

1 repo·35.8k·133 commits this week

Composable NumPy transformations (JIT, grad, vmap) compiled via XLA to GPUs and TPUs — primary DeepMind research/production runtime

35.8k
133

Miscellaneous

4 repos·32.1k·30 commits this week

MLC's universal LLM deployment engine (multi-backend)

22.8k

Tile-based ML language and compiler

6.4k
19

Community on-device LLM project

1.6k

vLLM-style inference on Apple silicon via MLX

1.3k
11

SGLang

1 repo·29.0k·331 commits this week

High-throughput LLM/VLM serving with RadixAttention and structured generation

29.0k
331

Tencent

2 repos·28.0k·5 commits this week

High-performance neural network inference for mobile (Android/iOS)

23.3k
5

Tencent Neural Network — mobile and edge inference

4.6k

NVIDIA

3 repos·27.3k·165 commits this week

NVIDIA's optimized LLM inference library (GPU)

13.8k
161

NVIDIA's high-performance deep learning inference SDK (GPU)

13.0k
2

C++ LLM/VLM inference runtime for Jetson and NVIDIA edge devices

422
2

Mozilla AI

1 repo·24.6k·5 commits this week

Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms

24.6k
5

Triton Language (OpenAI)

1 repo·19.4k·42 commits this week

Python-like GPU kernel language used by vLLM, FlashAttention, and PyTorch Inductor

19.4k
42

MLC AI

1 repo·18.1k

High-performance LLM inference in web browsers via WebGPU

18.1k

KVCache AI

1 repo·17.2k·2 commits this week

CPU-GPU hybrid inference; runs DeepSeek 671B on 14GB VRAM + 382GB DRAM with massive speedup over llama.cpp

17.2k
2

jundot

1 repo·15.8k·135 commits this week

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

15.8k
135

Alibaba

1 repo·15.4k·16 commits this week

Alibaba's neural network inference framework for mobile & edge

15.4k
16

Apache

2 repos·13.8k·37 commits this week

Apache TVM ML compiler — auto-tunes models for any hardware target

13.4k
33

Apache TVM Foreign Function Interface for deep learning compilation

402
4

Blaizzy (Community MLX)

5 repos·13.4k·50 commits this week

Audio models (TTS, ASR) with MLX

7.2k
14

Vision-language models on Apple silicon via MLX

4.9k
36

Swift audio inference using MLX

646

Text embedding models with MLX

392

Video model inference with MLX

238

K2 / Next-gen ASR

1 repo·12.7k·6 commits this week

ONNX-based runtime for ASR, TTS, VAD, and keyword spotting

12.7k
6

OpenVINO Toolkit / Intel

3 repos·12.0k·80 commits this week

Intel's toolkit for optimizing & deploying deep learning on Intel hardware

10.3k
68

Neural Network Compression Framework — quantization, pruning, sparsity for OpenVINO

1.2k
3

OpenVINO GenAI — generative AI layer with speculative decoding & KV-cache optimization

518
9

RunAnywhere

2 repos·11.9k

RunAnywhere SDKs for on-device inference deployment

10.3k

RunAnywhere CLI tool

1.5k

Intel

2 repos·11.5k·5 commits this week

Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)

8.8k

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity

2.7k
5

Mistral AI

1 repo·10.8k

Official minimal inference library for all Mistral models (7B, Mixtral, Pixtral)

10.8k

Triton Inference Server

1 repo·10.7k·4 commits this week

NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)

10.7k
4

Dusty-NV (NVIDIA Jetson)

1 repo·8.9k

DNN inference library & tutorials for NVIDIA Jetson

8.9k

BentoML

1 repo·8.7k·2 commits this week

Unified serving framework: real-time APIs, task queues, batching, multi-model chains

8.7k
2

Nexa AI

1 repo·8.1k

Unified SDK for running LLMs and multimodal models locally

8.1k

InternLM / Shanghai AI Lab

1 repo·7.9k·15 commits this week

High-throughput LLM serving with TurboMind engine (C++/CUDA)

7.9k
15

PaddlePaddle (Baidu)

1 repo·7.3k

Lightweight inference engine for mobile & embedded from PaddlePaddle

7.3k

AI Dynamo (NVIDIA)

1 repo·7.2k·128 commits this week

Datacenter-scale distributed inference serving framework (Rust + Python, disaggregated prefill/decode, engine-agnostic)

7.2k
128

ArgMax

4 repos·6.5k·1 commit this week

On-device Whisper inference for Apple platforms (Swift)

6.2k
1

Python tooling for WhisperKit model optimization

243

On-device AI benchmarking framework

89

Swift playground for ArgMax SDK

21

Cactus Compute

5 repos·5.6k·1 commit this week

Cactus core edge inference framework

5.3k
1

React Native bindings for Cactus

174

Flutter bindings for Cactus

71

Kotlin/Android bindings for Cactus

71

Demo chat app using Cactus

28

Osaurus

1 repo·5.6k·36 commits this week

Native macOS AI agent harness in Swift — any model, persistent memory, autonomous execution, MCP server, MLX + Apple Neural Engine, fully offline

5.6k
36

TurboDeRP (ExLlamaV2)

1 repo·4.5k

High-performance EXL2-quantized inference for consumer NVIDIA GPUs

4.5k

OpenNMT

1 repo·4.5k·2 commits this week

Fast C++ inference for Transformer models; INT8/INT16 CPU quantization, multi-platform

4.5k
2

OpenXLA

1 repo·4.3k·210 commits this week

Compiler for JAX, TensorFlow, and PyTorch targeting GPU, TPU, and CPU from a unified IR

4.3k
210

Luminal AI

1 repo·2.9k·1 commit this week

Rust-based deep learning compiler with a small static graph IR for fast, portable inference (CUDA, Metal, CPU)

2.9k
1

Liquid AI

5 repos·2.8k·8 commits this week

Examples, tutorials and apps for Liquid AI LFM + LEAP SDK

2.1k

Speech-to-Speech audio models by Liquid AI

528
5

Minimal fine-tuning repo for LFM2, fully open-source

174
1

Example apps for LeapSDK

66

Liquid AI documentation

25
2

Fluid Inference

3 repos·2.2k·14 commits this week

On-device audio inference framework

2.1k
13

Fluid Inference core runtime

68

Rust text processing library for inference

34
1

Try Mirai

2 repos·1.7k·47 commits this week

Mirai's on-device inference runtime

1.6k
38

Mirai's LLaMA-based on-device model

77
9

UbiquitousLearning

1 repo·1.5k

Multimodal LLM inference framework for mobile & edge

1.5k

Qualcomm

2 repos·1.5k·40 commits this week

State-of-the-art ML models optimized for Qualcomm Snapdragon NPU/DSP/QNN deployment

1.1k
34

Sample apps and tutorials for deploying models on Qualcomm hardware (TFLite, ONNX, QNN)

417
6

ARM Software

1 repo·1.3k

ARM Neural Network SDK for ARM & Mali devices

1.3k

AMD ROCm

4 repos·1.1k·145 commits this week

AI Tensor Engine for ROCm — centralized repo for high-perf AI operators on AMD Instinct GPUs

454
74

AMD's graph inference engine for MI-series GPUs

307
17

ROCm fork of FlashAttention with Composable Kernel (CK) and Triton backends

232

AiTer Optimized Model — lightweight vLLM-like server built on AITER kernels for ROCm

102
54

NimbleEdge

2 repos·535

NimbleEdge's deliteAI on-device inference framework

533

NimbleEdge fork of ExecuTorch with edge optimizations

2

Picovoice

1 repo·311

Picovoice's on-device LLM inference engine

311

Zetic AI

5 repos·65·1 commit this week

MLange sample applications

60

MLange extension library

4

iOS framework for MLange

1
1

MLange SDK documentation

0

iOS extension framework for MLange

0