Tracked Repositories

140 open-source AI inference repositories across 66 organizations.

HuggingFace

4 repos·226.8k·98 commits this week

HuggingFace Transformers — state-of-the-art NLP/ML model library

161.6k
70

HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)

33.9k
26

Minimalist Rust ML framework for inference — targets browser WASM and GPU, zero Python dependency

20.5k
2

HuggingFace TGI — LLM serving (archived March 2026, read-only)

10.9k

TensorFlow

3 repos·205.0k·249 commits this week

Industry-standard deep learning framework with XLA compilation backend

195.7k
220

TensorFlow Serving — high-performance gRPC/REST serving for TF models (multi-version, canary, batching)

6.4k
27

TensorFlow Lite for microcontrollers and embedded devices

3.0k
2

Ollama

1 repo·174.3k·11 commits this week

User-friendly local LLM runner built on llama.cpp

174.3k
11

ggml-org

2 repos·167.4k·128 commits this week

High-performance LLM inference in C/C++ (CPU + GPU)

116.7k
100

High-performance Whisper speech recognition in C/C++

50.8k
28

Open WebUI

1 repo·141.7k

Self-hosted ChatGPT alternative with built-in RAG, offline-capable

141.7k

Meta / PyTorch

3 repos·109.9k·469 commits this week

Primary ML framework; torch.compile + AOTInductor for production inference optimization

100.8k
371

PyTorch's portable execution framework for on-device inference

4.7k
98

TorchServe — production PyTorch model serving (archived August 2025)

4.4k

DeepSeek AI

1 repo·103.8k

Reference inference code for DeepSeek-V3 (671B MoE); includes FP8 training framework

103.8k

vLLM Project

2 repos·83.0k·278 commits this week

Most widely adopted open-source LLM serving engine; PagedAttention, continuous batching

83.0k
275

vLLM community plugin for Intel Gaudi accelerators

40
3

Google AI Edge

16 repos·79.5k·215 commits this week

Cross-platform ML pipeline framework (vision, audio, NLP)

35.6k
20

AI Edge model gallery

23.8k
5

LiteRT for language model inference

5.6k
37

Official Gemma model cookbook — recipes, fine-tuning, deployment guides

3.7k

Sample apps using MediaPipe

2.7k

Google's Lite Runtime (successor to TensorFlow Lite)

2.6k
85

Highly optimized neural network operators library (ARM, x86, WASM)

2.4k
47

Model visualization and exploration tool

1.5k

LiteRT integration with PyTorch

1.0k
7

Sample code for LiteRT

343
6

Quantization tooling for AI Edge models

156
5

Web samples for MediaPipe

40

Sample models for AI Edge

24

Command-line tooling for LiteRT

23
1

Evaluation tooling for AI Edge models

7
2

Google AI Edge documentation site

Nomic AI

1 repo·77.4k

Desktop AI app + SDK for running LLMs locally

77.4k

Apple / ML-Explore

10 repos·59.6k·59 commits this week

Array framework for ML on Apple silicon (Python)

27.0k
28

Example models and applications using MLX

8.7k

Reverse-engineered Apple Neural Engine (ANE) — hardware ops, memory layout, firmware interactions

6.7k

LLM inference and fine-tuning with MLX

5.9k
3

Tools for converting & running models with Core ML

5.3k
8

Example apps using MLX Swift

2.6k
4

Swift bindings for MLX

1.9k
2

LLM inference in Swift via MLX

660
14

Efficient data loading for MLX

477

C bindings for MLX

213

BerriAI

1 repo·50.5k

Unified OpenAI-compatible proxy for 100+ LLM providers (vLLM, Ollama, Bedrock, Azure, etc.)

50.5k

Oobabooga

1 repo·47.3k

Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers)

47.3k

Mudler (LocalAI)

1 repo·46.9k·90 commits this week

Free, open-source OpenAI drop-in replacement — runs locally, no GPU required

46.9k
90

Exo Explore

1 repo·45.4k

Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)

45.4k

Ray Project

1 repo·42.9k·82 commits this week

Distributed AI compute engine; Ray Serve handles online and async batch inference

42.9k
82

DeepSpeed AI

1 repo·42.5k·6 commits this week

Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)

42.5k
6

Microsoft / ONNX

2 repos·41.8k·59 commits this week

Open Neural Network Exchange format specification

21.0k
9

Microsoft's cross-platform, high-performance ONNX inference engine

20.8k
50

LM-Sys

1 repo·39.5k

LLM serving framework and home of Chatbot Arena

39.5k

NVIDIA

4 repos·37.3k·120 commits this week

NVIDIA's optimized LLM inference library (GPU)

13.9k
115

NVIDIA's high-performance deep learning inference SDK (GPU)

13.1k

CUDA C++ templates for high-performance matrix-multiply (GEMM) and convolution kernels

9.9k
5

C++ LLM/VLM inference runtime for Jetson and NVIDIA edge devices

438

JAX (Google DeepMind)

1 repo·35.8k·138 commits this week

Composable NumPy transformations (JIT, grad, vmap) compiled via XLA to GPUs and TPUs — primary DeepMind research/production runtime

35.8k
138

Miscellaneous

4 repos·32.3k·35 commits this week

MLC's universal LLM deployment engine (multi-backend)

22.8k

Tile-based ML language and compiler

6.5k
20

Community on-device LLM project

1.6k

vLLM-style inference on Apple silicon via MLX

1.3k
15

SGLang

1 repo·29.1k·327 commits this week

High-throughput LLM/VLM serving with RadixAttention and structured generation

29.1k
327

Tencent

2 repos·28.0k·4 commits this week

High-performance neural network inference for mobile (Android/iOS)

23.4k
4

Tencent Neural Network — mobile and edge inference

4.6k

Modular

1 repo·26.3k·317 commits this week

Modular Platform monorepo — MAX inference server/framework + Mojo programming language for portable, high-performance AI on CPUs and GPUs

26.3k
317

Mozilla AI

1 repo·25.0k

Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms

25.0k

Dao AI Lab

1 repo·24.2k·3 commits this week

Official FlashAttention — fast, memory-efficient exact attention (FA-2/FA-3) kernels for GPUs

24.2k
3

Triton Language (OpenAI)

1 repo·19.4k·54 commits this week

Python-like GPU kernel language used by vLLM, FlashAttention, and PyTorch Inductor

19.4k
54

MLC AI

1 repo·18.2k

High-performance LLM inference in web browsers via WebGPU

18.2k

KVCache AI

1 repo·17.3k·3 commits this week

CPU-GPU hybrid inference; runs DeepSeek 671B on 14GB VRAM + 382GB DRAM with massive speedup over llama.cpp

17.3k
3

jundot

1 repo·16.7k·79 commits this week

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

16.7k
79

Alibaba

1 repo·15.5k·19 commits this week

Alibaba's neural network inference framework for mobile & edge

15.5k
19

Apache

2 repos·13.9k·84 commits this week

Apache TVM ML compiler — auto-tunes models for any hardware target

13.5k
75

Apache TVM Foreign Function Interface for deep learning compilation

413
9

Blaizzy (Community MLX)

5 repos·13.7k·31 commits this week

Audio models (TTS, ASR) with MLX

7.4k
4

Vision-language models on Apple silicon via MLX

5.1k
22

Swift audio inference using MLX

675
5

Text embedding models with MLX

401

Video model inference with MLX

242

K2 / Next-gen ASR

1 repo·13.0k·6 commits this week

ONNX-based runtime for ASR, TTS, VAD, and keyword spotting

13.0k
6

OpenVINO Toolkit / Intel

3 repos·12.1k·98 commits this week

Intel's toolkit for optimizing & deploying deep learning on Intel hardware

10.4k
72

Neural Network Compression Framework — quantization, pruning, sparsity for OpenVINO

1.2k
10

OpenVINO GenAI — generative AI layer with speculative decoding & KV-cache optimization

530
16

RunAnywhere

2 repos·11.9k·51 commits this week

RunAnywhere SDKs for on-device inference deployment

10.3k
51

RunAnywhere CLI tool

1.5k

Intel

2 repos·11.5k

Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)

8.8k

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity

2.7k

Mistral AI

1 repo·10.8k·1 commit this week

Official minimal inference library for all Mistral models (7B, Mixtral, Pixtral)

10.8k
1

Triton Inference Server

1 repo·10.8k·7 commits this week

NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)

10.8k
7

Qualcomm

3 repos·9.6k·90 commits this week

On-device LLM/VLM SDK for Snapdragon NPU, GPU, and CPU (formerly Nexa AI)

8.1k
30

State-of-the-art ML models optimized for Qualcomm Snapdragon NPU/DSP/QNN deployment

1.1k
54

Sample apps and tutorials for deploying models on Qualcomm hardware (TFLite, ONNX, QNN)

424
6

Dusty-NV (NVIDIA Jetson)

1 repo·8.9k

DNN inference library & tutorials for NVIDIA Jetson

8.9k

BentoML

1 repo·8.7k

Unified serving framework: real-time APIs, task queues, batching, multi-model chains

8.7k

InternLM / Shanghai AI Lab

1 repo·7.9k·14 commits this week

High-throughput LLM serving with TurboMind engine (C++/CUDA)

7.9k
14

AI Dynamo (NVIDIA)

1 repo·7.3k·90 commits this week

Datacenter-scale distributed inference serving framework (Rust + Python, disaggregated prefill/decode, engine-agnostic)

7.3k
90

PaddlePaddle (Baidu)

1 repo·7.3k

Lightweight inference engine for mobile & embedded from PaddlePaddle

7.3k

ArgMax

4 repos·6.6k

On-device Whisper inference for Apple platforms (Swift)

6.2k

Python tooling for WhisperKit model optimization

245

On-device AI benchmarking framework

90

Swift playground for ArgMax SDK

22

Osaurus

1 repo·5.8k·96 commits this week

Native macOS AI agent harness in Swift — any model, persistent memory, autonomous execution, MCP server, MLX + Apple Neural Engine, fully offline

5.8k
96

FlashInfer

1 repo·5.8k·28 commits this week

High-performance GPU kernel library for LLM serving — attention, sampling, and KV-cache primitives

5.8k
28

Cactus Compute

5 repos·5.7k·35 commits this week

Cactus core edge inference framework

5.3k
35

React Native bindings for Cactus

175

Kotlin/Android bindings for Cactus

72

Flutter bindings for Cactus

71

Demo chat app using Cactus

28

TurboDeRP (ExLlamaV2)

1 repo·4.6k

High-performance EXL2-quantized inference for consumer NVIDIA GPUs

4.6k

OpenNMT

1 repo·4.5k

Fast C++ inference for Transformer models; INT8/INT16 CPU quantization, multi-platform

4.5k

OpenXLA

1 repo·4.3k·223 commits this week

Compiler for JAX, TF, PyTorch targeting GPU, TPU, and CPU from a unified IR

4.3k
223

ModelTC

1 repo·4.1k·10 commits this week

Lightweight, high-throughput Python-based LLM inference and serving framework

4.1k
10

Predibase

1 repo·3.8k

Multi-LoRA inference server — serve thousands of fine-tuned adapters on a single GPU

3.8k

Liquid AI

5 repos·2.9k·2 commits this week

Examples, tutorials and apps for Liquid AI LFM + LEAP SDK

2.1k
2

Speech-to-Speech audio models by Liquid AI

535

Minimal fine-tuning repo for LFM2, fully open-source

175

Example apps for LeapSDK

68

Liquid AI documentation

25

Luminal AI

1 repo·2.9k·4 commits this week

Rust-based deep learning compiler with a small static graph IR for fast, portable inference (CUDA, Metal, CPU)

2.9k
4

Fluid Inference

3 repos·2.4k·24 commits this week

On-device audio inference framework

2.3k
19

Fluid Inference core runtime

72
5

Rust text processing library for inference

35

Try Mirai

2 repos·1.7k·18 commits this week

Mirai's on-device inference runtime

1.6k
16

Mirai's LLaMA-based on-device model

85
2

UbiquitousLearning

1 repo·1.5k

Multimodal LLM inference framework for mobile & edge

1.5k

ARM Software

1 repo·1.3k

ARM Neural Network SDK for ARM & Mali devices

1.3k

AMD ROCm

4 repos·1.1k·135 commits this week

AI Tensor Engine for ROCm — centralized repo for high-perf AI operators on AMD Instinct GPUs

461
72

AMD's graph inference engine for MI-series GPUs

308
21

ROCm fork of FlashAttention with Composable Kernel (CK) and Triton backends

232

AiTer Optimized Model — lightweight vLLM-like server built on AITER kernels for ROCm

112
42

NimbleEdge

2 repos·534

NimbleEdge's deliteAI on-device inference framework

532

NimbleEdge fork of ExecuTorch with edge optimizations

2

ThunderAgent

1 repo·347

A simple, fast and robust program-aware agentic inference system

347

Picovoice

1 repo·312

Picovoice's on-device LLM inference engine

312

Zetic AI

5 repos·66·3 commits this week

MLange sample applications

61
3

MLange extension library

4

iOS framework for MLange

1

MLange SDK documentation

0

iOS extension framework for MLange

0