Tracked Repositories

131 open-source AI inference repositories across 61 organizations.

HuggingFace

4 repos·224.2k·100 commits this week

HuggingFace Transformers — state-of-the-art NLP/ML model library

159.8k
69

HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)

33.4k
21

HuggingFace Candle — minimalist Rust ML framework for inference; targets browser WASM and GPU, zero Python dependency

20.1k
10

HuggingFace TGI — LLM serving (archived March 2026, read-only)

10.8k

TensorFlow

3 repos·204.1k·328 commits this week

Industry-standard deep learning framework with XLA compilation backend

194.8k
327

TensorFlow Serving — high-performance gRPC/REST serving for TF models (multi-version, canary, batching)

6.3k
1

TensorFlow Lite for microcontrollers and embedded devices

2.9k

Ollama

1 repo·169.8k·23 commits this week

User-friendly local LLM runner built on llama.cpp

169.8k
23

ggml-org

2 repos·154.9k·86 commits this week

High-performance LLM inference in C/C++ (CPU + GPU)

106.0k
84

High-performance Whisper speech recognition in C/C++

49.0k
2

Open WebUI

1 repo·133.6k·112 commits this week

Self-hosted, offline-capable ChatGPT alternative with built-in RAG

133.6k
112

Meta / PyTorch

3 repos·108.3k·448 commits this week

Primary ML framework; torch.compile + AOTInductor for production inference optimization

99.4k
362

PyTorch's portable execution framework for on-device inference

4.5k
86

TorchServe — production PyTorch model serving (archived August 2025)

4.4k

DeepSeek AI

1 repo·102.7k

Reference inference code for DeepSeek-V3 (671B MoE); includes FP8 training framework

102.7k

vLLM Project

2 repos·77.9k·41 commits this week

Most widely adopted open-source LLM serving engine; PagedAttention, continuous batching

77.9k
26
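
vLLM's signature technique, PagedAttention, manages the KV cache in fixed-size blocks indexed per sequence by a block table, so memory is allocated on demand instead of reserved for the maximum sequence length. A minimal sketch of that bookkeeping in plain Python (illustrative only; `BlockAllocator` and its methods are not vLLM's actual API):

```python
# Sketch of PagedAttention-style KV-cache block allocation (illustrative,
# not vLLM's real implementation). Each sequence maps logical token
# positions to physical cache blocks via a block table.
BLOCK_SIZE = 16  # tokens per KV-cache block

class BlockAllocator:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))      # free physical block ids
        self.tables: dict[int, list[int]] = {}   # seq_id -> block table

    def append_token(self, seq_id: int, position: int) -> int:
        """Ensure a physical block backs `position`; return its block id."""
        table = self.tables.setdefault(seq_id, [])
        if position // BLOCK_SIZE >= len(table):  # need a fresh block
            table.append(self.free.pop())
        return table[position // BLOCK_SIZE]

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))

alloc = BlockAllocator(num_blocks=64)
for pos in range(40):                    # 40 tokens span 3 blocks of 16
    alloc.append_token(seq_id=0, position=pos)
print(len(alloc.tables[0]))              # 3
```

Because blocks are allocated lazily, many sequences of unknown length can share one GPU cache pool, which is what enables vLLM's continuous batching.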

vLLM community plugin for Intel Gaudi accelerators

38
15

Nomic AI

1 repo·77.4k

Desktop AI app + SDK for running LLMs locally

77.4k

Google AI Edge

13 repos·74.5k·205 commits this week

Cross-platform ML pipeline framework (vision, audio, NLP)

34.9k
5

AI Edge model gallery

21.9k
14

LiteRT for language model inference

4.2k
31

Official Gemma model cookbook — recipes, fine-tuning, deployment guides

3.4k
48

Sample apps using MediaPipe

2.7k

Highly optimized neural network operators library (ARM, x86, WASM)

2.3k
49

Google's Lite Runtime (successor to TensorFlow Lite)

2.3k
48

Model visualization and exploration tool

1.4k
3

LiteRT integration with PyTorch

1.0k

Sample code for LiteRT

283
3

Quantization tooling for AI Edge models

126
4

Sample models for AI Edge

22

AI Edge APIs — upstream repo deleted (404), local copy retained

Apple / ML-Explore

10 repos·56.4k·32 commits this week

Array framework for ML on Apple silicon (Python)

25.7k
16

Example models and applications using MLX

8.5k

Reverse-engineered Apple Neural Engine (ANE) — hardware ops, memory layout, firmware interactions

6.6k

Tools for converting & running models with Core ML

5.2k
1

LLM inference and fine-tuning with MLX

4.9k
12

Example apps using MLX Swift

2.5k
1

Swift bindings for MLX

1.8k

Efficient data loading for MLX

470

LLM inference in Swift via MLX

434
2

C bindings for MLX

196

Oobabooga

1 repo·46.8k·24 commits this week

Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers)

46.8k
24

Mudler (LocalAI)

1 repo·45.7k·80 commits this week

Free, open-source OpenAI drop-in replacement — runs locally, no GPU required

45.7k
80

BerriAI

1 repo·44.4k·400 commits this week

Unified OpenAI-compatible proxy for 100+ LLM providers (vLLM, Ollama, Bedrock, Azure, etc.)

44.4k
400
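
The core job of such a proxy is mapping an OpenAI-style model string to a provider backend. A hypothetical sketch of that routing idea (the `provider/model` convention is real, but the endpoints and function here are illustrative, not LiteLLM's implementation):

```python
# Sketch of provider routing behind an OpenAI-compatible proxy
# (illustrative only; not LiteLLM's actual API). Model strings follow a
# "provider/model" convention and fall back to a default provider.
ROUTES = {
    "ollama": "http://localhost:11434/v1",   # hypothetical local endpoints
    "vllm": "http://localhost:8000/v1",
    "azure": "https://example.openai.azure.com",
}

def route(model: str, default: str = "vllm") -> tuple[str, str]:
    """Split 'provider/model' and resolve the provider's base URL."""
    provider, _, name = model.partition("/")
    if not name:                 # bare model name -> default provider
        provider, name = default, model
    return ROUTES[provider], name

print(route("ollama/llama3"))    # ('http://localhost:11434/v1', 'llama3')
print(route("my-model"))         # falls through to the default vLLM backend
```

The caller keeps using one OpenAI-shaped client; only the resolved base URL changes per request.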

Exo Explore

1 repo·44.0k·20 commits this week

Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)

44.0k
20

Ray Project

1 repo·42.3k·102 commits this week

Distributed AI compute engine; Ray Serve handles online and async batch inference

42.3k
102

DeepSpeed AI

1 repo·42.2k·3 commits this week

Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)

42.2k
3

Microsoft / ONNX

2 repos·40.8k·49 commits this week

Open Neural Network Exchange format specification

20.7k
8

Microsoft's cross-platform, high-performance ONNX inference engine

20.1k
41

LM-Sys

1 repo·39.5k

LLM serving framework and home of Chatbot Arena

39.5k

JAX (Google DeepMind)

1 repo·35.5k·206 commits this week

Composable NumPy transformations (JIT, grad, vmap) compiled via XLA to GPUs and TPUs — primary DeepMind research/production runtime

35.5k
206

Miscellaneous

4 repos·30.6k·114 commits this week

MLC's universal LLM deployment engine (multi-backend)

22.5k
3

Tile-based ML language and compiler

5.6k
26

Community on-device LLM project

1.6k

vLLM-style inference on Apple silicon via MLX

929
85

Tencent

2 repos·27.8k·5 commits this week

High-performance neural network inference for mobile (Android/iOS)

23.1k
5

Tencent Neural Network — mobile and edge inference

4.6k

NVIDIA

3 repos·26.8k·120 commits this week

NVIDIA's optimized LLM inference library (GPU)

13.5k
118

NVIDIA's high-performance deep learning inference SDK (GPU)

12.9k

C++ LLM/VLM inference runtime for Jetson and NVIDIA edge devices

362
2

SGLang

1 repo·26.3k·216 commits this week

High-throughput LLM/VLM serving with RadixAttention and structured generation

26.3k
216
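
RadixAttention caches KV state in a radix tree keyed by token prefixes, so requests that share a prompt prefix reuse its cache instead of recomputing it. A toy sketch of the prefix-matching idea (illustrative only, not SGLang's implementation; a real radix tree also compresses single-child chains into edges, which this per-token trie skips for brevity):

```python
# Sketch of RadixAttention-style prefix caching (illustrative; not
# SGLang's code). Requests sharing a token prefix walk the same trie
# path, so the KV cache for that prefix is computed only once.
class RadixNode:
    def __init__(self):
        self.children: dict[int, "RadixNode"] = {}  # token id -> child

def insert(root: RadixNode, tokens: list[int]) -> int:
    """Insert a token sequence; return how many tokens were already cached."""
    node, cached = root, 0
    for tok in tokens:
        if tok in node.children:
            cached += 1              # prefix hit: KV already computed
        else:
            node.children[tok] = RadixNode()
        node = node.children[tok]
    return cached

root = RadixNode()
insert(root, [1, 2, 3, 4])           # cold request: nothing reused
print(insert(root, [1, 2, 3, 9]))    # 3 (shared prefix [1, 2, 3])
```

The reuse count is exactly the number of prefill tokens the server can skip, which is where the throughput win on shared system prompts comes from.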

Mozilla AI

1 repo·24.2k·3 commits this week

Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms

24.2k
3

Triton Language (OpenAI)

1 repo·19.0k·46 commits this week

Python-like GPU kernel language used by vLLM FlashAttention and PyTorch inductor

19.0k
46

MLC AI

1 repo·17.8k

High-performance LLM inference in web browsers via WebGPU

17.8k

KVCache AI

1 repo·17.0k·4 commits this week

CPU-GPU hybrid inference; runs DeepSeek 671B on 14GB VRAM + 382GB DRAM with massive speedup over llama.cpp

17.0k
4
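
The trick behind fitting a 671B MoE into 14GB of VRAM is placement: keep the small, hot dense layers on GPU and spill the huge expert weights to CPU DRAM. A greedy sketch of that idea (illustrative only; the layer names, sizes, and policy here are hypothetical, not KTransformers' actual scheduler):

```python
# Sketch of CPU-GPU hybrid layer placement (illustrative; not
# KTransformers' policy). Layers are placed on GPU in priority order
# until the VRAM budget is spent; the rest spill to CPU DRAM.
def place_layers(layers: list[tuple[str, float]], vram_gb: float) -> dict[str, str]:
    """layers: (name, size_gb) in priority order. Returns name -> device."""
    placement, used = {}, 0.0
    for name, size in layers:
        if used + size <= vram_gb:
            placement[name] = "gpu"
            used += size
        else:
            placement[name] = "cpu"   # spill to system DRAM
    return placement

# Hypothetical sizes: small attention blocks, enormous MoE expert blocks.
model = [("attn.0", 2.0), ("attn.1", 2.0),
         ("experts.0", 40.0), ("experts.1", 40.0)]
print(place_layers(model, vram_gb=14.0))
# attention fits in VRAM; both expert blocks run on CPU
```

Since only a few experts activate per token, the CPU-resident experts add far less latency than their size suggests, which is why the hybrid split works.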

Alibaba

1 repo·15.0k·7 commits this week

Alibaba's neural network inference framework for mobile & edge

15.0k
7

Apache

2 repos·13.7k·24 commits this week

Apache TVM ML compiler — auto-tunes models for any hardware target

13.3k
15

Apache TVM Foreign Function Interface for deep learning compilation

378
9

Blaizzy (Community MLX)

5 repos·12.4k·61 commits this week

Audio models (TTS, ASR) with MLX

6.8k
40

Vision-language models on Apple silicon via MLX

4.5k
15

Swift audio inference using MLX

589
5

Text embedding models with MLX

355
1

Video model inference with MLX

198

RunAnywhere

2 repos·11.8k·33 commits this week

RunAnywhere SDKs for on-device inference deployment

10.3k
33

RunAnywhere CLI tool

1.5k

OpenVINO Toolkit / Intel

3 repos·11.8k·84 commits this week

Intel's toolkit for optimizing & deploying deep learning on Intel hardware

10.1k
65

Neural Network Compression Framework — quantization, pruning, sparsity for OpenVINO

1.2k
7

OpenVINO GenAI — generative AI layer with speculative decoding & KV-cache opt

494
12
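
Speculative decoding, as featured in OpenVINO GenAI, lets a cheap draft model propose several tokens that the large target model then verifies in one pass, keeping the longest agreeing prefix. A greedy, deterministic sketch of the accept loop (illustrative only, not the OpenVINO GenAI API; production systems use sampled acceptance-rejection rather than exact match):

```python
# Sketch of a speculative-decoding verify loop (greedy variant,
# illustrative only). The draft proposes k tokens; the target keeps the
# agreeing prefix and substitutes its own token at the first mismatch.
def speculative_step(draft_next, target_next, prefix: list[int], k: int) -> list[int]:
    proposed, ctx = [], list(prefix)
    for _ in range(k):                 # cheap model drafts k tokens
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(prefix)
    for tok in proposed:               # expensive model verifies them
        if target_next(ctx) != tok:    # first disagreement: take target's token, stop
            accepted.append(target_next(ctx))
            break
        accepted.append(tok)
        ctx.append(tok)
    return prefix + accepted

# Toy deterministic "models": draft repeats the last token, target counts up.
draft = lambda ctx: ctx[-1]
target = lambda ctx: ctx[-1] + 1
print(speculative_step(draft, target, [5], k=4))   # [5, 6]
```

Every step emits at least one target-quality token, so output quality is unchanged; the speedup comes from verifying k drafted tokens in a single target forward pass.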

K2 / Next-gen ASR

1 repo·11.8k·12 commits this week

ONNX-based runtime for ASR, TTS, VAD, and keyword spotting

11.8k
12

Intel

2 repos·11.4k·4 commits this week

Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)

8.8k

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity

2.6k
4
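
The simplest scheme in that family is symmetric per-tensor INT8: map weights to [-127, 127] with a single scale derived from the largest magnitude. A self-contained sketch (illustrative of the general technique, not neural-compressor's API):

```python
# Sketch of symmetric per-tensor INT8 quantization (illustrative; not a
# specific library's API). One scale maps floats to [-127, 127].
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_int8(w)
print(q)                    # [64, -127, 32, 95]
print(dequantize(q, s))     # each value within scale/2 of the original
```

The rounding error is bounded by half a quantization step (scale/2 per weight); the fancier formats listed above (MXFP8, NVFP4, etc.) trade extra metadata, such as per-block scales, for a tighter bound at lower bit widths.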

jundot

1 repo·11.2k·48 commits this week

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

11.2k
48

Mistral AI

1 repo·10.8k·1 commit this week

Official minimal inference library for all Mistral models (7B, Mixtral, Pixtral)

10.8k
1

Triton Inference Server

1 repo·10.6k·7 commits this week

NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)

10.6k
7

Dusty-NV (NVIDIA Jetson)

1 repo·8.8k

DNN inference library & tutorials for NVIDIA Jetson

8.8k

BentoML

1 repo·8.6k·1 commit this week

Unified serving framework: real-time APIs, task queues, batching, multi-model chains

8.6k
1

Nexa AI

1 repo·8.0k

Unified SDK for running LLMs and multimodal models locally

8.0k

InternLM / Shanghai AI Lab

1 repo·7.8k·7 commits this week

High-throughput LLM serving with TurboMind engine (C++/CUDA)

7.8k
7

PaddlePaddle (Baidu)

1 repo·7.2k

Lightweight inference engine for mobile & embedded from PaddlePaddle

7.2k

AI Dynamo (NVIDIA)

1 repo·6.6k·126 commits this week

Datacenter-scale distributed inference serving framework (Rust + Python, disaggregated prefill/decode, engine-agnostic)

6.6k
126

ArgMax

4 repos·6.4k

On-device Whisper inference for Apple platforms (Swift)

6.0k

Python tooling for WhisperKit model optimization

241

On-device AI benchmarking framework

84

Swift playground for ArgMax SDK

19

Osaurus

1 repo·5.1k·37 commits this week

Native macOS AI agent harness in Swift — any model, persistent memory, autonomous execution, MCP server, MLX + Apple Neural Engine, fully offline

5.1k
37

Cactus Compute

5 repos·5.0k·19 commits this week

Cactus core edge inference framework

4.7k
17

React Native bindings for Cactus

156
2

Flutter bindings for Cactus

70

Kotlin/Android bindings for Cactus

68

Demo chat app using Cactus

28

TurboDeRP (ExLlamaV2)

1 repo·4.5k

High-performance EXL2-quantized inference for consumer NVIDIA GPUs

4.5k

OpenNMT

1 repo·4.4k

Fast C++ inference for Transformer models; INT8/INT16 CPU quantization, multi-platform

4.4k

OpenXLA

1 repo·4.2k·275 commits this week

Compiler for JAX, TF, PyTorch targeting GPU, TPU, and CPU from a unified IR

4.2k
275

Luminal AI

1 repo·2.8k·35 commits this week

Rust-based deep learning compiler with a small static graph IR for fast, portable inference (CUDA, Metal, CPU)

2.8k
35
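
The "small static graph IR" approach means ops are recorded into a graph first and executed later, which is what lets a compiler fuse and schedule before anything runs. A tiny interpreter-only sketch of that separation (illustrative; not Luminal's actual IR or op set):

```python
# Sketch of a tiny static-graph IR (illustrative; not Luminal's design).
# Ops are recorded first, then evaluated in one topological pass --
# a compiler would rewrite self.nodes (fusing, scheduling) before run().
class Graph:
    def __init__(self):
        self.nodes = []                    # (op, input node ids)

    def add(self, op: str, *inputs: int) -> int:
        self.nodes.append((op, inputs))
        return len(self.nodes) - 1         # node id

    def run(self, feeds: dict[int, float]) -> list[float]:
        vals = []
        for i, (op, ins) in enumerate(self.nodes):
            if op == "input":
                vals.append(feeds[i])
            elif op == "add":
                vals.append(vals[ins[0]] + vals[ins[1]])
            elif op == "mul":
                vals.append(vals[ins[0]] * vals[ins[1]])
        return vals

g = Graph()
x = g.add("input")
y = g.add("input")
z = g.add("mul", g.add("add", x, y), y)    # builds (x + y) * y, runs nothing
print(g.run({x: 2.0, y: 3.0})[z])          # 15.0
```

Because nodes are appended in dependency order, a single forward pass suffices; the same recorded graph can be re-run with different feeds or lowered to CUDA/Metal kernels.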

Liquid AI

5 repos·2.5k·6 commits this week

Examples, tutorials and apps for Liquid AI LFM + LEAP SDK

1.8k
4

Speech-to-Speech audio models by Liquid AI

434

Minimal fine-tuning repo for LFM2, fully open-source

145
1

Example apps for LeapSDK

61

Liquid AI documentation

24
1

Fluid Inference

3 repos·2.0k·8 commits this week

On-device audio inference framework

1.9k
8

Fluid Inference core runtime

63

Rust text processing library for inference

28

Try Mirai

3 repos·1.7k·17 commits this week

Mirai's on-device inference runtime

1.6k
9

Mirai's LLaMA-based on-device model

76
8

Swift SDK for Uzu

54

UbiquitousLearning

1 repo·1.5k

Multimodal LLM inference framework for mobile & edge

1.5k

Qualcomm

2 repos·1.4k·62 commits this week

State-of-the-art ML models optimized for Qualcomm Snapdragon NPU/DSP/QNN deployment

1.0k
50

Sample apps and tutorials for deploying models on Qualcomm hardware (TFLite, ONNX, QNN)

399
12

ARM Software

1 repo·1.3k

ARM Neural Network SDK for ARM & Mali devices

1.3k

AMD ROCm

4 repos·1.0k·97 commits this week

AI Tensor Engine for ROCm — centralized repo for high-perf AI operators on AMD Instinct GPUs

413
55

AMD's graph inference engine for MI-series GPUs

293
7

ROCm fork of FlashAttention with Composable Kernel (CK) and Triton backends

230

AITER Optimized Model — lightweight vLLM-like server built on AITER kernels for ROCm

69
35

NimbleEdge

2 repos·536

NimbleEdge's deliteAI on-device inference framework

534

NimbleEdge fork of ExecuTorch with edge optimizations

2

Picovoice

1 repo·311·3 commits this week

Picovoice's on-device LLM inference engine

311
3

Zetic AI

5 repos·58

MLange sample applications

54

MLange extension library

4

MLange SDK documentation

0

iOS extension framework for MLange

0

iOS framework for MLange

0