Tracked Repositories
131 open-source AI inference repositories across 61 organizations.
HuggingFace
HuggingFace Transformers — state-of-the-art NLP/ML model library (~140K stars; pipeline sketch below)
HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)
HuggingFace Candle — minimalist Rust ML framework for inference; targets browser WASM and GPU, zero Python dependency
HuggingFace TGI — LLM serving (archived March 2026, read-only)
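A minimal sketch of the Transformers pipeline API referenced above; the task string and input text are illustrative, and a small default model is downloaded on first use.

    # Sketch: Transformers pipeline API; assumes `pip install transformers`.
    from transformers import pipeline

    # "sentiment-analysis" pulls a small default model; any task or
    # explicit model id works the same way.
    classifier = pipeline("sentiment-analysis")
    print(classifier("This serving engine is impressively fast."))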
TensorFlow
TensorFlow — industry-standard deep learning framework with XLA compilation backend
TensorFlow Serving — high-performance gRPC/REST serving for TF models (multi-version, canary, batching)
TFLite Micro — TensorFlow Lite for microcontrollers and embedded devices
Ollama
User-friendly local LLM runner built on llama.cpp (~167K stars)
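Ollama also exposes a local REST API (default port 11434); a rough sketch of a non-streaming call, assuming a model named "llama3" has already been pulled:

    # Sketch: POST to a local Ollama server; stdlib only.
    import json
    import urllib.request

    body = json.dumps({
        "model": "llama3",   # assumes `ollama pull llama3` was run
        "prompt": "Why is the sky blue?",
        "stream": False,     # one JSON object instead of a chunk stream
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])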
ggml-org
llama.cpp — high-performance LLM inference in C/C++ (CPU + GPU)
whisper.cpp — high-performance Whisper speech recognition in C/C++
Open WebUI
Self-hosted ChatGPT alternative with built-in RAG, offline-capable (~104K stars)
Meta / PyTorch
PyTorch — primary ML framework; torch.compile + AOTInductor for production inference optimization (sketch below)
ExecuTorch — PyTorch's portable execution framework for on-device inference
TorchServe — production PyTorch model serving (archived August 2025)
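A hedged sketch of the torch.compile inference path named above; the toy module stands in for a real model:

    # Sketch: torch.compile JIT-compiles the module via TorchInductor on
    # the first call; subsequent calls reuse the compiled artifact.
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 4),
    ).eval()
    compiled = torch.compile(model)

    with torch.inference_mode():
        out = compiled(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 4])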
DeepSeek AI
Reference inference code for DeepSeek-V3 (671B MoE); includes FP8 training framework
vLLM Project
vLLM — most widely adopted open-source LLM serving engine; PagedAttention, continuous batching (sketch below)
vLLM community plugin for Intel Gaudi accelerators
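A minimal offline-batch sketch of vLLM's Python API; the model id is an arbitrary small placeholder:

    # Sketch: vLLM offline generation; PagedAttention and continuous
    # batching are handled inside the engine.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # placeholder model id
    params = SamplingParams(temperature=0.8, max_tokens=64)
    for out in llm.generate(["The key idea behind PagedAttention is"], params):
        print(out.outputs[0].text)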
Nomic AI
GPT4All — desktop AI app + SDK for running LLMs locally (~73K stars)
Google AI Edge
MediaPipe — cross-platform ML pipeline framework (vision, audio, NLP)
AI Edge model gallery
LiteRT for language model inference
Official Gemma model cookbook — recipes, fine-tuning, deployment guides
Sample apps using MediaPipe
XNNPACK — highly optimized neural network operators library (ARM, x86, WASM)
LiteRT — Google's Lite Runtime (successor to TensorFlow Lite; sketch below)
Model Explorer — model visualization and exploration tool
ai-edge-torch — LiteRT integration with PyTorch
Sample code for LiteRT
Quantization tooling for AI Edge models
Sample models for AI Edge
AI Edge APIs — upstream repo deleted (404), local copy retained
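A sketch of running a .tflite model with the long-standing Interpreter API (LiteRT ships a compatible Interpreter under the ai_edge_litert package); "model.tflite" is a placeholder path:

    # Sketch: load, allocate, set input, invoke, read output.
    import numpy as np
    import tensorflow as tf

    interp = tf.lite.Interpreter(model_path="model.tflite")  # placeholder
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    interp.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interp.invoke()
    print(interp.get_tensor(out["index"]).shape)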
Apple / ML-Explore
MLX — array framework for ML on Apple silicon (Python; sketch below)
mlx-examples — example models and applications using MLX
Reverse engineering of the Apple Neural Engine (ANE): hardware ops, memory layout, firmware interactions
coremltools — tools for converting & running models with Core ML
mlx-lm — LLM inference and fine-tuning with MLX
mlx-swift-examples — example apps using MLX Swift
mlx-swift — Swift bindings for MLX
mlx-data — efficient data loading for MLX
LLM inference in Swift via MLX
mlx-c — C bindings for MLX
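A tiny sketch of MLX's lazy, NumPy-like Python API (see the MLX entry above); shapes are arbitrary:

    # Sketch: MLX builds a lazy graph; mx.eval materializes it on the
    # default device (the GPU on Apple silicon).
    import mlx.core as mx

    a = mx.random.normal((4, 8))
    b = mx.random.normal((8, 2))
    c = mx.matmul(a, b)  # no computation yet
    mx.eval(c)           # forces evaluation
    print(c.shape)       # (4, 2)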
Oobabooga
Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers) (~43K stars)
Mudler (LocalAI)
Free, open-source OpenAI drop-in replacement — runs locally, no GPU required (~36K stars)
BerriAI
Unified OpenAI-compatible proxy for 100+ LLM providers (vLLM, Ollama, Bedrock, Azure, etc.)
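A sketch of LiteLLM's single completion() entry point; the provider-prefixed model string is an assumption, and the same OpenAI-shaped call routes to any configured backend:

    # Sketch: one call shape for many providers; swap the model string
    # for "gpt-4o", "bedrock/...", a vLLM endpoint, etc.
    from litellm import completion

    resp = completion(
        model="ollama/llama3",  # assumes a local Ollama serving this model
        messages=[{"role": "user", "content": "One line on continuous batching."}],
    )
    print(resp.choices[0].message.content)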
Exo Explore
Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)
Ray Project
Distributed AI compute engine; Ray Serve handles online and async batch inference (~39K stars)
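A hedged sketch of a Ray Serve HTTP deployment; the echo handler stands in for a real model:

    # Sketch: a Ray Serve deployment served over HTTP on port 8000.
    from ray import serve
    from starlette.requests import Request

    @serve.deployment
    class Echo:
        async def __call__(self, request: Request) -> dict:
            return {"echo": await request.json()}

    serve.run(Echo.bind())  # POST JSON to http://127.0.0.1:8000/
    # Keep the process alive; serving stops when the driver exits.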
DeepSpeed AI
Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)
Microsoft / ONNX
ONNX Runtime — Microsoft's cross-platform, high-performance ONNX inference engine (sketch below)
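A minimal ONNX Runtime sketch; "model.onnx" is a placeholder, and dynamic dimensions are naively pinned to 1:

    # Sketch: CPU inference session over an exported ONNX graph.
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    info = sess.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in info.shape]  # pin dynamic dims
    outputs = sess.run(None, {info.name: np.zeros(shape, dtype=np.float32)})
    print([o.shape for o in outputs])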
LM-Sys
FastChat — LLM serving framework and home of Chatbot Arena (~37K stars)
JAX (Google DeepMind)
Composable NumPy transformations (JIT, grad, vmap) compiled via XLA to GPUs and TPUs — primary DeepMind research/production runtime
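The three transformations compose; a small sketch computing a jit-compiled, per-example gradient:

    # Sketch: grad of a scalar loss, vmapped over a batch, jit-compiled.
    import jax
    import jax.numpy as jnp

    def loss(w, x):
        return jnp.sum((x @ w) ** 2)

    # Map over the batch axis of x only; w is shared across examples.
    grad_fn = jax.jit(jax.vmap(jax.grad(loss), in_axes=(None, 0)))
    w = jnp.ones((3, 2))
    xs = jnp.ones((5, 4, 3))
    print(grad_fn(w, xs).shape)  # (5, 3, 2): one gradient per example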
Miscellaneous
MLC LLM — MLC's universal LLM deployment engine (multi-backend)
Tile-based ML language and compiler
Community on-device LLM project
vLLM-style inference on Apple silicon via MLX
Tencent
ncnn — high-performance neural network inference for mobile (Android/iOS)
TNN (Tencent Neural Network) — mobile and edge inference
NVIDIA
TensorRT-LLM — NVIDIA's optimized LLM inference library (GPU)
TensorRT — NVIDIA's high-performance deep learning inference SDK (GPU)
C++ LLM/VLM inference runtime for Jetson and NVIDIA edge devices
SGLang
High-throughput LLM/VLM serving with RadixAttention and structured generation
Mozilla AI
Llamafile — single-file LLM executables via Cosmopolitan Libc; zero install, all platforms (~21K stars)
Triton Language (OpenAI)
Python-like GPU kernel language used by vLLM's FlashAttention kernels and PyTorch's TorchInductor backend (sketch below)
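A hedged vector-add sketch in the Triton language (requires an NVIDIA GPU); it is illustrative, not a kernel taken from vLLM or TorchInductor:

    # Sketch: each program instance handles one BLOCK-sized tile.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n  # guard the ragged final tile
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, x + y, mask=mask)

    n = 4096
    x = torch.randn(n, device="cuda")
    y = torch.randn(n, device="cuda")
    out = torch.empty_like(x)
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
    assert torch.allclose(out, x + y)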
MLC AI
WebLLM — high-performance LLM inference in web browsers via WebGPU
KVCache AI
KTransformers — CPU-GPU hybrid inference; runs DeepSeek 671B on 14 GB VRAM + 382 GB DRAM with a large reported speedup over llama.cpp
Alibaba
MNN — Alibaba's neural network inference framework for mobile & edge
Apache
Apache TVM ML compiler — auto-tunes models for any hardware target
Apache TVM Foreign Function Interface for deep learning compilation
Blaizzy (Community MLX)
mlx-audio — audio models (TTS, ASR) with MLX
mlx-vlm — vision-language models on Apple silicon via MLX
Swift audio inference using MLX
Text embedding models with MLX
Video model inference with MLX
RunAnywhere
RunAnywhere SDKs for on-device inference deployment
RunAnywhere CLI tool
OpenVINO Toolkit / Intel
OpenVINO — Intel's toolkit for optimizing & deploying deep learning on Intel hardware (sketch below)
Neural Network Compression Framework — quantization, pruning, sparsity for OpenVINO
OpenVINO GenAI — generative AI layer with speculative decoding & KV-cache opt
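A sketch of the OpenVINO 2.x Python API; the IR path and input shape are placeholder assumptions:

    # Sketch: read an IR model, compile for CPU, run one inference.
    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")            # placeholder IR path
    compiled = core.compile_model(model, "CPU")
    dummy = np.zeros([1, 3, 224, 224], np.float32)  # assumed input shape
    result = compiled(dummy)
    print(result[compiled.output(0)].shape)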
K2 / Next-gen ASR
sherpa-onnx — ONNX-based runtime for ASR, TTS, VAD, and keyword spotting
Intel
Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity
jundot
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
Mistral AI
mistral-inference — official minimal inference library for all Mistral models (7B, Mixtral, Pixtral)
Triton Inference Server
NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)
Dusty-NV (NVIDIA Jetson)
jetson-inference — DNN inference library & tutorials for NVIDIA Jetson
BentoML
Unified serving framework: real-time APIs, task queues, batching, multi-model chains
Nexa AI
Unified SDK for running LLMs and multimodal models locally
InternLM / Shanghai AI Lab
LMDeploy — high-throughput LLM serving with TurboMind engine (C++/CUDA)
PaddlePaddle (Baidu)
Paddle Lite — lightweight inference engine for mobile & embedded from PaddlePaddle
AI Dynamo (NVIDIA)
Datacenter-scale distributed inference serving framework (Rust + Python, disaggregated prefill/decode, engine-agnostic)
ArgMax
WhisperKit — on-device Whisper inference for Apple platforms (Swift)
Python tooling for WhisperKit model optimization
On-device AI benchmarking framework
Swift playground for ArgMax SDK
Osaurus
Native macOS AI agent harness in Swift — any model, persistent memory, autonomous execution, MCP server, MLX + Apple Neural Engine, fully offline
Cactus Compute
Cactus core edge inference framework
React Native bindings for Cactus
Flutter bindings for Cactus
Kotlin/Android bindings for Cactus
Demo chat app using Cactus
turboderp (ExLlamaV2)
High-performance EXL2-quantized inference for consumer NVIDIA GPUs
OpenNMT
CTranslate2 — fast C++ inference for Transformer models; INT8/INT16 CPU quantization, multi-platform
OpenXLA
XLA — compiler for JAX, TF, PyTorch targeting GPU, TPU, and CPU from a unified IR
Luminal AI
Rust-based deep learning compiler with a small static graph IR for fast, portable inference (CUDA, Metal, CPU)
Liquid AI
Examples, tutorials and apps for Liquid AI LFM + LEAP SDK
Speech-to-Speech audio models by Liquid AI
Minimal fine-tuning repo for LFM2, fully open-source
Example apps for LeapSDK
Liquid AI documentation
Fluid Inference
On-device audio inference framework
Fluid Inference core runtime
Rust text processing library for inference
Try Mirai
Uzu — Mirai's on-device inference runtime
Mirai's LLaMA-based on-device model
Swift SDK for Uzu
UbiquitousLearning
mllm — multimodal LLM inference framework for mobile & edge
Qualcomm
Qualcomm AI Hub Models — state-of-the-art ML models optimized for Qualcomm Snapdragon NPU/DSP/QNN deployment
Sample apps and tutorials for deploying models on Qualcomm hardware (TFLite, ONNX, QNN)
ARM Software
Arm NN — neural network inference SDK for Arm CPUs and Mali GPUs
AMD ROCm
AITER — AI Tensor Engine for ROCm; centralized repo for high-performance AI operators on AMD Instinct GPUs
MIGraphX — AMD's graph inference engine for MI-series GPUs
ROCm fork of FlashAttention with Composable Kernel (CK) and Triton backends
NimbleEdge
NimbleEdge's deliteAI on-device inference framework
NimbleEdge fork of ExecuTorch with edge optimizations
Picovoice
picoLLM — Picovoice's on-device LLM inference engine
Zetic AI
MLange sample applications
MLange extension library
MLange SDK documentation
iOS extension framework for MLange
iOS framework for MLange