Documentation.
Quickstart, supported hardware, and inference backends. Full deployment guides delivered with our engineering team.
[ Getting Started ]
01
Quickstart
Pull the container, serve a model, call the API. No account, no licence key. Running in under a minute. A minimal call sketch follows these cards.
02
CLI reference
Every command, every flag. Lives inside the binary so it always matches your version.
03
Deployment guides
Hardware audits, air-gapped installs, production hardening. Delivered with our forward-deployed engineering team.
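A minimal sketch of the last quickstart step, calling the API once a runtime is already serving locally. The host and port are assumptions for your deployment; the endpoints are the ones listed in the API section below.

import requests

# Assumed local address for a running Sector88 Runtime; adjust host and port
# to match your deployment.
BASE_URL = "http://localhost:8000"

# Liveness check against the documented GET /health endpoint.
health = requests.get(f"{BASE_URL}/health", timeout=5)
health.raise_for_status()

# List the models the runtime is currently serving (GET /v1/models).
models = requests.get(f"{BASE_URL}/v1/models", timeout=5)
models.raise_for_status()

for model in models.json().get("data", []):
    print(model["id"])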
Talk to the team
[ Hardware ]
Supported hardware.
Sector88 runs on what you have. These are the hardware families we test against directly.
NVIDIA CUDA
RTX 30/40/50, A-series, L-series, H100, H200, Jetson
AMD ROCm
MI250X, MI300X accelerators
Intel
Gaudi 2, Gaudi 3, Xeon CPUs
Qualcomm
Cloud AI 100 accelerators
Apple Silicon
M-series via Metal
CPU-only
x86 and ARM, server and edge
Google TPU
v4 / v5, validated per deployment
Custom silicon
On request, forward-deployed
If your hardware is not listed, it probably still works. Sector88 falls back to a CPU path on anything without a native backend.
[ Backends ]
Inference backends.
Sector88 orchestrates the inference engines you already trust. The runtime picks the right backend for your hardware and keeps it in sync across a fleet.
llama.cpp
GGUF, CPU + GPU, edge
vLLM
PagedAttention, throughput
SGLang
Structured generation, high throughput
TensorRT-LLM
NVIDIA datacenter path
NVIDIA Dynamo
Large-cluster orchestration
Custom backends
Per-deployment via engineering
[ API ]
API shape.
OpenAI-compatible. Point any client at your Sector88 Runtime instead of api.openai.com.
# Chat completions
POST /v1/chat/completions
# Models
GET /v1/models
# Completions (legacy)
POST /v1/completions
# Embeddings
POST /v1/embeddings
# Health
GET /health
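A minimal client sketch against the chat completions endpoint, using the standard OpenAI Python SDK with its base_url pointed at a local runtime. The address, key value, and model id are placeholders to replace with your own.

from openai import OpenAI

# Point the standard OpenAI client at your Sector88 Runtime instead of api.openai.com.
# The base URL below is an assumed local address; adjust to your deployment.
client = Openai = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="placeholder",  # placeholder value; set whatever your deployment expects
)

response = client.chat.completions.create(
    model="my-model",  # placeholder; use a model id returned by GET /v1/models
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)

print(response.choices[0].message.content)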
Need deployment guides?
Production hardening, SSO and RBAC, air-gapped installs. Delivered with the engineering team.