Frequently asked questions.
Technical and commercial questions about Runtime, Hub, and deployment.
[ Platform ]
Sector88 builds inference infrastructure for AI deployment where cloud is not an option. The platform runs large language models in hardware-starved, sovereign, and air-gapped environments across space, defence, energy, and mining.
Runtime manages memory orchestration across VRAM, RAM, and NVMe so models run on hardware they would not normally fit on. Hub provides fleet management for distributed deployments. We forward-deploy engineers to install, tune, and harden the platform on site.
Runtime is the inference engine. It sits between your application and the GPU, orchestrating memory across VRAM, system RAM, and NVMe to fit models that exceed available VRAM. It exposes an OpenAI-compatible API.
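For example, a minimal client call against a Runtime node, assuming it serves on localhost port 8000 with a Llama 3 model loaded (the endpoint, key handling, and model identifier here are illustrative assumptions, not documented defaults):

```python
# Minimal sketch: talking to Runtime through its OpenAI-compatible API.
# The base_url, api_key handling, and model name are assumptions for
# illustration; use whatever your Runtime instance is configured to serve.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local Runtime endpoint
    api_key="not-needed-locally",         # local deployments may not need a key
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize the maintenance log."}],
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, existing SDKs and tooling work unchanged; only the base URL changes.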
Hub is the fleet management layer. It handles model distribution, version control, health monitoring, and orchestration across distributed nodes. Hub is how you manage tens or hundreds of Runtime instances from one place.
Runtime treats VRAM, system RAM, and NVMe as a unified memory hierarchy. Hot layers stay in VRAM for fast inference. Warm layers sit in RAM for rapid promotion. Cold layers page to NVMe. The scheduler moves data between tiers automatically based on access patterns, context length, and concurrency. The result is that models which would OOM on a baseline engine run successfully with predictable latency.
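To make the hot/warm/cold behaviour concrete, here is a deliberately simplified sketch of tiered paging with LRU demotion. It illustrates the concept only, it is not Runtime's scheduler, and the capacities are made up:

```python
# Toy illustration of three-tier paging, NOT Runtime's actual scheduler.
# Accessed layers are promoted one tier toward VRAM; when a tier is full,
# its least recently used layer is demoted to the next slower tier.
from collections import OrderedDict

TIERS = ["vram", "ram", "nvme"]                          # fastest -> slowest
CAPACITY = {"vram": 8, "ram": 32, "nvme": float("inf")}  # layers per tier (made up)

class TieredCache:
    def __init__(self):
        self.tiers = {t: OrderedDict() for t in TIERS}  # each tier is an LRU

    def access(self, layer_id, weights=None):
        for i, tier in enumerate(TIERS):
            if layer_id in self.tiers[tier]:
                weights = self.tiers[tier].pop(layer_id)
                self._place(TIERS[max(i - 1, 0)], layer_id, weights)  # promote
                return weights
        self._place("nvme", layer_id, weights)  # first touch lands cold
        return weights

    def _place(self, tier, layer_id, weights):
        while len(self.tiers[tier]) >= CAPACITY[tier]:
            victim, w = self.tiers[tier].popitem(last=False)      # evict LRU
            self._place(TIERS[TIERS.index(tier) + 1], victim, w)  # demote
        self.tiers[tier][layer_id] = weights                      # now MRU
```

A real scheduler also weighs access patterns, context length, and concurrency, as described above.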
Any model supported by vLLM, SGLang, or llama.cpp. This includes Llama 3, Mistral, Qwen, DeepSeek, Phi, Gemma, Command R, and thousands of HuggingFace models. Both SafeTensors and quantized GGUF formats are supported.
You can also run your own fine-tuned or proprietary models. If the base architecture is supported by the underlying engine, Runtime will orchestrate it.
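Because the API surface is OpenAI-compatible, you can enumerate whatever a node is actually serving through the standard models route (endpoint address assumed, as before):

```python
# Sketch: list the models a Runtime node is serving via the OpenAI-compatible
# /v1/models route. The base_url is an assumption; point it at your instance.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
for model in client.models.list():
    print(model.id)
```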
Runtime integrates with vLLM, SGLang, llama.cpp, and TGI. The engine is selected automatically based on hardware profile and model format, or you can pin a specific backend. All engines expose the same OpenAI-compatible API through Runtime.
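As an illustration of how such a dispatch rule might look (this is not Runtime's actual selection logic; the format-to-engine pairings below are conventional, e.g. GGUF is llama.cpp's native format):

```python
# Illustrative sketch of engine selection, NOT Runtime's actual logic.
# GGUF checkpoints map naturally to llama.cpp; SafeTensors checkpoints
# typically go to a GPU-first engine such as vLLM when CUDA is available.
def pick_backend(model_format: str, has_cuda: bool, pinned: str | None = None) -> str:
    if pinned:                   # operator explicitly pinned a backend
        return pinned
    if model_format == "gguf":   # quantized GGUF checkpoints
        return "llama.cpp"
    if has_cuda:                 # SafeTensors on NVIDIA hardware
        return "vllm"
    return "llama.cpp"           # CPU-only fallback

assert pick_backend("safetensors", has_cuda=True) == "vllm"
assert pick_backend("gguf", has_cuda=True) == "llama.cpp"
assert pick_backend("safetensors", has_cuda=False, pinned="sglang") == "sglang"
```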
[ Hardware & Compatibility ]
NVIDIA CUDA (RTX 30/40/50 series, A-series, L-series, H100, H200, Jetson), AMD ROCm (MI250X, MI300X), Intel (Gaudi 2, Gaudi 3, Xeon), Qualcomm Cloud AI, Apple Silicon, and CPU-only deployments. If it can run a supported inference engine, Sector88 runs on it.
Yes. Runtime supports tensor parallelism across multiple GPUs within a single node. For multi-node deployments, Hub coordinates model sharding and request routing across machines. This is how we run 70B+ parameter models in environments with limited per-node VRAM.
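For background on the single-node half of this, tensor parallelism is a first-class knob in vLLM, one of the engines Runtime integrates with. The snippet below is plain vLLM usage, not Sector88 configuration; the model name and GPU count are examples:

```python
# Background: tensor parallelism in vLLM, one of Runtime's supported engines.
# Weight matrices are sharded across 4 GPUs within a single node.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct", tensor_parallel_size=4)
outputs = llm.generate(["Explain tensor parallelism in one sentence."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```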
Linux is the primary target. Runtime ships as a container image (Docker, Podman, Kubernetes) and runs on any distribution with a compatible GPU driver. We test against Ubuntu, RHEL, Rocky Linux, and Alpine. macOS is supported for local development on Apple Silicon.
[ Deployment & Security ]
Yes. Sector88 is purpose-built for air-gapped and sovereign deployments. Zero external dependencies at runtime. Models load from local filesystem or internal registry. Enterprise deployments run fully offline with no outbound network calls.
Community Edition sends one anonymous heartbeat on first run and every 24 hours. The schema includes install UUID, version, OS, architecture, and hardware family. Never prompts, completions, model names, or user identifiers.
Set S88_TELEMETRY=off to disable it completely. Pro and Enterprise default to telemetry off. The full schema is published on our security page.
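Pieced together from the fields listed above, the heartbeat reduces to a small record like the following. Field names are illustrative; the published schema on the security page is canonical:

```python
# Illustrative heartbeat record assembled from the documented schema fields.
# Field names and values are assumptions; see the published schema.
heartbeat = {
    "install_uuid": "a-random-per-install-uuid",
    "version": "x.y.z",            # Runtime version string
    "os": "linux",
    "arch": "x86_64",
    "hardware_family": "nvidia-cuda",
}

# Opting out entirely (documented switch), e.g. in the service environment:
import os
os.environ["S88_TELEMETRY"] = "off"
```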
A single-node proof of concept is running in minutes. Full production deployments with forward-deployed engineers, hardening, and integration typically take weeks, not quarters. Timeline depends on fleet size, security requirements, and network topology.
SOC 2 Type II, ISO 27001, and IRAP are actively in progress. We do not claim certifications we have not earned. Our current security posture, data handling practices, and vulnerability disclosure process are documented on the security page.
No. All inference happens locally on your hardware. Prompts, completions, and model weights never leave your environment. There is no cloud relay, no external API dependency, and no data exfiltration path. This is a core architectural guarantee.
[ Engagement & Pricing ]
Pricing is scoped to your environment: hardware mix, fleet size, security requirements, and how deeply our forward-deployed engineers embed with your team. Every engagement receives a written proposal after an initial scoping call. There are no self-serve tiers for production deployments.
Forward-deployed engineers are Sector88 staff who work on site or embedded with your team. They handle infrastructure assessment, installation, integration, tuning, and production hardening. They stay until the platform runs in production and your team can operate it independently.
This is how we work with defence, space, and energy customers where remote access is not possible.
- Runtime and Hub deployed and configured on your hardware
- Forward-deployed engineers who embed with your team
- Infrastructure and hardware assessment
- Model selection, quantization, and tuning for your workload
- Integration with your existing systems
- Production hardening, monitoring, and a custom SLA
Yes. Community Edition is free for individual use and experimentation. It includes single-node Runtime with memory orchestration and an OpenAI-compatible API. Pro and Enterprise tiers add Hub, multi-node support, SSO, SLAs, and forward-deployed engineering.
Yes. Most engagements start with a scoped proof of concept on your hardware. We deploy Runtime on a representative node, benchmark against your workload, and deliver measured results. Working inference before the contract is signed.
Enterprise engagements include a custom SLA covering response time, uptime targets, and escalation paths. Because Sector88 runs on your infrastructure, uptime depends on your hardware and network. Our SLA covers platform availability, incident response, and engineering support, not the underlying compute.