Platform & Deployment

Linux. Windows. macOS. FreeBSD.
One command.

Native installers for Linux (AMD ROCm), Windows (NVIDIA CUDA), macOS (Apple Silicon MLX), and FreeBSD (NVIDIA CUDA). Docker, Helm, vLLM, Direct Python. Every product runs fully air-gapped — no cloud calls, no telemetry, no data leaves your machine. Fine-tuning pipeline and benchmark suite included.

Choose Your Product

Every product. Every platform.

Same security stack. Same API. All products air-gapped by design. Different hardware targets.

| Product | Hardware | Model | Install Method | Air-Gapped |
|---|---|---|---|---|
| Kwyre Personal | AMD GPU (RX 7900 XT+) / NVIDIA GPU (RTX 3060+) | Qwen3.5-4B Uncensored NF4 + 0.8B draft (3.3 GB) — 0/465 refusals | docker compose up or installer | ✓ Always |
| Kwyre Professional | AMD GPU (RX 7900 XTX / MI210) / NVIDIA GPU (RTX 4090 / A100 / H100) | Qwen3.5-4B Uncensored + 6 LoRA hot-swap adapters + GRPO (3.3 GB) | docker compose up or installer | ✓ Always |
| Kwyre Air | Any CPU (8GB+ RAM) | GGUF (KWYRE_GGUF_PATH) | python server/serve_cpu.py | ✓ Always |
| vLLM Backend | AMD / NVIDIA GPU (multi-GPU supported) | Qwen3.5-4B Uncensored or custom — PagedAttention + continuous batching | KWYRE_BACKEND=vllm python server/serve_vllm.py | ✓ Always |
| Kwyre Apple Silicon | Apple M1/M2/M3/M4 (macOS 12+) | Any MLX model — Metal-optimized, unified memory | python server/serve_mlx.py | ✓ Always |
| FreeBSD | NVIDIA GPU (FreeBSD 13+) | Qwen3.5-4B Uncensored NF4 + GGUF — CUDA or CPU | sudo ./install_freebsd.sh | ✓ Always |
| Custom LLM | Any (we configure) | Custom-trained for your domain | Turnkey delivery | ✓ Always |
UNCENSORED + HOT-SWAP

Qwen3.5-4B Uncensored — 0/465 refusals. Six domain LoRA adapters (~100 MB each) hot-swap at runtime via the API. Queries over your sensitive data are never refused.
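A hot-swap call reduces to a small JSON request against the running server. The sketch below only builds the request body; the endpoint path, field names, and adapter name are our illustrative assumptions, not the documented Kwyre API — check your deployment's API reference for the actual route.

```python
import json

# Hypothetical request body for swapping a domain LoRA adapter at runtime.
# Field names and the adapter name "legal" are assumptions for illustration.
def build_swap_request(adapter: str) -> str:
    payload = {
        "adapter": adapter,       # one of the 6 domain adapters (~100 MB each)
        "unload_current": True,   # free the previously loaded adapter first
    }
    return json.dumps(payload)

body = build_swap_request("legal")
print(body)  # → {"adapter": "legal", "unload_current": true}
```

Because each adapter is only ~100 MB, swapping is an in-memory operation — the 3.3 GB base model stays resident while domain heads come and go.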

PIPELINE + 4 BACKENDS

Claude → QLoRA → domain GRPO → LoRA export. 300 traces/domain. GPU: NF4/AWQ + Flash Attn 2 + speculative (AMD ROCm / NVIDIA CUDA). vLLM: PagedAttention. CPU: llama.cpp.
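The speedup from speculative decoding follows from simple arithmetic: a small draft model proposes k tokens, the target model verifies them in one forward pass, and accepted tokens come almost for free. The numbers below (draft length, acceptance rate, draft cost) are illustrative assumptions, not Kwyre measurements.

```python
# Back-of-envelope expected speedup for speculative decoding.
# k draft tokens per round, per-token acceptance probability `accept`,
# draft forward pass costing `draft_cost` relative to the target model.
def expected_speedup(k: int, accept: float, draft_cost: float) -> float:
    # Expected tokens emitted per verification round: truncated geometric
    # series, plus the one token the target model always produces itself.
    tokens = (1 - accept ** (k + 1)) / (1 - accept)
    # Relative cost per round: k draft passes + 1 target pass.
    cost = 1 + k * draft_cost
    return tokens / cost

# e.g. k=4 draft tokens, 70% acceptance, draft ~5% of target-model cost
print(round(expected_speedup(4, 0.7, 0.05), 2))  # → 2.31
```

With acceptance rates in the 60–80% range this lands in the 2–3× window quoted for the 0.8B draft model.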

SPIKE QAT + RAG

Straight-Through Estimator spike encoding + k-curriculum annealing. Qwen3.5-0.8B draft for 2–3× speculative speed. RAG: FAISS, RAM-only, crypto-wipe.
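The "crypto-wipe" step can be sketched in pure Python: overwrite the in-RAM buffer before releasing it, so plaintext embeddings never survive in a reusable allocation or hit swap. The buffer layout here is illustrative — the real store is a FAISS index — but the wipe principle is the same.

```python
import secrets

# Illustrative in-RAM store: a mutable buffer standing in for embedding bytes.
# Crypto-wipe = overwrite with CSPRNG output in place before freeing, so the
# original contents are irrecoverable even from a later re-use of the memory.
def crypto_wipe(buf: bytearray) -> None:
    buf[:] = secrets.token_bytes(len(buf))  # in-place overwrite, same length

store = bytearray(64)            # pretend these are embedding bytes
original = bytes(store)
crypto_wipe(store)
assert bytes(store) != original  # contents replaced
assert len(store) == 64          # same allocation size
```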

Installation & Deployment

Every platform. One command.

Choose your platform. We handle the rest.

One-Click Installers

🐧

Linux

Ubuntu/Debian · .deb + AppImage · AMD ROCm

Installs to /opt/kwyre, configures systemd service, sets up iptables rules for process-level network lockdown, creates dedicated kwyre user. Auto-detects AMD GPUs via ROCm.

sudo bash installer/install_linux.sh
sudo systemctl start kwyre
  • iptables rules (L2 kernel enforcement)
  • systemd service management
  • Dedicated kwyre system user
  • Auto-detect AMD GPU (ROCm)
🪟

Windows

Windows 10/11 · .exe + Portable ZIP · NVIDIA CUDA

PowerShell installer configures Windows Firewall rules for process-level network lockdown, registers Windows Service, auto-detects NVIDIA GPUs via CUDA. Requires NVIDIA CUDA 12.4+ drivers.

.\installer\install_windows.ps1
Start-Service kwyre
  • Windows Firewall rules (L2 enforcement)
  • Windows Service management
  • NVIDIA CUDA 12.4+ auto-detection
  • Docker Desktop + WSL2 supported
🍎

macOS (Apple Silicon)

macOS 12+ · Python 3.10+ · .pkg + Tarball

Native installer for Apple Silicon Macs. Configures PF firewall isolation, registers launchd service, and auto-detects M-series chips for MLX acceleration. Unified memory means no VRAM limits.

sudo ./install_macos.sh
sudo launchctl load /Library/LaunchDaemons/com.kwyre.plist
  • MLX native Metal acceleration
  • PF firewall isolation (L2 enforcement)
  • launchd service management
  • .pkg installer + tarball
😈

FreeBSD

FreeBSD 13+ · Python 3.10+ · .txz + Tarball

FreeBSD installer configures PF firewall rules for process-level network lockdown, registers rc.d service, and auto-detects NVIDIA GPUs via CUDA. Native FreeBSD package for clean system integration.

sudo ./install_freebsd.sh
sudo service kwyre start
  • PF firewall isolation (L2 enforcement)
  • rc.d service management
  • NVIDIA CUDA auto-detection
  • .txz package + tarball

Deployment Options

Docker Compose

Recommended for isolated deployments

Non-root container with a dedicated kwyre user. The port mapping binds only to 127.0.0.1:8000 on the host. Models auto-download on first run.

git clone https://github.com/blablablasealsaresoft/kwyre-ai
cd kwyre-ai && cp .env.example .env
docker compose up

Direct Python

For development and customization

Install inference-only dependencies. Place pre-quantized models in dist/. Full access to all source code for audit.

pip install -r requirements-inference.txt
python server/serve_local_4bit.py

Kwyre Air (CPU)

No GPU required — any hardware

Uses llama.cpp via llama-cpp-python. Convert models to GGUF format or use pre-built GGUF from kwyre.com. Same API, same security stack.

KWYRE_GGUF_PATH=./models/kwyre-4b.gguf \
  python server/serve_cpu.py
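Before launching, it can help to validate the model path the server will read. A minimal preflight sketch using only the KWYRE_GGUF_PATH variable shown above — the helper name is ours, not part of the product:

```python
import os
from pathlib import Path

def gguf_preflight(env=os.environ) -> Path:
    """Validate KWYRE_GGUF_PATH before handing it to the llama.cpp backend."""
    raw = env.get("KWYRE_GGUF_PATH")
    if not raw:
        raise SystemExit("KWYRE_GGUF_PATH is not set")
    path = Path(raw)
    if path.suffix != ".gguf":
        raise SystemExit(f"not a GGUF file: {path}")
    if not path.is_file():
        raise SystemExit(f"model file not found: {path}")
    return path
```

Failing fast here beats a cryptic load error deep inside llama-cpp-python.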

MLX (Apple Silicon)

Metal-optimized · M1/M2/M3/M4

Uses mlx-lm for native Metal inference on Apple Silicon Macs. Unified memory architecture means no VRAM limits — the model uses system RAM directly. Same API, same security stack.

python server/serve_mlx.py
  • Native Metal acceleration
  • M1/M2/M3/M4 unified memory
  • No CUDA dependency
  • Same 6-layer security stack

vLLM (High-Throughput)

PagedAttention · Continuous Batching · Multi-user

Production-grade serving with vLLM. PagedAttention dramatically increases throughput for multi-user deployments. Continuous batching handles concurrent requests without GPU memory waste.

KWYRE_BACKEND=vllm python server/serve_vllm.py
  • PagedAttention memory management
  • Continuous batching
  • Multi-user concurrent requests
  • Tensor parallelism (KWYRE_VLLM_TENSOR_PARALLEL)
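PagedAttention's saving comes from allocating the KV cache in fixed-size blocks on demand instead of reserving the full context window up front. A back-of-envelope sketch — the model dimensions are illustrative assumptions, not Kwyre's actual configuration; the 16-token block size matches vLLM's default:

```python
# KV-cache memory per sequence: naive full-context preallocation vs.
# block-on-demand (PagedAttention-style). Dimensions are illustrative.
def kv_bytes(tokens, layers=36, kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2 tensors (K and V) per layer, one vector per token per KV head
    return tokens * layers * kv_heads * head_dim * dtype_bytes * 2

BLOCK = 16  # tokens per block (vLLM default block size)

def paged_kv_bytes(tokens, **kw):
    blocks = -(-tokens // BLOCK)          # ceil division: blocks actually used
    return kv_bytes(blocks * BLOCK, **kw)

ctx = 32768    # preallocated context window
used = 300     # tokens a typical request actually occupies
print(f"naive: {kv_bytes(ctx) / 2**20:.0f} MiB  "
      f"paged: {paged_kv_bytes(used) / 2**20:.1f} MiB")
```

The gap between those two numbers is the memory that continuous batching can hand to other concurrent requests instead of wasting on unused reservation.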

Kubernetes / Helm

GPU scheduling · Health probes · PVC · Secrets

Full Helm chart for Kubernetes deployments. Includes GPU resource scheduling, liveness/readiness probes, persistent volume claims for model storage, and Kubernetes secrets for API keys.

helm install kwyre ./deploy/helm/kwyre
  • GPU resource requests/limits
  • Liveness + readiness probes
  • PVC for model storage
  • Kubernetes secrets

Build from Source

What gets compiled vs. what stays as data

Nuitka compiles Python to C and builds a standalone executable. Data files are bundled alongside.

Compiled (Protected)

  • serve_local_4bit.py — inference server, API, security
  • tools.py — external API tool router
  • verify_deps.py — L3 dependency integrity
  • license.py — Ed25519 license validation
  • spike_serve.py — SpikeServe encoding

Data (Not Compiled)

  • chat/*.html — frontend UI (served as-is)
  • docs/ — compliance documentation
  • .env.example — configuration template
  • Model weights (.safetensors) — loaded at runtime
pip install nuitka ordered-set zstandard
python build.py all        # Compile + package + installer
python build.py compile    # Nuitka → kwyre-server
python build.py package    # Stage data files
python build.py installer  # Linux installer
python build.py clean      # Clean artifacts
| Platform | Format | Installer Type |
|---|---|---|
| Linux | .deb / .AppImage | Debian package + AppImage |
| Windows | .exe / .zip | Installer + Portable ZIP |
| macOS | .pkg / .tar.gz | Package installer + Tarball |
| FreeBSD | .txz / .tar.gz | FreeBSD package + Tarball |

Ready to deploy.

3.3 GB download. One command. Linux, Windows, macOS & FreeBSD native. Six layers of security.