Platform & Deployment

Linux. Windows. macOS. FreeBSD.
One command.

Native installers for Linux (AMD ROCm), Windows (NVIDIA CUDA), macOS (Apple Silicon MLX), and FreeBSD (NVIDIA CUDA). Docker, Helm, vLLM, Direct Python. Every product runs fully air-gapped — no cloud calls, no telemetry, no data leaves your machine. Fine-tuning pipeline and benchmark suite included.

Choose Your Product

Every product. Every platform.

Same security stack. Same API. All products air-gapped by design. Different hardware targets.

| Product | Hardware | Model | Install Method | Air-Gapped |
|---|---|---|---|---|
| Kwyre Personal | AMD GPU (RX 7900 XT+) / NVIDIA GPU (RTX 3060+) | Qwen3.5-4B Uncensored NF4 + 0.8B draft (3.3 GB) — 0/465 refusals | docker compose up or installer | ✓ Always |
| Kwyre Professional | AMD GPU (RX 7900 XTX / MI210) / NVIDIA GPU (RTX 4090 / A100 / H100) | Qwen3.5-4B Uncensored + 6 LoRA hot-swap adapters + GRPO (3.3 GB) | docker compose up or installer | ✓ Always |
| Kwyre Air | Any CPU (8GB+ RAM) | GGUF (KWYRE_GGUF_PATH) | python server/serve_cpu.py | ✓ Always |
| vLLM Backend | AMD / NVIDIA GPU (multi-GPU supported) | Qwen3.5-4B Uncensored or custom — PagedAttention + continuous batching | KWYRE_BACKEND=vllm python server/serve_vllm.py | ✓ Always |
| Kwyre Apple Silicon | Apple M1/M2/M3/M4 (macOS 12+) | Any MLX model — Metal-optimized, unified memory | python server/serve_mlx.py | ✓ Always |
| FreeBSD | NVIDIA GPU (FreeBSD 13+) | Qwen3.5-4B Uncensored NF4 + GGUF — CUDA or CPU | sudo ./install_freebsd.sh | ✓ Always |
| Custom LLM | Any (we configure) | Custom-trained for your domain | Turnkey delivery | ✓ Always |
UNCENSORED + HOT-SWAP

Qwen3.5-4B Uncensored — 0/465 refusals. Six domain LoRA adapters (~100 MB each) hot-swap at runtime via the API. Queries over your sensitive data are never refused.
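A hot-swap call reduces to a small JSON request against the running server. The sketch below only builds the request body; the endpoint path, field names, and adapter name are our illustrative assumptions, not the documented Kwyre API — check your deployment's API reference for the actual route.

```python
import json

# Hypothetical request body for swapping a domain LoRA adapter at runtime.
# Field names and the adapter name "legal" are assumptions for illustration.
def build_swap_request(adapter: str) -> str:
    payload = {
        "adapter": adapter,       # one of the 6 domain adapters (~100 MB each)
        "unload_current": True,   # free the previously loaded adapter first
    }
    return json.dumps(payload)

body = build_swap_request("legal")
print(body)  # → {"adapter": "legal", "unload_current": true}
```

Because each adapter is only ~100 MB, swapping is an in-memory operation — the 3.3 GB base model stays resident while domain heads come and go.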

PIPELINE + 4 BACKENDS

Claude → QLoRA → domain GRPO → LoRA export. 300 traces/domain. GPU: NF4/AWQ + Flash Attn 2 + speculative (AMD ROCm / NVIDIA CUDA). vLLM: PagedAttention. CPU: llama.cpp.
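The speedup from speculative decoding follows from simple arithmetic: a small draft model proposes k tokens, the target model verifies them in one forward pass, and accepted tokens come almost for free. The numbers below (draft length, acceptance rate, draft cost) are illustrative assumptions, not Kwyre measurements.

```python
# Back-of-envelope expected speedup for speculative decoding.
# k draft tokens per round, per-token acceptance probability `accept`,
# draft forward pass costing `draft_cost` relative to the target model.
def expected_speedup(k: int, accept: float, draft_cost: float) -> float:
    # Expected tokens emitted per verification round: truncated geometric
    # series, plus the one token the target model always produces itself.
    tokens = (1 - accept ** (k + 1)) / (1 - accept)
    # Relative cost per round: k draft passes + 1 target pass.
    cost = 1 + k * draft_cost
    return tokens / cost

# e.g. k=4 draft tokens, 70% acceptance, draft ~5% of target-model cost
print(round(expected_speedup(4, 0.7, 0.05), 2))  # → 2.31
```

With acceptance rates in the 60–80% range this lands in the 2–3× window quoted for the 0.8B draft model.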

SPIKE QAT + RAG

Straight-Through Estimator spike encoding + k-curriculum annealing. Qwen3.5-0.8B draft for 2–3× speculative speed. RAG: FAISS, RAM-only, crypto-wipe.
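The "crypto-wipe" step can be sketched in pure Python: overwrite the in-RAM buffer before releasing it, so plaintext embeddings never survive in a reusable allocation or hit swap. The buffer layout here is illustrative — the real store is a FAISS index — but the wipe principle is the same.

```python
import secrets

# Illustrative in-RAM store: a mutable buffer standing in for embedding bytes.
# Crypto-wipe = overwrite with CSPRNG output in place before freeing, so the
# original contents are irrecoverable even from a later re-use of the memory.
def crypto_wipe(buf: bytearray) -> None:
    buf[:] = secrets.token_bytes(len(buf))  # in-place overwrite, same length

store = bytearray(64)            # pretend these are embedding bytes
original = bytes(store)
crypto_wipe(store)
assert bytes(store) != original  # contents replaced
assert len(store) == 64          # same allocation size
```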

Installation & Deployment

Every platform. One command.

Choose your platform. We handle the rest.

One-Click Installers

🐧

Linux

Ubuntu/Debian · .deb + AppImage · AMD ROCm

Installs to /opt/kwyre, configures systemd service, sets up iptables rules for process-level network lockdown, creates dedicated kwyre user. Auto-detects AMD GPUs via ROCm.

sudo bash installer/install_linux.sh
sudo systemctl start kwyre
  • iptables rules (L2 kernel enforcement)
  • systemd service management
  • Dedicated kwyre system user
  • Auto-detect AMD GPU (ROCm)
🪟

Windows

Windows 10/11 · .exe + Portable ZIP · NVIDIA CUDA

PowerShell installer configures Windows Firewall rules for process-level network lockdown, registers Windows Service, auto-detects NVIDIA GPUs via CUDA. Requires NVIDIA CUDA 12.4+ drivers.

.\installer\install_windows.ps1
Start-Service kwyre
  • Windows Firewall rules (L2 enforcement)
  • Windows Service management
  • NVIDIA CUDA 12.4+ auto-detection
  • Docker Desktop + WSL2 supported
🍎

macOS (Apple Silicon)

macOS 12+ · Python 3.10+ · .pkg + Tarball

Native installer for Apple Silicon Macs. Configures PF firewall isolation, registers launchd service, and auto-detects M-series chips for MLX acceleration. Unified memory means no VRAM limits.

sudo ./install_macos.sh
sudo launchctl load /Library/LaunchDaemons/com.kwyre.plist
  • MLX native Metal acceleration
  • PF firewall isolation (L2 enforcement)
  • launchd service management
  • .pkg installer + tarball
😈

FreeBSD

FreeBSD 13+ · Python 3.10+ · .txz + Tarball

FreeBSD installer configures PF firewall rules for process-level network lockdown, registers rc.d service, and auto-detects NVIDIA GPUs via CUDA. Native FreeBSD package for clean system integration.

sudo ./install_freebsd.sh
sudo service kwyre start
  • PF firewall isolation (L2 enforcement)
  • rc.d service management
  • NVIDIA CUDA auto-detection
  • .txz package + tarball

Deployment Options

Docker Compose

Recommended for isolated deployments

Non-root container with a dedicated kwyre user. The port mapping binds only to 127.0.0.1:8000 on the host. Models auto-download on first run.

git clone https://github.com/blablablasealsaresoft/kwyre-ai
cd kwyre-ai && cp .env.example .env
docker compose up

Direct Python

For development and customization

Install inference-only dependencies. Place pre-quantized models in dist/. Full access to all source code for audit.

pip install -r requirements-inference.txt
python server/serve_local_4bit.py

Kwyre Air (CPU)

No GPU required — any hardware

Uses llama.cpp via llama-cpp-python. Convert models to GGUF format or use pre-built GGUF from kwyre.com. Same API, same security stack.

KWYRE_GGUF_PATH=./models/kwyre-4b.gguf \
  python server/serve_cpu.py
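Before launching, it can help to validate the model path the server will read. A minimal preflight sketch using only the KWYRE_GGUF_PATH variable shown above — the helper name is ours, not part of the product:

```python
import os
from pathlib import Path

def gguf_preflight(env=os.environ) -> Path:
    """Validate KWYRE_GGUF_PATH before handing it to the llama.cpp backend."""
    raw = env.get("KWYRE_GGUF_PATH")
    if not raw:
        raise SystemExit("KWYRE_GGUF_PATH is not set")
    path = Path(raw)
    if path.suffix != ".gguf":
        raise SystemExit(f"not a GGUF file: {path}")
    if not path.is_file():
        raise SystemExit(f"model file not found: {path}")
    return path
```

Failing fast here beats a cryptic load error deep inside llama-cpp-python.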

MLX (Apple Silicon)

Metal-optimized · M1/M2/M3/M4

Uses mlx-lm for native Metal inference on Apple Silicon Macs. Unified memory architecture means no VRAM limits — the model uses system RAM directly. Same API, same security stack.

python server/serve_mlx.py
  • Native Metal acceleration
  • M1/M2/M3/M4 unified memory
  • No CUDA dependency
  • Same 6-layer security stack

vLLM (High-Throughput)

PagedAttention · Continuous Batching · Multi-user

Production-grade serving with vLLM. PagedAttention dramatically increases throughput for multi-user deployments. Continuous batching handles concurrent requests without GPU memory waste.

KWYRE_BACKEND=vllm python server/serve_vllm.py
  • PagedAttention memory management
  • Continuous batching
  • Multi-user concurrent requests
  • Tensor parallelism (KWYRE_VLLM_TENSOR_PARALLEL)
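PagedAttention's saving comes from allocating the KV cache in fixed-size blocks on demand instead of reserving the full context window up front. A back-of-envelope sketch — the model dimensions are illustrative assumptions, not Kwyre's actual configuration; the 16-token block size matches vLLM's default:

```python
# KV-cache memory per sequence: naive full-context preallocation vs.
# block-on-demand (PagedAttention-style). Dimensions are illustrative.
def kv_bytes(tokens, layers=36, kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2 tensors (K and V) per layer, one vector per token per KV head
    return tokens * layers * kv_heads * head_dim * dtype_bytes * 2

BLOCK = 16  # tokens per block (vLLM default block size)

def paged_kv_bytes(tokens, **kw):
    blocks = -(-tokens // BLOCK)          # ceil division: blocks actually used
    return kv_bytes(blocks * BLOCK, **kw)

ctx = 32768    # preallocated context window
used = 300     # tokens a typical request actually occupies
print(f"naive: {kv_bytes(ctx) / 2**20:.0f} MiB  "
      f"paged: {paged_kv_bytes(used) / 2**20:.1f} MiB")
```

The gap between those two numbers is the memory that continuous batching can hand to other concurrent requests instead of wasting on unused reservation.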

Kubernetes / Helm

GPU scheduling · Health probes · PVC · Secrets

Full Helm chart for Kubernetes deployments. Includes GPU resource scheduling, liveness/readiness probes, persistent volume claims for model storage, and Kubernetes secrets for API keys.

helm install kwyre ./deploy/helm/kwyre
  • GPU resource requests/limits
  • Liveness + readiness probes
  • PVC for model storage
  • Kubernetes secrets

Build from Source

What gets compiled vs. what stays as data

Nuitka compiles Python to C and builds a standalone executable. Data files are bundled alongside.

Compiled (Protected)

  • serve_local_4bit.py — inference server, API, security
  • tools.py — external API tool router
  • verify_deps.py — L3 dependency integrity
  • license.py — Ed25519 license validation
  • spike_serve.py — SpikeServe encoding

Data (Not Compiled)

  • chat/*.html — frontend UI (served as-is)
  • docs/ — compliance documentation
  • .env.example — configuration template
  • Model weights (.safetensors) — loaded at runtime
pip install nuitka ordered-set zstandard
python build.py all        # Compile + package + installer
python build.py compile    # Nuitka → kwyre-server
python build.py package    # Stage data files
python build.py installer  # Linux installer
python build.py clean      # Clean artifacts
| Platform | Format | Installer Type |
|---|---|---|
| Linux | .deb / .AppImage | Debian package + AppImage |
| Windows | .exe / .zip | Installer + Portable ZIP |
| macOS | .pkg / .tar.gz | Package installer + Tarball |
| FreeBSD | .txz / .tar.gz | FreeBSD package + Tarball |

Ready to deploy.

3.3 GB download. One command. Linux, Windows, macOS & FreeBSD native. Six layers of security.