Native installers for Linux (AMD ROCm), Windows (NVIDIA CUDA), macOS (Apple Silicon MLX), and FreeBSD (NVIDIA CUDA). Docker, Helm, vLLM, Direct Python. Every product runs fully air-gapped — no cloud calls, no telemetry, no data leaves your machine. Fine-tuning pipeline and benchmark suite included.
Same security stack. Same API. All products air-gapped by design. Different hardware targets.
| Product | Hardware | Model | Install Method | Air-Gapped |
|---|---|---|---|---|
| Kwyre Personal | AMD GPU (RX 7900 XT+) / NVIDIA GPU (RTX 3060+) | Qwen3.5-4B Uncensored NF4 + 0.8B draft (3.3 GB) — 0/465 refusals | `docker compose up` or installer | ✓ Always |
| Kwyre Professional | AMD GPU (RX 7900 XTX / MI210) / NVIDIA GPU (RTX 4090 / A100 / H100) | Qwen3.5-4B Uncensored + 6 LoRA hot-swap adapters + GRPO (3.3 GB) | `docker compose up` or installer | ✓ Always |
| Kwyre Air | Any CPU (8GB+ RAM) | GGUF (`KWYRE_GGUF_PATH`) | `python server/serve_cpu.py` | ✓ Always |
| vLLM Backend | AMD / NVIDIA GPU (multi-GPU supported) | Qwen3.5-4B Uncensored or custom — PagedAttention + continuous batching | `KWYRE_BACKEND=vllm python server/serve_vllm.py` | ✓ Always |
| Kwyre Apple Silicon | Apple M1/M2/M3/M4 (macOS 12+) | Any MLX model — Metal-optimized, unified memory | `python server/serve_mlx.py` | ✓ Always |
| FreeBSD | NVIDIA GPU (FreeBSD 13+) | Qwen3.5-4B Uncensored NF4 + GGUF — CUDA or CPU | `sudo ./install_freebsd.sh` | ✓ Always |
| Custom LLM | Any (we configure) | Custom-trained for your domain | Turnkey delivery | ✓ Always |
Qwen3.5-4B Uncensored — 0/465 refusals. Six domain LoRA adapters (~100 MB each) hot-swap at runtime via the API. The model never refuses to analyze your sensitive data.
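Adapter hot-swap goes through the HTTP API. A minimal sketch of what such a call could look like, assuming a hypothetical `/v1/adapters/activate` route and `adapter` field (the actual route and payload schema are whatever the server documents):

```python
import json

def build_adapter_swap_request(adapter_name, base_url="http://127.0.0.1:8000"):
    """Return (url, body) for a runtime LoRA adapter hot-swap call.

    The route and field name below are illustrative assumptions,
    not the documented Kwyre API.
    """
    url = f"{base_url}/v1/adapters/activate"   # assumed route
    body = json.dumps({"adapter": adapter_name})
    return url, body

url, body = build_adapter_swap_request("legal")
```

Because adapters are ~100 MB, swapping one is a metadata-sized request: the server loads the new weights in place while the 3.3 GB base model stays resident.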
Fine-tuning pipeline: Claude → QLoRA → domain GRPO → LoRA export, with 300 traces per domain. GPU inference: NF4/AWQ quantization + Flash Attention 2 + speculative decoding (AMD ROCm / NVIDIA CUDA). vLLM: PagedAttention. CPU: llama.cpp.
Straight-Through Estimator spike encoding with k-curriculum annealing. A Qwen3.5-0.8B draft model provides 2–3× speculative-decoding speedup. RAG: FAISS, held in RAM only, crypto-wiped.
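The Straight-Through Estimator idea: the forward pass applies a hard spike threshold (whose gradient is zero almost everywhere), and the backward pass passes the gradient through as if the threshold were the identity, gated to a surrogate window around the threshold. A minimal NumPy illustration; the threshold and window values are illustrative, not Kwyre's actual parameters:

```python
import numpy as np

def spike_forward(x, theta=0.5):
    # Hard threshold: emits a binary spike; true gradient is zero a.e.
    return (x > theta).astype(np.float32)

def spike_backward_ste(grad_out, x, theta=0.5, window=1.0):
    # Straight-through: treat the threshold as identity on the backward
    # pass, but gate the gradient to a window around theta so inputs far
    # from the decision boundary receive no signal. A k-curriculum would
    # anneal parameters like this window over the course of training.
    gate = (np.abs(x - theta) < window).astype(np.float32)
    return grad_out * gate

x = np.array([0.1, 0.6, 2.0], dtype=np.float32)
spikes = spike_forward(x)
grads = spike_backward_ste(np.ones_like(x), x)
```

The surrogate gradient is what makes the binary spike code trainable end to end despite the non-differentiable forward pass.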
Choose your platform. We handle the rest.
Installs to `/opt/kwyre`, configures a systemd service, sets up iptables rules for process-level network lockdown, and creates a dedicated `kwyre` user. Auto-detects AMD GPUs via ROCm.
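Process-level lockdown of this kind is typically done with iptables owner matching. A sketch of the sort of rules involved; the exact rules the installer writes are an assumption here, not its actual output:

```shell
# Allow the kwyre user loopback traffic (the local API on 127.0.0.1:8000),
# then drop everything else that user's processes try to send.
iptables -A OUTPUT -o lo -m owner --uid-owner kwyre -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner kwyre -j DROP
```

Pinning the rules to the dedicated user means the lockdown follows the inference processes themselves, not a port or interface.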
PowerShell installer configures Windows Firewall rules for process-level network lockdown, registers Windows Service, auto-detects NVIDIA GPUs via CUDA. Requires NVIDIA CUDA 12.4+ drivers.
Native installer for Apple Silicon Macs. Configures PF firewall isolation, registers launchd service, and auto-detects M-series chips for MLX acceleration. Unified memory means no VRAM limits.
FreeBSD installer configures PF firewall rules for process-level network lockdown, registers rc.d service, and auto-detects NVIDIA GPUs via CUDA. Native FreeBSD package for clean system integration.
Non-root container with a dedicated kwyre user. Port mapping binds the API to 127.0.0.1:8000 on the host. Models auto-download on first run.
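The loopback-only exposure corresponds to a Compose port mapping with an explicit host IP. A sketch; the service and image names are placeholders, not the shipped compose file:

```yaml
services:
  kwyre:
    image: kwyre/personal:latest     # placeholder image name
    user: kwyre                      # non-root dedicated user
    ports:
      - "127.0.0.1:8000:8000"        # bind to loopback only, never 0.0.0.0
```

Without the `127.0.0.1:` prefix, Docker publishes the port on all host interfaces, which would defeat the local-only posture.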
Install the inference-only dependencies and place pre-quantized models in `dist/`. Full access to all source code for audit.
Uses llama.cpp via llama-cpp-python. Convert models to GGUF format or use pre-built GGUF from kwyre.com. Same API, same security stack.
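The model location comes from `KWYRE_GGUF_PATH` (see the table above). A minimal sketch of resolving it before handing the path to llama-cpp-python; the fallback path is illustrative:

```python
import os

def resolve_gguf_path(default="dist/model.gguf"):
    # KWYRE_GGUF_PATH overrides the default dist/ location when set.
    return os.environ.get("KWYRE_GGUF_PATH", default)

# Hypothetical hand-off to llama-cpp-python (not executed here):
# from llama_cpp import Llama
# llm = Llama(model_path=resolve_gguf_path())
```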
Uses mlx-lm for native Metal inference on Apple Silicon Macs. Unified memory architecture means no VRAM limits — the model uses system RAM directly. Same API, same security stack.
Production-grade serving with vLLM. PagedAttention dramatically increases throughput for multi-user deployments. Continuous batching handles concurrent requests without GPU memory waste.
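Continuous batching means the scheduler admits queued requests into the running batch the moment other sequences finish, instead of waiting for a whole batch to drain. A toy step-counting scheduler to illustrate the idea; this is a simplification, not vLLM's implementation:

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """requests: list of (name, n_decode_steps). Returns total steps taken.

    Each step, finished sequences leave the batch and waiting requests
    are admitted immediately -- no slot idles waiting for stragglers.
    """
    queue = deque(requests)
    batch = {}            # name -> remaining decode steps
    steps = 0
    while queue or batch:
        while queue and len(batch) < max_batch:
            name, n = queue.popleft()
            batch[name] = n
        steps += 1
        batch = {k: v - 1 for k, v in batch.items() if v > 1}
    return steps
```

For a toy workload of lengths (2, 5, 3, 1, 4) with two slots, static batches padded to the longest member would take 5 + 3 + 4 = 12 steps; the continuous scheduler finishes in 9. PagedAttention complements this by allocating KV cache in pages so admitted sequences do not need contiguous preallocated memory.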
Full Helm chart for Kubernetes deployments. Includes GPU resource scheduling, liveness/readiness probes, persistent volume claims for model storage, and Kubernetes secrets for API keys.
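In a Helm values file, GPU scheduling, probes, and the model-storage PVC typically look like the following. The key names here are a generic sketch, not the chart's actual schema:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1              # GPU scheduling via the NVIDIA device plugin
livenessProbe:
  httpGet:
    path: /health                  # assumed health route
    port: 8000
persistence:
  enabled: true
  size: 20Gi                       # PVC sized for model storage
```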
| Platform | Format | Installer Type |
|---|---|---|
| Linux | .deb / .AppImage | Debian package + AppImage |
| Windows | .exe / .zip | Installer + Portable ZIP |
| macOS | .pkg / .tar.gz | Package installer + Tarball |
| FreeBSD | .txz / .tar.gz | FreeBSD package + Tarball |