Are you the model seller? Model sellers automatically have access to their own models (public or private) without purchase. You can skip directly to the Deploy Model section.

Buy Model

Before purchasing, make sure you have completed the Account Setup.

Want to try it out first? Use our free demo model koalavault/qwen3-0.6b (0 USDT). Complete steps 1-2 (Browse & Select Model, Create Order), skip payment steps 3-4 (Make the Payment, Submit Transaction ID), then jump to Download Model.

Browse & Select Model

Browse the marketplace to discover available models
Find a model that fits your needs (e.g., koalavault/qwen3-0.6b for testing)
Click on the model to view details, pricing, and specifications

Create Order

On the model page, click Purchase in the top right corner
Select your preferred pricing plan
Review the order details and total amount
Click Create Order

First-time buyer? We recommend using anonymous payment for your first purchase. This allows you to pay directly from exchanges like Binance without setting up a wallet first.

Make the Payment

After creating your order, you’ll see the payment address and instructions on the order detail page. Send the USDT payment on the BSC network using a supported exchange or wallet.

Payment Requirements:

Must use USDT (Tether) cryptocurrency
Must be on BSC (BNB Smart Chain) network
Token standard: BEP-20

New to Crypto Payments?

Don’t worry! Follow our beginner-friendly payment guide with step-by-step instructions for Binance, crypto wallets, and more.

Submit Transaction ID

After completing your payment, you’ll receive a Transaction ID (txid) from your exchange or wallet. Submit this to KoalaVault for verification:

Find your order: Go to your order detail page or Subscriptions to view pending orders
Submit txid: Paste your txid (starts with 0x...) and click Confirm Payment

Once verified on the blockchain (usually 1-5 minutes), you’ll see an active subscription for the model on your Subscriptions page. This means the model is now available for download and deployment.
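A malformed txid is an easy mistake when copying from an exchange, so it can help to check its shape locally before submitting. The sketch below assumes the usual EVM-style transaction hash used on BSC (0x followed by 64 hexadecimal characters). The function name is_valid_txid is hypothetical and not part of KoalaVault; the check says nothing about whether the payment itself was correct.

```shell
# Hypothetical helper, not part of KoalaVault: checks the txid shape locally.
# Assumes an EVM-style transaction hash: "0x" followed by 64 hex characters.
is_valid_txid() {
  printf '%s\n' "$1" | grep -Eq '^0x[0-9a-fA-F]{64}$'
}

# Usage:
#   is_valid_txid "$TXID" && echo "looks like a transaction hash"
```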

Download Model

Install Required Tools

Install Koava with HuggingFace support, which includes the hf command for downloading models:

pip install -U "koava[huggingface]"

Download Model

Go to your purchased model’s detail page on the KoalaVault platform
Click the Deploy tab and copy the download command
Run the command to download the encrypted model:

hf download <PUBLISHER_USERNAME>/<MODEL_NAME> --local-dir ./models/<MODEL_NAME>

Replace <PUBLISHER_USERNAME>/<MODEL_NAME> with your actual model identifier from the Deploy tab.

Example (using the free demo model):

hf download koalavault/qwen3-0.6b --local-dir ./models/qwen3-0.6b
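Since KoalaVault only supports safetensors models, a quick sanity check after the download is to confirm that the target directory contains at least one .safetensors file. This is a minimal sketch: the helper name has_safetensors is hypothetical, and it assumes the weight files sit at the top level of the model directory.

```shell
# Hypothetical helper: succeeds if the directory holds at least one
# .safetensors file at its top level (an assumption about the layout).
has_safetensors() {
  for f in "$1"/*.safetensors; do
    [ -f "$f" ] && return 0
  done
  return 1
}

# Usage:
#   has_safetensors ./models/qwen3-0.6b || echo "no safetensors files found" >&2
```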

Deploy Model

Before deploying, ensure you have Docker installed on your system (Install Docker). GPU support is recommended for optimal performance.
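The prerequisite check can be scripted. The sketch below lists any required commands that are missing from PATH; the helper name missing_tools is hypothetical, and which tools you pass to it is up to you.

```shell
# Hypothetical helper: prints each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (prints nothing when both tools are installed):
#   missing_tools docker hf
```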

For detailed system requirements and hardware specifications, see the vLLM Installation Guide.

Currently, KoalaVault only supports:

Models in safetensors format (other formats like GGUF are not supported yet)
Deployment via vLLM inference engine only

Pull the Docker Image

Pull the KoalaVault enhanced vLLM docker image for your architecture:

GPU (x86/arm64):

docker pull koalavault/vllm-openai:latest

CPU (x86/arm64):

docker pull koalavault/vllm-cpu:latest

Apple Silicon (arm):

docker pull koalavault/vllm-cpu:latest

Get API Key and Set Environment Variable

Generate your KoalaVault API key
Set the API key as an environment variable:

export KOALAVAULT_API_KEY=sk-your-api-key-here
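If the variable is unset, the docker run commands in the next step expand $KOALAVAULT_API_KEY to an empty string and pass an empty key into the container. A small guard, sketched here with the hypothetical name require_api_key, catches that before the container starts.

```shell
# Hypothetical helper: fails with a message when KOALAVAULT_API_KEY is
# unset or empty, so the problem surfaces before docker run.
require_api_key() {
  if [ -z "${KOALAVAULT_API_KEY:-}" ]; then
    echo "KOALAVAULT_API_KEY is not set" >&2
    return 1
  fi
}

# Usage:
#   require_api_key && echo "API key is present"
```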

Deploy with Docker

Run the Docker container with the downloaded model:

In the commands below, <PUBLISHER_KOALAVAULT_USERNAME> is the KoalaVault username of the model publisher (the seller of the model), which is not necessarily the publisher’s username on HuggingFace.

Docker Mount Limitation: Due to security restrictions, the entire ./models directory must be mounted to /models in the container. You cannot mount individual model subdirectories. This ensures proper model decryption and security enforcement.
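Because the parent directory is what gets mounted, the value passed to --model is the in-container path /models/<MODEL_NAME>, not the local path. The sketch below derives both values from a local model directory. The helper names are hypothetical, and it assumes the model sits directly under the mounted directory, as in ./models/<MODEL_NAME>.

```shell
# Hypothetical helpers: derive the docker arguments from a local model directory.
# Assumes the layout ./models/<MODEL_NAME> used throughout this guide.

# Directory to mount at /models (the parent of the model directory).
models_root() {
  dirname "$1"
}

# Path of the model as seen inside the container.
container_model_path() {
  printf '/models/%s\n' "$(basename "$1")"
}

# Usage:
#   models_root ./models/qwen3-0.6b            # value for the left side of -v
#   container_model_path ./models/qwen3-0.6b   # value for --model
```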

Commands for each platform follow: Nvidia GPU (x86/arm64), CPU (x86/arm64), and Apple Silicon (arm).

For GPU deployment details, see vLLM Docker Deployment Guide.

docker run --runtime nvidia --gpus all \
  -v ./models:/models \
  -p 8000:8000 \
  --ipc=host \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add DAC_OVERRIDE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-openai:latest \
  --koalavault-model <PUBLISHER_KOALAVAULT_USERNAME>/<MODEL_NAME> \
  --model /models/<MODEL_NAME> \
  --max-model-len=2048 

Example:

docker run --runtime nvidia --gpus all \
  -v ./models:/models \
  -p 8000:8000 \
  --ipc=host \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add DAC_OVERRIDE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-openai:latest \
  --koalavault-model koalavault/qwen3-0.6b \
  --model /models/qwen3-0.6b \
  --max-model-len=2048 

For CPU deployment details, see vLLM CPU Installation Guide for x86 and ARM AArch64 for ARM.

docker run --rm \
  -v ./models:/models \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
  -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add DAC_OVERRIDE \
  --cap-add SYS_NICE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-cpu:latest \
  --koalavault-model <PUBLISHER_KOALAVAULT_USERNAME>/<MODEL_NAME> \
  --model /models/<MODEL_NAME> \
  --max-model-len=2048 

Example:

docker run --rm \
  -v ./models:/models \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-3 \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add DAC_OVERRIDE \
  --cap-add SYS_NICE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-cpu:latest \
  --koalavault-model koalavault/qwen3-0.6b \
  --model /models/qwen3-0.6b \
  --max-model-len=2048 

For Apple Silicon deployment details, see vLLM Apple Silicon Installation Guide.

docker run --rm \
  -v ./models:/models \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
  -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add SYS_NICE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-cpu:latest \
  --koalavault-model <PUBLISHER_KOALAVAULT_USERNAME>/<MODEL_NAME> \
  --model /models/<MODEL_NAME> \
  --max-model-len=2048 

Example (using the pre-downloaded demo model):

docker run --rm \
  -v ./models:/models \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-3 \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add SYS_NICE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-cpu:latest \
  --koalavault-model koalavault/qwen3-0.6b \
  --model /models/qwen3-0.6b \
  --max-model-len=2048 

Test Your Deployment

Once the container is running, you can test it with a simple request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/qwen3-0.6b",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence"}
    ],
    "max_tokens": 50
  }'

You should see a JSON response with the model’s generated text.
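The server typically does not accept requests until the model has finished loading, so a request sent right after docker run may be refused. The sketch below polls the /v1/models endpoint of the OpenAI-compatible API until it answers. The helper name wait_for_server and the retry values are hypothetical; adjust them to your hardware.

```shell
# Hypothetical helper: polls the OpenAI-compatible model list until it answers.
# Arguments: base URL, number of attempts, seconds to sleep between attempts.
wait_for_server() {
  url=$1 attempts=$2 delay=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors as well as connection errors.
    if curl -fsS -o /dev/null "$url/v1/models" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_server http://localhost:8000 30 2 && echo "server is ready"
```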

For more API examples and usage patterns, see the vLLM OpenAI-Compatible API documentation.

Understanding Model Encryption

Curious what happens without proper authentication/decryption? There are two scenarios to understand:

KoalaVault images without authorization: Will fail to start with “No suitable decryption key available” error
Standard vLLM images with encrypted models: Will start but produce gibberish or crash during inference
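To tell scenario 1 apart from other startup failures, you can search the container logs for the error text quoted above. This is a sketch only: the helper name has_decryption_error is hypothetical, and the exact log format around the message is an assumption.

```shell
# Hypothetical helper: reads container logs on stdin and succeeds if they
# contain the authorization error quoted in this guide.
has_decryption_error() {
  grep -Fq 'No suitable decryption key available'
}

# Usage (replace <CONTAINER_ID> with your container's ID or name):
#   docker logs <CONTAINER_ID> 2>&1 | has_decryption_error && echo "authorization failed"
```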

The following demonstrates scenario 2 - what occurs when running an encrypted model with standard vLLM images.

Expected Result:

✅ Container starts and model loads (due to valid safetensors format)
❌ Produces gibberish output or crashes (due to encrypted tensor data)
❌ No meaningful text generation possible (of course!)

Run Without Decryption

Try running the encrypted model with standard vLLM images (without KoalaVault authentication):

Nvidia GPU (x86/arm64):

# This will load but produce gibberish output
docker run --runtime nvidia --gpus all \
  -v ./models:/models \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model /models/qwen3-0.6b \
  --max-model-len=2048

CPU (x86/arm64):

# This will load but produce gibberish output
docker run --rm \
  -v ./models:/models \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-3 \
  --cap-add SYS_NICE \
  koalavault/vllm-cpu-base-amd64:v0.11.0 \
  --model /models/qwen3-0.6b \
  --max-model-len=2048

Apple Silicon (arm):

# This will load but produce gibberish output
docker run --rm \
  -v ./models:/models \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-3 \
  --cap-add SYS_NICE \
  koalavault/vllm-cpu-base-arm64:v0.11.0 \
  --model /models/qwen3-0.6b \
  --max-model-len=2048

Test the Deployment

Try making a request to the running container:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/qwen3-0.6b",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence"}
    ],
    "max_tokens": 50
  }'

The request may complete but return nonsensical text, or the container may crash during processing.

Why? The tensor data (model weights) is encrypted. Standard vLLM images can read the safetensors file format, but without proper decryption via KoalaVault authentication, the model processes encrypted weights, resulting in nonsensical output or system instability.
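One way to see why the file still loads is to read its header. A safetensors file begins with an 8-byte little-endian length, followed by that many bytes of JSON metadata, followed by the raw tensor data. The sketch below prints the JSON header. It assumes, as the explanation above implies, that only the tensor data is encrypted and the header stays readable. The helper names are hypothetical.

```shell
# Hypothetical helpers for inspecting a .safetensors file.
# Layout per the safetensors format: 8-byte little-endian header length,
# then that many bytes of JSON metadata, then the raw tensor data.
safetensors_header_len() {
  len=0
  mult=1
  # od prints the first 8 bytes as unsigned decimals; combine them little-endian.
  for b in $(head -c 8 "$1" | od -An -t u1); do
    len=$((len + b * mult))
    mult=$((mult * 256))
  done
  echo "$len"
}

safetensors_header() {
  # Skip the 8-byte length prefix, then print exactly the JSON header bytes.
  tail -c +9 "$1" | head -c "$(safetensors_header_len "$1")"
}

# Usage (the file name is an example; use a file from your model directory):
#   safetensors_header ./models/qwen3-0.6b/model.safetensors
```

If the header prints as ordinary JSON with tensor names, shapes, and dtypes, the file is structurally valid, which is consistent with the container starting and only the inference output being unusable.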

Next Steps

Advanced Docker Options

Learn about advanced deployment configurations and security considerations

Security Architecture

Learn how KoalaVault protects encrypted models through sealed container environments and cryptographic attestation
