  • The enhanced vLLM Docker images are built on top of the official vLLM container to provide secure inference services for encrypted models.
  • To preserve security, we enforce additional restrictions on container startup.
  • The requirements, including both immutable and configurable flags, are outlined below.
Want to understand how security is implemented?
See our Security Architecture documentation for detailed technical insights.

Supported vLLM Versions

All images are available on our Docker Hub:
| Platform | Image Name | Base Image |
| --- | --- | --- |
| GPU (x86_64/ARM64) | `koalavault/vllm-openai:v0.11.0` | `vllm/vllm-openai:v0.11.0` |
| CPU (x86_64/ARM64) | `koalavault/vllm-cpu:v0.11.0` | `koalavault/vllm-cpu-base-amd64:v0.11.0`, `koalavault/vllm-cpu-base-arm64:v0.11.0` |
  • latest tag: Points to the latest vLLM version (currently v0.11.0) with the most recent CryptoTensors build

Pull the Docker Image

docker pull koalavault/vllm-openai:latest
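If you prefer a reproducible deployment rather than tracking the moving `latest` tag, you can pull the version-pinned tag listed in the table above:

```shell
# Pin to a specific vLLM release instead of the moving "latest" tag
docker pull koalavault/vllm-openai:v0.11.0
```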

Deploy with Docker

Deploying encrypted models with KoalaVault is almost identical to deploying standard models; the only difference is a few additional flags on the container startup command that ensure model decryption occurs in a secure environment. A typical Docker deployment command looks like the following:
docker run --runtime nvidia --gpus all \
  -v ./models:/models \
  -p 8000:8000 \
  --ipc=host \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add DAC_OVERRIDE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-openai:latest \
  --koalavault-model <PUBLISHER_KOALAVAULT_USERNAME>/<MODEL_NAME> \
  --model /models/<MODEL_NAME> \
  --max-model-len=2048 
The additional flags/setup are detailed below:

| Flag / Setting | Requirement |
| --- | --- |
| `--read-only` | required, cannot be changed |
| `--cap-drop ALL` | required, cannot be changed |
| `--tmpfs /tmp:exec,nosuid,nodev` | required, cannot be changed |
| koalavault-api-key | required, choose one |
| `--koalavault-model` | required, configurable |
| model-mount-options | required, choose one |
| `--cap-add SYS_NICE` | required, CPU deployment only |
| `--cap-add DAC_OVERRIDE` | required, Linux only |
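Putting the requirements above together for a CPU deployment, a sketch of the startup command might look like the following. It mirrors the GPU example but drops the NVIDIA runtime flags, uses the CPU image from the table above, and adds `--cap-add SYS_NICE` per the CPU-only requirement; the model path and publisher/model names are placeholders:

```shell
# Sketch of a CPU deployment (assumes the same placeholders as the GPU example).
# Security-related flags are unchanged; SYS_NICE is added for CPU deployments.
docker run \
  -v ./models:/models \
  -p 8000:8000 \
  --ipc=host \
  -e KOALAVAULT_API_KEY=$KOALAVAULT_API_KEY \
  --read-only \
  --cap-drop ALL \
  --cap-add DAC_OVERRIDE \
  --cap-add SYS_NICE \
  --tmpfs /tmp:exec,nosuid,nodev \
  koalavault/vllm-cpu:v0.11.0 \
  --koalavault-model <PUBLISHER_KOALAVAULT_USERNAME>/<MODEL_NAME> \
  --model /models/<MODEL_NAME> \
  --max-model-len=2048
```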

Additional Notes

All other requirements and best practices remain the same as the official vLLM container.
Please refer to the vLLM documentation for further details on system requirements, GPU support, and advanced configuration:
👉 vLLM Official Documentation
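Once a container is running, you can sanity-check it through the OpenAI-compatible API that vLLM serves on port 8000. This is standard vLLM behavior, not KoalaVault-specific; the model name below is a placeholder matching the `--model` path used at startup:

```shell
# List the models the server is currently serving
curl http://localhost:8000/v1/models

# Send a minimal chat completion request to the served model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/models/<MODEL_NAME>",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```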