Today is an exciting day! Docker have just announced Docker Offload, their latest offering, which extends your local development workflow into a scalable, cloud-powered environment. It's ideal when you want to leverage cloud resources, or when your local machine doesn't have the required hardware (or GPUs) to run heavy compute or AI models locally.
In this tutorial, I'm going to show you how you can build your own AI application in Docker using Docker Model Runner, and how you can run it locally (if you have a supported local GPU).
I’ll also show you how you can run this without a local GPU, remotely (but with that same local app feeling) using the awesome new Docker Offload functionality!
Docker Offload
Docker Offload allows you to seamlessly execute Docker builds and run containers in the cloud while retaining the familiar local development UX that we all know and love. The service provides on-demand cloud infrastructure that can be used for fast, consistent builds and for compute-heavy workloads such as running LLMs, machine learning pipelines and GPU-accelerated applications.
Previously I created this story to show another recent feature - Docker Model Runner. With the recent launch of Docker Offload, I’ve now updated this story to show you both of them in action together. The two of these combined are truly incredible!
Along the way, there are also some further features and enhancements that have been released which I’ll reference throughout! So to begin:
Docker Model Runner
Docker Model Runner allows you to run AI models locally using the same workflow and development environment that you’ll commonly use with Docker for other services. It includes an inference engine within Docker Desktop, built on llama.cpp and accessible through the use of the OpenAI API standard - allowing you to test and iterate on models directly on your machine, without additional tools or setup.
We’ll explore the rapid development of the following example application - Dev Whisperer, a Development Copilot, providing you with the ability to locally Explain, Suggest Improvements and Refactor Code!
If you have a system with a GPU supported by Docker Desktop, you'll be able to run this locally. If you don't, fear not and don't let that put you off.
Later on, I'll also show you how to run this using Docker's new Offload Cloud, without the need for a locally attached GPU -
Getting Started: Enable Docker Model Runner and Docker Offload
- Update/Install Docker Desktop (You’ll want the latest release for support of both Docker Model Runner, and Docker Offload).
- Head to Settings → Beta Features in Docker Desktop.
- Enable the Docker Model Runner toggle.
- (Optional) Enable host-side TCP support if you want to do more advanced troubleshooting or connect to the service from outside containers. I recommend following this step for this tutorial.
- Enable Docker Offload (if not already enabled)
After applying these settings and restarting Docker Desktop, you can move to your terminal and start exploring Docker Model Runner.
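Before moving on, it's worth a quick sanity check from the terminal that the Model Runner CLI is available. Here's a small sketch; the exact subcommands and output can vary slightly between Docker Desktop releases:

# Confirm the Model Runner service is enabled and running
docker model status

# List any models already pulled locally (an empty list is fine at this stage)
docker model list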
Pulling and Running a Model
To get started, we'll use an OCI-packaged Mistral model. There's a variety of models available on Docker Hub under the dedicated ai namespace: https://hub.docker.com/u/ai
Pulling the model image is as simple as running:
% docker model pull ai/mistral
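If you'd like to confirm the pull before continuing, you can list the models Docker Model Runner knows about. A hedged aside: docker model rm is shown here purely for later clean-up and isn't needed for this tutorial:

# Show locally available models (ai/mistral should now appear)
docker model list

# Later, if you want to reclaim disk space
docker model rm ai/mistral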
Once it’s available locally, run your first command and interact with the model:
% docker model run ai/mistral
Interactive chat mode started. Type '/bye' to exit.
> What is Docker Desktop
Docker Desktop is a client application for Docker Engine, an open-source containerization platform that allows you to develop, ship, and run applications in containers. Docker Desktop is specifically designed for macOS, Linux, and Windows and provides a user-friendly interface to manage containers, services, and volumes on your local machine. It also includes features like Kubernetes support, multi-host networking, and support for popular Integrated Development Environments (IDEs) like Visual Studio Code.

Docker Desktop is essential for developers who want to create portable and lightweight applications, as well as for teams that need to manage and deploy their applications consistently across different environments. It provides a unified workflow for application development, testing, and deployment, reducing the complexity of managing various dependencies and configurations.
>
(You’ll see the AI’s response about Docker Desktop, summarising what it is and how it’s used)
The real power, however, is how Docker Model Runner integrates deeply with your local Docker environment. With host-side TCP support enabled, you can, if you wish, reach the Docker Model Runner endpoint (by default on port 12434) right from your local machine. Give it a try:
% curl http://localhost:12434
Docker Model Runner

The service is running.
You’ll see a simple confirmation message that the service is running.
OpenAI API Integration Example
Let's do something more interesting by calling the /engines/llama.cpp/v1/chat/completions endpoint.
% curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/mistral",
    "messages": [
      {
        "role": "user",
        "content": "What is Docker Desktop?"
      }
    ]
  }'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":" Docker Desktop is a product from Docker Inc. that allows developers to develop, test, and deploy applications within containers. It provides a user-friendly interface for managing Docker on Windows, Mac, and Linux, and it includes features like Kubernetes support, multi-host management, and support for popular Integrated Development Environments (IDEs).\n\nDocker Desktop uses containerization technology to encapsulate applications and their dependencies into a single, portable unit, making it easier to run applications consistently across different environments. It also allows for efficient resource management, as multiple applications can share a single operating system instance, reducing the overall resource usage.\n\nIn addition, Docker Desktop integrates with cloud platforms like AWS, Azure, and Google Cloud, enabling seamless deployment of containers directly to the cloud. Overall, Docker Desktop is a powerful tool for developers looking to streamline their workflows and ensure consistency across different development, testing, and production environments."}}],"created":1752175200,"model":"ai/mistral","system_fingerprint":"b1-20b7bf8","object":"chat.completion","usage":{"completion_tokens":202,"prompt_tokens":11,"total_tokens":213},"id":"chatcmpl-m7raUKJeEzXK9gWX3YW71A94sKdIi6Tf","timings":{"prompt_n":11,"prompt_ms":98.194,"prompt_per_token_ms":8.926727272727273,"prompt_per_second":112.02313786993095,"predicted_n":202,"predicted_ms":2788.027,"predicted_per_token_ms":13.80211386138614,"predicted_per_second":72.45266993468859}}%
You'll receive a JSON response with an answer generated by your local Mistral model, served via the llama.cpp engine. This is the Docker Desktop Model Runner in action!
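Because this endpoint follows the OpenAI chat completions format, you can also ask for a streamed response. This is a minimal sketch assuming streaming is supported by the underlying llama.cpp engine; tokens arrive as server-sent events rather than a single JSON document:

# -N disables curl buffering so chunks print as they arrive
curl -N http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/mistral",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Summarise Docker Offload in one sentence." }
    ]
  }'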
Building a Quick AI App with Docker Compose
Let's take it up a notch and build a mini app that interacts with our local AI model. We'll create the Dev Whisperer web application to explain, suggest improvements for, or refactor your code, all powered by Docker Model Runner.
Should you require, the code for this is available on GitHub at: https://github.com/spurin/devwhisperer
1. The Web App (HTML/JavaScript)
Below is a single-page application - index.template.html - that calls our AI model's endpoint:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Dev Whisperer (Local Dev Copilot)</title>
  <style>
    body { font-family: sans-serif; padding: 2rem; max-width: 800px; margin: auto; background-color: #f9f9f9; }
    textarea { width: 100%; height: 200px; font-family: monospace; font-size: 14px; padding: 1rem; border-radius: 8px; border: 1px solid #ccc; background: #fff; }
    select, button { margin-top: 1rem; padding: 0.6rem 1rem; font-size: 1rem; }
    pre { background: #f4f4f4; padding: 1rem; white-space: pre-wrap; border-left: 4px solid #ccc; margin-top: 1.5rem; border-radius: 6px; }
  </style>
</head>
<body>
  <h1>🧠 Dev Whisperer (Local Dev Copilot)</h1>

  <label for="code"><strong>Paste your code:</strong></label><br />
  <textarea id="code" placeholder="Type or paste your code here..."></textarea><br />

  <label for="action"><strong>Select Action:</strong></label><br />
  <select id="action">
    <option value="explain">Explain</option>
    <option value="suggest">Suggest Improvements</option>
    <option value="refactor">Refactor</option>
  </select><br />

  <button onclick="askAI()">Run</button>

  <h3>💬 Response:</h3>
  <pre id="response">Waiting for input...</pre>

  <script>
    const MODEL_NAME = "${MODEL_NAME}";

    async function askAI() {
      const code = document.getElementById("code").value.trim();
      const action = document.getElementById("action").value;
      const responseBox = document.getElementById("response");

      if (!code) {
        responseBox.textContent = "⚠️ Please enter some code first.";
        return;
      }

      let systemPrompt = "";
      switch (action) {
        case "explain":
          systemPrompt = "You are a helpful programming assistant. Explain what the following code does:";
          break;
        case "suggest":
          systemPrompt = "You are a code reviewer. Suggest improvements for the following code:";
          break;
        case "refactor":
          systemPrompt = "You are an expert software engineer. Refactor the following code for readability and performance:";
          break;
      }

      const payload = {
        model: MODEL_NAME,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: code }
        ]
      };

      responseBox.textContent = "🧠 Thinking...";

      try {
        const res = await fetch("/v1/chat/completions", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(payload)
        });

        const result = await res.json();
        responseBox.textContent = result.choices?.[0]?.message?.content || "🤖 No response from model.";
      } catch (err) {
        console.error(err);
        responseBox.textContent = "❌ Error communicating with the model.";
      }
    }
  </script>
</body>
</html>
We'll dynamically inject the MODEL_NAME to ensure we're talking to the correct model, and we're calling the /v1/chat/completions endpoint.
2. The Dockerfile
Next up, we'll use nginx:alpine to serve our web page. We also need envsubst to dynamically insert environment variables into our HTML and Nginx config:
FROM nginx:alpine

# Install envsubst for dynamic config
RUN apk add --no-cache bash gettext

# Copy files
COPY index.template.html /usr/share/nginx/html/index.template.html
COPY nginx.conf.template /etc/nginx/templates/nginx.conf.template
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

# Start with custom entrypoint
ENTRYPOINT ["/entrypoint.sh"]
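If you'd like to try the image on its own before wiring up Compose, you can build and run it directly. A hedged sketch: the MODEL_API_URL value below is the one Compose injects later in this tutorial, and it assumes Docker Model Runner is enabled in Docker Desktop:

# Build the image from the Dockerfile above
docker build -t devwhisperer .

# Run it standalone, supplying the variables the entrypoint expects
docker run --rm -p 3000:80 \
  -e MODEL_NAME=ai/mistral \
  -e MODEL_API_URL=http://model-runner.docker.internal/engines/v1/ \
  devwhisperer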
3. The Nginx Configuration
In nginx.conf.template, we set up a proxy pass that targets our model runner's endpoint; this helps us avoid CORS issues. Notice the placeholder ${MODEL_API_URL}:
server {
    listen 80;
    server_name localhost;

    location / {
        root /usr/share/nginx/html;
        index index.html;
        try_files $uri $uri/ =404;
    }

    # Proxy API requests to dynamic backend
    location /v1/ {
        proxy_pass ${MODEL_API_URL};
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
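Once the container is up (we'll start it shortly), you can sanity-check what envsubst rendered from this template. The container name below matches the Compose output later in the article:

# View the generated config
docker exec devwhisperer-devwhisperer-1 cat /etc/nginx/conf.d/default.conf

# Ask nginx to validate it
docker exec devwhisperer-devwhisperer-1 nginx -t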
4. The Entrypoint Script
Our entrypoint.sh script uses envsubst to replace the placeholders in the Nginx config and the HTML file:
#!/bin/sh

# Inject MODEL_API_URL into nginx config
envsubst '${MODEL_API_URL}' < /etc/nginx/templates/nginx.conf.template > /etc/nginx/conf.d/default.conf

# Inject MODEL_NAME into HTML
envsubst '${MODEL_NAME}' < /usr/share/nginx/html/index.template.html > /usr/share/nginx/html/index.html

# Start Nginx
exec nginx -g 'daemon off;'
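If you want to see what envsubst does before baking it into an image, you can try it on your own machine. A quick sketch, assuming gettext (which provides envsubst) is installed locally:

# Render the HTML template and confirm the model name was substituted
MODEL_NAME=ai/mistral envsubst '${MODEL_NAME}' < index.template.html | grep "ai/mistral"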
5. The Environment Variables
We're handling our app's variables via Compose, and if you're already familiar with Docker Compose, you may notice some nice new features. Firstly, we're able to declare our models, allowing them to be auto-pulled on startup where required.
We're also able to dynamically inject model-related variables into the running container. In this case, the devwhisperer container will receive the environment variables MODEL_API_URL and MODEL_NAME, both derived from the model definition and in the correct format (an OpenAI-compatible API URL).
services:
  devwhisperer:
    build: .
    ports:
      - "3000:80"
    models:
      mistral:
        endpoint_var: MODEL_API_URL
        model_var: MODEL_NAME

models:
  mistral:
    model: ai/mistral
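To check how Compose resolves all of this, including the new models section, you can render the effective configuration. A hedged note: how the model bindings are displayed depends on your Compose version:

# Print the fully-resolved Compose configuration
docker compose config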
With all of this in place, we can build and start our application with Docker Compose:
% docker compose up --build
Compose can now delegate builds to bake for better performance. To do so, set COMPOSE_BAKE=true.
[+] Building 0.9s (13/13) FINISHED                              docker-container:build_cross
 => [devwhisperer internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 409B  0.0s
 => [devwhisperer internal] load metadata for docker.io/library/nginx:alpine  0.5s
 => [devwhisperer internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [devwhisperer 1/6] FROM docker.io/library/nginx:alpine@sha256:4ff102c5d78d254a6f0da062b3cf39eaf07f01eec0927fd21e219d0af8bc0591  0.0s
 => => resolve docker.io/library/nginx:alpine@sha256:4ff102c5d78d254a6f0da062b3cf39eaf07f01eec0927fd21e219d0af8bc0591  0.0s
 => [devwhisperer internal] load build context  0.0s
 => => transferring context: 113B  0.0s
 => CACHED [devwhisperer 2/6] RUN apk add --no-cache bash gettext  0.0s
 => CACHED [devwhisperer 3/6] COPY index.template.html /usr/share/nginx/html/index.template.html  0.0s
 => CACHED [devwhisperer 4/6] COPY nginx.conf.template /etc/nginx/templates/nginx.conf.template  0.0s
 => CACHED [devwhisperer 5/6] COPY entrypoint.sh /entrypoint.sh  0.0s
 => CACHED [devwhisperer 6/6] RUN chmod +x /entrypoint.sh  0.0s
 => [devwhisperer] exporting to oci image format  0.3s
 => => exporting layers  0.0s
 => => exporting manifest sha256:a7fc0fc00e99669952ae22cc264a2dd5f2c972a6a8435c992a019d3da802010a  0.0s
 => => exporting config sha256:88ad9e541edac58babd07a82def2411744f0ab53ff01398cabbe5588f876d5d0  0.0s
 => => sending tarball  0.3s
 => [devwhisperer] importing to docker  0.0s
 => [devwhisperer] resolving provenance for metadata file  0.0s
[+] Running 2/2
 ✔ devwhisperer                             Built    0.0s
 ✔ Container devwhisperer-devwhisperer-1    Created  0.0s
Attaching to devwhisperer-1
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: using the "epoll" event method
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: nginx/1.27.4
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: built by gcc 14.2.0 (Alpine 14.2.0)
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: OS: Linux 6.10.14-linuxkit
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker processes
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 8
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 9
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 10
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 11
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 12
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 13
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 14
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 15
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 16
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 17
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 18
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 19
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 20
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 21
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 22
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 23
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 24
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 25
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 26
devwhisperer-1  | 2025/03/26 19:09:20 [notice] 1#1: start worker process 27
If we take a quick peek, we can confirm that the devwhisperer container received those two variables as expected -
% docker ps
CONTAINER ID   IMAGE                       COMMAND            CREATED         STATUS         PORTS                                     NAMES
89e5f10c43f7   devwhisperer-devwhisperer   "/entrypoint.sh"   3 minutes ago   Up 3 minutes   0.0.0.0:3000->80/tcp, [::]:3000->80/tcp   devwhisperer-devwhisperer-1

% docker exec devwhisperer-devwhisperer-1 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=89e5f10c43f7
MODEL_NAME=ai/mistral
MODEL_API_URL=http://model-runner.docker.internal/engines/v1/
NGINX_VERSION=1.29.0
PKG_RELEASE=1
DYNPKG_RELEASE=1
NJS_VERSION=0.9.0
NJS_RELEASE=1
HOME=/root
After building and running via compose, browse to http://localhost:3000 to see your Dev Whisperer app in action!
Try pasting in code snippets or LeetCode examples. Choose one of the actions - Explain, Suggest Improvements, or Refactor - then click Run. You’ll see the AI response right there in the browser, courtesy of Docker Model Runner.
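You can also exercise the same proxied endpoint the browser uses, straight from the terminal. This sketch assumes the stack above is running on port 3000:

# Call the model through the nginx proxy, exactly as the web app does
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/mistral",
    "messages": [
      { "role": "system", "content": "You are a helpful programming assistant. Explain what the following code does:" },
      { "role": "user", "content": "def add(a, b): return a + b" }
    ]
  }'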
Docker Offload
Whilst this is exciting, if you've been following (or attempting to follow) this tutorial on a system without a GPU, the end result may not have gone to plan. However, the tides will change with Docker Offload.
With the latest version of Docker Desktop running, there are different ways in which you can enable this functionality.
Personally, I love the toggle at the top of the Docker UI, which changes the top bar to a gradient purple colour -
Alternatively, when running in non-Offload mode, you can perform this same step from the CLI (which, in turn, will update the UI status bar):
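For reference, here's roughly what that looks like from the terminal. Treat this as a sketch; the subcommand names come from the current Docker Offload CLI and may evolve:

# Start a Docker Offload session (you'll be prompted for the account to use)
docker offload start

# Check whether builds and containers are currently being offloaded
docker offload status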
Once running via Docker Offload, you can perform the very same steps as outlined above and the app will work for you locally, again via http://localhost:3000
To prove this, here’s the full log extract, showing this in action:
james@Mac devwhisperer % docker offload start
╭──────────────────────────────────────────────────────────────────────────────╮
│                                                                              │
│  Successfully started Docker Offload session through the "spurin" account!  │
│                                                                              │
│  ---                                                                         │
│  New docker context created: docker-cloud                                    │
│                                                                              │
│  ---                                                                         │
│  What's next?                                                                │
│    Run container  ->  docker run -it ubuntu bash                             │
│    Learn more about Docker Offload  ->                                       │
│    https://app.docker.com/accounts/spurin/cloud                              │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
james@Mac devwhisperer %
james@Mac devwhisperer %
james@Mac devwhisperer % docker compose up --build
[+] Building 11.3s (16/16) FINISHED
 => [internal] load local bake definitions  0.0s
 => => reading from stdin 384B  0.0s
 => [internal] connected to docker build cloud service  0.0s
 => [internal] load build definition from Dockerfile  0.3s
 => => transferring dockerfile: 409B  0.3s
 => [internal] load metadata for docker.io/library/nginx:alpine  0.8s
 => [auth] library/nginx:pull token for registry-1.docker.io  0.0s
 => [internal] load .dockerignore  0.2s
 => => transferring context: 2B  0.2s
 => [1/6] FROM docker.io/library/nginx:alpine@sha256:b2e814d28359e77bd0aa5fed1939620075e4ffa0eb20423cc557b375bd5c14ad  0.0s
 => => resolve docker.io/library/nginx:alpine@sha256:b2e814d28359e77bd0aa5fed1939620075e4ffa0eb20423cc557b375bd5c14ad  0.0s
 => [internal] load build context  0.2s
 => => transferring context: 113B  0.2s
 => CACHED [2/6] RUN apk add --no-cache bash gettext  0.0s
 => CACHED [3/6] COPY index.template.html /usr/share/nginx/html/index.template.html  0.0s
 => CACHED [4/6] COPY nginx.conf.template /etc/nginx/templates/nginx.conf.template  0.0s
 => CACHED [5/6] COPY entrypoint.sh /entrypoint.sh  0.0s
 => CACHED [6/6] RUN chmod +x /entrypoint.sh  0.0s
 => exporting to image  3.7s
 => => exporting layers  0.0s
 => => exporting manifest sha256:28fd01bca68c4dc18a46814d57cc050581ec4a2823ce85473e2acad9252d063e  0.0s
 => => exporting config sha256:040f44d87c120f09cacbca472476e8b6973bfe9d11117e073285029e4aac2368  0.0s
 => => exporting to cloud pull  3.7s
 => cloud pull  3.6s
 => => pulling layer 9c76171a7393  0.6s
 => => pulling layer 205e0925dd62  0.6s
 => => pulling layer 70c405aad68d  0.6s
 => => pulling layer 63dda2adf85b  0.6s
 => => pulling layer 6b15f78d59f2  0.6s
 => => pulling layer 92971aeb101e 16.78MB / 16.78MB  0.6s
 => => pulling layer fe07684b16b8 3.80MB / 3.80MB  0.0s
 => => pulling layer 3b7062d09e02 1.81MB / 1.81MB  0.6s
 => => pulling layer b55ed7d7b2de  0.0s
 => => pulling layer fb746e72516f  0.6s
 => => pulling layer a9ff9baf1741  0.6s
 => => pulling layer 2c127093dfc7  0.0s
 => => pulling layer f19299dee6a5 2.00MB / 2.00MB  0.6s
 => resolving provenance for metadata file  0.1s
[+] Running 3/4
 ✔ devwhisperer                             Built        0.0s
 ⠙ mistral                                  Configuring  28.1s
 ✔ Network devwhisperer_default             Created      0.1s
 ✔ Container devwhisperer-devwhisperer-1    Created      7.3s
Attaching to devwhisperer-1
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: using the "epoll" event method
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: nginx/1.29.0
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: built by gcc 14.2.0 (Alpine 14.2.0)
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: OS: Linux 6.8.0-1029-aws
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: start worker processes
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: start worker process 9
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: start worker process 10
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: start worker process 11
devwhisperer-1  | 2025/07/10 19:37:29 [notice] 1#1: start worker process 12
And again testing locally via http://localhost:3000 -
It's also worth highlighting that whilst running via Docker Offload, the model uses a different endpoint URL. Fortunately, Docker Compose's model variable handling manages this for us transparently.
% docker exec -it devwhisperer-devwhisperer-1 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=7c5f91e2c4c7
TERM=xterm
MODEL_NAME=ai/mistral
MODEL_API_URL=http://172.17.0.1:12435/engines/v1/
NGINX_VERSION=1.29.0
PKG_RELEASE=1
DYNPKG_RELEASE=1
NJS_VERSION=0.9.0
NJS_RELEASE=1
HOME=/root
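When you're finished experimenting, you can tear everything down and return to a purely local setup (hedged as before on exact subcommand names):

# Stop and remove the Compose stack
docker compose down

# End the Docker Offload session and switch back to your local engine
docker offload stop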
Wrapping Up
Congratulations! You’ve successfully made use of the Docker Desktop Model Runner to build and prototype an AI-powered application. In a matter of minutes, we’ve shown how easy it is to pull an AI model, connect it to a small web app using Docker Compose and a lightweight Nginx container, and start playing around with locally hosted AI.
We've also seen how we can offload all of this to the cloud, with the power of Docker Offload.
This is just the beginning. AI is revolutionising how we build, test, and refine our software. With Docker Desktop, you’re well-positioned to be at the forefront of that movement.
Now is the perfect time to dive in, experiment, and create pioneering applications.
Thanks for reading, and happy coding!
If you have any questions or run into any hiccups, feel free to drop a comment below or reach out on Twitter/LinkedIn.
Good Luck!