Engineering Blog

Published on April 14, 2025

DIY AI Chatbot with Ollama, Open WebUI & DeepSeek-R1 on NVIDIA L40S

I used the launch of our dedicated GPUs as an opportunity to show off some of the new possibilities they add to our platform. In this Engineering Blog post, I go through the setup of a self-hosted AI chatbot using Ollama and Open WebUI, powered by the DeepSeek-R1 70B model running on one of our brand-new NVIDIA L40S GPUs.

But why? There are already many free AI chatbots available…

With so many AI chatbots available for free, why go through the trouble of setting up a self-hosted one, besides technical curiosity? The answer lies in the often repeated, yet frequently neglected, concerns about privacy, data protection, and trust:

  • Privacy & data protection: Your chat conversations are often logged, analyzed, and stored. Even if providers claim to anonymize data, you can never be certain how your information is being used.
  • Company secrets: Using an external AI chatbot risks exposing trade secrets, internal strategies, or confidential client information.
  • If a service is free, you are the product: This by now well-worn phrase still holds true.

Evaluation

Disclaimer: I did not spend too much time on the evaluation of the stack and got a lot of suggestions from my teammate Alain, who already has some more experience with LLMs (large language models).

TLDR: Open WebUI as the chat interface, Ollama for managing the model, and DeepSeek-R1 70B as the LLM.

Web-Based Chat Interface

In my short research, I found two very promising open-source tools for an AI chatbot web UI.

I just went for the seemingly more lightweight solution, which is Open WebUI.

LLM Management Tool

Ollama is a really easy-to-use solution for managing LLMs, is natively supported by Open WebUI (no additional configuration needed), and its install script also sets up all necessary drivers and dependencies, so I just went with this tool.

Large Language Model

Here comes the tricky part: selecting the right model. To ensure maximum performance, it needs to fit into the GPU's VRAM. The NVIDIA L40S GPU offered at cloudscale has 48 GB of GDDR6 VRAM. Even though it is possible to use multiple L40S GPUs to run even larger models, I wanted to use a single GPU for this setup. The VRAM requirements can be found on the Ollama website or other model repositories. The DeepSeek-R1 70B model requires approximately 41 GB of VRAM, making it a great fit for this GPU.
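
If you want to roughly sanity-check such numbers yourself, a back-of-the-envelope calculation gets you close (assumption on my part: Ollama's default builds use ~4-bit quantization, i.e. about half a byte per weight, plus some headroom for the KV cache and runtime):

# ~70 billion parameters * ~0.5 bytes per parameter ≈ 35 GB of weights,
# plus KV cache and runtime overhead -> roughly the ~41 GB stated by Ollama.
# Once Ollama is installed and the model is pulled (see the instructions below),
# the parameter count and quantization can be inspected with:
ollama show deepseek-r1:70b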

Instructions

Setting up a GPU server on cloudscale

GPU servers are subject to the Addendum for GPU servers (Vertragszusatz für GPU-Server). If you are interested or have any questions, please contact support.

I just created a GPU VM via the cloudscale control panel with the following specifications:

  • Flavor: GPU1-160-20-1-400 (could also be a GPU flavor with less RAM or fewer CPU cores, the GPU is what matters)
  • GPU Type: 1x NVIDIA L40S
  • Scratch Disk: 400 GB on RAID 1 (this will be handy for persisting the LLM)
  • Source Image: Debian 12 - Bookworm

If you want to follow this guide, I strongly recommend using Debian as the base image, as it ensures that all the steps will work as expected.

Mount the Scratch Disk

Even though we can load models up to 48 GB directly into the NVIDIA L40S's VRAM, we need to download and persist the models before we can run them. For this reason, at cloudscale, every GPU server comes equipped with an additional, local Scratch Disk. The Scratch Disk is tied directly to the server, but offers better performance than our usual volumes. Like other volumes, we have to mount it in our VM first:

# List all disks
lsblk -l

# Create a new folder for the mount
sudo mkdir -p /mnt/scratch

# Identify and mount Scratch Disk, e.g.: /dev/sdb
sudo mount /dev/sdb /mnt/scratch

# Get the device's UUID (note it somewhere or copy to clipboard)
sudo blkid /dev/sdb

# Persist the Scratch Disk mount
sudo vim /etc/fstab

# Add the following line
UUID=<device-uuid> /mnt/scratch ext4 defaults 0 0
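
Before rebooting or moving on, it doesn't hurt to validate the new fstab entry (optional; findmnt is part of util-linux and available on Debian by default):

# Check /etc/fstab for syntax and mount problems
sudo findmnt --verify

# Confirm the Scratch Disk is mounted where we expect it
findmnt /mnt/scratch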

Later, we will configure Ollama to use the Scratch Disk to download and persist the models.

Manually install NVIDIA drivers (optional)

This step can be skipped entirely, because the Ollama install script will install all necessary dependencies automatically. Follow this guide if you are interested in how to manually install the NVIDIA GPU drivers on a VM.

In the following section, I will give you a step-by-step guide for installing the necessary NVIDIA GPU driver on Debian 12 - Bookworm.

# SSH into the server
ssh debian@<public-ip>

# Upgrade Debian to the latest version
# In the presented dialog, you can just confirm the default setting
sudo apt update && sudo apt upgrade -y

# Edit the sources list to enable non-free software (e.g. NVIDIA Drivers)
sudo vim /etc/apt/sources.list

The file should be updated as follows:

# See /etc/apt/sources.list.d/debian.sources
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://deb.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware

# Update the package list
sudo apt update

# Install the NVIDIA driver package
# Multiple dialogs will pop up; I just confirmed the default settings as they seemed reasonable enough
sudo apt install nvidia-driver

# Reboot VM as recommended
sudo reboot

# SSH into the rebooted server
ssh debian@<public-ip>

# Verify that the NVIDIA GPU drivers are installed and the GPU is correctly recognized by the VM
nvidia-smi

Install Ollama

# SSH into the server
ssh debian@<public-ip>

# Download and execute the Ollama install script
curl -fsSL https://ollama.com/install.sh | sh

# Reboot VM as recommended
sudo reboot

# SSH into the rebooted server
ssh debian@<public-ip>

# Verify that the NVIDIA GPU drivers are installed and the GPU is correctly recognized by the VM
nvidia-smi

# Verify that Ollama is installed successfully
ollama -v
sudo systemctl status ollama

# Configure Ollama to use the mounted Scratch Disk instead of the root volume
sudo mkdir -p /mnt/scratch/ollama_models
sudo chown ollama:ollama /mnt/scratch/ollama_models/

# Edit the ollama service configuration
sudo vim /etc/systemd/system/ollama.service

# Add the following line as the last one in the [Service] section
Environment="OLLAMA_MODELS=/mnt/scratch/ollama_models"

# Reload the ollama service
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Download a small model to verify that it's stored on the Scratch Disk; the directory should not be empty any more
ollama pull smollm
ls /mnt/scratch/ollama_models/

# Cleanup the unused model
ollama rm smollm

# Install and run DeepSeek's R1
ollama run deepseek-r1:70b

# You can now test the LLM via the CLI; exit with Ctrl+D
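
Besides the CLI, Ollama exposes an HTTP API on 127.0.0.1:11434 by default, which is also what Open WebUI will use later. As an optional sanity check (the endpoint and payload below follow the Ollama API documentation), you can query the model directly:

# Ask the model a question via the Ollama REST API (the answer is streamed as JSON lines)
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "deepseek-r1:70b",
  "prompt": "Why is the sky blue?"
}'

# In a second shell, check GPU utilization and VRAM usage while the model is answering
nvidia-smi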

Secure your service (optional)

If you plan to keep the service online, it's strongly advised to follow this section, or to implement alternative measures to protect it from unwanted access. This setup requires that you have a domain name.

Install nginx with certbot

In this guide, I will install and configure the web server nginx together with certbot, to restrict access and enable HTTPS for Open WebUI. Before you start, make sure that your domain's or subdomain's A and AAAA records point to the GPU server's public IPv4 and IPv6 addresses respectively.
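
To double-check the DNS records before requesting a certificate, you can query them directly (optional; this assumes the dig utility from the dnsutils package is available):

# Both commands should return the GPU server's public addresses
dig +short <your-domain> A
dig +short <your-domain> AAAA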

# Install nginx, certbot and certbot nginx plugin
sudo apt install nginx certbot python3-certbot-nginx

# Verify installation
sudo nginx -v

# Edit the nginx default configuration
sudo vim /etc/nginx/sites-available/default

The file should be updated as follows:

# Replace <your-ip-v4> and <your-ip-v6> with the addresses you are using to access the internet.
# If you don't want to protect your service from unwanted access or have other measures in place (e.g.: HTTP BasicAuth), the following three lines can be removed.
allow <your-ip-v4>;
allow <your-ip-v6>;
deny all;

upstream open_webui {
    # We will configure Open WebUI to listen on this port
    server 127.0.0.1:8080;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    root /var/www/html;

    index index.html;

    # Replace <your-domain> with the domain where you configured the A and AAAA records; the certbot plugin will also use this to automatically extend this file with the HTTPS configuration
    server_name <your-domain>;

    # The proxy configuration is taken from: https://docs.openwebui.com/tutorials/https-nginx/#steps
    location / {
            proxy_pass http://open_webui;

            # Add WebSocket support (Necessary for version 0.5.0 and up)
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # (Optional) Disable proxy buffering for better streaming response from models
            proxy_buffering off;

            # (Optional) Increase max request size for large attachments and long audio messages
            client_max_body_size 20M;
            proxy_read_timeout 10m;
        }
}

# Verify that the nginx config is valid
sudo nginx -t

# Reload configuration
sudo nginx -s reload

# Setup certbot
sudo certbot --nginx -d <your-domain>

# Check the changes made by certbot:
# There should be new entries 'managed by certbot'
cat /etc/nginx/sites-available/default

Now you can open https://<your-domain>/ in a browser to verify that the TLS certificate is correctly set up for your domain and the page is served via HTTPS. The response will be "502 Bad Gateway", as the Open WebUI service is not reachable yet. If you receive a "403 Forbidden" instead, your client IP is probably not covered by the allow rules in the nginx configuration, so you have to adjust it.
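
The same check also works from the command line (optional; assumes curl on your local machine and that your IP is covered by the allow rules):

# Expect a 502 status line at this stage, e.g. "HTTP/2 502"
curl -sI https://<your-domain>/ | head -n 1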

Install UFW (Uncomplicated Firewall)

# Install UFW
sudo apt install ufw

# Allow egress (we will monitor egress with OpenSnitch)
sudo ufw default allow outgoing

# Block ingress
sudo ufw default deny incoming

# Allow services; if you forget to allow SSH, you will be locked out of the VM!
sudo ufw allow ssh
sudo ufw allow vnc
sudo ufw allow http
sudo ufw allow https

# Enable UFW
sudo ufw enable

# Check status
sudo ufw status

# Verify that apt and certbot still can do their job
sudo apt update
sudo certbot renew --dry-run

Install Open WebUI

In this section, I will install Open WebUI into a Python Virtualenv, but there are other installation methods (e.g.: with Docker).
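
For reference, a Docker-based setup would look roughly like the following one-liner from the Open WebUI documentation (not used in the rest of this guide; image tag and flags may change between releases):

# Run Open WebUI in a container, persist its data in a named volume and let it reach Ollama on the host
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main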

# Install dependencies
sudo apt install python3 python3-venv

# Verify python version is 3.11 to avoid compatibility issues
python3 --version

# Create a new user for running Open WebUI
sudo useradd -m open_webui

# Change directory and switch to user open_webui
cd /home/open_webui
sudo su open_webui

# Create a virtual environment in a new directory named 'venv'
python3 -m venv venv

# Install Open WebUI in the virtual environment
venv/bin/pip install open-webui

# Start Open WebUI via the virtual environment, the default port is 8080
venv/bin/open-webui serve

You can now verify that your own Open WebUI is running, and create your admin account:

  • If you skipped the "Secure your service" section: http://<public-ip>:8080
  • Otherwise: https://<your-domain>
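
If you prefer checking from the server itself first (e.g. from a second SSH session), a plain HTTP request against the local port should return a 200:

# Expect "200" from the locally running Open WebUI
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080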

Now let's create a systemd service that runs Open WebUI in the background:

# Ctrl+c to stop Open WebUI and exit from the user open_webui
exit

# Create a new systemd service configuration
sudo vim /etc/systemd/system/open-webui.service

The file should have the following content:

[Unit]
Description=Open WebUI Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/open_webui
ExecStart=/home/open_webui/venv/bin/open-webui serve
KillSignal=SIGTERM
KillMode=mixed
Restart=always
RestartSec=3
StandardOutput=syslog
StandardError=syslog
User=open_webui
Group=open_webui

[Install]
WantedBy=multi-user.target

# Reload systemd
sudo systemctl daemon-reload

# Enable and start open-webui service
sudo systemctl enable open-webui.service
sudo systemctl start open-webui.service

# Check for any errors
sudo systemctl status open-webui.service
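
If the service does not come up cleanly, the logs usually tell you why:

# Follow the Open WebUI logs
sudo journalctl -u open-webui.service -f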

[Screenshot: Asking DeepSeek-R1 via Open WebUI to tell a joke.]

If you have comments or corrections to share, you can reach our engineers at engineering-blog@cloudscale.ch.
