Howto:Set up an LLM Server
How to provide an LLM Server (experimental - not yet finished and fully tested)
You may decide to host your own free model on your own server or in your cloud. This is how you would approach it. First you need a server with a rather performant GPU and sufficient VRAM. On our hardware (2x NVIDIA GeForce RTX 4090) we tested the following models:
| Model | VRAM (GB) |
|---|---|
| deepseek-r1:70b | 42 |
| deepseek-r1:32b | 19 |
| deepseek-r1:14b | 9 |
| mixtral | 26 |
So deepseek-r1:70b was the most capable model we could still run sufficiently fast on our hardware. For the smaller models, a single GPU (NVIDIA GeForce RTX 4090) would be sufficient.
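To judge whether a model will fit, compare the VRAM figures above with what your GPUs actually provide. A quick way to inspect total and currently used VRAM (assuming the NVIDIA driver and nvidia-smi are installed) is:

# show each GPU with its total and currently used memory
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv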
A server with suitable GPUs can be rented from various providers; one European cloud provider is, for example, hostkey, but there are many more. Once you have a running Linux machine, install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
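The install script also registers Ollama as a systemd service. As a quick sanity check (not part of the official instructions), you can verify the binary and the service:

ollama --version          # print the installed Ollama version
systemctl status ollama   # check that the ollama service is active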
Then you pull the model you want from the Ollama repository:
ollama pull mixtral
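To pull one of the DeepSeek models from the table instead, and to check what is available locally, something like the following should work:

ollama pull deepseek-r1:14b      # or any other tag from the table above
ollama list                      # show locally available models
ollama run mixtral "Say hello"   # quick interactive sanity check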
Restrict access to your Application platform
systemctl edit ollama.service
Under [Service], add a line like:
Environment="OLLAMA_ORIGINS=http://myap.example.com"
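OLLAMA_ORIGINS controls which origins Ollama accepts requests from. If your application platform runs on a different host, you will usually also need OLLAMA_HOST, because Ollama binds only to 127.0.0.1 by default. A minimal drop-in could then look like this (the origin URL is just the placeholder from above):

[Service]
# accept requests from the application platform only
Environment="OLLAMA_ORIGINS=http://myap.example.com"
# listen on all interfaces instead of localhost only
Environment="OLLAMA_HOST=0.0.0.0"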
Reload systemd and restart ollama
systemctl daemon-reload
systemctl restart ollama
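After the restart you can check that the server answers on Ollama's default port 11434 (replace localhost with the server's address when testing from another machine; the prompt is just an example):

# list the models the server knows about
curl http://localhost:11434/api/tags

# send a test prompt to the mixtral model
curl http://localhost:11434/api/generate -d '{"model": "mixtral", "prompt": "Hello", "stream": false}'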
That's basically all. You may need to set up some security as well, such as a firewall.
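As a sketch, assuming a ufw-based firewall and the default Ollama port 11434, you could allow only the application platform's address (203.0.113.10 is a placeholder) to reach the server:

# allow only the application platform to reach Ollama
ufw allow from 203.0.113.10 to any port 11434 proto tcp
# keep other incoming traffic to that port blocked
ufw deny 11434/tcp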