Howto:Setup a LLM Server
How to provide an LLM server (experimental, not yet finished or fully tested)
You may decide to host your own free model on your own server or in your cloud. This is how you would approach it. First you need a server with a reasonably powerful GPU and sufficient VRAM. On our hardware (2x NVIDIA GeForce RTX 4090) we tested the following models:
| Name | VRAM/GB | 
|---|---|
| deepseek-r1:70b | 42 | 
| deepseek-r1:32b | 19 | 
| deepseek-r1:14b | 9 | 
| mixtral | 26 | 
So deepseek-r1:70b was the largest model we could run sufficiently fast on our hardware. For the smaller models that fit into the 24 GB of VRAM of a single NVIDIA GeForce RTX 4090 (deepseek-r1:32b and deepseek-r1:14b), one GPU would be sufficient.
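The sizing reasoning above can be sketched as a small shell helper. It assumes the VRAM figures from the table and a budget of 24 GB per RTX 4090. The names `MODELS` and `fits_in` are made up for this illustration, and real memory use also grows with the context length, so treat the result as a rough guide only.

```shell
#!/bin/sh
# VRAM needs in GB, copied from the table above (name=GB).
MODELS="deepseek-r1:70b=42 deepseek-r1:32b=19 deepseek-r1:14b=9 mixtral=26"

# Print every model whose VRAM need is at most the given budget (in GB).
fits_in() {
    budget="$1"
    for entry in $MODELS; do
        name="${entry%=*}"    # part before the last "="
        need="${entry##*=}"   # part after the last "="
        if [ "$need" -le "$budget" ]; then
            echo "$name"
        fi
    done
}

# One RTX 4090 (24 GB): prints deepseek-r1:32b and deepseek-r1:14b
fits_in 24
```

Calling `fits_in 48` (two cards) lists all four models. To see how much VRAM a machine actually has, `nvidia-smi --query-gpu=name,memory.total --format=csv` reports it per GPU.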
Servers with suitable GPUs can be rented from various providers. One European cloud provider is, for example, hostkey, but there are many more. Once you have a running Linux machine, install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Then pull the model from the repository:
ollama pull mixtral
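To check that the pulled model responds, you can send it one prompt over Ollama's HTTP API. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint. The helper names `generate_payload` and `ask` are hypothetical, and the payload builder does no JSON escaping, so it only works for prompts without double quotes or backslashes.

```shell
#!/bin/sh
# Base URL of the Ollama server; 11434 is Ollama's default port.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for /api/generate.
# No escaping is done: model and prompt must not contain " or \.
generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one prompt to the server and print the raw JSON reply.
ask() {
    curl -s "$OLLAMA_URL/api/generate" -d "$(generate_payload "$1" "$2")"
}

# Example (needs a running server with the model pulled):
#   ask mixtral "Why is the sky blue?"
```

With `"stream": false` the server answers with a single JSON object whose `response` field holds the generated text.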
Restrict access to your application platform by editing the service unit:
systemctl edit ollama.service
Under [Service], add a line like:
Environment="OLLAMA_ORIGINS=http://myap.example.com"
Reload systemd and restart Ollama:
systemctl daemon-reload
systemctl restart ollama
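`systemctl edit` opens an interactive editor. For scripted setups, the same steps can be sketched non-interactively. This assumes that the drop-in created by `systemctl edit ollama.service` lives in the standard systemd location `/etc/systemd/system/ollama.service.d/override.conf`. The function names are hypothetical, and the script overwrites an existing override file.

```shell
#!/bin/sh
# Write a systemd drop-in that sets OLLAMA_ORIGINS.
#   $1: allowed origin, e.g. http://myap.example.com
#   $2: target directory (defaults to the drop-in directory of ollama.service)
write_override() {
    origin="$1"
    dir="${2:-/etc/systemd/system/ollama.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<EOF
[Service]
Environment="OLLAMA_ORIGINS=$origin"
EOF
}

# Write the drop-in, then reload systemd and restart Ollama (run as root).
apply_override() {
    write_override "$1" &&
        systemctl daemon-reload &&
        systemctl restart ollama
}

# Example:
#   apply_override http://myap.example.com
#   systemctl show ollama.service --property=Environment   # check the result
```

If the application platform runs on a different machine, Ollama also has to listen on a reachable address, since by default it binds to 127.0.0.1. The Ollama FAQ describes the `OLLAMA_HOST` variable for this, which could go into the same file as a second `Environment=` line.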
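That's basically all. Note that OLLAMA_ORIGINS only controls which browser origins Ollama accepts (CORS) and does not authenticate clients, so you may also need to set up further security, for example a firewall.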
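As one possible firewall setup, the sketch below prints `ufw` rules that let a single client address reach Ollama's default port 11434 and block that port for everyone else. It assumes a Debian or Ubuntu style host with `ufw` installed. The address `203.0.113.10` is a documentation placeholder for your application server, and `print_firewall_rules` is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Print ufw commands that restrict a port to a single client address.
#   $1: IP address of the application platform (placeholder in the example)
#   $2: port, defaults to Ollama's default port 11434
print_firewall_rules() {
    app_ip="$1"
    port="${2:-11434}"
    # ufw evaluates rules in order, so the allow rule must come first.
    echo "ufw allow from $app_ip to any port $port proto tcp"
    echo "ufw deny $port/tcp"
}

# Review the printed commands first, then run them as root.
print_firewall_rules 203.0.113.10
```

The helper only prints the commands, so nothing changes until you run them yourself. If `ufw` is not active yet, make sure SSH access is allowed before enabling it.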