Howto:Setup a LLM Server

How to provide an LLM Server (experimental - not yet finished and fully tested)

You may decide to host a free model on your own server or in your own cloud. This is how you would approach it. First you need a server with a rather performant GPU and sufficient VRAM. On our hardware (2x NVIDIA GeForce RTX 4090) we tested the following models:

Model             VRAM/GB
deepseek-r1:70b   42
deepseek-r1:32b   19
deepseek-r1:14b   9
mixtral           26

So deepseek-r1:70b was the most capable model we could still run sufficiently fast on our hardware; with 42 GB of VRAM it needs both GPUs. For the smaller models one GPU (NVIDIA GeForce RTX 4090, 24 GB VRAM) would be sufficient.
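
If you are unsure how much VRAM your hardware provides or how much of it is currently in use, nvidia-smi (part of the NVIDIA driver) shows both:

nvidia-smi    # lists each GPU with its total and currently used VRAM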

The server with corresponding GPUs can be rented from various providers; a European cloud provider is for example hostkey, but there are many more. Once you have a running Linux machine, install Ollama:

curl -fsSL https://ollama.com/install.sh | sh
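
To verify the installation (on Linux the install script also registers Ollama as a systemd service), a quick check like this should work:

ollama --version          # prints the installed Ollama version
systemctl status ollama   # the service should be reported as active (running)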

Then pull the model from the Ollama library:

ollama pull mixtral
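
Once the download has finished, you can list the available models and give the model a quick try from the command line:

ollama list                                      # the pulled model should show up here
ollama run mixtral "Say hello in one sentence."  # one-shot prompt against the model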

Restrict access to your Application Platform by allowing only its origin:

systemctl edit ollama.service

Under the [Service] section, add a line like:

Environment="OLLAMA_ORIGINS=http://myap.example.com"
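
The resulting override file (systemctl edit typically writes it to /etc/systemd/system/ollama.service.d/override.conf) would then look like this; myap.example.com is a placeholder for your own Application Platform, and multiple origins can be given as a comma-separated list:

[Service]
Environment="OLLAMA_ORIGINS=http://myap.example.com"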

Reload systemd and restart ollama

systemctl daemon-reload
systemctl restart ollama
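
You can now test the HTTP API (Ollama listens on port 11434 and, by default, only on 127.0.0.1; if the Application Platform runs on another machine, you will likely also need a line like Environment="OLLAMA_HOST=0.0.0.0" in the same override). A minimal non-streaming request:

curl http://localhost:11434/api/generate -d '{
  "model": "mixtral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'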

That's basically all. You may also need to set up some security, e.g. a firewall that restricts who can reach the Ollama port; a sketch follows below.
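
For example with ufw, assuming 203.0.113.10 stands in for your Application Platform's address (ufw evaluates rules in order, so the specific allow rule must come before the deny rule):

ufw allow from 203.0.113.10 to any port 11434 proto tcp   # only the App Platform may reach Ollama
ufw deny 11434/tcp                                        # everyone else is blocked
ufw enable                                                # activate the firewall if not already active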

Related Articles

Reference15r1:Concept_App_Service_myApps_Assistant