.NET Technology Show

Using llama.cpp to self-host Large Language Models in Production

A practical guide to self-hosting LLMs in production using llama.cpp's llama-server with Docker Compose and systemd