Prerequisites
- Create a project and add an SSH key
- Optionally download the CLI tool
- Choose a virtual machine with an NVIDIA GPU and configure it
- Use the Ubuntu 22.04 + NVIDIA drivers + Docker image (in the CLI tool, pass -image ubuntu-2204-nvidia-535-docker-v20240214)
- Start a virtual machine with one or more GPUs
Start AI API
We will start a docker network and run a docker container with Ollama to deploy LLMs. Then we will run a second docker container with the Kong API Gateway that connects to Ollama. Kong runs without a database, so it only requires a yaml file. SSH on to your CUDO GPU virtual machine and create a docker network.
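The exact commands are not shown above; a minimal sketch might look like the following (the network name ollama-net is an assumption, and ollama/ollama is the official Ollama image):

```shell
# Create a user-defined bridge network so Kong can reach Ollama by container name
# (the network name "ollama-net" is an assumption)
docker network create ollama-net

# Run Ollama detached with GPU access, attached to the network as "ollama" --
# this name must match the host used in the Kong yaml url
docker run -d --gpus=all --network ollama-net --name ollama \
  -v ollama:/root/.ollama \
  ollama/ollama
```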
Make SSL Keys
On the CUDO virtual machine, create an SSL certificate, replacing the IP with the CUDO virtual machine's IP address.
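A sketch of generating a self-signed certificate with openssl (the file names key.pem and cert.pem are assumptions, and 203.0.113.10 is a placeholder for your virtual machine's IP):

```shell
# Generate a self-signed certificate and key valid for one year;
# replace 203.0.113.10 with your CUDO virtual machine's IP address
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout key.pem -out cert.pem -days 365 \
  -subj "/CN=203.0.113.10" \
  -addext "subjectAltName=IP:203.0.113.10"
```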
Make a yaml file
This yaml file configures Kong to connect to the Ollama docker container. If you are using another service, change the name and port of your docker container in the url: http://ollama:11434.
Here the key-auth Kong plugin is used to add key-based authentication. Swap my-key for your own secure key, and change the path to your desired path.
kong.yaml
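The contents of kong.yaml were not captured above; a minimal declarative (DB-less) configuration consistent with the description might look like this — the service, route, and consumer names are assumptions, while the url, the /ollama path, and my-key come from this guide:

```yaml
_format_version: "3.0"

services:
  - name: ollama-service            # assumed name
    url: http://ollama:11434        # the Ollama container on the docker network
    routes:
      - name: ollama-route          # assumed name
        paths:
          - /ollama                 # the path used in the testing step below

plugins:
  - name: key-auth                  # Kong's key-based authentication plugin

consumers:
  - username: ai-user               # assumed name
    keyauth_credentials:
      - key: my-key                 # swap for your own secure key
```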
Run Kong docker container
Run a detached docker container with Kong.
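The command itself is not shown above; a sketch of running Kong in DB-less mode might look like this (the mount paths, port 8443, and the kong:3.6 tag are assumptions; kong.yaml, cert.pem, and key.pem are the files created earlier, mounted from the current directory):

```shell
# Run Kong detached in DB-less mode on the same network as Ollama,
# serving HTTPS with the self-signed certificate created earlier
docker run -d --name kong --network ollama-net \
  -v "$(pwd)/kong.yaml:/kong/kong.yaml" \
  -v "$(pwd)/cert.pem:/kong/cert.pem" \
  -v "$(pwd)/key.pem:/kong/key.pem" \
  -e KONG_DATABASE=off \
  -e KONG_DECLARATIVE_CONFIG=/kong/kong.yaml \
  -e KONG_PROXY_LISTEN="0.0.0.0:8443 ssl" \
  -e KONG_SSL_CERT=/kong/cert.pem \
  -e KONG_SSL_CERT_KEY=/kong/key.pem \
  -p 8443:8443 \
  kong:3.6
```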
Testing
Testing on virtual machine
SSH on to the CUDO virtual machine and send a request to the gateway, using /ollama as the path defined in the yaml file. You should see the expected output from your API.
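A sketch of such a request (the IP, port, and model name are placeholders; the apikey header is the key-auth plugin's default, and -k skips verification of the self-signed certificate):

```shell
# Query Ollama through the Kong gateway on the /ollama path;
# key-auth expects the key in the "apikey" header by default
curl -k https://203.0.113.10:8443/ollama/api/generate \
  -H "apikey: my-key" \
  -d '{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false}'
```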