
Enterprise Install

Looking to get an installer and host Lamini on-premises or on a GPU VM in your VPC? Contact us! We're happy to advise you on purchasing and configuring the right machines for your needs, picking the right model, handling your data volume and number of users, and the other challenges of running an LLM application.

System requirements

Before getting started, make sure your machine is set up to run Lamini smoothly. At a minimum, it should have:

  • 64 GB CPU memory
  • 32 GB GPU memory
  • 1 TB disk
  • Ubuntu 22*

*Other Linux distributions should work as long as they can run Docker/OCI containers

You can run Lamini on your laptop for dev and testing. CPUs can run LLMs with hundreds of millions of parameters (like hf-internal-testing/tiny-random-gpt2) just fine.
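Once your local instance is running (see Using your local instance below), a minimal CPU-only smoke test might look like the sketch below; the tiny model has random weights, so the output will be gibberish, but it exercises the full request path:

import lamini

# Point the client at your local Lamini instance.
lamini.api_url = "http://localhost:5001"
lamini.api_key = "test_token"

# A tiny test model with a few hundred million parameters runs fine on CPU.
llm = lamini.Lamini(model_name="hf-internal-testing/tiny-random-gpt2")
print(llm.generate("Say hi!", output_type={"Response": "str"}))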

Dependencies

Lamini is entirely self-contained and can run on any machine that can run Docker or OCI containers. Beyond the operating system, provisioning involves installing Docker and the GPU driver.

Docker

Install Docker by following the official Docker installation instructions for your platform.

GPU Driver

  1. Install the GPU driver for your operating system, following the manufacturer's instructions.
    • Note that the driver version must be compatible with PyTorch: https://pytorch.org/get-started/locally/.
  2. Run system management interface (SMI) tests inside a GPU-enabled Docker container to verify the installation, as shown below.
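For example, on a machine with an NVIDIA GPU and the NVIDIA Container Toolkit installed, a quick SMI check inside a container looks roughly like this (the CUDA image tag is illustrative; pick one that matches your driver):

$ docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If the output lists your GPU and driver version, the container runtime can see the GPU and you're ready to install Lamini.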

Installation

Docker

  1. Using the link provided by Lamini, download the installer: $ wget -O lamini-installer.sh 'link-to-installer'.
  2. Add execute permissions: $ chmod +x lamini-installer.sh.
  3. Run the installer: $ ./lamini-installer.sh.

Kubernetes

Docs coming soon! Contact us for help.

Running Lamini

Congrats and welcome to the herd!!

Go to the Lamini installer directory: $ cd build-lamini-installer/lamini-installer

Get your Hugging Face Access Token from: https://huggingface.co/settings/tokens

Enter the token in the config file, configs/llama_config_edits.yaml, under the huggingface token field:

huggingface: # Hugging Face API token; if no token is provided, Lamini will default to offline mode
    token: ""

Start Lamini with $ ./lamini-up.

Once running, you can view the UI at http://localhost:5001!
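To confirm the service is listening without a browser, you can probe the same port (a quick check, not an official health endpoint):

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5001

Any HTTP status code back means something is answering on port 5001.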

Using your local instance

To use your running Lamini instance with the Lamini library, set the API URL to your local instance:

import lamini

lamini.api_url = "http://localhost:5001"
lamini.api_key = "test_token"

llm = lamini.Lamini(model_name="meta-llama/Meta-Llama-3-8B-Instruct")
print(llm.generate("How are you?", output_type={"Response":"str"}))

Configuring Lamini

llama_config.yaml

Most configuration options for Lamini are available in a single YAML configuration file, which is installed at ./build-lamini-installer/lamini-installer/configs/llama_config.yaml

Some common config values:

  1. verbose: set to true to enable verbose logging (see the sketch after this list)
  2. powerml: a list of API endpoints. If you want to run different services on different machines, e.g. in a Kubernetes cluster, configure each service's endpoints here.
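For example, turning on verbose logging is a one-line change, and the huggingface block from earlier lives in the same file. This is only a sketch of those two keys; check the installed llama_config.yaml for the exact layout of the powerml endpoint entries:

verbose: true

huggingface:
    token: ""  # Hugging Face API token; leave empty for offline mode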

docker-compose.yaml

The list of all Lamini services is available in the docker-compose.yaml file at ./build-lamini-installer/lamini-installer/docker-compose.yaml

Some common config values:

  1. volumes.slurm-volume: where you want fine-tuned models to be stored (they are saved in PyTorch format); see the sketch below
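For example, to keep fine-tuned models on a specific host path, you could bind the named volume in docker-compose.yaml roughly like this; the host path and bind-mount options are illustrative, not part of the shipped file:

volumes:
  slurm-volume:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/lamini/models  # host directory with enough disk for saved PyTorch checkpoints

Make sure the host directory exists before running ./lamini-up.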