Enterprise Install
Looking to get an installer and host Lamini on-premise or on a GPU VM in your VPC? Contact us! We're happy to advise you on purchasing and configuring the right machines for your needs, picking the right model, handling your data volume and number of users, and the other challenges of running an LLM application.
System requirements
Before getting started, make sure your machine is set up to run Lamini smoothly. Check that your machine has at least:
- 64 GB CPU memory
- 32 GB GPU memory
- 1 TB disk
- Ubuntu 22*

*Other Linux distros should work as long as they can run Docker/OCI containers.
You can run Lamini on your laptop for dev and testing. CPUs can run LLMs with hundreds of millions of parameters (like hf-internal-testing/tiny-random-gpt2) just fine.
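A quick way to check these numbers on a Linux machine (assuming an NVIDIA GPU; substitute your vendor's tool otherwise):

  $ free -h          # total CPU memory
  $ nvidia-smi       # GPU memory and driver version
  $ df -h /          # available disk space
  $ lsb_release -a   # OS version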
Dependencies
Lamini is entirely self-contained and can run on any machine that can run Docker or OCI containers. Beyond the operating system, provisioning involves two steps: installing Docker and installing the GPU driver.
Docker
Install Docker by following the official Docker installation instructions for your platform.
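For example, on Ubuntu one common route is Docker's convenience script (shown only as an illustration; your organization may prefer the package-manager instructions):

  $ curl -fsSL https://get.docker.com -o get-docker.sh
  $ sudo sh get-docker.sh
  $ sudo usermod -aG docker $USER   # optional: run docker without sudo (log out and back in to apply)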
GPU Driver
- Install the GPU driver for the operating system following the manufacturer's instructions.
- Note that the driver version must be compatible with PyTorch: https://pytorch.org/get-started/locally/.
- Run system management interface (SMI) tests inside a GPU-enabled Docker container to verify the installation, as shown below.
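For example, with an NVIDIA GPU and the NVIDIA Container Toolkit installed, a typical check looks like this (the CUDA image tag is illustrative):

  $ docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If the driver and container runtime are configured correctly, the command prints the same GPU table you would see by running nvidia-smi on the host.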
Installation
Docker
- Using the link provided by Lamini, download the installer:

  $ wget -O lamini-installer.sh 'link-to-installer'

- Add execute permissions:

  $ chmod +x lamini-installer.sh

- Run the installer:

  $ ./lamini-installer.sh
Kubernetes
Docs coming soon! Contact us for help.
Running Lamini
Congrats and welcome to the herd!!
- Go to the Lamini installer directory:

  $ cd build-lamini-installer/lamini-installer

- Get your Hugging Face access token from https://huggingface.co/settings/tokens.

- Enter the token in the config file configs/llama_config_edits.yaml, under the huggingface token field:

  huggingface: # This is the Hugging Face API token; Lamini defaults to offline mode if no token is provided
    token: ""

- Start Lamini:

  $ ./lamini-up
Once running, you can view the UI at http://localhost:5001!
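If you prefer the command line, you can confirm that the port is responding before opening the browser (a plain HTTP check, nothing Lamini-specific):

  $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5001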
Using your local instance
To use your running Lamini instance with the Lamini library, set the API url to your local instance:
import lamini

# Point the client at your local instance instead of Lamini's hosted API
lamini.api_url = "http://localhost:5001"
lamini.api_key = "test_token"

llm = lamini.Lamini(model_name="meta-llama/Meta-Llama-3-8B-Instruct")
print(llm.generate("How are you?", output_type={"Response": "str"}))
Configuring Lamini
llama_config.yaml
Most configuration options for Lamini are available in a single YAML configuration file, which is installed at:

  ./build-lamini-installer/lamini-installer/configs/llama_config.yaml
Some common config values:

- verbose: Set to true to enable verbose logging.
- powerml: A list of API endpoints. If you want to run different services on different machines, e.g. in a Kubernetes cluster, configure each service's endpoints here.
docker-compose.yaml
The list of all Lamini services is available in the docker-compose.yaml file:

  ./build-lamini-installer/lamini-installer/docker-compose.yaml
Some common config values:

- volumes.slurm-volume: Where fine-tuned models are stored (they are saved in PyTorch format). A sketch of how you might point this at a larger disk is below.
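For example, to keep fine-tuned checkpoints on a larger data disk you could back the named volume with a bind mount. This is a generic Docker Compose pattern rather than Lamini's shipped configuration, and the host path is an assumption:

  volumes:
    slurm-volume:
      driver: local
      driver_opts:
        type: none
        o: bind
        device: /data/lamini/models   # assumed host directory; it must already exist and have enough space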