Run other Models
Do you already have a model file? Skip to Run models manually.
To load models into LocalAI, you can either set up models manually or configure LocalAI to pull them from external sources, such as Huggingface, and configure the model for you.
To do that, you can point LocalAI to the URL of a YAML configuration file. However, LocalAI also ships with a number of popular model configurations embedded in the binary. Below you can find the list of pre-built model configurations; see Model customization for how to configure models from URLs.
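As a minimal sketch of the URL-based approach, a YAML configuration hosted somewhere reachable can be passed as the model argument when starting the container (the URL below is only a placeholder for your own configuration file):

```bash
# Start LocalAI and load a model from a YAML configuration hosted at a URL.
# The URL is a placeholder for illustration; point it at your own config file.
docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core \
  https://example.com/configurations/phi-2.yaml
```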
There are different categories of models: LLMs, Multimodal LLMs, Embeddings, Audio to Text, and Text to Audio, depending on the backend being used and the model architecture.
💡 Don’t need GPU acceleration? Use the CPU images, which are lighter and do not have Nvidia dependencies. The first table below uses these CPU images.
| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core phi-2` |
| 🌋 bakllava | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core bakllava` |
| 🌋 llava-1.5 | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core llava-1.5` |
| 🌋 llava-1.6-mistral | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core llava-1.6-mistral` |
| 🌋 llava-1.6-vicuna | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core llava-1.6-vicuna` |
| mistral-openorca | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core mistral-openorca` |
| bert-cpp | Embeddings | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core bert-cpp` |
| all-minilm-l6-v2 | Embeddings | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg all-minilm-l6-v2` |
| whisper-base | Audio to Text | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core whisper-base` |
| rhasspy-voice-en-us-amy | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg coqui` |
| 🐶 bark | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg bark` |
| 🔊 vall-e-x | Text to Audio | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg vall-e-x` |
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core mixtral-instruct` |
| tinyllama-chat original model | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | LLM | GPU-only |
| animagine-xl | Text to Image | GPU-only |
| transformers-tinyllama | LLM | GPU-only |
| codellama-7b (with transformers) | LLM | GPU-only |
| codellama-7b-gguf (with llama.cpp) | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core codellama-7b-gguf` |
| hermes-2-pro-mistral | LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core hermes-2-pro-mistral` |
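Once a model is running, it can be queried through the OpenAI-compatible API exposed on port 8080. A minimal sketch, assuming the phi-2 container from the first row of the table above is running locally:

```bash
# Send a chat completion request to the phi-2 model started above.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-2",
    "messages": [{"role": "user", "content": "How are you doing?"}],
    "temperature": 0.1
  }'
```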
The images below are built with CUDA 11 support. To know which CUDA version you have available, check with `nvidia-smi` or `nvcc --version`; see also GPU acceleration.
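For example, to check the host driver and confirm that Docker can actually reach the GPU (this sketch assumes the NVIDIA Container Toolkit is installed; the CUDA image tag is only an illustration):

```bash
# Show the driver and the highest CUDA version it supports
nvidia-smi

# Show the installed CUDA toolkit version, if any
nvcc --version

# Confirm that containers can see the GPU (requires the NVIDIA Container Toolkit;
# the image tag below is just an example)
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```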
| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core phi-2` |
| 🌋 bakllava | Multimodal LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core bakllava` |
| 🌋 llava-1.5 | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-cublas-cuda11-core llava-1.5` |
| 🌋 llava-1.6-mistral | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-cublas-cuda11-core llava-1.6-mistral` |
| 🌋 llava-1.6-vicuna | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-cublas-cuda11-core llava-1.6-vicuna` |
| mistral-openorca | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core mistral-openorca` |
| bert-cpp | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core bert-cpp` |
| all-minilm-l6-v2 | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 all-minilm-l6-v2` |
| whisper-base | Audio to Text | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core whisper-base` |
| rhasspy-voice-en-us-amy | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 coqui` |
| 🐶 bark | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 bark` |
| 🔊 vall-e-x | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 vall-e-x` |
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core mixtral-instruct` |
| tinyllama-chat original model | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 mamba-chat` |
| animagine-xl | Text to Image | `docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.13.0-cublas-cuda11 animagine-xl` |
| transformers-tinyllama | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 transformers-tinyllama` |
| codellama-7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11 codellama-7b` |
| codellama-7b-gguf | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core codellama-7b-gguf` |
| hermes-2-pro-mistral | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda11-core hermes-2-pro-mistral` |
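The Text to Audio entries can be exercised through LocalAI's TTS endpoint. A hedged sketch, assuming the rhasspy-voice-en-us-amy container above is running; the `model` value is an assumption about the voice file that configuration loads:

```bash
# Generate speech with the Piper voice loaded by rhasspy-voice-en-us-amy.
# The "model" value is an assumption; check the exact name with
#   curl http://localhost:8080/v1/models
# if the request is rejected.
curl http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"model": "en-us-amy-low.onnx", "input": "Hello from LocalAI!"}' \
  --output hello.wav
```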
The images below are built with CUDA 12 support. As above, you can check which CUDA version you have available with `nvidia-smi` or `nvcc --version`; see also GPU acceleration.
| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core phi-2` |
| 🌋 bakllava | Multimodal LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core bakllava` |
| 🌋 llava-1.5 | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-cublas-cuda12-core llava-1.5` |
| 🌋 llava-1.6-mistral | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-cublas-cuda12-core llava-1.6-mistral` |
| 🌋 llava-1.6-vicuna | Multimodal LLM | `docker run -ti -p 8080:8080 localai/localai:v2.13.0-cublas-cuda12-core llava-1.6-vicuna` |
| mistral-openorca | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core mistral-openorca` |
| bert-cpp | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core bert-cpp` |
| all-minilm-l6-v2 | Embeddings | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 all-minilm-l6-v2` |
| whisper-base | Audio to Text | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core whisper-base` |
| rhasspy-voice-en-us-amy | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 coqui` |
| 🐶 bark | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 bark` |
| 🔊 vall-e-x | Text to Audio | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 vall-e-x` |
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core mixtral-instruct` |
| tinyllama-chat original model | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 mamba-chat` |
| animagine-xl | Text to Image | `docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.13.0-cublas-cuda12 animagine-xl` |
| transformers-tinyllama | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 transformers-tinyllama` |
| codellama-7b | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12 codellama-7b` |
| codellama-7b-gguf | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core codellama-7b-gguf` |
| hermes-2-pro-mistral | LLM | `docker run -ti -p 8080:8080 --gpus all localai/localai:v2.13.0-cublas-cuda12-core hermes-2-pro-mistral` |
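Audio to Text models such as whisper-base are served through the OpenAI-compatible transcription endpoint. A minimal sketch, assuming a whisper-base container from one of the tables above is running and `audio.wav` is a local file of your own:

```bash
# Transcribe a local audio file with the whisper-base model.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.wav" \
  -F model="whisper-base"
```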
Tip: you can specify multiple models when starting an instance so that they are all loaded, for example to have both llava and phi-2 configured:
`docker run -ti -p 8080:8080 localai/localai:v2.13.0-ffmpeg-core llava phi-2`
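With more than one model loaded, each request selects a model by name through its `model` field, and the models the instance exposes can be listed as follows:

```bash
# List the models currently available on the instance; both llava and phi-2
# should appear in the response.
curl http://localhost:8080/v1/models
```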