Model | Model size and checkpoints | Template |
---|---|---|
Baichuan 2 | 7B/13B Baichuan2-7B-Chat Baichuan2-13B-Chat | baichuan2 |
BLOOM/BLOOMZ | 7.1B/176B bloom-7b1 tr11-176B-logs | |
ChatGLM3 | 6B chatglm3-6b | chatglm3 |
Command R | 35B/104B aya-23-35B c4ai-command-r-plus | cohere |
DeepSeek (Code/MoE) | 7B/16B/67B/236B deepseek-vl-7b-chat deepseek-moe-16b-chat deepseek-llm-67b-chat deepSeek-V2 | deepseek |
DistilGPT2 | 82M distilgpt2 | |
Falcon | 7B/11B/40B/180B falcon-7b falcon-11B Falcon-40b falcon-180B | falcon |
Gemma/Gemma 2/CodeGemma | 2B/7B/9B/27B gemma-2b gemma-7b codegemma-7b gemma-2-9b gemma-2-27b | gemma |
GLM-4 | 9B glm-4-9b-chat | glm4 |
GPT2 | 124M/ gpt2 gpt2-large gpt2-xl | |
InternLM2 | 7B/20B internlm2-7b internlm2-20b | intern2 |
Llama 2 | 7B/13B/70B Llama-2-7b-chat-hf Llama-2-13b-hf Llama-2-70b-hf | llama2 |
Llama 3 | 8B/70B Meta-Llama-3-8B Meta-Llama-3-8B-Instruct Meta-Llama-3-70B | llama3 |
LLaVA-1.5 | 7B/13B llava-1.5-7b-hf llava-1.5-13b-hf | vicuna |
Mistral/Mixtral | 7B/8x7B/8x22B Mistral-7B-Instruct-v0.1 Mistral-7B-Instruct-v0.3 Mixtral-8x7B-v0.1 Mixtral-8x7B-Instruct-v0.1 Mixtral-8x22B-v0.1 | mistral |
OLMo | 1B/7B OLMo-1B-hf OLMo-7B-hf | |
OPT | 125M/350M opt-125m opt-350m | |
Phi-1.5/Phi-2 | 1.3B/2.7B phi-1_5 phi-2 | |
Phi-3 | 4B/7B/14B Phi-3-mini-128k-instruct Phi-3-small-8k-instruct Phi-3-medium-128k-instruct | phi |
Qwen/Qwen1.5/Qwen2 (Code/MoE) | 0.5B/1.5B/7B/14B/32B/72B/110B Qwen-7B-Chat Qwen-72B Qwen1.5-110B-Chat Qwen2-0.5B Qwen2-1.5B Qwen2-7B-Instruct Qwen2-57B-A14B-Instruct Qwen2-72B-Instruct | qwen |
StarCoder 2 | 3B/7B/15B starcoder2-3b Starcoder2-7b starcoder2-15b | |
tiny-gpt2 | 110MB tiny-gpt2 | |
TinyLlama | 1.1B TinyLlama-1.1B-Chat-v1.0 | |
Vicuna | 7B/13B Vicuna-7b-v1.5 vicuna-13b-v1.5 | vicuna |
Yi/Yi-1.5 | 6B/9B/34B Yi-34B-Chat Yi-1.5-6B-Chat Yi-1.5-9B-Chat Yi-1.5-34B-Chat | yi |
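
As a rough illustration of how the Template column might be consumed, the sketch below copies a few rows of the table into a Python dictionary and looks up the template for a model family. The names `TEMPLATES` and `template_for` are hypothetical and not part of any library. Representing an empty Template cell as `None` is an assumption made for this example only.

```python
from typing import Optional

# Hypothetical lookup table copied from a few rows of the table above.
# None marks a model whose Template cell is empty (an assumption of this sketch).
TEMPLATES = {
    "Baichuan 2": "baichuan2",
    "ChatGLM3": "chatglm3",
    "Llama 3": "llama3",
    "LLaVA-1.5": "vicuna",
    "Yi/Yi-1.5": "yi",
    "OPT": None,
}


def template_for(model: str) -> Optional[str]:
    """Return the template listed for a model family, or None if none is listed."""
    if model not in TEMPLATES:
        # Unknown rows are an error rather than a silent default.
        raise KeyError(f"model family not in table: {model!r}")
    return TEMPLATES[model]


if __name__ == "__main__":
    for name in ("Llama 3", "LLaVA-1.5", "OPT"):
        print(name, "->", template_for(name))
```

Raising on unknown names instead of falling back to a default keeps a typo in the model name from silently selecting the wrong template.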