Guide to Huggingface Phixtral-4x2_8

Phixtral-4x2_8 is a model created using the Mixture of Experts (MoE) approach, combining four microsoft/phi-2 models. The model was inspired by the mistralai/Mixtral-8x7B-v0.1 architecture and performs better than the individual expert models.

🏆 Evaluation
To compare Phixtral-4x2_8 with other models, check the Yet Another LLM Leaderboard (YALL) at https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard.

🧩 Configuration
The model has been built using a custom version of the mergekit library (mixtral branch) and the following configuration:

base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
  - source_model: Yhyu13/phi-2-sft-dpo-gpt4_en-ep1
    positive_prompts: [""]
  - source_model: mrm8488/phi-2-coder
    positive_prompts: [""]

💻 Usage
To run Phixtral-4x2_8 in 4-bit precision on a free T4 GPU, use the Colab notebook at https://colab.research.google.com/drive/1k6C_oJfEKUq0mtuWKisvoeMHxTcIxWRa?usp=sharing, which contains the instructions and code needed to run the model.
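
If you want to load the model outside the notebook, the sketch below shows one way to do it. It is a minimal, hedged example: it assumes a CUDA GPU with the transformers, accelerate, einops, and bitsandbytes packages installed, and it uses transformers' BitsAndBytesConfig for the 4-bit quantization, so it may differ slightly from the exact cells in the notebook.

```python
# Minimal 4-bit loading sketch (assumes a CUDA GPU and the transformers,
# accelerate, einops, and bitsandbytes packages are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/phixtral-4x2_8",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # required: the MoE layers live in custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(
    "mlabonne/phixtral-4x2_8",
    trust_remote_code=True,
)

prompt = "def print_prime(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```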

For details on how to specify num_experts_per_tok and num_local_experts, refer to the configuration file (config.json) and the MoE class in the modeling_phi.py file at https://huggingface.co/mlabonne/phixtral-4x2_8/blob/main/modeling_phi.py#L293-L317.

🤝 Acknowledgments
Special thanks to vince62s for the MoE inference code and the dynamic configuration of the number of experts. Thanks to Charles Goddard for the mergekit library and the "MoE for clowns" implementation. Appreciation also goes to ehartford, lxuechen, Yhyu13, and mrm8488 for their fine-tuned phi-2 models.

For additional guidance, visit the provided links and resources.

[Information and image source: huggingface.co]

# Huggingface Phixtral-4x2_8 Manual

## Introduction
Phixtral-4x2_8 is a Mixture of Experts (MoE) model made with four [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the architecture of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1). It outperforms each individual expert.

## 🏆 Evaluation
Check [YALL – Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) to compare Phixtral with other models.

## 🧩 Configuration
The model has been made with a custom version of the [mergekit](https://github.com/cg123/mergekit) library (mixtral branch) and the following configuration:

```yaml
base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
  - source_model: Yhyu13/phi-2-sft-dpo-gpt4_en-ep1
    positive_prompts: [""]
  - source_model: mrm8488/phi-2-coder
    positive_prompts: [""]
```

## 💻 Usage
Here’s a [Colab notebook](https://colab.research.google.com/drive/1k6C_oJfEKUq0mtuWKisvoeMHxTcIxWRa?usp=sharing) to run Phixtral in 4-bit precision on a free T4 GPU.

```python
!pip install -q --upgrade transformers einops accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "phixtral-4x2_8"
instruction = '''
    def print_prime(n):
        """
        Print all primes between 1 and n
        """
'''

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained(
    f"mlabonne/{model_name}",
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    f"mlabonne/{model_name}",
    trust_remote_code=True
)

inputs = tokenizer(
    instruction,
    return_tensors="pt",
    return_attention_mask=False
)

outputs = model.generate(**inputs, max_length=200)

text = tokenizer.batch_decode(outputs)[0]
print(text)
```

Inspired by [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1), you can specify the `num_experts_per_tok` and `num_local_experts` in the [config.json](https://huggingface.co/mlabonne/phixtral-4x2_8/blob/main/config.json#L26-L27) file (2 and 4 by default). This configuration is automatically loaded in `configuration.py`.
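
As an illustration, one way to experiment with `num_experts_per_tok` without editing config.json by hand is to load the configuration, override the attribute, and pass it back to `from_pretrained`. This is a hedged sketch: it assumes the custom configuration class exposes these values as plain attributes (as the config.json entries suggest) and that the MoE forward pass reads `num_experts_per_tok` at inference time.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mlabonne/phixtral-4x2_8"

# Inspect the default MoE settings shipped in config.json
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.num_experts_per_tok, config.num_local_experts)  # 2 and 4 by default

# Assumption: route each token through a single expert instead of two.
# num_local_experts is left untouched, since the checkpoint contains 4 experts.
config.num_experts_per_tok = 1

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    trust_remote_code=True,
)
```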

[vince62s](https://huggingface.co/vince62s) implemented the MoE inference code in the `modeling_phi.py` file. In particular, see the [MoE class](https://huggingface.co/mlabonne/phixtral-4x2_8/blob/main/modeling_phi.py#L293-L317).
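
For readers who don't want to dig through the remote code, here is a self-contained toy sketch of top-k expert routing in the same spirit. It is not the phixtral implementation (the real MoE class wires the routing into phi-2's MLP blocks and its details differ); it only illustrates how `num_experts_per_tok` of the `num_local_experts` experts are selected and mixed per token: each token is scored by a small gate, its top-k experts are run, and their outputs are combined with the normalized gate weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Illustrative top-k mixture-of-experts block (not the phixtral implementation)."""

    def __init__(self, hidden_size=64, num_local_experts=4, num_experts_per_tok=2):
        super().__init__()
        self.num_experts_per_tok = num_experts_per_tok
        # Router that scores each expert for every token
        self.gate = nn.Linear(hidden_size, num_local_experts, bias=False)
        # One feed-forward "expert" per source model
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(hidden_size, 4 * hidden_size),
                    nn.GELU(),
                    nn.Linear(4 * hidden_size, hidden_size),
                )
                for _ in range(num_local_experts)
            ]
        )

    def forward(self, x):                        # x: (batch, seq, hidden)
        scores = self.gate(x)                    # (batch, seq, num_local_experts)
        weights, indices = torch.topk(scores, self.num_experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.num_experts_per_tok):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e      # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check
moe = ToyMoE()
print(moe(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5, 64])
```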

## 🤝 Acknowledgments
A special thanks to [vince62s](https://huggingface.co/vince62s) for the inference code and the dynamic configuration of the number of experts. Also, thanks to [Charles Goddard](https://github.com/cg123) for the [mergekit](https://github.com/cg123/mergekit) library and the implementation of the [MoE for clowns](https://goddard.blog/posts/clown-moe). Further, thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.


