# Guide to Hugging Face: Phixtral

Phixtral-2x2_8 is the first Mixture of Experts (MoE) made with two microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture, and it performs better than each individual expert. If you're interested in trying out the phixtral-2x2_8 model, here are some tips and resources to help you get started.

## 🏆 Evaluation

Before diving in, you might want to check out the [YALL – Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) to see how the “phixtral-2x2_8” model compares with other models.

## 🧩 Configuration

The “phixtral-2x2_8” model has been made with a custom version of the [mergekit](https://github.com/cg123/mergekit) library (mixtral branch). Here’s the configuration:

```yaml
base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
```
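
If you want to reproduce a merge like this yourself, the sketch below shows roughly how it could be invoked from a Colab-style notebook. It assumes the mixtral branch of mergekit exposes a `mergekit-moe` entry point and that the YAML above is saved locally as `config.yaml`; the exact entry point and flags may differ between mergekit versions.

```python
# Rough sketch, not the exact commands used to build phixtral:
# install the mixtral branch of mergekit, then build the MoE merge
# from the configuration above (saved locally as config.yaml).
!pip install -q git+https://github.com/cg123/mergekit.git@mixtral
!mergekit-moe config.yaml ./phixtral-2x2_8
```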

## 💻 Usage

To test the “phixtral-2x2_8” model, you can use this [Colab notebook](https://colab.research.google.com/drive/1k6C_oJfEKUq0mtuWKisvoeMHxTcIxWRa?usp=sharing) to run Phixtral in 4-bit precision on a free T4 GPU. Here’s a snippet of the Python code you might use:

```python
# Install dependencies (bitsandbytes is required for 4-bit loading)
!pip install -q --upgrade transformers einops accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "phixtral-2x2_8"
instruction = '''
    def print_prime(n):
        """
        Print all primes between 1 and n
        """
'''

torch.set_default_device("cuda")

# Load the model in 4-bit precision with the custom Phixtral code from the Hub
model = AutoModelForCausalLM.from_pretrained(
    f"mlabonne/{model_name}",
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    f"mlabonne/{model_name}",
    trust_remote_code=True
)

# Tokenize the prompt and generate a completion
inputs = tokenizer(
    instruction,
    return_tensors="pt",
    return_attention_mask=False
)

outputs = model.generate(**inputs, max_length=200)

text = tokenizer.batch_decode(outputs)[0]
print(text)
```

Inspired by mistralai/Mixtral-8x7B-v0.1, you can specify `num_experts_per_tok` and `num_local_experts` in the config.json file (2 for both by default). This configuration is automatically loaded in configuration.py.

vince62s implemented the MoE inference code in the modeling_phi.py file; in particular, see the MoE class.
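
Below is a minimal sketch of overriding `num_experts_per_tok` and `num_local_experts` at load time instead of editing config.json, assuming the custom configuration class exposes them as plain attributes (the attribute names mirror the fields in config.json):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Illustrative only: load the remote configuration and adjust the expert
# routing settings before instantiating the model (defaults are 2 for both).
config = AutoConfig.from_pretrained("mlabonne/phixtral-2x2_8", trust_remote_code=True)
config.num_experts_per_tok = 2   # experts consulted per token
config.num_local_experts = 2     # total experts available

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/phixtral-2x2_8",
    config=config,
    torch_dtype="auto",
    trust_remote_code=True
)
```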

## 🤝 Acknowledgments

Special thanks to [vince62s](https://huggingface.co/vince62s) for the inference code and the dynamic configuration of the number of experts. Also, thanks to [Charles Goddard](https://github.com/cg123) for the [mergekit](https://github.com/cg123/mergekit) library and the implementation of the [MoE for clowns](https://goddard.blog/posts/clown-moe/). Lastly, thanks to [ehartford](https://huggingface.co/ehartford) and [lxuechen](https://huggingface.co/lxuechen) for their fine-tuned phi-2 models.

To try out the “phixtral-2x2_8” model and explore its capabilities, you can visit the following [Space](https://huggingface.co/spaces/mlabonne/phixtral-chat).

Happy experimenting!
