# Huggingface AIM Software Project Manual

## Introduction
This manual is a guide to the Huggingface AIM (Autoregressive Image Models) software project, which accompanies the research paper "Scalable Pre-training of Large Autoregressive Image Models" by Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, and Armand Joulin. AIM is a collection of vision models pre-trained with an autoregressive generative objective. The project shows that autoregressive pre-training of image features exhibits scaling properties similar to those of large language models: model capacity can be trivially scaled to billions of parameters, and AIM effectively leverages large collections of uncurated image data.

## Installation
To install the Huggingface AIM software project, follow the steps below:

1. Install PyTorch using the official installation instructions from [PyTorch](https://pytorch.org/get-started/locally/).
2. Install the AIM package with pip:
```
pip install git+https://git@github.com/apple/ml-aim.git
```
3. For MLX backend support, which enables research and experimentation on Apple silicon, also install MLX:
```
pip install mlx
```
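
As a quick sanity check after installation, you can confirm that PyTorch and the AIM package import cleanly. This is a minimal sketch, not part of the official instructions:
```python
# Minimal import check: verifies PyTorch and the AIM package are installed.
import torch
from aim.utils import load_pretrained  # loader used throughout this guide

print(torch.__version__)           # installed PyTorch version
print(load_pretrained.__module__)  # should print "aim.utils" without an ImportError
```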

## Usage
The software project can be used with the PyTorch, MLX, and JAX backends; examples for each are shown below.

### Usage in PyTorch
```python
from PIL import Image

from aim.utils import load_pretrained
from aim.torch.data import val_transforms

img = Image.open(...)
model = load_pretrained("aim-600M-2B-imgs", backend="torch")
transform = val_transforms()

inp = transform(img).unsqueeze(0)
logits, _ = model(inp)
```
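
The returned `logits` can be reduced to a prediction in the usual way. The snippet below is a minimal sketch, assuming the loaded model produces ImageNet-1k class logits of shape `(1, 1000)` as in the classification results reported further down:
```python
import torch

# Convert logits to probabilities and take the five most likely classes.
probs = torch.softmax(logits, dim=-1)
top5_prob, top5_idx = probs.topk(5, dim=-1)
print(top5_idx.tolist())   # indices of the top-5 ImageNet-1k classes
print(top5_prob.tolist())  # their probabilities
```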

### Usage in MLX
```python
from PIL import Image
import mlx.core as mx

from aim.utils import load_pretrained
from aim.torch.data import val_transforms

img = Image.open(...)
model = load_pretrained("aim-600M-2B-imgs", backend="mlx")
transform = val_transforms()

inp = transform(img).unsqueeze(0)
inp = mx.array(inp.numpy())
logits, _ = model(inp)
```
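
The same reduction works on the MLX side with MLX array operations; a minimal sketch under the same assumption about the logits:
```python
import mlx.core as mx

# Index of the most likely class for each image in the batch.
pred = mx.argmax(logits, axis=-1)
print(pred)
```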

### Usage in JAX
```python
from PIL import Image
import jax.numpy as jnp

from aim.utils import load_pretrained
from aim.torch.data import val_transforms

img = Image.open(...)
model, params = load_pretrained("aim-600M-2B-imgs", backend="jax")
transform = val_transforms()

inp = transform(img).unsqueeze(0)
inp = jnp.array(inp)
(logits, _), _ = model.apply(params, inp, mutable=['batch_stats'])
```
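
For repeated inference with the JAX backend, the forward pass can be wrapped in `jax.jit`. This is a sketch under the assumption that the model keeps the Flax-style `apply` signature shown above:
```python
import jax

@jax.jit
def forward(params, x):
    # Same call as above; the updated 'batch_stats' collection is discarded here.
    (logits, _), _ = model.apply(params, x, mutable=['batch_stats'])
    return logits

logits = forward(params, inp)
pred = logits.argmax(axis=-1)  # predicted class index
```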

## Pre-trained Checkpoints
Pre-trained models can be accessed via PyTorch Hub using the following code:
```python
import torch

aim_600m = torch.hub.load("apple/ml-aim", "aim_600M")
aim_1b = torch.hub.load("apple/ml-aim", "aim_1B")
aim_3b = torch.hub.load("apple/ml-aim", "aim_3B")
aim_7b = torch.hub.load("apple/ml-aim", "aim_7B")
```
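
Hub-loaded models can be used for inference like the `load_pretrained` models in the usage examples. A minimal sketch, assuming the hub models share the same `(logits, features)` forward signature:
```python
import torch

model = torch.hub.load("apple/ml-aim", "aim_600M")
model.eval()  # switch to evaluation mode for inference

with torch.no_grad():
    # `inp` prepared with val_transforms() as in the PyTorch usage example above.
    logits, _ = model(inp)
```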

## Pre-trained Backbones
The following table lists the pre-trained backbones used in the paper. The accuracy column reports top-1 ImageNet-1k accuracy of the attention probe at the best layer; the SHA256 prefix identifies the checkpoint file.

| Model    | #Params | Attn probe top-1 (best layer) | Backbone, SHA256 |
|----------|---------|-------------------------------|------------------|
| AIM-0.6B | 0.6B    | 79.4% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_backbone.pth), 0d6f6b8f |
| AIM-1B   | 1B      | 82.3% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_1b_5bimgs_attnprobe_backbone.pth), d254ecd3 |
| AIM-3B   | 3B      | 83.3% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_3b_5bimgs_attnprobe_backbone.pth), 8475ce4e |
| AIM-7B   | 7B      | 84.0% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_7b_5bimgs_attnprobe_backbone.pth), 184ed94c |
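
The SHA256 prefixes in the table can be used to check that a download is intact. The sketch below uses only the Python standard library; the local filename is illustrative:
```python
import hashlib
import urllib.request

url = "https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_backbone.pth"
dest = "aim_600m_backbone.pth"  # illustrative local filename

urllib.request.urlretrieve(url, dest)

# Hash the file in 1 MiB chunks and compare the prefix with the table entry (0d6f6b8f for AIM-0.6B).
sha256 = hashlib.sha256()
with open(dest, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)

print(sha256.hexdigest()[:8])  # expected: 0d6f6b8f
```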

## Pre-trained Attention Heads
The table below reports top-1 classification accuracy on the ImageNet-1k validation set for attention probes trained on the last layer and on the best layer, together with the corresponding head checkpoints.

| Model    | Top-1 IN-1k (last layer) | Top-1 IN-1k (best layer) | Attention head, SHA256 (last layer) | Attention head, SHA256 (best layer) |
|----------|--------------------------|--------------------------|-------------------------------------|-------------------------------------|
| AIM-0.6B | 78.5% | 79.4% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_head_last_layers.pth), 5ce5a341 | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_head_best_layers.pth), ebd45c05 |
| AIM-1B   | 80.6% | 82.3% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_1b_5bimgs_attnprobe_head_last_layers.pth), db3be2ad | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_1b_5bimgs_attnprobe_head_best_layers.pth), f1ed7852 |
| AIM-3B   | 82.2% | 83.3% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_3b_5bimgs_attnprobe_head_last_layers.pth), 5c057b30 | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_3b_5bimgs_attnprobe_head_best_layers.pth), ad380e16 |
| AIM-7B   | 82.4% | 84.0% | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_7b_5bimgs_attnprobe_head_last_layers.pth), 1e5c99ba | [Link](https://huggingface.co/apple/AIM/resolve/main/aim_7b_5bimgs_attnprobe_head_best_layers.pth), 73ecd732 |
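
A downloaded head checkpoint can be inspected before use; attaching the head to a backbone is handled by the evaluation script in the next section. This is a minimal sketch with an illustrative local path, assuming the `.pth` file stores a dictionary of tensors:
```python
import torch

# Illustrative local path to a downloaded attention-head checkpoint.
ckpt = torch.load("aim_7b_5bimgs_attnprobe_head_last_layers.pth", map_location="cpu")

# Print the top-level keys and tensor shapes stored in the checkpoint.
for key, value in ckpt.items():
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(key, shape)
```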

## Reproducing the IN-1k Classification Results
To reproduce the attention probe results on the ImageNet-1k validation set, run the evaluation on 1 node with 8 GPUs using the following command:
```bash
torchrun --standalone --nnodes=1 --nproc-per-node=8 main_attnprobe.py \
  --model=aim-7B \
  --batch-size=64 \
  --data-path=/path/to/imagenet \
  --probe-layers=last \
  --backbone-ckpt-path=/path/to/backbone_ckpt.pth \
  --head-ckpt-path=/path/to/head_ckpt.pth
```
By default, the last 6 layers are probed. To change this, pass `--probe-layers=best`.

This manual has covered installation, usage with the PyTorch, MLX, and JAX backends, the pre-trained checkpoints, backbones, and attention heads, and how to reproduce the IN-1k classification results. Please refer to the official repository and documentation for more detailed information.
