# Guide to Hugging Face

Hugging Face hosts a model called Stable Cascade, a diffusion model developed by Stability AI that generates images from text prompts. It is built upon the Würstchen architecture and stands out from other models thanks to its much smaller latent space. This guide covers the model's details, sources, evaluation, a code example, intended uses, limitations, and how to get started.


This model is built upon the Würstchen architecture and its main
difference to other models like Stable Diffusion is that it is working at a much smaller latent space. Why is this
important? The smaller the latent space, the faster you can run inference and the cheaper the training becomes.
How small is the latent space? Stable Diffusion uses a compression factor of 8, resulting in a 1024×1024 image being
encoded to 128×128. Stable Cascade achieves a compression factor of 42, meaning that it is possible to encode a
1024×1024 image to 24×24, while maintaining crisp reconstructions. The text-conditional model is then trained in the
highly compressed latent space. Previous versions of this architecture, achieved a 16x cost reduction over Stable
Diffusion 1.5.
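To make the comparison concrete, here is a small arithmetic sketch (plain Python, no dependencies) of the latent sizes implied by the compression factors quoted above:

```python
# Latent side length implied by a spatial compression factor.
def latent_side(image_side: int, factor: int) -> int:
    return image_side // factor

sd_latent = latent_side(1024, 8)        # Stable Diffusion: 128
cascade_latent = latent_side(1024, 42)  # Stable Cascade: 24

# How many times fewer latent positions the text-conditional model works over:
reduction = (sd_latent ** 2) / (cascade_latent ** 2)
print(sd_latent, cascade_latent, round(reduction, 1))  # 128 24 28.4
```

So Stage C models roughly 28× fewer latent positions than a Stable Diffusion U-Net at the same output resolution, which is where the training and inference savings come from.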

Therefore, this kind of model is well suited for use cases where efficiency is important. Furthermore, all known extensions such as finetuning, LoRA, ControlNet, IP-Adapter, LCM, etc. are possible with this method as well.



## Model Details

### Model Description

Stable Cascade is a diffusion model trained to generate images given a text prompt.

- Developed by: Stability AI
- Funded by: Stability AI
- Model type: Generative text-to-image model



## Model Sources

For research purposes, we recommend our [StableCascade GitHub repository](https://github.com/Stability-AI/StableCascade).



## Model Overview

Stable Cascade consists of three models: Stage A, Stage B, and Stage C, representing a cascade for generating images, hence the name "Stable Cascade". Stages A and B are used to compress images, similar to the job of the VAE in Stable Diffusion. With this setup, however, a much higher compression can be achieved: while the Stable Diffusion models use a spatial compression factor of 8, encoding a 1024×1024 image to 128×128, Stable Cascade achieves a compression factor of 42, encoding a 1024×1024 image to 24×24 while still being able to decode it accurately. This brings the great benefit of cheaper training and inference. Stage C is then responsible for generating the small 24×24 latents given a text prompt. The following picture shows this visually.

For this release, we are providing two checkpoints for Stage C, two for Stage B, and one for Stage A. Stage C comes in 1-billion and 3.6-billion-parameter versions, but we highly recommend the 3.6-billion version, as most of the finetuning work went into it. The two versions of Stage B have 700 million and 1.5 billion parameters; both achieve great results, but the 1.5-billion version excels at reconstructing small, fine details. You will therefore achieve the best results by using the larger variant of each stage. Lastly, Stage A contains 20 million parameters and is fixed due to its small size.
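As a rough guide to hardware requirements, the parameter counts above can be turned into approximate weight sizes. This is an illustrative back-of-the-envelope sketch only, assuming 2 bytes per parameter for half-precision (fp16/bf16) weights; actual memory use at inference time will be higher:

```python
# Approximate size of the weights alone, at 2 bytes per parameter (fp16/bf16).
def approx_gib(params: float, bytes_per_param: int = 2) -> float:
    return params * bytes_per_param / 2**30

checkpoints = {
    "Stage C (large)": 3.6e9,
    "Stage C (small)": 1.0e9,
    "Stage B (large)": 1.5e9,
    "Stage B (small)": 0.7e9,
    "Stage A": 20e6,
}

for name, params in checkpoints.items():
    print(f"{name}: ~{approx_gib(params):.1f} GiB")
```

By this estimate the recommended 3.6-billion-parameter Stage C alone is roughly 6.7 GiB of weights in bf16, which is worth keeping in mind when choosing between the checkpoint variants.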



## Evaluation

According to our evaluation, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all comparisons. The above picture shows the results of a human evaluation using a mix of parti-prompts (link) and aesthetic prompts. Specifically, Stable Cascade (30 inference steps) was compared against Playground v2 (50 inference steps), SDXL (50 inference steps), SDXL Turbo (1 inference step), and Würstchen v2 (30 inference steps).



## Code Example

⚠️ Important: For the code below to work, you have to install `diffusers` from this branch while the PR is a work in progress.

```shell
pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
num_images_per_prompt = 2

# Stage C (the prior) generates the compressed 24x24 latents from the text prompt.
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
# Stages B and A (the decoder) reconstruct the full-resolution image from those latents.
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)

prompt = "Anthropomorphic cat dressed as a pilot"
negative_prompt = ""

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20
)
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images
```
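With `output_type="pil"`, the decoder returns a list of standard PIL images, which can be saved like any other. A minimal sketch (using stand-in blank images here so it runs without a GPU; with the pipeline above you would iterate over `decoder_output` directly):

```python
from PIL import Image

# Stand-in for decoder_output, which is a list of PIL images
# (one per entry in num_images_per_prompt).
images = [Image.new("RGB", (1024, 1024), "white") for _ in range(2)]

for i, img in enumerate(images):
    img.save(f"cascade_output_{i}.png")
```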




## Uses

### Direct Use

The model is intended for research purposes for now. Possible research areas and tasks include:

- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.

Excluded uses are described below.



### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events; using it to generate such content is therefore out of scope for the abilities of this model. The model must not be used in any way that violates Stability AI's Acceptable Use Policy.



## Limitations and Bias

### Limitations

- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.

### Recommendations

The model is intended for research purposes only.



## How to Get Started with the Model

Check out the [StableCascade GitHub repository](https://github.com/Stability-AI/StableCascade).

Published: 2024-02-13T18:22:34+01:00