## HuggingFace Model: Screenshot to HTML/CSS Code Conversion

Welcome to the guide for using the HuggingFace model that converts screenshots of website components into HTML/CSS code. This guide provides an overview of the model, the code needed to use it, and additional information about its development and licensing.

### Overview
– **Demo**: Before diving into the code, it’s highly recommended to try out the [demo](https://huggingface.co/spaces/HuggingFaceM4/screenshot2html) of the model.
– **Model Description**: The model is based on a very early checkpoint of the forthcoming vision-language foundation model. It has been fine-tuned using the [Websight](https://huggingface.co/datasets/HuggingFaceM4/Websight) dataset.
– **Alpha Version**: This model is an alpha version aimed at initiating the development of improved models capable of converting website screenshots into actual code.

### Code Usage
The following Python code demonstrates how to use the HuggingFace model for screenshot to HTML/CSS code conversion:
```python
# Outline of the pipeline:
#   1. Import the necessary packages and define the device
#   2. Define the convert-to-RGB and custom transform functions
#   3. Tokenize the input image and generate the HTML/CSS code
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.image_utils import to_numpy_array, PILImageResampling, ChannelDimension
from transformers.image_transforms import resize, to_channel_dimension_format

# The complete, runnable listing appears in the reference section at the end of this guide
```

### Additional Information
– **Developed by**: Hugging Face
– **Model Type**: Multi-modal model (screenshot of website component to HTML/CSS code)
– **Language(s) (NLP)**: English
– **License**: Apache-2.0 (Refer to the License section)
– **Parent Models**: [SigLIP](https://github.com/huggingface/transformers/pull/26522) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

### License
The model is built on top of two pre-trained models: [SigLIP](https://github.com/huggingface/transformers/pull/26522) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), which are delivered under an Apache-2.0 license. Users should comply with the licenses of these models. The additional weights trained by HuggingFace are also released under the Apache-2.0 license.

For more information and resources, visit the official [Hugging Face website](https://huggingface.co/).

Now you are all set to use the HuggingFace model for converting website screenshots into HTML/CSS code. Happy coding!

### Step-by-Step Tutorial: Converting Website Screenshots into HTML/CSS Code

In this tutorial, we will walk through the process of using the Hugging Face model to convert screenshots of website components into HTML/CSS code.

**Step 1: Setup and Installation**
– Make sure you have Python installed on your machine.
– Install the required libraries by running the following commands in your terminal:
```bash
pip install torch transformers pillow
```
– Make sure you have a Hugging Face access token available, since the code below passes an API token when loading the model checkpoint.

**Step 2: Importing the Required Libraries**
– In your Python script or Jupyter notebook, import the necessary modules, as shown in the snippet below.
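
For reference, these are the imports used throughout this tutorial; they are the same ones that appear in the full listing in the reference section below:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.image_utils import to_numpy_array, PILImageResampling, ChannelDimension
from transformers.image_transforms import resize, to_channel_dimension_format
```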

**Step 3: Initialization and Configuration**
– Set up the device configuration and define the necessary parameters.
– Ensure that you have a Hugging Face access token and replace the `API_TOKEN` placeholder with your actual token, as in the sketch below.
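
A condensed sketch of this step, mirroring the full listing in the reference section (`API_TOKEN` is a placeholder for your own Hugging Face access token, and a CUDA-capable GPU is assumed):

```python
# Continues from the imports in Step 2
API_TOKEN = "hf_..."  # placeholder: replace with your Hugging Face access token
DEVICE = torch.device("cuda")  # the reference code assumes a CUDA-capable GPU

PROCESSOR = AutoProcessor.from_pretrained(
    "HuggingFaceM4/VLM_WebSight_finetuned",
    token=API_TOKEN,
)
MODEL = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceM4/VLM_WebSight_finetuned",
    token=API_TOKEN,
    trust_remote_code=True,       # the checkpoint ships custom modeling code
    torch_dtype=torch.bfloat16,   # load the weights in bfloat16 to save memory
).to(DEVICE)
```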

**Step 4: Preprocessing the Image**
– Implement the `convert_to_rgb` function to convert the image to RGB format.
– Define a custom transformation function to resize, rescale, and normalize the image, as in the sketch below.
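
The preprocessing functions from the reference listing, condensed here for convenience: `convert_to_rgb` flattens any transparency onto a white background, and `custom_transform` resizes the screenshot to 960x960, rescales pixel values, normalizes them with the processor's statistics, and moves channels to the first dimension:

```python
def convert_to_rgb(image):
    # Composite images with an alpha channel onto a white background
    if image.mode == "RGB":
        return image
    image_rgba = image.convert("RGBA")
    background = Image.new("RGBA", image_rgba.size, (255, 255, 255))
    return Image.alpha_composite(background, image_rgba).convert("RGB")


def custom_transform(x):
    # Resize, rescale, and normalize the screenshot the way the model expects
    x = convert_to_rgb(x)
    x = to_numpy_array(x)
    x = resize(x, (960, 960), resample=PILImageResampling.BILINEAR)
    x = PROCESSOR.image_processor.rescale(x, scale=1 / 255)
    x = PROCESSOR.image_processor.normalize(
        x,
        mean=PROCESSOR.image_processor.image_mean,
        std=PROCESSOR.image_processor.image_std,
    )
    x = to_channel_dimension_format(x, ChannelDimension.FIRST)
    return torch.tensor(x)
```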

**Step 5: Generating HTML/CSS Code from Image**
– Use the provided code to generate HTML/CSS code from the website screenshot.
– Ensure that the input image and the required tokens are correctly processed and fed into the model, as in the sketch below.
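
The generation step from the reference listing, continuing from the previous steps; `screenshot.png` is a placeholder path for the screenshot you want to convert:

```python
image = Image.open("screenshot.png")  # placeholder path for your input screenshot

image_seq_len = MODEL.config.perceiver_config.resampler_n_latents
BOS_TOKEN = PROCESSOR.tokenizer.bos_token
BAD_WORDS_IDS = PROCESSOR.tokenizer(
    ["<image>", "<fake_token_around_image>"], add_special_tokens=False
).input_ids

# Build the text prompt with image placeholder tokens, then attach the pixel values
inputs = PROCESSOR.tokenizer(
    f"{BOS_TOKEN}<fake_token_around_image>{'<image>' * image_seq_len}<fake_token_around_image>",
    return_tensors="pt",
    add_special_tokens=False,
)
inputs["pixel_values"] = PROCESSOR.image_processor([image], transform=custom_transform)
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

generated_ids = MODEL.generate(**inputs, bad_words_ids=BAD_WORDS_IDS, max_length=4096)
generated_text = PROCESSOR.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```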

**Step 6: Testing and Analysis**
– Run the code and observe the generated HTML/CSS code based on the input screenshot.
– Analyze and modify the generated code as needed; a simple way to inspect it visually is shown below.
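
One simple way to inspect the result, assuming the `generated_text` variable from the previous step, is to write it to an HTML file and open it in a browser next to the original screenshot:

```python
# Save the generated markup so it can be opened in a browser for visual comparison
with open("generated_page.html", "w", encoding="utf-8") as f:
    f.write(generated_text)
```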

**License Information**
– This Hugging Face model is built on top of the pre-trained models SigLIP and mistralai/Mistral-7B-v0.1, which are distributed under the Apache-2.0 license.
– Users are required to comply with the licenses of these base models.

**Resources for More Information**
– For more information and detailed documentation about the model, refer to the official resources available on the Hugging Face website.

**Conclusion**
In this tutorial, you learned how to use the Hugging Face model to convert website screenshots into HTML/CSS code. Experiment with different screenshots and explore the model's ability to generate code for various website components.

Happy coding!

### Reference: Full Code Listing

The complete code referenced above is reproduced below. As a reminder, the model converts screenshots of website components into HTML/CSS code; it is based on a very early checkpoint of the forthcoming vision-language foundation model, fine-tuned on the Websight dataset, and is very much an alpha version. You can [try out the demo](https://huggingface.co/spaces/HuggingFaceM4/screenshot2html) before running the code locally.

```python
import torch

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

from transformers.image_utils import to_numpy_array, PILImageResampling, ChannelDimension
from transformers.image_transforms import resize, to_channel_dimension_format

API_TOKEN = "hf_..."  # placeholder: replace with your Hugging Face access token

DEVICE = torch.device("cuda")
PROCESSOR = AutoProcessor.from_pretrained(
    "HuggingFaceM4/VLM_WebSight_finetuned",
    token=API_TOKEN,
)
MODEL = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceM4/VLM_WebSight_finetuned",
    token=API_TOKEN,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to(DEVICE)
image_seq_len = MODEL.config.perceiver_config.resampler_n_latents
BOS_TOKEN = PROCESSOR.tokenizer.bos_token
BAD_WORDS_IDS = PROCESSOR.tokenizer(["<image>", "<fake_token_around_image>"], add_special_tokens=False).input_ids


def convert_to_rgb(image):
    # Composite images with transparency onto a white background so the
    # model always receives a plain RGB image.
    if image.mode == "RGB":
        return image

    image_rgba = image.convert("RGBA")
    background = Image.new("RGBA", image_rgba.size, (255, 255, 255))
    alpha_composite = Image.alpha_composite(background, image_rgba)
    alpha_composite = alpha_composite.convert("RGB")
    return alpha_composite


def custom_transform(x):
    # Resize, rescale, and normalize the screenshot the way the model expects.
    x = convert_to_rgb(x)
    x = to_numpy_array(x)
    x = resize(x, (960, 960), resample=PILImageResampling.BILINEAR)
    x = PROCESSOR.image_processor.rescale(x, scale=1 / 255)
    x = PROCESSOR.image_processor.normalize(
        x,
        mean=PROCESSOR.image_processor.image_mean,
        std=PROCESSOR.image_processor.image_std
    )
    x = to_channel_dimension_format(x, ChannelDimension.FIRST)
    x = torch.tensor(x)
    return x


image = Image.open("screenshot.png")  # placeholder path: the screenshot to convert

inputs = PROCESSOR.tokenizer(
    f"{BOS_TOKEN}<fake_token_around_image>{'<image>' * image_seq_len}<fake_token_around_image>",
    return_tensors="pt",
    add_special_tokens=False,
)
inputs["pixel_values"] = PROCESSOR.image_processor([image], transform=custom_transform)
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
generated_ids = MODEL.generate(**inputs, bad_words_ids=BAD_WORDS_IDS, max_length=4096)
generated_text = PROCESSOR.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

The model is built on top of two pre-trained models, SigLIP and mistralai/Mistral-7B-v0.1, which are distributed under the Apache-2.0 license, and users should comply with the licenses of these models. The two pre-trained models are connected by newly initialized parameters that we train; these parameters are not derived from either of the two frozen base models that form the composite. We release these additional weights under the Apache-2.0 license.

### A Note on the `<div>` Tag

In HTML, the `<div>` tag is used as a container for other HTML elements, providing a way to group them together and apply styles, scripts, and other attributes to them collectively. While the sample code provided showcases the use of the `<div>` tag within a larger HTML structure, it does not explicitly demonstrate specific use cases for the tag itself. However, considering the broader contexts of artificial intelligence, frameworks, Python, coding, Hugging Face, creation, AI, Flutter, Dialogflow, Firebase, Google Cloud, databases, and vector databases, we can envision several potential use cases for the `<div>` tag.

Artificial Intelligence: In the context of AI, the `<div>` tag can be used to structure and organize the presentation of AI-powered content on web pages. This could include displaying the results of machine learning models, visual representations of data, or interactive AI-powered features. The `<div>` tag allows for the flexible arrangement of these elements on a webpage, helping to deliver a cohesive and visually appealing user experience.

Frameworks and Python: When working with front-end frameworks such as React, Angular, or Vue.js, the `<div>` tag is commonly used to create components or sections of a web application. Python developers can use the `<div>` tag within frameworks like Django or Flask to design and structure the user interface of web applications powered by Python backend logic.

Coding and Hugging Face: For developers working with Hugging Face's technologies, the `<div>` tag can serve as a fundamental building block for creating interactive interfaces that showcase natural language processing (NLP) models, machine learning insights, or AI-driven conversational agents.

Creation and AI: As businesses and creators explore the intersection of technology and creativity, the `<div>` tag can be employed to craft visually compelling websites that incorporate AI-generated content, immersive experiences, or interactive storytelling formats.

Flutter and Dialogflow: When building cross-platform applications with Flutter and integrating conversational interfaces using Dialogflow, the `<div>` tag is useful for structuring the layout of web-based chat interfaces, displaying contextual information, or housing interactive elements within the app's UI.

Firebase and Google Cloud: In web development with Firebase Hosting or applications deployed on Google Cloud Platform, the `<div>` tag is central to the organization and structure of web content. It enables developers to arrange components, data visualizations, and user interface elements effectively.

Databases and Vector Databases: While the `<div>` tag itself is not directly related to database operations, it is an essential part of creating dynamic, data-driven web applications that interface with traditional SQL databases or emerging vector databases. It allows for the presentation of database content and interactive visualization of data on the web.

In summary, the `<div>` tag plays a pivotal role in structuring and organizing web content, making it an essential tool for developers and creators across a wide range of technical domains. Its versatility and flexibility enable web developers to build dynamic, visually appealing, and functional user interfaces that integrate with a variety of technologies and frameworks.
