## A Complete Guide to Natural-SQL-7B on Hugging Face

### Introduction
Natural-SQL-7B is a model designed for Text-to-SQL tasks. According to its developers, it understands complex questions well and outperforms other models of the same size.

### Results on Novel Datasets
The developers report SQL-Eval results on novel datasets that the model was not trained on, which suggests that it generalizes to new databases and questions.

### Loading the Model
To load the model in your Python environment, use the following code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("chatdb/natural-sql-7b")

# Load the model in half precision; device_map="auto" (requires the accelerate
# package) places the weights on the available GPU(s) automatically
model = AutoModelForCausalLM.from_pretrained(
    "chatdb/natural-sql-7b",
    device_map="auto",
    torch_dtype=torch.float16,
)
```

### License
The model weights are licensed under CC BY-SA 4.0, with extra guidelines for responsible use expanded from the original model’s Deepseek license. Users are free to use and adapt the model, even commercially. If the weights are altered, such as through fine-tuning, the changes must be publicly shared under the same CC BY-SA 4.0 license.

### Generating SQL
To generate an SQL query using the model, use the following Python code:
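```python
# `prompt` is a string built from the model's prompt template (question + schema)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding (no sampling, a single beam); 100001 is the end-of-sequence token id
generated_ids = model.generate(
    **inputs,
    num_return_sequences=1,
    eos_token_id=100001,
    pad_token_id=100001,
    max_new_tokens=400,
    do_sample=False,
    num_beams=1,
)

outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
# The decoded text still contains the prompt; keep only what follows the last sql marker
print(outputs[0].split("```sql")[-1])
```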
The `prompt` passed to the tokenizer must be built from the model's prompt template, which combines the natural language question with the PostgreSQL database schema.
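The split on the ```sql marker in the generation code keeps only the text after the last marker. As a hedged sketch, that post-processing can be wrapped in a small helper; the name `extract_sql` is hypothetical, and trimming a closing fence after the query is an assumption about how the model may end its answer, not documented behavior.

```python
FENCE = "`" * 3  # a Markdown code fence marker


def extract_sql(decoded: str) -> str:
    """Return the SQL that follows the last sql fence marker in a decoded generation."""
    # Same as outputs[0].split("```sql")[-1]: keep the text after the last marker
    sql = decoded.split(FENCE + "sql")[-1]
    # Assumption: the model may close its answer with another fence; drop it and anything after it
    sql = sql.split(FENCE)[0]
    return sql.strip()


decoded = (
    "# SQL\nHere is the SQL query that answers the question: `How many users are there?`\n"
    + FENCE + "sql\nSELECT COUNT(*) FROM users;\n" + FENCE
)
print(extract_sql(decoded))  # SELECT COUNT(*) FROM users;
```

With the real model, this would be called as `extract_sql(outputs[0])`. If no marker is present, the whole decoded string is returned, stripped of surrounding whitespace.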

### Example Schemas
The guide includes example SQL table DDL statements for users, projects, tasks, task assignments, and comments to demonstrate the structure of the database schema.

### Example SQL Outputs
Several natural language questions are provided together with the SQL queries the model generated for them, to showcase how accurately Natural-SQL-7B turns questions into SQL.

By following this guide, users can successfully utilize Natural-SQL-7B for their Text-to-SQL needs.

Hugging Face Natural-SQL-7B Model Tutorial

Introduction:
Natural-SQL-7B is a model developed by ChatDB and hosted on Hugging Face that performs very strongly on Text-to-SQL tasks. It has an excellent understanding of complex questions and outperforms models of the same size. This tutorial will guide you through the process of using Natural-SQL-7B to generate SQL queries from natural language questions.

Prerequisites:
Before using the Natural-SQL-7B model, ensure that you have the correct version of the transformers library installed. You can install it using the following command:
pip install transformers==4.35.2
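To confirm which transformers version is actually installed before loading the model, a small check with Python's standard importlib.metadata module could look like this. It is a sketch; the helper name installed_transformers_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED_TRANSFORMERS = "4.35.2"  # the version pinned in the pip command above


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found != PINNED_TRANSFORMERS:
        print(f"Expected transformers=={PINNED_TRANSFORMERS}, found {found}")
    else:
        print("transformers version matches")
```

This reads package metadata without importing transformers itself, so it runs quickly even when the library is missing.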

Loading the Model:
To load the Natural-SQL-7B model in Python, use the following code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("chatdb/natural-sql-7b")
model = AutoModelForCausalLM.from_pretrained("chatdb/natural-sql-7b", device_map="auto", torch_dtype=torch.float16)

License:
The model weights are licensed under CC BY-SA 4.0, with extra guidelines for responsible use. If you alter the weights, such as through fine-tuning, you must publicly share your changes under the same CC BY-SA 4.0 license.

Generating SQL:
You can generate an SQL query from a natural language question using the following Python code:
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
generated_ids = model.generate(
    **inputs,
    num_return_sequences=1,
    eos_token_id=100001,
    pad_token_id=100001,
    max_new_tokens=400,
    do_sample=False,
    num_beams=1,
)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs[0].split("```sql")[-1])

Example Schemas and Example SQL Outputs:
The tutorial provides example schemas for PostgreSQL databases and example SQL queries generated from natural language questions.

Conclusion:
Natural-SQL-7B, available on Hugging Face, is a powerful tool for generating SQL queries from natural language questions. We hope this tutorial helps you get started with using this model effectively.

For more information, visit the model's page on Hugging Face and the additional resources provided by the developers, such as their notebook and social media links.



Natural-SQL-7B is a model with very strong performance in Text-to-SQL instructions, has an excellent understanding of complex questions, and outperforms models of the same size in its space.

ChatDB.ai | Notebook | Twitter



Results on Novel Datasets Not Seen in Training (via SQL-Eval)

Big thanks to the defog team for open sourcing sql-eval 👏

Natural-SQL can also handle complex, compound questions that other models typically struggle with. The developers have published a more detailed write-up and a small test of this.

Make sure you have the correct version of the transformers library installed:

pip install transformers==4.35.2



Loading the Model

Use the following Python code to load the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("chatdb/natural-sql-7b")
model = AutoModelForCausalLM.from_pretrained(
    "chatdb/natural-sql-7b",
    device_map="auto",
    torch_dtype=torch.float16,
)



License

The model weights are licensed under CC BY-SA 4.0, with extra guidelines for responsible use expanded from the original model’s Deepseek license.
You’re free to use and adapt the model, even commercially.
If you alter the weights, such as through fine-tuning, you must publicly share your changes under the same CC BY-SA 4.0 license.



Generating SQL

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
generated_ids = model.generate(
    **inputs,
    num_return_sequences=1,
    eos_token_id=100001,
    pad_token_id=100001,
    max_new_tokens=400,
    do_sample=False,
    num_beams=1,
)

outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs[0].split("```sql")[-1])



Prompt Template

# Task 
Generate a SQL query to answer the following question: `{natural language question}`

### PostgreSQL Database Schema 
The query will run on a database with the following schema: 

<SQL Table DDL Statements>

# SQL 
Here is the SQL query that answers the question: `{natural language question}` 
'''sql
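Filling the prompt template by hand is error-prone, so here is a hedged sketch of a helper that builds it from a question and the schema's CREATE TABLE statements. The name `build_prompt` is hypothetical. Two assumptions are worth flagging: trailing whitespace in the template lines is dropped, and the prompt ends with the same ```sql marker that the generation code splits on (the template as printed shows '''sql).

```python
# The template ends with a Markdown-style sql fence, built from "`" * 3 for readability
SQL_FENCE = "`" * 3 + "sql"


def build_prompt(question: str, schema_ddl: str) -> str:
    """Fill the Natural-SQL-7B prompt template with a question and CREATE TABLE statements."""
    lines = [
        "# Task",
        f"Generate a SQL query to answer the following question: `{question}`",
        "",
        "### PostgreSQL Database Schema",
        "The query will run on a database with the following schema:",
        "",
        schema_ddl.strip(),
        "",
        "# SQL",
        f"Here is the SQL query that answers the question: `{question}`",
        SQL_FENCE,
    ]
    return "\n".join(lines)


schema = "CREATE TABLE users (user_id SERIAL PRIMARY KEY, email VARCHAR(100) NOT NULL);"
prompt = build_prompt("What is the ratio of users with gmail addresses vs without?", schema)
print(prompt)
```

The resulting string can be passed as `prompt` to the tokenizer call in the generation code. Because the question appears twice, a small helper like this keeps both copies in sync.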



Example Schemas

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    password_hash TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE projects (
    project_id SERIAL PRIMARY KEY,
    project_name VARCHAR(100) NOT NULL,
    description TEXT,
    start_date DATE,
    end_date DATE,
    owner_id INTEGER REFERENCES users(user_id)
);
CREATE TABLE tasks (
    task_id SERIAL PRIMARY KEY,
    task_name VARCHAR(100) NOT NULL,
    description TEXT,
    due_date DATE,
    status VARCHAR(50),
    project_id INTEGER REFERENCES projects(project_id)
);
CREATE TABLE taskassignments (
    assignment_id SERIAL PRIMARY KEY,
    task_id INTEGER REFERENCES tasks(task_id),
    user_id INTEGER REFERENCES users(user_id),
    assigned_date DATE NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    comment_id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    task_id INTEGER REFERENCES tasks(task_id),
    user_id INTEGER REFERENCES users(user_id)
);



Example SQL Outputs

Question: Show me the day with the most users joining

SELECT created_at::DATE AS day, COUNT(*) AS user_count
FROM users
GROUP BY day
ORDER BY user_count DESC
LIMIT 1;

Question: Show me the project that has a task with the most comments

SELECT p.project_name, t.task_name, COUNT(c.comment_id) AS comment_count
FROM projects p
JOIN tasks t ON p.project_id = t.project_id
JOIN comments c ON t.task_id = c.task_id
GROUP BY p.project_name, t.task_name
ORDER BY comment_count DESC
LIMIT 1;

Question: What is the ratio of users with gmail addresses vs without?

SELECT 
    SUM(CASE WHEN email ILIKE '%@gmail.com%' THEN 1 ELSE 0 END)::FLOAT / NULLIF(SUM(CASE WHEN email NOT ILIKE '%@gmail.com%' THEN 1 ELSE 0 END), 0) AS gmail_ratio
FROM 
    users;
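To check generated queries like these against a handful of rows by hand, the logic of the first and third examples can be mirrored in plain Python. This is an illustrative sketch, not part of the model or its tooling; the function names are hypothetical. It shows what the `created_at::DATE` grouping, the case-insensitive `ILIKE` match, and the `NULLIF` guard compute.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Optional, Tuple


def busiest_signup_day(created_at: Iterable[datetime]) -> Optional[Tuple[date, int]]:
    """Mirror of 'the day with the most users joining': (day, user_count) or None."""
    # created_at::DATE ... GROUP BY day -> count signups per calendar day
    counts = Counter(ts.date() for ts in created_at)
    if not counts:
        return None  # the SQL query returns no rows for an empty table
    # ORDER BY user_count DESC LIMIT 1; on ties SQL picks an arbitrary day,
    # while Counter.most_common keeps first-seen order
    return counts.most_common(1)[0]


def gmail_ratio(emails: Iterable[str]) -> Optional[float]:
    """Mirror of the gmail query: gmail addresses divided by non-gmail addresses."""
    gmail = other = 0
    for email in emails:
        # ILIKE '%@gmail.com%' is a case-insensitive substring match
        if "@gmail.com" in email.lower():
            gmail += 1
        else:
            other += 1
    # NULLIF(..., 0) makes a zero denominator NULL, so the query yields NULL instead of an error
    if other == 0:
        return None
    return gmail / other  # ::FLOAT in the query avoids integer division; '/' is true division


signups = [datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 17), datetime(2024, 1, 2, 8)]
print(busiest_signup_day(signups))  # (datetime.date(2024, 1, 1), 2)
print(gmail_ratio(["a@gmail.com", "b@GMAIL.com", "c@example.org"]))  # 2.0
```

Note that the third query returns a ratio of gmail to non-gmail users (2.0 above), not the share of gmail users among all users.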



2024-02-08T14:04:58+01:00