Text-to-SQL from Scratch — Tutorial For…

Apr 23

Ever wished you could just ask your database questions in plain English instead of wrestling with complex SQL queries?

6 Comments

What else can you do to improve the accuracy from there? I'm curious what it would take to build a SOTA-level system. Would it be something like this (https://arxiv.org/abs/2401.08500): generate 10s of samples and take majority vote + self reflection with errors + something else (not sure what that is)?
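For readers unfamiliar with the "majority vote" idea, here is a minimal sketch of one way it could work for SQL, assuming the candidate queries have already been sampled from an LLM (that sampling step is not shown). The function name `majority_vote_sql` and the use of SQLite are assumptions for illustration, not the method of the linked paper or of the tutorial.

```python
import sqlite3
from collections import Counter

def majority_vote_sql(conn, candidates):
    """Run each candidate query and return one whose result set is most common.

    `candidates` is a list of SQL strings, hypothetically sampled from an LLM.
    """
    executed = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # a candidate that fails to run gets no vote
        # Sort rows so that pure ordering differences do not split the vote.
        executed.append((sql, tuple(sorted(rows, key=repr))))
    if not executed:
        return None
    counts = Counter(key for _, key in executed)
    winning_result = counts.most_common(1)[0][0]
    # Return the first candidate that produced the winning result.
    return next(sql for sql, key in executed if key == winning_result)
```

A failed query could instead be fed back to the model together with its error message, which is roughly what "self reflection with errors" refers to.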


Zachary Huang

Apr 26

A lot! This paper is a bit outdated. "10s of samples and take majority vote + self reflection with errors" is less effective with modern LLMs like Gemini 2.5 Pro, o1, and Claude 3.7 thinking, since they already show high accuracy. The bottleneck is more the context provided to the LLM.

E.g., for text-to-SQL, you can do EDA (exploratory data analysis) on the tables to learn the domain of each string column and the range/distribution of each numerical column. You can also feed in previously successful queries for more domain-specific knowledge (e.g., how people compute profit as revenue minus cost).
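As a rough illustration of the EDA idea, the sketch below profiles a SQLite table into a few lines of text that could be prepended to the LLM prompt. The function name `profile_table`, the type heuristics, and the output format are all assumptions made for this example; the table name is interpolated directly, so it should come from a trusted source.

```python
import sqlite3

NUMERIC_HINTS = ("INT", "REAL", "NUM", "FLOA", "DOUB", "DEC")

def profile_table(conn, table, max_categories=10):
    """Summarize each column: range for numeric columns, distinct values for the rest."""
    lines = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    for _, name, col_type, *_ in conn.execute(f"PRAGMA table_info({table})").fetchall():
        if any(hint in col_type.upper() for hint in NUMERIC_HINTS):
            lo, hi, avg = conn.execute(
                f'SELECT MIN("{name}"), MAX("{name}"), AVG("{name}") FROM {table}'
            ).fetchone()
            lines.append(f"{name} ({col_type}): min={lo}, max={hi}, avg={avg}")
        else:
            values = [row[0] for row in conn.execute(
                f'SELECT DISTINCT "{name}" FROM {table} ORDER BY 1 LIMIT ?',
                (max_categories,),
            )]
            lines.append(f"{name} ({col_type}): values={values}")
    return "\n".join(lines)
```

The resulting text tells the model, for instance, which literal strings a `region` column actually contains, so it is less likely to guess a filter value that matches no rows.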


patcap

Apr 26

So you mean rely on in-context learning at this point, and less on systems like the above (or ADAS, https://arxiv.org/abs/2408.08435, or similar prompt optimization systems)?

1) X number of examples added to the context; maybe use RAG to pull in only the relevant examples if you have a lot of them.

2) EDA to provide more context to the query

3) Probably lots of eval to help tune 1+2
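Points 1 and 2 above could be combined along these lines. This is only a sketch: it uses simple word overlap (Jaccard similarity) as a stand-in for a real embedding-based RAG retriever, and the names `top_k_examples` and `build_prompt`, as well as the prompt layout, are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_k_examples(question, history, k=2):
    """Return the k (question, sql) pairs whose questions overlap most with `question`."""
    q_tokens = tokenize(question)

    def score(pair):
        tokens = tokenize(pair[0])
        union = q_tokens | tokens
        return len(q_tokens & tokens) / len(union) if union else 0.0

    return sorted(history, key=score, reverse=True)[:k]

def build_prompt(question, schema_notes, history, k=2):
    """Assemble EDA notes, retrieved examples, and the new question into one prompt."""
    shots = "\n\n".join(
        f"Q: {q}\nSQL: {sql}" for q, sql in top_k_examples(question, history, k)
    )
    return f"{schema_notes}\n\n{shots}\n\nQ: {question}\nSQL:"
```

Here `history` would be a list of past (question, SQL) pairs known to be correct, and `schema_notes` would hold the EDA summary; the eval in point 3 could then compare settings such as different values of `k`.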


Zachary Huang

Apr 26

You always need in-context learning, but the important part is still the design of the system, e.g., pulling data from past data warehouse queries, finding related discussions in Slack channels, or reading the source code for data ingestion. I would say ADAS is very relevant.


patcap

Apr 26

Thanks for the pointers!


Mark

Apr 27

Hi Zachary,

I came across your repo through the incredible work on Pocket Flow, and I keep tracing everything you have created, including this post. It's really informative and inspiring.

One thing that strikes me most is that every post comes with a consistent robot-image theme. May I ask the secret behind it: what tool do you use, and are there any guidelines for prompts I could use to create a series of thumbnails and images for engaging content? Thanks a lot.


Pocket Flow
