00. Structured Data & Code Generation: The Five-Year-Old Version

You now know how models talk. This module shows how they talk to databases and code tools without bluffing.

Imagine you go to a crowded foreign market. You know what you want. Maybe you want the cheapest rice. Maybe you want three shirt options under one budget. Maybe you want the vendor to stitch a new pocket. The problem is not desire. The problem is language.

You speak plain requests. The market vendors speak strict languages. One vendor understands only SQL. Another vendor understands only Python. A third vendor inspects code diffs like a picky shopkeeper. So you bring a translator. That translator is the LLM.

You hand over the shopping list. That is your natural-language request. The translator looks at the phrasebook. That is the schema, API docs, tests, and repository context. Then the translator talks to the market vendor. If the translation is good, the vendor replies with a receipt. That receipt is the query result, program output, or test report.
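To make the phrasebook idea concrete, here is a minimal sketch of how a system might read the real schema from a database and place it next to the shopping list before the translator sees either. It assumes SQLite, and the table names, column names, and the helpers `build_phrasebook` and `build_prompt` are hypothetical, invented only for illustration.

```python
import sqlite3


def build_phrasebook(conn):
    """Collect the CREATE TABLE statements so the translator sees real names."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(sql for (sql,) in rows)


def build_prompt(question, phrasebook):
    """Put the phrasebook (schema) and the shopping list (question) together."""
    return (
        "Schema:\n" + phrasebook + "\n\n"
        "Question: " + question + "\n"
        "Write one SQL query that uses only the tables and columns above."
    )


# Hypothetical market database, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stalls (name TEXT, owner TEXT)")
conn.execute("CREATE TABLE sales (stall TEXT, amount INTEGER, sold_on TEXT)")

prompt = build_prompt("Which stall made the most yesterday?", build_phrasebook(conn))
print(prompt)
```

The point of the design is small but important: the translator is handed the actual table and column names, so it has less reason to guess them.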

Now comes the important part. A translator is useful. A translator is also dangerous. If the translator guesses a table name, the database vendor gets angry. If the translator guesses an API call, the compiler vendor throws an error. If the translator hides uncertainty, you get a beautiful lie. Simple, no?

So what do strong systems do? They do not trust the translator alone. They give the translator a better phrasebook. They make the vendor execute the exact request. They read the receipt carefully. And when the first attempt fails, they allow a little haggling. That haggling is the retry, debug, and test loop.
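The haggling loop can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_translator` is a hypothetical stand-in for a real LLM call, the `sales` table is made up, and SQLite plays the vendor. The first attempt guesses a table name, the vendor complains, and the complaint is handed back for a second attempt.

```python
import sqlite3


def fake_translator(question, schema, last_error=None):
    """Hypothetical stand-in for an LLM call.

    The first attempt guesses a table name. Once it is shown an error,
    it falls back to the table that actually appears in the schema.
    """
    if last_error is None:
        return "SELECT COUNT(*) FROM orders"  # guessed name, not in the schema
    return "SELECT COUNT(*) FROM sales"


def answer_with_retries(conn, question, schema, max_attempts=3):
    """Translate, execute, read the receipt, and retry on vendor errors."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        sql = fake_translator(question, schema, last_error)
        try:
            rows = conn.execute(sql).fetchall()  # the vendor runs the exact request
            return {"sql": sql, "rows": rows, "attempts": attempt}
        except sqlite3.Error as exc:
            last_error = str(exc)  # the complaint goes back to the translator
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {last_error}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (stall TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("mango", 4200), ("banana", 3600)]
)

schema = "sales(stall TEXT, amount INTEGER)"
result = answer_with_retries(conn, "How many sales rows are there?", schema)
print(result)  # succeeds on the second attempt
```

The loop is bounded on purpose. A translator that never converges should end in a clear failure, not an endless negotiation.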

Look at one tiny market scene. You ask, "Which fruit seller made the most yesterday?" The translator should not answer from memory. The translator should ask the vendor to total the rows. The vendor returns a receipt saying mango stall: ₹4,200, banana stall: ₹3,600. Now the answer is grounded. Without the receipt, the translator might just guess mango because mango sounds popular.
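The same scene can be written as a small runnable sketch. It assumes a hypothetical SQLite table `sales(stall, amount, sold_on)`; the individual rows and the dates are invented sample data, chosen only so that the totals match the receipt above.

```python
import sqlite3


def top_sellers(conn, day):
    """Ask the vendor to total the rows for one day and return the receipt."""
    query = """
        SELECT stall, SUM(amount) AS total
        FROM sales
        WHERE sold_on = ?
        GROUP BY stall
        ORDER BY total DESC
    """
    return conn.execute(query, (day,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (stall TEXT, amount INTEGER, sold_on TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("mango", 2500, "2024-01-01"),
        ("mango", 1700, "2024-01-01"),
        ("banana", 3600, "2024-01-01"),
        ("banana", 9000, "2023-12-31"),  # a different day, so it must not count
    ],
)

receipt = top_sellers(conn, "2024-01-01")
print(receipt)  # [('mango', 4200), ('banana', 3600)]
```

Note the banana row from the other day. A translator answering from memory has no way to know it should be excluded; the vendor applies the date filter exactly.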

This module is about building that disciplined market workflow. First we see why asking the translator directly fails. Then we see text-to-SQL pipelines, schema design, validation loops, and table reasoning. After that, we shift to code. The same translator must complete code, generate programs, review diffs, and debug with execution feedback. Yes, the languages change. The engineering pattern stays.

By the end, you should see one unifying idea. LLMs are best when they translate into strict systems. They are weakest when they impersonate strict systems. See the difference. One gets a receipt. The other gets applause and then a bug.

The placeholders you will see called back

Placeholder	Meaning
translator	The LLM that turns plain language into SQL, code, or review comments.
market vendor	The strict system on the other side, like a database, compiler, test runner, or reviewer.
shopping list	The user's request in plain language.
receipt	The returned result, logs, rows, or test output proving what happened.
phrasebook	The schema, docs, examples, types, and repository context that guide translation.
haggling	The retry loop where we fix errors, rerun, and refine the request.

Top resources

Spider benchmark — the text-to-SQL benchmark that forces schema reasoning, not keyword matching.
BIRD benchmark — harder enterprise-style SQL tasks with bigger databases and noisy questions.
HumanEval — tiny coding tasks, but excellent for understanding execution-based evaluation.
SWE-bench — the benchmark that asks models to fix real repository issues, not toy snippets.
DAIL-SQL text-to-SQL notes — practical schema-linking ideas and dataset context.
GitHub Copilot docs — useful to see how code completion and repository context are productized.
LangChain SQL QA docs — shows the standard LLM-to-database control loop.

What's coming

01-natural-language-db-failure.md — why asking an LLM about live tables directly gives polished nonsense.
02-text-to-sql-pipeline.md — the end-to-end path from question to SQL to formatted answer.
03-schema-representation.md — how the phrasebook shapes table and column selection.
04-sql-validation-execution.md — why generated SQL needs guards, sandboxes, and retries.
05-tabular-reasoning.md — what LLMs can and cannot do when the receipt is a table.
06-code-completion-models.md — how code models fill gaps, use context, and save tokens.
07-code-generation-pipeline.md — how strong systems generate, validate, and test code in loops.
08-program-synthesis.md — the harder jump from vague spec to correct program.
09-code-review-ai.md — how models inspect diffs, spot risks, and still miss things.
10-execution-feedback-loops.md — why runtime receipts make code generation much stronger.
11-multi-file-code-understanding.md — how models gather repository-wide phrasebooks before editing.
12-evaluation-benchmarks.md — how teams measure SQL and coding systems with execution-based benchmarks.
13-honest-admission.md — what still breaks in structured data and code generation.

Bridge. First we must remove the most tempting mistake: treating the translator like the database itself. → 01-natural-language-db-failure.md