AI is genuinely transformational for non-engineers who need to pull data from databases. PMs, marketers, ops people, and analysts who used to wait hours or days for an engineer to write a query can now self-serve. The catch: AI-generated SQL is wrong often enough that blindly trusting it can produce confidently incorrect numbers. Bad numbers driving bad decisions is worse than no data.
What AI does well for SQL
- Translating natural language to SQL for clear, single-table queries
- Suggesting JOIN structures across well-named tables
- Explaining what an existing query does line-by-line
- Adapting a query you found to your schema
- Generating syntactically valid SQL across dialects (PostgreSQL, MySQL, BigQuery, Snowflake, etc.)
What AI does poorly
- Knowing your specific schema's quirks (which is the active customers table — `customers` with `deleted_at IS NULL`, or the `active_customers` view?)
- Distinguishing similar-looking columns (`created_at` vs `created_at_local`, `revenue` vs `gross_revenue`)
- Catching that two columns can be joined but shouldn't be (a foreign key by data type, not by actual relationship)
- Knowing your soft-delete conventions, partitioning rules, and business filter requirements
- Performance optimization that depends on indexes and data volumes you haven't told it about
A workflow that works
Step 1: feed AI your schema. Don't ask SQL questions without context. Give Claude or GPT your schema (table names, column names, types, brief description of what each table represents). For databases with hundreds of tables, give the relevant subset for the question.
Useful trick: ask AI to summarize the schema first. "Based on this DDL, summarize the data model in 2 paragraphs and call out anything that looks unusual." If AI's summary is wrong, your follow-up queries will be wrong.
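The context doesn't need to be formal DDL. A short annotated schema snippet works well — something like this (every table and column name here is a made-up placeholder; substitute your own):

```
orders: one row per order
  id (int, PK), customer_id (int, FK -> customers.id),
  total_cents (int), status (text: 'paid' | 'refunded'),
  created_at (timestamp, UTC)
customers: one row per customer
  id (int, PK), name (text), deleted_at (timestamp, NULL if active)
```

The one-line "what each table represents" notes matter as much as the column list — they're exactly the context AI can't infer from names alone.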
Step 2: ask in natural language with context. "From the orders table joined to customers, show me the top 10 customers by total spend in Q3 2025, excluding refunded orders. Format the spend as USD currency."
More context = more accurate query. Mention: time periods, business filters ("exclude test accounts"), output format requirements, edge cases you care about.
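Here's the shape of query a prompt like that should produce, demonstrated end-to-end with `sqlite3` so you can run it yourself. The schema and data are made up for illustration; your table and column names will differ.

```python
import sqlite3

# Hypothetical two-table schema with sample data, purely for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total_cents INTEGER,
    status TEXT,            -- 'paid' or 'refunded'
    created_at TEXT         -- ISO 8601 date, UTC
);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO orders VALUES
    (10, 1, 50000, 'paid',     '2025-08-01'),
    (11, 1, 20000, 'refunded', '2025-08-15'),  -- excluded: refunded
    (12, 2, 30000, 'paid',     '2025-09-10'),
    (13, 3, 10000, 'paid',     '2025-06-30');  -- excluded: outside Q3
""")

# A well-contextualized prompt should yield explicit date bounds,
# the business filter, and a readable output column:
rows = con.execute("""
    SELECT c.name, SUM(o.total_cents) / 100.0 AS spend_usd
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.status != 'refunded'
      AND o.created_at >= '2025-07-01' AND o.created_at < '2025-10-01'
    GROUP BY c.name
    ORDER BY spend_usd DESC
    LIMIT 10
""").fetchall()
print(rows)  # → [('Acme', 500.0), ('Globex', 300.0)]
```

Note how each piece of context in the prompt maps to a clause: the time period to the date bounds, "excluding refunded" to the status filter, "USD currency" to the cents-to-dollars conversion.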
Step 3: read the query before running it. If you don't understand it line-by-line, ask AI to explain. "Walk me through this query in plain English, what each clause does and why." If something doesn't match what you intended, fix the prompt and regenerate.
Step 4: run on a sample first. Add `LIMIT 100` (or `WHERE date > '2025-12-01'`) to test the query on a small slice. If results look reasonable, expand to the full date range.
Step 5: sanity-check the output. Does the count fit your expectation of business volume? Do top-N results have names you recognize? Are there any NULLs or zeros in unexpected places? Trust your business intuition; if a number feels too high or too low, dig into why.
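Steps 4 and 5 can be sketched in a few lines. This is a toy `sqlite3` example with an invented `orders` table; the point is the habit, not the specific checks:

```python
import sqlite3

# Toy table: 500 orders, some with a NULL customer_id planted deliberately.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total_cents INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, i % 7 or None, 1000 + i) for i in range(1, 501)])

# Step 4: run the query on a small slice first.
sample = con.execute("SELECT * FROM orders LIMIT 100").fetchall()

# Step 5: cheap sanity checks before trusting the full result.
total_rows = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
null_customers = con.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL").fetchone()[0]

print(total_rows)      # does this match your sense of business volume?
print(null_customers)  # NULLs where you expected none are a red flag
```

Here the NULL check would surface 71 orphaned orders — exactly the kind of surprise worth chasing down before the number goes into a deck.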
Common AI SQL failures
- Wrong table for the concept — AI sees a `users` and a `customers` table, picks `users`, but `customers` is what you wanted
- Missing soft-delete filter — AI's query doesn't include `WHERE deleted_at IS NULL` and counts archived records
- Time zone confusion — AI uses UTC when your business uses local time; off-by-one errors on date boundaries
- Wrong aggregation — `COUNT(*)` vs `COUNT(DISTINCT customer_id)` matters, and AI sometimes picks the wrong one
- Inflated joins — AI joins tables in a way that multiplies rows; sums become 3× what they should be
- Hardcoded test data — AI writes the right structure but uses example values from its training data; you run it and get nothing
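The inflated-join failure is worth seeing concretely, because the wrong query looks completely reasonable. A minimal `sqlite3` sketch with an invented orders/shipments schema:

```python
import sqlite3

# Hypothetical schema: one order, three shipment rows pointing at it.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (1, 10000);
INSERT INTO shipments VALUES (1, 1), (2, 1), (3, 1);
""")

# Inflated join: each shipment row repeats the order total,
# so the sum comes out 3x too high.
bad = con.execute("""
    SELECT SUM(o.total_cents) FROM orders o
    JOIN shipments s ON s.order_id = o.id
""").fetchone()[0]

# Fix: aggregate without (or before) the one-to-many join.
good = con.execute("SELECT SUM(total_cents) FROM orders").fetchone()[0]

print(bad, good)  # → 30000 10000
```

This is exactly why a sanity check against business intuition catches what reading the SQL alone often misses — the query is syntactically fine, and the number is plausibly shaped.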
The fix for all of these: read the query, run on a sample, sanity-check.
Building a per-database knowledge file
For any database you query repeatedly, build a markdown file with:
- Schema overview
- Important business rules ("customers with status='trial' should be excluded from revenue queries")
- Common pitfalls you've hit ("the events table has duplicates from reruns; use DISTINCT on event_id")
- Standard filters ("production data only: env='prod'")
- Saved queries you trust
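A sketch of such a file — every name, rule, and date below is a placeholder to adapt:

```markdown
# analytics_db — knowledge file (example)

## Business rules
- Exclude customers with status='trial' from revenue queries
- Production data only: env='prod'

## Pitfalls
- events has duplicates from reruns; use DISTINCT on event_id
- created_at is UTC; reporting uses local time

## Trusted queries
- monthly_active_customers.sql (reviewed by an engineer)
```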
Paste this file as context every time you ask AI for SQL on that database. The accuracy improvement is dramatic.
Tools that help
- Cursor / VS Code with AI — write SQL with autocomplete that knows your schema
- Hex / Mode / Metabase — analytics platforms with AI-assisted SQL built in
- Supabase / Neon SQL editor — built-in AI for the database you're already using
- dbt + AI — for production analytics, AI helps draft transformations you'll review and version
For casual one-off queries, ChatGPT or Claude with schema context is enough. For repeated work, an integrated tool that knows your schema is much faster.
When NOT to AI-generate SQL
Production-affecting queries. Anything that writes (INSERT, UPDATE, DELETE) or that affects production performance. Have an engineer review.
Compliance-relevant data. Queries pulling PII, financial data subject to audit, or anything regulated. The query and its output need an audit trail; AI generation doesn't fit cleanly.
Performance-sensitive queries. If the query will run on millions of rows or in a hot path, it needs index awareness AI doesn't have. Have an engineer optimize.
Critical reporting numbers. Numbers shown to executives or used for paying commissions need to be triple-checked. AI-generated queries are fine for first draft; verification is non-negotiable.
The data quality trap
The danger isn't getting wrong SQL once — that's caught quickly. The danger is getting subtly wrong SQL repeatedly that produces plausible numbers nobody questions. "Customer churn was 4.3% last quarter" feels precise. If the query miscounted, the number is fiction, but it'll be repeated for months.
Develop the habit: any number that drives a decision gets verified by either running a different angle on the same question or asking someone who knows the data well. "Does this look right?" to an analyst friend takes 30 seconds and prevents months of wrong-direction decisions.
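"Running a different angle" can be as simple as computing the same number from two independent tables and checking they agree. A toy `sqlite3` sketch (schema and data invented for illustration):

```python
import sqlite3

# Cross-check: total revenue via order totals vs. via line items.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER);
CREATE TABLE order_items (order_id INTEGER, amount_cents INTEGER);
INSERT INTO orders VALUES (1, 3000), (2, 4500);
INSERT INTO order_items VALUES (1, 1000), (1, 2000), (2, 4500);
""")

via_orders = con.execute("SELECT SUM(total_cents) FROM orders").fetchone()[0]
via_items = con.execute("SELECT SUM(amount_cents) FROM order_items").fetchone()[0]

# If the two angles disagree, dig in before anyone repeats the number.
assert via_orders == via_items, f"mismatch: {via_orders} vs {via_items}"
print(via_orders)  # → 7500
```

When the two angles disagree, you've usually found one of the failure modes above — an inflated join, a missing filter, or the wrong table.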
Decision tree
- One-off curious query, low stakes: AI with schema context, sample-test
- Recurring report you'll rely on: AI for first draft, engineer review for production
- Customer-facing or compliance data: engineer-written, AI for explanation only
- Learning SQL: AI as tutor + write your own queries
Next steps
- Build the schema knowledge file for your most-queried database
- Always read SQL before running it; ask AI to explain anything you don't get
- Bookmark good queries; AI rewrites are easier when you have working examples
- For team work, share schema knowledge files; everyone benefits from one person's investment