AI Data Transformer

Natural-language data ops. Get the result and the reusable script.

Coming soon · Beta pricing — join the waitlist · 10–60 seconds

What it does

Describe what you want to do with a dataset. AI Data Transformer runs the query and returns two things: the transformed data AND the Polars script that produced it — version-controllable, reusable, testable.

Bring a CSV URL or inline JSON; describe "group by region, sum revenue, top 10 desc" in natural language; get back a file in the format you want plus the exact code that produced it. Under the hood: multi-provider fallback (one LLM fails, the next tries), streaming for multi-gigabyte files, and RAG memory for recurring datasets.

Why this one

  • 01 Returns executable code, not just a result. You can commit it, modify it, replay it offline. Zapier AI Steps and Make give you the output only.
  • 02 Multi-provider fallback baked in. If one LLM refuses or errors, the next tries the same brief. Your job does not die because a provider had a bad day.
  • 03 Streams multi-gigabyte inputs. Not every "data tool" scales past the laptop demo.

Inputs

  • Dataset (URL, upload, inline JSON)
  • Prompt describing the transformation

Outputs

  • CSV
  • JSON
  • Parquet
  • Excel

Example prompts

01 Group orders by region, sum revenue, filter YTD, top 10 desc, export as CSV.
02 Read this JSON of GitHub issues, deduplicate by title, tag with sentiment, export Parquet.
03 Transform this CRM export: clean emails, dedupe by domain, enrich with industry, output XLSX.

API

Single endpoint, JSON in, URL out. Standard REST — works with any HTTP client, n8n, Make, Zapier, and MCP.

curl -X POST https://api.42rows.com/v1/data-transformer \
  -H "Authorization: Bearer sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://example.com/orders.csv",
    "prompt": "group by region, sum revenue, top 10 desc",
    "output_format": "csv"
  }'
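The same call from Python, using only the standard library. The endpoint and payload mirror the curl example above; the key is a placeholder you would replace with your own:

```python
import json
import urllib.request

API_URL = "https://api.42rows.com/v1/data-transformer"
API_KEY = "sk_..."  # placeholder — use your real key

payload = {
    "source": "https://example.com/orders.csv",
    "prompt": "group by region, sum revenue, top 10 desc",
    "output_format": "csv",
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment once you have a key; the response body is JSON containing
# a URL to the transformed file.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Any HTTP client follows the same shape: POST the JSON body with a bearer token, read the JSON response.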

FAQ

When does this leave beta?

We are finalising pricing tiers (hosted vs bring-your-own-key) and query limits. Join the waitlist and you will get an invite before public launch.

What file sizes are supported?

Streaming is auto-enabled for files over 100MB. Inline JSON is capped at 9MB. There is no hard upper bound on URL-fetched files.

Which LLM runs the query?

Your choice at request time: hosted (we manage the LLM) or BYOK (bring your own key). BYOK supports Groq, Claude, OpenAI, and Gemini.

Does the returned code run anywhere?

It is standard Polars Python — any runtime with `polars` installed executes it. Ideal for CI/CD pipelines and scheduled jobs.

GDPR — does my data leave my cloud?

In BYOK mode, the LLM provider sees the schema only, never the rows. In hosted mode, the data passes through our compute; for EU-resident operations, pick the EU region option at request time.

Get early access

AI Data Transformer is in private beta. Drop us a line with your use case and we will send an invite.