Diffusion models do not draw letters; they draw pixels that resemble letters. That is why DALL·E, Midjourney, and Ideogram all produce 'M4RK3T1NG' instead of 'MARKETING' once you push past one or two words. For ad creative, this means you generate the visual and then retype the headline in Photoshop.
This guide explains how the 42rows image creative API works around the problem. The visual is generated by Imagen 4 (a diffusion model). The headline, stat, and CTA are then composited as an HTML/CSS layer on top — using a real font renderer. Letters are vector-clean at any zoom.
Step by step
- 01
Two-stage pipeline
Internally the request becomes: (1) Imagen 4 generates the background and visual elements at the target aspect ratio; (2) a Playwright-based compositor renders the headline, subheadline, and CTA as HTML/CSS and merges them onto the image. The output you receive is the merged PNG — you do not see the two stages.
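As a rough illustration of stage (2), the sketch below builds an HTML/CSS overlay page and screenshots it with Playwright's Python API. It is not the 42rows compositor: the function names, layout percentages, colours, and font sizes are all assumptions made for the example, and Playwright is treated as an optional dependency.

```python
import html


def build_overlay_html(background_url, headline, cta, width=1080, height=1080):
    """Return an HTML page that layers headline and CTA text over a background image.

    Hypothetical layout: positions, colours, and sizes are illustrative only.
    """
    # Escape the copy so characters such as < or & render literally.
    safe_headline = html.escape(headline)
    safe_cta = html.escape(cta)
    safe_url = html.escape(background_url, quote=True)
    return f"""<!doctype html>
<html><body style="margin:0">
<div style="position:relative;width:{width}px;height:{height}px;
            background:url('{safe_url}') center/cover;font-family:sans-serif">
  <h1 style="position:absolute;left:8%;right:8%;top:12%;margin:0;
             font-size:{height // 12}px;line-height:1.1;color:#fff">{safe_headline}</h1>
  <span style="position:absolute;left:8%;bottom:10%;padding:0.6em 1.2em;
               background:#fff;color:#001f3f;font-size:{height // 30}px">{safe_cta}</span>
</div>
</body></html>"""


def render_png(page_html, out_path, width=1080, height=1080):
    """Screenshot the page with headless Chromium (needs the playwright package)."""
    # Imported lazily so build_overlay_html works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.set_content(page_html)
        page.screenshot(path=out_path)
        browser.close()
```

Because the text is laid out by the browser's font renderer rather than generated as pixels, the spelling in the output is exactly the string passed in.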
- 02
Be explicit about the copy
The compositor renders exactly what you put in the prompt for the headline and CTA fields. Treat them like form inputs: type the literal sentence you want readers to see on the ad. The art director extracts the fields from the prompt, so wrapping each one in quotes helps it disambiguate, for example: Headline: "Build streaks that stick".
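One way to keep the copy unambiguous is to assemble the prompt from labelled, quoted fields instead of typing it freehand. The helper below is a hypothetical sketch: `build_prompt` is not part of the API, the labels simply mirror the example prompts in this guide, and replacing inner double quotes with single quotes is an assumption about what keeps field boundaries clear.

```python
def build_prompt(headline, cta, sub=None, extras=()):
    """Compose a brief with each copy field labelled and wrapped in double quotes."""

    def quoted(label, text):
        # Swap inner double quotes for single quotes so each field has one clear boundary.
        clean = text.strip().replace('"', "'")
        return f'{label}: "{clean}".'

    parts = [quoted("Headline", headline)]
    if sub:
        parts.append(quoted("Sub", sub))
    parts.append(quoted("CTA", cta))
    parts.extend(extras)  # free-form hints such as "Brand colour: navy."
    return " ".join(parts)


prompt = build_prompt(
    "Stop the meeting marathon",
    "Try async reviews",
    extras=["Square 1080x1080."],
)
```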
- 03
Test at thumb-stop scale
Letters that read at 100% may still feel small at 30% (mobile feed). For the LinkedIn 1200×628 case, headlines longer than 9 words start to look cramped. Test the output at 30% zoom in your browser to approximate how it will display in feed.
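The two rules of thumb above (30% zoom, roughly 9 words for 1200×628) can be checked before spending a call. This is a minimal sketch; the function names are made up for the example and the thresholds are just the figures quoted in this step, not limits enforced by the API.

```python
def feed_preview_size(width, height, scale=0.30):
    """Pixel size of the creative when viewed at the given zoom (default 30%)."""
    return round(width * scale), round(height * scale)


def headline_warnings(headline, fmt="1200x628", max_words=9):
    """Return human-readable warnings for headlines likely to look cramped."""
    warnings = []
    word_count = len(headline.split())
    # The 9-word guideline in this guide is stated for the LinkedIn 1200x628 format only.
    if fmt == "1200x628" and word_count > max_words:
        warnings.append(
            f"Headline has {word_count} words; more than {max_words} may look cramped at {fmt}."
        )
    return warnings


preview = feed_preview_size(1200, 628)
```

Resizing the browser window or an image viewer to the `preview` dimensions gives a quick approximation of the in-feed view.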
Example prompts
Copy, click, tweak — the CTA opens the terminal with the prompt pre-loaded.
- 1080×1080 ad. Headline: "Stop the meeting marathon". CTA: "Try async reviews". Brand colour: navy. Style: editorial type-driven, minimal visual.
- 1200×628 LinkedIn ad. Headline: "47%". Sub: "of B2B teams ship faster with async standups". CTA: "Learn how". Style: clean dashboard mock.
- Square ad with a customer quote. Quote: "We cut reporting time by two thirds in week one". Attribution: "VP Finance, Acme Corp". CTA: "Read the case study".
API call
Standard REST. Bearer token, JSON body, URL response. Works in any HTTP client, n8n, Make, Zapier, or MCP agent.
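For callers working in Python, the same request can be sketched with the standard library alone. The endpoint, headers, and body fields are taken from the curl command in this section; the helper names are hypothetical, and parsing the response as JSON is an assumption, since this guide only says the response carries a URL.

```python
import json
import urllib.request

API_URL = "https://api.42rows.com/v1/image-creative"


def build_request(prompt, fmt, api_key):
    """Build the POST request without sending it, so it can be inspected or logged."""
    body = json.dumps({"prompt": prompt, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=60):
    """Send the request and decode the reply (assumed here to be a JSON document)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request(
    'Headline: "Stop the meeting marathon". CTA: "Try async reviews". Square 1080x1080.',
    "1080x1080",
    "sk_test",  # placeholder key for illustration
)
```

Calling `send(req)` performs the network call; keeping it separate from `build_request` makes the payload easy to unit-test.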
curl -X POST https://api.42rows.com/v1/image-creative \
  -H "Authorization: Bearer sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Headline: \"Stop the meeting marathon\". CTA: \"Try async reviews\". Square 1080x1080.",
    "format": "1080x1080"
  }'
Pricing
Pay-per-call, no subscription. Subscription plans are on the roadmap — they will not change pay-per-call rates.
FAQ
Which font is used for the overlay?
A small set of brand-neutral sans-serif typefaces, picked by the art director based on tone (technical, editorial, friendly, etc.). Custom font upload is on the roadmap.
Can I get the visual without the text overlay?
Yes. The actor input has a `skip_compositor` flag that returns just the Imagen output without the HTML/CSS pass. Useful if you want to do the text layer yourself in Figma.
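A request body with the flag set might look like the sketch below. Only the flag name `skip_compositor` comes from this FAQ; where it sits in the input is an assumption (shown here as a top-level boolean next to `prompt` and `format`), and `build_body` is a hypothetical helper.

```python
import json


def build_body(prompt, fmt, skip_compositor=False):
    """Serialize the input, adding skip_compositor only when it is requested."""
    body = {"prompt": prompt, "format": fmt}
    if skip_compositor:
        # ASSUMPTION: the flag is a top-level boolean; check the input schema before relying on this.
        body["skip_compositor"] = True
    return json.dumps(body)


visual_only = build_body("Square ad, navy, minimal visual.", "1080x1080", skip_compositor=True)
```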
What about non-Latin scripts?
The compositor renders any script supported by the bundled font fallback chain (Latin, Greek, Cyrillic, basic CJK). Devanagari and Arabic support is partial; long-form Arabic right-to-left layouts are on the roadmap.
Why HTML/CSS specifically and not SVG?
HTML/CSS gives us multi-line wrap, line-height tuning, and gradient text out of the box, all rendered through Chromium under Playwright. SVG would work for static layouts but is more painful for responsive headline lengths.
How does this compare to Bannerbear or Placid?
Bannerbear and Placid expect you to design templates first and fill them in via API. Our endpoint takes a brief and chooses the template internally. Trade-off: less control over final layout, much faster to ship.
Ship it
Use the first example prompt as a starter — the button opens the public terminal with it pre-filled.