AI Image Generation That Actually Works
I generated 100+ production assets using Google's Nano Banana models — 3D icons, character illustrations, brand badges, hero images, social cards, and OG images. Here's every technique that worked, every trap I fell into, and the two pipelines I use for everything now.
The Two Problems
AI image generation in 2026 is great at one thing and terrible at another: it produces striking visual assets effortlessly, but it still can't render precise text reliably.
Once I accepted this split, everything clicked. I stopped trying to make one tool do both jobs and built two separate pipelines:
- Nano Banana for visual assets (icons, characters, illustrations)
- HTML-to-PNG for anything with text (social cards, OG images, cover art)
Between these two pipelines, I've never needed Figma, Canva, or a designer. Let me show you exactly how each one works.
Pipeline 1: Nano Banana for Visual Assets
"Nano Banana" is Google's branding for Gemini's native image generation. Three models, one name:
| Model | ID | Best For | Speed |
|---|---|---|---|
| Nano Banana Pro | gemini-3-pro-image-preview | Quality-critical single assets. Has "Thinking" mode. | Slower |
| Nano Banana 2 | gemini-3.1-flash-image-preview | Batch work, iteration. Up to 4K, 14 refs, Search grounding. | Fast |
| Nano Banana | gemini-2.5-flash-image | Quick tests, speed-optimized. | Fastest |
I used Nano Banana 2 for 90% of work and Pro for final polished assets. Here's the workflow.
The Transparent Background Problem
Nano Banana cannot generate transparent PNGs. Every image comes with a background. For icons, characters, and badges that need to overlay on other designs, you need transparency.
The solution: the green screen trick.
1. Prompt for a solid #00FF00 green background
2. Post-process with NumPy to detect green pixels
3. Set green pixels to alpha = 0 (transparent)
4. Anti-alias edges to prevent green fringing
Result: Clean transparent PNGs, every time.
The prompt injection is critical. Every single image prompt gets this prefix automatically:
CRITICAL: Generate the subject on a SOLID BRIGHT GREEN
(#00FF00) background. The green must be pure and uniform
— no gradients, no variation, no shadow on the green.
The ENTIRE background must be #00FF00 green with
absolutely nothing else.
And the green removal is more sophisticated than a simple color threshold. Here's the actual logic:
# Detection: HSV-like green channel dominance
is_green = (g > r + 30) & (g > b + 30) & (g > 100)
# Hard transparency on clearly green pixels
alpha[is_green] = 0
# Edge treatment: dilated mask finds boundary pixels
# Reduces green fringing, softens alpha to 0.8x
# Pulls green channel down toward red/blue average
The edge anti-aliasing is what separates this from a naive chromakey. Without it, you get cyan and yellow fringing around every edge — the classic green screen artifact that makes everything look cheap.
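A minimal, self-contained version of that pass looks roughly like this. The function name and exact constants are illustrative (the int16 cast is an assumption to keep the comparisons from overflowing uint8, and the spill correction snaps green fully to the red/blue average rather than easing toward it):

```python
import numpy as np

def remove_green(rgba: np.ndarray) -> np.ndarray:
    """Chromakey a solid #00FF00 background out of an (H, W, 4) uint8 array."""
    img = rgba.astype(np.int16)  # int16 so g > r + 30 can't wrap around in uint8
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    # Hard transparency where green clearly dominates
    is_green = (g > r + 30) & (g > b + 30) & (g > 100)
    out = rgba.copy()
    out[..., 3][is_green] = 0

    # Boundary pixels: kept pixels with at least one green neighbor
    pad = np.pad(is_green, 1)
    neighbor_green = (
        pad[:-2, 1:-1] | pad[2:, 1:-1] | pad[1:-1, :-2] | pad[1:-1, 2:]
    )
    edge = neighbor_green & ~is_green

    # Soften edge alpha to 0.8x and pull green spill toward the red/blue average
    out[..., 3][edge] = (out[..., 3][edge] * 0.8).astype(np.uint8)
    spill = edge & (g > (r + b) // 2)
    out[..., 1][spill] = ((r[spill] + b[spill]) // 2).astype(np.uint8)
    return out
```

The dilation here is a simple four-neighbor OR; a production pipeline might widen or feather it over a few pixels, but even this is enough to kill the hard fringe.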
The Style Prefix System
Every prompt also gets a brand style injection:
Flat vector illustration style.
Color palette ONLY: deep purple (#4007AD),
golden yellow (#FFDF6B), dark purple (#260A72),
and white. No other colors.
Clean, modern, professional tech company aesthetic.
No text unless specifically requested.
This is how I maintained visual consistency across 100+ assets. Without it, each generation looks like it came from a different designer. With it, they all feel like one cohesive brand.
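Both injections are just string prefixes applied before every API call. A sketch of how that wiring can look (constant and function names are illustrative, and the prompt text is abridged from the prefixes above):

```python
# Illustrative prompt assembly: every generation request gets both
# injections, in a fixed order, before the subject description.
GREEN_SCREEN = (
    "CRITICAL: Generate the subject on a SOLID BRIGHT GREEN (#00FF00) "
    "background. The green must be pure and uniform."
)
STYLE_PREFIX = (
    "Flat vector illustration style. Color palette ONLY: deep purple "
    "(#4007AD), golden yellow (#FFDF6B), dark purple (#260A72), and white. "
    "No text unless specifically requested."
)

def build_prompt(subject: str) -> str:
    """Prepend the green screen and brand style injections to a subject."""
    return f"{GREEN_SCREEN}\n{STYLE_PREFIX}\n{subject}"
```

Because the prefixes live in one place, changing the palette once re-brands every asset generated afterward.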
The Prompt Patterns That Work
After generating dozens of assets, I found a formula that consistently produces good results:
# The winning pattern for 3D icons:
"A [object], isometric 3/4 view, gaming collectible
style, glossy plastic material, just the object
floating by itself, nothing else"
# Example that shipped:
"A golden trophy, isometric 3/4 view, gaming
collectible style, glossy plastic material,
just the object floating by itself, nothing else"
The key phrases:
- "isometric 3/4 view" — consistent angle across all assets
- "gaming collectible style" — gives them that polished, tactile feel
- "glossy plastic material" — consistent material treatment
- "just the object floating by itself, nothing else" — critical. Without this, Gemini adds pedestals, frames, cards, backgrounds, sparkle borders, and other visual noise you didn't ask for
Reference Images: When to Use Them (and When Not To)
This one cost me hours. Nano Banana 2 supports up to 14 reference images (10 objects + 4 characters). Sounds great, right?
The trap: If you pass brand logo images as references, Gemini tries to reproduce the logo instead of generating the object you asked for. You ask for a trophy and get a weird morphed version of your company logo shaped vaguely like a trophy.
4 Style Variations I Tested
Before committing to a visual direction, I generated the same 5 assets (coin, shield, chart, handshake, trophy) in 4 different styles. This is the fastest way to explore without wasting time on full batches:
- 3D Gaming — "Glossy plastic material, soft studio lighting, isometric 3/4 view" — Won. Professional, tactile, consistent.
- Flat Modern — "Minimal design, solid colors, no gradients, no shadows, clean geometric shapes, modern SaaS style" — Too generic. Looked like every other SaaS landing page.
- Outlined — "Thin purple outline, minimal fill, white interior with yellow accents, single weight stroke" — Nice for some use cases but hard to maintain consistency at scale.
- Isometric Tech — "3/4 view, subtle grid lines, tech blueprint aesthetic, duotone purple + yellow" — Cool but niche. Only works for technical content.
Lesson: Always test 4-5 styles with the same 3-5 assets before committing to a direction. It takes 20 minutes and saves days of rework.
SVG Conversion
For simple icons that need to scale infinitely (favicons, app icons), I pipe the PNG through potrace to trace the silhouette into SVG:
brew install potrace
python3 generate_image.py -p "..." -n icon --svg
This works great for simple shapes (coins, badges, geometric icons). It does not work for complex illustrations, characters, or anything with gradients. Keep those as PNG.
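One wrinkle: potrace doesn't read PNG, only bitmap formats like PBM/PGM/PPM/BMP, so the script has to binarize the image first. A stdlib-only sketch of that hand-off (function names and the plain-text PBM writer are illustrative; a real script would derive the mask by thresholding the PNG's alpha or luminance):

```python
import subprocess

def write_pbm(mask, path):
    """Write a boolean 2-D mask as a plain-text PBM (P1) bitmap.

    In PBM, 1 = black (the shape potrace traces), 0 = white.
    """
    h, w = len(mask), len(mask[0])
    rows = "\n".join(" ".join("1" if px else "0" for px in row) for row in mask)
    with open(path, "w") as f:
        f.write(f"P1\n{w} {h}\n{rows}\n")

def trace_to_svg(pbm_path, svg_path):
    # -s selects potrace's SVG backend, -o names the output file
    subprocess.run(["potrace", pbm_path, "-s", "-o", svg_path], check=True)
```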
Pipeline 2: HTML-to-PNG for Perfect Text
Here's the thing nobody tells you about AI image generation: use it for images, not for text.
Nano Banana Pro is better at text rendering than the older models — it can do headlines and short labels. But for anything precise (social cards, OG images, cover art, infographics), you want pixel-perfect control. And the fastest way to get pixel-perfect control is to just... write HTML.
The Technique: Puppeteer Screenshots
Write your design as an HTML page. Set the viewport to the exact dimensions you need (1200x630 for OG images, 1200x628 for LinkedIn, 1600x900 for Twitter). Screenshot it with headless Chromium. Done.
// Puppeteer + @sparticuz/chromium
import puppeteer from 'puppeteer-core';
import chromium from '@sparticuz/chromium';

const browser = await puppeteer.launch({
  args: chromium.args,
  executablePath: await chromium.executablePath()
});
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 630 });
await page.setContent(html);
await page.waitForFunction('document.fonts.ready');
const png = await page.screenshot({
  type: 'png',
  clip: { x: 0, y: 0, width: 1200, height: 630 }
});
The document.fonts.ready wait is important — without it, Google Fonts haven't loaded yet and your text renders in the system fallback font.
Why This Beats Figma for Programmatic Assets
- Templatable: One HTML template, swap the title and subtitle, generate 50 OG images. Try doing that in Figma.
- Version controlled: Your design is code. It lives in git. You can diff changes.
- Dynamic: Pull data from an API, generate the image on the fly. Social cards that update with real metrics.
- CSS is your design tool: Gradients, blur, shadows, text effects, responsive layout — you already know how to do all of this.
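The "swap the title, generate 50 images" step is plain string substitution before the screenshot. A sketch with Python's string.Template (field names and data are illustrative; any templating layer that feeds page.setContent works the same way):

```python
from string import Template

# Illustrative OG template: real ones carry the full brand CSS
OG_TEMPLATE = Template("""
<body style="width:1200px;height:630px;background:#08070e">
  <h1>$title</h1>
  <p>$subtitle</p>
</body>
""")

posts = [
    {"title": "Brain Kit", "subtitle": "Persistent memory for AI tools"},
    {"title": "Veo 3.1 Guide", "subtitle": "AI video production"},
]

for post in posts:
    html = OG_TEMPLATE.substitute(post)
    # hand `html` to page.setContent(...) and screenshot as usual
```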
The OG Image Template Pattern
Here's the template I use for all my OG images and Gumroad cover art. It's a full HTML page that screenshots to exactly 1200x630:
<body style="width:1200px;height:630px;background:#08070e">
<!-- Decorative blurred orbs -->
<div class="glow"
style="background:rgba(124,58,237,.15)"></div>
<!-- Geometric SVG lines (low opacity) -->
<svg style="opacity:.06">
<line x1="600" y1="0" x2="0" y2="630" />
</svg>
<!-- Content -->
<div class="badge">MCP Memory Server</div>
<h1>Brain Kit</h1>
<p>Your tagline here.</p>
<div class="features">
<span>Feature 1</span>
<span>Feature 2</span>
</div>
<!-- AI-generated image on the right -->
<img src="brain.png" />
<!-- Bottom bar -->
<div class="bottom-bar">
<span>dashbuilds.dev</span>
<span>$29</span>
</div>
</body>
The trick: combine both pipelines. Generate the hero illustration with Nano Banana (transparent PNG via green screen), then embed it in the HTML template alongside pixel-perfect text. Screenshot the whole thing. Best of both worlds.
Client-Side Fallback: html2canvas
For in-browser image generation (social media dashboards, live previews), I use html2canvas as a fallback when Puppeteer isn't available:
import html2canvas from 'html2canvas';

// Inline external images + fonts as data URLs first
// Create hidden iframe with full HTML
// Render at 2x scale for retina
const canvas = await html2canvas(element, { scale: 2 });
canvas.toBlob((blob) => {
  // Download or upload the PNG
});
The key gotcha: you need to inline all external resources (Google Fonts, images) as data URLs before rendering. html2canvas can't fetch cross-origin resources.
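In the browser that means fetching each resource and converting it with FileReader; when the HTML is assembled server-side first, the same inlining is a few lines of Python run as a build step (the helper name is mine):

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Inline a local file as a data: URL so the renderer never fetches it."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

Swap every src and font URL for the data URL before rendering and the cross-origin problem disappears entirely.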
The Hybrid: AI Images + HTML Templates
The real power move is combining both pipelines. Here's what my production workflow looks like:
- Generate visual assets with Nano Banana (3D icons, illustrations, characters) using the green screen trick for transparency
- Design layouts in HTML with CSS (text, spacing, brand colors, gradients)
- Embed the AI assets into the HTML template as <img> tags
- Screenshot the combined result with Puppeteer
This gives you AI-generated creative with pixel-perfect text and layout. The AI handles what it's good at (visual creativity), and HTML handles what it's good at (text, precision, consistency).
I use this for:
- Gumroad cover art: AI-generated brain illustration + HTML text + radial gradient mask to blend the watercolor into a dark background
- Social cards: AI-generated icons + HTML layout with platform-specific dimensions (LinkedIn 1200x628, Twitter 1600x900, etc.)
- OG images: AI-generated hero image + branded HTML template with feature badges
The Prompt Mistakes (And What They Produce)
Every technique above was learned by getting it wrong first. Here's what each mistake looks like — the bad prompt vs the fix, with real Nano Banana outputs:
Asked for "transparent" — got a literal checkerboard
Green screen trick — clean, removable
Actual output with "white outline" in prompt — flat white border, no depth
No "floating by itself" — got pedestal, garbled text, extra elements
With "floating by itself" — clean, isolated, brand colors
The full pipeline result: green screen → NumPy removal → transparent PNG
→
Green screen removed with NumPy + edge anti-aliasing. Ready for HTML composition.
10 Lessons from 100+ Assets
1. "No text unless specifically requested"
Add this to every prompt. Without it, Nano Banana will cheerfully add garbled text labels, watermarks, or random words to your image. Even if you didn't mention text at all.
2. Never say "white outline"
I had "white outline" in my style prefix for weeks. It caused literal white borders and dark background cards behind every object. Removing those two words fixed dozens of broken generations.
3. "Just the object floating by itself, nothing else"
This is the most important phrase in icon generation. Without it, Gemini adds pedestals, display cases, frames, sparkle borders, and background elements you never asked for.
4. Test styles before committing
Generate the same 5 objects in 4 different styles. 20 minutes. This saves you from generating 50 assets in a style that doesn't work and having to redo everything.
5. Skip brand logos as references for simple objects
Gemini will try to morph your logo into the shape you asked for. Use references only for complex illustrations where style consistency matters more than accuracy.
6. Use Pro for heroes, Flash for everything else
Nano Banana Pro's "Thinking" mode genuinely produces better results for complex compositions (characters, scenes, multi-element layouts). For simple icons and badges, Flash is indistinguishable and 3x faster.
7. The green must be pre-set, not prompted
If you're feeding images to Veo for animation, pre-composite the character on green with PIL before sending to Veo. Prompting Veo for "green screen background" starts black for ~1 second, ruining chromakey. (More on this in my Veo 3.1 post.)
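The pre-compositing itself is one paste onto a solid green canvas. A sketch of that step with NumPy alpha blending rather than PIL (function name and default frame size are illustrative):

```python
import numpy as np

def on_green(char_rgba: np.ndarray, size=(720, 1280)) -> np.ndarray:
    """Alpha-composite an (H, W, 4) RGBA character onto a solid #00FF00 frame.

    size is (height, width); the character is centered. Pre-baking the green
    frame avoids prompting Veo for a green screen it won't deliver on frame 1.
    """
    h, w = size
    frame = np.zeros((h, w, 3), dtype=np.uint8)
    frame[..., 1] = 255  # pure green everywhere
    ch, cw = char_rgba.shape[:2]
    y, x = (h - ch) // 2, (w - cw) // 2
    alpha = char_rgba[..., 3:4].astype(np.float32) / 255.0
    region = frame[y:y + ch, x:x + cw].astype(np.float32)
    blended = char_rgba[..., :3].astype(np.float32) * alpha + region * (1.0 - alpha)
    frame[y:y + ch, x:x + cw] = blended.astype(np.uint8)
    return frame
```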
8. Edge anti-aliasing is non-negotiable
A simple "if green, make transparent" produces awful edges. You need a dilated mask that finds boundary pixels and softens them. The extra 10 lines of NumPy code save every single image.
9. HTML-to-PNG is underrated
Everyone's chasing AI-generated text rendering. Meanwhile, a 60-line HTML template + Puppeteer produces perfect text at any resolution, in any font, with any layout. Use AI for the creative parts, HTML for the precise parts.
10. Trim Veo character animations to 2.5 seconds
If you're animating characters: Veo morphs faces and arms after ~3 seconds. Loop a 2.5-second trim and it looks like an intentional idle animation.
The Toolkit
| Task | Tool | Why |
|---|---|---|
| 3D icons, badges, coins | Nano Banana 2 + green screen | Fast, consistent with style prefix |
| Hero illustrations, characters | Nano Banana Pro + references | Thinking mode for complex compositions |
| Style exploration | Nano Banana 2 (4-5 variations) | Fast iteration before committing |
| OG images, social cards | HTML + Puppeteer | Perfect text, templatable, version controlled |
| Cover art, marketing images | Nano Banana + HTML (hybrid) | AI illustration embedded in HTML layout |
| Simple icon SVGs | Nano Banana + potrace | Scalable vectors from AI-generated shapes |
| In-browser image generation | html2canvas (2x scale) | Client-side fallback, no server needed |
| Character animation | Nano Banana → PIL → Veo → ffmpeg | Green screen pipeline for animated overlays |
The Bottom Line
Stop trying to make AI generate text. Stop fighting Canva for programmatic assets. Stop paying a designer for social cards you need 50 of.
The split is simple: Nano Banana for creative visuals, HTML-to-PNG for text and layouts, and the hybrid approach when you need both. This combination replaced my entire creative toolchain — 100+ assets, zero design tools, zero dollars spent on design.
"Did you make these? They look like they came from a design agency."
— a colleague, looking at assets I generated during a lunch break
The design agency is a Python script, a green screen trick, and an HTML template. That's the whole secret.
Brain Kit ($29)
Never lose a working prompt again. Brain Kit gives your AI tools persistent memory — capture image generation settings, style preferences, and workflows with semantic search.
Get Brain Kit — $29

Like what I build? Check out the shop — deploy-ready kits starting at $14.
More AI production guides
This post covers images. For AI video production (Veo 3.1, Remotion, ElevenLabs, extension chaining), read the companion post.
Read the Veo 3.1 guide