April 4, 2026 · 11 min read

AI Image Generation That Actually Works

I generated 100+ production assets using Google's Nano Banana models — 3D icons, character illustrations, brand badges, hero images, social cards, and OG images. Here's every technique that worked, every trap I fell into, and the two pipelines I use for everything now.

The Two Problems

AI image generation in 2026 is great at one thing and terrible at another:

Great at: Generating objects, illustrations, icons, characters, scenes — creative visual content where "close enough" is fine.
Terrible at: Text rendering, pixel-perfect layouts, UI screenshots, anything where specific words need to be readable.

Once I accepted this split, everything clicked. I stopped trying to make one tool do both jobs and built two separate pipelines:

  1. Nano Banana for visual assets (icons, characters, illustrations)
  2. HTML-to-PNG for anything with text (social cards, OG images, cover art)

Between these two pipelines, I've never needed Figma, Canva, or a designer. Let me show you exactly how each one works.

Pipeline 1: Nano Banana for Visual Assets

"Nano Banana" is Google's branding for Gemini's native image generation. Three models, one name:

Model | ID | Best For | Speed
Nano Banana Pro | gemini-3-pro-image-preview | Quality-critical single assets. Has "Thinking" mode. | Slower
Nano Banana 2 | gemini-3.1-flash-image-preview | Batch work, iteration. Up to 4K, 14 refs, Search grounding. | Fast
Nano Banana | gemini-2.5-flash-image | Quick tests, speed-optimized. | Fastest

I used Nano Banana 2 for 90% of work and Pro for final polished assets. Here's the workflow.

The Transparent Background Problem

Nano Banana cannot generate transparent PNGs. Every image comes with a background. For icons, characters, and badges that need to overlay on other designs, you need transparency.

The solution: the green screen trick.

The Green Screen Pipeline

1. Prompt for a solid #00FF00 green background
2. Post-process with NumPy to detect green pixels
3. Set green pixels to alpha = 0 (transparent)
4. Anti-alias edges to prevent green fringing
Result: Clean transparent PNGs, every time.

The prompt injection is critical. Every single image prompt gets this prefix automatically:

CRITICAL: Generate the subject on a SOLID BRIGHT GREEN
(#00FF00) background. The green must be pure and uniform
— no gradients, no variation, no shadow on the green.
The ENTIRE background must be #00FF00 green with
absolutely nothing else.

And the green removal is more sophisticated than a simple color threshold. Here's the actual logic:

# r, g, b, alpha: channel arrays from the RGBA image,
# cast to int so "r + 30" can't overflow uint8
r, g, b = (rgba[..., i].astype(int) for i in range(3))
alpha = rgba[..., 3]

# Detection: HSV-like green channel dominance
is_green = (g > r + 30) & (g > b + 30) & (g > 100)

# Hard transparency on clearly green pixels
alpha[is_green] = 0

# Edge treatment: dilated mask finds boundary pixels
# Reduces green fringing, softens alpha to 0.8x
# Pulls green channel down toward red/blue average

The edge anti-aliasing is what separates this from a naive chromakey. Without it, you get cyan and yellow fringing around every edge — the classic green screen artifact that makes everything look cheap.
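Here's a minimal, self-contained sketch of the whole removal step in NumPy. The function name, exact thresholds, and the pure-NumPy dilation are my reconstruction of the logic described above, not the post's actual script:

```python
import numpy as np

def remove_green(rgba: np.ndarray) -> np.ndarray:
    """Green-screen removal sketch. rgba: (H, W, 4) uint8 array."""
    out = rgba.astype(np.int16)                  # int16 avoids uint8 overflow
    r, g, b = out[..., 0], out[..., 1], out[..., 2]

    # Green dominance test rather than an exact #00FF00 match,
    # so compression noise in the background is still caught
    is_green = (g > r + 30) & (g > b + 30) & (g > 100)

    # Hard transparency on clearly green pixels
    out[..., 3][is_green] = 0

    # One-pixel dilation (4-neighbour shifts; np.roll wraps at the
    # borders, which is fine when the image has green margins) to
    # find the non-green pixels bordering the green region
    dilated = is_green.copy()
    for axis in (0, 1):
        for shift in (1, -1):
            dilated |= np.roll(is_green, shift, axis=axis)
    edge = dilated & ~is_green

    # Anti-alias: soften edge alpha to 0.8x and pull green spill
    # down toward the red/blue average to kill fringing
    out[..., 3][edge] = (out[..., 3][edge] * 0.8).astype(np.int16)
    out[..., 1][edge] = np.minimum(g[edge], (r[edge] + b[edge]) // 2 + 10)

    return np.clip(out, 0, 255).astype(np.uint8)
```

Wrap it with Pillow (`np.asarray(Image.open(p).convert("RGBA"))` in, `Image.fromarray(result, "RGBA")` out) to go PNG to PNG.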

The Style Prefix System

Every prompt also gets a brand style injection:

Flat vector illustration style.
Color palette ONLY: deep purple (#4007AD),
golden yellow (#FFDF6B), dark purple (#260A72),
and white. No other colors.
Clean, modern, professional tech company aesthetic.
No text unless specifically requested.

This is how I maintained visual consistency across 100+ assets. Without it, each generation looks like it came from a different designer. With it, they all feel like one cohesive brand.
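Mechanically, both injections are plain string prefixes composed in front of the user's prompt. A minimal sketch of that composition (constant and function names are mine, not the post's script):

```python
# Prepended when the asset needs transparency (green screen trick)
GREEN_SCREEN_PREFIX = (
    "CRITICAL: Generate the subject on a SOLID BRIGHT GREEN "
    "(#00FF00) background. The green must be pure and uniform "
    "-- no gradients, no variation, no shadow on the green."
)

# Prepended to every prompt for brand consistency
STYLE_PREFIX = (
    "Flat vector illustration style. "
    "Color palette ONLY: deep purple (#4007AD), golden yellow (#FFDF6B), "
    "dark purple (#260A72), and white. No other colors. "
    "Clean, modern, professional tech company aesthetic. "
    "No text unless specifically requested."
)

def build_prompt(subject: str, transparent: bool = True) -> str:
    """Compose the final prompt sent to the image model."""
    parts = []
    if transparent:
        parts.append(GREEN_SCREEN_PREFIX)
    parts.append(STYLE_PREFIX)
    parts.append(subject)
    return " ".join(parts)
```

Because the prefixes are injected automatically, every one-line subject prompt inherits the same background and palette rules without repeating them.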

The Prompt Patterns That Work

After generating dozens of assets, I found a formula that consistently produces good results:

# The winning pattern for 3D icons:
"A [object], isometric 3/4 view, gaming collectible
style, glossy plastic material, just the object
floating by itself, nothing else"

# Example that shipped:
"A golden trophy, isometric 3/4 view, gaming
collectible style, glossy plastic material,
just the object floating by itself, nothing else"

The key phrases:

  1. "isometric 3/4 view": locks a consistent camera angle across the whole set.
  2. "gaming collectible style, glossy plastic material": pins down the material and lighting so every icon feels like the same family.
  3. "just the object floating by itself, nothing else": keeps out pedestals, frames, and background clutter. This is the single most important phrase in any icon prompt.

Reference Images: When to Use Them (and When Not To)

This one cost me hours. Nano Banana 2 supports up to 14 reference images (10 objects + 4 characters). Sounds great, right?

The trap: If you pass brand logo images as references, Gemini tries to reproduce the logo instead of generating the object you asked for. You ask for a trophy and get a weird morphed version of your company logo shaped vaguely like a trophy.

Use references for: Complex illustrations where style consistency matters — hero characters, scene compositions, multi-asset collections that need to feel cohesive.
Skip references for: Simple icons, geometric shapes, coins, badges, progress bars. Let the style prefix handle consistency and keep the model focused on the object itself.

4 Style Variations I Tested

Before committing to a visual direction, I generated the same 5 assets (coin, shield, chart, handshake, trophy) in 4 different styles. This is the fastest way to explore without wasting time on full batches:

  1. 3D Gaming: "Glossy plastic material, soft studio lighting, isometric 3/4 view" — Won. Professional, tactile, consistent.
  2. Flat Modern: "Minimal design, solid colors, no gradients, no shadows, clean geometric shapes, modern SaaS style" — Too generic. Looked like every other SaaS landing page.
  3. Outlined: "Thin purple outline, minimal fill, white interior with yellow accents, single weight stroke" — Nice for some use cases but hard to maintain consistency at scale.
  4. Isometric Tech: "3/4 view, subtle grid lines, tech blueprint aesthetic, duotone purple + yellow" — Cool but niche. Only works for technical content.

Lesson: Always test 4-5 styles with the same 3-5 assets before committing to a direction. It takes 20 minutes and saves days of rework.
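The style test is easy to script as a cross product: one loop over styles, one over the test assets, every prompt built from the same pattern. A sketch using the prompts from the list above (the dict keys and filename scheme are mine):

```python
STYLES = {
    "3d-gaming": "glossy plastic material, soft studio lighting, isometric 3/4 view",
    "flat-modern": "minimal design, solid colors, no gradients, no shadows, "
                   "clean geometric shapes, modern SaaS style",
    "outlined": "thin purple outline, minimal fill, white interior with "
                "yellow accents, single weight stroke",
    "isometric-tech": "3/4 view, subtle grid lines, tech blueprint aesthetic, "
                      "duotone purple + yellow",
}
ASSETS = ["a coin", "a shield", "a chart", "a handshake", "a trophy"]

def style_matrix():
    """Yield (filename, prompt) pairs for every style x asset combo."""
    for style_name, style_desc in STYLES.items():
        for asset in ASSETS:
            prompt = (f"{asset}, {style_desc}, "
                      "just the object floating by itself, nothing else")
            yield f"{style_name}--{asset.replace(' ', '-')}.png", prompt

jobs = list(style_matrix())  # 4 styles x 5 assets = 20 generations
```

Feed each pair to whatever generation client you use; the filenames make it trivial to eyeball the 4x5 grid afterwards.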

SVG Conversion

For simple icons that need to scale infinitely (favicons, app icons), I pipe the PNG through potrace to trace the silhouette into SVG:

brew install potrace
python3 generate_image.py -p "..." -n icon --svg

This works great for simple shapes (coins, badges, geometric icons). It does not work for complex illustrations, characters, or anything with gradients. Keep those as PNG.
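A sketch of what that --svg step can look like, assuming Pillow plus the potrace CLI from the brew install above (the helper name and threshold are mine). potrace only reads bitmap formats, so the PNG is first flattened onto white and thresholded to a 1-bit PBM:

```python
import subprocess
from pathlib import Path
from PIL import Image

def png_to_svg(png_path: str, threshold: int = 128) -> str:
    """Trace a PNG silhouette into an SVG via the potrace CLI."""
    src = Path(png_path)
    pbm = src.with_suffix(".pbm")
    svg = src.with_suffix(".svg")

    # Flatten alpha onto white, then threshold to pure black/white
    img = Image.open(src).convert("RGBA")
    bg = Image.new("RGBA", img.size, (255, 255, 255, 255))
    gray = Image.alpha_composite(bg, img).convert("L")
    bitmap = gray.point(lambda p: 0 if p < threshold else 255).convert("1")
    bitmap.save(pbm)

    # potrace traces the black regions into vector paths
    subprocess.run(["potrace", str(pbm), "--svg", "-o", str(svg)], check=True)
    return str(svg)
```

Because only the silhouette survives the threshold, this is exactly why it works for coins and badges but falls apart on gradients and characters.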

Pipeline 2: HTML-to-PNG for Perfect Text

Here's the thing nobody tells you about AI image generation: use it for images, not for text.

Nano Banana Pro is better at text rendering than the older models — it can do headlines and short labels. But for anything precise (social cards, OG images, cover art, infographics), you want pixel-perfect control. And the fastest way to get pixel-perfect control is to just... write HTML.

The Technique: Puppeteer Screenshots

HTML template → Google Fonts loaded → Puppeteer / Chromium → PNG screenshot

Write your design as an HTML page. Set the viewport to the exact dimensions you need (1200x630 for OG images, 1200x628 for LinkedIn, 1600x900 for Twitter). Screenshot it with headless Chromium. Done.

// Puppeteer + @sparticuz/chromium
const puppeteer = require('puppeteer-core');
const chromium = require('@sparticuz/chromium');

const browser = await puppeteer.launch({
  args: chromium.args,
  executablePath: await chromium.executablePath()
});
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 630 });
await page.setContent(html);
await page.waitForFunction('document.fonts.ready');
const png = await page.screenshot({
  type: 'png',
  clip: { x: 0, y: 0, width: 1200, height: 630 }
});
await browser.close();

The document.fonts.ready wait is important — without it, Google Fonts haven't loaded yet and your text renders in the system fallback font.

Why This Beats Figma for Programmatic Assets

  1. It's code: templates live in git, and changes are diffable and repeatable.
  2. It's templatable: swap a title and a hero image, loop, and you have 50 variants.
  3. It's exact: the viewport is the canvas, and CSS controls every pixel and font.

The OG Image Template Pattern

Here's the template I use for all my OG images and Gumroad cover art. It's a full HTML page that screenshots to exactly 1200x630:

<body style="width:1200px;height:630px;background:#08070e">
  <!-- Decorative blurred orbs -->
  <div class="glow"
    style="background:rgba(124,58,237,.15)"></div>

  <!-- Geometric SVG lines (low opacity) -->
  <svg style="opacity:.06">
    <line x1="600" y1="0" x2="0" y2="630" />
  </svg>

  <!-- Content -->
  <div class="badge">MCP Memory Server</div>
  <h1>Brain Kit</h1>
  <p>Your tagline here.</p>
  <div class="features">
    <span>Feature 1</span>
    <span>Feature 2</span>
  </div>

  <!-- AI-generated image on the right -->
  <img src="brain.png" />

  <!-- Bottom bar -->
  <div class="bottom-bar">
    <span>dashbuilds.dev</span>
    <span>$29</span>
  </div>
</body>

The trick: combine both pipelines. Generate the hero illustration with Nano Banana (transparent PNG via green screen), then embed it in the HTML template alongside pixel-perfect text. Screenshot the whole thing. Best of both worlds.

Client-Side Fallback: html2canvas

For in-browser image generation (social media dashboards, live previews), I use html2canvas as a fallback when Puppeteer isn't available:

// Inline external images + fonts as data URLs first
// Create hidden iframe with full HTML
// Render at 2x scale for retina
const canvas = await html2canvas(element, { scale: 2 });
canvas.toBlob((blob) => {
  // Download or upload the PNG
});

The key gotcha: you need to inline all external resources (Google Fonts, images) as data URLs before rendering. html2canvas can't fetch cross-origin resources.

The Hybrid: AI Images + HTML Templates

The real power move is combining both pipelines. Here's what my production workflow looks like:

  1. Generate visual assets with Nano Banana (3D icons, illustrations, characters) using the green screen trick for transparency
  2. Design layouts in HTML with CSS (text, spacing, brand colors, gradients)
  3. Embed the AI assets into the HTML template as <img> tags
  4. Screenshot the combined result with Puppeteer

This gives you AI-generated creative with pixel-perfect text and layout. The AI handles what it's good at (visual creativity), and HTML handles what it's good at (text, precision, consistency).
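For step 3, one low-friction approach (my sketch, not necessarily the post's script) is to inline the transparent PNG as a base64 data URL, so the final HTML is a single self-contained file and Puppeteer needs no asset server or relative paths:

```python
import base64
from pathlib import Path

def embed_png(html: str, placeholder: str, png_path: str) -> str:
    """Swap a placeholder src for the PNG inlined as a data URL."""
    b64 = base64.b64encode(Path(png_path).read_bytes()).decode("ascii")
    return html.replace(placeholder, f"data:image/png;base64,{b64}")
```

Usage: put `src="{{HERO}}"` in the template, call `embed_png(template, "{{HERO}}", "brain.png")`, and pass the result straight to `page.setContent(html)`.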

I use this for:

  1. Gumroad cover art (AI hero illustration + product title + price)
  2. OG images for blog posts
  3. Marketing and social cards
The Prompt Mistakes (And What They Produce)

Every technique above was learned by getting it wrong first. Here's what each mistake looks like — the bad prompt vs the fix, with real Nano Banana outputs:

Mistake #1: Asking for text in the image

Bad: "A badge with the text 'CERTIFIED EXPERT' in the center"
Result: "CERTIFEID EXPRRT" or "CERTIFIELD EXPRET" — garbled every time. Nano Banana Pro is better but still unreliable for multi-word text.

Fix: "A golden badge icon. No text." + render the text via an HTML-to-PNG overlay
Result: Perfect badge illustration + pixel-perfect text from HTML. Best of both worlds.
Mistake #2: Asking for a "transparent background"

Bad: "A trophy on a transparent background"
Result: Trophy on a white background, or a gray checkerboard pattern (Gemini draws the transparency indicator as a literal texture), or a random dark scene.

Fix: "...on a SOLID BRIGHT GREEN (#00FF00) background. The green must be pure and uniform."
Result: Clean green background → NumPy removes it → true RGBA transparency with anti-aliased edges.

Asked for "transparent" — got a literal checkerboard

Green screen trick — clean, removable

Mistake #3: Including the brand logo as a reference image

Bad: generate_image.py -p "A golden coin" (with logo refs)
Result: Gemini tries to reproduce the logo on the coin, but distorted — wrong proportions, extra lines, morphed shapes. The coin becomes a logo variant.

Fix: generate_image.py -p "A golden coin" --no-refs
Result: A clean coin focused on the object itself. Use refs only for complex illustrations where style consistency matters.
Mistake #4: Including "white outline" in the style prefix

Bad: "...flat vector, white outline, gaming style"
Result: A literal white border around the object, plus a dark card/frame behind it. Every generation had a white rectangle that ruined the green screen removal.

Fix: "...flat vector, gaming style. Do NOT add any outline or border."
Result: Clean object on green, no artifacts. Removing two words fixed dozens of broken generations.

Actual output with "white outline" in the prompt — flat white border, no depth

Mistake #5: Not saying "floating by itself"

Bad: "A purple shield with a golden star, isometric view, on green background."
Result: Shield on a pedestal, or inside a display case, or on a desk, or with sparkle borders and decorative elements you never asked for.

Fix: "...on green background. Just the object floating by itself, nothing else."
Result: A clean isolated object. This phrase is the single most important addition to any icon prompt.

No "floating by itself" — got a pedestal, garbled text, extra elements

With "floating by itself" — clean, isolated, brand colors

Mistake #6: No color palette constraint

Bad: "A treasure chest with coins, gaming style"
Result: Random colors every time — red, blue, green, orange. It looks like a completely different brand on each generation, making consistency across 100 assets impossible.

Fix: "...Color palette ONLY: deep purple (#4007AD), golden yellow (#FFDF6B), dark purple (#260A72), and white. No other colors."
Result: A consistent brand look across every generation. Same palette, same feel, same family.
Mistake #7: Overloaded prompt

Bad: "A laptop showing a dashboard, a person standing next to it, a chart on the wall, coffee cup, plant, window with city view, all isometric"
Result: A cluttered mess. Elements overlap, proportions are wrong, and some objects are missing or deformed. The model can't compose 6+ objects coherently.

Fix: "A laptop showing a dashboard, isometric 3/4 view. Just the object, nothing else."
Result: One clean object. Generate each element separately and compose in HTML/CSS. You get better quality AND more control.

The full pipeline result: green screen → NumPy removal → transparent PNG


Green screen removed with NumPy + edge anti-aliasing. Ready for HTML composition.

10 Lessons from 100+ Assets

1. "No text unless specifically requested"

Add this to every prompt. Without it, Nano Banana will cheerfully add garbled text labels, watermarks, or random words to your image. Even if you didn't mention text at all.

2. Never say "white outline"

I had "white outline" in my style prefix for weeks. It caused literal white borders and dark background cards behind every object. Removing those two words fixed dozens of broken generations.

3. "Just the object floating by itself, nothing else"

This is the most important phrase in icon generation. Without it, Gemini adds pedestals, display cases, frames, sparkle borders, and background elements you never asked for.

4. Test styles before committing

Generate the same 5 objects in 4 different styles. 20 minutes. This saves you from generating 50 assets in a style that doesn't work and having to redo everything.

5. Skip brand logos as references for simple objects

Gemini will try to morph your logo into the shape you asked for. Use references only for complex illustrations where style consistency matters more than accuracy.

6. Use Pro for heroes, Flash for everything else

Nano Banana Pro's "Thinking" mode genuinely produces better results for complex compositions (characters, scenes, multi-element layouts). For simple icons and badges, Flash is indistinguishable and 3x faster.

7. The green must be pre-set, not prompted

If you're feeding images to Veo for animation, pre-composite the character on green with PIL before sending to Veo. Prompting Veo for "green screen background" starts black for ~1 second, ruining chromakey. (More on this in my Veo 3.1 post.)

8. Edge anti-aliasing is non-negotiable

A simple "if green, make transparent" produces awful edges. You need a dilated mask that finds boundary pixels and softens them. The extra 10 lines of NumPy code save every single image.

9. HTML-to-PNG is underrated

Everyone's chasing AI-generated text rendering. Meanwhile, a 60-line HTML template + Puppeteer produces perfect text at any resolution, in any font, with any layout. Use AI for the creative parts, HTML for the precise parts.

10. Trim Veo character animations to 2.5 seconds

If you're animating characters: Veo morphs faces and arms after ~3 seconds. Loop a 2.5-second trim and it looks like an intentional idle animation.

The Toolkit

Task | Tool | Why
3D icons, badges, coins | Nano Banana 2 + green screen | Fast, consistent with style prefix
Hero illustrations, characters | Nano Banana Pro + references | Thinking mode for complex compositions
Style exploration | Nano Banana 2 (4-5 variations) | Fast iteration before committing
OG images, social cards | HTML + Puppeteer | Perfect text, templatable, version controlled
Cover art, marketing images | Nano Banana + HTML (hybrid) | AI illustration embedded in HTML layout
Simple icon SVGs | Nano Banana + potrace | Scalable vectors from AI-generated shapes
In-browser image generation | html2canvas (2x scale) | Client-side fallback, no server needed
Character animation | Nano Banana → PIL → Veo → ffmpeg | Green screen pipeline for animated overlays

The Bottom Line

Stop trying to make AI generate text. Stop fighting Canva for programmatic assets. Stop paying a designer for social cards you need 50 of.

The split is simple: Nano Banana for creative visuals, HTML-to-PNG for text and layouts, and the hybrid approach when you need both. This combination replaced my entire creative toolchain — 100+ assets, zero design tools, zero dollars spent on design.

"Did you make these? They look like they came from a design agency."

— a colleague, looking at assets I generated during a lunch break

The design agency is a Python script, a green screen trick, and an HTML template. That's the whole secret.

From the shop

Brain Kit ($29)

Never lose a working prompt again. Brain Kit gives your AI tools persistent memory — capture image generation settings, style preferences, and workflows with semantic search.

Get Brain Kit — $29

Like what I build? Check out the shop — deploy-ready kits starting at $14.


More AI production guides

This post covers images. For AI video production (Veo 3.1, Remotion, ElevenLabs, extension chaining), read the companion post.

Read the Veo 3.1 guide