AI images just got dangerously good (RIP diffusion??)

186,080

4,882

Genre: Science & Technology

License: Standard YouTube license

Family friendly? Yes

Shared March 26, 2025

OpenAI has always been at the forefront of LLMs, but they've never been great at generating images. At least until today's 4o image generation...

Thank you Posthog for sponsoring! Check them out at: soydev.link/posthog

SOURCE
x.com/OpenAI/status/1904602845221187829

T3 Chat (image generation coming soon™️): soydev.link/chat

Want to sponsor a video? Learn more here: soydev.link/sponsor-me

Check out my Twitch, Twitter, Discord, and more at t3.gg/

S/O Ph4se0n3 for the awesome edit 🙏

@figloalds

"Create a <describes Theo>" The AI: ** creates an image of Primeagen **

4 days ago | [YT] | 508  

@DavidMishchenko

Looks like someone forgot to turn off night filter when making the thumbnail >_>

4 days ago | [YT] | 775  

@frittex

new way to end up homeless after uni just dropped

4 days ago | [YT] | 407

@JeremyDawesJezweb

It’s like watching a progressively loading JPEG on dial up

4 days ago | [YT] | 242

@markopavlovic8750

Previous video: google won. This video: OpenAI is pretty much unmatched

4 days ago | [YT] | 145

@lisan_al_g4ib

I gave it this prompt: “Create a realistic image of a 100% full glass of wine, where the wine is on the verge of overflowing and only the surface tension of the wine is preventing it from spilling over the top. Literally no more empty space in the glass for more wine. This is a tricky task for AI image generation, so think long and hard about it first. Don’t mess this up.” And it actually worked!

4 days ago | [YT] | 113

@hjups

It's still diffusion, partly. Most likely, the LLM first autoregressively generates a set of semantic embeddings, which are then decoded with diffusion (e.g. similar to Würstchen, but the stage C model is 4o - autoregressive). Since it was trained multi-modal, they may be using hierarchical CLIP image embeddings (e.g. 64 full-scale tokens + 4x64 overlapping patch tokens), which then go into the latent diffusion decoder. My guess is that the top-to-bottom effect is on the user side (e.g. "a big reveal" and it reduces the request rate), otherwise it wouldn't be possible to show a "preview" image - a blurry preview could be decoded from the CLIP image embeddings before being passed to the diffusion model. On the other hand, if it was purely autoregressive, it might be similar to VAR, but that should be much faster and would eat too much of 4o's capacity (a diffusion decoder would allow most of the "generation" capability to be in an auxiliary model).

4 days ago | [YT] | 37
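
The token budget guessed at in the comment above can be tallied in a few lines. This is only a sketch of the commenter's hypothetical layout (64 full-scale tokens plus 4 groups of 64 overlapping patch tokens); nothing here is confirmed about how 4o works, and the function name and parameters are made up for illustration.

```python
def embedding_token_count(full_scale_tokens: int = 64,
                          patch_groups: int = 4,
                          tokens_per_patch_group: int = 64) -> int:
    """Total image-embedding tokens under the commenter's guessed layout.

    All three defaults come from the comment's example, not from any
    published specification.
    """
    return full_scale_tokens + patch_groups * tokens_per_patch_group


total = embedding_token_count()
print(total)  # 64 + 4 * 64 = 320
```

If that guess were right, the LLM would only need to emit a few hundred tokens per image, with the diffusion decoder filling in the pixel-level detail.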

@ShaharHarshuv

Interesting thing about the text - it seems to be only able to render it the way a font is rendered. I tried to make a "broken sign" and it couldn't get half of the text to be misaligned with the rest.

4 days ago | [YT] | 16

@GiblikJovanovic

it's really crazy how nobody is talking about the book bevelorus the hidden codex of the financial alchemists

4 days ago | [YT] | 214

@0x.rorschach

the craziest part is they're using the 4o model to do the diffusion. I wonder if 4o is generating the picture in the same way it does text to create an svg-like image where it's just an array of pixel data. that would explain the way it renders

4 days ago | [YT] | 8
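
The "array of pixel data" idea in the comment above can be illustrated with a toy. The sketch below assumes a purely hypothetical generator that emits one value at a time in raster order (left to right, top to bottom); it says nothing about what 4o actually does, and `generate_raster` and `completed_rows` are invented names. It only shows why such an ordering would make an image appear from the top down.

```python
import random


def generate_raster(width, height, seed=0):
    """Yield (values_so_far, grid) after each toy value is 'sampled'.

    The same grid object is mutated and yielded each time, so inspect it
    inside the loop rather than storing it for later.
    """
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    for index in range(width * height):
        row, col = divmod(index, width)  # raster order: fill row 0 first
        grid[row][col] = rng.randrange(256)  # stand-in for a sampled value
        yield index + 1, grid


def completed_rows(grid):
    """Count rows in which every cell has been filled."""
    return sum(all(v is not None for v in row) for row in grid)


for count, grid in generate_raster(width=4, height=3):
    if count == 6:
        # Six values in: the first row is done, the second is half done.
        print(completed_rows(grid))  # 1
```

A viewer that redraws the grid after each step would see finished rows at the top and nothing below, which matches the top-to-bottom reveal people describe, though that reveal could just as well be a client-side effect.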

@yankotliarov9239

I'll wait for the Fireship video on it

4 days ago | [YT] | 149

@mrgyani

You need to start the podcast with - "As a $400 haircut user, I need to pay my bills. Today's sponsors are.."

4 days ago | [YT] | 10

@Ownedyou

13:38 I am laughing my ass off! Now the OpaBI makes sense - the couple is bisexual and the girl in blue is checking out the girl in red too!!

4 days ago | [YT] | 9

@KevinDay

27:18 "A little choppiness in his ear..." 😐..

4 days ago | [YT] | 23

@0xLostInCode

Honestly, people are gonna forget about this model in a couple of weeks.

4 days ago | [YT] | 11

@CloakDev

This is human concurrency at its peak. I strive to be this efficient. Me: Let me tell you about the new image processing (waits until the image processes and explains what's happening in the meantime). Theo: Let me tell you about the new image processing. While that's going, let me spin up another example with the text updated. Let's also spin up a DALL-E example. While that's going, let's also read this article and react to a video.

4 days ago (edited) | [YT] | 4

@TristanWayne-h1j

All of these image generators are just party tricks if there's no consistency. Consistent characters, sets, vehicles, weapons and landscapes. You can't do long form content with random images.

4 days ago (edited) | [YT] | 24

@theDanielJLewis

Shoutout to Posthog! Thanks to their session-recording feature, I was recently able to prove that a user did access something they claimed they never got access to.

4 days ago | [YT] | 11

@Bub24

I am still surprised that no one talks about how dangerous this new image generation is. The internet will be full of AI-generated images, and it will be hard to tell whether an image is AI-generated or not. It is scary.

4 days ago | [YT] | 18

@solmateusbraga

Does anyone know the name of the app/site he uses to sketch?

4 days ago | [YT] | 7