
Stable Diffusion and Virtual Try-On (VTON)

Updated 13 May 2024

Stable Diffusion

Stable Diffusion is a deep learning, text-to-image model released by Stability AI in 2022. It generates detailed images based on text prompts.

It can also be a powerful tool for inpainting (filling in missing areas of an image) and outpainting (extending an image beyond its original borders).

Stable Diffusion leverages a diffusion model architecture called the latent diffusion model.

Stable Diffusion consists of three parts: a variational autoencoder (VAE), a U-Net, and an optional text encoder.

The VAE encoder compresses the image from pixel space into a smaller-dimensional latent space. During forward diffusion, Gaussian noise is iteratively applied to this compressed latent representation. The U-Net then denoises the result step by step in reverse to recover a latent representation. Finally, the VAE decoder generates the final image by converting that representation back into pixel space.
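The encode and noise steps above can be sketched in plain Python. This is a minimal, hedged illustration rather than Stable Diffusion's actual code. It assumes an SD v1-style VAE (8x spatial downsampling to 4 latent channels) and a DDPM-style linear noise schedule, which is not SD's exact schedule. The helper names `latent_shape`, `linear_alpha_bars` and `forward_diffuse` are hypothetical. During generation, the U-Net's job is to undo this noising step by step.

```python
import math
import random

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (C, H, W) of the VAE latent for an image of the given size.
    Assumption: SD v1-style VAE with 8x spatial downsampling and 4 latent channels."""
    return (latent_channels, height // downsample, width // downsample)

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a
    DDPM-style linear beta schedule (an assumption, not SD's exact schedule)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def forward_diffuse(z0, t, alpha_bars, rng):
    """Jump straight to noise level t with the closed form
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    abar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(abar) * z + math.sqrt(1.0 - abar) * e for z, e in zip(z0, eps)]
    return zt, eps

# Usage: a toy, flattened "latent" for a 512x512 image.
shape = latent_shape(512, 512)          # (4, 64, 64)
size = shape[0] * shape[1] * shape[2]   # 16384 values instead of 512*512*3
z0 = [0.5] * size                       # placeholder latent, not a real VAE output
alpha_bars = linear_alpha_bars()
zt, eps = forward_diffuse(z0, 999, alpha_bars, random.Random(0))
```

At the last step `alpha_bar` is close to zero, so `zt` is almost pure noise. This is the state the reverse (denoising) process starts from.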

Stable Diffusion

Capabilities of Stable Diffusion:

1. txt2img:

It generates an image according to the given prompt instructions. Each txt2img generation involves a specific seed value, which affects the output image.

You can randomise the seed to explore different generated outputs, or reuse the same seed (with the same prompt and settings) to reproduce a previously generated image.
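The seed matters because it fixes the random Gaussian noise that generation starts from. The same seed, with the same prompt and settings, gives the same starting point and therefore the same image. Below is a minimal sketch using Python's standard `random` module. `initial_noise` and `new_random_seed` are hypothetical helpers; real pipelines draw this noise with their own framework's random generator.

```python
import random

def initial_noise(seed, size):
    """Draw the starting latent noise for one generation from a seeded RNG."""
    rng = random.Random(seed)  # independent generator, unaffected by global random state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

def new_random_seed():
    """Pick a fresh seed, as a UI's 'randomise seed' option might do."""
    return random.randrange(2**32)

same_a = initial_noise(1234, 8)
same_b = initial_noise(1234, 8)
other = initial_noise(new_random_seed(), 8)  # a new seed gives different starting noise

print(same_a == same_b)  # True: same seed, identical starting noise
```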

Text to image generation example:


Image generated by AI

2. img2img:

It can generate a new image from an existing image or modify that image.
This process uses a text prompt, an existing image, and a strength value between 0.0 and 1.0.

The strength value sets the amount of noise added to the image before it is denoised. A higher strength value produces more variation within the image but may produce an image that is not semantically consistent with the prompt provided.
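To make the strength value concrete, many implementations convert it into the number of denoising steps run on top of the noised input image. The sketch below assumes that convention, which is similar in spirit to the Hugging Face `diffusers` img2img pipeline but is not copied from it. `img2img_steps` is a hypothetical helper.

```python
def img2img_steps(num_inference_steps, strength):
    """Map an img2img strength in [0.0, 1.0] to (steps_skipped, steps_run).

    strength = 0.0 -> no noise added and no denoising; the input is kept as-is.
    strength = 1.0 -> start from (almost) pure noise; the input has little influence.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run

for s in (0.0, 0.75, 1.0):
    print(s, img2img_steps(50, s))
```

Under this convention, a higher strength means more of the schedule is re-run. That is why the output drifts further from the original image.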

Image-to-image generation example:

Sketch colourised by Stable Diffusion

3. img2img style transfer:

Inpainting can also be used for style transfer, where two existing images are used to generate a new one.

Select image 1 (the image you want to change) and mask the area you want to modify. Then select image 2 as the reference image and set a strength between 0.0 and 1.0. Finally, start inpainting: the model makes changes only in the masked area.
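The mask is what confines the edit to the selected area. Below is a simplified sketch of the final compositing step only. It assumes the images and mask are same-sized grids of floats, where a mask value of 1 marks pixels to repaint. Dedicated inpainting models also take the mask as an input to the network itself, and the reference-image conditioning is not shown. `composite_inpaint` is a hypothetical helper.

```python
def composite_inpaint(original, generated, mask):
    """Keep the original outside the mask and the generated pixels inside it."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("original, generated and mask must have the same height")
    out = []
    for row_o, row_g, row_m in zip(original, generated, mask):
        if not (len(row_o) == len(row_g) == len(row_m)):
            raise ValueError("original, generated and mask must have the same width")
        # Per pixel: mask=1 -> generated value, mask=0 -> original value,
        # values in between blend the two (a soft mask edge).
        out.append([m * g + (1.0 - m) * o for o, g, m in zip(row_o, row_g, row_m)])
    return out

original = [[0.2, 0.2], [0.2, 0.2]]
generated = [[0.9, 0.9], [0.9, 0.9]]
mask = [[1.0, 0.0], [0.0, 0.5]]
result = composite_inpaint(original, generated, mask)
```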

Style transfer example:


Virtual Try-On (VTON):

Virtual Try-On is a technique that allows users to virtually try on clothes, accessories, or other items without actually wearing them. It typically involves image synthesis, where a model generates an output image of the user wearing the desired item.


Virtual Try-On (VTON)


VTON architecture

How Stable Diffusion-based VTON differs from standard VTON

The main differences between Stable Diffusion (SD)-based VTON and standard VTON are:

  • Generation process: Stable Diffusion uses a stochastic process that iteratively refines an input noise signal into a realistic output. In contrast, standard VTON typically uses a deterministic approach, where the output is generated through a fixed transformation of the input image.
  • Distribution modeling: SD learns to reverse a gradual noising process, which lets it sample from a learned distribution of images and produce realistic, diverse results. Standard VTON methods do not necessarily model the output distribution this way, which can result in less realistic or less diverse outputs.
  • Flexibility and controllability: SD models can be conditioned on various factors, such as pose, expression, or clothing style, which gives more flexibility and control over the generation process. Standard VTON methods might not offer the same level of flexibility.
  • Realism and diversity: SD models are known for generating highly realistic and diverse outputs, which is crucial for applications like VTON. Standard VTON methods may not achieve the same level of realism and diversity.
  • Architecture and training: SD models typically require a different architecture and training regimen than standard VTON methods. SD models rely on a noise schedule and a series of iterative denoising steps, whereas standard VTON methods might use encoder-decoder architectures or other techniques.
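One concrete mechanism behind SD's controllability is classifier-free guidance, which text-conditioned SD pipelines use to push each denoising step toward the conditioning. At every step the U-Net predicts the noise twice, once with the condition and once without. The two predictions are then combined using a guidance scale (7.5 is a commonly used default). This is a minimal sketch on plain lists with made-up predictions. `classifier_free_guidance` is a hypothetical helper, and conditioning on pose or garment images is not shown.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine noise predictions: eps = eps_uncond + w * (eps_cond - eps_uncond).

    guidance_scale (w) = 1.0 reproduces the conditional prediction; larger values
    follow the condition more strongly, often at some cost to diversity.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy noise predictions for a 3-value latent (illustrative numbers only).
eps_uncond = [0.0, 0.2, -0.1]
eps_cond = [0.4, 0.2, 0.1]
guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=7.5)
```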

In summary, Stable Diffusion offers a more flexible, controllable, and realistic approach to VTON, whereas standard VTON methods may be more limited in their capabilities.

However, both approaches have their strengths and weaknesses, and the choice of method depends on the specific application and requirements.
