Image-to-Image

After Text-to-Image, Image-to-Image is the most common use case for diffusion models on Automatic 1111. With Auto 1111 SDK, you can generate images from a text prompt and a reference image as follows:

  1. Load a .safetensors or .ckpt (checkpoint) weights file and initialize a StableDiffusionPipeline. The same initialized pipeline can be used for Text-to-Image or any of our other inference pipelines.

from auto1111sdk import StableDiffusionPipeline

pipe = StableDiffusionPipeline("model.safetensors")
  2. Load the input image:

from PIL import Image

input_image = Image.open("<path to your local image file>")
  3. Pass a prompt to the pipeline to generate an image:

prompt = "A picture of a realistic dog"
output = pipe.generate_img2img(prompt = prompt, init_image = input_image)

The output is a list of PIL Image objects. By default, the pipeline generates 1 image; you can change this with the num_images parameter:

output = pipe.generate_img2img(prompt = prompt, init_image = input_image, num_images = 2)
output[0].save("dog1.png")
output[1].save("dog2.png")
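The pipeline's default output size is 512x512 (see the parameter defaults below). The SDK may resize the init image for you, but if you want explicit control, a small PIL-only helper can normalize the reference image before passing it in. The helper name below is our own; it is not part of the Auto 1111 SDK API:

```python
from PIL import Image

def prepare_init_image(path, width=512, height=512):
    """Open a local image, convert it to RGB (dropping any alpha
    channel), and resize it to the pipeline's target dimensions."""
    image = Image.open(path).convert("RGB")
    return image.resize((width, height), Image.LANCZOS)
```

The resized image can then be passed as init_image in place of the raw Image.open result. Matching the init image to the requested width/height avoids any implicit cropping or stretching during generation.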

Parameters

Currently, Auto 1111 SDK supports the following parameters (and the corresponding default values):

init_image: a PIL Image object
prompt: str
negative_prompt: str = ''
seed: int = -1
steps: int = 20
height: int = 512
width: int = 512
cfg_scale: float = 7.5
num_images: int = 1
sampler_name: str = 'Euler'
denoising_strength: float = 0.75
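Putting these together, a call with every option spelled out might look like the sketch below. The values shown are the documented defaults; pipe and input_image are assumed to come from the steps above:

```python
# Collecting the keyword arguments in a dict first makes a run easy to
# tweak or log. The values here mirror the documented defaults.
params = {
    "prompt": "A picture of a realistic dog",
    "negative_prompt": "",
    "seed": -1,            # -1 picks a random seed on each run
    "steps": 20,
    "height": 512,
    "width": 512,
    "cfg_scale": 7.5,
    "num_images": 1,
    "sampler_name": "Euler",
    "denoising_strength": 0.75,  # how far the result may stray from init_image
}
# output = pipe.generate_img2img(init_image=input_image, **params)
```

Setting seed to a fixed non-negative value should make runs reproducible, and lowering denoising_strength toward 0 keeps the output closer to the reference image while values near 1 behave more like plain Text-to-Image.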
