Fine-Tuning Stable Diffusion XL Model for Personalized Image Generation on AWS SageMaker

Model Fine Tuning with Amazon SageMaker Studio

Introduction

Building on a previous machine learning blog post that created personalized avatars by fine-tuning and hosting the Stable Diffusion 2.1 model at scale on Amazon SageMaker, this post takes the journey a step further. As technology continues to evolve, newer models keep emerging, offering higher quality, increased flexibility, and faster image generation.

One such groundbreaking model is Stable Diffusion XL (SDXL), released by Stability AI, which advances text-to-image generative AI technology to unprecedented heights. In this post, we demonstrate how to efficiently fine-tune the model using SageMaker Studio and prepare the fine-tuned model to run on AWS Inferentia2-powered Amazon EC2 Inf2 instances, unlocking superior price-performance for inference workloads.

Solution Overview

SDXL 1.0 is a text-to-image generation model developed by Stability AI, consisting of over 3 billion parameters. Its key components include two text encoders that convert input prompts into embeddings and a U-Net model that generates images through a diffusion process.
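To make these components concrete, the following minimal sketch (assuming the diffusers library and the publicly available stabilityai/stable-diffusion-xl-base-1.0 checkpoint) loads the base pipeline and prints its main modules:

```python
# Minimal sketch: inspect the main modules of the SDXL base pipeline.
# Assumes diffusers is installed and the public SDXL base checkpoint is accessible.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

print(type(pipe.text_encoder).__name__)    # first (CLIP) text encoder
print(type(pipe.text_encoder_2).__name__)  # second (OpenCLIP) text encoder
print(type(pipe.unet).__name__)            # U-Net that runs the diffusion process
print(type(pipe.vae).__name__)             # VAE that decodes latents into pixels
```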

Despite its impressive capabilities, model fine-tuning is often required when app builders need to generate images for a specific subject or style that are hard to describe in words. This is where DreamBooth and Low-Rank Adaptation (LoRA) come into play. These techniques allow for efficient fine-tuning, enabling better relevance and personalization using custom datasets.

By fine-tuning SDXL on Amazon SageMaker, businesses can create personalized models with reduced storage overhead while improving training speed. Once fine-tuned, the model can be compiled and deployed on Inf2 instances using the AWS Neuron SDK, which enhances performance and cost efficiency for inference workloads.

Prerequisites

Before you get started, review the list of services and instance types required to run the sample notebooks provided at this GitHub location.

– Basic understanding of Stable Diffusion models. Refer to Create high-quality images with Stable Diffusion models and deploy them cost-efficiently with Amazon SageMaker for more information.
– General knowledge about foundation models and how fine-tuning brings value. Read more on Fine-tune a foundation model.
– An Amazon Web Services account. Confirm your AWS identity has the requisite permissions, including the ability to create SageMaker resources (domain, model, and endpoints) and Amazon Simple Storage Service (Amazon S3) access to upload model artifacts. Alternatively, you can attach the AmazonSageMakerFullAccess managed policy to your AWS Identity and Access Management (IAM) user or role, as shown in the sketch after this list.
– This notebook is tested using the default Python 3 kernel on SageMaker Studio. A GPU instance such as ml.g5.2xlarge is recommended. Refer to the documentation on setting up a domain for SageMaker Studio.
– For compiling the fine-tuned model, an inf2.8xlarge or larger Amazon Elastic Compute Cloud (Amazon EC2) instance with the Hugging Face Neuron Deep Learning AMI (Ubuntu 22.04) is required. The instance comes with the required Neuron drivers, libraries, and JupyterLab preinstalled.
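If you go the managed-policy route mentioned in the list above, the following minimal sketch (assuming boto3 and a placeholder role name) attaches AmazonSageMakerFullAccess to an existing IAM role:

```python
# Minimal sketch: attach the AmazonSageMakerFullAccess managed policy to a role.
# The role name is a placeholder; your identity must be allowed to manage IAM.
import boto3

iam = boto3.client("iam")
iam.attach_role_policy(
    RoleName="MySageMakerExecutionRole",  # hypothetical role name
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)
```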

By following these prerequisites, you will have the necessary knowledge and AWS resources to run the sample notebooks and work with Stable Diffusion models and foundation models on Amazon SageMaker.

Fine-Tuning SDXL on SageMaker

To fine-tune SDXL on SageMaker, follow the steps in the next sections.

Prepare the Images

The first step in model fine-tuning is preparing the training images. With DreamBooth, you only need 10–12 images to fine-tune the model efficiently. Keep the following guidelines in mind when selecting them.

The training images should include selfies taken from different angles, covering various perspectives of your face. Include images with different facial expressions, such as smiling, frowning, and neutral. Preferably, use images with different backgrounds to help the model identify the subject more effectively. A diverse set of images helps DreamBooth isolate the subject from its surroundings and generalize your facial features.

Training Images

Additionally, use 1024×1024 pixel square images for fine-tuning. To simplify image preparation, a utility function is provided that automatically crops and resizes your images to the correct dimensions.
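As a rough illustration of what that helper does, the following sketch (hypothetical function and directory names, assuming Pillow is installed) center-crops each image to a square and resizes it to 1024×1024:

```python
# Minimal sketch of an image-preparation helper (hypothetical; not the notebook's exact code).
import os
from PIL import Image

def prepare_images(src_dir: str, dst_dir: str, size: int = 1024) -> None:
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        # Center-crop to a square, then resize to the target resolution
        side = min(img.size)
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side)).resize((size, size))
        img.save(os.path.join(dst_dir, name))

prepare_images("raw_selfies/", "training_images/")  # hypothetical directories
```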

Train the Personalized Model

After preparing the images, model fine-tuning begins with the AutoTrain library from Hugging Face. This library simplifies fine-tuning and deployment without requiring extensive coding knowledge; the following command launches a DreamBooth LoRA training job.

```bash
!autotrain dreambooth \
  --prompt "${INSTANCE_PROMPT}" \
  --class-prompt "${CLASS_PROMPT}" \
  --model ${MODEL_NAME} \
  --project-name ${PROJECT_NAME} \
  --image-path "${IMAGE_PATH}" \
  --resolution ${RESOLUTION} \
  --batch-size ${BATCH_SIZE} \
  --num-steps ${NUM_STEPS} \
  --gradient-accumulation ${GRADIENT_ACCUMULATION} \
  --lr ${LEARNING_RATE} \
  --fp16 \
  --gradient-checkpointing
```

First, you need to set the prompt and class-prompt. The prompt should include a unique identifier or token that the model can associate with the subject. The class-prompt, on the other hand, supplements the training with similar subjects of the same class, a requirement of the DreamBooth technique that helps the model better associate the new token with the subject of interest. This is why DreamBooth can produce exceptional fine-tuned results with fewer input images. Additionally, you’ll notice that even though you didn’t provide examples of the top or back of your head, the model still knows how to generate them because of the class prompt. In this example, you are using <> as a unique identifier to avoid a name that the model might already be familiar with.

```python
instance_prompt = "photo of <>"
class_prompt = "photo of a person"
```

Next, you need to provide the model, image-path, and project-name. The model name specifies the base model to load from the Hugging Face Hub or a local path. The image-path is the location of the training images. By default, AutoTrain uses LoRA, a parameter-efficient fine-tuning method. Unlike traditional fine-tuning, LoRA attaches a small adapter to the base model and updates only the adapter weights during training. These adapters can be attached and detached at any time, which also makes them highly storage-efficient: a LoRA adapter is roughly 98% smaller than the original model, so you can store and share adapters without duplicating the base model repeatedly.
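The attach-and-detach behavior is easy to see with the diffusers API. The following minimal sketch (the my_lora directory is a hypothetical adapter location) loads an adapter onto the base model and then removes it again:

```python
# Minimal sketch: attach and detach a LoRA adapter without touching the base weights.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Attach the adapter: only a small set of extra weights is loaded
pipe.load_lora_weights("my_lora", weight_name="pytorch_lora_weights.safetensors")

# Detach it again to fall back to the unmodified base model
pipe.unload_lora_weights()
```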

The rest of the configuration parameters are as follows. We recommend starting with these values and adjusting them only if the fine-tuning results don’t meet your expectations.

```python
resolution = 1024              # resolution (size) of the generated images
batch_size = 1                 # number of samples in one forward and backward pass
num_steps = 500                # number of training steps
gradient_accumulation = 4      # number of batches to accumulate gradients over
learning_rate = 1e-4           # step size
fp16 = True                    # train in half-precision
gradient_checkpointing = True  # reduces memory consumption during training
```

The entire training process takes about 30 minutes with the preceding configuration. After training is done, you can load the LoRA adapter, as shown in the following code, and generate fine-tuned images.

```python
import random

import torch
from diffusers import DiffusionPipeline

seed = random.randint(0, 100000)

# Load the base model (model_name_base, device, project_name, prompt, and
# negative_prompt are defined earlier in the notebook)
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
).to(device)

# Attach the LoRA adapter produced by the training job
pipeline.load_lora_weights(
    project_name,
    weight_name="pytorch_lora_weights.safetensors",
)

# Generate fine-tuned images
generator = torch.Generator(device).manual_seed(seed)
base_image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
    height=1024,
    width=1024,
    output_type="pil",
).images[0]
base_image
```

Deploy on Amazon EC2 Inf2 Instances

In this section, you learn to compile and host the fine-tuned SDXL model on Inf2 instances. To begin, you need to clone the repository and upload the LoRA adapter onto the Inf2 instance created in the prerequisites section. Then, run the compilation notebook to compile the fine-tuned SDXL model using the Optimum Neuron library. Visit the Optimum Neuron page for more details.

The NeuronStableDiffusionXLPipeline class in Optimum Neuron now has direct support for LoRA. All you need to do is supply the base model, the LoRA adapters, and the model input shapes to start the compilation process. The following code snippet illustrates how to compile the model and then export it to a local directory.

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "lora"
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024, "num_images_per_prompt": 1}

# Compile the base model together with the LoRA adapter for Neuron
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id,
    export=True,
    lora_model_ids=adapter_id,
    lora_weight_names="pytorch_lora_weights.safetensors",
    lora_adapter_names="sttirum",
    **input_shapes,
)

# Save locally or upload to the Hugging Face Hub
save_directory = "sd_neuron_xl/"
pipe.save_pretrained(save_directory)
```

The compilation process takes about 35 minutes. After the process is complete, you can use the NeuronStableDiffusionXLPipeline again to load the compiled model back.

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl")
```

You can then test the model on Inf2 and make sure that you can still generate the fine-tuned results.

```python
import torch

# Run the compiled pipeline
prompt = """
photo of <> , 3d portrait, ultra detailed, gorgeous, 3d zbrush, trending on dribbble, 8k render
"""

negative_prompt = """
ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred,
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile,
unprofessional, failure, crayon, oil, label, thousand hands
"""

seed = 491057365
generator = [torch.Generator().manual_seed(seed)]
image = stable_diffusion_xl(
    prompt,
    num_inference_steps=50,
    guidance_scale=7,
    negative_prompt=negative_prompt,
    generator=generator,
).images[0]
```

Here are a few avatar images generated using the fine-tuned model on Inf2. The corresponding prompts are the following (a short sketch that loops over them follows the list):

Prompts for Generated Images

– emoji of <>, astronaut, space ship background
– oil painting of <>, business woman, suit
– photo of <>, 3d portrait, ultra detailed, 8k render
– anime of <>, ninja style, dark hair
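
As an illustration, the following minimal sketch (assuming the compiled stable_diffusion_xl pipeline and the negative_prompt defined above; output file names are hypothetical) loops over these prompts and saves one avatar per prompt:

```python
# Minimal sketch: generate one avatar per prompt with the Neuron-compiled pipeline.
prompts = [
    "emoji of <>, astronaut, space ship background",
    "oil painting of <>, business woman, suit",
    "photo of <>, 3d portrait, ultra detailed, 8k render",
    "anime of <>, ninja style, dark hair",
]

for i, avatar_prompt in enumerate(prompts):
    image = stable_diffusion_xl(
        avatar_prompt,
        num_inference_steps=50,
        guidance_scale=7,
        negative_prompt=negative_prompt,
    ).images[0]
    image.save(f"avatar_{i}.png")  # hypothetical output file name
```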

Generated Images

Clean Up

To avoid incurring AWS charges after you finish testing this example, make sure you delete the following resources (a boto3 sketch follows the list):

– Amazon SageMaker Studio Domain
– Amazon EC2 Inf2 instance
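
The following minimal sketch (assuming boto3, placeholder resource IDs, and that all apps and user profiles in the domain have already been deleted) removes both resources programmatically:

```python
# Minimal sketch: clean up the SageMaker Studio domain and the EC2 Inf2 instance.
import boto3

sagemaker = boto3.client("sagemaker")
ec2 = boto3.client("ec2")

# Delete the Studio domain; this call assumes its apps and user profiles are already gone
sagemaker.delete_domain(
    DomainId="d-xxxxxxxxxxxx",  # placeholder domain ID
    RetentionPolicy={"HomeEfsFileSystem": "Delete"},
)

# Terminate the EC2 Inf2 instance used for compilation and hosting
ec2.terminate_instances(InstanceIds=["i-0123456789abcdef0"])  # placeholder instance ID
```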

Conclusion

This post demonstrated how to perform model fine-tuning on Stable Diffusion XL (SDXL) using DreamBooth and LoRA techniques on Amazon SageMaker. These techniques allow businesses to generate highly personalized and domain-specific images with as few as 10–12 training images.

Additionally, we showcased how to compile and deploy the fine-tuned SDXL model on AWS Inferentia2-powered EC2 Inf2 instances, ensuring cost-efficient and high-performance inference.

Cloudastra supports enterprises in leveraging advanced AI technologies, ensuring efficient model fine-tuning and deployment in cloud environments.

Would you like to read more educational content? Read our blogs at Cloudastra Technologies or contact us for business enquiries at Cloudastra Contact Us.
