
Opinionated Guide to SDXL Lora Training

This is a HIGHLY OPINIONATED "SCIENCE THIS" edition of SDXL lora training - NONE OF THIS is perfect, and NONE OF THIS IS "you must do this my way or you're wrong!"


SDXL is brand new; we're SO USED to SD 1.5 that this is going to take time to cement more regular and "better" training ideas.


You do you - if you have a cooler way of doing it - AMAZING!

Software Requirements

I have been using BMALTAIS for this because, at the current moment, yes, Linaqruf had a 0.9 Lora setting and YES, LastBen has a notebook as well for SDXL 1 - I've been trying to learn bmaltais for this. That being said: I think LastBen might be using the same code as bmaltais, just not the gui, so most of these settings should work no matter what.

Repository Links

Bmaltais repo: https://github.com/bmaltais/kohya_ss

Runpod Docker Templates:
https://github.com/ashleykleynhans/kohya-docker
https://github.com/ashleykleynhans/stable-diffusion-docker

VAST will work if you know how to set up the docker templates - I don't, so I'm sadly back on runpod for this. I was going to pass along my vast and Runpod affiliate links, but let's be real: you'd hate me for shilling anyways, the way I've been hoarding my SDXL loras - so you do you :P

LastBen: https://github.com/TheLastBen/fast-stable-diffusion - Use his link for runpod, it goes straight to his docker.

DISCLAIMER:

Most of my stupidity is based on HoloStrawberry's extremely helpful SD 1.5 notebook that I clung to for dear life, plus Linaqruf's SD 1.5 notebook previous to that. If you're still an SD 1.5 homie, PROPS TO YOU! I'm not leaving 1.5, I'm just playing with the new toys while they're still fresh and hot! So basically: don't scream at me if I don't know what I'm doing - I learned from the big kids early on and I still don't know what I'm doing - I hate math, numbers piss me off, and I just go by how WELL it looks when it's done.

Training Time

Data collection & Prep

First of all, you need to have DATA - and in this case it really doesn't matter anymore how many images you have (it never did, I'm just lazy).

I would recommend at least 10 images. You can literally get away with fudging repeats and epochs this way.

Reminder: THIS IS A SCIENCE THIS BISH level training, nobody's perfect - in fact i'll admit this: I learned my SDXL from Envy and a little from GoofyAI - you always got someone to help you!

I'm not going to school you in WHERE, HOW AND WHY to get your images, this is up to you. I've been re-training content on AI outputs as well as other data - so I've got 6+ months worth of data sitting on my iMac hard drive.

In THEORY though: It should be of quality for AT LEAST 1.5 settings to get "PERFECTION" the first time around.

This means: If you've got 90s screenshots of a cartoon, best upscale that in either Affinity or Automatic.

Don't be me, training an X-men Celshade lora on content that MIGHT FLY for SD 1.5 but would still be WAVES HANDS Meeeeeh. (It came out ok in the end, but my LORAs go stronk and need dialing down, so that could be it?)

So your steps for this should be:

1 - COLLECT YOUR DATA
2 - UPSCALE IF NEEDED
3 - PREP YOUR FOLDERS

Your FOLDER STRUCTURE gets a bit odd if you're using BMALTAIS/Kohya - it won't matter so much on LastBen or when others start making Colab notebooks - but if you're using Bmaltais on Runpod or local:

DATA FOLDER NAME

  • IMG

  • MODEL

  • LOGS

Under "IMG" you'll want to structure it with a basic concept - the documentation is a little unclear, but here's how I've been doing it:

NUMBER OF REPEATS_CONCEPT NAME - aka: 2_illustration

Your number of repeats is going to entirely depend on HOW LONG and how strong you want your lora. We'll get to that in the next section though.
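The folder prep above can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical dataset called "MyLora" with a "2_illustration" concept folder - swap in your own names:

```python
from pathlib import Path

# Hypothetical names - use your own dataset name, repeats, and concept.
root = Path("dataset/MyLora")
repeats = 2
concept = "illustration"

# Bmaltais/Kohya expects IMG / MODEL / LOGS subfolders, with the
# actual training images inside img/<repeats>_<concept>.
for sub in ("img", "model", "logs"):
    (root / sub).mkdir(parents=True, exist_ok=True)

concept_dir = root / "img" / f"{repeats}_{concept}"
concept_dir.mkdir(exist_ok=True)
print(concept_dir)
```

Your captions (.txt files) and images then go straight into that `2_illustration` folder.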

BASE MODEL?

Envy recommends SDXL base. I've been using a mix of Linaqruf's model, Envy's OVERDRIVE XL and base SDXL to train stuff. Like SD 1.5, this is utterly preferential. Envy's model gave strong results, but it WILL BREAK the lora on other models. Sadly, anything trained on Envy Overdrive doesn't work on the OSEA SDXL model.

Repeats + Epochs

Again this is all a preference.

Your repeats is a math game, your epochs is a math game. I don't do math well, I'm the ARTISTIC AUTISTIC - so here's how I figure this out:

10-20 images needs AT LEAST 10 repeats PER EPOCH. With 10 images, I actually went for 40 repeats, because the epochs don't always match steps per repeat - so you sadly still have a math game - but I've been doing 5 epochs and 1-2 batch size max.

Your batch size will help calculate the steps.

Right now I've got an hour-ish long train on a 2 repeat, 500 image set - I'm still experimenting, and this one MIGHT BOMB on me, but here's the setup:

running training / 学習開始

num train images * repeats / 学習画像の数×繰り返し回数: 1006

num reg images / 正則化画像の数: 0

num batches per epoch / 1epochのバッチ数: 503

num epochs / epoch数: 5

batch size per device / バッチサイズ: 2

gradient accumulation steps / 勾配を合計するステップ数 = 1

total optimization steps / 学習ステップ数: 2515

That's straight from my Kohya logs. That should give you an idea of how it does stuff. This is also based on WHAT I USE FOR the learning rate etc. Remember: SD 1.5 settings don't always bode well for SDXL - AND ALSO: PREFERENCE! I've had success with Envy's main suggestions, and some tricks from GoofyAI I picked up.
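If you want to sanity-check those log numbers yourself, the arithmetic kohya is doing is basically (images × repeats ÷ batch size) × epochs. A minimal sketch (my simplification of the math in the log, not kohya's actual code):

```python
import math

def total_steps(num_images, repeats, batch_size, epochs, grad_accum=1):
    # images x repeats = "train images with repeating"
    train_images = num_images * repeats
    # batches per epoch, rounding up for the last partial batch
    batches_per_epoch = math.ceil(train_images / batch_size)
    # total optimization steps across all epochs
    return batches_per_epoch * epochs // grad_accum

# My run above: 503 images, 2 repeats, batch 2, 5 epochs
print(total_steps(503, 2, 2, 5))  # 2515
```

So fudging repeats vs. epochs is just moving the same multiplication around.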

So here's more information from my logs for you:

Using DreamBooth method.
ignore directory without repeats / 繰り返し回数のないディレクトリを無視します: .ipynb_checkpoints
prepare images.
found directory dataset/Yashahime_Style/img/2_anime contains 503 image files
1006 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "dataset/Yashahime_Style/img/2_anime"
    image_count: 503
    num_repeats: 2
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.05
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1
    token_warmup_step: 0
    is_reg: False
    class_tokens: anime
    caption_extension: .txt

Learning Rate + Other Stuff

Ok, LEARNING RATE IS GOING TO CAUSE A WORLD WAR SIX level argument here, but let me just remind you before you throw shade and facepalm at me: SDXL runs different than 1.5. It's a preference also in HOW and why and otherwise.

Learning rate I've been using with moderate to high success: 1e-7

Learning rate on SD 1.5 that CAN WORK if you know what you're doing, but hasn't worked for me on SDXL: 5e-4

(JSON FILES NOT INCLUDED HERE, SEE ORIGINAL CIVIT ARTICLE)

I have been using DADAPTATION with COSINE - and I've not been using restarts unless it was the Adafactor version.

So with Yashahime Style I went back and added COSINE WITH RESTARTS like SD 1.5 and GoofyAI have been using.

I've also been adding in the so-called 'MIN SNR GAMMA' with no clue if it influences SDXL or not, so I've set MIN SNR GAMMA to 5 like the original notebooks on 1.5 stated.

So far your settings should be:
  • Standard Lora (i have no clue what other stuff works on SD XL yet)

  • Train batch size NO MORE THAN 3 - you can push it to 3 for SDXL, but it weakens the lora severely.

  • FP16

  • Epochs is a preference but 3-5 should be fine. (This also depends if you want more repeats or more cooking time, don't burn your lora or i'll smack you lol)

  • LR SCHEDULER: Cosine or Cosine with Restarts (Dunno if I set mine correctly - I'm tired, check my settings for Yashahime)

  • LR CYCLES - I set it to 3 because I think that's what I was doing on HoloStrawberry's stuff.

  • Cache Latents & CACHE THEM TO DISK (even on runpod do this)

  • SEED: I dunno, I just -- I had set mine the same way Envy did: 12345 - I know normally seed is like -1 on 1.5, but I'm brain dead, shht.

  • Optimizer: DaDaptation - For me this works, OR use Adafactor (Check the Pinkspider Json file for any extra arguments for the optimizer)

  • LR Warmup: I forgot like a naughty dunce to actually fix this, I can't remember the OG setting.

  • NO HALF VAE: if you don't click this you get "NANS IN LATENT" - and you will scream.
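The bullets above map roughly onto kohya sd-scripts arguments. Here's a sketch of those choices as a Python dict - the key names are my best-effort mapping to sd-scripts CLI flags, so double-check them against your kohya_ss version before trusting it verbatim:

```python
# Sketch of the settings above as kohya sd-scripts style arguments.
# Flag names are an assumption mapped from the GUI labels - verify
# against your installed kohya_ss / sd-scripts version.
settings = {
    "network_module": "networks.lora",      # "Standard" LoRA
    "train_batch_size": 2,                  # NO MORE THAN 3
    "mixed_precision": "fp16",              # FP16
    "max_train_epochs": 5,                  # 3-5 is fine, don't burn it
    "lr_scheduler": "cosine_with_restarts", # or plain "cosine"
    "lr_scheduler_num_cycles": 3,           # LR CYCLES
    "cache_latents": True,
    "cache_latents_to_disk": True,          # even on runpod
    "seed": 12345,                          # same as Envy
    "optimizer_type": "DAdaptation",        # or "Adafactor"
    "min_snr_gamma": 5,
    "no_half_vae": True,                    # avoids "NANS IN LATENT"
}

for key, value in settings.items():
    print(f"{key} = {value}")
```

Treat it as a checklist, not a config file - the GUI writes its own JSON.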

BUT DUSK: YOUR LORAS ARE CHONK FOR SDXL

..... I've got no sympathy for you on this one; you're not gonna get FallenIncursio-style 1mb loras right off the bat. I never make mine that small personally - his are AMAZING for working at that size, it's LORA TRAINING hilarity, and it's a preference! Sometimes people do that and things don't always pan out. Fallen's are great, and I have NO problem with them - but with SDXL, the less the dim, the LESS ACTUAL QUALITY, as far as I've seen.

32 DIM should be your ABSOLUTE MINIMUM for SDXL at the current moment. This yes, is a large and strong opinionated YELL from me - you'll get a 100mb lora, unlike SD 1.5 where you're gonna get like a 70mb Lora.

Don't forget your FULL MODELS on SDXL are 6.2 GB, and pruning has not been a thing yet. This is EARLY days - and everyone's putting their dollar and fifty cents in. Again: I'm opinionated and could be WRONG about all of it. Don't take my article as a bible, I'm trained in graphic design and illustration - not programming.

DIM SIZE?

I've been working on 64/32. It produces 300+ mb loras, YES. It's FREAKING ANNOYING also that currently I almost REFUSE to learn ComfyUI, and Automatic1111 breaks when trying to use loras from SDXL. (And yes, I've had an updated one - the runpod docker image I've shown is the one with SD & CN & Roop as well as Kohya.)

Again, you CAN EMPLOY SD 1.5 shenanigans and try and DENY IT MORE DIM - and I WILL be proud of you if you can get a 1mb SDXL lora. But I can't promise that it'll hold a lot of data. SD 1.5 didn't NEED to hold that much data, and I'm not sure WHY or how or otherwise, other than maybe it was the sheer brute force of "SCIIIIEEENCE!"
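Why dim maps to file size: a LoRA stores each weight update as two low-rank matrices, so parameter count (and therefore file size) scales linearly with the dim/rank. A back-of-envelope sketch - the 1024x1024 layer shape is a made-up example, not SDXL's actual architecture:

```python
def lora_params(d_out, d_in, rank):
    # LoRA factorizes the weight update as B @ A, where
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)

# Hypothetical 1024x1024 projection layer:
p32 = lora_params(1024, 1024, 32)
p8 = lora_params(1024, 1024, 8)

print(p32)                      # 65536 params for this one layer
print(p32 * 2 / 1024)           # 128.0 KiB at fp16 (2 bytes/param)
print(p32 // p8)                # rank 32 stores 4x what rank 8 does
```

Multiply that across every attention block SDXL touches and you see why a dim 32 SDXL lora lands around 100mb - and why starving the dim starves capacity.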

Other Settings?

Oh yes, let's finish this shall we?

  • Gradient Checkpointing

  • Shuffle caption (ALWAYS ALWAYS!)

  • Memory Efficient attention

  • Xformers (if ya got it flaunt it)

  • MIN SNR GAMMA - we talked about this earlier - set to 5.

  • DO NOT UPSCALE BUCKET RESOLUTION (Don't ask, don't tell - it's an auto-click for me)

  • Bucket Reso steps 64.

  • (A BUNCH of settings for timesteps - leave those alone, they're automatic)

  • Noise offset: OG and set to: 0.0357

  • Adaptive noise scale: 0.00357

  • Something about Rate of caption dropout (I don't use it because of shuffle caption, but it's set to: 0.05)

  • If you have a WANDB (Weights & Biases) API setup - go for gold. I like this sometimes because tensorboard confuses me.
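On the bucketing bullets above: kohya sorts your images into resolution "buckets" whose side lengths are multiples of the bucket reso steps (64 here), clamped between the min and max bucket reso from my logs. A minimal sketch of the idea - my own simplification, not kohya's exact bucketing code:

```python
def snap_to_bucket(width, height, step=64, min_reso=256, max_reso=2048):
    # Round each side down to a multiple of `step`, then clamp it
    # into the [min_reso, max_reso] range - a rough picture of how
    # odd-sized images land in a nearby bucket.
    def snap(x):
        return max(min_reso, min(max_reso, (x // step) * step))
    return snap(width), snap(height)

print(snap_to_bucket(1023, 1536))  # (960, 1536)
print(snap_to_bucket(100, 3000))   # (256, 2048) - clamped both ways
```

With "do not upscale" on, kohya only shrinks images into buckets, never enlarges them - hence the earlier advice to upscale crusty data yourself first.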



Testing Time!

What do you do when you hate the idea of node-based UIs and A1111 ain't working?

Take this link:

You have received FREE credits, start your ONLINE GENERATION now! https://tensor.art/models/624847635087557370?source_id=nz-3plnjkUG1ofAvanb09hMv (Yes, I'm shilling tensor, shht) - Upload your lora, TEST IT, and if you don't want it exclusive to tensor - just TEST it there, then get your generation details saved to a text file or copy them over as you gen.

Win/Win.
