Diffusion Single File
comfyui

have Lora training times improved? and what am i missing...

#129
by aubreyz - opened

Last time I tried it (on prev 1 and 2) I was getting 2h+ for ~1500 steps, which is bad when I only go for 40 images (same for 60 images) :/
I normally get around 1 hour on Illustrious models, so this one is a big jump.
Also, are there resources/posts on people's training tests? I still don't know the ideal steps, epochs, and image count to go for when training.

I mainly go for anime characters and styles.
RTX 4070 Ti 12GB VRAM, 32GB DDR5, training data and models on an M.2 SSD, so it should be fast.

First, try training on a finetuned Anima checkpoint.
I usually set repeats so 1 epoch is ~100/batch-size steps, and by epoch 15-18 I'm getting much better results than I did on Illustrious.
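For reference, the usual step arithmetic kohya-style trainers use (the formula is standard; the dataset numbers below are made up):

```python
# Steps per epoch in kohya-style trainers: images * repeats / batch_size.
def steps_per_epoch(num_images: int, repeats: int, batch_size: int) -> int:
    return (num_images * repeats) // batch_size

# Hypothetical dataset: 25 images, 4 repeats, batch size 1 -> 100 steps/epoch,
# so 15 epochs comes out to 1500 total steps.
per_epoch = steps_per_epoch(25, 4, 1)
print(per_epoch, per_epoch * 15)  # 100 1500
```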

I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. Characters: 1000 steps, 16/8 dim/alpha, 512 resolution, AdamW8bit. Styles: 400 steps, 4/2, 512, AdamW8bit. I'm doing this on a 3050 8GB (32GB system RAM, SSD) and I'm happy with the results. If this place had DMs I'd share some of my results so you could stick them in your local gen and test them, alongside uploading them to metadata viewers so you can verify the results.
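A quick sketch of why dropping the rank helps: a LoRA's trainable parameter count grows linearly with its rank (standard LoRA math, not specific to any trainer; the layer size below is hypothetical):

```python
# A LoRA adapter for a (d_in x d_out) weight adds two low-rank matrices,
# A (d_in x r) and B (r x d_out), so parameters scale linearly with rank r.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return d_in * rank + rank * d_out

# Hypothetical 2048x2048 attention projection:
print(lora_params(2048, 2048, 32))  # 131072 params at rank 32
print(lora_params(2048, 2048, 16))  # 65536 at rank 16, exactly half
print(lora_params(2048, 2048, 4))   # 16384 at rank 4
```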

> I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. […]

...we are on Hugging Face. This website is literally made for uploading and sharing stuff.

> I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. […]

How can training a character have more steps than training a style? Shouldn't it be the opposite? Same thing with network dim and network alpha...

> I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. […]

I forgot I made this post, whoops. Thanks for the info! I was also testing with 500 to 750 steps when Anima came out, but I was using 1024x1024; currently I'm doing 768 for characters and 1024 for character close-ups. I wasted 6 hours across two sessions training at 1500 steps today and last night, which is why testing is so annoying when training takes that long, ahhhhhh.
But I'll try 512 with 16/8 and see how it goes, thanks again! And I too hope you share your results; it's always nice to see results.

> First, try training on a finetuned Anima checkpoint. I usually set repeats so 1 epoch is ~100/batch-size steps. […]

Always train on the base model for best results and compatibility.
Also, I doubt there would be any speed benefit, which is what I'm currently suffering from.
And what do you mean by 1 epoch and then better results at epoch 15? Sorry, I don't understand.

> I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. […]

> I forgot I made this post, whoops. Thanks for the info! I was also testing with 500 to 750 steps when Anima came out, but I was using 1024x1024. […]

I'm certainly no expert, but I can tell you right away that when training at 512 resolution, part of the character's head will get cropped out in generations (e.g., with standing + cowboy shot). Maybe the problem is something else, but when I kept the same settings and changed the resolution to 1024, the issue went away.
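One possible mechanism for that head-cropping (an illustration only; I don't know how any specific trainer actually handles it): if tall portrait images are center-cropped to a square training resolution instead of being aspect-ratio bucketed, the top of the head can fall outside the crop.

```python
# Center-cropping a tall image to a square loses strips at the top and bottom.
def center_crop_box(width: int, height: int, target: int):
    # Scale the short side to `target`, then crop the long side centrally.
    scale = target / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return left, top, left + target, top + target

# Hypothetical 832x1216 portrait cropped to 512: ~118px cut off the top,
# which is often exactly where the head is.
print(center_crop_box(832, 1216, 512))  # (0, 118, 512, 630)
```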

> I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. […]

I really need to see the results of your trained LoRAs. That low rank and tiny resolution sound like a recipe for a slop disaster; if your LoRAs are really good with those settings, then I've burned 100 GPU hours at rank 32/16 and 1024x for nothing.

> I managed to train characters and styles fairly quickly: characters take 40 minutes and styles 20 minutes. The main time saver for me was dropping the rank and steps. […]

> I forgot I made this post, whoops. Thanks for the info! I was also testing with 500 to 750 steps when Anima came out, but I was using 1024x1024. […]

> I'm certainly no expert, but I can tell you right away that when training at 512 resolution, part of the character's head will get cropped out in generations. […]

Yeah, I just trained at 512 for all body shots (except the eyes, at 1024), with 1000 steps instead of 1500 and rank 16/8 instead of 32/16. While it cut my time from 3+ hours to 1 hour 20 minutes, it only got the character right (not the eyes); it's bad at full body / upper body. There are no head-crop issues for me, but the fingers, poses, and angles are bad.
This model is going to make me struggle: with training times this long, I can't test things and dial in the right params the way I can on Illustrious models.
I'd say 512 is not the way to go; when Anima first released I did 750 steps and it came out better than this. I can't remember if I used 1024 or 768, but it was definitely not 512.
I hope someone just shows up and helps us, lol.

> ...we are on Hugging Face. This website is literally made for uploading and sharing stuff.

Fair enough, I'm not used to HF. To me it's just the place to get files and talk about Anima.

> How can training a character have more steps than training a style? Shouldn't it be the opposite? Same thing with network dim and network alpha...

Mayhaps? I'm fairly green at training LoRAs, and both of them gave me results I'm happy with. To my understanding, from what I've been reading: characters have more details you want to keep, so you need stronger training, while styles can get the same vibe with less. I made two styles I wanted to mix and got what I wanted.

> I really need to see the results of your trained LoRAs. That low rank and tiny resolution sound like a recipe for a slop disaster. […]

Sure, I'll see how I can upload what I'm using and you can give it a peek. I'll upload the one I use rather than the entire thing. Also, as said before, I'm very green, so maybe I got extremely lucky, I'm a genius, or I did something wrong and don't see it. If you see any flaws do let me know; I'm open to learning.

Edit: OK, I uploaded some stuff. I forgot to mention that the way I made my LoRAs is down to my own restrictions: I've got an RTX 3050 with 8GB, and the Anima standalone trainer I use (I can link the software if needed) would move some files over to my 32GB of RAM, which slows things down considerably at 768 or higher resolutions. Now that I know I can give it a bunch more steps while keeping the speed, I might consider it for future LoRAs. These LoRAs weren't meant to be a forever thing anyway; they were experiments and learning projects that turned out usable for me.

Even though training at 512 is nearly five times faster than at 1024 (roughly the case on a 5090, with the core voltage at 0.925 V and overclocked VRAM), it's still hard to believe a 3050 completes style training in about 20 minutes.
And that's putting aside the fact that some things just can't be learned at 512, such as small decorations and parts of textures.
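The 512-vs-1024 gap is roughly what token math predicts (a back-of-the-envelope sketch; the VAE downscale factor of 8 and patch size of 2 are typical assumptions for latent diffusion transformers, not confirmed details of Anima):

```python
# Latent tokens grow with the square of resolution; attention cost grows
# with the square of token count, so a 2x resolution jump means 4x the
# tokens and up to 16x the attention FLOPs (linear layers are "only" 4x).
def latent_tokens(res: int, vae_factor: int = 8, patch: int = 2) -> int:
    side = res // vae_factor // patch
    return side * side

t512, t1024 = latent_tokens(512), latent_tokens(1024)
print(t512, t1024, t1024 / t512)  # 1024 4096 4.0
print((t1024 / t512) ** 2)        # 16.0 (attention-only worst case)
```

The real slowdown lands between the 4x and 16x extremes, which is consistent with "nearly five times" in practice.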

> Even though training at 512 is nearly five times faster than at 1024, it's still hard to believe a 3050 completes style training in about 20 minutes. […]

On Anima-Standalone-Trainer, at 1024x with 15 images, 1,000 steps, flash attention enabled, and 32/32 dim/alpha, I get 6 s/it max, totaling about an hour and thirty minutes of training time on my 5060 Ti 16GB. So however he got it down to 30 minutes, it still didn't bake a good enough LoRA. I think it's mostly due to the 512x resolution, though I can't be sure without the rest of the training parameters.
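As a sanity check on those timings, the conversion from s/it to wall-clock time is just arithmetic (nothing trainer-specific):

```python
# Wall-clock estimate from total steps and seconds per iteration.
def train_minutes(steps: int, sec_per_it: float) -> float:
    return steps * sec_per_it / 60

print(train_minutes(1000, 6.0))  # 100.0 -> the 6 s/it peak would be ~1h40m
print(train_minutes(1000, 5.4))  # ~90 -> an average a bit under the peak matches ~1h30m
```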

[image: training-time and validation-loss chart]
This is my training time for a full-matrix LoKr, factor 6, batch 10, on a 350 W power-limited 4090.

The style (186 images) becomes recognizable and "good" well before that, but as the validation loss shows, the model can obviously learn more than a single person can capture by eyeballing results, which is why validation loss is so important: you may get subjectively satisfactory results with fewer steps, but that's not a good measure. If I switch to a LoRA, especially a low-rank one, the loss flattens out faster, but at a higher value. This also depends on the style being trained, the number of images, and how consistent the style is; when trying to train a concept with only 28 images, I had to increase the LoKr factor (making the adapter smaller) and decrease training steps because it was quickly overfitting.

With all that said: yes, Anima is slower to train. I would say larger LoRAs should train faster, since that's the whole gist of how neural networks behave, but you risk overfitting with fewer images / no augmentation. You should also test alpha = 2x rank, as with a low alpha the LoRA is probably not being trained at full capacity (mostly relevant at high rank). Something else you can try is training multiple resolutions at once or sequentially; I did some tests on SDXL and it works. You can do a quick "pre-train" at 512 and continue training at a higher resolution, but I think this is overkill and doesn't provide much (if any) speedup for these single-concept LoRAs that train in under 2 hours.

There may be some workarounds, but at the end of the day Anima just seems to be more expensive to train (which is fair, since the results are also much better).
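On the alpha = 2x rank point: in the standard LoRA formulation, the learned update is applied with a scale of alpha/rank, so alpha relative to rank directly sets the update's effective strength (generic LoRA math, not Anima-specific):

```python
# Standard LoRA scaling: the learned update delta_W = A @ B is applied as
# W_eff = W + (alpha / rank) * (A @ B), so the ratio is what matters.
def lora_scale(alpha: float, rank: int) -> float:
    return alpha / rank

print(lora_scale(16, 16))  # 1.0 -> alpha = rank, update at full strength
print(lora_scale(8, 16))   # 0.5 -> alpha = rank/2 attenuates the update
print(lora_scale(32, 16))  # 2.0 -> alpha = 2*rank doubles its strength
```

This is also why a 16/8 dim/alpha setup effectively trains at half strength unless the learning rate compensates.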

> Even though training at 512 is nearly five times faster than at 1024, it's still hard to believe a 3050 completes style training in about 20 minutes. […]

Unsure if I should take that as a compliment or with worry, haha. I did notice some issues myself, and I'm getting weird fingers and toes, but that's either Anima being Anima (just inpaint them) or the two style LoRAs I use conflicting with each other (one at 0.7, the other at 0.5). I did a character LoRA recently at 768 and it took me an hour and a bit for 1000 steps, so I might do that for the styles as well. Also, I'm just working with the hardware I've got. Tbh I'm only doing this because of the "Damn, I miss..." effect: I started with an SDXL base that had everything and every LoRA imaginable, but Anima is missing them, so it's "Damn, I miss generating my favorite crossdressing character from a 2007 anime..." or "Damn, I miss having my style mix...". If anyone more savvy is reading this, I have a request: damn, I miss comfycouple :v

> On Anima-Standalone-Trainer, at 1024x with 15 images, 1,000 steps, flash attention, and 32/32 dim/alpha, I get 6 s/it max, about an hour and thirty minutes on my 5060 Ti 16GB. […]

Using the anima-standalone-trainer too. I did manage to get it down to almost 3 it/s with these settings: LR/TE 0.00005, AdamW8bit, cosine with restarts, no warmup steps, weight decay 0.01, seed 42, 1000 steps, save every 250, BF16, 4 data loaders with persistent dataloader workers on, gradient checkpointing, cache latents to disk, cache text encoder outputs to disk. Dataset tab: resolution 512, batch size 1, gradient accumulation 1, aspect ratio bucketing enabled, do not upscale images, num repeats 1. Network tab: 16/8, train U-Net only. I tend to use ~50 images whenever possible, but I usually end up with 20-30 because I don't want to edit text out of images; Kohya_ss is used for WD14 captioning. That should be it. The last LoRA I trained, for SymbareAngoramon, had flash attention and was pushed to 768; it spiked to 6.8GB on my GPU (with YouTube open on the side) and took an hour. One of my goals with these settings is to keep my GPU from filling up so it won't start swapping data to RAM and cause major slowdowns.
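For easy comparison, here are those settings gathered in one place (a plain Python dict with descriptive key names of my own, not the trainer's actual option names or config format):

```python
# Settings from the comment above, collected for side-by-side comparison.
# Key names are descriptive, not the trainer's real option identifiers.
training_settings = {
    "learning_rate": 5e-5,          # LR and TE both 0.00005
    "optimizer": "AdamW8bit",
    "scheduler": "cosine_with_restarts",
    "warmup_steps": 0,
    "weight_decay": 0.01,
    "seed": 42,
    "max_steps": 1000,
    "save_every_n_steps": 250,
    "precision": "bf16",
    "dataloader_workers": 4,
    "persistent_workers": True,
    "gradient_checkpointing": True,
    "cache_latents_to_disk": True,
    "cache_text_encoder_outputs": True,
    "resolution": 512,
    "batch_size": 1,
    "gradient_accumulation": 1,
    "aspect_ratio_bucketing": True,
    "upscale_images": False,
    "num_repeats": 1,
    "network_dim": 16,
    "network_alpha": 8,
    "train_unet_only": True,
}
print(training_settings["network_dim"], training_settings["network_alpha"])
```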
