
>In practice this means you can fine tune a 30B parameter model on a consumer GPU in a couple of hours.

Consumer GPU, yes, but in practice LoRA doesn't actually reduce training time. What it mainly reduces is memory requirements. In fact, LoRA training can often require more training steps than full fine-tuning and therefore be slower (you can imagine why this is the case: the optimizer is trying to modify the model's behavior through a much smaller number of parameters, and so has a harder job).
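To make the "smaller number of parameters" point concrete, here's a rough sketch of the LoRA parameter count for a single weight matrix (the layer size and rank below are illustrative assumptions, not taken from any particular model): instead of updating a full d_out x d_in matrix W, LoRA trains two low-rank factors B (d_out x r) and A (r x d_in) and adds B @ A to W.

```python
# Illustrative LoRA trainable-parameter count for one linear layer.
# Sizes are hypothetical: a 4096x4096 projection with LoRA rank r = 8.
d_in, d_out, r = 4096, 4096, 8

full_params = d_out * d_in          # parameters updated by full fine-tuning
lora_params = d_out * r + r * d_in  # parameters updated by LoRA (B and A)

print(full_params)                  # 16777216
print(lora_params)                  # 65536
print(full_params // lora_params)   # 256x fewer trainable parameters
```

The memory win comes from the optimizer only keeping state (gradients, Adam moments) for the small factors, which is why LoRA fits on consumer GPUs even when wall-clock time per step isn't necessarily lower.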



Modern PEFT methods with LoRA actually do reduce training time by orders of magnitude.

Here's an example of 20 seconds per epoch on a single consumer GPU: https://github.com/johnsmith0031/alpaca_lora_4bit/issues/7#i...



