Introduction
LoRA (Low-Rank Adaptation) is one of the most widely used parameter-efficient fine-tuning methods for large language models. Instead of updating every weight, it trains small low-rank matrices alongside the frozen base weights, so you can fine-tune with a fraction of the compute and memory that full fine-tuning requires. This practical guide walks through fine-tuning a Hugging Face causal language model with LoRA.
Prerequisites
- Python 3.10+
- NVIDIA GPU with 24GB+ VRAM (RTX 4090, A100, or similar)
- Basic PyTorch knowledge
- A dataset in chat format (JSONL)
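For reference, here is what one line of that JSONL file might look like. The "messages" field name is an assumption (it is the schema that tokenizer.apply_chat_template expects later in this guide); adjust it to match your own data:

```python
import json

# One JSON object per line of the .jsonl file. The "messages" chat schema
# below is an assumed convention, not the only valid one.
example = {
    "messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
    ]
}

line = json.dumps(example)  # write one of these per training example
print(line)
```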
Step 1: Environment Setup
pip install torch transformers datasets peft trl accelerate bitsandbytes
Step 2: Load the Base Model
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
Step 3: Configure LoRA
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
Understanding LoRA Parameters
- r (rank): Controls adapter capacity. 8-32 works for most tasks.
- lora_alpha: Scaling factor. Usually set to 2x the rank.
- target_modules: Which layers to adapt. Include attention and MLP layers for best results.
- lora_dropout: Regularization. 0.05-0.1 prevents overfitting.
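To make the capacity trade-off behind r concrete, here is a back-of-the-envelope count of trainable parameters for a single projection matrix. The 4096x4096 shape is illustrative, not taken from any particular model:

```python
# LoRA learns an update B @ A to a (d_out x d_in) weight matrix W,
# where A is (r x d_in) and B is (d_out x r).
d_in, d_out, r = 4096, 4096, 16

full_params = d_in * d_out        # full fine-tuning of this one matrix
lora_params = r * (d_in + d_out)  # trainable LoRA parameters for it

print(full_params)                                 # 16777216
print(lora_params)                                 # 131072
print(round(100 * lora_params / full_params, 2))   # 0.78
```

At r=16 the adapter trains under 1% of this matrix's weights, which is why doubling or halving r changes capacity but barely changes memory.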
Step 4: Prepare the Dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="training_data.jsonl")
def format_chat(example):
    """Render one example's messages with the tokenizer's chat template."""
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return {"text": text}

formatted = dataset["train"].map(format_chat)
split = formatted.train_test_split(test_size=0.1)
Step 5: Train with SFTTrainer
from trl import SFTConfig, SFTTrainer

# On recent trl versions, SFT-specific options (dataset_text_field,
# max_seq_length) live in SFTConfig; older versions passed them to
# SFTTrainer directly.
training_args = SFTConfig(
    output_dir="./lora-output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 x 4 = 16
    learning_rate=2e-4,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    logging_steps=10,
    save_strategy="epoch",
    eval_strategy="epoch",
    bf16=True,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    dataset_text_field="text",
    max_seq_length=2048,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    peft_config=lora_config,
    args=training_args,
)

trainer.train()
Step 6: Merge and Export
# Save LoRA adapter
trainer.model.save_pretrained("./lora-adapter")
# Merge with base model for deployment
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)
merged_model = PeftModel.from_pretrained(base_model, "./lora-adapter")
merged_model = merged_model.merge_and_unload()
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")
Troubleshooting
- Loss not decreasing: Check data format, try lower learning rate
- OOM errors: Enable gradient checkpointing, reduce batch size, use QLoRA
- Poor output quality: Increase training data, check for data quality issues
- Catastrophic forgetting: Reduce learning rate, train for fewer epochs
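If memory is the bottleneck, QLoRA loads the base model in 4-bit and attaches the same LoRA config on top. A minimal configuration sketch, assuming bitsandbytes is installed; it replaces the from_pretrained call in Step 2, and the rest of the guide is unchanged:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute: the usual QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,  # same model_id as in Step 2
    quantization_config=bnb_config,
    device_map="auto",
)
```

Note that the adapter still trains in bf16; only the frozen base weights are quantized, which is what makes the memory savings essentially free in quality terms for most tasks.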
Conclusion
LoRA makes fine-tuning accessible on consumer hardware. With careful data preparation and hyperparameter tuning, a specialized adapter can match or outperform much larger general-purpose models on the narrow tasks it was trained for.
Key Takeaways
- Start with r=16 and adjust based on task complexity
- Data quality is more important than quantity
- Always evaluate on held-out data
- Merge adapters for simplified deployment