Introduction
LoRA (Low-Rank Adaptation) is one of the most widely used parameter-efficient fine-tuning methods for large language models. Instead of updating every weight, it trains small low-rank matrices alongside the frozen base weights, so you can fine-tune with a fraction of the compute and memory that full fine-tuning requires. This practical guide walks through fine-tuning a Hugging Face causal language model with LoRA.
Prerequisites
- Python 3.10+
- NVIDIA GPU with 24GB+ VRAM (RTX 4090, A100, or similar)
- Basic PyTorch knowledge
- A dataset in chat format (JSONL)
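For reference, here is what one line of that JSONL file might look like. The "messages" field name is an assumption (it is the schema that tokenizer.apply_chat_template expects later in this guide); adjust it to match your own data:

```python
import json

# One JSON object per line of the .jsonl file. The "messages" chat schema
# below is an assumed convention, not the only valid one.
example = {
    "messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
    ]
}

line = json.dumps(example)  # write one of these per training example
print(line)
```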
Step 1: Environment Setup
pip install torch transformers datasets peft trl accelerate bitsandbytes
Step 2: Load the Base Model
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
Step 3: Configure LoRA
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
Understanding LoRA Parameters
- r (rank): Controls adapter capacity. 8-32 works for most tasks.
- lora_alpha: Scaling factor. Usually set to 2x the rank.
- target_modules: Which layers to adapt. Include attention and MLP layers for best results.
- lora_dropout: Regularization. 0.05-0.1 prevents overfitting.
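To make the capacity trade-off behind r concrete, here is a back-of-the-envelope count of trainable parameters for a single projection matrix. The 4096x4096 shape is illustrative, not taken from any particular model:

```python
# LoRA learns an update B @ A to a (d_out x d_in) weight matrix W,
# where A is (r x d_in) and B is (d_out x r).
d_in, d_out, r = 4096, 4096, 16

full_params = d_in * d_out        # full fine-tuning of this one matrix
lora_params = r * (d_in + d_out)  # trainable LoRA parameters for it

print(full_params)                                 # 16777216
print(lora_params)                                 # 131072
print(round(100 * lora_params / full_params, 2))   # 0.78
```

At r=16 the adapter trains under 1% of this matrix's weights, which is why doubling or halving r changes capacity but barely changes memory.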
Step 4: Prepare the Dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="training_data.jsonl")
def format_chat(example):
    """Render one example's messages with the tokenizer's chat template."""
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return {"text": text}

formatted = dataset["train"].map(format_chat)
split = formatted.train_test_split(test_size=0.1)
Step 5: Train with SFTTrainer
from trl import SFTConfig, SFTTrainer

# On recent trl versions, SFT-specific options (dataset_text_field,
# max_seq_length) live in SFTConfig; older versions passed them to
# SFTTrainer directly.
training_args = SFTConfig(
    output_dir="./lora-output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 x 4 = 16
    learning_rate=2e-4,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    logging_steps=10,
    save_strategy="epoch",
    eval_strategy="epoch",
    bf16=True,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    dataset_text_field="text",
    max_seq_length=2048,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    peft_config=lora_config,
    args=training_args,
)

trainer.train()
Step 6: Merge and Export
# Save LoRA adapter
trainer.model.save_pretrained("./lora-adapter")
# Merge with base model for deployment
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)
merged_model = PeftModel.from_pretrained(base_model, "./lora-adapter")
merged_model = merged_model.merge_and_unload()
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")
Troubleshooting
- Loss not decreasing: Check data format, try lower learning rate
- OOM errors: Enable gradient checkpointing, reduce batch size, use QLoRA
- Poor output quality: Increase training data, check for data quality issues
- Catastrophic forgetting: Reduce learning rate, train for fewer epochs
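If memory is the bottleneck, QLoRA loads the base model in 4-bit and attaches the same LoRA config on top. A minimal configuration sketch, assuming bitsandbytes is installed; it replaces the from_pretrained call in Step 2, and the rest of the guide is unchanged:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute: the usual QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,  # same model_id as in Step 2
    quantization_config=bnb_config,
    device_map="auto",
)
```

Note that the adapter still trains in bf16; only the frozen base weights are quantized, which is what makes the memory savings essentially free in quality terms for most tasks.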
Conclusion
LoRA makes fine-tuning accessible on consumer hardware. With careful data preparation and hyperparameter tuning, a specialized adapter can match or outperform much larger general-purpose models on the narrow tasks it was trained for.
Key Takeaways
- Start with r=16 and adjust based on task complexity
- Data quality is more important than quantity
- Always evaluate on held-out data
- Merge adapters for simplified deployment