DziriBERT — Algerian Darija Misinformation Detection
DziriBERT is a fine-tuned XLM-RoBERTa-large model for detecting misinformation in Algerian Darija text from social media and news.
- Base model: xlm-roberta-large (about 560M parameters)
- Task: Multi-class text classification (5 classes)
- Classes:
- F: Fake
- R: Real
- N: Non-new
- M: Misleading
- S: Satire
Performance (Test set: 3,344 samples)
- Accuracy: 78.32%
- Macro F1: 68.22%
- Weighted F1: 78.43%
Per-class F1:
- Fake (F): 85.04%
- Real (R): 80.44%
- Non-new (N): 83.23%
- Misleading (M): 64.57%
- Satire (S): 27.83%
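As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores, so it can be recomputed from the figures above. The short sketch below does that. Weighted F1 cannot be recomputed the same way, because the per-class sample counts are not listed here.

```python
# Per-class F1 scores (percent) copied from the list above.
per_class_f1 = {
    "Fake": 85.04,
    "Real": 80.44,
    "Non-new": 83.23,
    "Misleading": 64.57,
    "Satire": 27.83,
}

# Macro F1 gives every class equal weight, regardless of its size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"Macro F1: {macro_f1:.2f}%")  # prints "Macro F1: 68.22%"
```

The roughly ten-point gap between macro F1 and weighted F1 is consistent with the low-scoring Satire class being small, since weighted F1 scales each class by its support.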
Training Summary
- Max sequence length: 128
- Epochs: 3 (early stopping)
- Batch size: 8 (effective 16 with gradient accumulation)
- Learning rate: 1e-5
- Loss: Weighted CrossEntropy
- Data augmentation: Applied to minority classes (M, S)
- Seed: 42
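The summary names a weighted cross-entropy loss but does not say how the class weights were chosen. The sketch below shows one common option, inverse-frequency weights, together with a pure-Python weighted cross-entropy. It is an illustration under stated assumptions, not the training code: the class counts are invented, and `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical helper names. The loss is normalized by the sum of the target weights, which is how `torch.nn.CrossEntropyLoss` behaves with `weight` set and mean reduction.

```python
import math

def inverse_frequency_weights(counts):
    """Assumed scheme: weight = total / (num_classes * class_count)."""
    total = sum(counts)
    return [total / (len(counts) * n) for n in counts]

def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of per-sample negative log-likelihoods."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        nll = log_norm - row[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total

# Invented class counts in the order F, R, N, M, S (not from the model card).
counts = [400, 300, 200, 80, 20]
weights = inverse_frequency_weights(counts)

# Two made-up samples: a confident Fake hit and a Satire example scored as Fake.
logits = [[2.0, 0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0, 0.0]]
targets = [0, 4]
print("class weights:", [round(w, 3) for w in weights])
print("weighted loss:", weighted_cross_entropy(logits, targets, weights))
```

With weights like these, an error on a rare class such as Satire contributes far more to the loss than an error on a frequent class, which is the usual motivation for weighting.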
Strengths & Limitations
Strengths
- Strong performance on the Fake, Real, and Non-new classes (F1 between 80.44% and 85.04%)
- Handles Darija, Arabic, and French code-switching well
Limitations
- Low performance on Satire (27.83% F1), attributed to limited samples
- The Misleading class remains challenging (64.57% F1)
Usage
import os

# These variables are read when transformers is imported, so set them first.
os.environ["USE_TF"] = "0"
os.environ["USE_TORCH"] = "1"

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "Rahilgh/model4_2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).to(DEVICE)
model.eval()  # disable dropout for inference

# Map class indices to label codes, and label codes to readable names.
LABEL_MAP = {0: "F", 1: "R", 2: "N", 3: "M", 4: "S"}
LABEL_NAMES = {
    "F": "Fake",
    "R": "Real",
    "N": "Non-new",
    "M": "Misleading",
    "S": "Satire",
}

texts = [
    # "Algeria won the 2019 Africa Cup of Nations"
    "الجزائر فازت ببطولة امم افريقيا 2019",
    # "A picture of a world leader wearing strange clothes provokes ridicule"
    "صورة زعيم عالمي يرتدي ملابس غريبة تثير السخرية",
]

for text in texts:
    # Tokenize one text at a time, truncating to the training length of 128.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        max_length=128,
        truncation=True,
        padding=True,
    ).to(DEVICE)

    with torch.no_grad():
        outputs = model(**inputs)

    # Convert logits to probabilities and pick the most likely class.
    probs = torch.softmax(outputs.logits, dim=1)
    pred_id = probs.argmax(dim=1).item()
    confidence = probs[0][pred_id].item()
    label = LABEL_MAP[pred_id]

    print(f"Text: {text}")
    print(f"Prediction: {LABEL_NAMES[label]} ({label}) - {confidence:.2%}")
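Because the Misleading and Satire scores are weaker, it can help to look at the full ranking of classes rather than only the top prediction. The sketch below isolates the post-processing step (softmax, then sort) in plain Python so it can be checked without loading the model. `rank_classes` is a hypothetical helper, not part of the model's API, and the example logits are made up. In practice the input would be one row of logits, for example `outputs.logits[0].tolist()`.

```python
import math

# Class names in index order, matching LABEL_MAP and LABEL_NAMES above.
CLASS_NAMES = ["Fake", "Real", "Non-new", "Misleading", "Satire"]

def rank_classes(logits):
    """Hypothetical helper: softmax one row of logits and sort by probability."""
    if len(logits) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(CLASS_NAMES, probs), key=lambda pair: pair[1], reverse=True)

# Made-up logits for illustration only.
example_logits = [2.0, 0.5, -1.0, 1.5, -0.5]
for name, prob in rank_classes(example_logits):
    print(f"{name}: {prob:.2%}")
```

A close runner-up, such as Misleading just behind Fake in this made-up example, is a signal that the top label deserves a second look.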