mBERT: Algerian Darija Misinformation Detection

Fine-tuned BERT-base-multilingual-cased for detecting misinformation in Algerian Darija text.

  • Base model: bert-base-multilingual-cased (~178M parameters)
  • Task: Multi-class text classification (5 classes)
  • Classes: F (Factual), R (Reporting), N (Non-factual), M (Misleading), S (Satire)

Performance (Test set: 3,344 samples)

  • Accuracy: 75.42%
  • Macro F1: 64.48%
  • Weighted F1: 75.70%

Per-class F1:

  • Factual (F): 83.72%
  • Reporting (R): 76.35%
  • Non-factual (N): 81.01%
  • Misleading (M): 61.46%
  • Satire (S): 19.86%
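
Macro F1 is the unweighted mean of these five scores: (83.72 + 76.35 + 81.01 + 61.46 + 19.86) / 5 = 64.48. The weak Satire class therefore pulls the macro score well below the weighted F1, and its low score most likely reflects how rare satire is in the data, which is also why training uses a class-weighted loss (see Training Summary below).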

Training Summary

  • Max sequence length: 128
  • Epochs: 3 (early stopping)
  • Batch size: 16
  • Learning rate: 2e-5
  • Loss: Weighted CrossEntropy (see the sketch after this list)
  • Seed: 42 (reproducibility)
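
The exact class weights are not published here, so the following is only a minimal sketch of how a class-weighted cross-entropy loss is commonly set up in PyTorch; the inverse-frequency heuristic and the toy label list are assumptions, not the model's actual configuration.

import torch
import torch.nn as nn
from collections import Counter

NUM_CLASSES = 5

# Hypothetical training label ids (0=F ... 4=S); the real class distribution is not published.
train_labels = [0, 0, 0, 1, 1, 2, 2, 3, 4, 0]

# Inverse-frequency weights: rare classes such as Satire get larger weights,
# so mistakes on them contribute more to the loss.
counts = Counter(train_labels)
weights = torch.tensor(
    [len(train_labels) / (NUM_CLASSES * counts[c]) for c in range(NUM_CLASSES)],
    dtype=torch.float,
)

# Used in place of the default unweighted loss during fine-tuning.
loss_fn = nn.CrossEntropyLoss(weight=weights)

With the Hugging Face Trainer, a weighted loss like this is typically wired in by subclassing Trainer and overriding compute_loss.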

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

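# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.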
MODEL_ID = "Rahilgh/model4_1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).eval()

LABEL_MAP = {0: "F", 1: "R", 2: "N", 3: "M", 4: "S"}
LABEL_NAMES = {
    "F": "Factual",
    "R": "Reporting",
    "N": "Non-factual",
    "M": "Misleading",
    "S": "Satire"
}

texts = [
    # Darija: "They say they're going to cancel the bac (baccalaureate) this year."
    "قالك بلي رايحين ينحو الباك هذا العام",
]

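# Classify each text and print the predicted class with its softmax confidence.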
for text in texts:
    inputs = tokenizer(
        text,
        return_tensors="pt",
        max_length=128,
        truncation=True,
        padding=True,
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)[0]
        pred_id = probs.argmax().item()
        confidence = probs[pred_id].item()

    label = LABEL_MAP[pred_id]
    print(f"Text: {text}")
    print(f"Prediction: {LABEL_NAMES[label]} ({label}) - {confidence:.2%}\n")
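
To classify many texts at once, batching is usually faster than the per-text loop above. A minimal variant reusing the model, tokenizer, and label maps defined earlier (the input list is a placeholder):

batch = [
    "قالك بلي رايحين ينحو الباك هذا العام",
    # add more texts here
]

inputs = tokenizer(
    batch,
    return_tensors="pt",
    max_length=128,
    truncation=True,
    padding=True,
).to(device)

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

for text, p in zip(batch, probs):
    pred_id = p.argmax().item()
    label = LABEL_MAP[pred_id]
    print(f"{text} -> {LABEL_NAMES[label]} ({label}) {p[pred_id].item():.2%}")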