railroad-engineering-bert

Fine-tuned bert-base-uncased for railroad DAS signal condition classification.

Model Description

Classifies text descriptions of Distributed Acoustic Sensing (DAS) fiber-optic signals into 4 condition classes from the CNN-LSTM-SW research paper (Rahman et al., Elsevier GEITS 2024).

Task: Text classification → {NC, TP, AC1, AC2}

Condition Classes

Label | Name            | Description
NC    | Normal Condition | Background rail/environmental noise (~93% of data)
TP    | Train Position   | Acoustic signal from passing train
AC1   | Anomaly Class 1  | Light defect: wheel flat, minor surface irregularity
AC2   | Anomaly Class 2  | Heavy defect: rail joint, structural anomaly

Performance

Metric          | Value
Test Accuracy   | 100%
Macro F1        | 1.00
Macro Precision | 1.00
Macro Recall    | 1.00
NC F1           | 1.00 (11/11 correct)
TP F1           | 1.00 (10/10 correct)
AC1 F1          | 1.00 (8/8 correct)
AC2 F1          | 1.00 (7/7 correct)

Perfect confusion matrix: zero misclassifications across all 4 classes on the test set (36 samples).

Confusion Matrix:
[[11  0  0  0]   ← NC:  11/11 ✓
 [ 0 10  0  0]   ← TP:  10/10 ✓
 [ 0  0  8  0]   ← AC1:  8/8  ✓
 [ 0  0  0  7]]  ← AC2:  7/7  ✓
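
As a cross-check, the per-class and macro metrics in the table can be recomputed directly from the confusion matrix. The sketch below is illustrative and is not the author's evaluation script. It assumes rows are true classes and columns are predicted classes (the scikit-learn convention); the helper name `per_class_metrics` is hypothetical.

```python
# Recompute precision, recall, and F1 from the reported confusion matrix.
# Assumption: rows = true class, columns = predicted class.
LABELS = ["NC", "TP", "AC1", "AC2"]
CONFUSION = [
    [11, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 7],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        true_pos = cm[k][k]
        actual = sum(cm[k])                          # row sum: samples of class k
        predicted = sum(cm[i][k] for i in range(n))  # column sum: predictions of class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


metrics = per_class_metrics(CONFUSION)
total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[k][k] for k in range(len(CONFUSION))) / total
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

for label, (p, r, f) in zip(LABELS, metrics):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"samples={total} accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

Because every off-diagonal entry is zero, each class's precision and recall are both 1.00, which is why the macro averages match the per-class values.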

Trained on a T4 GPU (Google Colab free tier) in ~15 minutes.

Note on evaluation: These results are on synthetic test data generated from published feature tables (Rahman et al., Elsevier GEITS 2024). The clean pattern separation in the synthetic descriptions accounts for the perfect score. On real, noisy HTL loop DAS signals, the expected performance is 94–97%, consistent with the CNN-LSTM-SW paper's published results. This model serves as a text-based classification demo grounded in real research findings.
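
To put the 36-sample test set in perspective, the following sketch computes an exact one-sided lower confidence bound on accuracy for the case where every sample is classified correctly. It assumes the test samples are independent draws from the same synthetic distribution, so the bound says nothing about real DAS signals. The function name `lower_bound_all_correct` is hypothetical.

```python
def lower_bound_all_correct(n_samples, confidence=0.95):
    """Exact one-sided lower bound on accuracy when all n_samples are correct.

    If the true accuracy is p, the chance of n correct in a row is p**n.
    The bound is the smallest p for which that chance is at least
    alpha = 1 - confidence, i.e. p = alpha ** (1 / n).
    """
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n_samples)


bound = lower_bound_all_correct(36)
print(f"95% lower bound on accuracy with 36/36 correct: {bound:.3f}")
```

With 36 of 36 correct, the bound is about 0.92, so even on the synthetic data the test set alone cannot distinguish a perfect classifier from one that is right roughly 92% of the time.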

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="arifme071/railroad-engineering-bert"
)

result = classifier(
    "Sharp amplitude spike at 2,847m with spectral centroid drop "
    "and elevated kurtosis, consistent with rail joint signature"
)
print(result)
# Example output: [{'label': 'AC2', 'score': 0.94}]
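
For more than one description at a time, a text-classification pipeline also accepts a list of strings and returns one prediction per input. The sketch below wraps that call and attaches the class names from the Condition Classes table. The helper `classify_batch` and the `LABEL_NAMES` mapping are hypothetical, and the code assumes the model emits the short labels shown in the example output (`NC`, `TP`, `AC1`, `AC2`).

```python
# Map the model's short labels to the names from the Condition Classes table.
LABEL_NAMES = {
    "NC": "Normal Condition",
    "TP": "Train Position",
    "AC1": "Anomaly Class 1",
    "AC2": "Anomaly Class 2",
}


def classify_batch(classifier, texts):
    """Run a classifier over several descriptions and return readable rows.

    `classifier` is any callable that maps a list of strings to a list of
    {'label': ..., 'score': ...} dicts, such as a transformers
    text-classification pipeline.
    """
    predictions = classifier(texts)
    rows = []
    for text, pred in zip(texts, predictions):
        rows.append({
            "text": text,
            "label": pred["label"],
            "name": LABEL_NAMES.get(pred["label"], "unknown"),
            "score": pred["score"],
        })
    return rows
```

With the `classifier` object created above, `classify_batch(classifier, ["first description", "second description"])` returns one row per description.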

Training Data

Synthetic text descriptions generated from feature tables and experimental results in: Rahman MA, Jamal S, Taheri H. "Remote condition monitoring of rail tracks using distributed acoustic sensing (DAS): A deep CNN-LSTM-SW based model." Green Energy and Intelligent Transportation, Elsevier, 2024. DOI: 10.1016/j.geits.2024.100178

Training Configuration

  • Base model: bert-base-uncased
  • Epochs: 5
  • Batch size: 32
  • Learning rate: 2e-5
  • Warmup ratio: 0.1
  • Hardware: T4 GPU (Google Colab free tier)
  • Training time: ~15 minutes
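
The hyperparameters above can be collected into a single configuration, as in the sketch below. The key names mirror fields of `transformers.TrainingArguments`, but the card does not say which training loop was used, so that mapping is an assumption. The helper `schedule_steps` is hypothetical and assumes a single device with no gradient accumulation. The dataset size passed to it is an illustrative value, not the actual training set size.

```python
import math

# Values from the Training Configuration list. Key names follow
# transformers.TrainingArguments fields (an assumption about the setup).
HYPERPARAMS = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
}


def schedule_steps(num_examples, hp=HYPERPARAMS):
    """Return (total optimizer steps, warmup steps) for a dataset size.

    Assumes one device and no gradient accumulation, so each batch is
    one optimizer step.
    """
    steps_per_epoch = math.ceil(num_examples / hp["per_device_train_batch_size"])
    total_steps = steps_per_epoch * hp["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * hp["warmup_ratio"])
    return total_steps, warmup_steps


# Hypothetical dataset size, for illustration only.
total_steps, warmup_steps = schedule_steps(num_examples=320)
print(f"total steps: {total_steps}, warmup steps: {warmup_steps}")
```

A warmup ratio of 0.1 means the learning rate ramps up over the first 10% of optimizer steps before reaching 2e-5.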

Author

Md Arifur Rahman
PIN Fellow · Georgia Tech | MSc Applied Engineering · Georgia Southern University


Citation

@article{rahman2024railroad,
  title={Remote condition monitoring of rail tracks using distributed 
         acoustic sensing (DAS): A deep CNN-LSTM-SW based model},
  author={Rahman, Md Arifur and Jamal, S and Taheri, Hossein},
  journal={Green Energy and Intelligent Transportation},
  volume={3}, number={5}, pages={100178}, year={2024},
  publisher={Elsevier},
  doi={10.1016/j.geits.2024.100178}
}