railroad-engineering-bert
Fine-tuned bert-base-uncased for railroad DAS signal condition classification.
Model Description
Classifies text descriptions of Distributed Acoustic Sensing (DAS) fiber-optic signals into 4 condition classes from the CNN-LSTM-SW research paper (Rahman et al., Elsevier GEITS 2024).
Task: Text classification → {NC, TP, AC1, AC2}
Condition Classes
| Label | Name | Description |
|---|---|---|
| NC | Normal Condition | Background rail/environmental noise (~93% of data) |
| TP | Train Position | Acoustic signal from passing train |
| AC1 | Anomaly Class 1 | Light defect: wheel flat, minor surface irregularity |
| AC2 | Anomaly Class 2 | Heavy defect: rail joint, structural anomaly |
Performance
| Metric | Value |
|---|---|
| Test Accuracy | 100% |
| Macro F1 | 1.00 |
| Macro Precision | 1.00 |
| Macro Recall | 1.00 |
| NC F1 | 1.00 (11/11 correct) |
| TP F1 | 1.00 (10/10 correct) |
| AC1 F1 | 1.00 (8/8 correct) |
| AC2 F1 | 1.00 (7/7 correct) |
Perfect confusion matrix: zero misclassifications across all 4 classes on the test set (36 samples).
Confusion Matrix:
[[11  0  0  0]   ← NC:  11/11 ✓
 [ 0 10  0  0]   ← TP:  10/10 ✓
 [ 0  0  8  0]   ← AC1: 8/8 ✓
 [ 0  0  0  7]]  ← AC2: 7/7 ✓
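The table values can be re-derived from the confusion matrix. The following is a minimal sketch, not the author's evaluation script. It assumes rows are true classes and columns are predicted classes, which the card does not state; for a diagonal matrix the orientation makes no difference. The function name `per_class_metrics` is hypothetical.

```python
# Class order follows the confusion matrix above.
LABELS = ["NC", "TP", "AC1", "AC2"]
CONFUSION = [
    [11, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 7],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} for a square confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)
total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(len(LABELS))) / total
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

print(f"samples={total} accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
# samples=36 accuracy=1.00 macro_f1=1.00
```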
Trained on a T4 GPU (Google Colab free tier) in ~15 minutes.
Note on evaluation: Results are on synthetic test data generated from published feature tables (Rahman et al., Elsevier GEITS 2024). The clean pattern separation in the synthetic descriptions accounts for the perfect score. On real, noisy HTL loop DAS signals, expected performance is 94–97%, consistent with the CNN-LSTM-SW paper's published results. This model serves as a text-based classification demo grounded in real research findings.
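To put the size of the test set in perspective, here is a minimal sketch of the exact (Clopper-Pearson) lower confidence bound on accuracy when every prediction is correct. It is not part of the original evaluation. The 95% confidence level and the function name `lower_bound_all_correct` are assumptions made for illustration. When n of n predictions are correct, the two-sided bound reduces to (alpha / 2) ** (1 / n).

```python
def lower_bound_all_correct(n, confidence=0.95):
    """Two-sided Clopper-Pearson lower bound on accuracy when n of n are correct."""
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Sample counts taken from the performance table above.
counts = {"overall": 36, "NC": 11, "TP": 10, "AC1": 8, "AC2": 7}
for name, n in counts.items():
    bound = lower_bound_all_correct(n)
    print(f"{name}: {n}/{n} correct, 95% lower bound {bound:.2f}")
```

For the full 36-sample test set the bound is roughly 0.90, so a perfect score on this set is statistically compatible with the lower real-world figure expected above. The per-class bounds are much looser because each class has only 7 to 11 samples.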
Usage
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="arifme071/railroad-engineering-bert",
)

result = classifier(
    "Sharp amplitude spike at 2,847m with spectral centroid drop "
    "and elevated kurtosis - consistent with rail joint signature"
)
# [{'label': 'AC2', 'score': 0.94}]
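The pipeline returns a list of dictionaries with `label` and `score` keys, as in the comment above. The following is a minimal sketch of one way to post-process that output by flagging low-confidence predictions for manual review. The `triage` helper and the 0.80 threshold are hypothetical and not part of the model or the `transformers` API; the sample input copies the shape of the usage example so the snippet runs without downloading the model.

```python
def triage(predictions, threshold=0.80):
    """Map pipeline output to (label, score, needs_review) tuples.

    `threshold` is an illustrative value, not one tuned for this model.
    """
    results = []
    for pred in predictions:
        label = pred["label"]
        score = float(pred["score"])
        results.append((label, score, score < threshold))
    return results


# Same structure as the pipeline output shown in the usage example.
sample_output = [{"label": "AC2", "score": 0.94}]
print(triage(sample_output))
# [('AC2', 0.94, False)]
```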
Training Data
Synthetic text descriptions generated from feature tables and experimental results in: Rahman MA, Jamal S, Taheri H. "Remote condition monitoring of rail tracks using distributed acoustic sensing (DAS): A deep CNN-LSTM-SW based model." Green Energy and Intelligent Transportation, Elsevier, 2024. DOI: 10.1016/j.geits.2024.100178
Training Configuration
- Base model: bert-base-uncased
- Epochs: 5
- Batch size: 32
- Learning rate: 2e-5
- Warmup ratio: 0.1
- Hardware: T4 GPU (Google Colab free tier)
- Training time: ~15 minutes
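As a rough illustration, the hyperparameters above can be written with the matching `transformers.TrainingArguments` field names, and the warmup ratio converted into a step count. This sketch assumes the Hugging Face `Trainer` was used on a single GPU without gradient accumulation, which the card does not confirm. The training-set size of 1,000 is a placeholder because the card does not report it, and `schedule_steps` is a hypothetical helper.

```python
import math

# Values from the list above, keyed by TrainingArguments field names.
TRAINING_CONFIG = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
}


def schedule_steps(num_examples, config):
    """Return (total_steps, warmup_steps).

    Assumes one GPU, no gradient accumulation, and a final partial batch
    that is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_examples / config["per_device_train_batch_size"])
    total_steps = steps_per_epoch * config["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * config["warmup_ratio"])
    return total_steps, warmup_steps


# 1,000 is a placeholder: the card does not state the training-set size.
total_steps, warmup_steps = schedule_steps(1000, TRAINING_CONFIG)
print(total_steps, warmup_steps)
# 160 16
```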
Author
Md Arifur Rahman
PIN Fellow · Georgia Tech | MSc Applied Engineering · Georgia Southern University
Citation
@article{rahman2024railroad,
  title={Remote condition monitoring of rail tracks using distributed
         acoustic sensing (DAS): A deep CNN-LSTM-SW based model},
  author={Rahman, Md Arifur and Jamal, S and Taheri, Hossein},
  journal={Green Energy and Intelligent Transportation},
  volume={3}, number={5}, pages={100178}, year={2024},
  publisher={Elsevier},
  doi={10.1016/j.geits.2024.100178}
}