BIBFRAME-OLMo 1B

A fine-tuned 1B-parameter language model that corrects malformed BIBFRAME RDF/XML into valid, well-formed output following Library of Congress conventions.

Model Details

Property	Value
Base Model	amd/AMD-OLMo-1B
Parameters	1.2B
Training	LoRA fine-tuning, merged for deployment
Training Data	~8,500 Library of Congress BIBFRAME records
Task	BIBFRAME RDF/XML correction
License	Apache 2.0

Quick Start

VS Code Extension (Recommended)

The easiest way to use this model is through the BIBFRAME Vibe VS Code extension:

1. Install the extension from the VS Code Marketplace.

2. Configure it in VS Code settings:

{
  "bf.huggingFaceModel": "jimfhahn/bibframe-olmo-1b",
  "bf.huggingFaceToken": "hf_your_token_here"
}

3. Use @bf-vibe /correct in GitHub Copilot Chat to fix BIBFRAME records.

Inference Endpoints (Production)

Deploy your own endpoint for production use:

1. Click Deploy → Inference Endpoints above.
2. Select Text Generation Inference (TGI).
3. Choose an instance: nvidia-t4 (recommended) or cpu-xlarge.

Configure in VS Code:

{
  "bf.huggingFaceEndpoint": "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud",
  "bf.huggingFaceToken": "hf_your_token_here"
}

Python API

from transformers import pipeline

pipe = pipeline("text-generation", model="jimfhahn/bibframe-olmo-1b")

prompt = (
    "<|im_start|>system\n"
    "You are a BIBFRAME expert. Fix the following malformed RDF/XML "
    "to produce valid BIBFRAME.<|im_end|>\n"
    "<|im_start|>user\n"
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">\n'
    "  <bf:Work>\n"
    "    <bf:title>Example Book</bf:title>\n"
    "  </bf:Work>\n"
    "</rdf:RDF><|im_end|>\n"
    "<|im_start|>assistant\n"
)

# temperature only takes effect when sampling is enabled
result = pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.1)
print(result[0]["generated_text"])
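
By default the text-generation pipeline returns the prompt followed by the completion in generated_text, so the corrected RDF/XML still has to be separated out. A minimal sketch of that step, assuming the reply ends at the first <|im_end|> marker; extract_reply is a hypothetical helper, not part of transformers or of this model's tooling:

```python
IM_END = "<|im_end|>"


def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the assistant's reply from a pipeline result (hypothetical helper)."""
    # Drop the echoed prompt if the pipeline returned the full text.
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Keep everything before the first end-of-turn marker, if one is present.
    reply, _, _ = generated_text.partition(IM_END)
    return reply.strip()
```

With the example above, this would be called as extract_reply(result[0]["generated_text"], prompt).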

cURL (Inference API)

curl https://router.huggingface.co/hf-inference/models/jimfhahn/bibframe-olmo-1b \
  -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "<|im_start|>system\nFix the BIBFRAME RDF/XML.<|im_end|>\n<|im_start|>user\n<your-rdf-here><|im_end|>\n<|im_start|>assistant\n",
    "parameters": {"max_new_tokens": 1024, "temperature": 0.1}
  }'
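
The same request can be issued from Python with only the standard library. The sketch below mirrors the cURL payload (inputs plus parameters); build_request and send_request are hypothetical helper names, and the URL is whatever you pass in, for example the one from the cURL command or your own endpoint URL.

```python
import json
import urllib.request


def build_request(url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with the same body as the cURL example."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 1024, "temperature": 0.1},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.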

What It Fixes

This model corrects common BIBFRAME errors:

❌ Missing required properties (bf:title, bf:adminMetadata)
❌ Wrong namespace prefixes (bibframe: → bf:)
❌ Literal values where resources expected
❌ Missing rdf:type declarations
❌ Invalid property nesting
❌ Malformed URIs

Prompt Format

The model was trained on prompts in the ChatML format. Use these exact tokens:

<|im_start|>system
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.<|im_end|>
<|im_start|>user
[Your invalid RDF/XML here]<|im_end|>
<|im_start|>assistant

Note: The <|im_start|> / <|im_end|> tokens are required. Using other formats (e.g., <|system|>) will produce poor results.
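
Assembling this template by hand is error-prone, so a small helper can help keep the tokens and newlines consistent. A minimal sketch, using the system message shown above as the default; build_prompt is a hypothetical name, not part of any library:

```python
DEFAULT_SYSTEM = (
    "You are a BIBFRAME expert. Fix the following malformed RDF/XML "
    "to produce valid BIBFRAME."
)


def build_prompt(rdf_xml: str, system: str = DEFAULT_SYSTEM) -> str:
    """Wrap an RDF/XML record in the ChatML template described above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{rdf_xml.strip()}<|im_end|>\n"
        # The open assistant turn is where the model writes its correction.
        "<|im_start|>assistant\n"
    )
```

The returned string ends with the open assistant turn, so it can be passed directly as the prompt.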

Training Data

Trained on jimfhahn/bibframe-corrections:

Source: Library of Congress (id.loc.gov)
Records: ~4,100 Works + ~5,000 Instances
Diversity: 102 facets (subjects, languages, time periods, formats, genres)
Method: Valid records are synthetically corrupted, and the model learns to restore the original valid RDF/XML
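
The dataset's actual corruption procedure is not described here. Purely as an illustration of the idea, the sketch below turns a valid record into a (corrupted, target) pair by rewriting the bf: element prefix to bibframe:, one of the error types this model is meant to fix. The function name and the choice of corruption are assumptions.

```python
def corrupt_prefix(valid_xml: str) -> tuple[str, str]:
    """Return (corrupted, target) for one hypothetical corruption type."""
    # Rewrite element tags only; the xmlns:bf declaration is left in place,
    # so the bibframe: prefix ends up undeclared in the corrupted copy.
    corrupted = (
        valid_xml
        .replace("</bf:", "</bibframe:")
        .replace("<bf:", "<bibframe:")
    )
    # The untouched input is the target the model should learn to restore.
    return corrupted, valid_xml
```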

Limitations

Trained exclusively on Library of Congress BIBFRAME; may not generalize to other implementations
Cannot fix semantic errors (wrong subject headings), only structural/syntactic issues
Large RDF documents may exceed context length (4096 tokens)
Recommendation: Validate output with SHACL shapes before production use
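
Before running SHACL validation, a cheap structural check can reject output that is not even well-formed XML. The sketch below uses only the standard library and checks two things: that the text parses, and that the root element is rdf:RDF. It is a pre-check under those assumptions, not a substitute for SHACL validation; check_well_formed is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (ok, message) for a cheap structural check of model output."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Covers unclosed tags, undeclared prefixes, stray text, and similar.
        return False, f"not well-formed XML: {exc}"
    if root.tag != RDF_ROOT:
        return False, f"unexpected root element: {root.tag}"
    return True, "ok"
```

Output that passes this check can then be handed to a SHACL validator for the actual BIBFRAME conformance test.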

Ecosystem

Project	Description
BIBFRAME Vibe	VS Code extension for BIBFRAME cataloging
mcp4rdf-core	SHACL validation service
bibframe-corrections	Training dataset
bibframe-olmo-1b-v2	Original LoRA adapter

Citation

@misc{bibframe-olmo-2026,
  author = {Hahn, Jim},
  title = {BIBFRAME-OLMo-1B: Fine-tuned OLMo for BIBFRAME Correction},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jimfhahn/bibframe-olmo-1b}
}

License

Apache 2.0
