PIT-4B-FT β€” Point-In-Time GPT (Fine-tuned, 2016-12)

Point-In-Time (PIT) is a family of GPT-style language models trained on chronologically ordered monthly snapshots of FineWeb. Each checkpoint captures the state of knowledge available up to a specific month, making the models suitable for temporal reasoning and point-in-time analysis tasks.

This is the instruction-tuned variant. LoRA adapters were trained on instruction data and merged back into the base weights.
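Merging a LoRA adapter is a plain weight update: for each adapted matrix, W_merged = W + (alpha/r) * B A, after which the adapter can be discarded. A minimal numpy sketch of why the merged weights reproduce the adapter's behavior (the dimensions, rank, and scaling below are illustrative, not the actual training configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4  # hypothetical hidden dim, LoRA rank, and scaling alpha
W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection

# Merging folds the low-rank update directly into the base matrix.
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d)
y_adapter = x @ W.T + (alpha / r) * (x @ A.T) @ B.T  # base + adapter path
y_merged = x @ W_merged.T                            # merged weights only
assert np.allclose(y_adapter, y_merged)
```

After the merge, inference needs no extra matmuls and no PEFT dependency, which is why this repo ships plain base-shaped weights.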

Model details

Property            Value
------------------  ------------------------------
Snapshot month      2016-12
Training step       unknown
Architecture        Decoder-only Transformer (GPT)
Layers              20
Hidden dim          4096
Attention heads     32
Vocab size          50304
Tokenizer           GPT-2 BPE
Position encoding   RoPE
Normalization       RMSNorm on Q/K + pre-norm
Activation          Squared ReLU
Weight tying        Yes (input emb ↔ lm_head)
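The "4B" in the model name can be sanity-checked from the table. Assuming a standard 4x MLP expansion, bias-free linear layers, and the tied embedding counted once (assumptions, not stated specs), a rough parameter count:

```python
d, n_layers, vocab = 4096, 20, 50304  # from the table above

attn = 4 * d * d               # Q, K, V, and output projections
mlp = 2 * d * (4 * d)          # up and down projections, assumed 4x expansion
per_layer = attn + mlp
embedding = vocab * d          # counted once because of weight tying

total = n_layers * per_layer + embedding
print(f"~{total / 1e9:.2f}B parameters")
```

This lands at roughly 4.2B, consistent with the model name.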

Requirements

pip install transformers torch safetensors

Quick start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "Diamegs/PIT-4B-FT-201612"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,   # required for custom architecture
    torch_dtype=torch.bfloat16,
)
model = model.cuda()
model.eval()

Instruction following

The fine-tuned model expects a simple user/assistant chat format and emits an <|end|> marker when it finishes a response. The snippet below builds the prompt and stops generation at that marker:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnEndMarker(StoppingCriteria):
    def __init__(self, end_ids):
        self.end_ids = end_ids
        self.n = len(end_ids)
    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0][-self.n:].tolist() == self.end_ids

end_ids = tokenizer.encode("<|end|>", add_special_tokens=False)
stopping_criteria = StoppingCriteriaList([StopOnEndMarker(end_ids)])

def format_prompt(instruction: str) -> str:
    return f"<|user|>\n{instruction}\n<|assistant|>\n"

prompt = format_prompt("What were the main economic trends in 2016?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
    stopping_criteria=stopping_criteria,
)
n_prompt = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][n_prompt:], skip_special_tokens=False)
# Strip the end marker and any trailing whitespace
response = response.split("<|end|>")[0].strip()
print(response)
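The stopping criterion above simply checks whether the tail of the generated token sequence equals the encoded end marker. The same comparison isolated as a pure function, with made-up token ids standing in for the real encoding of "<|end|>":

```python
def ends_with_marker(token_ids: list[int], end_ids: list[int]) -> bool:
    """True once the sequence ends with the end-marker token ids."""
    n = len(end_ids)
    return len(token_ids) >= n and token_ids[-n:] == end_ids

end_ids = [27, 91, 437, 91, 29]  # hypothetical ids for "<|end|>"

assert ends_with_marker([10, 20] + end_ids, end_ids)       # marker present
assert not ends_with_marker([10, 20, 27, 91], end_ids)     # partial marker
assert not ends_with_marker([27, 91], end_ids)             # shorter than marker
```

Because "<|end|>" is not a special token in the GPT-2 vocabulary, it encodes to several BPE tokens, which is why the criterion compares a multi-token suffix rather than a single id.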

Temporal reasoning example

Because this model was trained on data up to 2016-12, it reflects the world as it was known at that point. You can use this for point-in-time analysis:

# What does the model "know" about events before its cutoff?
prompt = "The most important AI developments in early 2016 were"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
n_prompt = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][n_prompt:], skip_special_tokens=True))

Weights format

Weights are stored in safetensors format (model.safetensors) β€” memory-mapped, fast to load, and safe (no arbitrary code execution).
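The safetensors layout is simple enough to inspect by hand: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor data. A minimal sketch that builds and parses a one-tensor file in memory (the tensor name and shape here are invented, not taken from this checkpoint):

```python
import io
import json
import struct

def read_safetensors_header(f) -> dict:
    """Parse a safetensors header: u64 little-endian length, then JSON."""
    (n,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(n))

# Build a minimal in-memory file containing one 2x2 float32 tensor "w".
header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
payload = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(payload)) + payload + b"\x00" * 16

meta = read_safetensors_header(io.BytesIO(blob))
print(meta["w"]["shape"])  # [2, 2]
```

Because the header alone describes every tensor, tools can list shapes and dtypes without reading the (multi-gigabyte) tensor data, and loading can memory-map the payload directly.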

Limitations

  • Knowledge is limited to web text available up to 2016-12.
  • No RLHF or safety fine-tuning has been applied.
  • The model may reproduce biases present in FineWeb training data.
  • Not suitable for safety-critical applications without further alignment.

Citation

@misc{pit_llm,
  title     = {Point-In-Time LLM},
  author    = {Diamegs},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Diamegs}
}