AIDO.RNA-1.6B (HF-loadable)
This repository re-packages the weights of genbio-ai/AIDO.RNA-1.6B (Apache-2.0) so that the model can be loaded directly via `AutoModel.from_pretrained(..., trust_remote_code=True)`. The architecture and weights are unchanged; only the tokenizer plumbing and an `auto_map` entry in `config.json` have been added, together with bundled `modeling_aido.py`, `configuration_aido.py`, and `tokenization_aido.py` files so that HuggingFace's remote-code mechanism can discover them.
Source: https://github.com/genbio-ai/modelgenerator (Apache-2.0).
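For illustration, the `auto_map` entry takes roughly the following shape (a sketch only; `AidoRnaModel` is the model class named below, while the config and tokenizer class names here are assumptions and may differ from the actual `config.json` in this repo):

```json
{
  "auto_map": {
    "AutoConfig": "configuration_aido.AidoRnaConfig",
    "AutoModel": "modeling_aido.AidoRnaModel",
    "AutoTokenizer": ["tokenization_aido.AidoRnaTokenizer", null]
  }
}
```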
Quickstart
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("EscheWang/AIDO.RNA-1.6B-hf", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("EscheWang/AIDO.RNA-1.6B-hf", trust_remote_code=True)

# NOTE: the tokenizer is a BertWordPieceLowerCase variant that expects
# space-separated RNA characters (one token per whitespace unit).
enc = tokenizer("A C G U A C G U", return_tensors="pt", add_special_tokens=True)
out = model(enc.input_ids.int(), attention_mask=enc.attention_mask)
print(out.last_hidden_state.shape)  # (1, seq_len + 2, 2048)
```
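To reduce the per-token hidden states to a single embedding per sequence, one common approach is masked mean pooling. The sketch below uses dummy tensors in place of `out.last_hidden_state` and `enc.attention_mask` (same shapes as in the quickstart), so it runs without downloading the model:

```python
import torch

# Dummy stand-ins: in practice use out.last_hidden_state (batch, seq_len, 2048)
# and enc.attention_mask (batch, seq_len) from the quickstart above.
hidden = torch.randn(1, 10, 2048)
mask = torch.ones(1, 10)

# Zero out padding positions, then average over the real tokens only.
mask = mask.unsqueeze(-1)                                  # (1, 10, 1) for broadcasting
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (1, 2048)
print(embedding.shape)  # torch.Size([1, 2048])
```

Whether to include the two special tokens in the average is a modeling choice; masking them out as well is equally reasonable.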
Notes
- Parameters: ~1.6B.
- Hidden size: 2048; 32 layers; 32 attention heads; intermediate size 5440; RoPE positional embedding; LayerNorm; SwiGLU.
- Max position embeddings: 1024.
- Tokenizer vocab is 16 tokens (RNA alphabet + specials). Inputs MUST be space-separated (e.g. "A C G U"), not a contiguous string.
- Weight shards keep the original `bert.` prefix from the upstream checkpoint; the `base_model_prefix` in `modeling_aido.py` is set to `"bert"` so HuggingFace's `from_pretrained` auto-strip correctly loads them into `AidoRnaModel`.
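Because the tokenizer requires space-separated characters, a small helper for converting raw sequences can avoid silent mis-tokenization. This is a hypothetical convenience function, not part of the repo; the DNA-to-RNA (T to U) substitution is an assumption for convenience:

```python
def space_separate(seq: str) -> str:
    """Convert a contiguous nucleotide string into the space-separated,
    uppercase RNA form this tokenizer expects (T is mapped to U)."""
    return " ".join(seq.upper().replace("T", "U"))

print(space_separate("acgtacgt"))  # A C G U A C G U
```

The result can then be passed straight to `tokenizer(...)` as in the quickstart.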
License
Apache-2.0, matching the upstream source release.