AIDO.RNA-1.6B (HF-loadable)
This repository re-packages the weights of genbio-ai/AIDO.RNA-1.6B (Apache-2.0) so that the model can be loaded directly via `AutoModel.from_pretrained(..., trust_remote_code=True)`. The architecture and weights are unchanged; only the tokenizer plumbing and an `auto_map` entry in `config.json` have been added, together with bundled `modeling_aido.py`, `configuration_aido.py`, and `tokenization_aido.py` files so that HuggingFace's remote-code mechanism can discover them.
Source: https://github.com/genbio-ai/modelgenerator (Apache-2.0).
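For illustration, the `auto_map` entry takes roughly the following shape (a sketch only; `AidoRnaModel` is the model class named below, while the config and tokenizer class names here are assumptions and may differ from the actual `config.json` in this repo):

```json
{
  "auto_map": {
    "AutoConfig": "configuration_aido.AidoRnaConfig",
    "AutoModel": "modeling_aido.AidoRnaModel",
    "AutoTokenizer": ["tokenization_aido.AidoRnaTokenizer", null]
  }
}
```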
Quickstart
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("EscheWang/AIDO.RNA-1.6B-hf", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("EscheWang/AIDO.RNA-1.6B-hf", trust_remote_code=True)

# NOTE: the tokenizer is a BertWordPieceLowerCase variant that expects
# space-separated RNA characters (one token per whitespace unit).
enc = tokenizer("A C G U A C G U", return_tensors="pt", add_special_tokens=True)
out = model(enc.input_ids.int(), attention_mask=enc.attention_mask)
print(out.last_hidden_state.shape)  # (1, seq_len + 2, 2048)
```
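To reduce the per-token hidden states to a single embedding per sequence, one common approach is masked mean pooling. The sketch below uses dummy tensors in place of `out.last_hidden_state` and `enc.attention_mask` (same shapes as in the quickstart), so it runs without downloading the model:

```python
import torch

# Dummy stand-ins: in practice use out.last_hidden_state (batch, seq_len, 2048)
# and enc.attention_mask (batch, seq_len) from the quickstart above.
hidden = torch.randn(1, 10, 2048)
mask = torch.ones(1, 10)

# Zero out padding positions, then average over the real tokens only.
mask = mask.unsqueeze(-1)                                  # (1, 10, 1) for broadcasting
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (1, 2048)
print(embedding.shape)  # torch.Size([1, 2048])
```

Whether to include the two special tokens in the average is a modeling choice; masking them out as well is equally reasonable.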
Notes
- Parameters: ~1.6B.
- Hidden size: 2048; 32 layers; 32 attention heads; intermediate size 5440; RoPE positional embedding; LayerNorm; SwiGLU.
- Max position embeddings: 1024.
- Tokenizer vocab is 16 tokens (RNA alphabet + specials). Inputs MUST be space-separated (e.g. "A C G U"), not a contiguous string.
- Weight shards keep the original `bert.` prefix from the upstream checkpoint; the `base_model_prefix` in `modeling_aido.py` is set to `"bert"` so HuggingFace's `from_pretrained` auto-strip correctly loads them into `AidoRnaModel`.
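Because the tokenizer requires space-separated characters, a small helper for converting raw sequences can avoid silent mis-tokenization. This is a hypothetical convenience function, not part of the repo; the DNA-to-RNA (T to U) substitution is an assumption for convenience:

```python
def space_separate(seq: str) -> str:
    """Convert a contiguous nucleotide string into the space-separated,
    uppercase RNA form this tokenizer expects (T is mapped to U)."""
    return " ".join(seq.upper().replace("T", "U"))

print(space_separate("acgtacgt"))  # A C G U A C G U
```

The result can then be passed straight to `tokenizer(...)` as in the quickstart.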
License
Apache-2.0, matching the upstream source release.