O(1) inference is the foundational design of Spartacus-1B-Instruct ⚔️!
NoesisLab/Spartacus-1B-Instruct
We have successfully replaced the KV-cache bottleneck inherent in Softmax Attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, S_t = S_{t-1} ∘ M(x_t), the entire prefix is lossily compressed into a fixed-size state matrix per head.
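To illustrate what a fixed-size state buys you, here is a minimal NumPy sketch of a per-head recurrent update. The dimensions, the outer-product write, and the scalar decay gate are assumptions for illustration only, not the actual Spartacus kernel:

```python
import numpy as np

# Hypothetical per-head dimensions; the real model's sizes are not given in this post.
d_k, d_v = 64, 64
S = np.zeros((d_k, d_v))   # the entire causal history lives in this fixed-size matrix

def update(S, k, v, decay):
    """One recurrence step: decay the old state, then write the new token's contribution."""
    return decay * S + np.outer(k, v)

rng = np.random.default_rng(0)
for _ in range(10_000):                      # stream length is irrelevant to memory use
    k = rng.standard_normal(d_k)
    v = rng.standard_normal(d_v)
    decay = 1.0 / (1.0 + np.exp(-rng.standard_normal()))   # stand-in content-dependent gate in (0, 1)
    S = update(S, k, v, decay)

print(S.shape)   # (64, 64) -- unchanged after 10,000 tokens
```

However long the stream gets, the per-head memory footprint stays at d_k × d_v.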
The technical core of this architecture relies on the associativity of the monoid operator, (A ∘ B) ∘ C = A ∘ (B ∘ C), which lets the same recurrence be evaluated two ways (see the sketch after this list):
Training: a parallel prefix scan, implemented with Triton-accelerated JIT kernels, computes all prefix states simultaneously.
Inference: true sequential updates; per-token memory and compute are constant, independent of sequence length.
Explicit Causality: we discard RoPE and attention masks. Causality is a first-class citizen, modeled explicitly through learned, content-dependent decay gates.
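To make the training/inference duality concrete, here is a small NumPy sketch of a gated recurrence S_t = a_t · S_{t-1} + U_t and its associative combine rule. The scalar gate, the Hillis–Steele scan, and all names are illustrative assumptions; the production kernels are fused Triton code, but the algebra is the same:

```python
import numpy as np

def combine(e1, e2):
    """Associative operator for the gated recurrence S_t = a_t * S_{t-1} + U_t.
    e1 covers the earlier span, e2 the later one."""
    a1, U1 = e1
    a2, U2 = e2
    return (a1 * a2, a2 * U1 + U2)

def sequential_states(elems):
    """Inference path: one constant-cost update per token."""
    acc, states = (1.0, np.zeros_like(elems[0][1])), []
    for e in elems:
        acc = combine(acc, e)
        states.append(acc[1])
    return states

def parallel_states(elems):
    """Training path: Hillis-Steele inclusive prefix scan.
    Valid only because `combine` is associative."""
    scan, step = list(elems), 1
    while step < len(scan):
        scan = [scan[i] if i < step else combine(scan[i - step], scan[i])
                for i in range(len(scan))]
        step *= 2
    return [e[1] for e in scan]

rng = np.random.default_rng(0)
T, d_k, d_v = 16, 4, 4
elems = [
    (1.0 / (1.0 + np.exp(-rng.standard_normal())),                   # decay gate a_t
     np.outer(rng.standard_normal(d_k), rng.standard_normal(d_v)))   # update U_t = k_t v_t^T
    for _ in range(T)
]

seq = sequential_states(elems)
par = parallel_states(elems)
print(all(np.allclose(s, p) for s, p in zip(seq, par)))   # True: both paths agree
```

The scan needs only associativity, not commutativity, so causal order is preserved while all prefix states are computed in O(log T) parallel steps.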
Current zero-shot benchmarks show Spartacus-1B-Instruct (1.3B) already outperforming established sub-quadratic models such as Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063 accuracy). Integrating structured Chain-of-Thought (CoT) data has since pushed reasoning accuracy further, to 75%.
The "Spartacus" era is about scaling intelligence, not the memory wall ā¾ļø.