ReciFineGold RecipeRoBERTa: Knowledge-Augmented and Entity Type-Specific (KAES) Token Classification with the Entity Type Name Knowledge Type (recifinegold-reciperoberta-ka-entity-type)

This model is a RoBERTa-base token-classification model trained on ReciFineGold for recipe-focused Named Entity Recognition (NER) using a Knowledge-Augmented & Entity Type-Specific (KAES) formulation.

In KAES, the model extracts one entity type at a time by prepending a curated knowledge context (here: a natural-language entity type prompt) to the input recipe sentence. This encourages the encoder to focus on spans relevant to the requested entity type, rather than relying only on the sentence’s internal context.

What this checkpoint is

  • Backbone: RoBERTa-base
  • Task: Token classification (BIO-style tagging)
  • Formulation: Knowledge-Augmented + Entity Type-Specific (single entity type per run)
  • Knowledge type: Entity Type prompt (e.g., “FOOD STATE”)
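
Because each run targets a single entity type, the predicted tags can be decoded into spans without tracking a type per tag. The sketch below is a minimal illustration, assuming the tags are the plain strings "B", "I" and "O" (the checkpoint's actual label names are not stated here) and that tokens are whole words. The function name bio_to_spans and the hand-written tags are illustrative, not model output.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, text) spans from single-type BIO tags."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new "B" closes any open span and opens a new one
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I":
            # Assumption: a stray "I" with no open span leniently opens one
            if start is None:
                start = i
        else:
            # "O" (or any other tag) closes the open span
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Hand-written tags for illustration only
tokens = ["Add", "2", "cups", "of", "chopped", "onions"]
tags = ["O", "B", "I", "O", "O", "O"]
print(bio_to_spans(tokens, tags))
```

A stricter decoder could discard spans that start with "I"; the lenient choice here is only one reasonable option.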

How to use

The ReciFine library provides a lightweight inference wrapper (ReciFineNER) that handles the KAES prompting and decoding.

# Install ReciFine from a shell (see repo for latest install instructions)
pip install https://github.com/nuhu-ibrahim/ReciFine/archive/refs/tags/V1.zip

# Then, in Python:
from recifine.inferencing.inference import ReciFineNER

# Select the RecipeRoBERTa model with the entity type knowledge context
ner = ReciFineNER.from_pretrained(
    model="reciperoberta",
    task_formulation="knowledge_guided",
    knowledge_type="entity_type",
)

# Each call extracts a single entity type from the text
text = "Add 2 cups of chopped onions and fry until golden."
prediction = ner.process_text(text, entity_type="QUANTITY")

print(prediction)

Intended use

Use this model to extract fine-grained recipe entities from procedural recipe text (e.g., instructions), including ingredients, tools, quantities, durations, actions, and state descriptors.

Typical applications:

  • Structured parsing of recipe steps
  • Ingredient and action extraction for downstream cooking assistants
  • Data normalisation and indexing for recipe search
  • Entity-aware prompting and evaluation pipelines for recipe generation

Knowledge-Augmented & Entity Type-Specific classification (KAES)

Unlike traditional token classification pipelines that rely solely on internal sentence context, we adopt a knowledge-augmented (KA) formulation that prepends a curated knowledge context to the input, guiding the model toward entities of the target entity type.

Let the token sequence for a recipe sentence be denoted as:

$$x = \{x_1, x_2, \ldots, x_n\}$$

and let the knowledge context associated with a particular entity type $E_j$ be represented as a natural-language context:

$$p_j = \{p_1^{(j)}, \ldots, p_m^{(j)}\}$$

Each context $p_j$ is one of five knowledge types: question, definition, examples, entity type name, or combined.

We construct the augmented input to the encoder as:

$$\tilde{x} = \{[\mathrm{CLS}], p_j, [\mathrm{SEP}], x_1, \ldots, x_n, [\mathrm{SEP}]\}$$

where $m$ is the number of context tokens and $n$ is the number of recipe tokens. The full sequence is passed through a transformer encoder (e.g., BERT or RoBERTa), and a feedforward classification layer predicts BIO tags for each token.
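
The construction above can be sketched in a few lines. This is a minimal illustration at the word level, not the library's implementation: the function name build_augmented_input is hypothetical, and the mask that restricts tag reading to recipe tokens is an assumption about how predictions on the context would be ignored.

```python
from typing import List, Tuple

# Placeholder special tokens from the formula; RoBERTa's tokenizer uses <s> and </s>
CLS, SEP = "[CLS]", "[SEP]"


def build_augmented_input(
    context: List[str], sentence: List[str]
) -> Tuple[List[str], List[bool]]:
    """Return the augmented sequence and a mask that is True only on recipe tokens."""
    sequence = [CLS] + context + [SEP] + sentence + [SEP]
    # Assumption: tags are read only where the mask is True (the n recipe tokens)
    mask = [False] * (len(context) + 2) + [True] * len(sentence) + [False]
    return sequence, mask


context = ["FOOD", "STATE"]                       # m = 2 context tokens
sentence = ["Fry", "the", "chopped", "onions"]    # n = 4 recipe tokens
sequence, mask = build_augmented_input(context, sentence)

print(sequence)
print([tok for tok, keep in zip(sequence, mask) if keep])
```

The augmented sequence has m + n + 3 positions: m context tokens, n recipe tokens, and three special tokens.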

This setup introduces a form of entity type conditioning, enabling the model to modulate its attention and token representations based on the target entity type. Unlike multi-entity type classification with shared label spaces, our formulation uses a single encoder across all entity types but treats each classification task independently.
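
The per-token prediction can be written out as follows. This is a sketch under one assumption: the classification head is a single linear layer followed by a softmax, with weights W and bias b (the text above only specifies a feedforward classification layer). Here h_k denotes the encoder output at position k of the augmented input, counted from 1, and d is the hidden size.

```latex
\begin{aligned}
H &= \mathrm{Encoder}(\tilde{x}) \in \mathbb{R}^{(m+n+3) \times d}, \\
P(y_i = t \mid \tilde{x}) &= \mathrm{softmax}\left(W h_{m+2+i} + b\right)_t,
\qquad t \in \{\mathrm{B}, \mathrm{I}, \mathrm{O}\},\quad i = 1, \ldots, n.
\end{aligned}
```

Position 1 holds [CLS], positions 2 to m+1 hold the context, and position m+2 holds the first [SEP], so recipe token x_i sits at position m+2+i. Because the context for entity type E_j passes through the encoder together with the sentence, each h_{m+2+i} depends on j, which is where the entity type conditioning enters.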

Supported knowledge types (KAES)

  1. Entity Type Name (entity_type): A plain directive that names the entity type, e.g., FOOD STATE.
  2. Question Prompt (question): A natural question about the entity type, e.g., Which words describe the state of the food?.
  3. Example Type (example): A list of entity examples, e.g., melted, frozen, chopped.
  4. Definitional Sentence (definition): A brief definition of the entity type, e.g., A STATE describes the physical condition of an ingredient.
  5. Combined Type (all): A concatenation of all the above, providing the richest context.
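
The five knowledge types can be pictured as a lookup from type key to context string. The sketch below reuses the FOOD STATE examples from the list above; the function name build_context, the ordering inside the combined context, and the single-space separator are assumptions for illustration, not the exact strings used in training.

```python
from typing import Dict

# Contexts for FOOD STATE, copied from the examples in the list above
FOOD_STATE_KNOWLEDGE: Dict[str, str] = {
    "entity_type": "FOOD STATE",
    "question": "Which words describe the state of the food?",
    "example": "melted, frozen, chopped",
    "definition": "A STATE describes the physical condition of an ingredient.",
}


def build_context(knowledge: Dict[str, str], knowledge_type: str) -> str:
    """Return the context for one knowledge type; "all" joins the four others."""
    if knowledge_type == "all":
        # Assumption: this order and the space separator are illustrative only
        order = ["entity_type", "question", "example", "definition"]
        return " ".join(knowledge[key] for key in order)
    return knowledge[knowledge_type]


print(build_context(FOOD_STATE_KNOWLEDGE, "entity_type"))
print(build_context(FOOD_STATE_KNOWLEDGE, "all"))
```

This checkpoint corresponds to the entity_type key, so its context is just the entity type name.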

Supported entity types

Entity Type                   Definition
FOOD                          Edible items, including both raw ingredients and intermediate products
TOOL                          Cooking tools such as knives, bowls, pans
DURATION                      Time durations in cooking (e.g., 20 minutes)
QUANTITY                      Quantities associated with ingredients
ACTION_BY_CHEF                Verbs for deliberate cook actions (e.g., bring in “Bring the mixture to a boil”)
ACTION_BY_CHEF_DISCONTINUOUS  Non-contiguous parts of compound chef actions (e.g., to a boil)
ACTION_BY_FOOD                Verbs where food is the agent (e.g., melt, boil)
ACTION_BY_TOOL                Verbs denoting tool actions (e.g., grind, beat)
FOOD_STATE                    Descriptions of food’s state (e.g., chopped, soft)
TOOL_STATE                    Descriptions of tool readiness (e.g., preheated, greased, covered)
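
Since the model extracts one entity type per run, covering a sentence fully means one pass per entity type listed above. The sketch below shows that loop with a pluggable extractor. The names extract_all and toy_extractor are hypothetical, and the toy extractor is a keyword lookup standing in for a model call, so its output says nothing about what the checkpoint would predict.

```python
from typing import Callable, Dict, List

# The ten entity types listed above
ENTITY_TYPES = [
    "FOOD", "TOOL", "DURATION", "QUANTITY",
    "ACTION_BY_CHEF", "ACTION_BY_CHEF_DISCONTINUOUS",
    "ACTION_BY_FOOD", "ACTION_BY_TOOL",
    "FOOD_STATE", "TOOL_STATE",
]


def extract_all(
    text: str, extract_one: Callable[[str, str], List[str]]
) -> Dict[str, List[str]]:
    """Call extract_one(text, entity_type) once per type; keep non-empty results."""
    results: Dict[str, List[str]] = {}
    for entity_type in ENTITY_TYPES:
        found = extract_one(text, entity_type)
        if found:
            results[entity_type] = found
    return results


def toy_extractor(text: str, entity_type: str) -> List[str]:
    """Hypothetical stand-in for a model call: a keyword lookup, not a real tagger."""
    lexicon = {"FOOD": ["onions"], "FOOD_STATE": ["chopped", "golden"]}
    return [word for word in lexicon.get(entity_type, []) if word in text]


result = extract_all("Add 2 cups of chopped onions and fry until golden.", toy_extractor)
print(result)
```

In practice the extractor would wrap a call to the model for one entity type and convert its output to a list of strings; the return format of that call is not specified here, so the adapter is left to the reader.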

Research Papers

Knowledge-Augmented and Entity Type-Specific Token Classification

The knowledge-augmented and entity type-specific token classification model architecture is described in the paper Knowledge Augmentation Enhances Token Classification for Recipe Understanding.

@inproceedings{ibrahim2026knowledge,
  title     = {Knowledge Augmentation Enhances Token Classification for Recipe Understanding},
  author    = {Ibrahim, Nuhu and Stevens, Robert and Batista-Navarro, Riza},
  booktitle = {EACL},
  year      = {2026}
}

ReciFine Datasets and Controllable Recipe Generation

The ReciFine, ReciFineGold and ReciFineGen datasets are described in the paper ReciFine: Finely Annotated Recipe Dataset for Controllable Recipe Generation.

@inproceedings{ibrahim2026recifine,
  title     = {ReciFine: Finely Annotated Recipe Dataset for Controllable Recipe Generation},
  author    = {Ibrahim, Nuhu and Ravikumar, Rishi and Stevens, Robert and Batista-Navarro, Riza},
  booktitle = {EACL},
  year      = {2026}
}