AI & ML interests

retrieval augmented generation, grounded generation, large language models, LLMs, question answering, chatbot

Recent Activity

ahmed-d8k updated a dataset 17 days ago: vectara/results
ofermend updated a Space 24 days ago: vectara/leaderboard
ofermend updated a Space about 1 month ago: vectara/README

ahmed-d8k in vectara/leaderboard 2 months ago:
Update app/requirements.txt (#13), opened 2 months ago by forrestbao
clefourrier posted an update 8 months ago
Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

A high-signal eval tells you precisely, during training, how well and what your model is learning, letting you discard the bad runs/bad samplings/...!

The blog covers prompt choice, metrics, and datasets in depth, across languages/capabilities, and my fave section is "which properties should evals have"👌
(so you know how to select the best evals for your use case)

Blog: HuggingFaceFW/blogpost-fine-tasks
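
To make "signal" concrete: one property you'd want from a training eval is that its scores improve fairly steadily across checkpoints. A minimal sketch of checking that with Spearman rank correlation as a monotonicity proxy (my own illustration, not code from the blog; the checkpoint scores are made up):

```python
# A quick proxy for eval signal: scores from a useful training eval should
# rise fairly monotonically as training progresses. Spearman rank correlation
# between checkpoint step and eval score captures this; a noisy or
# uninformative eval hovers near zero.
from scipy.stats import spearmanr

# Hypothetical (training step, eval score) pairs from intermediate checkpoints.
checkpoints = [(1_000, 0.31), (5_000, 0.34), (10_000, 0.33),
               (20_000, 0.39), (40_000, 0.45), (80_000, 0.52)]

steps, scores = zip(*checkpoints)
rho, p_value = spearmanr(steps, scores)
print(f"monotonicity (Spearman rho) = {rho:.2f}, p = {p_value:.3f}")
```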
ofermend posted an update 8 months ago
Excited to share open-rag-eval (https://github.com/vectara/open-rag-eval), a new open-source project to help scale RAG evaluation. The key benefit: it does not require golden answers, so it's much more scalable.
Would love any thoughts or feedback (or even better, if you want to contribute a PR, that would be great).
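
To make the "no golden answers" point concrete: reference-free RAG evaluation scores an answer against the retrieved passages rather than against a human-written reference answer. A toy stdlib sketch of that idea (this is my illustration of the concept, not open-rag-eval's actual API; the fuzzy matching and threshold are arbitrary):

```python
# Toy reference-free RAG check: instead of comparing the answer to a golden
# reference, score each answer sentence against the retrieved passages and
# flag sentences with no apparent support. NOT open-rag-eval's API.
from difflib import SequenceMatcher

def support_score(sentence: str, passages: list[str]) -> float:
    """Max fuzzy-match ratio between a sentence and any retrieved passage."""
    return max(SequenceMatcher(None, sentence.lower(), p.lower()).ratio()
               for p in passages)

passages = ["Vectara provides a RAG platform with grounded generation."]
answer = "Vectara offers a RAG platform. It was founded on the moon."

for sent in answer.split(". "):
    score = support_score(sent, passages)
    verdict = "supported" if score > 0.5 else "possibly ungrounded"
    print(f"{score:.2f} {verdict}: {sent}")
```

A real system would use entailment models or LLM judges rather than string overlap, but the scalability argument is the same: every signal is computed from the query, the retrieved passages, and the answer, so no human-written references are needed.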
nthakur posted an update 9 months ago
Last year, I curated & generated a few multilingual SFT and DPO datasets by translating English SFT/DPO datasets into 9-10 languages using the mistralai/Mistral-7B-Instruct-v0.2 model.

I hope they help the community with pretraining/instruction-tuning multilingual LLMs! I added a small diagram briefly describing which datasets are included and their sources.

Happy to collaborate, whether on using these datasets for instruction FT or on extending translated versions to newer English SFT/DPO datasets!

nthakur/multilingual-sft-and-dpo-datasets-67eaf56fe3feca5a57cf7d74
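
A minimal sketch of the translation step described above, prompting the mistralai/Mistral-7B-Instruct-v0.2 model named in the post (my reconstruction of the general approach, not nthakur's actual pipeline; the sample instruction and target language are made up):

```python
# Translate an English SFT instruction with Mistral-7B-Instruct-v0.2.
# In a full pipeline this would loop over every sample and target language.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

sft_sample = "Explain the difference between SFT and DPO in two sentences."
messages = [{
    "role": "user",
    "content": "Translate the following instruction into German, "
               f"preserving its meaning exactly:\n\n{sft_sample}",
}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```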
clefourrier posted an update 10 months ago
The Gemma3 family is out! I was reading the tech report, and one section was really interesting to me from a methods/scientific-fairness point of view.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but potentially in a suboptimal way for a given model family (just as some users interact sub-optimally with models).

It also contains a cool section (6) on training-data memorization rates! It's important to see whether your model will output training data it has seen verbatim: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.
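
For concreteness, a rough sketch of a generic exact-match memorization check (my assumption of the standard setup, not the Gemma report's exact protocol; the model ID is a placeholder):

```python
# Exact-match memorization check: feed the model a prefix of a training
# document and test whether greedy decoding reproduces the true
# continuation verbatim.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; swap in the model under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def is_memorized(doc: str, prefix_len: int = 50, suffix_len: int = 50) -> bool:
    ids = tokenizer(doc, return_tensors="pt").input_ids[0]
    prefix = ids[:prefix_len]
    true_suffix = ids[prefix_len:prefix_len + suffix_len]
    generated = model.generate(prefix.unsqueeze(0),
                               max_new_tokens=len(true_suffix),
                               do_sample=False)[0][prefix_len:]
    return generated.tolist() == true_suffix.tolist()
```

Running this over eval-set documents is exactly the concern above: a hit means the score measures recall of the data, not generalization.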