AI & ML interests

Structure-based drug discovery

Tonic
posted an update 3 days ago
🙋🏻‍♂️ Hey there folks,

Since everyone liked my previous announcement post (https://huggingface.co/posts/Tonic/338509028435394) so much, I'm back with more high-quality procedural datasets in the geospatial domain for SFT training!

Check this one out:
NuTonic/sat-bbox-metadata-sft-v1

The goal is to be able to train vision models on multiple images for one-shot remote sensing analysis.

Hope you like it! 🚀
Tonic
posted an update 7 days ago
🙋🏻‍♂️ Hey there folks,

I'm sharing Hugging Face's largest dataset of annotated satellite images today.

Check it out here: NuTonic/sat-image-boundingbox-sft-full

I hope you like it. The idea is to be able to use this with small vision models 🚀
Parveshiiii
posted an update 17 days ago
🚀 Sonic: A lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending, now with GPU (CUDA) acceleration for 10x speed!

Perfect for quick remixes, batch edits or syncing tracks.

👉 https://github.com/Parveshiiii/Sonic

#Python #AudioProcessing #OpenSource #PyTorch
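
For readers new to tempo matching, here is a rough sketch of the arithmetic behind it. This is not Sonic's API: the function names `estimate_bpm`, `stretch_rate`, and `stretched_duration` are hypothetical, and the beat timestamps are assumed to come from some separate beat tracker.

```python
from statistics import median


def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps (in seconds).

    Uses the median gap between consecutive beats, which is less
    sensitive to a few mistimed beats than the mean.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat timestamps")
    intervals = [later - earlier
                 for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


def stretch_rate(source_bpm, target_bpm):
    """Playback-rate factor that moves source_bpm to target_bpm.

    A value above 1 speeds the track up; below 1 slows it down.
    """
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def stretched_duration(duration_s, rate):
    """Length of the track (seconds) after time-stretching by `rate`."""
    return duration_s / rate


if __name__ == "__main__":
    beats = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical beat timestamps
    bpm = estimate_bpm(beats)
    rate = stretch_rate(bpm, 128.0)
    print(bpm, rate, stretched_duration(180.0, rate))
```

With beats half a second apart the estimate is 120 BPM, so matching a 128 BPM track means stretching by 128/120, which shortens a 180-second track to 168.75 seconds.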
Parveshiiii
posted an update 24 days ago
Excited to announce my latest open-source release on Hugging Face: Parveshiiii/breast-cancer-detector.

This model has been trained and validated on external datasets to support medical research workflows. It is designed to provide reproducible benchmarks and serve as a foundation for further exploration in healthcare AI.

Key highlights:
- Built for medical research and diagnostic study contexts
- Validated against external datasets for reliability
- Openly available to empower the community in building stronger, more effective solutions

This release is part of my ongoing effort to make impactful AI research accessible through **Modotte**. A detailed blog post explaining the methodology, dataset handling, and validation process will be published soon.

You can explore the model here: Parveshiiii/breast-cancer-detector

#AI #MedicalResearch #DeepLearning #Healthcare #OpenSource #HuggingFace

Parveshiiii
posted an update about 1 month ago
Just did something I've been meaning to try for ages.

In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok, and it hits the same token efficiency as Qwen3.

Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.

Turns out it doesn't have to be.

microtok makes the whole process stupidly simple: literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.

If you've ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else's, this is the entry point you've been waiting for.

I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.

Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → https://huggingface.co/Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
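
For anyone curious what "training a BPE tokenizer" means under the hood, here is a toy pure-Python sketch of the merge-learning loop. It is not microtok's code or API, and it skips everything a real tokenizer needs (byte-level handling, pre-tokenization, special tokens); the names `train_bpe`, `merge_pair`, and `encode` are made up for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word starts as a tuple of single characters, with its count.
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = train_bpe(corpus, 2)
    print(merges, encode("newest", merges))
```

On this tiny corpus the first two merges build up the suffix "est", which is the kind of frequent subword a real vocabulary ends up containing.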
Tonic
posted an update 2 months ago
🤔 Who would win?

- a fully subsidized AI lab
OR
- 3 random students named kurakurai?

Demo: Tonic/fr-on-device

If you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.
Tonic
posted an update 2 months ago
🙋🏻‍♂️ Hello my lovelies,

It is with great pleasure that I present my working one-click deploy: a completely free Hugging Face Spaces deployment with 16GB of RAM.

Repo: Tonic/hugging-claw (use git clone to inspect)
Literally the one-click link: Tonic/hugging-claw

You can also run it locally and see for yourself:

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest


Just a few quite minor details left that I'll take care of, but I wanted to share here first.
Parveshiiii
posted an update 3 months ago
Introducing Seekify: a truly non-rate-limiting search library for Python

Tired of hitting rate limits when building search features? I've built Seekify, a lightweight Python library that lets you perform searches without the usual throttling headaches.

🔹 Key highlights

- Simple API: plug it in and start searching instantly

- No rate-limiting restrictions

- Designed for developers who need reliable search in projects, scripts, or apps

📦 Available now on PyPI:

pip install seekify

👉 Check out the repo: https://github.com/Parveshiiii/Seekify
I'd love feedback, contributions, and ideas for real-world use cases. Let's make search smoother together!
Parveshiiii
posted an update 3 months ago
🚀 Wanna train your own AI Model or Tokenizer from scratch?

Building models isn't just for big labs anymore: with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it's NLP, domain-specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.

✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain-specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures

⚡ The best part?
- Tokenizer training (TikToken / BPE) can be done in **just 3 lines of code**.
- Model training runs smoothly on **Google Colab notebooks**, no expensive hardware required.

📂 Try out my work:
- 🔗 https://github.com/OE-Void/Tokenizer-from_scratch
- 🔗 https://github.com/OE-Void/GPT
Parveshiiii
posted an update 3 months ago
📢 The Announcement
Subject: XenArcAI is now Modotte – A New Chapter Begins! 🚀

Hello everyone,

We are thrilled to announce that XenArcAI is officially rebranding to Modotte!

Since our journey began, we've been committed to pushing the boundaries of AI through open-source innovation, research, and high-quality datasets. As we continue to evolve, we wanted a name that better represents our vision for a modern, interconnected future in the tech space.

What is changing?

The Name: Moving forward, all our projects, models, and community interactions will happen under the Modotte banner.

The Look: You'll see our new logo and a fresh color palette appearing across our platforms.

What is staying the same?

The Core Team: It's still the same people behind the scenes, including our founder, Parvesh Rawal.

Our Mission: We remain dedicated to releasing state-of-the-art open-source models and datasets.

Our Continuity: All existing models, datasets, and projects will remain exactly as they are, just with a new home.

This isn't just a change in appearance; it's a commitment to our next chapter of growth and discovery. We are so grateful for your ongoing support as we step into this new era.

Welcome to the future. Welcome to Modotte.

Best regards, The Modotte Team
Parveshiiii
posted an update 4 months ago
Hey everyone!
We're excited to introduce our new Telegram group: https://t.me/XenArcAI

This space is built for **model builders, tech enthusiasts, and developers** who want to learn, share, and grow together. Whether you're just starting out or already deep into AI/ML, you'll find a supportive community ready to help with knowledge, ideas, and collaboration.

💡 Join us to:
- Connect with fellow developers and AI enthusiasts
- Share your projects, insights, and questions
- Learn from others and contribute to a growing knowledge base

👉 If you're interested, hop in and be part of the conversation: https://t.me/XenArcAI
Parveshiiii
posted an update 5 months ago
Another banger from XenArcAI! 🔥

We're thrilled to unveil three powerful new releases that push the boundaries of AI research and development:

🔗 https://huggingface.co/XenArcAI/SparkEmbedding-300m

- A lightning-fast embedding model built for scale.
- Optimized for semantic search, clustering, and representation learning.

🔗 https://huggingface.co/datasets/XenArcAI/CodeX-7M-Non-Thinking

- A massive dataset of 7 million code samples.
- Designed for training models on raw coding patterns without reasoning layers.

🔗 https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking

- A curated dataset of 2 million code samples.
- Focused on reasoning-driven coding tasks, enabling smarter AI coding assistants.

Together, these projects represent a leap forward in building smarter, faster, and more capable AI systems.

💡 Innovation meets dedication.
🌍 Knowledge meets responsibility.


Parveshiiii
posted an update 6 months ago
SparkEmbedding - SoTA cross-lingual retrieval

I am very happy to announce our latest embedding model, SparkEmbedding-300m, based on embeddinggemma-300m. We fine-tuned it on 1M extra examples spanning over 119 languages, and the result is a model that achieves exceptional cross-lingual retrieval.

Model: https://huggingface.co/XenArcAI/SparkEmbedding-300m
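
As a rough illustration of how embedding-based retrieval works once you have vectors, the sketch below ranks documents by cosine similarity to a query. The three-dimensional vectors and document IDs are toy placeholders invented for the example, not outputs of SparkEmbedding-300m, and `cosine` and `rank` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors: in a cross-lingual model, an English and a Hindi sentence
    # about the same topic would ideally land close together.
    docs = {
        "en_cat": [0.9, 0.1, 0.0],
        "hi_cat": [0.8, 0.2, 0.1],
        "en_stock": [0.0, 0.1, 0.9],
    }
    for doc_id, score in rank([1.0, 0.0, 0.0], docs):
        print(doc_id, round(score, 3))
```

In practice the vectors would come from the embedding model and the ranking would run over an index rather than a dictionary, but the scoring idea is the same.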
Parveshiiii
posted an update 7 months ago
AIRealNet - SoTA - Image detection model

We're proud to release AIRealNet, a binary image classifier built to detect whether an image is AI-generated or a real human photograph. Based on SwinV2 and fine-tuned on the AI-vs-Real dataset, this model is optimized for high-accuracy classification across diverse visual domains.

If you care about synthetic media detection or want to explore the frontier of AI vs human realism, we'd love your support. Please like the model and try it out. Every download helps us improve and expand future versions.

Model page: https://huggingface.co/XenArcAI/AIRealNet
Parveshiiii
posted an update 7 months ago
Ever wanted an open-source deep research agent? Meet Deepresearch-Agent 🔍🤖

1. Multi-step reasoning: Reflects between steps, fills gaps, iterates until evidence is solid.

2. Research-augmented: Generates queries, searches, synthesizes, and cites sources.

3. Fullstack + LLM-friendly: React/Tailwind frontend, LangGraph/FastAPI backend; works with OpenAI/Gemini.


🔗 GitHub: https://github.com/Parveshiiii/Deepresearch-Agent
Parveshiiii
posted an update 7 months ago
🚀 Big news from XenArcAI!

We've just released our new dataset: **Bhagwat-Gita-Infinity** 🌸📖

✨ What's inside:
- Verse-aligned Sanskrit, Hindi, and English
- Clean, structured, and ready for ML/AI projects
- Perfect for research, education, and open-source exploration

🔗 Hugging Face: https://huggingface.co/datasets/XenArcAI/Bhagwat-Gita-Infinity

Let's bring timeless wisdom into modern AI together 🙌
Parveshiiii
posted an update 7 months ago
🚀 New Release from XenArcAI
We're excited to introduce AIRealNet, our SwinV2-based image classifier built to distinguish between artificial and real images.

✨ Highlights:
- Backbone: SwinV2
- Input size: 256×256
- Labels: artificial vs. real
- Performance: Accuracy 0.999 | F1 0.999 | Val Loss 0.0063

This model is now live on Hugging Face:
👉 https://huggingface.co/XenArcAI/AIRealNet

We built AIRealNet to push forward open-source tools for authenticity detection, and we can't wait to see how the community uses it.
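
If you want to check figures like the accuracy and F1 above on your own labelled images, the sketch below shows how those two metrics are computed for a two-class problem. It is a generic illustration, not the evaluation script used for AIRealNet; `binary_metrics` is a hypothetical helper, and treating "artificial" as the positive class is an assumption.

```python
def binary_metrics(y_true, y_pred, positive="artificial"):
    """Compute accuracy, precision, recall and F1 for a two-class problem.

    `positive` names the class counted as a hit (an assumption here;
    swap it to "real" to score the other class).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical ground truth and predictions for eight images.
    truth = ["artificial", "artificial", "artificial", "real",
             "real", "real", "real", "artificial"]
    preds = ["artificial", "artificial", "real", "real",
             "real", "real", "artificial", "artificial"]
    print(binary_metrics(truth, preds))
```

Accuracy alone can look strong on an imbalanced test set, which is why reporting F1 alongside it, as the post does, is useful.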
Tonic
posted an update 7 months ago
Tonic
posted an update 8 months ago
COMPUTER CONTROL IS ON-DEVICE!

🏡🤖 78% of EU smart-home owners DON'T trust cloud voice assistants.

So we killed the cloud.

Meet Exté: a palm-sized Android device that sees, hears & speaks your language: 100% offline, 0% data sent anywhere.

🔓 We submitted our technologies for consideration to the Liquid AI hackathon.

📊 Dataset: 79k UI-action pairs on Hugging Face (largest Android-control corpus ever) Tonic/android-operator-episodes

⚡ Model: 98% task accuracy, 678MB compressed, fits on existing Android devices! Tonic/l-android-control

🛤️ Experiment Tracker: check out the training on our TrackioApp Tonic/l-android-control

🎮 Live Model Demo: Upload an Android screenshot and instructions to see the model in action! Tonic/l-operator-demo

Built in a garage, funded by pre-orders, no VC. Now we're scaling to 1k installer units.

We're giving 50 limited-edition prototypes to investors, installers & researchers who want to co-design the sovereign smart home.

👇 Drop “EUSKERA” in the comments if you want an invite, tag a friend who still thinks Alexa is “convenient,” and smash ♥️ if AI should belong to people, not servers.