Hugging Face Updates

Stay updated with the latest Hugging Face releases, security patches, and feature updates.

Latest Hugging Face Updates

Hugging Face Jul 28, 2026

The OlmoEarth Platform: Geospatial inference at planetary scale

Allen Institute for AI details the infrastructure behind OlmoEarth Platform, which runs geospatial foundation model inference at continent scale by splitting work across CPU and GPU stages with heavy parallelism.

Hugging Face Jul 27, 2026

NVIDIA Cosmos-H-Dreams: Bringing Real-Time Generative Simulation to Surgical Robotics

NVIDIA releases Cosmos-H-Dreams, a distilled world model that runs action-conditioned surgical simulation at ~160 fps on a single GPU, along with the FlashDreams inference engine and a recipe for adapting the model to custom embodiments.

Hugging Face Jul 23, 2026

Bringing Nunchaku 4-bit Diffusion Inference to Diffusers

Hugging Face integrates Nunchaku Lite into Diffusers, enabling native loading of SVDQuant W4A4 checkpoints via from_pretrained() with no local CUDA compilation. This cuts diffusion model VRAM usage roughly in half while also delivering inference speedups.

Hugging Face Jul 20, 2026

Introducing Cosmos 3 Edge

NVIDIA released Cosmos 3 Edge, a 4B-parameter open world model for robotics and vision AI on edge devices, now available on Hugging Face. The model combines autoregressive and diffusion transformer towers with a shared representation to handle scene understanding, prediction, and action generation in one package.

Hugging Face Jul 17, 2026

Fine-tune video and image models at scale with NVIDIA NeMo Automodel and 🤗 Diffusers

NVIDIA and Hugging Face announce a NeMo Automodel integration that lets you fine-tune Diffusers-format diffusion models — including FLUX, Wan 2.1/2.2, HunyuanVideo, and Qwen-Image — at scale without checkpoint conversion or model rewrites. Training parallelism is driven by YAML config, not code changes.

Hugging Face Jul 16, 2026

NVIDIA Nemotron 3 Embed Ranks #1 Overall on RTEB, Advancing Agentic Retrieval

NVIDIA released Nemotron 3 Embed, a family of three open embedding models (8B, 1B BF16, 1B NVFP4) for retrieval-augmented generation and agentic workflows. The 8B variant currently holds the #1 position on the RTEB leaderboard.

Hugging Face Jul 15, 2026

What building Shippy taught us about building agents

Allen AI's Skylight team shares the architecture behind Shippy, a maritime AI agent, breaking down their approach to agent reliability through layered deterministic tooling, skill/soul/config separation, and per-user sandboxed Kubernetes sessions.

Hugging Face Jul 6, 2026

PRX Part 4: Our Data Strategy

Photoroom details the data pipeline behind PRX, their 7B image generation model, covering dataset assembly, captioning strategy, storage format choices (Lance + MDS), and measured tradeoffs like JPEG vs PNG training data.

Hugging Face Jul 6, 2026

🤗 Kernels: Major Updates

Hugging Face has redesigned its Kernels project with a new Hub repository type, a trust-based security model with code signing, cleaner CLI separation, and expanded framework support including Torch Stable ABI and Apache TVM FFI.

Hugging Face Jul 1, 2026

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI

Hugging Face and Cerebras demo a cascaded speech-to-speech pipeline that uses Gemma 4 31B for the LLM layer, NVIDIA Parakeet for ASR, and Qwen3TTS for synthesis, targeting low-latency voice AI interactions.

Hugging Face Jun 29, 2026

DiScoFormer: One transformer for density and score, across distributions

Allen AI introduces DiScoFormer, a single transformer that estimates both the density and the score of a distribution from a sample in one forward pass, without per-distribution retraining. It substantially outperforms kernel density estimation in high dimensions.
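
For readers unfamiliar with the two quantities, the sketch below shows the kernel density estimation baseline mentioned above in its simplest form: a 1-D Gaussian KDE that returns both the density p(x) and the score d/dx log p(x) from a sample. This is not DiScoFormer's method or code; the function name, the fixed bandwidth, and the 1-D setting are assumptions chosen only for illustration.

```python
import math


def gaussian_kde_density_and_score(x, samples, bandwidth):
    """Return (density, score) at x for a 1-D Gaussian KDE.

    density is p(x); score is d/dx log p(x) = p'(x) / p(x).
    Hypothetical helper for illustration only. Far from every sample the
    density can underflow to 0.0, in which case the division below fails.
    """
    n = len(samples)
    density = 0.0
    derivative = 0.0
    for xi in samples:
        u = (x - xi) / bandwidth
        kernel = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        density += kernel
        # Derivative of the Gaussian kernel with respect to x.
        derivative += -u / bandwidth * kernel
    norm = n * bandwidth
    density /= norm
    derivative /= norm
    return density, derivative / density


samples = [-1.0, 0.0, 1.0]
density_at_0, score_at_0 = gaussian_kde_density_and_score(0.0, samples, 0.5)
density_at_2, score_at_2 = gaussian_kde_density_and_score(2.0, samples, 0.5)
```

With a symmetric sample the score at the center is zero, and to the right of all samples it is negative because the estimated density is falling. A KDE like this must be rebuilt and its bandwidth retuned for each new sample, which is the per-distribution cost the summary contrasts with a single forward pass.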

Hugging Face Jun 22, 2026

PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

PaddleOCR's PP-OCRv6 ships three model tiers (1.5M–34.5M params) with 50-language support and multiple inference backends including Transformers, ONNX Runtime, and Paddle Inference, all hosted on Hugging Face Hub.

Hugging Face Jun 17, 2026

MolmoMotion: Language-guided 3D motion forecasting

Allen AI releases MolmoMotion, a model that predicts 3D point trajectories from a single video frame, query points, and natural language instructions, along with a 1.16M-video dataset and a human-validated benchmark.

Hugging Face Jun 12, 2026

olmo-eval: An evaluation workbench for the model development loop

Allen AI released olmo-eval, an open-source evaluation workbench that extends OLMES to cover the iterative model development loop — not just final-model benchmarking. It emphasizes modularity, pairwise checkpoint comparison, and flexible sandboxing.

Hugging Face Jun 9, 2026

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

ServiceNow AI published a benchmark and dataset evaluating seven ASR models on code-switched (bilingual) speech across four language pairs, released through their AU-Harness evaluation tool. ElevenLabs Scribe V2, Gemini 3 Flash, and AssemblyAI Universal 3-Pro came out on top, while Whisper Large V3 Turbo performed poorly because it defaulted to translation mode on mixed-language audio.

Hugging Face Jun 7, 2026

Amazing Digital Dentures (a failed project)

A Hugging Face hackathon participant documents their failed attempt to build an LLM-powered game generator using Nemotron 30B, detailing the prompt engineering and RAG strategies that didn't work and the scaled-back HTML toy maker that did.

Hugging Face Jun 6, 2026

Five labs, five minds: building a multi-model finance drama on small models

A Hugging Face hackathon project runs four different labs' small models as separate agents in an emergent economy simulation, surfacing practical lessons about serving heterogeneous models, information isolation, and bounded memory in multi-agent setups.

Hugging Face Jun 5, 2026

Thousand Token Wood: shipping a multi-agent economy on a 3B model

A Build Small Hackathon project runs five autonomous trading agents on Qwen2.5-3B via vLLM, demonstrating that a small model can reliably produce structured output for a multi-agent simulation but needs heavy prompt engineering to compensate for weak reasoning.

Hugging Face Jun 4, 2026

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

NVIDIA releases Nemotron 3.5 Content Safety, a 4B-parameter model built on Gemma 3 that adds custom policy enforcement, auditable reasoning traces, and a public safety dataset to its existing multimodal and multilingual classification capabilities.

Hugging Face Jun 2, 2026

Holo3.1: Fast & Local Computer Use Agents

H Company releases the Holo3.1 family of computer-use models in four sizes (0.8B to 35B-A3B) with quantized checkpoints for local inference, expanded mobile support, and native function-calling protocols.

Hugging Face Jun 1, 2026

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains released Mellum2, a 12B-parameter MoE model that activates 2.5B parameters per token, aimed at latency-sensitive code and text tasks. It's Apache 2.0 licensed and available on Hugging Face.
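
As a rough illustration of what the sparse activation means, the arithmetic below computes the share of parameters used per token from the two figures quoted above. It is only a back-of-the-envelope ratio: the variable names are made up for this sketch, and it assumes the 12B and 2.5B figures are counted the same way.

```python
# Figures taken from the summary above; the variable names are illustrative.
total_params = 12e9      # total parameters in the MoE model
active_params = 2.5e9    # parameters activated per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%}")  # prints 20.8%
```

Per-token compute scales with the active parameters, which is why a model of this kind is pitched at latency-sensitive tasks, while memory still has to hold all of the weights.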

Hugging Face May 27, 2026

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM launch ITBench-AA, a benchmark testing frontier AI models on agentic SRE tasks like Kubernetes incident diagnosis. No model breaks 50%, making it one of the least saturated agentic benchmarks available.