A quantitative framework models the learning dynamics of continual pre-training (CPT) for large language models, deriving scaling laws that predict how performance evolves under distribution shift and learning-rate effects, and enabling optimization of training parameters across general and domain-specific tasks.
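As a minimal illustration of what such a scaling-law fit looks like in practice, the sketch below fits a generic power-law loss curve with an irreducible offset to hypothetical CPT checkpoints. The functional form, data points, and coefficients are assumptions for illustration, not the paper's actual law.

```python
import numpy as np
from scipy.optimize import curve_fit

# Generic CPT-style loss model (illustrative assumption, not the paper's law):
# validation loss decays as a power law in continually-trained tokens D,
# plus an irreducible term E that absorbs the remaining distribution gap.
def cpt_loss(D, A, alpha, E):
    return A * D ** (-alpha) + E

# Hypothetical (token count, validation loss) measurements from CPT checkpoints.
tokens = np.array([1e9, 5e9, 1e10, 5e10, 1e11])
losses = np.array([2.95, 2.61, 2.48, 2.29, 2.21])

(A, alpha, E), _ = curve_fit(cpt_loss, tokens, losses, p0=[10.0, 0.1, 2.0])
print(f"fit: A={A:.2f}, alpha={alpha:.3f}, E={E:.2f}")
print(f"predicted loss at 5e11 tokens: {cpt_loss(5e11, A, alpha, E):.2f}")
```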
DeepSeek-AI researchers present insights from developing DeepSeek-V3, documenting specific hardware constraints and architectural solutions that enable efficient large language model training through innovations in mixed-precision computation, network topology optimization, and memory management while achieving competitive performance with significantly reduced hardware requirements.
A unified vision-language-action framework called UniVLA enables robots to learn task-centric latent actions from unlabeled videos, achieving state-of-the-art performance across multiple manipulation and navigation benchmarks while reducing pre-training compute requirements by 95% compared to previous methods.
Walmart researchers developed a knowledge distillation framework that transfers semantic understanding capabilities from large language models to a smaller, production-ready model for e-commerce search ranking, achieving comparable performance while meeting strict latency requirements and demonstrating improved user engagement metrics in A/B testing on Walmart.com.
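For context, a minimal sketch of the kind of response-based distillation loss such a framework might use: temperature-scaled soft targets from the teacher blended with the usual hard-label loss. The architecture and the specific weighting used in production are not given here, so the hyperparameters below are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term against the teacher with the hard-label loss.

    temperature and alpha are illustrative hyperparameters, not the values
    used in the paper's production system.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```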
BLIP3-o introduces a family of open-source unified multimodal models that combine image understanding and generation through systematic architecture and training optimization, achieving state-of-the-art performance on multiple benchmarks while providing valuable insights into design choices such as CLIP features versus VAE representations and sequential versus joint training.
A comprehensive tutorial introduces deep reinforcement learning through the lens of Generalized Policy Iteration (GPI), focusing on the Proximal Policy Optimization (PPO) algorithm while providing practical implementation techniques and intuitive explanations aimed at bridging the gap between theory and application.
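For reference, the clipped surrogate objective at the core of PPO, which the tutorial builds up to from GPI, shown here as a short standard PyTorch-style sketch (not code from the tutorial itself):

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from PPO (objective is maximized, so return the negative)."""
    ratio = torch.exp(log_probs_new - log_probs_old)          # pi_new / pi_old per action
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))
```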
An advanced vision-language model from ByteDance Seed achieves state-of-the-art performance on 38 out of 60 public benchmarks through a three-stage pre-training pipeline and novel data synthesis approaches, demonstrating particularly strong capabilities in GUI control, document understanding, and video comprehension while maintaining a relatively compact architecture.
A comprehensive survey examines zero-shot quantization (ZSQ) methods for deep learning model compression, analyzing techniques across synthesis-free, generator-based, and noise-optimization approaches while providing performance comparisons on ResNet-18/ImageNet benchmarks and identifying key challenges in data-free model deployment.
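As a point of reference for what "data-free" means in practice, below is the simplest baseline the surveyed methods improve on: symmetric per-channel int8 weight quantization, which needs no calibration data at all. This is a generic illustration, not a method from the survey.

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a weight matrix.

    Uses only the weights themselves, no input data, which is the defining
    constraint of zero-shot / data-free quantization.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per output channel
    scale = np.where(scale == 0, 1.0, scale)                # guard against all-zero rows
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_weights_int8(w)
print(f"mean abs quantization error: {np.abs(w - q.astype(np.float32) * scale).mean():.4f}")
```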
The Qwen Team introduces Qwen3, a series of open-source large language models featuring dynamic thinking/non-thinking modes and mixture-of-experts architectures, expanding multilingual coverage to 119 languages while achieving competitive performance with the flagship Qwen3-235B-A22B model, which activates only 22B parameters per token.
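The "22B activated parameters" figure follows from mixture-of-experts routing, where each token is dispatched to only a few experts. A toy top-k routing sketch is below; the expert count, sizes, and top-k value are illustrative assumptions, not Qwen3's actual configuration.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, router, experts, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    Only top_k of len(experts) expert MLPs run per token, so the parameters
    touched per token are a small fraction of the total expert parameters.
    """
    logits = router(x)                                        # [tokens, num_experts]
    weights, idx = torch.topk(F.softmax(logits, dim=-1), top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)     # renormalize over chosen experts
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e                          # tokens whose slot-th choice is expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

d, num_experts = 64, 8
router = torch.nn.Linear(d, num_experts)
experts = [torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                               torch.nn.Linear(4 * d, d)) for _ in range(num_experts)]
y = moe_forward(torch.randn(16, d), router, experts)
```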