DeepSeek-AI researchers present insights from developing DeepSeek-V3, documenting specific hardware constraints and architectural solutions that enable efficient large language model training through innovations in mixed-precision computation, network topology optimization, and memory management while achieving competitive performance with significantly reduced hardware requirements.
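As a point of reference for the mixed-precision theme, the sketch below shows a generic bf16 autocast training step in PyTorch with fp32 master weights; it illustrates mixed-precision training in general, not DeepSeek-V3's FP8 pipeline, and the model and optimizer here are placeholders.

```python
# Minimal mixed-precision training step (PyTorch autocast, bf16 compute with
# fp32 master weights). Illustrative only: DeepSeek-V3's actual recipe uses a
# custom FP8 pipeline with fine-grained scaling, which is not shown here.
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()          # weights stay in fp32
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # Matmuls run in bf16 inside the autocast region; the loss and gradient
    # accumulation stay in fp32 for numerical stability.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```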
A quantitative framework models the learning dynamics of continual pre-training (CPT) in large language models, deriving scaling laws that predict performance evolution while accounting for distribution shift and learning-rate effects, enabling principled selection of training parameters for both general and domain-specific tasks.
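For orientation, frameworks of this kind typically build on the standard power-law relation between loss and training data; the CPT-specific terms for distribution shift and learning-rate schedule are the paper's contribution and are not reproduced in this generic form.

```latex
% Standard data-scaling form such analyses build on (illustrative only; the
% paper's CPT-specific law with distribution-shift and LR terms is not shown).
% L(D): loss after training on D tokens, E: irreducible loss, A, \alpha: fit constants.
L(D) \;=\; E + \frac{A}{D^{\alpha}}
```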
BLIP3-o introduces a family of open-source unified multimodal models that combine image understanding and generation capabilities through systematic architecture and training optimization, achieving state-of-the-art performance on multiple benchmarks while providing insights into design choices such as CLIP features versus VAE representations and sequential versus joint training.
Researchers from UCAS and ISCAS introduce an information-theoretic reinforcement fine-tuning framework that optimizes LLM reasoning efficiency by using parameter information gain as dense rewards, achieving 10% higher accuracy while doubling token efficiency compared to standard outcome-reward approaches.
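The sketch below illustrates the general idea of dense reward shaping for reasoning RL: a sparse outcome reward credited at the final token plus a per-token bonus. The `token_bonuses` input is a stand-in for the paper's parameter-information-gain estimator, which is not reproduced here.

```python
# Sketch of dense reward shaping for reasoning RL: a sparse outcome reward for
# the final answer plus a per-token dense bonus. The bonus values are a
# placeholder for an information-gain estimator, not the paper's method.
from typing import List

def shaped_rewards(
    token_bonuses: List[float],   # hypothetical per-token information-gain proxy
    answer_correct: bool,
    outcome_weight: float = 1.0,
    dense_weight: float = 0.1,
) -> List[float]:
    """Return one reward per generated token for a policy-gradient update."""
    rewards = [dense_weight * b for b in token_bonuses]
    # The sparse outcome reward lands on the final token, as in standard
    # outcome-reward RL fine-tuning.
    rewards[-1] += outcome_weight * (1.0 if answer_correct else 0.0)
    return rewards

# Example: a 5-token completion whose final answer is correct.
print(shaped_rewards([0.2, 0.1, 0.4, 0.0, 0.3], answer_correct=True))
```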
The Qwen Team introduces Qwen3, a series of open-source large language models featuring dynamic thinking/non-thinking modes and mixture-of-experts architectures, expanding multilingual coverage to 119 languages, with the flagship Qwen3-235B-A22B achieving competitive performance while activating only 22B of its 235B parameters per token.
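The "activated parameters per token" figure comes from sparse expert routing: each token is processed by only a few experts, so only a fraction of the total parameters run for any given token. The minimal top-k gating layer below illustrates the mechanism and is not Qwen3's implementation.

```python
# Minimal top-k mixture-of-experts layer: each token is routed to k of E
# experts, so only ~k/E of the expert parameters are "activated" for that
# token. Illustrative sketch, not Qwen3's actual architecture.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 256, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model]
        gate_logits = self.router(x)
        weights, idx = gate_logits.topk(self.k, dim=-1)   # pick k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 256)
print(TopKMoE()(tokens).shape)   # torch.Size([16, 256])
```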
Walmart researchers developed a knowledge distillation framework that transfers semantic understanding capabilities from large language models to a smaller, production-ready model for e-commerce search ranking, achieving comparable performance while meeting strict latency requirements and demonstrating improved user engagement metrics in A/B testing on Walmart.com.
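A typical way to realize such a transfer is temperature-scaled distillation, where the compact student matches the teacher's softened outputs alongside the hard labels; the sketch below is the generic Hinton-style loss, not Walmart's ranking-specific objective.

```python
# Generic temperature-scaled distillation loss: the student matches the
# teacher's softened distribution plus the hard labels. A standard sketch,
# not the production ranking objective described in the paper.
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,   # [batch, classes]
    teacher_logits: torch.Tensor,   # [batch, classes], teacher scores computed offline
    labels: torch.Tensor,           # [batch]
    temperature: float = 2.0,
    alpha: float = 0.5,             # weight on the soft (teacher) term
) -> torch.Tensor:
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2            # rescale gradients, as in the standard recipe
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

print(distillation_loss(torch.randn(4, 3), torch.randn(4, 3), torch.tensor([0, 1, 2, 0])))
```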
A unified vision-language-action framework called UniVLA enables robots to learn task-centric latent actions from unlabeled videos, achieving state-of-the-art performance across multiple manipulation and navigation benchmarks while reducing pre-training compute requirements by 95% compared to previous methods.
An advanced vision-language model from ByteDance Seed achieves state-of-the-art performance on 38 out of 60 public benchmarks through a three-stage pre-training pipeline and novel data synthesis approaches, demonstrating particularly strong capabilities in GUI control, document understanding, and video comprehension while maintaining a relatively compact architecture.
A comprehensive evaluation of 15 large language models reveals systematic performance degradation (39% average drop) in multi-turn conversations compared to single-turn interactions, with models struggling to maintain context and adapt to new information across conversation turns regardless of temperature settings or conversation granularity.
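Conceptually, the comparison pits a fully specified single-turn prompt against the same task revealed piecewise across turns; the sketch below outlines such a harness, with `ask_model` standing in for whichever chat API is being evaluated.

```python
# Sketch of the single-turn vs multi-turn comparison: the same task is given
# either as one fully specified prompt or sharded across several turns.
# `ask_model` is a placeholder for the chat API under evaluation.
from typing import Callable, List

def run_single_turn(task_shards: List[str], ask_model: Callable[[List[dict]], str]) -> str:
    # All information delivered in a single user message.
    return ask_model([{"role": "user", "content": " ".join(task_shards)}])

def run_multi_turn(task_shards: List[str], ask_model: Callable[[List[dict]], str]) -> str:
    messages: List[dict] = []
    reply = ""
    for shard in task_shards:          # reveal one piece of the task per turn
        messages.append({"role": "user", "content": shard})
        reply = ask_model(messages)
        messages.append({"role": "assistant", "content": reply})
    return reply                        # only the final answer is scored

def relative_drop(correct_single: int, correct_multi: int, n: int) -> float:
    """Relative accuracy drop of the multi-turn condition vs single-turn."""
    return 1.0 - (correct_multi / n) / max(correct_single / n, 1e-9)
```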