A quantitative framework models the learning dynamics of continual pre-training (CPT) for large language models, deriving scaling laws that predict how performance evolves under distribution shift and learning-rate effects, and enabling optimization of training parameters across general and domain-specific tasks.
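As a minimal illustration of what such a scaling-law fit looks like in practice, the sketch below fits a generic power-law loss curve with an irreducible offset to hypothetical CPT checkpoints. The functional form, data points, and coefficients are assumptions for illustration, not the paper's actual law.

```python
import numpy as np
from scipy.optimize import curve_fit

# Generic CPT-style loss model (illustrative assumption, not the paper's law):
# validation loss decays as a power law in continually-trained tokens D,
# plus an irreducible term E that absorbs the remaining distribution gap.
def cpt_loss(D, A, alpha, E):
    return A * D ** (-alpha) + E

# Hypothetical (token count, validation loss) measurements from CPT checkpoints.
tokens = np.array([1e9, 5e9, 1e10, 5e10, 1e11])
losses = np.array([2.95, 2.61, 2.48, 2.29, 2.21])

(A, alpha, E), _ = curve_fit(cpt_loss, tokens, losses, p0=[10.0, 0.1, 2.0])
print(f"fit: A={A:.2f}, alpha={alpha:.3f}, E={E:.2f}")
print(f"predicted loss at 5e11 tokens: {cpt_loss(5e11, A, alpha, E):.2f}")
```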
DeepSeek-AI researchers present insights from developing DeepSeek-V3, documenting specific hardware constraints and architectural solutions that enable efficient large language model training through innovations in mixed-precision computation, network topology optimization, and memory management while achieving competitive performance with significantly reduced hardware requirements.
A unified vision-language-action framework called UniVLA enables robots to learn task-centric latent actions from unlabeled videos, achieving state-of-the-art performance across multiple manipulation and navigation benchmarks while reducing pre-training compute requirements by 95% compared to previous methods.
Walmart researchers developed a knowledge distillation framework that transfers semantic understanding capabilities from large language models to a smaller, production-ready model for e-commerce search ranking, achieving comparable performance while meeting strict latency requirements and demonstrating improved user engagement metrics in A/B testing on Walmart.com.
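For context, a minimal sketch of the kind of response-based distillation loss such a framework might use: temperature-scaled soft targets from the teacher blended with the usual hard-label loss. The architecture and the specific weighting used in production are not given here, so the hyperparameters below are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term against the teacher with the hard-label loss.

    temperature and alpha are illustrative hyperparameters, not the values
    used in the paper's production system.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```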
BLIP3-o introduces a family of open-source unified multimodal models that combine image understanding and generation through systematic architecture and training optimization, achieving state-of-the-art performance on multiple benchmarks while providing valuable insights into design choices such as CLIP features versus VAE representations and sequential versus joint training.
A comprehensive tutorial introduces deep reinforcement learning through the lens of Generalized Policy Iteration (GPI), focusing on the Proximal Policy Optimization (PPO) algorithm while providing practical implementation techniques and intuitive explanations aimed at bridging the gap between theory and application.
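For reference, the clipped surrogate objective at the core of PPO, which the tutorial builds up to from GPI, shown here as a short standard PyTorch-style sketch (not code from the tutorial itself):

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from PPO (objective is maximized, so return the negative)."""
    ratio = torch.exp(log_probs_new - log_probs_old)          # pi_new / pi_old per action
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))
```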
An advanced vision-language model from ByteDance Seed achieves state-of-the-art performance on 38 out of 60 public benchmarks through a three-stage pre-training pipeline and novel data synthesis approaches, demonstrating particularly strong capabilities in GUI control, document understanding, and video comprehension while maintaining a relatively compact architecture.
A comprehensive survey examines zero-shot quantization (ZSQ) methods for deep learning model compression, analyzing techniques across synthesis-free, generator-based, and noise-optimization approaches while providing performance comparisons on ResNet-18/ImageNet benchmarks and identifying key challenges in data-free model deployment.
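As a point of reference for what "data-free" means in practice, below is the simplest baseline the surveyed methods improve on: symmetric per-channel int8 weight quantization, which needs no calibration data at all. This is a generic illustration, not a method from the survey.

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a weight matrix.

    Uses only the weights themselves, no input data, which is the defining
    constraint of zero-shot / data-free quantization.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per output channel
    scale = np.where(scale == 0, 1.0, scale)                # guard against all-zero rows
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_weights_int8(w)
print(f"mean abs quantization error: {np.abs(w - q.astype(np.float32) * scale).mean():.4f}")
```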
The Qwen Team introduces Qwen3, a series of open-source large language models featuring dynamic thinking/non-thinking modes and mixture-of-experts architectures, expanding multilingual coverage to 119 languages while achieving competitive performance with the flagship Qwen3-235B-A22B model, which activates only 22B parameters per token.
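The "22B activated parameters" figure follows from mixture-of-experts routing, where each token is dispatched to only a few experts. A toy top-k routing sketch is below; the expert count, sizes, and top-k value are illustrative assumptions, not Qwen3's actual configuration.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, router, experts, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    Only top_k of len(experts) expert MLPs run per token, so the parameters
    touched per token are a small fraction of the total expert parameters.
    """
    logits = router(x)                                        # [tokens, num_experts]
    weights, idx = torch.topk(F.softmax(logits, dim=-1), top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)     # renormalize over chosen experts
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e                          # tokens whose slot-th choice is expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

d, num_experts = 64, 8
router = torch.nn.Linear(d, num_experts)
experts = [torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                               torch.nn.Linear(4 * d, d)) for _ in range(num_experts)]
y = moe_forward(torch.randn(16, d), router, experts)
```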