Discover, Discuss, and Read arXiv papers

Discover new, recommended papers

12 May 2025
continual-learning transformers transfer-learning

A quantitative framework reveals and models the learning dynamics during continual pre-training (CPT) of large language models, deriving scaling laws that predict performance evolution while accounting for distribution shift and learning rate effects, enabling optimization of training parameters across general and domain-specific tasks.
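
The summary mentions derived scaling laws without giving their form. As a rough, illustrative placeholder only (not the paper's fitted law), continual pre-training loss is often modeled with a Chinchilla-style power law in the number of CPT tokens, plus a term for the distribution shift away from the original pre-training data; the constants and the shape of the shift term below are assumptions.

```latex
% Illustrative placeholder, not the paper's fitted law: loss after
% continually pre-training on D domain tokens, modeled as a power law
% plus a distribution-shift penalty that shrinks as adaptation proceeds.
L_{\text{CPT}}(D) = E + \frac{A}{D^{\alpha}} + \underbrace{B\, e^{-\beta D}}_{\text{distribution-shift term}}
```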

14 May 2025
transformers hardware-aware-algorithms model-compression

DeepSeek-AI researchers present insights from developing DeepSeek-V3, documenting specific hardware constraints and architectural solutions that enable efficient large language model training through innovations in mixed-precision computation, network topology optimization, and memory management while achieving competitive performance with significantly reduced hardware requirements.
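
The report's mixed-precision computation is a custom FP8 pipeline that is not reproduced here; as a generic point of reference, the sketch below is only a standard PyTorch automatic-mixed-precision training loop (FP16 autocast with loss scaling), not DeepSeek-V3's implementation.

```python
# Generic PyTorch mixed-precision training loop (illustrative only;
# DeepSeek-V3's FP8 pipeline is custom and not reproduced here).
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid FP16 underflow

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), target)

    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips the step on inf/NaN
    scaler.update()                 # adjusts the scale factor for the next step
```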

09 May 2025
imitation-learning robotics-perception multi-modal-learning

A unified vision-language-action framework called UniVLA enables robots to learn task-centric latent actions from unlabeled videos, achieving state-of-the-art performance across multiple manipulation and navigation benchmarks while reducing pre-training compute requirements by 95% compared to previous methods.

11 May 2025
knowledge-distillation transfer-learning transformers

Walmart researchers developed a knowledge distillation framework that transfers semantic understanding capabilities from large language models to a smaller, production-ready model for e-commerce search ranking, achieving comparable performance while meeting strict latency requirements and demonstrating improved user engagement metrics in A/B testing on Walmart.com.
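
The details of Walmart's production framework are not given here; the common core of this kind of distillation is a soft-label objective that blends the task loss with a temperature-scaled KL term toward the teacher's outputs, roughly as in the generic sketch below.

```python
# Standard soft-label knowledge distillation loss (generic sketch, not
# Walmart's production framework): mix the hard-label task loss with a
# temperature-scaled KL term that pulls the student toward the teacher's
# softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Hard-label task loss on the ground truth.
    task_loss = F.cross_entropy(student_logits, labels)

    # Soft-label loss: KL between temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd_loss = kd_loss * temperature ** 2

    return alpha * kd_loss + (1.0 - alpha) * task_loss
```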

14 May 2025
generative-models multi-modal-learning transformers

BLIP3-o introduces a family of open-source unified multimodal models that combine image understanding and generation capabilities through systematic architecture and training optimization, achieving state-of-the-art performance on multiple benchmarks while providing valuable insights on design choices like CLIP features versus VAE representations and sequential versus joint training approaches.

13 May 2025
deep-reinforcement-learning reinforcement-learning imitation-learning

A comprehensive tutorial introduces deep reinforcement learning through the lens of Generalized Policy Iteration (GPI), focusing on the Proximal Policy Optimization (PPO) algorithm while providing practical implementation techniques and intuitive explanations aimed at bridging the gap between theory and application.
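
PPO's central idea, the clipped surrogate objective, fits in a few lines; the sketch below shows that objective only, with advantage estimation, the value function, and the surrounding generalized-policy-iteration loop from the tutorial omitted.

```python
# Clipped PPO surrogate objective (core idea only; advantage estimation
# and the full generalized-policy-iteration training loop are omitted).
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s).
    ratio = torch.exp(log_probs_new - log_probs_old)

    # Unclipped and clipped surrogate terms; take the pessimistic minimum so
    # updates that push the ratio outside [1 - eps, 1 + eps] get no extra credit.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Negative sign: minimizing this loss maximizes the surrogate objective.
    return -torch.min(unclipped, clipped).mean()
```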

11 May 2025
vision-language-models multi-modal-learning transformers

An advanced vision-language model from ByteDance Seed achieves state-of-the-art performance on 38 out of 60 public benchmarks through a three-stage pre-training pipeline and novel data synthesis approaches, demonstrating particularly strong capabilities in GUI control, document understanding, and video comprehension while maintaining a relatively compact architecture.

14 May 2025
model-compression lightweight-models zero-shot-learning

A comprehensive survey examines zero-shot quantization (ZSQ) methods for deep learning model compression, analyzing techniques across synthesis-free, generator-based, and noise-optimization approaches while providing performance comparisons on ResNet-18/ImageNet benchmarks and identifying key challenges in data-free model deployment.
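
The surveyed methods differ mainly in how they obtain calibration data without access to the training set (synthesis-free, generator-based, or noise-optimization); the shared downstream step, uniform quantization of weights using no real data at all, looks roughly like the minimal sketch below.

```python
# Minimal data-free uniform weight quantization (illustrative sketch; the
# surveyed ZSQ methods add synthetic-data generation or noise optimization
# on top of a step like this to calibrate activations as well).
import torch

def quantize_weights_uniform(weight, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1

    # Per-tensor min/max range, computed from the weights alone (no data needed).
    w_min, w_max = weight.min(), weight.max()
    scale = (w_max - w_min) / (qmax - qmin)
    zero_point = qmin - torch.round(w_min / scale)

    # Quantize to integers, then dequantize for simulated-quantization evaluation.
    q = torch.clamp(torch.round(weight / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale
```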

13 May 2025
q-bio.NC

Social perception unfolds as we freely interact with people around us. We investigated the neural basis of real-world face perception using multi-electrode intracranial recordings in humans during spontaneous interactions with friends, family, and others. Computational models reconstructed the faces participants looked at during natural interactions, including facial expressions and motion, from brain activity alone. The results highlighted a critical role for the social vision pathway, a network of areas spanning parietal, temporal, and occipital cortex. This network was more sharply tuned to subtle expressions than to intense expressions, which was confirmed with controlled psychophysical experiments. These findings reveal that the human social vision pathway encodes facial expressions and motion as deviations from a neutral expression prototype during natural social interactions in real life.

13 May 2025
transformers reasoning parameter-efficient-training

The Qwen Team introduces Qwen3, a series of open-source large language models featuring dynamic thinking/non-thinking modes and mixture-of-experts architectures, expanding multilingual capabilities to 119 languages while achieving competitive performance with the Qwen3-235B-A22B model using only 22B activated parameters per token.
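
The gap between 235B total and 22B activated parameters per token comes from sparse mixture-of-experts routing, where only a few experts run per token. The layer below is a generic top-k MoE sketch; Qwen3's expert count, routing, and load-balancing details are assumptions not reproduced here.

```python
# Generic top-k mixture-of-experts layer (illustrative; Qwen3's routing,
# expert count, and load-balancing details are not reproduced here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: [tokens, d_model]
        gate_logits = self.router(x)           # [tokens, num_experts]
        weights, indices = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        # Only top_k experts run per token, so activated parameters << total parameters.
        return out
```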
