video-captioning

Here are 105 public repositories matching this topic...

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

xiadingZ / video-caption.pytorch

PyTorch implementation of video captioning

deep-learning pytorch video-captioning

Updated Aug 19, 2019
Python

scopeInfinity / Video2Description

Video to Text: Natural language description generator for some given video. [Video Captioning]

deep-neural-networks video-processing image-captioning cnn-keras audio-processing lstm-neural-networks video-captioning video-to-text

Updated May 3, 2022
Python

xid32 / NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.

question-answering video-captioning working-memory audio-visual-learning video-text-retrieval multimodal-large-language-models multimodal-foundation-model

Updated Jan 26, 2025
Python

tomchang25 / whisper-auto-transcribe

Automatic transcription tool based on Whisper

text-to-speech deep-learning pytorch speech-recognition speech-to-text language-model gradio speech-processing asr video-captioning voice-activity-detection gradio-interface

Updated Apr 27, 2023
Python

antoyang / VidChapters

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

video-understanding weakly-supervised-learning video-captioning multimodal-learning vision-and-language dense-video-captioning pre-training temporal-language-grounding video-chapter-generation vid2seq

Updated Nov 13, 2023
Jupyter Notebook

vijayvee / video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

tensorflow seq2seq sequence-to-sequence video-captioning s2vt multimodal-deep-learning

Updated Oct 12, 2019
Python

jayleicn / recurrent-transformer

[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

pytorch video-captioning youcook2 activitynet-captions

Updated Dec 4, 2020
Jupyter Notebook

bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark research video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation

Updated Jan 30, 2025
Python

yao-jason / MLDS2018SPRING

Machine Learning and having it Deep and Structured (MLDS) in 2018 spring

Updated Apr 19, 2019
Python

jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

video-captioning neurips video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

jssprz / video_captioning_datasets

Summary of video-to-text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*

review video-captioning state-of-the-art vision-and-language charades video-to-text msvd video-dataset video-description activitynet-captions trecvid tgif-dataset msr-vtt vatex

Updated Oct 27, 2023
Jupyter Notebook

terry-r123 / Awesome-Captioning

A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)

image-captioning video-captioning text-captioning

Updated Jun 6, 2022

Kamino666 / Video-Captioning-Transformer

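This is a deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".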

pytorch transformer video-captioning