Surveillance system

  • Video-captioning research applied to surveillance systems and its potential applications
  • Recent research on dense video captioning (2022–2023)

 

End-to-End Dense Video Captioning as Sequence Generation (arxiv.org): This paper discusses dense video captioning, which involves identifying the events of interest in a video and generating a descriptive caption for each event. Previous approaches usually follow a two-stage generative process, first proposing a segment for each event and then rendering a caption for each identified segment; this paper instead frames the whole task as a single sequence-generation problem.
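The two-stage process described above can be sketched in a few lines. The frame scores, the threshold, and the captioning stub below are all illustrative stand-ins, not the paper's actual model:

```python
def propose_segments(frame_scores, threshold=0.5):
    """Stage 1: group consecutive frames whose 'eventness' score exceeds
    a threshold into (start, end) segment proposals."""
    segments, start = [], None
    for i, s in enumerate(frame_scores):
        if s >= threshold and start is None:
            start = i                          # a segment opens
        elif s < threshold and start is not None:
            segments.append((start, i - 1))    # the segment closes
            start = None
    if start is not None:
        segments.append((start, len(frame_scores) - 1))
    return segments

def caption_segment(segment):
    """Stage 2: render a caption for one proposed segment.
    A real system decodes from visual features; this is a stub."""
    start, end = segment
    return f"event from frame {start} to frame {end}"

scores = [0.1, 0.7, 0.9, 0.2, 0.1, 0.8, 0.8, 0.6, 0.1]
proposals = propose_segments(scores)              # [(1, 2), (5, 7)]
captions = [caption_segment(seg) for seg in proposals]
print(proposals, captions)
```

The sequence-generation approach of the paper replaces these two separate stages with one decoder that emits segment boundaries and caption tokens as a single output sequence.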

 

A Latent Topic-Aware Network for Dense Video Captioning - Xu - 2023: This research tackles the challenging task of locating multiple events in a long, untrimmed video and describing them with sentences or a paragraph. The paper proposes a novel latent topic-aware network (LTNet) to enhance the relevance and semantic quality of the predicted captions.

https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12195

 

Dense Video Captioning Based on Local Attention - Qian - 2023: This study, first published in May 2023, explores dense video captioning with a focus on local attention. It aims to locate multiple events in an untrimmed video and generate a caption for each.

https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.12819
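The core idea of local attention, as opposed to global attention, is to restrict each query to a window of nearby positions. A minimal dot-product sketch with toy numbers (the window size, scoring, and features below are illustrative, not the paper's architecture):

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def local_attention(query, keys, values, center, window=1):
    """Attend only to positions within `window` of `center`, instead of
    the whole sequence -- the defining property of local attention."""
    lo, hi = max(0, center - window), min(len(keys), center + window + 1)
    scores = [sum(q * k for q, k in zip(query, keys[i]))
              for i in range(lo, hi)]        # dot-product scores in window
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: weighted sum of the values inside the window only.
    return [sum(w * values[lo + i][d] for i, w in enumerate(weights))
            for d in range(dim)]

keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
ctx = local_attention([1.0, 0.0], keys, values, center=1, window=1)
print(len(ctx))  # 2
```

Frames outside the window contribute nothing, which keeps the caption decoder focused on the event currently being described.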

 

Dense Video Captioning Using BiLSTM Encoder | IEEE Conference: Presented in May 2022, this paper discusses dense video captioning as a newly emerging research subject. It involves detecting the temporal events in a video and generating a caption for each temporal event.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9824569
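The value of a bidirectional encoder is that each frame's representation sees both past and future context. A stdlib-only sketch of that structure, using a plain tanh recurrent cell as a simplified stand-in for the paper's LSTM cell (all weights and features below are toy values):

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x, h, Wx, Wh):
    """One step of a simple tanh recurrent cell. A full LSTM cell adds
    input/forget/output gating on top of this recurrence."""
    pre = [a + b for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
    return [math.tanh(p) for p in pre]

def encode(frames, Wx, Wh, reverse=False):
    seq = frames[::-1] if reverse else frames
    h, states = [0.0] * len(Wh), []
    for x in seq:
        h = rnn_step(x, h, Wx, Wh)
        states.append(h)
    return states[::-1] if reverse else states

def bidirectional_encode(frames, Wx, Wh):
    fwd = encode(frames, Wx, Wh)                    # past -> future pass
    bwd = encode(frames, Wx, Wh, reverse=True)      # future -> past pass
    # Each frame representation now carries both directions of context.
    return [f + b for f, b in zip(fwd, bwd)]

# Toy 2-D frame features, 2-unit hidden state, shared weights for brevity.
frames = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.4]]
Wx = [[0.5, -0.2], [0.1, 0.3]]
Wh = [[0.2, 0.0], [0.0, 0.2]]
enc = bidirectional_encode(frames, Wx, Wh)
print(len(enc), len(enc[0]))  # 3 frames, 4-dim (2 fwd + 2 bwd) encodings
```

The concatenated forward/backward states are what the caption decoder would consume in place of unidirectional features.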

 

Dense Video Captioning with Early Linguistic Information Fusion (2023): This paper proposes a Visual-Semantic Embedding (ViSE) Framework that models word-context distributional properties over the semantic space and computes weights for n-grams, assigning higher weights to more informative n-grams.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9693415
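One simple way to assign higher weights to more informative n-grams is inverse caption frequency, shown below. This is an illustrative stand-in for the paper's weighting scheme, not its actual method:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_weights(captions, n=2):
    """Weight each n-gram by log(N / document frequency): n-grams that
    appear in fewer captions are rarer, hence more informative."""
    docs = [set(ngrams(c.lower().split(), n)) for c in captions]
    df = Counter(g for d in docs for g in d)   # caption-level frequency
    N = len(docs)
    return {g: math.log(N / df[g]) for g in df}

caps = ["a man walks a dog", "a man rides a bike", "a dog chases a ball"]
w = ngram_weights(caps, n=2)
# ("a", "man") appears in 2 of 3 captions, so it gets a lower weight
# than ("chases", "a"), which appears in only 1.
print(w[("a", "man")] < w[("chases", "a")])  # True
```

The intuition matches the paper's goal: frequent filler n-grams are down-weighted so the embedding emphasizes event-specific phrases.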

 

Fusion of Multi-Modal Features to Enhance Dense Video Caption (2023): Dense video captioning aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames. However, most existing methods use only the visual features of the video and ignore the audio features; this paper fuses multi-modal features to improve caption quality.

https://www.mdpi.com/1424-8220/23/12/5565
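A minimal sketch of one common fusion strategy, weighted concatenation of the two modality vectors. The modality weight `alpha` and the feature values are illustrative assumptions, not taken from the paper:

```python
def fuse_features(visual, audio, alpha=0.7):
    """Fuse two modality vectors by scaling each and concatenating.
    alpha balances how much the fused vector trusts each modality."""
    return [alpha * v for v in visual] + [(1 - alpha) * a for a in audio]

visual = [0.2, 0.8, 0.5]   # e.g. pooled visual frame features
audio = [0.9, 0.1]         # e.g. pooled audio embedding
fused = fuse_features(visual, audio)
print(len(fused))  # 5
```

The fused vector then feeds the caption decoder in place of the visual-only features, which is the gap in prior work that the summary above points out.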

 
