A Summarize-then-Search Method for Long Video Question Answering: Conclusion
2024-05-26 21:00:26 Author: hackernoon.com

Too Long; Didn't Read

In this paper, researchers explore zero-shot video QA using GPT-3, outperforming supervised models, leveraging narrative summaries and visual matching.

Kinetograph: The Video Editing Technology Publication

5. Conclusion

We introduced Long Story Short, a summarize-then-search method that captures both the global narrative and the relevant details needed for video narrative QA. The approach is effective when the QA context is vast and answering requires high-level interaction with that context, as is the case in long video QA. We also propose CLIPCheck, which further strengthens the visual grounding of the model-generated answer by post-checking its visual alignment. Our zero-shot method outperforms supervised state-of-the-art approaches on the MovieQA and DramaQA benchmarks. We plan to release the code and the generated plot data to the public.
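The summarize-then-search idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`summarize_segments`, `search_plot`, `build_context`) are hypothetical, the summarizer is passed in as a plain callable standing in for a language model such as GPT-3, and the search step uses simple word overlap purely as a placeholder for whatever retrieval the real system performs. The CLIPCheck step is not modeled here.

```python
from typing import Callable, List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    """Lowercase a sentence and split it into a set of bare words."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def summarize_segments(segments: List[str],
                       summarize: Callable[[str], str]) -> List[str]:
    """Summarize step: condense each segment description into a plot line.

    `summarize` is an assumption: any callable mapping text to a shorter
    text (in practice, a call to a language model).
    """
    return [summarize(seg) for seg in segments]


def search_plot(question: str, plot: List[str], top_k: int = 1) -> List[int]:
    """Search step: rank plot lines by word overlap with the question.

    Word overlap is a toy stand-in for a real retrieval method.
    Ties are broken by the earlier plot index.
    """
    q = _tokens(question)
    ranked = sorted(range(len(plot)),
                    key=lambda i: (-len(q & _tokens(plot[i])), i))
    return ranked[:top_k]


def build_context(question: str,
                  segments: List[str],
                  summarize: Callable[[str], str],
                  top_k: int = 1) -> Tuple[List[str], List[str]]:
    """Return the global plot plus the detailed segments matched to the question."""
    plot = summarize_segments(segments, summarize)
    hits = search_plot(question, plot, top_k)
    details = [segments[i] for i in hits]
    return plot, details
```

In this sketch, the returned plot supplies the global narrative and the matched segments supply the local details; both would then be handed to the answering model along with the question.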

There are two possible research directions beyond this work. First, providing visual descriptions better aligned with the story, using character re-identification and co-reference resolution, could improve the quality of the input to GPT-3. Second, one can devise a more dynamic multi-hop search that combines global and local information in a hierarchical manner.


Source: https://hackernoon.com/a-summarize-then-search-method-for-long-video-question-answering-conclusion?source=rss