Automatic summarization and annotation of videos with lack of metadata information
Advances in computer and network infrastructure, together with the rapid evolution of multimedia data, have drawn growing attention to the development of digital video. The scientific community has intensified research into new technologies aimed at improving the use of digital video: its archiving, indexing, accessibility, acquisition, storage, processing, and usability. All of these aspects require extracting the important information from a video, especially when metadata is lacking. The main goal of this paper is to construct a system that automatically generates and provides the essential information of a video in both visual and textual form. Using either form, a user can locate a specific video and can quickly grasp its basic points and overall concept without watching it in full. The visual information comes from a video summarization method, while the textual information derives from a keyword-based video annotation approach. Because the video annotation technique operates on the key frames that constitute the video abstract, the first part of the system is the new video summarization method. In the proposed video abstraction technique, each frame of the video is first described by the Compact Composite Descriptors (CCDs) and a visual word histogram. The approach then uses the Self-Growing and Self-Organized Neural Gas (SGONG) network to classify the frames into clusters. Extracting a representative key frame from every cluster yields the video abstract. The most significant advantage of the video summarization approach is its ability to dynamically determine the appropriate number of final clusters.
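The summarization steps (describe each frame, cluster the frames, pick one representative per cluster) can be illustrated with a minimal sketch. It is not the SGONG network: as a stand-in it uses a simple leader-style clustering in which a frame opens a new cluster whenever no existing centroid is close enough, so the number of clusters is not fixed in advance. The two-dimensional feature vectors stand in for the CCD and visual word histogram descriptors, and the names `cluster_frames`, `select_key_frames`, and `threshold` are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_frames(features, threshold):
    """Leader-style clustering (an assumed stand-in, NOT SGONG).

    A frame joins the nearest cluster if that centroid lies within
    `threshold`; otherwise it starts a new cluster.
    """
    centroids = []  # running mean of each cluster
    members = []    # frame indices belonging to each cluster
    for idx, f in enumerate(features):
        best, best_d = None, float("inf")
        for c, centroid in enumerate(centroids):
            d = l2(f, centroid)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            members[best].append(idx)
            n = len(members[best])
            # Incremental update of the cluster mean.
            centroids[best] = [m + (x - m) / n
                               for m, x in zip(centroids[best], f)]
        else:
            centroids.append(list(f))
            members.append([idx])
    return centroids, members


def select_key_frames(features, threshold):
    """Return, per cluster, the index of the frame closest to its centroid."""
    centroids, members = cluster_frames(features, threshold)
    key_frames = [
        min(idxs, key=lambda i: l2(features[i], centroid))
        for centroid, idxs in zip(centroids, members)
    ]
    return sorted(key_frames)


# Toy descriptors for nine frames drawn from three visually distinct shots.
frames = [
    [0.0, 0.0], [0.5, 0.0], [0.25, 0.25],
    [5.0, 5.0], [5.5, 5.0], [5.25, 5.25],
    [9.0, 0.0], [9.5, 0.0], [9.25, 0.25],
]
key_frames = select_key_frames(frames, threshold=1.0)
print(key_frames)  # [2, 5, 8]
```

On this toy input the sketch finds three clusters without being told the count, which mirrors the property the abstract highlights; the distance threshold plays the role that the network's growth criteria would play in the actual method.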
Subsequently, a new video annotation method is applied to the generated video summary, automatically producing keywords that describe the semantic content of the given video. This approach is based on the recently proposed N-closest Photos Model (NCP). Experimental results on several videos are presented to evaluate the proposed system and to demonstrate its effectiveness.
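The abstract does not spell out how the NCP model derives keywords, so the following is only a sketch under a loud assumption: that each key frame is matched against a collection of already tagged photos and that the tags of its N closest photos are pooled by voting. The collection, the feature vectors, and the names `annotate`, `n_closest`, and `top_k` are all hypothetical.

```python
import math
from collections import Counter


def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate(key_frame_features, tagged_photos, n_closest=2, top_k=2):
    """Pool tags from the n_closest photos of every key frame.

    tagged_photos is a list of (feature_vector, tags) pairs.
    Assumed nearest-neighbour tag voting, not the published NCP model.
    """
    votes = Counter()
    for f in key_frame_features:
        nearest = sorted(tagged_photos, key=lambda p: l2(f, p[0]))[:n_closest]
        for _, tags in nearest:
            votes.update(tags)
    # Sort by descending vote count, breaking ties alphabetically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]]


# Hypothetical tagged photo collection with toy 2-D descriptors.
photos = [
    ([0.0, 0.0], ["beach", "sea"]),
    ([0.5, 0.0], ["sea", "sand"]),
    ([5.0, 5.0], ["city", "street"]),
    ([5.5, 5.0], ["city", "night"]),
    ([9.0, 9.0], ["forest"]),
]
key_frame_features = [[0.25, 0.25], [5.25, 5.25]]
keywords = annotate(key_frame_features, photos)
print(keywords)  # ['city', 'sea']
```

Voting across all key frames, rather than annotating each one separately, is a design choice of this sketch: it favours tags that recur throughout the summary and so are more likely to reflect the video as a whole.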