An Algorithm for Event Retrieval from a Sports Video
1. Introduction
The production and availability of large video collections has made retrieval of videos relevant to a user-defined query a necessity. This has become one of the most popular topics in both real-life applications and multimedia research. There are vast amounts of video archives, including broadcast news, documentary videos, meeting videos, sports, and movies. Video sharing on the web is also growing at tremendous speed, creating perhaps the largest and most heterogeneous publicly available video archive. Finding the desired videos is becoming harder every day for users, and research on video retrieval aims to facilitate this task.
A video carries three main types of information that can be used for retrieval: visual content, text information, and audio information. Even though there are some studies on the use of audio information, it is the least-used source for video retrieval; mostly, audio is converted into text using automatic speech recognition (ASR) engines and then used as text information. Most current effective retrieval methods rely on the noisy text information attached to the videos. This text information can be ASR results, optical character recognition (OCR) output, social tags, or surrounding hypertext. Nowadays most active research is conducted on utilizing the visual content. It is perhaps the richest source of information; however, analyzing visual content is much harder than analyzing the other two.
There are two main frameworks for video retrieval: text-based and content-based. Text-based methods achieve retrieval by using the text information attached to the video, while content-based approaches use visual features such as color, texture, shape, and motion. Users express their needs in terms of queries. In content-based retrieval there are several types of queries. These queries can
Computer Science and Engineering, BEC, Bagalkot Page 1
be defined with text keywords, video examples, or low-level visual features.
Queries are split into three levels:
Level 1: Retrieval by primitive visual features such as color, texture, shape, motion, or the spatial location of video elements. Examples of such queries might include "find videos with long thin dark objects in the top left-hand corner" or, most commonly, "find more videos that look like this."
Level 2: Retrieval of concepts identified by derived features, with some degree of logical inference. Examples of such queries might include "find videos of a bus" or "find videos of walking."
Level 3: Retrieval by abstract attributes, involving a significant amount of high-level reasoning about the meaning and purpose of the objects or scenes depicted. Examples of such queries might include "find videos of one or more people walking up stairs" or "find videos of a road taken from a moving vehicle through the front windshield."
Levels 2 and 3 together are referred to as semantic video retrieval, and the gap between level 1 and level 2 is called the semantic gap. More specifically, the semantic gap is the discrepancy between the limited descriptive power of low-level visual features and high-level semantics. Users in level-1 retrieval are usually required to provide example videos as queries. However, it is not always possible to find examples of the desired video content; moreover, example videos may not express the user's intent appropriately. Since language is people's main way of communicating, the most natural way of expressing themselves is
through words. In levels 2 and 3, queries are mostly expressed with words from natural languages. Since level 3 subsumes level 2, we refer to level 3 as semantic retrieval and to level 2 as concept retrieval; the main focus here is on these two levels. Most information retrieval systems make indirect use of human knowledge in their retrieval process. The aim here is to use human knowledge directly and efficiently, in combination with support vector machines, for classification.
2. LITERATURE SURVEY
A review of previous work on event retrieval is summarized in this section. Soccer video event retrieval based on context cues is discussed in [1]. In soccer video, an event is defined as a medium-level spatiotemporal entity interesting to users, having certain context cues corresponding to a specific domain knowledge model. As a medium-level entity, the inference of a soccer event is based on the fusion of context cues and the domain knowledge model. The shooting event is chosen as the research target, and the event analysis method is expected to be reusable for other soccer events. According to the analysis of the shooting event, seven kinds of context cues are extracted: one kind of caption detection, two kinds of face detection (single and multiple), one kind of audience detection, one kind of goal detection, and two kinds of motion estimation.
Content-based TV sports video retrieval based on audio-visual features and text information is discussed in [2]. Because video data is composed of multimodal information streams, this paper uses multimodal information such as visual information, auditory
information, and textual information to detect events in sports video and realize content-based retrieval by quickly browsing tree-like video clips and inputting keywords within a predefined domain. The paper is particularly concerned with events that may change the score and are interesting for fans, coaches, and kinematics researchers: i) penalty kicks (PK), ii) free kicks next to the goal box (FK), and iii) corner kicks (CK).
An enhanced query model for soccer video retrieval using temporal relationships is discussed in [3]. This paper develops a general framework which can automatically analyze a sports video, detect sports events, and finally offer an efficient and user-friendly system for sports video retrieval. A temporal query model is designed to satisfy comprehensive temporal query requirements, and a corresponding graphical query language is developed. These advanced characteristics make the model particularly well suited to searching for events in a large-scale video database. A soccer video retrieval system named SoccerQ is developed to support both basic queries and relative temporal queries. In SoccerQ, video shots are stored, managed, and retrieved along with their corresponding features and metadata.
A fuzzy-logic-based approach is introduced in [4], which
concurrently detects transitions and shot boundaries. The paper introduces a unified algorithm for shot detection in sports video using fuzzy logic as a powerful inference mechanism. Fuzzy logic overcomes the problems of hard cut thresholds and the need to train on large data sets. The proposed algorithm integrates many features such as color histogram, edgeness, and intensity variance. Membership functions representing the different features and the transitions between shots have been developed to detect different shot boundary and transition types. The algorithm addresses the detection of cut, fade, dissolve, and wipe shot transitions.
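As an illustration of the fuzzy idea in [4], the sketch below classifies frame-to-frame histogram differences with triangular membership functions instead of a single hard threshold. The single normalized difference feature, the membership breakpoints, and the two rules are simplifications invented for this example, not the paper's actual design.

```python
# Illustrative sketch of fuzzy shot-boundary detection (not the exact
# system of [4]). Input: frame-to-frame color-histogram differences
# normalized to [0, 1]. Breakpoints and rules are invented for the example.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify(diff):
    """Degree to which a histogram difference is low / medium / high."""
    return {
        "low":    tri(diff, -0.01, 0.0, 0.3),
        "medium": tri(diff, 0.2, 0.5, 0.8),
        "high":   tri(diff, 0.6, 1.0, 1.01),
    }

def detect_boundaries(diffs, cut_thresh=0.5, grad_thresh=0.4):
    """Label each transition 'cut', 'gradual', or None.

    Rule 1: an isolated high difference after a low one -> cut.
    Rule 2: consecutive medium differences -> gradual (fade/dissolve/wipe).
    """
    labels = []
    prev = {"low": 1.0, "medium": 0.0, "high": 0.0}
    for d in diffs:
        m = fuzzify(d)
        cut_degree = min(m["high"], prev["low"])        # fuzzy AND of the rule
        grad_degree = min(m["medium"], prev["medium"])  # sustained medium run
        if cut_degree >= cut_thresh:
            labels.append("cut")
        elif grad_degree >= grad_thresh:
            labels.append("gradual")
        else:
            labels.append(None)
        prev = m
    return labels
```

A real system would compute the per-frame histogram differences from decoded video and tune the membership functions per genre; the point here is only how graded memberships replace a hard cut threshold.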
Unsupervised content-based indexing for sports video retrieval is discussed in [5]. This paper uses the concept of a grounded language model to motivate a framework in which video is searched using natural language, with no reliance on predetermined concepts or hand-labeled events. The grounded language model is learned from an unlabeled corpus of baseball games and the paired closed-caption transcripts of the games. The three important steps in learning the grounded language model are feature extraction, temporal pattern mining, and linguistic mapping. In feature extraction, two types of features are used: visual context and camera motion. Visual context features encode general properties of the visual scene in a video segment. The first step in extracting such features is to split the raw video into shots based on changes in the visual scene due to editing. After the video is segmented into shots, each shot is automatically categorized into one of three categories: pitching scene, field scene, or other. Detecting camera motion (i.e., pan/tilt/zoom) is a well-studied problem in video analysis; here an algorithm is used which computes the pan, tilt, and zoom motions from the parameters of a two-dimensional affine model fit to every pair of sequential frames in a video segment. Once the feature streams have been extracted, temporal data mining is used to discover a codebook of temporal patterns that represent the video events. The algorithm used here is fully unsupervised; it processes feature streams by examining the relations that occur between individual features within a moving time window. The last step in learning the grounded language model is to map words onto the representations of events mined from the raw video.
A content-oriented video retrieval system is presented in [6], which is capable of handling high volumes of content as well as various functionality requirements.
It allows the audience to access video content based on their different interests in the selected video
program. The retrieval system consists of a content-based scalable access platform supporting content-based scalable multifunctional video retrieval (MFVR). The retrieval methodology is based on different content semantic layers: according to semantic significance, video content is divided into a video clip layer, an object class layer, an action class layer, and a conclusion layer. Each frame in each layer (except the video clip layer) has pointers to the related frames in the lower and upper layers, and each frame in the upper three layers has a pointer to the related video clip frame in the bottom layer. The paper not only presents a content-scalable scheme that adapts to different subjective client users but also supports retrieval of differently scaled video content.
Event-based soccer video retrieval with an interactive genetic algorithm (IGA) is introduced in [7]. This paper presents a system that segments a soccer video into four kinds of events (play-event, replay-event, goal-event, and break-event) and implements retrieval for different purposes, meeting users' different needs with an interactive genetic algorithm. First, soccer game videos are divided into audio and video streams. Then eight audio-visual features (average shot duration, standard deviation of shot duration, average motion activity, standard deviation of motion activity, average sound energy, standard deviation of sound energy, average speech rate, and standard deviation of speech rate) are extracted from the video and the corresponding audio track. These features are encoded as a chromosome and indexed into the database; this procedure is repeated for all database videos. For retrieval, the IGA performs optimization with human evaluation: the system displays 15 videos, obtains relevance feedback on them from the user, and selects candidates based on the relevance.
A genetic crossover operator is applied to the selected candidates. To find the next 15 videos, the stored video
information is evaluated by each criterion, and the fifteen videos with the highest similarity to the candidates are returned as the retrieval result.
An advanced sports video browsing and retrieval system based on multimodal analysis, SportsBR, is proposed in [8]. The system analyzes an input sports video by dividing it into video and audio streams. In the video stream, it processes visual features and extracts textual transcripts to detect the shots that probably contain events. Since visual features alone are not sufficient for detecting events, transcript detection improves accuracy and is also used to generate textual indices that let users query video event clips. In the audio stream, speech recognition detects special words such as "goal" or "penalty kick," which are likewise used to generate textual indices; the audio signal processing module also computes several parameters of the audio signal to locate the interesting parts of the sports video more accurately. Once the event clips are found, they are organized for content-based browsing and retrieval. Combining audio-visual features with caption text information, the system automatically selects interesting events. Using the automatically extracted caption text and speech recognition results as index files, SportsBR supports keyword-based sports event retrieval; it also supports event-based browsing of sports video clips and generates a key-frame-based video abstract for each clip.
An advanced content-based broadcast sports video retrieval system, SportBR, is proposed in [9]. Its main features include event-based sports video browsing and keyword-based sports video retrieval. The algorithm introduces a novel approach integrating multimodal analysis, such as visual stream analysis, speech recognition, speech signal processing, and text extraction, to realize event-based video clip selection. The algorithm first selects
video clips that may contain events. It then obtains keywords such as "goal" or "penalty kick" by speech recognition and detects interesting segments by computing the short-time average energy and other audio parameters. Finally, a method for extracting textual transcripts from within video images is introduced to detect events, and these textual words are used to generate the indexing keywords. The system is shown to be helpful and effective for understanding sports video content.
Event retrieval in soccer video from coarse to fine based on a multi-modal approach is introduced in [10]. A soccer video clip contains both visual and audio data; if both types are available, the clip can be understood more comprehensively. This paper imitates the way human beings understand videos with a multimodal retrieval model, taking advantage of both types of data. An event can be described by a shot sequence context containing objects of interest and transcript text. The retrieval strategy searches from coarse to fine. In the first stage, the video sequence is segmented into shots, which are classified into four basic types: long view, medium view, close-up, and audience. This stage gathers event candidates in a fast pre-filter step. In the second stage, the results are refined based on objects of interest and transcript text. The soccer objects suggested are the referee, penalty area, text box, excitement (shouts), whistle, and audio keywords (translated from speech). Excitement sounds may stem from two sources: shouts from the audience, and commentators' reports that appear when there is an attractive situation. Whistles can be detected in the frequency domain by the appearance of a strong signal in a specific prior range. The paper introduces a new query language to describe an event as a sequence of components in the visual and audio data.
It is used to retrieve relevant sequences automatically. It can be performed
quickly and flexibly. Moreover, the system can easily be extended with new visual and audio components.
Semantic analysis and retrieval of sports video is discussed in [11]. A fully automatic and computationally efficient approach for the indexing, browsing, and retrieval of sports video is explained. A hierarchical organizing model based on MPEG-7 is introduced to effectively represent high-level and low-level information in sports video. The model's attributes have been instantiated through a series of multimodal processing approaches for soccer video. To effectively retrieve highlight clips, adaptive abstraction is designed based on an excitement time curve, and an XML-based query scheme for semantic search and retrieval is built on the hierarchical model. A robust detection algorithm is developed to automatically locate the field region of the sports video based on low-level color features, and an effective mid-level shot classification algorithm is proposed based on spatial features and the coarseness of human figures. For retrieval, XQuery, an XML query language, is employed.
Semantic segmentation and event detection in sports video using a rule-based approach is discussed in [12]. The paper addresses two main problems of sports video processing: semantic segmentation and event detection. It proposes a novel hybrid multilayered approach for semantic segmentation of cricket video and detection of major cricket events. The algorithm uses low-level features and high-level semantics with a rule-based approach: the top layer uses the DLER tool to extract and recognize superimposed text, and the bottom layer applies the game rules to detect the boundaries of video segments and major cricket events.
Semantic event detection in soccer video by integrating multi-features using a Bayesian network is discussed in [13]. In this paper
a Bayesian network is used to statistically model scoring event detection based on the recording and editing rules of soccer video. The Bayesian network fuses five low-level video content cues using a graphical model and probability theory; thus the problem of event detection is converted into one of statistical pattern classification. The learning and inference of the Bayesian network are given in the paper.
Goal event detection in broadcast soccer videos by combining heuristic rules with an unsupervised fuzzy c-means algorithm is discussed in [14]. Based on three generally defined shot types, goal events are detected by combining heuristic rules with the unsupervised fuzzy c-means (FCM) algorithm. First, heuristic rule-based primary selection/filtering of potential goals is carried out in the shot layer, which is composed of the three generally defined shot types; the number of frames within each shot is recorded as a representative feature. Then, to further separate goal events from the other potential goals, the unsupervised FCM algorithm is adopted. The main contribution of this work is the combination of heuristic rules based on three generally defined shot types with the unsupervised fuzzy c-means algorithm. When defuzzified with prior knowledge of the number of goals in each match, accurate and robust results are achieved over five half-matches from different series produced by different broadcast stations.
A hybrid method for soccer video event retrieval using fuzzy systems is discussed in [15]. The method uses human knowledge directly and very efficiently through a fuzzy rule base. The presented structure allows the system to work on the soccer video shots available in the database. The first phase is devoted to extracting shots from each video and building a list of features extracted from each shot. A fuzzy system is then used to
eliminate shots containing insignificant events. Finally, shots are classified and associated with predefined classes using an SVM, and the shots belonging to the class associated with the user query are returned as the answer to that query. The user may query different events and concepts, such as occurrences of penalties, corners, goals, or team attacks, throughout the database.
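The filter-then-classify pipeline of [15] can be sketched roughly as follows. The feature names ('motion', 'audio_energy'), the rule base, and the nearest-centroid classifier (a self-contained stand-in for the SVM stage) are all illustrative assumptions, not the paper's actual design.

```python
# Rough sketch of the [15] pipeline: fuzzy significance filtering of shots,
# then classification, then query answering. Features, rules, and the
# nearest-centroid stand-in for the SVM are invented for illustration.

def significance(shot):
    """Fuzzy significance in [0, 1] from two normalized shot features.

    Illustrative rule base: a shot is significant to the degree that both
    motion AND audio energy are high (min = fuzzy AND); either one alone
    contributes half weight (max = fuzzy OR).
    """
    both = min(shot["motion"], shot["audio_energy"])
    either = max(shot["motion"], shot["audio_energy"])
    return max(both, 0.5 * either)

def filter_shots(shots, threshold=0.4):
    """Drop shots whose fuzzy significance falls below the threshold."""
    return [s for s in shots if significance(s) >= threshold]

def classify(shot, centroids):
    """Stand-in for the SVM stage: nearest class centroid in feature space."""
    def dist(label):
        c = centroids[label]
        return (shot["motion"] - c["motion"]) ** 2 + \
               (shot["audio_energy"] - c["audio_energy"]) ** 2
    return min(centroids, key=dist)

def retrieve(shots, centroids, query_event):
    """Answer a query: filter insignificant shots, classify the rest,
    and return the shots whose class matches the queried event."""
    return [s for s in filter_shots(shots) if classify(s, centroids) == query_event]
```

The fuzzy pre-filter is what cuts the classifier's workload: only shots scoring above the threshold ever reach the (here simulated) SVM, which is the time-complexity saving the conclusion refers to.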
3. ISSUES IN EVENT RETRIEVAL
Issues involved in event detection and retrieval are as follows:
Automatically extracting interesting segments from the input video.
Most event detection methods in sports video rely on visual features alone, so both audio and visual features need to be considered.
The best way to enable the system to interpret and understand the user's request semantically.
The way to establish the relationship between primitive and semantic features.
Automatically indexing the multimedia contents while reducing human intervention.
A customizable framework for video indexing is required that can index the video using the modalities according to the user's preferences.
There is a need for a system that can extract, analyze, and model multiple semantics from video images using these primitive features: an intelligent system capable of extracting higher semantics and modeling user behavior.
4. PROBLEM DEFINITION
The aim is to design and develop a methodology for event detection and retrieval from a soccer video which employs human expert knowledge embedded in a fuzzy rule base. The system classifies events in the video as relevant or irrelevant and then retrieves the relevant events, where the relevance of an event is defined by the user's needs.
5. AN ALGORITHM FOR EVENT RETRIEVAL FROM A SPORTS VIDEO
An algorithm for event retrieval from a sports video is proposed here. The method uses human knowledge directly and very efficiently through a fuzzy rule base. The presented structure allows the system to work on the soccer video shots available in the database. The first phase is devoted to extracting shots from
each video and building a list of features extracted from each shot. A fuzzy system is then used to eliminate shots containing insignificant events. Finally, shots are classified and associated with predefined classes using an SVM, and the shots belonging to the class associated with the user query are returned as the answer to that query. The user may query different events and concepts, such as occurrences of penalties, corners, goals, or team attacks, throughout the database. The proposed solution model is given below:
PROPOSED SOLUTION MODEL
Most information retrieval systems make indirect use of human knowledge in their retrieval process (for example, in query-by-example all instances are a priori knowledge, and in semantic-based retrieval a meaningful description must be added to the visual models manually as prior knowledge). The new method presented here uses human knowledge directly and very efficiently through a fuzzy rule base. The presented structure allows the system to work on the soccer video shots available in the database. The first phase is devoted to extracting shots from each video and building a list of features extracted from each shot. A fuzzy system is then used to eliminate shots containing insignificant events. Finally, shots are classified and associated with predefined classes using an SVM, and the shots belonging to the class associated with the user query are returned as the answer to that query. The user may query different events and concepts, such as occurrences of penalties, corners, goals, or team attacks, throughout the database. The overall structure is provided in Figure 1.
[Figure 1 components: User Interface, Query Analyzer, DB, Meta DB, Loader, Video Data, Low-level/Semantic Features Extractor, Events Classifier.]
Figure 1. The overall proposed structure.
As can be seen, each video is processed before entering the DB: its low-level features and conceptual properties are extracted, and once the events are classified, all collected information is stored, along with the data provided by the user (such as the names of the teams and the time and place of the match), inside the Meta DB, while the video itself is stored in another location. The assumed database is thus of the multimedia-linked meta-database type. Figure 2 shows the feature extraction, shot filtering using the fuzzy system, and event classification stages in detail.
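One possible shape for such a Meta DB, sketched with SQLite; the table and column names are assumptions made for illustration, since the report does not specify a schema.

```python
# Hypothetical Meta DB schema: shot metadata and user-supplied match data
# live in the database, while the video files themselves live elsewhere,
# matching the text above. Names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (
        id        INTEGER PRIMARY KEY,
        team_a    TEXT, team_b TEXT,
        played_at TEXT, venue TEXT
    );
    CREATE TABLE shots (
        id          INTEGER PRIMARY KEY,
        match_id    INTEGER REFERENCES matches(id),
        event_class TEXT,          -- e.g. 'goal', 'corner', 'penalty'
        start_frame INTEGER, end_frame INTEGER,
        video_path  TEXT           -- the video itself stays outside the Meta DB
    );
""")
conn.execute("INSERT INTO matches VALUES (1, 'Team A', 'Team B', '2009-05-01', 'Stadium X')")
conn.execute("INSERT INTO shots VALUES (10, 1, 'goal', 1200, 1450, '/videos/match1.mpg')")
conn.execute("INSERT INTO shots VALUES (11, 1, 'corner', 3000, 3200, '/videos/match1.mpg')")

# A user query such as "goals involving Team A" becomes a join over the Meta DB:
rows = conn.execute("""
    SELECT s.id, s.start_frame, s.end_frame
    FROM shots s JOIN matches m ON s.match_id = m.id
    WHERE s.event_class = 'goal' AND (m.team_a = 'Team A' OR m.team_b = 'Team A')
""").fetchall()
```

Keeping only frame ranges and paths in the Meta DB lets the retrieval system answer queries without touching the bulky video data until a clip is actually played back.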
[Figure 2 components: shot boundary detection, grass detection, special view detection, slow-motion replay detection, shot mid-level classification; a fuzzy inference system with a rule base performs shot filtering; an event classifier produces the semantic description and metadata.]
Figure 2. Proposed structure in detail.
CONCLUSION
The amount of accessible video information has been increasing rapidly. People cannot quickly sift through such myriad data, as it is voluminous, and doing so manually is very time consuming. Thus, automatic retrieval of semantic video events is very useful. The proposed method uses human
expert knowledge for soccer video event retrieval through fuzzy systems. After feature extraction, each shot is given a degree of significance by the fuzzy system, thereby significantly reducing the time complexity of video processing. By defining a threshold on the output of the fuzzy system, useful and non-useful shots are separated for the purpose of event classification. Support vector machines are then applied for the final classification of the video information.
REFERENCES
[1] SUN Xing-hua and YANG Jing-yu, "Inference and retrieval of soccer event," IEEE, 2007.
[2] LIU Huayong, "Content-based TV sports video retrieval based on audio-visual features and text information," IEEE, 2004.
[3] Shu-Ching Chen, Mei-Ling Shyu and Na Zhao, "An enhanced query model for soccer video retrieval using temporal relationships," IEEE, 2005.
[4] Mohammed A. Refaey, Khaled M. Elsayed, Sanaa M. Hanaf and Larry S. Davis, "Concurrent transition and shot detection in football videos using fuzzy logic," IEEE, 2009.
[5] Michael Fleischman, Humberto Evans and Deb Roy, "Unsupervised content-based indexing for sports video retrieval," IEEE, 2007.
[6] Huang-Chia Shih and Chung-Lin Huang, "Content-based scalable sports video retrieval system," IEEE, 2005.
[7] Guangsheng Zhao, "Event-based soccer video retrieval with interactive genetic algorithm," IEEE, 2008.
[8] HUA-YONG LIU and HUI ZHANG, "A sports video browsing and retrieval system based on multimodal analysis: SportsBR," IEEE, 2005.
[9] Liu Huayong and Zhang Hui, "A content-based broadcasted sports video retrieval system using multiple modalities: SportBR," IEEE, 2005.
[10] Tung Thanh Pham, Tuyet Thi Trinh, Viet Hoai Vo, Ngoc Quoc Ly and Duc Anh Duong, "Event retrieval in soccer video from coarse to fine based on multi-modal approach," IEEE, 2010.
[11] YU Jun-qing, HE Yun-feng, SUN Kai, WANG Zhi-fang and WU Xiang-mei, "Semantic analysis and retrieval of sports video," IEEE, 2006.
[12] Sunitha Abburu, "Semantic segmentation and event detection in sports video using rule based approach," IJCSNS International Journal of Computer Science and Network Security, 2010.
[13] Chen Jianyun, Li Yunhao, Wu Lingda and Lao Songyang, "Semantic event detection in soccer video by integrating multi-features using Bayesian network," Proceedings of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004.
[14] Yina Han, Guizhong Liu and Gerard Chollet, "Goal event detection in broadcast soccer videos by combining heuristic rules with unsupervised fuzzy c-means algorithm," 10th Intl. Conf. on Control, Automation, Robotics and Vision, Hanoi, Vietnam, 2008.
[15] Ehsan Lotfi, M.-R. Akbarzadeh-T, H.R. Pourreza, Mona Yaghoubi and F. Dehghan, "A hybrid method for soccer video events retrieval using fuzzy systems," IEEE, 2009.