Convolutional Gated Recurrent Networks for Video Segmentation

Siam, Mennatullah; Valipour, Sepehr; Jagersand, Martin; Ray, Nilanjan

arXiv:1611.05435 (cs)

[Submitted on 16 Nov 2016 (v1), last revised 21 Nov 2016 (this version, v2)]

Abstract: Semantic segmentation has recently seen major progress, with fully convolutional neural networks shown to perform well. However, most previous work has focused on improving single-image segmentation. To our knowledge, no prior work has made use of temporal video information in a recurrent network. In this paper, we introduce a novel approach that implicitly exploits temporal data in videos for online semantic segmentation. The method relies on a fully convolutional network embedded in a gated recurrent architecture. This design receives a sequence of consecutive video frames and outputs the segmentation of the last frame. Convolutional gated recurrent networks are used for the recurrent part to preserve spatial connectivity in the image. The proposed method can be applied to both online and batch segmentation. The architecture is tested on both binary and semantic video segmentation tasks, with experiments on the recent SegTrack V2, Davis, CityScapes, and Synthia benchmarks. Using recurrent fully convolutional networks (RFCNs) improved on the baseline network's performance in all of our experiments: a 5% and 3% improvement in F-measure on SegTrack V2 and Davis respectively, a 5.7% improvement in mean IoU on Synthia, and a 3.5% improvement in categorical mean IoU on CityScapes. The performance of an RFCN depends on its baseline fully convolutional network. The RFCN architecture can thus be seen as a way to improve a baseline segmentation network by exploiting spatiotemporal information in videos.
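
To make the recurrent part of the abstract concrete, here is a minimal sketch of one convolutional GRU update in plain Python. It assumes a commonly used conv-GRU formulation (update gate z, reset gate r, candidate state, all computed with zero-padded convolutions), which may differ in detail from the variant used in the paper. Everything named here is hypothetical and chosen for illustration: the helpers `conv2d_same` and `zip_map`, the parameter keys `Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh`, the single-channel feature maps, and the final thresholding step that stands in for the network's real output layers.

```python
import math


def conv2d_same(img, kernel):
    """Zero-padded 'same' 2D cross-correlation of one single-channel map."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def zip_map(fn, *maps):
    """Apply fn elementwise across same-shaped 2D maps."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h, p):
    """One conv-GRU update (assumed formulation, not taken from the paper).

    Every gate is a convolution rather than a dense matrix product, so each
    hidden-state pixel depends only on a local neighbourhood of x and h.
    """
    z = zip_map(lambda a, b: sigmoid(a + b),
                conv2d_same(x, p["Wz"]), conv2d_same(h, p["Uz"]))
    r = zip_map(lambda a, b: sigmoid(a + b),
                conv2d_same(x, p["Wr"]), conv2d_same(h, p["Ur"]))
    reset_h = zip_map(lambda a, b: a * b, r, h)
    cand = zip_map(lambda a, b: math.tanh(a + b),
                   conv2d_same(x, p["Wh"]), conv2d_same(reset_h, p["Uh"]))
    # Blend the previous state and the candidate state with the update gate.
    return zip_map(lambda zz, hh, cc: (1.0 - zz) * hh + zz * cc, z, h, cand)


def run_sequence(frames, p):
    """Feed per-frame feature maps in order; return the final hidden state."""
    h = [[0.0] * len(frames[0][0]) for _ in frames[0]]
    for x in frames:
        h = conv_gru_step(x, h, p)
    return h


def segment_last_frame(frames, p):
    """Toy binary mask for the last frame (thresholding is a stand-in)."""
    return zip_map(lambda v: 1 if v > 0.0 else 0, run_sequence(frames, p))
```

Calling `segment_last_frame(frames, params)` with a list of equally sized 2D maps and a dict of six small kernels returns a mask for the final frame only, which mirrors the online setting described above: earlier frames influence the result solely through the hidden state. In a real system the inputs would be multi-channel feature maps produced by the fully convolutional baseline, and the kernels would be learned.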

Comments:	arXiv admin note: text overlap with arXiv:1606.00487
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1611.05435 [cs.CV]
	(or arXiv:1611.05435v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1611.05435

Submission history

From: Mennatullah Siam M.S.
[v1] Wed, 16 Nov 2016 20:46:38 UTC (5,840 KB)
[v2] Mon, 21 Nov 2016 21:47:17 UTC (5,840 KB)
