Lyndsey C. Pickup, Zheng Pan, Donglai Wei, Yichang Shih, Changshui Zhang, Andrew Zisserman, Bernhard Schölkopf and William T. Freeman
Overview
We explore whether we can observe Time's Arrow in a temporal sequence -- is it possible to tell whether a video is running forwards or backwards?
We developed three methods based on machine learning and image statistics, and evaluated them on a video dataset that we collected from YouTube.
Video Dataset
We collected 180 high-quality videos and selected a 6-10 second clip from each. The dataset contains 155 forward sequences and 25 intentionally backward sequences (i.e. clips whose play direction runs backwards in time). The full dataset can be downloaded from the Arrow project data page.

Top and bottom rows: frames from two sequences from our dataset of 180 videos.
Method #1: Flow words
Videos are described by SIFT-like "Flow-Words", computed from optical flow rather than image edges. We learn a dictionary of 4000 distinct Flow-Words from the YouTube data, then use a bag-of-words approach for training and testing on a balanced version of the YouTube dataset (in which every video appears both forwards and backwards in time). This method is our most successful, achieving 75%-90% classification accuracy in three-fold cross-validation; chance would be 50% accuracy.

Construction of the Flow-Words features. Top: pair of frames at times t-1 and t+1, warped into the coordinate frame of the intervening image. Left: vertical component of optic flow between this pair of frames; lower copy shows the same with the small SIFT-like descriptor grids overlaid. Right: expanded view of the SIFT-like descriptors shown left. Not shown: horizontal components of optic flow which are also required in constructing the descriptors.
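To make the pipeline concrete, here is a minimal sketch of a flow-words-style classifier: dense optical flow (OpenCV's Farneback method here), simple orientation-histogram descriptors over a grid of cells, a k-means vocabulary of 4000 words, and a linear SVM on per-clip bag-of-words histograms. The descriptor layout, cell size, thresholds and helper names (flow_descriptors, bag_of_words) are illustrative assumptions, not the exact settings used in the full method.

# Hedged sketch: flow -> grid-cell orientation histograms ("flow words")
# -> k-means vocabulary -> per-clip bag-of-words -> linear SVM.
import numpy as np
import cv2
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def flow_descriptors(frames, cell=16, bins=8):
    """HOG-like descriptors of the optical-flow field for one clip."""
    descs = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        h, w = mag.shape
        for y in range(0, h - cell, cell):
            for x in range(0, w - cell, cell):
                m = mag[y:y + cell, x:x + cell].ravel()
                a = ang[y:y + cell, x:x + cell].ravel()
                if m.sum() < 1.0:          # skip near-static cells
                    continue
                hist, _ = np.histogram(a, bins=bins, range=(0, 2 * np.pi),
                                       weights=m)
                descs.append(hist / (hist.sum() + 1e-8))
        prev = gray
    return np.array(descs)

def bag_of_words(desc_sets, n_words=4000):
    """Cluster all descriptors into a vocabulary, histogram each clip."""
    kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=0)
    kmeans.fit(np.vstack(desc_sets))
    hists = []
    for d in desc_sets:
        h = np.bincount(kmeans.predict(d), minlength=n_words).astype(float)
        hists.append(h / h.sum())
    return np.array(hists)

# Training on a balanced set: each clip appears once forwards (+1) and
# once reversed (-1), so the classifier cannot exploit class imbalance.
# X = bag_of_words([flow_descriptors(c) for c in clips_fwd + clips_bwd])
# y = np.array([+1] * len(clips_fwd) + [-1] * len(clips_bwd))
# clf = LinearSVC(C=1.0).fit(X, y)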
Method #2: Motion causality
This method exploits the fact that it is more common for one motion to cause several others than it is for several motions to combine into one smooth motion. For instance, one ball hitting a stack of stationary balls will probably cause several of the stationary ones to roll off in different directions. Using this cue, we achieve a classification accuracy of around 70% on the YouTube dataset.

Top row: three frames from one of the Tennis-ball dataset sequences, in which a ball is rolled into a stack of static balls. Bottom row: regions of motion, identified using only the frames at t and t-1. Notice that the two rolling balls are identified as separate regions of motion, and coloured separately in the bottom rightmost plot. The fact that one rolling ball (first frame) causes two balls to end up rolling (last frame) is what the motion-causation method aims to detect and use.
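A minimal sketch of this cue, assuming that individual "motions" can be approximated by connected components of large optical-flow magnitude: count how often one moving region overlaps two or more moving regions in the next frame, repeat on the reversed clip, and vote for the direction with more such split events. The flow method, magnitude threshold and overlap test below are illustrative simplifications of the full motion-causation method.

# Hedged sketch: a "split" event is one moving region at time t that
# overlaps >= 2 moving regions at time t+1; splits should be more
# frequent when the clip plays forwards in time.
import numpy as np
import cv2
from scipy import ndimage

def motion_regions(prev_gray, gray, mag_thresh=1.0):
    """Connected components of pixels with significant optical flow."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    labels, n = ndimage.label(mag > mag_thresh)
    return labels, n

def count_splits(frames):
    """Count one-to-many motion events over the clip."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    splits = 0
    prev_labels, prev_n = motion_regions(grays[0], grays[1])
    for t in range(2, len(grays)):
        labels, n = motion_regions(grays[t - 1], grays[t])
        for r in range(1, prev_n + 1):
            successors = np.unique(labels[prev_labels == r])
            if np.count_nonzero(successors) >= 2:  # ignore background label 0
                splits += 1
        prev_labels, prev_n = labels, n
    return splits

def predict_direction(frames):
    """Return +1 if the clip looks forward in time, -1 otherwise."""
    return +1 if count_splits(frames) >= count_splits(frames[::-1]) else -1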
Method #3: Auto-regressive model
If object motion is governed by a linear auto-regressive process, the current velocity of the object should be affected only by the past. The noise on this motion is asymmetric under time reversal: fitting an auto-regressive (AR) model to the motion ought to yield noise that is independent of the signal only in the forward-time direction. This method attempts to find the forward direction by testing the independence of the AR fitting errors from the motion trajectories.

Top: tracked points from a sequence, and an example track. Bottom: Forward-time (left) and backward-time (right) vertical trajectory components, and the corresponding model residuals. Trajectories should be independent from model residuals (noise) in the forward-time direction only. For the example track shown, p-values for the forward and backward directions are 0.52 and 0.016 respectively, indicating that forwards time is more likely.
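The sketch below illustrates the idea for a single tracked trajectory: fit a linear AR(k) model by least squares in each temporal direction and compare how strongly the residuals depend on the past. The full method uses a proper statistical independence test; correlating the residuals with a nonlinear function of the most recent lag, as done here, is only a crude stand-in, and the lag order k and helper names are illustrative assumptions.

# Hedged sketch: AR(k) fit in both directions; the direction whose
# residuals look more independent of the past is voted "forward".
import numpy as np
from scipy import stats

def ar_residuals(x, k=3):
    """Least-squares AR(k) fit; returns residuals and lagged design matrix."""
    X = np.column_stack([x[i:len(x) - k + i] for i in range(k)])
    y = x[k:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coeffs, X

def dependence_pvalue(x, k=3):
    """Small p-value => residuals depend on the past (wrong direction)."""
    res, X = ar_residuals(np.asarray(x, float), k)
    # Least squares makes residuals uncorrelated with X itself, so probe
    # a nonlinear transform of the most recent lag instead.
    _, p = stats.pearsonr(res, np.tanh(X[:, -1]))
    return p

def predict_direction(track, k=3):
    """Return +1 if forward time looks more plausible for this trajectory."""
    return +1 if dependence_pvalue(track, k) >= dependence_pvalue(track[::-1], k) else -1

# Example usage: track = vertical coordinates of one tracked point over the
# clip; per-track votes can then be aggregated over all tracks in the video.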
Publication
Acknowledgements
Funding was provided in the UK by the EPSRC and ERC grant VisRec no. 228180, in China by the 973 Program (2013CB329503) and NSFC Grant no. 91120301, and in the US by ONR MURI grant N00014-09-1-1051 and NSF grant CGV-1111415.