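Hi everyone, I'm Layne (雷恩), and this is the fifteenth article in the "Flink in Depth, Made Simple" (《深入浅出flink》) series. I hope you get something out of it O(∩_∩)O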
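Usually we define a window by calling dataStream.timeWindow, or by passing a window assigner to dataStream.window. So where does the first window start, and what is the window offset? Suppose the start of the first window is start and the window size is size. The first window then covers [start, start+size), and the ranges of all later windows follow from it. How does Flink compute this?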
Source code analysis
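Taking tumbling time windows as the example, let's look at the source of dataStream.timeWindow (the method shown here is the one defined on KeyedStream):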
public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
    if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
        return window(TumblingProcessingTimeWindows.of(size));
    } else {
        return window(TumblingEventTimeWindows.of(size));
    }
}
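In production, TimeCharacteristic.EventTime semantics are generally used, so the code above goes through TumblingEventTimeWindows.of(size) and returns a TumblingEventTimeWindows, the assigner for tumbling event-time windows. The window start and offset are defined in the TumblingEventTimeWindows class.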
The assignWindows method of TumblingEventTimeWindows contains the formula for the window range:
@Override
public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
    if (timestamp > Long.MIN_VALUE) {
        // Long.MIN_VALUE is currently assigned when no timestamp is present
        long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
        return Collections.singletonList(new TimeWindow(start, start + size));
    } else {
        throw new RuntimeException("Record has Long.MIN_VALUE timestamp (= no timestamp marker). " +
                "Is the time characteristic set to 'ProcessingTime', or did you forget to call " +
                "'DataStream.assignTimestampsAndWatermarks(...)'?");
    }
}
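To get a feel for what the start computation in assignWindows does, here is a minimal standalone sketch. It is not Flink's own code: the class WindowStartSketch and the helper windowStart are hypothetical names, and the sketch uses Math.floorMod to align a timestamp down to a window boundary, on the assumption that the window containing a timestamp starts at the largest value of the form offset + k * size that does not exceed it. All values are assumed to be in milliseconds, and size is assumed to be positive.

```java
public class WindowStartSketch {

    // Hypothetical helper, not part of Flink: aligns a timestamp down to the
    // start of the tumbling window that contains it.
    // Assumes size > 0 and that all values are in milliseconds.
    public static long windowStart(long timestamp, long offset, long size) {
        // floorMod keeps the remainder in [0, size), including for negative inputs
        return timestamp - Math.floorMod(timestamp - offset, size);
    }

    public static void main(String[] args) {
        long size = 5_000L; // 5-second windows

        // offset = 0: window boundaries are ..., 5_000, 10_000, 15_000, ...
        long start = windowStart(12_345L, 0L, size);
        System.out.println("[" + start + ", " + (start + size) + ")");

        // offset = 2_000: window boundaries shift to ..., 7_000, 12_000, 17_000, ...
        long shifted = windowStart(12_345L, 2_000L, size);
        System.out.println("[" + shifted + ", " + (shifted + size) + ")");
    }
}
```

With offset 0 the timestamp 12_345 lands in [10000, 15000). With an offset of 2_000 every boundary moves by two seconds and the same timestamp lands in [12000, 17000). The end of the window is always start + size, matching the new TimeWindow(start, start + size) call in assignWindows.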
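assignWindows is called for each element and determines the start and end of the window that the element falls into. The start is computed as follows: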
long start = TimeWindow.getWindowStartWithOffset(timestamp, offset