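The DataStream API provides a family of low-level transformation operators that can access timestamps and watermarks, register timers, and emit specific events such as timeout events.
Process Functions are used to build event-driven applications and to implement custom business logic that cannot be expressed with the window functions and transformation operators covered earlier.
Flink provides eight Process Functions: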
- ProcessFunction
- KeyedProcessFunction
- CoProcessFunction
- ProcessJoinFunction
- BroadcastProcessFunction
- KeyedBroadcastProcessFunction
- ProcessWindowFunction
- ProcessAllWindowFunction
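1. KeyedProcessFunction
KeyedProcessFunction operates on a KeyedStream.
It is invoked for every element of the stream and emits zero, one, or more elements.
All Process Functions implement the RichFunction interface, so they all have open(), close(), getRuntimeContext(), and related methods.
KeyedProcessFunction[KEY, IN, OUT] additionally provides two methods: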
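- processElement(v: IN, ctx: Context, out: Collector[OUT]): called for every element of the stream; results are emitted through the Collector. The Context gives access to the element's timestamp, the element's key, and the TimerService. The Context can also emit records to other streams (side outputs).
- onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[OUT]): a callback invoked when a previously registered timer fires. The timestamp parameter is the time the timer was set to fire at, and the Collector emits the results. Like the Context of processElement, OnTimerContext provides context information, for example the time domain of the firing timer (event time or processing time).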
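2. TimerService and Timers
The TimerService object held by Context and OnTimerContext has the following methods:
- currentProcessingTime(): Long returns the current processing time.
- currentWatermark(): Long returns the timestamp of the current watermark.
- registerProcessingTimeTimer(timestamp: Long): Unit registers a processing-time timer for the current key. The timer fires when processing time reaches the given timestamp.
- registerEventTimeTimer(timestamp: Long): Unit registers an event-time timer for the current key. When the watermark is greater than or equal to the registered timestamp, the timer fires and the callback is executed.
- deleteProcessingTimeTimer(timestamp: Long): Unit deletes a previously registered processing-time timer. If no timer with this timestamp exists, the call has no effect.
- deleteEventTimeTimer(timestamp: Long): Unit deletes a previously registered event-time timer. If no timer with this timestamp exists, the call has no effect.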
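The register/delete/fire rules listed above can be pictured with a small Flink-free model. This is only a sketch for building intuition, not Flink's implementation: `ToyTimerService`, `advanceWatermark`, and `ToyTimerDemo` are hypothetical names invented here, and the model covers event-time timers only. It assumes timers are identified by key and timestamp, so registering the same pair twice results in a single firing.

```scala
import scala.collection.mutable

// Toy stand-in for per-key timer bookkeeping; this is NOT Flink code.
final class ToyTimerService[K] {
  // A Set deduplicates timers: one (key, timestamp) pair fires at most once.
  private val eventTimers = mutable.Set.empty[(K, Long)]
  private var watermark: Long = Long.MinValue

  def currentWatermark: Long = watermark

  def registerEventTimeTimer(key: K, timestamp: Long): Unit =
    eventTimers += ((key, timestamp))

  // Deleting a timer that was never registered is a no-op.
  def deleteEventTimeTimer(key: K, timestamp: Long): Unit =
    eventTimers -= ((key, timestamp))

  // Advance the watermark and return the timers that are now due, ordered by
  // timestamp. Each returned pair stands for one onTimer() invocation.
  def advanceWatermark(newWatermark: Long): List[(K, Long)] = {
    watermark = math.max(watermark, newWatermark)
    val due = eventTimers.filter { case (_, ts) => ts <= watermark }.toList.sortBy(_._2)
    eventTimers --= due
    due
  }
}

object ToyTimerDemo {
  def run(): List[(String, Long)] = {
    val timers = new ToyTimerService[String]
    timers.registerEventTimeTimer("sensor_1", 1000L)
    timers.registerEventTimeTimer("sensor_1", 1000L) // duplicate, ignored
    timers.registerEventTimeTimer("sensor_2", 2000L)
    timers.deleteEventTimeTimer("sensor_2", 9999L)   // unknown timer, no effect
    val early = timers.advanceWatermark(999L)         // watermark < 1000: nothing is due
    val fired = timers.advanceWatermark(2000L)        // watermark >= both timestamps
    early ++ fired
  }
}
```

Calling `ToyTimerDemo.run()` returns the two timers in timestamp order, each exactly once. A processing-time variant would look the same with the wall clock in place of the watermark.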
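Examples follow.
Example 1:
When a timer fires, the onTimer() callback is executed.
Note that timers can only be used on keyed streams.
Requirement: monitor the water level reported by a water-level sensor and raise an alarm if the level rises continuously for ten seconds (processing time).
The code is as follows: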
package nj.zb.process
import nj.zb.source.WaterSensor
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
/**
 * @Author Jalrs
 * @Date 2021/1/4
 * @Description Raises an alarm when a sensor's water level rises continuously
 *              for ten seconds of processing time. Timers can only be used on
 *              keyed streams; when a timer fires, onTimer() is called.
 */
object ProcessFunctionDemo {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // Each input line is expected to look like "id,ts,vc" (ts in seconds).
    val stream: DataStream[String] = env.socketTextStream("hadoop004", 7777)
    val dataStream: DataStream[WaterSensor] = stream.map(data => {
      val array: Array[String] = data.split(",")
      WaterSensor(array(0).trim, array(1).trim.toLong, array(2).trim.toDouble)
    }).assignTimestampsAndWatermarks(
      // Tolerate events that arrive up to 1 second out of order.
      new BoundedOutOfOrdernessTimestampExtractor[WaterSensor](Time.seconds(1)) {
        override def extractTimestamp(t: WaterSensor): Long = {
          t.ts * 1000 // seconds to milliseconds
        }
      }
    )

    // Timers require a keyed stream, so key by sensor id before process().
    val processStream: DataStream[String] = dataStream.keyBy(_.id)
      .process(new WaterLevelAlarm)

    dataStream.print("data")
    processStream.print("alarm:")
    env.execute("keyedProcessFunction")
  }
}
class WaterLevelAlarm extends KeyedProcessFunction[String, WaterSensor, String] {
  // Water level of the previous reading for the current key.
  private var waterHeightState: ValueState[Double] = _
  // Timestamp of the currently registered timer; 0 means no timer is pending.
  private var currentTSState: ValueState[Long] = _

  override def open(parameters: Configuration): Unit = {
    waterHeightState = getRuntimeContext.getState(
      new ValueStateDescriptor[Double]("waterHeight", classOf[Double])
    )
    currentTSState = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("currentTS", classOf[Long])
    )
  }

  override def processElement(value: WaterSensor,
                              context: KeyedProcessFunction[String, WaterSensor, String]#Context,
                              collector: Collector[String]): Unit = {
    // Water level of the previous reading, compared against the incoming value.
    val lastWaterHeight: Double = waterHeightState.value()
    // Timestamp of the registered timer; reads as 0L when no timer is registered.
    val currentTS: Long = currentTSState.value()

    if (value.vc > lastWaterHeight && currentTS == 0) {
      // The level rose and no timer is pending: register an alarm timer 10 seconds from now.
      val timeTS: Long = context.timerService().currentProcessingTime() + 10000L
      context.timerService().registerProcessingTimeTimer(timeTS)
      currentTSState.update(timeTS)
    } else if (value.vc <= lastWaterHeight) {
      // The level did not rise: cancel the pending alarm timer.
      // Deleting a timer that does not exist has no effect.
      context.timerService().deleteProcessingTimeTimer(currentTS)
      currentTSState.clear()
    }
    waterHeightState.update(value.vc)
  }

  override def onTimer(timestamp: Long,
                       ctx: KeyedProcessFunction[String, WaterSensor, String]#OnTimerContext,
                       out: Collector[String]): Unit = {
    // No drop was seen for 10 seconds: emit the alarm and allow a new timer to be registered.
    out.collect(ctx.getCurrentKey + ": water level keeps rising, warning")
    currentTSState.clear()
  }
}
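To reason about the state machine in WaterLevelAlarm without starting a Flink job, its logic can be replayed over a list of readings. The following is a minimal sketch under stated assumptions, not part of the original example: `RisingAlarmModel`, `State`, `onReading`, and `alarms` are hypothetical names, each reading is a `(processingTimeMillis, level)` pair, and a timer that is still pending after the last reading is assumed to fire eventually because processing time keeps advancing.

```scala
// Flink-free model of the rising-level alarm; for reasoning and unit tests only.
object RisingAlarmModel {
  // Mirrors the two pieces of keyed state: last level and pending timer (0 = none).
  final case class State(lastLevel: Double = 0.0, timerTs: Long = 0L)

  // One reading arrives at processing time `now`; returns the next state.
  def onReading(s: State, now: Long, level: Double, windowMs: Long = 10000L): State =
    if (level > s.lastLevel && s.timerTs == 0L) State(level, now + windowMs) // rise, no timer: start countdown
    else if (level <= s.lastLevel) State(level, 0L)                            // no rise: cancel the timer
    else s.copy(lastLevel = level)                                             // still rising: keep the timer

  // Replays readings in order and returns the timestamps at which alarms fire.
  def alarms(readings: Seq[(Long, Double)], windowMs: Long = 10000L): List[Long] = {
    var state = State()
    val fired = List.newBuilder[Long]

    // Fire the pending timer if its deadline has been reached by time `now`.
    def fireIfDue(now: Long): Unit =
      if (state.timerTs != 0L && state.timerTs <= now) {
        fired += state.timerTs
        state = state.copy(timerTs = 0L)
      }

    readings.foreach { case (now, level) =>
      fireIfDue(now)
      state = onReading(state, now, level, windowMs)
    }
    fireIfDue(Long.MaxValue) // assumption: time keeps advancing after the last reading
    fired.result()
  }
}
```

For example, `RisingAlarmModel.alarms(Seq((0L, 1.0), (4000L, 2.0), (8000L, 3.0)))` yields a single alarm at 10000, while `Seq((0L, 1.0), (3000L, 2.0), (5000L, 1.5))` yields none because the drop at 5000 cancels the timer. Note that the very first reading counts as a rise whenever it is above 0, since the model starts from a last level of 0.0.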
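Example 2:
Requirement: monitor the water level reported by a water-level sensor and raise an alarm (HeightChangeAlarm) if the level changes by more than 10 between two consecutive readings.
The code is as follows: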
package nj.zb.process
import nj.zb.source.WaterSensor
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
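/**
 * @Author Jalrs
 * @Date 2021/1/4
 * @Description Raises an alarm when two consecutive water-level readings of
 *              the same sensor differ by more than a threshold.
 */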
object ProcessFunctionDemo2 {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    val stream: DataStream[String] = env.socketTextStream("hadoop004", 7777)
    val dataStream: DataStream[WaterSensor] = stream.map(data => {
      val array: Array[String] = data.split(",")
      WaterSensor(array(0).trim, array(1).trim.toLong, array(2).trim.toDouble)
    }).assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[WaterSensor](Time.seconds(1)) {
        override def extractTimestamp(t: WaterSensor): Long = {
          t.ts * 1000
        }
      }
    )

    val processStream: DataStream[(String, Double, Double)] =
      dataStream.keyBy(_.id).flatMap(new HeightChangeAlarm(10))

    dataStream.print("data")
    processStream.print("alarm:")
    env.execute("keyedProcessFunction")
  }
}
class HeightChangeAlarm(alarmValue: Double) extends RichFlatMapFunction[WaterSensor, (String, Double, Double)] {
  // Water level of the previous reading for the current key.
  var waterLevelState: ValueState[Double] = _

  override def open(parameters: Configuration): Unit = {
    waterLevelState = getRuntimeContext.getState(
      new ValueStateDescriptor[Double]("waterLevel", classOf[Double])
    )
  }

  override def flatMap(in: WaterSensor,
                       collector: Collector[(String, Double, Double)]): Unit = {
    // Previous level for this sensor.
    val lastValue: Double = waterLevelState.value()
    // Emit (id, previous level, current level) when the change exceeds the threshold.
    val abs: Double = (lastValue - in.vc).abs
    if (abs > alarmValue) {
      collector.collect((in.id, lastValue, in.vc))
    }
    waterLevelState.update(in.vc)
  }
}
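One detail of the comparison above is worth checking in isolation: when no previous reading exists, the state appears to read as 0.0 in Scala, so a first reading above the threshold would be reported as a change. The sketch below is a Flink-free variant, not the original author's code; `HeightChangeModel` and `alarms` are hypothetical names, and `Option` is used here to model "no reading seen yet" so that the first reading never triggers an alarm.

```scala
// Flink-free model of the height-change rule; for reasoning and unit tests only.
object HeightChangeModel {
  // Returns (previous, current) pairs whose absolute difference exceeds `threshold`.
  def alarms(levels: Seq[Double], threshold: Double): List[(Double, Double)] = {
    // None means no reading has been seen yet, unlike a 0.0 default.
    var last: Option[Double] = None
    val out = List.newBuilder[(Double, Double)]
    levels.foreach { current =>
      // Compare only when a previous reading exists.
      last.foreach { prev =>
        if ((prev - current).abs > threshold) out += ((prev, current))
      }
      last = Some(current)
    }
    out.result()
  }
}
```

With a threshold of 10, `HeightChangeModel.alarms(Seq(15.0, 18.0, 30.0, 5.0), 10.0)` reports the jumps from 18 to 30 and from 30 to 5, but not the initial reading of 15. Whether the first reading should count is a design choice that depends on the requirement.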