This documentation is for an unreleased version of Apache Flink. We recommend you use the latest stable version.
Events #
Flink exposes a event reporting system that allows gathering and exposing events to external systems.
Reporting events #
You can access the event system from any user function that extends RichFunction by calling getRuntimeContext().getMetricGroup()
.
This method returns a MetricGroup
object via which you can report a new single event.
Reporting single Event #
An Event
represents something that happened in Flink at certain point of time, that will be reported to a TraceReporter
.
To report an Event
you can use the MetricGroup#addEvent(EventBuilder)
method.
public class MyClass {
void doSomething() {
// (...)
metricGroup.addEvent(
Event.builder(MyClass.class, "SomeEvent")
.setObservedTsMillis(observedTs) // Optional
.setAttribute("foo", "bar"); // Optional
}
}
Currently reporting Events from Python is not supported.
Reporter #
For information on how to set up Flink’s event reporters please take a look at the event reporters documentation.
System traces #
Flink reports events listed below.
The tables below generally feature 5 columns:
-
The “Scope” column describes what is that trace reported scope.
-
The “Name” column describes the name of the reported trace.
-
The “Attributes” column lists the names of all attributes that are reported with the given trace.
-
The “Description” column provides information as to what a given attribute is reporting.
Scope | Name | Severity | Attributes | Description |
---|---|---|---|---|
org.apache.flink.runtime.checkpoint.CheckpointStatsTracker | CheckpointEvent | INFO | ||
observedTs | Timestamp when the checkpoint has finished. | |||
checkpointId | Id of the checkpoint. | |||
checkpointedSize | Size in bytes of checkpointed state during this checkpoint. Might be smaller than fullSize if incremental checkpoints are used. | |||
fullSize | Full size in bytes of the referenced state by this checkpoint. Might be larger than checkpointSize if incremental checkpoints are used. | |||
checkpointStatus | What was the state of this checkpoint: FAILED or COMPLETED. | |||
checkpointType | Type of the checkpoint. For example: "Checkpoint", "Full Checkpoint" or "Terminate Savepoint" ... | |||
isUnaligned | Whether checkpoint was aligned or unaligned. | |||
org.apache.flink.runtime.jobmaster.JobMaster | JobStatusChangeEvent | INFO | ||
observedTs | Timestamp when the job's status has changed. | |||
newJobStatus | New job status that is being reported by this event. | |||
org.apache.flink.runtime.scheduler.adaptive.JobFailureMetricReporter | JobFailureEvent | INFO | ||
observedTs | Timestamp when the job has failed. | |||
canRestart | (optional) Whether the failure is terminal. | |||
isGlobalFailure | (optional) Whether the failure is global. Global failover requires all tasks to failver. | |||
failureLabel.KEY | (optional) For every failure label attached to this failure with a given KEY, the value of that label is attached as an attribute value. | |||
org.apache.flink.runtime.scheduler.metrics.AllSubTasksRunningOrFinishedStateTimeMetrics | AllSubtasksStatusChangeEvent(streaming jobs only) | INFO | ||
observedTs | Timestamp when all subtasks reached given status. | |||
status | ALL_RUNNING_OR_FINISHED means all subtasks are RUNNING or have already FINISHED | |||
NOT_ALL_RUNNING_OR_FINISHED means at least one subtask has switched away from RUNNING or FINISHED, after previously ALL_RUNNING_OR_FINISHED being reported |