Flink Source Code Walkthrough (2): JobGraph

Original article · License: CC 4.0 BY-SA

Tags: #flink

First published 2021-10-07 09:52:02 · Last modified 2022-05-04 16:37:13

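JobGraph source code walkthrough

The previous post covered the StreamGraph: it is generated on the client from the Stream API calls, and its StreamNodes and StreamEdges describe the topology of the job. This post walks through how the JobGraph is generated, using YARN cluster mode as the example.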

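The JobGraph is an optimized version of the StreamGraph. The optimization covers checkpoint settings, slot sharing groups, memory fractions, and so on, but the most important step is chaining: multiple StreamNodes that satisfy the chaining conditions are merged into a single node, which removes the serialization, deserialization, and network transfer that would otherwise be needed when data flows between those nodes.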

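In brief, JobGraph generation combines chainable operators into operator chains, creates the corresponding JobVertex, IntermediateDataSet, and JobEdge objects, and uses each JobEdge to connect an IntermediateDataSet to a JobVertex. What gets built at this stage is only a coarse-grained logical structure of the user code (data structures, not data). The actual records are exchanged later, through the ResultSubpartition and InputGate objects that are constructed when the Tasks are created.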

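Core objects of the JobGraph

1. JobVertex

In the StreamGraph, each operator corresponds to one StreamNode. In the JobGraph, multiple StreamNodes that satisfy the chaining conditions are merged into one JobVertex, so a JobVertex contains one or more operators.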
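To make the merging step concrete, here is a minimal sketch, not Flink's implementation. It handles a linear pipeline only, and the chaining rule it uses (a forward connection plus equal parallelism) is a simplified assumption; the real generator checks more conditions. The names ChainingSketch, Node, isChainable, and groupIntoVertices are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChainingSketch {

    /** Toy stand-in for a StreamNode in a linear pipeline (hypothetical type). */
    static final class Node {
        final String name;
        final int parallelism;
        /** True if the edge from the previous node is a plain forward connection. */
        final boolean forwardFromPrevious;

        Node(String name, int parallelism, boolean forwardFromPrevious) {
            this.name = name;
            this.parallelism = parallelism;
            this.forwardFromPrevious = forwardFromPrevious;
        }
    }

    /** Assumed, simplified rule: chain only over forward edges with equal parallelism. */
    static boolean isChainable(Node upstream, Node downstream) {
        return downstream.forwardFromPrevious && upstream.parallelism == downstream.parallelism;
    }

    /** Groups consecutive chainable nodes; each inner list plays the role of one JobVertex. */
    static List<List<String>> groupIntoVertices(List<Node> pipeline) {
        List<List<String>> vertices = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Node previous = null;
        for (Node node : pipeline) {
            if (previous != null && !isChainable(previous, node)) {
                // The chain ends here: close the current vertex and start a new one.
                vertices.add(current);
                current = new ArrayList<>();
            }
            current.add(node.name);
            previous = node;
        }
        if (!current.isEmpty()) {
            vertices.add(current);
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<Node> pipeline = Arrays.asList(
                new Node("source", 2, false),
                new Node("map", 2, true),
                new Node("window", 2, false), // non-forward edge breaks the chain
                new Node("sink", 1, true));   // parallelism change breaks the chain
        // prints [[source, map], [window], [sink]]
        System.out.println(groupIntoVertices(pipeline));
    }
}
```

Four StreamNode-like entries end up as three vertices here, which is the effect described above: only the source and map share a vertex and therefore skip serialization between them.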

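2. JobEdge

In the StreamGraph, connections between StreamNodes are represented by StreamEdge; in the JobGraph, connections between JobVertex instances are represented by JobEdge. A JobEdge acts as a data channel in the JobGraph: its upstream side is an IntermediateDataSet, which is the input data set of the edge, and its downstream consumer is a JobVertex.

A JobEdge stores its target JobVertex and its source IntermediateDataSet, but it holds no direct reference to the source JobVertex; the producing vertex is reached through the data set.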

3. IntermediateDataSet

An IntermediateDataSet is the data set produced by an operator, a source, or any intermediate operation. It represents the output of a JobVertex.
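The relationship between the three objects can be sketched with a few toy classes. This is an illustration under assumptions, not Flink's code: ToyJobVertex, ToyIntermediateDataSet, ToyJobEdge, and connectTo are hypothetical names, and real Flink classes carry many more fields (parallelism, distribution pattern, and so on).

```java
import java.util.ArrayList;
import java.util.List;

class JobGraphModelSketch {

    /** Toy JobVertex: owns the data sets it produces and the edges it consumes. */
    static final class ToyJobVertex {
        final String name;
        final List<ToyIntermediateDataSet> results = new ArrayList<>();
        final List<ToyJobEdge> inputs = new ArrayList<>();

        ToyJobVertex(String name) {
            this.name = name;
        }

        /** Makes this vertex a consumer of a new data set produced by the upstream vertex. */
        ToyJobEdge connectTo(ToyJobVertex upstream) {
            ToyIntermediateDataSet dataSet = new ToyIntermediateDataSet(upstream);
            upstream.results.add(dataSet);
            ToyJobEdge edge = new ToyJobEdge(dataSet, this);
            dataSet.consumers.add(edge);
            inputs.add(edge);
            return edge;
        }
    }

    /** Toy IntermediateDataSet: the output of exactly one producing vertex. */
    static final class ToyIntermediateDataSet {
        final ToyJobVertex producer;
        final List<ToyJobEdge> consumers = new ArrayList<>();

        ToyIntermediateDataSet(ToyJobVertex producer) {
            this.producer = producer;
        }
    }

    /** Toy JobEdge: knows its source data set and target vertex, not the source vertex. */
    static final class ToyJobEdge {
        final ToyIntermediateDataSet source;
        final ToyJobVertex target;

        ToyJobEdge(ToyIntermediateDataSet source, ToyJobVertex target) {
            this.source = source;
            this.target = target;
        }
    }

    public static void main(String[] args) {
        ToyJobVertex upstream = new ToyJobVertex("Source -> Map");
        ToyJobVertex downstream = new ToyJobVertex("Sink");
        ToyJobEdge edge = downstream.connectTo(upstream);
        // The source vertex is found by going through the data set.
        // prints "Source -> Map => Sink"
        System.out.println(edge.source.producer.name + " => " + edge.target.name);
    }
}
```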

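How the JobGraph is generated

The entry point for JobGraph generation is the static method StreamingJobGraphGenerator.createJobGraph(this, jobID), where this is the StreamGraph. It creates a generator instance and finally calls the instance method StreamingJobGraphGenerator.createJobGraph().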

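Entry functions

The entry functions are called in this order: executeAsync (obtains the YarnJobClusterExecutorFactory) -> execute (generates the JobGraph and deploys the job to the cluster) -> getJobGraph (picks the batch PlanTranslator or the streaming StreamGraphTranslator according to the Pipeline type) -> createJobGraph (creates a StreamingJobGraphGenerator instance and builds the JobGraph).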

@Internal
public JobClient executeAsync(StreamGraph streamGraph) throws Exception {
	checkNotNull(streamGraph, "StreamGraph cannot be null.");
	checkNotNull(configuration.get(DeploymentOptions.TARGET), "No execution.target specified in your configuration file.");

	// DefaultExecutorServiceLoader looks up the factory that matches execution.target;
	// in the YARN mode used here this is YarnJobClusterExecutorFactory
	final PipelineExecutorFactory executorFactory =
		executorServiceLoader.getExecutorFactory(configuration);

	checkNotNull(
		executorFactory,
		"Cannot find compatible factory for specified execution.target (=%s)",
		configuration.get(DeploymentOptions.TARGET));

	// Create the YarnJobClusterExecutor; it generates the JobGraph and then submits the job to the cluster
	CompletableFuture<JobClient> jobClientFuture = executorFactory
		.getExecutor(configuration) // new YarnJobClusterExecutor
		.execute(streamGraph, configuration, userClassloader);

	........
}
 
// From AbstractJobClusterExecutor, the parent class of YarnJobClusterExecutor
@Override
public CompletableFuture<JobClient> execute(
		@Nonnull final Pipeline pipeline,
		@Nonnull final Configuration configuration,
		@Nonnull final ClassLoader userCodeClassloader) throws Exception {
	// Generate the JobGraph
	final JobGraph jobGraph = PipelineExecutorUtils.getJobGraph(pipeline, configuration);

	try (final ClusterDescriptor<ClusterID> clusterDescriptor = clusterClientFactory.createClusterDescriptor(configuration)) {
		final ExecutionConfigAccessor configAccessor = ExecutionConfigAccessor.fromConfiguration(configuration);

		final ClusterSpecification clusterSpecification = clusterClientFactory.getClusterSpecification(configuration);

		// Deploy the job cluster together with the JobGraph
		final ClusterClientProvider<ClusterID> clusterClientProvider = clusterDescriptor
				.deployJobCluster(clusterSpecification, jobGraph, configAccessor.getDetachedMode());
		LOG.info("Job has been submitted with JobID " + jobGraph.getJobID());

		// Deployment has already finished at this point, so return an already-completed
		// future that wraps the job client (no extra thread is started here)
		return CompletableFuture.completedFuture(
				new ClusterClientJobClientAdapter<>(clusterClientProvider, jobGraph.getJobID(), userCodeClassloader));
	}
}
 
 
// FlinkPipelineTranslationUtil.getJobGraph, to which PipelineExecutorUtils.getJobGraph delegates
public static JobGraph getJobGraph(
		Pipeline pipeline,
		Configuration optimizerConfiguration,
		int defaultParallelism) {

	// Pick the batch PlanTranslator or the streaming StreamGraphTranslator according to the Pipeline type
	FlinkPipelineTranslator pipelineTranslator = getPipelineTranslator(pipeline);

	return pipelineTranslator.translateToJobGraph(pipeline,
			optimizerConfiguration,
			defaultParallelism);
}
 
// Create a StreamingJobGraphGenerator instance and use it to build the JobGraph
public static JobGraph createJobGraph(StreamGraph streamGraph, @Nullable JobID jobID) {
	return new StreamingJobGraphGenerator(streamGraph, jobID).createJobGraph();
}
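The translator selection inside getJobGraph is a plain type-based dispatch. The sketch below shows that idea in isolation, under assumptions: every type in it (ToyPipeline, ToyBatchPlan, ToyStreamGraph, ToyTranslator, and the two translator classes) is a hypothetical stand-in, and the returned strings stand in for a real JobGraph. It is not how Flink loads its translators.

```java
class TranslatorDispatchSketch {

    /** Toy counterpart of a pipeline, either batch or streaming. */
    interface ToyPipeline {}

    static final class ToyBatchPlan implements ToyPipeline {}

    static final class ToyStreamGraph implements ToyPipeline {}

    /** Toy translator: reports whether it handles a pipeline, then translates it. */
    interface ToyTranslator {
        boolean canTranslate(ToyPipeline pipeline);

        String translateToJobGraph(ToyPipeline pipeline);
    }

    static final class ToyPlanTranslator implements ToyTranslator {
        @Override
        public boolean canTranslate(ToyPipeline pipeline) {
            return pipeline instanceof ToyBatchPlan;
        }

        @Override
        public String translateToJobGraph(ToyPipeline pipeline) {
            return "JobGraph(batch)";
        }
    }

    static final class ToyStreamGraphTranslator implements ToyTranslator {
        @Override
        public boolean canTranslate(ToyPipeline pipeline) {
            return pipeline instanceof ToyStreamGraph;
        }

        @Override
        public String translateToJobGraph(ToyPipeline pipeline) {
            return "JobGraph(streaming)";
        }
    }

    /** Returns the first translator that accepts the pipeline, or fails loudly. */
    static ToyTranslator getPipelineTranslator(ToyPipeline pipeline) {
        ToyTranslator[] candidates = {new ToyPlanTranslator(), new ToyStreamGraphTranslator()};
        for (ToyTranslator candidate : candidates) {
            if (candidate.canTranslate(pipeline)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException(
                "Unsupported pipeline type: " + pipeline.getClass().getName());
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyStreamGraph();
        // prints "JobGraph(streaming)"
        System.out.println(getPipelineTranslator(pipeline).translateToJobGraph(pipeline));
    }
}
```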

The createJobGraph function

Almost every member variable of StreamingJobGraphGenerator exists to help build the final JobGraph.

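createJobGraph works as follows. First, it generates a unique hash id for every node; the hash can also be specified by the user. As long as a node does not change between submissions (its group, parallelism, upstream and downstream relationships, and so on), its hash id stays the same, which is what makes it usable for failure recovery. Next, it performs chaining and creates the JobVertex and JobEdge objects. Finally, it writes the various configuration settings, such as caching and checkpointing.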
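The stable hash id can be illustrated with a small sketch. It is a simplification under stated assumptions: SHA-256 from the JDK is swapped in for whatever hash function Flink uses, the inputs (operator name, parallelism, upstream hashes) follow the description above rather than Flink's exact input set, and NodeHashSketch and nodeHash are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NodeHashSketch {

    /**
     * Returns the user-provided hash if there is one; otherwise derives a hash
     * from properties that stay the same across submissions of an unchanged job.
     */
    static String nodeHash(
            String userProvidedHash,
            String operatorName,
            int parallelism,
            List<String> upstreamHashes) {
        if (userProvidedHash != null) {
            return userProvidedHash;
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(operatorName.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0); // separator between fields
            digest.update(Integer.toString(parallelism).getBytes(StandardCharsets.UTF_8));
            for (String upstream : upstreamHashes) {
                digest.update((byte) 0);
                digest.update(upstream.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        String source = nodeHash(null, "source", 2, Collections.<String>emptyList());
        String firstRun = nodeHash(null, "map", 2, Arrays.asList(source));
        String secondRun = nodeHash(null, "map", 2, Arrays.asList(source));
        String rescaled = nodeHash(null, "map", 4, Arrays.asList(source));

        System.out.println("unchanged node keeps its id: " + firstRun.equals(secondRun));
        System.out.println("changed node keeps its id:   " + firstRun.equals(rescaled));
    }
}
```

Because each hash includes the upstream hashes, a change near the source of the pipeline also changes the ids of everything downstream in this sketch, which is one reason a fixed, user-provided value is useful.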

public class StreamingJobGraphGenerator {
  private StreamGraph streamGraph;
  private JobGraph jobGraph;
  // id -> JobVertex
  private Map<Integer, JobVertex> jobVertices;
  // ids of the JobVertex instances that have already been built
  private Collection<Integer> builtVertices;
  // physical edges (edges inside a chain are excluded), in creation order
  private List<StreamEdge> physicalEdgesInOrder;
  // chain information, used at deployment time to build the OperatorChain: startNodeId -> (currentNodeId -> StreamConfig)
  private Map<Integer, Map<Integer, StreamConfig>> chainedConfigs;
  // configuration of every node: id -> StreamConfig
  private Map<Integer, StreamConfig> vertexConfigs;
  /