Description
Hello, I've tried to debug why scheduling SegmentGenerationAndPushTask Minion jobs takes so long, and I've narrowed the problem down to this part of the code.
Lines 297 to 309 in 78308da:

```java
JobConfig.Builder jobBuilder =
    new JobConfig.Builder().addTaskConfigs(helixTaskConfigs).setInstanceGroupTag(minionInstanceTag)
        .setTimeoutPerTask(taskTimeoutMs).setNumConcurrentTasksPerInstance(numConcurrentTasksPerInstance)
        .setIgnoreDependentJobFailure(true).setMaxAttemptsPerTask(1).setFailureThreshold(Integer.MAX_VALUE)
        .setExpiry(_taskExpireTimeMs);
_taskDriver.enqueueJob(getHelixJobQueueName(taskType), parentTaskName, jobBuilder);
// Wait until task state is available
while (getTaskState(parentTaskName) == null) {
  Uninterruptibles.sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
}
return parentTaskName;
```
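As far as I understand, `getTaskState` reads the job state out of the Helix workflow context in ZooKeeper, so this loop is effectively waiting, 100 ms at a time and once per enqueued job, for the Helix controller to process the queue and write that context back. Even if there is no callback hook, a bounded wait would already help. Helix's `TaskDriver` seems to ship a blocking helper along these lines; a sketch, where `queueName` and `namespacedJobName` are placeholders for whatever `getHelixJobQueueName(taskType)` and the namespaced parent task name resolve to:

```java
import java.util.concurrent.TimeUnit;

import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.TaskState;

// Sketch only: TaskDriver#pollForJobState still polls ZK under the hood,
// but it takes an explicit timeout, so a stuck job cannot block scheduling forever.
class BoundedTaskStateWait {
  static TaskState waitForState(TaskDriver taskDriver, String queueName, String namespacedJobName)
      throws InterruptedException {
    // Helix namespaces job names inside a queue, typically "<queueName>_<jobName>"
    return taskDriver.pollForJobState(queueName, namespacedJobName,
        TimeUnit.SECONDS.toMillis(30), // assumption: 30 s is an acceptable upper bound
        TaskState.IN_PROGRESS, TaskState.COMPLETED, TaskState.FAILED);
  }
}
```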
I'm currently using the POST /tasks/execute API to schedule the job.
The culprit seems to be the while loop waiting for the task to get a state. I'm not familiar with how Helix handles this in the background. Do you think it would be possible to avoid polling the synchronized getTaskState() and instead implement a callback to get the result of job scheduling? A sketch of what I have in mind is below.
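Everything here (the `AsyncTaskScheduler` name, the backoff constants, the `getTaskState` stub) is hypothetical and would need to be wired into the real task resource manager; the point is only that the confirmation wait can be moved off the scheduling thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.helix.task.TaskState;

// Hypothetical wrapper: enqueue the job, then confirm its state off the request
// thread instead of blocking the scheduler in a fixed 100 ms sleep loop.
public class AsyncTaskScheduler {
  private final ScheduledExecutorService _poller = Executors.newSingleThreadScheduledExecutor();

  /** Completes once Helix reports a non-null state for the task, without blocking the caller. */
  public CompletableFuture<TaskState> waitForTaskState(String parentTaskName) {
    CompletableFuture<TaskState> future = new CompletableFuture<>();
    pollOnce(parentTaskName, future, 100); // start with the current 100 ms delay
    return future;
  }

  private void pollOnce(String parentTaskName, CompletableFuture<TaskState> future, long delayMs) {
    _poller.schedule(() -> {
      TaskState state = getTaskState(parentTaskName);
      if (state != null) {
        future.complete(state);
      } else {
        // Exponential backoff, capped at 1 s, so ZK is not hammered under load
        pollOnce(parentTaskName, future, Math.min(delayMs * 2, 1000));
      }
    }, delayMs, TimeUnit.MILLISECONDS);
  }

  private TaskState getTaskState(String parentTaskName) {
    // Stub: stands in for PinotHelixTaskResourceManager#getTaskState
    throw new UnsupportedOperationException("wire to the real task resource manager");
  }
}
```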
This is a big deal for us, since scheduling takes longer than ingestion itself and prevents us from keeping up with new data and scaling. It might also be a misconfiguration problem, but in that case I will need your help to find it.
Current configuration:
- GKE
- Pinot version 0.12.1
- GCS for deep storage
- 3 ZK: 8 CPU, 18 GB RAM
- 6 Servers: 16 CPU, 32–64 GB RAM, 1.45 TB SSD
- 2 Controllers: 16 CPU, 32 GB RAM
- 2 Brokers: 5 CPU, 16.25 GB RAM
- 32 Minions: 2 CPU, 2 GB RAM
- 1M segments, 4 TB of data