Implementing a reconciler

Reconciliation Execution in a Nutshell

An event always triggers reconciliation execution. Events typically come from a primary resource, usually a custom resource, triggered by changes made to that resource on the server (e.g. a resource is created, updated, or deleted) or from secondary resources for which there is a registered event source. Reconciler implementations are associated with a given resource type and listen for such events from the Kubernetes API server so that they can appropriately react to them. It is, however, possible for secondary sources to trigger the reconciliation process. This occurs via the event source mechanism.

When we receive an event, it triggers the reconciliation unless a reconciliation is already underway for this particular resource. In other words, the framework guarantees that no concurrent reconciliation happens for a resource.

Once the reconciliation is done, the framework checks if:

  • an exception was thrown during execution, and if yes, schedules a retry.
  • new events were received during the controller execution; if yes, schedule a new reconciliation.
  • the reconciler results explicitly re-scheduled (UpdateControl.rescheduleAfter(..)) a reconciliation with a time delay, if yes, schedules a timer event with the specific delay.
  • if none of the above applies, the reconciliation is finished.

In summary, the core of the SDK is implemented as an eventing system where events trigger reconciliation requests.

Implementing a Reconciler and Cleaner interfaces

To implement a reconciler, you always have to implement the Reconciler interface.

The lifecycle of a Kubernetes resource can be separated into two phases depending on whether the resource has already been marked for deletion or not.

The framework out of the box supports this logic, it will always call the reconcile method unless the custom resource is marked from deletion.

On the other hand, if the resource is marked from deletion and if the Reconciler implements the Cleaner interface, only the cleanup method is called. By implementing this interface the framework will automatically handle (add/remove) the finalizers for you.

In short, if you need to provide explicit cleanup logic, you always want to use finalizers; for a more detailed explanation, see Finalizer support for more details.

Using UpdateControl and DeleteControl

These two classes control the outcome or the desired behavior after the reconciliation.

The UpdateControl can instruct the framework to update the status sub-resource of the resource and/or re-schedule a reconciliation with a desired time delay:

  @Override
  public UpdateControl<MyCustomResource> reconcile(
     EventSourceTestCustomResource resource, Context context) {
    // omitted code
    
    return UpdateControl.patchStatus(resource).rescheduleAfter(10, TimeUnit.SECONDS);
  }

without an update:

  @Override
  public UpdateControl<MyCustomResource> reconcile(
     EventSourceTestCustomResource resource, Context context) {
    // omitted code
    
    return UpdateControl.<MyCustomResource>noUpdate().rescheduleAfter(10, TimeUnit.SECONDS);
  }

Note, though, that using EventSources is the preferred way of scheduling since the reconciliation is triggered only when a resource is changed, not on a timely basis.

At the end of the reconciliation, you typically update the status sub-resources. It is also possible to update both the status and the resource with the patchResourceAndStatus method. In this case, the resource is updated first followed by the status, using two separate requests to the Kubernetes API.

From v5 UpdateControl only supports patching the resources, by default using Server Side Apply (SSA). It is important to understand how SSA works in Kubernetes. Mainly, resources applied using SSA should contain only the fields identifying the resource and those the user is interested in (a ‘fully specified intent’ in Kubernetes parlance), thus usually using a resource created from scratch, see sample. To contrast, see the same sample, this time without SSA.

Non-SSA based patch is still supported.
You can control whether or not to use SSA using ConfigurationServcice.useSSAToPatchPrimaryResource() and the related ConfigurationServiceOverrider.withUseSSAToPatchPrimaryResource method. Related integration test can be found here.

Handling resources directly using the client, instead of delegating these updates operations to JOSDK by returning an UpdateControl at the end of your reconciliation, should work appropriately. However, we do recommend to use UpdateControl instead since JOSDK makes sure that the operations are handled properly, since there are subtleties to be aware of. For example, if you are using a finalizer, JOSDK makes sure to include it in your fully specified intent so that it is not unintentionally removed from the resource (which would happen if you omit it, since your controller is the designated manager for that field and Kubernetes interprets the finalizer being gone from the specified intent as a request for removal).

DeleteControl typically instructs the framework to remove the finalizer after the dependent resource are cleaned up in cleanup implementation.


public DeleteControl cleanup(MyCustomResource customResource,Context context){
        // omitted code
    
        return DeleteControl.defaultDelete();
        }

However, it is possible to instruct the SDK to not remove the finalizer, this allows to clean up the resources in a more asynchronous way, mostly for cases when there is a long waiting period after a delete operation is initiated. Note that in this case you might want to either schedule a timed event to make sure cleanup is executed again or use event sources to get notified about the state changes of the deleted resource.

Finalizer Support

Kubernetes finalizers make sure that your Reconciler gets a chance to act before a resource is actually deleted after it’s been marked for deletion. Without finalizers, the resource would be deleted directly by the Kubernetes server.

Depending on your use case, you might or might not need to use finalizers. In particular, if your operator doesn’t need to clean any state that would not be automatically managed by the Kubernetes cluster (e.g. external resources), you might not need to use finalizers. You should use the Kubernetes garbage collection mechanism as much as possible by setting owner references for your secondary resources so that the cluster can automatically delete them for you whenever the associated primary resource is deleted. Note that setting owner references is the responsibility of the Reconciler implementation, though dependent resources make that process easier.

If you do need to clean such a state, you need to use finalizers so that their presence will prevent the Kubernetes server from deleting the resource before your operator is ready to allow it. This allows for clean-up even if your operator was down when the resource was marked for deletion.

JOSDK makes cleaning resources in this fashion easier by taking care of managing finalizers automatically for you when needed. The only thing you need to do is let the SDK know that your operator is interested in cleaning the state associated with your primary resources by having it implement the Cleaner<P> interface. If your Reconciler doesn’t implement the Cleaner interface, the SDK will consider that you don’t need to perform any clean-up when resources are deleted and will, therefore, not activate finalizer support. In other words, finalizer support is added only if your Reconciler implements the Cleaner interface.

The framework automatically adds finalizers as the first step, thus after a resource is created but before the first reconciliation. The finalizer is added via a separate Kubernetes API call. As a result of this update, the finalizer will then be present on the resource. The reconciliation can then proceed as normal.

The automatically added finalizer will also be removed after the cleanup is executed on the reconciler. This behavior is customizable as explained above when we addressed the use of DeleteControl.

You can specify the name of the finalizer to use for your Reconciler using the @ControllerConfiguration annotation. If you do not specify a finalizer name, one will be automatically generated for you.

From v5, by default, the finalizer is added using Server Side Apply. See also UpdateControl in docs.

Making sure the primary resource is up to date for the next reconciliation

It is typical to want to update the status subresource with the information that is available during the reconciliation. This is sometimes referred to as the last observed state. When the primary resource is updated, though, the framework does not cache the resource directly, relying instead on the propagation of the update to the underlying informer’s cache. It can, therefore, happen that, if other events trigger other reconciliations before the informer cache gets updated, your reconciler does not see the latest version of the primary resource. While this might not typically be a problem in most cases, as caches eventually become consistent, depending on your reconciliation logic, you might still require the latest status version possible, for example if the status subresource is used as a communication mechanism, see Representing Allocated Values from the Kubernetes docs for more details.

The framework provides utilities to help with these use cases with PrimaryUpdateAndCacheUtils. These utility methods come in two flavors:

Using internal cache

In almost all cases for this purpose, you can use internal caches:

 @Override
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
        StatusPatchCacheCustomResource resource, Context<StatusPatchCacheCustomResource> context) {
    
    // omitted logic
    
    // update with SSA requires a fresh copy
    var freshCopy = createFreshCopy(primary);
    freshCopy.getStatus().setValue(statusWithState());
    
    var updatedResource = PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(resource, freshCopy, context);
    
    return UpdateControl.noUpdate();
  }

In the background PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus puts the result of the update into an internal cache and will make sure that the next reconciliation will contain the most recent version of the resource. Note that it is not necessarily the version of the resource you got as response from the update, it can be newer since other parties can do additional updates meanwhile, but if not explicitly modified, it will contain the up-to-date status.

See related integration test here.

This approach works with the default configuration of the framework and should be good to go in most of the cases. Without going further into the details, this won’t work if ConfigurationService.parseResourceVersionsForEventFilteringAndCaching is set to false (more precisely there are some edge cases when it won’t work). For that case framework provides the following solution:

Fallback approach: using PrimaryResourceCache cache

As an alternative, for very rare cases when ConfigurationService.parseResourceVersionsForEventFilteringAndCaching needs to be set to false you can use an explicit caching approach:


// We on purpose don't use the provided predicate to show what a custom one could look like.
  private final PrimaryResourceCache<StatusPatchPrimaryCacheCustomResource> cache =
      new PrimaryResourceCache<>(
          (statusPatchCacheCustomResourcePair, statusPatchCacheCustomResource) ->
              statusPatchCacheCustomResource.getStatus().getValue()
                  >= statusPatchCacheCustomResourcePair.afterUpdate().getStatus().getValue());

  @Override
  public UpdateControl<StatusPatchPrimaryCacheCustomResource> reconcile(
          StatusPatchPrimaryCacheCustomResource primary,
          Context<StatusPatchPrimaryCacheCustomResource> context) {
    
    // cache will compare the current and the cached resource and return the more recent. (And evict the old)   
    primary = cache.getFreshResource(primary);
      
    // omitted logic  
      
    var freshCopy = createFreshCopy(primary);
    
    freshCopy.getStatus().setValue(statusWithState());

    var updated =
            PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(primary, freshCopy, context, cache);
      
    return UpdateControl.noUpdate();
  }

  @Override
  public DeleteControl cleanup(
          StatusPatchPrimaryCacheCustomResource resource,
          Context<StatusPatchPrimaryCacheCustomResource> context)
          throws Exception {
    // cleanup the cache on resource deletion  
    cache.cleanup(resource);
    return DeleteControl.defaultDelete();
  }
  

PrimaryResourceCache is designed for this purpose. As shown in the example above, it is up to you to provide a predicate to determine if the resource is more recent than the one available. In other words, when to evict the resource from the cache. Typically, as shown in the integration test you can have a counter in status to check on that.

Since all of this happens explicitly, you cannot use this approach for managed dependent resources and workflows and will need to use the unmanaged approach instead. This is due to the fact that managed dependent resources always get their associated primary resource from the underlying informer event source cache.

Additional remarks

As shown in the integration tests, there is no optimistic locking used when updating the resource (in other words metadata.resourceVersion is set to null). This is desired since you don’t want the patch to fail on update.

In addition, you can configure the Fabric8 client retry.