Problem
When you try to migrate a PySpark user-defined function (UDF) to a Unity Catalog SQL UDF and execute it on a serverless cluster, the query fails with a runtime exception.
Job aborted due to stage failure: java.lang.IllegalArgumentException: Invalid base64 input: Check end padding and character grouping.
You notice the same query works correctly on all-purpose compute clusters.
Cause
The serverless cluster’s Photon engine has stricter base64 input validation.
Specifically, the UNBASE64 function inside the UDF raises an exception when it encounters malformed base64 strings, whereas the standard Apache Spark engine on all-purpose clusters silently returns NULL for such cases.
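The following is a minimal sketch of the failing pattern. The catalog, schema, function name (decode_uid), and the sample input are hypothetical placeholders, not taken from the original report.
-- Hypothetical Unity Catalog SQL UDF that decodes a base64 string.
CREATE OR REPLACE FUNCTION main.default.decode_uid(uid STRING)
RETURNS BINARY
RETURN UNBASE64(uid);
-- On serverless (Photon), a malformed value raises IllegalArgumentException;
-- on all-purpose compute, the same call silently returns NULL.
SELECT main.default.decode_uid('not-valid-base64!');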
Solution
Update the UDF to use TRY_TO_BINARY(uid, 'BASE64') instead of UNBASE64(uid). The TRY_TO_BINARY function handles malformed base64 inputs by returning NULL instead of throwing an exception, making the UDF compatible with both all-purpose and serverless clusters.
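For example, the hypothetical UDF sketched above can be rewritten as follows; the catalog, schema, and function names remain placeholders.
-- Corrected UDF: malformed base64 now yields NULL on both cluster types.
CREATE OR REPLACE FUNCTION main.default.decode_uid(uid STRING)
RETURNS BINARY
RETURN TRY_TO_BINARY(uid, 'BASE64');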
Additionally, you can run the following query to proactively identify and address malformed values in a dataset before migrating the UDF.
SELECT uid FROM table WHERE uid IS NOT NULL AND TRY_TO_BINARY(uid, 'BASE64') IS NULL;