Problem
When you try to migrate a PySpark user-defined function (UDF) to a Unity Catalog SQL UDF and execute it on a serverless cluster, the query fails with a runtime exception.
Job aborted due to stage failure: java.lang.IllegalArgumentException: Invalid base64 input: Check end padding and character grouping.
You notice the same query works correctly on all-purpose compute clusters.
Cause
The serverless cluster’s Photon engine has stricter base64 input validation.
Specifically, the UNBASE64 function inside the UDF raises an exception when it encounters malformed base64 strings, whereas the standard Apache Spark engine on all-purpose clusters silently returns NULL for such cases.
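The following is a minimal sketch of the failing pattern. The catalog, schema, function name (decode_uid), and the sample input are hypothetical placeholders, not taken from the original report.
-- Hypothetical Unity Catalog SQL UDF that decodes a base64 string.
CREATE OR REPLACE FUNCTION main.default.decode_uid(uid STRING)
RETURNS BINARY
RETURN UNBASE64(uid);
-- On serverless (Photon), a malformed value raises IllegalArgumentException;
-- on all-purpose compute, the same call silently returns NULL.
SELECT main.default.decode_uid('not-valid-base64!');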
Solution
Update the UDF to use TRY_TO_BINARY(uid, 'BASE64') instead of UNBASE64(uid). The TRY_TO_BINARY function handles malformed base64 inputs by returning NULL instead of throwing an exception, making the UDF compatible with both all-purpose and serverless clusters.
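For example, the hypothetical UDF sketched above can be rewritten as follows; the catalog, schema, and function names remain placeholders.
-- Corrected UDF: malformed base64 now yields NULL on both cluster types.
CREATE OR REPLACE FUNCTION main.default.decode_uid(uid STRING)
RETURNS BINARY
RETURN TRY_TO_BINARY(uid, 'BASE64');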
Additionally, you can run the following query to proactively identify and address malformed values in a dataset before migrating the UDF.
SELECT uid FROM table WHERE uid IS NOT NULL AND TRY_TO_BINARY(uid, 'BASE64') IS NULL;