Fix ext function caching #1902

@Flyangz

Describe the bug
After upgrading to DataFusion 49 (#1154), the expression cache (cached_exprs_evaluator) fails to deduplicate identical extension functions.
This is because DataFusion's SimpleScalarUDF::equals now enforces pointer equality (Arc::ptr_eq) on function implementations (DataFusion PR #16781). Auron currently creates a new Arc for every UDF instance (native-engine/datafusion-ext-functions/src/lib.rs), so logically identical functions have different memory addresses and no longer compare equal.
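The pointer-equality behavior described above can be illustrated with a minimal, standard-library-only sketch. It does not use DataFusion or Auron code: `FakeImpl`, `fresh_impl`, `shared_impl`, and `same_impl` are hypothetical names introduced here for illustration, and caching a single shared Arc is only one possible direction for a fix, not necessarily the one the project will adopt.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a UDF's function implementation.
#[derive(Debug)]
pub struct FakeImpl {
    pub name: &'static str,
}

/// Builds a brand-new Arc on every call, mirroring the "new Arc per
/// UDF instance" pattern described in the issue.
pub fn fresh_impl() -> Arc<FakeImpl> {
    Arc::new(FakeImpl { name: "example_fn" })
}

/// Returns a clone of one process-wide Arc, so every caller shares
/// the same allocation (assumed fix direction, for illustration only).
pub fn shared_impl() -> Arc<FakeImpl> {
    static IMPL: OnceLock<Arc<FakeImpl>> = OnceLock::new();
    Arc::clone(IMPL.get_or_init(|| Arc::new(FakeImpl { name: "example_fn" })))
}

/// Pointer-identity comparison: true only when both Arcs point to the
/// same allocation, regardless of the values they hold.
pub fn same_impl(a: &Arc<FakeImpl>, b: &Arc<FakeImpl>) -> bool {
    Arc::ptr_eq(a, b)
}
```

With these definitions, `same_impl(&fresh_impl(), &fresh_impl())` is false even though both values carry the same name, while `same_impl(&shared_impl(), &shared_impl())` is true. An equality check built on pointer identity would therefore treat the first pair as distinct functions.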

To Reproduce
Run the following query with logging enabled for the dups in cached_exprs_evaluator.rs. The log shows 0 duplicates found.

test("my test") {
  // withTable must name the table created below so it is dropped after the test
  withTable("my_cache_table") {
    sql("""
          |create table my_cache_table using parquet as
          |select col1 from values ('{"a":"1", "b":"2"}'), ('{"a":"3", "b":"4"}'), ('{"a":"5", "b":"6"}')
          |""".stripMargin)
    // Both calls read the same JSON column, col1
    sql("""
          |select
          |       get_json_object(col1, '$.a'),
          |       get_json_object(col1, '$.b')
          |from my_cache_table
          |""".stripMargin).show()
  }
}

Expected behavior
Logically identical extension functions should be detected as duplicates by cached_exprs_evaluator, as they were before the DataFusion 49 upgrade.
