extend ci oneDAL by Alexandr-Solovev · Pull Request #3584 · uxlfoundation/oneDAL

Alexandr-Solovev · 2026-03-28T12:52:49Z

Description

Checklist:

Completeness and readability

I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

david-cortes-intel · 2026-04-01T08:08:10Z

From the failed CIs:

Failures in sklearnex tests are in the examples. The results do not match against the hard-coded expected predictions. Not sure what exactly changed here - perhaps oneDAL is being compiled with different instruction sets or something like that.
One of the failures in sklearn conformance tests is also happening in the main branch, but the others are new, and not expected, even though the differences aren't too large:

E       AssertionError: 
E       Not equal to tolerance rtol=0.0001, atol=0.0002
E       
E       Mismatched elements: 1 / 55 (1.82%)
E       Mismatch at index:
E        [28]: -1.4209840297698975 (ACTUAL), -1.3922417005233982 (DESIRED)
E       Max absolute difference among violations: 0.02874233
E       Max relative difference among violations: 0.02064464
E        ACTUAL: array([-1.020011, -1.080148, -1.103261, -1.345047, -1.057743, -1.229007,
E              -1.064617, -0.962323, -0.967014, -0.995485, -0.986337, -0.956837,
E              -0.97445 , -0.948463, -0.968412, -1.066067, -0.968412, -0.989617,...
E        DESIRED: array([-1.020011, -1.080148, -1.103261, -1.345047, -1.057743, -1.229007,
E              -1.064617, -0.962323, -0.967014, -0.995485, -0.986336, -0.956837,
E              -0.97445 , -0.948463, -0.968412, -1.066066, -0.968412, -0.989617,...
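
For reference, the failing element can be checked by hand. NumPy documents the `assert_allclose` criterion as `|actual - desired| <= atol + rtol * |desired|`. The sketch below applies that formula in plain Python to the pair reported at index 28; the helper name `within_tolerance` is made up for illustration and is not part of the test suite.

```python
def within_tolerance(actual, desired, rtol, atol):
    """Apply the documented assert_allclose criterion to one scalar pair."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# Values copied from the mismatch at index 28 in the log above.
actual = -1.4209840297698975
desired = -1.3922417005233982

abs_diff = abs(actual - desired)
allowed = 0.0002 + 0.0001 * abs(desired)

print(f"absolute difference:  {abs_diff:.8f}")
print(f"allowed by rtol/atol: {allowed:.8f}")
print("passes:", within_tolerance(actual, desired, rtol=0.0001, atol=0.0002))
```

The difference is about 85 times the allowed margin, so this is not a borderline rounding case, whereas the neighbouring elements that differ only in the sixth decimal stay within tolerance.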

david-cortes-intel · 2026-04-09T09:26:26Z

CI error for the make job when it tries to import DPC (link):

CondaError: Run 'conda init' before 'conda activate'

@ethanglaser Any comments about what to do about it?

For the failing C++ examples, looks like the runners don't support avx512:

14:35:45 PASSED		df_cls_default_dense_extratrees_batch
.ci/scripts/test.sh: line 276: 42821 Illegal instruction     (core dumped) ${run_command} > "${e}".res 2>&1

14:35:45 FAILED		df_cls_dense_batch_model_builder with errno 132
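
A small sketch of how this reading of the log could be double-checked on a Linux runner. It assumes the reported "errno 132" is the shell exit status of the example (128 plus the signal number) and that the runner exposes CPU features through the `flags` lines of `/proc/cpuinfo`. Both helper names are hypothetical.

```python
import signal


def describe_exit_status(status):
    """Map a shell-style exit status above 128 to the signal that caused it."""
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` appears in any 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            if flag in value.split():
                return True
    return False


print(describe_exit_status(132))  # SIGILL on Linux

# Hypothetical cpuinfo excerpt for a machine without AVX-512.
sample = "processor : 0\nflags\t\t: fpu sse2 avx avx2\n"
print("avx512f available:", has_cpu_flag(sample, "avx512f"))
```

On a real runner the text would come from `open("/proc/cpuinfo").read()`; `avx512f` is the foundation flag that the other AVX-512 extensions build on.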

david-cortes-intel · 2026-04-14T06:55:45Z

Looks like the changes didn't solve the issue about results from examples not matching anymore:

__________________________ TestExNpyArray.test_em_gmm __________________________

self = <test_daal4py_examples.TestExNpyArray testMethod=test_em_gmm>

    @unittest.skipUnless(
        config.check_version(),
        f"Minimum required version {config.required_version} (have {daal_version})",
    )
    @unittest.skipUnless(missing_dep is None, f"Missing dependency: {missing_dep}")
    @unittest.skipIf(
        config.is_suspended(),
        f"Test was suspended for {config.suspended_for_n_days} days on {config.suspended_on}",
    )
    def run_test(self):
        start = time.process_time()
    
        ex = import_module_any_path(example_path / config.module_name)
    
        if not hasattr(ex, "main"):
            self.skipTest("Missing main function")
    
        result: Any = self.call_main(ex)
        if config.result_file_name and config.result_attribute:
            testdata = readcsv.np_read_csv(
                example_data_path / config.result_file_name
            )
            ra = config.result_attribute
            actual = ra(result) if callable(ra) else getattr(result, ra)
>           np.testing.assert_allclose(actual, testdata, atol=1e-05)
E           AssertionError: 
E           Not equal to tolerance rtol=1e-07, atol=1e-05
E           
E           Mismatched elements: 9 / 9 (100%)
E           First 5 mismatches are at indices:
E            [0, 0]: 33.96541597016111 (ACTUAL), 7.567869663238525 (DESIRED)
E            [0, 1]: -2.1664695368906 (ACTUAL), 1.3058382272720337 (DESIRED)
E            [0, 2]: -18.318028183740516 (ACTUAL), -0.06336316466331482 (DESIRED)
E            [1, 0]: -2.1664695368906 (ACTUAL), 1.3058382272720337 (DESIRED)
E            [1, 1]: 1.1060078371075024 (ACTUAL), 0.9793263673782349 (DESIRED)
E           Max absolute difference among violations: 26.39754631
E           Max relative difference among violations: 288.09585374
E            ACTUAL: array([[ 33.965416,  -2.16647 , -18.318028],
E                  [ -2.16647 ,   1.106008,   2.128947],
E                  [-18.318028,   2.128947,  14.218832]])
E            DESIRED: array([[ 7.56787 ,  1.305838, -0.063363],
E                  [ 1.305838,  0.979326,  0.636319],
E                  [-0.063363,  0.636319,  2.276593]], dtype=float32)

s/tests/test_daal4py_examples.py:145: AssertionError

@Alexandr-Solovev Could you give it a try without the new -O0 flag to see if that makes a difference?

david-cortes-intel · 2026-04-14T14:55:26Z

So looks like it was indeed the -O0 flag that resulted in the examples reaching different results. Curious as to how the differences ended up being that big in files like this (CC @Vika-F):
https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/daal4py/em_gmm.py

The other remaining test failures also occur in the sklearnex main repository and are not caused by the changes here.

Alexandr-Solovev · 2026-04-14T15:15:43Z

Do you have an estimate for when those tests will be fixed?

david-cortes-intel · 2026-04-14T15:20:39Z

Those aren't really tests, they are executing daal4py examples and checking the result against hard-coded numbers which were obtained from earlier runs. Some algorithms could produce slightly different results due to floating point inaccuracies, but in this case it's finding completely different cluster centers. We don't know if those are incorrect or not, but the optimization flag shouldn't make such large differences.
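
One way to make the distinction between floating-point noise and a genuinely different solution concrete is sketched below, using the rounded ACTUAL and DESIRED covariance values from the `test_em_gmm` log. The `noise_rtol` threshold and both function names are assumptions chosen for illustration, not something the test suite defines.

```python
# Rounded values copied from the test_em_gmm failure log.
ACTUAL = [
    [33.965416, -2.16647, -18.318028],
    [-2.16647, 1.106008, 2.128947],
    [-18.318028, 2.128947, 14.218832],
]
DESIRED = [
    [7.56787, 1.305838, -0.063363],
    [1.305838, 0.979326, 0.636319],
    [-0.063363, 0.636319, 2.276593],
]


def max_relative_difference(actual, desired):
    """Largest |a - d| / |d| over all matrix elements."""
    return max(
        abs(a - d) / abs(d)
        for row_a, row_d in zip(actual, desired)
        for a, d in zip(row_a, row_d)
    )


def classify(actual, desired, noise_rtol=1e-3):
    """Label a mismatch; noise_rtol is a hypothetical cut-off, not a project setting."""
    if max_relative_difference(actual, desired) > noise_rtol:
        return "different solution"
    return "floating-point noise"


print(f"max relative difference: {max_relative_difference(ACTUAL, DESIRED):.2f}")
print(classify(ACTUAL, DESIRED))
```

A relative difference in the hundreds, together with off-diagonal entries changing sign, points at a different fitted model rather than accumulated rounding error.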

david-cortes-intel · 2026-04-14T15:21:39Z

If you mean the failing tests that are actually tests, it will be very hard to fix, because they involve memory safety issues that only manifest under some particular environments and hardware. Perhaps @Vika-F could comment on that. Note that the issue is likely on the oneDAL side.

david-cortes-intel · 2026-04-15T06:52:25Z

@Alexandr-Solovev Is this PR meant to fix the failing step 'Check DPC module import' on windows?

@ethanglaser Any comments about what could be done about this error?

CondaError: Run 'conda init' before 'conda activate'

Alexandr-Solovev · 2026-04-15T07:10:31Z

Yes, but for now I don't know how to get the DPC import working correctly on Windows.

david-cortes-intel · 2026-04-16T11:45:51Z

New windows error after fixing the conda activation issue:

ModuleNotFoundError: No module named 'onedal._onedal_py_dpc'

(link)

As far as I can see, the onedal_py_dpc module is not even being built on those jobs:
https://dev.azure.com/daal/DAAL/_build/results?buildId=58463&view=logs&jobId=ee3ca9c7-09ef-5e6f-8aad-c1911e4a6a1a&j=ee3ca9c7-09ef-5e6f-8aad-c1911e4a6a1a&t=a323c936-ca0d-52a3-b3b1-104d24bffc7b

It doesn't appear to have any %NO_DPC% environment variable as far as I can see, but perhaps it could add a print to verify that it doesn't have such variable.
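
A minimal sketch of the kind of diagnostic print suggested above, which could be run as a CI step before the import check. The function name `dpc_build_report` is hypothetical; the environment variable and module name are the ones mentioned in this thread.

```python
import importlib.util
import os


def dpc_build_report(environ=os.environ, module="onedal._onedal_py_dpc"):
    """Return lines describing NO_DPC and whether the DPC extension can be located."""
    lines = []
    value = environ.get("NO_DPC")
    if value is None:
        lines.append("NO_DPC is not set")
    else:
        lines.append(f"NO_DPC={value!r}")
    try:
        found = importlib.util.find_spec(module) is not None
    except ImportError:
        # find_spec imports the parent package first; this is raised
        # when that package is missing or fails to import.
        found = False
    lines.append(f"{module} importable: {found}")
    return lines


print("\n".join(dpc_build_report()))
```

Printing both facts in the same step would show whether the module is missing because the build was told to skip it or because it was never built at all.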

Alexandr-Solovev · 2026-04-16T16:09:00Z

I checked the sklearnex file https://github.com/uxlfoundation/scikit-learn-intelex/blob/e0a107a2fbc1bc8f0617e10ff4dea88711ce5275/.ci/pipeline/build-and-test-win.yml and it looks like it does not include anything related to DPC (build, examples). @ethanglaser Do you know why?

ethanglaser · 2026-04-16T17:13:08Z

We do not build windows dpc in sklearnex azure pipelines. Only in github actions.

david-cortes-intel · 2026-04-17T06:38:21Z

@ethanglaser So what would we need to do here in order to run sklearnex DPC tests on windows?

Alexandr-Solovev · 2026-04-17T07:39:29Z

I believe we could either extend the current https://github.com/uxlfoundation/scikit-learn-intelex/blob/e0a107a2fbc1bc8f0617e10ff4dea88711ce5275/.ci/pipeline/build-and-test-win.yml on the intelex side, or try to use https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/.github/workflows/ci.yml#L178 for Windows. I am not sure about GitHub Actions and Azure compatibility, though.

david-cortes-intel · 2026-04-17T09:06:06Z

@Alexandr-Solovev I think you would also need to add steps for DPC dependencies if you copy the yaml, like in here:
https://github.com/uxlfoundation/scikit-learn-intelex/blob/e0a107a2fbc1bc8f0617e10ff4dea88711ce5275/.github/workflows/ci.yml#L235

Alexandr-Solovev · 2026-04-17T09:27:11Z

/azp run

azure-pipelines · 2026-04-17T09:27:23Z

Azure Pipelines failed to run 1 pipeline(s).

ethanglaser · 2026-04-17T18:19:36Z

I suggest migrating the intended scope of additions from existing .ci/ (azure pipelines) to .github/ (github actions). We can set up a workflow that depends on existing Nightly-build github action and then runs sklearnex .github/workflows/ci.yml step using the result of oneDAL Nightly-build. This is the same way it functions in sklearnex, except that in sklearnex it takes the latest successful Nightly-build here and here from main branch - here it would just take the Nightly-build from the CI here.

ethanglaser · 2026-04-20T18:09:35Z

+          pip install $(python .ci/scripts/get_compatible_scipy_version.py ${{ matrix.SKLEARN_VERSION }}) pyyaml
+          if [ "${{ steps.set-env.outputs.DPCFLAG }}" == "" ]; then pip install dpctl==${{ env.DPCTL_VERSION }} dpnp==${{ env.DPNP_VERSION }}; fi
+          pip list
+      - name: Sklearnex testing

Is it possible to call the GitHub Actions workflows, similar to how you were initially doing for the Azure pipeline, instead of having the same steps defined here as in the sklearnex repo?

david-cortes-intel · 2026-04-21T11:07:58Z

Test failure from the new windows jobs happens when it calls python -m ... from inside a python test.

Looks like that 'python' call might be calling an interpreter from a different environment, which doesn't have sklearnex installed:

____________ ERROR at setup of test_patching_svc_from_command_line ____________

request = <SubRequest 'patch_svc_from_command_line' for <Function test_patching_svc_from_command_line>>

    @pytest.fixture
    def patch_svc_from_command_line(request):
        err_code = subprocess.call(
            [sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "sklearn.svm.SVC"]
        )
>       assert err_code == EX_OK
E       assert 1 == 0

.ci\scripts\test_global_patch.py:48: AssertionError

Curious as to how that happens, since launching pytest to execute those tests would require the correct environment to be already activated.
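
A small diagnostic along these lines could help test that hypothesis. It is only a sketch: the function name and the dictionary keys are made up, and it simply compares the interpreter running the tests with the `python` found on `PATH`, then asks a child process started from `sys.executable` whether it can locate the package.

```python
import shutil
import subprocess
import sys


def interpreter_report(module="sklearnex"):
    """Compare interpreters and check whether `module` is visible to a child process."""
    probe = (
        "import importlib.util, sys; "
        "print(sys.executable); "
        f"print(importlib.util.find_spec({module!r}) is not None)"
    )
    child = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True
    )
    out = child.stdout.splitlines()
    return {
        "parent_executable": sys.executable,
        "python_on_path": shutil.which("python"),
        "child_returncode": child.returncode,
        "child_executable": out[0] if out else None,
        "module_found_in_child": len(out) >= 2 and out[1] == "True",
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

Since the fixture in the traceback already launches `sys.executable`, a mismatch between `parent_executable` and `python_on_path`, or `module_found_in_child` being false, would narrow down whether pytest itself was started from an environment without sklearnex.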

Alexandr-Solovev added 2 commits March 28, 2026 05:52

init restore

bfa4693

fixes

ee2814d

Alexandr-Solovev added the dependencies Pull requests that update a dependency file label Mar 30, 2026

fixes and additional rpath

8a42e6a

Alexandr-Solovev force-pushed the dev/asolovev_extend_ci_opt branch from 01756be to 8a42e6a Compare March 30, 2026 09:02

Alexandr-Solovev added 4 commits March 30, 2026 11:02

Merge branch 'main' into dev/asolovev_extend_ci_opt

2edb9d6

fixes

f0388b6

fixes

7758f8e

Merge branch 'main' into dev/asolovev_extend_ci_opt

cea3669

Alexandr-Solovev added 2 commits April 7, 2026 13:50

Merge branch 'main' into dev/asolovev_extend_ci_opt

af07f1c

fixes

9da9016

Alexandr-Solovev added 2 commits April 13, 2026 09:26

fixes

275cc3d

Merge branch 'main' into dev/asolovev_extend_ci_opt

92a63c9

fixes

9c937ac

ethanglaser reviewed Apr 15, 2026

Comment thread .ci/pipeline/ci.yml Outdated

ethanglaser reviewed Apr 16, 2026

Comment thread .ci/pipeline/ci.yml Outdated

experiment: using internal script for coverage

6e0b498

Alexandr-Solovev force-pushed the dev/asolovev_extend_ci_opt branch from fa8a0b6 to 6e0b498 Compare April 17, 2026 08:46

fixes

0365eb9

fixes

8521af2

Alexandr-Solovev force-pushed the dev/asolovev_extend_ci_opt branch from 93810dc to 8521af2 Compare April 20, 2026 08:50

Alexandr-Solovev added 3 commits April 20, 2026 10:50

Merge branch 'main' into dev/asolovev_extend_ci_opt

03e9297

fixes for azure

5078a3b

fixes for CI

553400f

ethanglaser reviewed Apr 20, 2026

fixes

c33f75b

fixes

778fcc9
