
extend ci oneDAL#3584

Draft: Alexandr-Solovev wants to merge 20 commits into uxlfoundation:main from Alexandr-Solovev:dev/asolovev_extend_ci_opt

@Alexandr-Solovev
Contributor

Description


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

Alexandr-Solovev added the dependencies label (Pull requests that update a dependency file) on Mar 30, 2026
Alexandr-Solovev force-pushed the dev/asolovev_extend_ci_opt branch from 01756be to 8a42e6a on March 30, 2026 09:02
@david-cortes-intel
Contributor

From the failed CIs:

  • The sklearnex test failures are in the examples: the results do not match the hard-coded expected predictions. I'm not sure what exactly changed here; perhaps oneDAL is being compiled for different instruction sets or something similar.
  • One of the failures in the sklearn conformance tests also happens on the main branch, but the others are new and unexpected, even though the differences aren't very large:
E       AssertionError: 
E       Not equal to tolerance rtol=0.0001, atol=0.0002
E       
E       Mismatched elements: 1 / 55 (1.82%)
E       Mismatch at index:
E        [28]: -1.4209840297698975 (ACTUAL), -1.3922417005233982 (DESIRED)
E       Max absolute difference among violations: 0.02874233
E       Max relative difference among violations: 0.02064464
E        ACTUAL: array([-1.020011, -1.080148, -1.103261, -1.345047, -1.057743, -1.229007,
E              -1.064617, -0.962323, -0.967014, -0.995485, -0.986337, -0.956837,
E              -0.97445 , -0.948463, -0.968412, -1.066067, -0.968412, -0.989617,...
E        DESIRED: array([-1.020011, -1.080148, -1.103261, -1.345047, -1.057743, -1.229007,
E              -1.064617, -0.962323, -0.967014, -0.995485, -0.986336, -0.956837,
E              -0.97445 , -0.948463, -0.968412, -1.066066, -0.968412, -0.989617,...
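For reference, np.testing.assert_allclose treats an element as matching when |actual - desired| <= atol + rtol * |desired|. The minimal sketch below re-implements that elementwise check for the element at index [28] (values copied from the log above) so the reported maximum absolute and relative differences can be reproduced. allclose_report is a hypothetical helper, not part of NumPy or the sklearnex test suite.

```python
import numpy as np


def allclose_report(actual, desired, rtol, atol):
    """Hypothetical helper mirroring the elementwise test of np.testing.assert_allclose:
    an element passes when |actual - desired| <= atol + rtol * |desired|."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    abs_diff = np.abs(actual - desired)
    violations = abs_diff > atol + rtol * np.abs(desired)
    if not violations.any():
        return 0, 0.0, 0.0
    max_abs = float(abs_diff[violations].max())
    # Relative difference is measured against the desired (reference) value;
    # assumes no desired value among the violations is exactly zero.
    max_rel = float((abs_diff[violations] / np.abs(desired[violations])).max())
    return int(violations.sum()), max_abs, max_rel


# Element [28] from the conformance-test failure, with the same rtol/atol.
n_bad, max_abs, max_rel = allclose_report(
    [-1.4209840297698975], [-1.3922417005233982], rtol=1e-4, atol=2e-4
)
print(f"{n_bad} violation(s), max abs diff {max_abs:.8f}, max rel diff {max_rel:.8f}")
```

With these tolerances the allowed error around -1.39 is only about 3.4e-4, so a difference of about 0.029 fails, while last-digit differences elsewhere in the array (for example -0.986337 vs -0.986336) pass.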

@david-cortes-intel
Contributor

CI error for the make job when it tries to import DPC (link):

CondaError: Run 'conda init' before 'conda activate'

@ethanglaser Any comments about what to do about it?

For the failing C++ examples, it looks like the runners don't support AVX-512:

14:35:45 PASSED		df_cls_default_dense_extratrees_batch
.ci/scripts/test.sh: line 276: 42821 Illegal instruction     (core dumped) ${run_command} > "${e}".res 2>&1

14:35:45 FAILED		df_cls_dense_batch_model_builder with errno 132
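The 132 reported by test.sh is consistent with the usual shell convention that a status above 128 means "killed by signal (status - 128)": 132 - 128 = 4, which is SIGILL, matching the "Illegal instruction" line. The minimal sketch below decodes such a status and reads the flags line of /proc/cpuinfo to check a runner for AVX-512. It assumes a Linux /proc/cpuinfo layout; exit_status_to_signal and cpu_flags_present are hypothetical helpers, and avx512f is only the foundation subset, so a binary may require additional AVX-512 subsets.

```python
import os
import signal


def exit_status_to_signal(status):
    """Decode a shell exit status: values above 128 conventionally mean 'killed by signal (status - 128)'."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # not a signal number known on this platform


def cpu_flags_present(cpuinfo_text, flags=("avx2", "avx512f")):
    """Report which of `flags` appear on the first 'flags' line of Linux /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return {flag: flag in present for flag in flags}
    return {flag: False for flag in flags}


print(exit_status_to_signal(132))
if os.path.exists("/proc/cpuinfo"):  # Linux-only source of CPU feature flags
    with open("/proc/cpuinfo") as f:
        print(cpu_flags_present(f.read()))
```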

@david-cortes-intel
Contributor

It looks like the changes didn't fix the problem of the example results no longer matching:

__________________________ TestExNpyArray.test_em_gmm __________________________

self = <test_daal4py_examples.TestExNpyArray testMethod=test_em_gmm>

    @unittest.skipUnless(
        config.check_version(),
        f"Minimum required version {config.required_version} (have {daal_version})",
    )
    @unittest.skipUnless(missing_dep is None, f"Missing dependency: {missing_dep}")
    @unittest.skipIf(
        config.is_suspended(),
        f"Test was suspended for {config.suspended_for_n_days} days on {config.suspended_on}",
    )
    def run_test(self):
        start = time.process_time()
    
        ex = import_module_any_path(example_path / config.module_name)
    
        if not hasattr(ex, "main"):
            self.skipTest("Missing main function")
    
        result: Any = self.call_main(ex)
        if config.result_file_name and config.result_attribute:
            testdata = readcsv.np_read_csv(
                example_data_path / config.result_file_name
            )
            ra = config.result_attribute
            actual = ra(result) if callable(ra) else getattr(result, ra)
>           np.testing.assert_allclose(actual, testdata, atol=1e-05)
E           AssertionError: 
E           Not equal to tolerance rtol=1e-07, atol=1e-05
E           
E           Mismatched elements: 9 / 9 (100%)
E           First 5 mismatches are at indices:
E            [0, 0]: 33.96541597016111 (ACTUAL), 7.567869663238525 (DESIRED)
E            [0, 1]: -2.1664695368906 (ACTUAL), 1.3058382272720337 (DESIRED)
E            [0, 2]: -18.318028183740516 (ACTUAL), -0.06336316466331482 (DESIRED)
E            [1, 0]: -2.1664695368906 (ACTUAL), 1.3058382272720337 (DESIRED)
E            [1, 1]: 1.1060078371075024 (ACTUAL), 0.9793263673782349 (DESIRED)
E           Max absolute difference among violations: 26.39754631
E           Max relative difference among violations: 288.09585374
E            ACTUAL: array([[ 33.965416,  -2.16647 , -18.318028],
E                  [ -2.16647 ,   1.106008,   2.128947],
E                  [-18.318028,   2.128947,  14.218832]])
E            DESIRED: array([[ 7.56787 ,  1.305838, -0.063363],
E                  [ 1.305838,  0.979326,  0.636319],
E                  [-0.063363,  0.636319,  2.276593]], dtype=float32)

s/tests/test_daal4py_examples.py:145: AssertionError

@Alexandr-Solovev Could you give it a try without the new -O0 flag to see if that makes a difference?

@david-cortes-intel
Contributor

So it looks like it was indeed the -O0 flag that caused the examples to produce different results. I'm curious how the differences ended up being that large in files like this one (CC @Vika-F):
https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/daal4py/em_gmm.py

The other remaining test failures also occur in the sklearnex main repository and are not caused by the changes here.

@Alexandr-Solovev
Contributor Author

Do you have an estimate for fixing those tests?

@david-cortes-intel
Contributor

Those aren't really tests; they execute daal4py examples and check the results against hard-coded numbers obtained from earlier runs. Some algorithms can produce slightly different results due to floating-point inaccuracies, but in this case it's finding completely different cluster centers. We don't know whether those are incorrect, but the optimization flag shouldn't make such large differences.
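As a small illustration of the kind of floating-point effect an optimization level can have: addition is not associative, so the same values summed in a different order (which some compilers may do under a relaxed floating-point model, for instance when vectorizing a reduction) can differ in the last bits. The sketch compares a left-to-right sum, a recursive pairwise sum, and math.fsum; the helper names are hypothetical. It only shows last-digit differences and does not by itself explain differences as large as the em_gmm ones.

```python
import math


def sum_left_to_right(xs):
    """Plain sequential accumulation, as in a simple unvectorized loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_pairwise(xs):
    """Recursive pairwise summation: same inputs, different evaluation order."""
    if len(xs) <= 2:
        return sum_left_to_right(xs)
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])


xs = [0.1, 0.2, 0.3]
print(repr(sum_left_to_right(xs)))  # (0.1 + 0.2) + 0.3
print(repr(sum_pairwise(xs)))       # 0.1 + (0.2 + 0.3)
print(repr(math.fsum(xs)))          # correctly rounded sum of the stored values
```

With IEEE 754 doubles the first line prints 0.6000000000000001 and the other two print 0.6.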

@david-cortes-intel
Contributor

david-cortes-intel commented Apr 14, 2026

If you mean the failures in the actual tests, those will be very hard to fix, because they involve memory-safety issues that only manifest in particular environments and on particular hardware. Perhaps @Vika-F could comment on that. Note that the issue is likely on the oneDAL side.

@david-cortes-intel
Contributor

@Alexandr-Solovev Is this PR meant to fix the failing step 'Check DPC module import' on Windows?

@ethanglaser Any comments about what could be done about this error?

CondaError: Run 'conda init' before 'conda activate'

@Alexandr-Solovev
Contributor Author

Yes, but for now I don't know what the solution is for getting the DPC import to work correctly on Windows.

@david-cortes-intel
Contributor

david-cortes-intel commented Apr 16, 2026

New Windows error after fixing the conda activation issue:

ModuleNotFoundError: No module named 'onedal._onedal_py_dpc'

(link)

As far as I can see, the onedal_py_dpc module is not even being built on those jobs:
https://dev.azure.com/daal/DAAL/_build/results?buildId=58463&view=logs&jobId=ee3ca9c7-09ef-5e6f-8aad-c1911e4a6a1a&j=ee3ca9c7-09ef-5e6f-8aad-c1911e4a6a1a&t=a323c936-ca0d-52a3-b3b1-104d24bffc7b

As far as I can see, the job doesn't set a %NO_DPC% environment variable, but it might be worth adding a print to verify that the variable is absent.
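A minimal Python sketch of the suggested print-based check, assuming it runs in the same conda environment the tests use: it prints NO_DPC and checks whether onedal._onedal_py_dpc can be located without importing the extension itself (the parent onedal package does get imported). report_dpc_status is a hypothetical helper, not an existing script in either repository.

```python
import importlib.util
import os


def report_dpc_status(module_name="onedal._onedal_py_dpc"):
    """Print whether NO_DPC is set and whether `module_name` can be located."""
    no_dpc = os.environ.get("NO_DPC")
    print(f"NO_DPC = {no_dpc!r}")
    try:
        # find_spec imports the parent package ('onedal') but not the extension module itself.
        found = importlib.util.find_spec(module_name) is not None
    except ImportError:
        # ModuleNotFoundError (an ImportError subclass) is raised when the parent
        # package is not installed; other ImportErrors mean it failed to import.
        found = False
    print(f"{module_name} found: {found}")
    return no_dpc, found


if __name__ == "__main__":
    report_dpc_status()
```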

@Alexandr-Solovev
Contributor Author

I checked the sklearnex file https://github.com/uxlfoundation/scikit-learn-intelex/blob/e0a107a2fbc1bc8f0617e10ff4dea88711ce5275/.ci/pipeline/build-and-test-win.yml and it looks like it does not include anything related to DPC (build, examples). @ethanglaser Do you know why?

@ethanglaser
Contributor

We do not build the Windows DPC variant in the sklearnex Azure Pipelines, only in GitHub Actions.

@david-cortes-intel
Contributor

@ethanglaser So what would we need to do here in order to run the sklearnex DPC tests on Windows?

@Alexandr-Solovev
Contributor Author

I believe we could either extend the current https://github.com/uxlfoundation/scikit-learn-intelex/blob/e0a107a2fbc1bc8f0617e10ff4dea88711ce5275/.ci/pipeline/build-and-test-win.yml on the scikit-learn-intelex side, or try to use https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/.github/workflows/ci.yml#L178 for Windows. However, I'm not sure about GitHub and Azure compatibility.

Alexandr-Solovev force-pushed the dev/asolovev_extend_ci_opt branch from fa8a0b6 to 6e0b498 on April 17, 2026 08:46
@Alexandr-Solovev
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

@ethanglaser
Contributor

ethanglaser commented Apr 17, 2026

I suggest migrating the intended scope of additions from the existing .ci/ (Azure Pipelines) to .github/ (GitHub Actions). We can set up a workflow that depends on the existing Nightly-build GitHub Action and then runs the sklearnex .github/workflows/ci.yml steps using the result of the oneDAL Nightly-build. This is the same way it works in sklearnex, except that sklearnex takes the latest successful Nightly-build here and here from the main branch; here it would just take the Nightly-build from this PR's CI.

Alexandr-Solovev force-pushed the dev/asolovev_extend_ci_opt branch from 93810dc to 8521af2 on April 20, 2026 08:50
Review comment on .github/workflows/ci.yml (outdated):
pip install $(python .ci/scripts/get_compatible_scipy_version.py ${{ matrix.SKLEARN_VERSION }}) pyyaml
if [ "${{ steps.set-env.outputs.DPCFLAG }}" == "" ]; then pip install dpctl==${{ env.DPCTL_VERSION }} dpnp==${{ env.DPNP_VERSION }}; fi
pip list
- name: Sklearnex testing
Contributor

ethanglaser commented Apr 20, 2026

Is it possible to call GitHub Actions workflows, similar to how you were initially doing it for the Azure pipeline, instead of defining the same steps here as in the sklearnex repo?

@david-cortes-intel
Contributor

The test failure in the new Windows jobs happens when a Python test calls python -m ... in a subprocess.

It looks like that 'python' call might be invoking an interpreter from a different environment, one that doesn't have sklearnex installed:

____________ ERROR at setup of test_patching_svc_from_command_line ____________

request = <SubRequest 'patch_svc_from_command_line' for <Function test_patching_svc_from_command_line>>

    @pytest.fixture
    def patch_svc_from_command_line(request):
        err_code = subprocess.call(
            [sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "sklearn.svm.SVC"]
        )
>       assert err_code == EX_OK
E       assert 1 == 0

.ci\scripts\test_global_patch.py:48: AssertionError

I'm curious how that happens, since launching pytest to execute those tests would require the correct environment to already be activated.
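Because the fixture only checks the return code, the child's error message is lost. A minimal diagnostic sketch, assuming it is run from inside the failing test session: it captures stderr from sys.executable -m ... and compares sys.prefix in the child with the parent's, which shows whether the child really uses the same environment. run_module and child_prefix are hypothetical helpers. Note that python -m on a missing module already exits with status 1, which matches the assert 1 == 0 above, so the captured stderr would distinguish that case from a failure inside sklearnex.glob itself.

```python
import subprocess
import sys


def run_module(module, *args):
    """Run `<this interpreter> -m module args...` and capture its output for diagnosis."""
    proc = subprocess.run(
        [sys.executable, "-m", module, *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def child_prefix():
    """Return sys.prefix as seen by a child process started with sys.executable."""
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    # Same command as the patch_svc_from_command_line fixture above.
    rc, out, err = run_module("sklearnex.glob", "patch_sklearn", "-a", "sklearn.svm.SVC")
    print("exit code:", rc)
    print("stderr:", err)
    print("parent prefix:", sys.prefix, "| child prefix:", child_prefix())
```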

Labels

dependencies Pull requests that update a dependency file

3 participants