[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set #9655

abjer · 2017-08-30T20:44:31Z

Reference Issue

What does this implement/fix? Explain your changes.

RadiusNeighborsRegressor behaves differently for a sample with no neighbors depending on whether or not weights are used. This PR fixes that inconsistency. It also fixes the error raised when no data points are available for RadiusNeighborsRegressor with non-uniform weights.

jnothman · 2017-08-31T00:11:13Z

sklearn/neighbors/regression.py

@@ -291,9 +291,28 @@ def predict(self, X):
            y_pred = np.array([np.mean(_y[ind, :], axis=0)


Do we want this to be issuing a warning in this case? I think the tests should assert either that a warning is raised, or that none is.

I have made a test now that asserts no warning is raised.

jnothman · 2017-08-31T00:12:58Z

sklearn/neighbors/regression.py

@@ -291,9 +291,28 @@ def predict(self, X):
            y_pred = np.array([np.mean(_y[ind, :], axis=0)
                               for ind in neigh_ind])
        else:
-            y_pred = np.array([(np.average(_y[ind, :], axis=0,
-                                           weights=weights[i]))


can't we just add if len(ind) else np.nan here?

Right, great idea - I've changed this now.

Hmm, on second thought I guess that it should match the second axis of _y? I have made a commit where it is inserting np.full(_y.shape[1], np.nan) under the else clause.

Yes, that's right. Is it tested?

Yes, I've tested it - works for one or more _y variables. Should I document these tests?
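The per-row fallback discussed above can be sketched with NumPy alone. This is a minimal illustration, not the code merged into scikit-learn: predict_rows and its arguments are hypothetical names, _y is assumed to be a 2-D target array, and neigh_ind a list of integer index arrays (one per query sample).

```python
import numpy as np


def predict_rows(_y, neigh_ind, weights=None):
    """Average each sample's neighbor targets; NaN row if no neighbors."""
    # One NaN per target column, so every row has the same width as _y.
    empty_obs = np.full(_y.shape[1], np.nan)
    rows = []
    for i, ind in enumerate(neigh_ind):
        if len(ind) == 0:
            rows.append(empty_obs)
        elif weights is None:
            rows.append(np.mean(_y[ind, :], axis=0))
        else:
            rows.append(np.average(_y[ind, :], axis=0, weights=weights[i]))
    return np.array(rows)


# Hypothetical data: two targets, second query sample has no neighbors.
_y = np.array([[1.0, 10.0], [3.0, 30.0]])
neigh_ind = [np.array([0, 1]), np.array([], dtype=int)]
weights = [np.array([1.0, 3.0]), np.array([])]

uniform_pred = predict_rows(_y, neigh_ind)
weighted_pred = predict_rows(_y, neigh_ind, weights)
```

Filling with a vector of length _y.shape[1] rather than a scalar np.nan keeps the list of rows rectangular, so np.array builds a regular 2-D float array for one or more target columns.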

@jnothman

Used @jnothman suggestion to simplify fix

jnothman · 2017-08-31T08:32:42Z

sklearn/neighbors/tests/test_neighbors.py

+                y_pred_alt_isnan = np.all(np.isnan(y_pred_alt))
+                raise_zero_div_error = False
+
+            except ZeroDivisionError:


This is unnecessary. The test will fail if there is an error raised.

I was asking about handling the warning which numpy issues when performing np.mean([]). I'm not sure whether it's good or bad for this warning to be issued, but we should be consistent in the weighted case. Warnings are tested with our assert_warns_message and assert_no_warnings helpers.

Right, good point.

I looked into raising the warning - apparently warnings only arise when there are uniform weights. What do you think of the following solution?

    # test fix for issue #9654
    # test that nan is returned when no nearby observations
    X_test_nan = np.ones((1,n_features))*-1
    if weights=='uniform':
        assert_warns_message(RuntimeWarning, "Mean of empty slice.",
                             neigh.predict, X_test_nan)
    assert_true(np.all(np.isnan(neigh.predict(X_test_nan))))

jnothman · 2017-08-31T08:33:48Z

sklearn/neighbors/regression.py

@@ -291,9 +291,28 @@ def predict(self, X):
            y_pred = np.array([np.mean(_y[ind, :], axis=0)
                               for ind in neigh_ind])
        else:
-            y_pred = np.array([(np.average(_y[ind, :], axis=0,
-                                           weights=weights[i]))


Yes, that's right. Is it tested?

abjer · 2017-09-08T08:40:19Z

@jnothman I think it is ready now for merge - do you agree?

jnothman · 2017-09-09T10:28:49Z

it's on my heap for review, but my throughput will be quite limited for the next month or so.

abjer · 2017-09-09T14:33:30Z

Alright, cheers!

jnothman · 2017-09-10T10:02:19Z

@jnothman I think it is ready now for merge - do you agree?

You should change the title from WIP to MRG

jnothman · 2017-09-10T10:03:00Z

sklearn/neighbors/tests/test_neighbors.py

+            # test that nan is returned when no nearby observations
+            X_test_nan = np.ones((1, n_features))*-1
+            if weights == 'uniform':
+                assert_warns_message(RuntimeWarning, "Mean of empty slice.",


No, we should issue a warning in all cases, not dependent on weights.

You should also get the output, i.e. pred = assert_warns_message(...) so that you can then do the isnan assertion.

Hi again, thanks for the feedback! Just to be clear on what you want me to do: raise a warning for RadiusNeighborsRegressor when it returns one or more nan values, irrespective of whether weights are used. I guess this should be consistent with and without weights. So should we copy the warning from np.mean (without weights) and issue it when y_pred contains nan rows (and there are weights)?

I suppose so. Either that, or we can be kinder to the user: hide the numpy error, and raise our own which is more explicit, with a message like "Some samples have no neighbors; predicting NaN."
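A rough sketch of the "raise our own warning" idea, together with the earlier suggestion to keep the return value of the warning check so that the NaN assertion can reuse it. The names predict_with_warning and call_and_record are hypothetical; call_and_record is only a stdlib stand-in for a warning-asserting test helper, not scikit-learn's assert_warns_message. The message text is the wording used in the test added later in this PR.

```python
import warnings
import numpy as np

# Message wording taken from the test added later in this PR.
EMPTY_WARNING_MSG = ("One or more samples have no neighbors "
                     "within specified radius; predicting NaN.")


def predict_with_warning(_y, neigh_ind):
    """Uniform-weight prediction; warns once if a neighbor set is empty."""
    empty_obs = np.full(_y.shape[1], np.nan)
    if any(len(ind) == 0 for ind in neigh_ind):
        warnings.warn(EMPTY_WARNING_MSG)
    # Empty neighbor sets never reach np.mean, so numpy stays silent.
    return np.array([np.mean(_y[ind, :], axis=0) if len(ind) else empty_obs
                     for ind in neigh_ind])


def call_and_record(func, *args):
    """Return func's result plus the warning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args)
    return result, [str(w.message) for w in caught]


# Hypothetical data: the second query sample has no neighbors.
_y = np.array([[1.0], [3.0]])
neigh_ind = [np.array([0, 1]), np.array([], dtype=int)]
pred, messages = call_and_record(predict_with_warning, _y, neigh_ind)
```

Because the prediction is captured in the same call that records the warning, predict only runs once per check, and the same warning is produced whether or not weights are involved.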

Added new warning when prediction of RadiusNeighborRegression returns empty values. Also suppressed warning for numpy warnings. Moreover, added copyright and fixed erroneous description of output.

jnothman

Otherwise LGTM, thanks!

Please add an entry to whats_new

@jnothman

Fixed more PEP8 and used @jnothman idea of using full_like

jnothman · 2017-09-27T07:15:17Z

sklearn/neighbors/regression.py

-                               for ind in neigh_ind])
+
+            with warnings.catch_warnings():
+                warnings.filterwarnings('ignore', "Mean of empty slice")


Ah. It seems errstate won't cover this.
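A small check of why np.errstate is not enough here, assuming a reasonably recent NumPy: the "Mean of empty slice." message is issued through Python's warnings machinery, while errstate only governs floating-point error handling (such as the 0/0 division that follows). The helper name empty_mean_messages is made up for this sketch.

```python
import warnings
import numpy as np


def empty_mean_messages(use_errstate):
    """Collect the warning messages np.mean emits for an empty input."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if use_errstate:
            # Silences floating-point errors only, not warnings.warn calls.
            with np.errstate(all="ignore"):
                np.mean([])
        else:
            np.mean([])
    return [str(w.message) for w in caught]


plain = empty_mean_messages(use_errstate=False)
silenced = empty_mean_messages(use_errstate=True)
```

With errstate active, the "invalid value encountered" message disappears from the recorded list, but "Mean of empty slice." is still there, which is why the diff above falls back to warnings.filterwarnings.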

jnothman · 2017-09-27T10:01:18Z

sklearn/neighbors/regression.py

+
+            with warnings.catch_warnings():
+                warnings.filterwarnings('ignore', "Mean of empty slice")
+                warnings.filterwarnings('ignore', ( "invalid value "


how about

warnings.filterwarnings('ignore', "invalid value encountered in divide")

I wrote this before you fixed it.

jnothman · 2017-09-27T10:02:17Z

LGTM

abjer · 2017-09-27T10:07:50Z

Great :)

jnothman · 2017-09-27T10:15:29Z

Please still add an entry in whats_new

jnothman

Thanks again!

jnothman · 2017-09-27T10:42:47Z

doc/whats_new.rst

+  :class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set 
+  when using non uniform weights. Also raises a new warning, irrespective of 
+  weighting, when no neighbors are found for one or more samples. Moreover, 
+  all numpy errors related to missing neighbors are suppressed. :issue:`9654`. 


We actually prefer the pull request number here, #9655.

jnothman · 2017-09-27T10:43:36Z

doc/whats_new.rst

+- Fix bug so that the method ``predict`` for 
+  :class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set 
+  when using non uniform weights. Also raises a new warning, irrespective of 
+  weighting, when no neighbors are found for one or more samples. Moreover, 


maybe just "... samples, in place of unclear numpy errors."

Good suggestion, thanks

abjer · 2017-10-07T12:38:11Z

@jnothman do you need me to do anything more or is this ready now as it is?

This reverts commit 780db9d.

abjer · 2017-10-18T12:07:57Z

@jnothman: good suggestion - I've pushed an update

abjer · 2017-11-16T15:03:12Z

@jnothman hi again Joel, is there anything I can do to get a 2nd review? Do you have any candidates I should tag?

jnothman · 2017-11-16T20:55:51Z

This is pretty straightforward, though a more thorough PR description might help a reviewer come on board. Key idea: RadiusNeighborsRegressor is behaving differently when there are no neighbors for a sample between when weights are or aren't used. Perhaps @qinhanmin2014, @lesteve...

abjer · 2017-11-16T21:24:53Z

Thanks a lot Joel - will follow those suggestions

abjer · 2017-11-16T21:28:41Z

Hi @qinhanmin2014 and @lesteve would you mind taking a look at this PR?

qinhanmin2014

Still some concerns before my personal LGTM. Kindly forgive if some of them are naive.
@jnothman would be grateful if you can check these comments.

qinhanmin2014 · 2017-11-17T05:51:29Z

sklearn/neighbors/regression.py

+            with warnings.catch_warnings():
+                warnings.filterwarnings('ignore', "Mean of empty slice")
+                warn_div = "invalid value encountered in true_divide"
+                warnings.filterwarnings('ignore', warn_div)


(1) I don't think our code should change the default behaviour of other libraries. In this case, after a user runs RadiusNeighborsRegressor, they will no longer get a warning when they execute np.mean([]); it might be better to at least reset the warnings.
(2) "invalid value encountered in true_divide" seems a bit magical; could you please add some comments? Also, is there any specific reason you are blocking warnings here? It seems that you could use the same approach as in the else clause, i.e.

    empty_obs = np.full_like(_y[0], np.nan)
    y_pred = np.array([np.average(_y[ind, :], axis=0, weights=weights[i])
                       if len(ind) else empty_obs
                       for (i, ind) in enumerate(neigh_ind)])

The with catch_warnings context will reset when exited. But is there a reason we are not using numpy's own error manager?
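The scoping behaviour mentioned here can be checked with a short sketch. It assumes nothing beyond the standard warnings module and NumPy; count_empty_mean_warnings is a hypothetical helper name, and the outer recording context exists only so the demo can count what gets through.

```python
import warnings
import numpy as np


def count_empty_mean_warnings():
    """Count recorded warnings inside and after a scoped 'ignore' filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Keep the 0/0 floating-point message out of the count.
        with np.errstate(invalid="ignore"):
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore", "Mean of empty slice")
                np.mean([])  # silenced by the scoped filter
            inside = len(caught)
            np.mean([])  # inner context has exited, so this warns again
            after = len(caught)
    return inside, after


inside, after = count_empty_mean_warnings()
```

The filter added inside the inner with block is discarded when that block exits, so a later np.mean([]) in user code warns as usual.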

No particular reason. I can implement the numpy warnings if you wish it to be changed.

@jnothman Thanks a lot for confirming. If you think blocking warnings here is the right way, I won't oppose it. My main concern is the redundant tests for this edge case.
@abjer If you don't have a specific reason, please consider addressing @jnothman's comment here. Also, please consider whether we should cut some redundant tests.
Another thing, @abjer: could you please update the documentation to tell users what we are doing in this special case? I think it might not be so straightforward for ordinary users.

@abjer You need to resolve the conflict before this PR can be merged (e.g., merge current master into this branch). Thanks a lot for your issue and PR, and also for your great patience :)

qinhanmin2014 · 2017-11-17T05:59:07Z

sklearn/neighbors/tests/test_neighbors.py

@@ -658,6 +659,17 @@ def test_radius_neighbors_regressor(n_samples=40,
            y_pred = neigh.predict(X[:n_test_pts] + epsilon)
            assert_true(np.all(abs(y_pred - y_target) < radius / 2))

+            # test fix for issue #9654
+            # test that nan is returned when no nearby observations


If you put the test here, a similar test will execute 12 times (4 algorithms x 3 weights), but it seems that the change is actually unrelated to the algorithm? I think a separate test (e.g., test_radius_neighbors_regressor_nan_output) with 2 cases (uniform weights and distance weights) might be enough.

qinhanmin2014

Could you please fix the flake8 errors according to Travis log (See the bottom of https://travis-ci.org/scikit-learn/scikit-learn/jobs/304461869)? I think my LGTM is very close :)

qinhanmin2014 · 2017-11-20T00:41:45Z

sklearn/neighbors/regression.py

-            y_pred = np.array([np.mean(_y[ind, :], axis=0)
-                               for ind in neigh_ind])
+            y_pred = np.array([np.mean(_y[ind, :],
+                                       axis=0)


Please consider filling the current line before starting a new one; the same applies in other places.

qinhanmin2014 · 2017-11-20T00:44:45Z

sklearn/neighbors/tests/test_neighbors.py

+
+        empty_warning_msg = ("One or more samples have no neighbors "
+                             "within specified radius; predicting NaN.")
+


Please consider removing some unnecessary blank lines, both in the code and in the test.

qinhanmin2014 · 2017-11-20T00:50:18Z

doc/whats_new/v0.20.rst

@@ -179,6 +179,14 @@ Metrics
  :issue:`10093` by :user:`alexryndin <alexryndin>`
  and :user:`Hanmin Qin <qinhanmin2014>`.

+  Neighbors
+
+  - Fix bug so that the method ``predict`` for


Fixed a bug seems more consistent?

qinhanmin2014 · 2017-11-20T00:52:18Z

doc/whats_new/v0.20.rst

+    :class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set
+    when using non uniform weights. Also raises a new warning, irrespective of
+    weighting, when no neighbors are found for samples, in place of unclear
+    numpy errors.. :issue:`9655`. By :user:`Andreas Bjerre-Nielsen <abjer>`.


duplicate . (numpy errors..)

qinhanmin2014 · 2017-11-20T00:54:25Z

doc/whats_new/v0.20.rst

@@ -179,6 +179,14 @@ Metrics
  :issue:`10093` by :user:`alexryndin <alexryndin>`
  and :user:`Hanmin Qin <qinhanmin2014>`.

+  Neighbors


The doc does not look fine, please follow what others are doing (not sure but maybe remove some extra whitespace at the beginning?).

qinhanmin2014 · 2017-11-20T11:36:05Z

sklearn/neighbors/tests/test_neighbors.py

+                                    X_test_nan)
+        assert_true(np.all(np.isnan(pred)))
+
+


@abjer Please only leave two blank lines between functions. This flake8 error will not be detected for some reason but we generally don't want to introduce more flake8 errors.

Thanks, I have now recommitted.

qinhanmin2014

LGTM, ping @jnothman
Latest changes since your +1:
(1) do not block warnings; use an approach similar to the other case instead.
(2) remove redundant tests; keep only two tests for the two cases.

abjer · 2017-11-20T13:10:58Z

@qinhanmin2014 thank you so much for the review and very constructive suggestions

jnothman · 2017-11-20T22:56:08Z

Great!

jnothman

Apart from what's new being unclear, this is good to merge

jnothman · 2017-11-20T22:57:02Z

doc/whats_new/v0.20.rst

+
+- Fixed a bug so ``predict`` in :class:`neighbors.RadiusNeighborsRegressor` can
+  handle empty neighbor set when using non uniform weights. Also raises a new
+  warning, irrespective of, when no neighbors are found for samples, in place of


"Irrespective of" can be removed

I'm not sure what in place of duplicate means

jnothman · 2017-11-20T22:57:42Z

doc/whats_new/v0.20.rst

+- Fixed a bug so ``predict`` in :class:`neighbors.RadiusNeighborsRegressor` can
+  handle empty neighbor set when using non uniform weights. Also raises a new
+  warning, irrespective of, when no neighbors are found for samples, in place of
+  duplicate. (numpy errors..) :issue:`9655`. By :user:`Andreas Bjerre-Nielsen <abjer>`.


What is "numpy errors" here. Drop it

qinhanmin2014 · 2017-11-21T05:39:37Z

Apart from what's new being unclear, this is good to merge

Sorry that I somehow missed the what's new in the final check

@abjer I believe the current what's new is just a mistake? The previous what's new seemed fine (except for the formatting I pointed out). So please correct the current what's new accordingly. I'll merge when the CIs are green.

abjer · 2017-11-21T08:29:02Z

@jnothman thanks for pointing that out - I have submitted a new version. I removed the mention of duplicate errors, as @qinhanmin2014's idea for the computation fixed the issue. Thanks to both of you for the big support in this PR.

qinhanmin2014 · 2017-11-21T09:11:24Z

Thanks @abjer for your contributions and your patience :)

Fix to issue 9654

191b4f7

Fixes problem with raise error when no available data points for RadiusNeighborRegression using non-uniform weights.

abjer changed the title ~~[WIP] Fix to issue 9654~~ [WIP] Return nan for RadiusNeighborsRegressor with non-uniform weights Aug 30, 2017

jnothman reviewed Aug 31, 2017

abjer added 2 commits August 31, 2017 09:14

Simplify fix

15d492e

Used @jnothman suggestion to simplify fix

Attempt to fix test and nan dimensionality

83a6d52

jnothman reviewed Aug 31, 2017

abjer added 3 commits September 6, 2017 11:44

Updated test

6221c8c

Bug fix in test: add import

f8febd5

Fix pep8 requirements

dc15b84

jnothman changed the title ~~[WIP] Return nan for RadiusNeighborsRegressor with non-uniform weights~~ [MRG] Return nan for RadiusNeighborsRegressor with non-uniform weights Sep 10, 2017

jnothman reviewed Sep 10, 2017

abjer added 2 commits September 26, 2017 21:37

Empty warning for RadiusNeighborRegression

891e38b

Added new warning when prediction of RadiusNeighborRegression returns empty values. Also suppressed warning for numpy warnings. Moreover, added copyright and fixed erroneous description of output.

Fix PEP8 issues

a2b49aa

jnothman reviewed Sep 27, 2017

abjer added 2 commits September 27, 2017 08:59

Notation fix

af791f0

Fixed more PEP8 and used @jnothman idea of using full_like

More notation fix

d4547bf

jnothman reviewed Sep 27, 2017

jnothman changed the title ~~[MRG] Return nan for RadiusNeighborsRegressor with non-uniform weights~~ [MRG+1] Return nan for RadiusNeighborsRegressor with non-uniform weights Sep 27, 2017

Update whats_new.rst: solution to issue scikit-learn#9654

780db9d

jnothman reviewed Sep 27, 2017

Update whats_new.rst

8decafa

abjer added 3 commits October 17, 2017 23:19

Revert "Update whats_new.rst: solution to issue scikit-learn#9654"

74c9574

This reverts commit 780db9d.

Merge branch 'master' of github.com:scikit-learn/scikit-learn

0cd5ef3

Updated whats new

a1a000e

qinhanmin2014 reviewed Nov 17, 2017

abjer and others added 3 commits November 19, 2017 21:48

Updated fct. and tests

e4db7d5

Fix typo

bc2eae9

bug fix: re-introduce empty obs

2781b25

qinhanmin2014 reviewed Nov 20, 2017

abjer added 3 commits November 20, 2017 11:51

Fix pep8 error and tighten notation

3dc86b8

Merge branch 'fix_9654' of github.com:abjer/scikit-learn

770db3f

Fix more PEP8 bugs.

266c9a5

qinhanmin2014 reviewed Nov 20, 2017

Fix PEP8 lines space between fct.

a66b531

qinhanmin2014 changed the title ~~[MRG+1] Return nan for RadiusNeighborsRegressor with non-uniform weights~~ [MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set Nov 20, 2017

qinhanmin2014 approved these changes Nov 20, 2017

jnothman reviewed Nov 20, 2017

abjer added 2 commits November 21, 2017 09:26

Make whats new more clear

7c35e8c

Merge branch 'master' of github.com:scikit-learn/scikit-learn

b47f5bf

jnothman merged commit 8e1efb0 into scikit-learn:master Nov 21, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

FIX Return nan in RadiusNeighborsRegressor for empty neighbor set (sc…

0d62f61

…ikit-learn#9655) * Fix scikit-learn#9654
