-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set #9655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes problem with raise error when no available data points for RadiusNeighborRegression using non-uniform weights.
@@ -291,9 +291,28 @@ def predict(self, X): | |||
y_pred = np.array([np.mean(_y[ind, :], axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to be issuing a warning in this case? I think the tests should assert either that a warning is raised, or that none is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made a test now that asserts no warning is raised.
sklearn/neighbors/regression.py
Outdated
@@ -291,9 +291,28 @@ def predict(self, X): | |||
y_pred = np.array([np.mean(_y[ind, :], axis=0) | |||
for ind in neigh_ind]) | |||
else: | |||
y_pred = np.array([(np.average(_y[ind, :], axis=0, | |||
weights=weights[i])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just add if len(ind) else np.nan
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, great idea - I've changed this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, on second thought I guess that it should match the second axis of _y
? I have made a commit where it is inserting np.full(_y.shape[1], np.nan)
under the else clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. Is it tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've tested it - works for one or more _y variables. Should I document these tests?
Used @jnothman suggestion to simplify fix
y_pred_alt_isnan = np.all(np.isnan(y_pred_alt)) | ||
raise_zero_div_error = False | ||
|
||
except ZeroDivisionError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary. The test will fail if there is an error raised.
I was asking about handling the warning which numpy issues when performing np.mean([])
. I'm not sure whether it's good or bad for this warning to be issued, but we should be consistent in the weighted case. warnings are tested with our assert_warns_message
and assert_no_warnings
helpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good point.
I looked into raising the warning - apparently warnings only arise when there are uniform weights. What do you think of the following solution?
# test fix for issue #9654
# test that nan is returned when no nearby observations
X_test_nan = np.ones((1,n_features))*-1
if weights=='uniform':
assert_warns_message(RuntimeWarning, "Mean of empty slice.",
neigh.predict, X_test_nan)
assert_true(np.all(np.isnan(neigh.predict(X_test_nan))))
sklearn/neighbors/regression.py
Outdated
@@ -291,9 +291,28 @@ def predict(self, X): | |||
y_pred = np.array([np.mean(_y[ind, :], axis=0) | |||
for ind in neigh_ind]) | |||
else: | |||
y_pred = np.array([(np.average(_y[ind, :], axis=0, | |||
weights=weights[i])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. Is it tested?
@jnothman I think it is ready now for merge - do you agree? |
it's on my heap for review, but my throughput will be quite limited for the
next month or so.
…On 8 Sep 2017 6:40 pm, "Andreas Bjerre-Nielsen" ***@***.***> wrote:
@jnothman <https://github.com/jnothman> I think it is ready now for merge
- do you agree?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9655 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz661fnObwAevsSclMLbDafhHTHWsNks5sgP12gaJpZM4PH82n>
.
|
Alright, cheers! |
You should change the title from WIP to MRG |
# test that nan is returned when no nearby observations | ||
X_test_nan = np.ones((1, n_features))*-1 | ||
if weights == 'uniform': | ||
assert_warns_message(RuntimeWarning, "Mean of empty slice.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we should issue a warning in all casess, not dependent on weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also get the output, i.e. pred = assert_warns_message(...)
so that you can then do the isnan assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi again, thanks for the feedback! Just to be clear on what you want me to do: raise a warning for RadiusNeighborRegressor when it returns one or more nan values irrespective of using weights or not. I guess this should be consistent with and without weights. So should we copy the warning from np.mean (without weights) and issue it when y_pred contains nan rows (and there are weights)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose so. Either that, or we can be kinder to the user: hide the numpy error, and raise our own which is more explicit, with a message like "Some samples have no neighbors; predicting NaN."
Added new warning when prediction of RadiusNeighborRegression returns empty values. Also suppressed warning for numpy warnings. Moreover, added copyright and fixed erroneous description of output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM, thanks!
Please add an entry to whats_new
Fixed more PEP8 and used @jnothman idea of using full_like
sklearn/neighbors/regression.py
Outdated
for ind in neigh_ind]) | ||
|
||
with warnings.catch_warnings(): | ||
warnings.filterwarnings('ignore', "Mean of empty slice") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. It seems errstate won't cover this.
sklearn/neighbors/regression.py
Outdated
|
||
with warnings.catch_warnings(): | ||
warnings.filterwarnings('ignore', "Mean of empty slice") | ||
warnings.filterwarnings('ignore', ( "invalid value " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
warnings.filterwarnings('ignore',
"invalid value encountered in divide")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this before you fixed it.
LGTM |
Great :) |
Please still add an entry in whats_new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again!
doc/whats_new.rst
Outdated
:class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set | ||
when using non uniform weights. Also raises a new warning, irrespective of | ||
weighting, when no neighbors are found for one or more samples. Moreover, | ||
all numpy errors related to missing neighbors are suppressed. :issue:`9654`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually prefer the pull request number here, #9655.
doc/whats_new.rst
Outdated
- Fix bug so that the method ``predict`` for | ||
:class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set | ||
when using non uniform weights. Also raises a new warning, irrespective of | ||
weighting, when no neighbors are found for one or more samples. Moreover, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just "... samples, in place of unclear numpy errors."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, thanks
@jnothman do you need me to do anything more or is this ready now as it is? |
@jnothman: good suggestion - I've pushed an update |
@jnothman hi again joel, is there anything I can do to get a 2nd review? do you have some candidates for whom I should tag? |
This is pretty straightforward, though a more thorough PR description might help a reviewer come on board. Key idea: |
Thanks a lot Joel - will follow those suggestions |
Hi @qinhanmin2014 and @lesteve would you mind taking a look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still some concerns before my personal LGTM. Kindly forgive if some of them are naive.
@jnothman would be grateful if you can check these comments.
sklearn/neighbors/regression.py
Outdated
with warnings.catch_warnings(): | ||
warnings.filterwarnings('ignore', "Mean of empty slice") | ||
warn_div = "invalid value encountered in true_divide" | ||
warnings.filterwarnings('ignore', warn_div) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1)I don't think our code should change the default behaviour of other libraries. In this case, after user run RadiusNeighborsRegressor, he will not get any warning when he execute np.mean([])
, it might be better to at least reset the warnings.
(2) invalid value encountered in true_divide
seems a bit magical, could you please add some comments? Also, is there any specific reason you are blocking warnings here? It seems that you can use the same way as in the else clause. i.e.
empty_obs = np.full_like(_y[0], np.nan)
y_pred = np.array([np.mean(_y[ind, :], axis=0, weights=weights[i]) if len(ind) else empty_obs
for (i, ind) in enumerate(neigh_ind)])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The with catch_warnings context will reset when exited. But is there a reason we are not using numpy's own error manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The with catch_warnings context will reset when exited. But is there a reason we are not using numpy's own error manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason. I can implement the numpy warnings if you wish it to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnothman Thanks a lot for the confirm. If you think blocking warnings here is the right way, I won't oppose. My main concern is the redundant tests for this edge case
@abjer If you don't have specific reason, please consider to address @jnothman's comment here. Also, please consider whether we should cut off some redundant tests.
Another thing @abjer could you please update the document to tell users what we are doing in this special case? I think it might not be so straightforward for ordinary users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abjer You need to solve the conflict before this PR can be merged (e.g., merge current master into this branch). Thanks a lot for your issue and PR and also for your great patience :)
@@ -658,6 +659,17 @@ def test_radius_neighbors_regressor(n_samples=40, | |||
y_pred = neigh.predict(X[:n_test_pts] + epsilon) | |||
assert_true(np.all(abs(y_pred - y_target) < radius / 2)) | |||
|
|||
# test fix for issue #9654 | |||
# test that nan is returned when no nearby observations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put the test here, similar test will execute 12 times(4 algorithm X 3 weight), but seems that the change is actually unrelated with algorithm? I think a seperate test(e.g., test_radius_neighbors_regressor_nan_output) with 2 cases (uniform weight and distance weight) might be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please fix the flake8 errors according to Travis log (See the bottom of https://travis-ci.org/scikit-learn/scikit-learn/jobs/304461869)? I think my LGTM is very close :)
sklearn/neighbors/regression.py
Outdated
y_pred = np.array([np.mean(_y[ind, :], axis=0) | ||
for ind in neigh_ind]) | ||
y_pred = np.array([np.mean(_y[ind, :], | ||
axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please consider to fill the current line before starting a new line, other places the same.
|
||
empty_warning_msg = ("One or more samples have no neighbors " | ||
"within specified radius; predicting NaN.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please consider to remove some unnecessary blank lines both in the code and in the test.
doc/whats_new/v0.20.rst
Outdated
@@ -179,6 +179,14 @@ Metrics | |||
:issue:`10093` by :user:`alexryndin <alexryndin>` | |||
and :user:`Hanmin Qin <qinhanmin2014>`. | |||
|
|||
Neighbors | |||
|
|||
- Fix bug so that the method ``predict`` for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed a bug seems more consistent?
doc/whats_new/v0.20.rst
Outdated
:class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set | ||
when using non uniform weights. Also raises a new warning, irrespective of | ||
weighting, when no neighbors are found for samples, in place of unclear | ||
numpy errors.. :issue:`9655`. By :user:`Andreas Bjerre-Nielsen <abjer>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate . (numpy errors..)
doc/whats_new/v0.20.rst
Outdated
@@ -179,6 +179,14 @@ Metrics | |||
:issue:`10093` by :user:`alexryndin <alexryndin>` | |||
and :user:`Hanmin Qin <qinhanmin2014>`. | |||
|
|||
Neighbors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc does not look fine, please follow what others are doing (not sure but maybe remove some extra whitespace at the beginning?).
X_test_nan) | ||
assert_true(np.all(np.isnan(pred))) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abjer Please only leave two blank lines between functions. This flake8 error will not be detected for some reason but we generally don't want to introduce more flake8 errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have now recommitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ping @jnothman
Latest change since your +1
(1) not block warnings, use similar way as another case instead.
(2)remove redundant tests, only keep two tests for the two cases.
@qinhanmin2014 thank you so much for the review and very constructive suggestions |
Great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from what's new being unclear, this is good to merge
doc/whats_new/v0.20.rst
Outdated
|
||
- Fixed a bug so ``predict`` in :class:`neighbors.RadiusNeighborsRegressor` can | ||
handle empty neighbor set when using non uniform weights. Also raises a new | ||
warning, irrespective of, when no neighbors are found for samples, in place of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Irrespective of" can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what in place of duplicate means
doc/whats_new/v0.20.rst
Outdated
- Fixed a bug so ``predict`` in :class:`neighbors.RadiusNeighborsRegressor` can | ||
handle empty neighbor set when using non uniform weights. Also raises a new | ||
warning, irrespective of, when no neighbors are found for samples, in place of | ||
duplicate. (numpy errors..) :issue:`9655`. By :user:`Andreas Bjerre-Nielsen <abjer>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "numpy errors" here. Drop it
Sorry that I somehow miss the what's new in the final check @abjer I believe current what's new is just a misoperation? The previous what's new seems fine (except for the format I've pointed out). So please correct current what's new accordingly. I'll merge when CIs are green. |
@jnothman thanks for pointing that out - I have submitted a new version. I removed the mention of duplicate errors as @qinhanmin2014 idea of computation fixed the issue. Thanks both of you for the big support in this PR. |
Thanks @abjer for your contributions and your patience :) |
Reference Issue
Fixes #9654
What does this implement/fix? Explain your changes.
RadiusNeighborsRegressor is behaving differently when there are no neighbors for a sample between when weights are or aren't used. This PR fixes this inconsistency. This PR also fixes raised error when no available data points for
RadiusNeighborRegression
using non-uniform weights.