-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
API: Add rtol
to matrix_rank
and stable
[Array API]
#25437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rtol
to matrix_rank
and stable
& descending
to sort
& argsort
rtol
to matrix_rank
and stable
& descending
to sort
& argsort
03aa5e8
to
9ecdb6a
Compare
db8eeba
to
673de76
Compare
Bit of a comment from the sidelines, but not allowing |
130d800
to
0dcad83
Compare
rtol
to matrix_rank
and stable
& descending
to sort
& argsort
rtol
to matrix_rank
and stable
& descending
to sort
& argsort
[Array API]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. I don't really like adding an argument where we only support the default, it serves no real purpose. But if Ralf feels that is much better from the Array-API adoption perspective, I don't care enough.
You should initialize both kind and stable to some undefined value and simply always reject if both are passed.
(A thing the extra namespace could allow: Add a hack there to support it fully, even if not well. Although, not sure it is better than just raising.)
numpy/_core/src/multiarray/methods.c
Outdated
if (descending) { | ||
PyErr_SetString(PyExc_ValueError, | ||
"`descending=True` is not allowed. Use `np.flip` instead"); | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we add an argument, but not support it? Is that even useful from the array-api perspective?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @mhvk mentioned in the comment, it could be supported by negating the stride, but if it's Ok I would try to add it in a separate PR, as it's mostly a C-level change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, this change is tiny. If we don't like it on its own, we should do the full work immdiately.
It must have been mentioned before many times, but flipping is also not correct for stable argsort!
EDIT: For signed zeros, that should also be true for floats and normal sorting actually...
EDIT: OK, I gu essflipping both input and result is correct? Maybe that is what was always meant? Marten makes the right point here, except that our sorts are contiguous only, so you have to copy+flip -> sort -> flip!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for guidance! Is it Ok to do the flip
in the Python level (using existing np.flip
) pass to C's array_sort
, and flip back at Python level? Or should I redo flip
in array_sort
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, I agree with @seberg that if we add a new argument, we should at least support it, even if the support is not yet optimal. It may make sense to do a python implementation (with some tests!) if that's easier. With a big TODO that the proper answer would seem to be to just allow sorting using either <
or >
(Tag::less
in quicksort.cpp
, simply passing on ascending/descending
to highway, quite possibly equally easy for other methods; only the one from the descriptor would need special-casing).
Note that argsort
is a bit tricky: equivalent to n - array[::-1].argsort()[::-1]
(negating the array and adding the number of points n
along an axis could be done in-place). Check:
a = np.array([0, 1, 2, 2, -1, -1])
a.argsort()
# array([4, 5, 0, 1, 2, 3])
5-a[::-1].argsort()[::-1]
# array([2, 3, 1, 0, 4, 5])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry, I hadn't realized the compat library already has these hacks correctly in place... even without extending the core sorting, it may be nice to do the flipping in C (although I am not sure we have a helper to make it easy). Otherwise the method and function diverge, which isn't great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.flip
is pure python, and uses slices ::-1
to reverse the order. Which is implemented as
numpy/numpy/_core/src/multiarray/mapping.c
Lines 872 to 874 in 0ebbdb9
data_ptr += PyArray_STRIDE(self, orig_dim) * start; | |
new_strides[new_dim] = PyArray_STRIDE(self, orig_dim) * step; | |
new_shape[new_dim] = n_steps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI descending
will be addressed in a separate PR.
numpy/_core/src/multiarray/methods.c
Outdated
"algorithm, as `stable=True` was passed.", 2) < 0) { | ||
return NULL; | ||
} | ||
sortkind = NPY_STABLESORT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense! If kind
is not passed you must support stable
(no matter what is passed).
Just always refuse when both are passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I added a separate enum entry that represents "undefined" value for both parameters, and adjusted the implementation.
numpy/linalg/_linalg.py
Outdated
@@ -2000,6 +2000,12 @@ def matrix_rank(A, tol=None, hermitian=False): | |||
Defaults to False. | |||
|
|||
.. versionadded:: 1.14 | |||
rtol : (...) array_like, float, optional | |||
Array API compatible parameter for the relative tolerance component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, the absolute most I can really agree with mentioning the Array API in our docs is as:
See Also
--------
...
array_api.asdf
Array API function this is compatible with.
The mention here adds no useful information for the vast majority of users. And not even useful information for the few users who care: They need to check the Array API docs for practically all other functions anyway!
There are two exceptions, I can accept:
- Functions that serve little or no use except for Array-API (i.e. it informs the user that unless they are interested in the Array-API, they don't need to remember this function).
- If you discourage the use of one function for new one, I am happy to mention it as one of the reasons to transition.
(I don't care enough to wish undoing existing doc-strings, although I also think that a "See Also" link would be vastly more useful anyway.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I removed the mention of Array API
in the parameter description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the principle outlined by @seberg here on docs. We shouldn't have to touch on the array API standard in docstrings. Probably all that's needed is a single doc page somewhere that summarizes the current state and any discrepancies. Similar to how CuPy & co have a page on where they differ from NumPy.
|
||
New keyword parameters were added to improve array API compatibility: | ||
|
||
* ``rtol`` keyword parameter was added to `numpy.linalg.martrix_rank`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that this changes the default, does need it is own release note, as it is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this was not addressed, still need a release note nothing the BC break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it changes the default.
Before default tol
was defined as:
if tol is None:
tol = (
S.max(axis=-1, keepdims=True) *
max(A.shape[-2:]) *
finfo(S.dtype).eps
)
And right now it's:
if rtol is None:
rtol = max(A.shape[-2:]) * finfo(S.dtype).eps
if tol is None:
tol = S.max(axis=-1, keepdims=True) * rtol
As rtol=None
and tol=None
by default, then the default didn't change. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry... it does change for pinv
, but not matrix rank.
84cdc51
to
ec126e9
Compare
Can we pull the |
} | ||
else { | ||
*val = NPY_FALSE; | ||
*val = NPY_STABLE_FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but no: What we could add is a PyArray_OptionalBoolConverter
which maybe leaves the result unchanged for None
. I might just use int
for it rather than bool, so that its 0, 1 and the user can just -1...
The above breaks all existing usages of non-optional bool converters if None
is passed in!
(I thought I finally added it recently, but maybe not, or it is in an open PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, I actually wanted to add a new function. I reverted all changes in PyArray_BoolConverter
and added PyArray_OptionalBoolConverter
, and used an int
.
The CI is failing with KeyError: 'PyArray_OptionalBoolConverter'
. Should I add this new function to code_generators/numpy_api.py
or __init__.pxd
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't bother making it public (I just think it's so small it isn't our job to provide it to the world). That means no comment starting with /**NUMPY_API
, just add it for the internal headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, thanks! (I think I already asked you the same question some time ago)
@@ -155,6 +155,7 @@ enum NPY_TYPECHAR { | |||
* depend on the data type. | |||
*/ | |||
typedef enum { | |||
NPY_SORT_UNDEFINED=-1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I did that too for casting (in a sense), so I think it is fine. But at least add a leading underscore to tell that this isn't considered public API. (Users can't/shouldn't pass this in to a sorting function.)
(You could probably also just initialize it as -1 and compare that. C++ cares, but I don't think C does.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Changed it to _NPY_SORT_UNDEFINED
.
numpy/_core/fromnumeric.py
Outdated
If ``False`` or ``None``, the returned array may or may not maintain | ||
the relative order of ``a`` values which compare as equal (i.e., | ||
the relative order of ``a`` values which compare as equal is | ||
implementation-dependent). Default: ``None``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For descending: drop the "must" (which is correct spec slang, but not user docs) would be enough.
Same for stable: It should just say that if True, sorting is stable so values that evaluated equal will maintain relative order.
(Maybe it could say that this selects a kind
, to at least hint it being mutually exclusive.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
||
New keyword parameters were added to improve array API compatibility: | ||
|
||
* ``rtol`` keyword parameter was added to `numpy.linalg.martrix_rank`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry... it does change for pinv
, but not matrix rank.
ec126e9
to
595a12e
Compare
@ngoldbaum Sure! I removed all |
63c6baf
to
06c2faa
Compare
06c2faa
to
14b53b5
Compare
rtol
to matrix_rank
and stable
& descending
to sort
& argsort
[Array API]rtol
to matrix_rank
and stable
[Array API]
Bringing this one in as well since all comments have been addressed and |
max(A.shape[-2:]) * | ||
finfo(S.dtype).eps | ||
) | ||
tol = S.max(axis=-1, keepdims=True) * rtol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic is incorrect. tol
should also have a newaxis appended in the end in this case too (i.e., the else
below should be removed and the tol = asarray(tol)[..., newaxis]
line should be run unconditionally). Right now we have:
>>> import numpy as np
>>> x = np.zeros((4, 3, 2))
>>> rtol = np.zeros((4,))
>>> np.linalg.matrix_rank(x, tol=rtol)
array([0, 0, 0, 0])
>>> np.linalg.matrix_rank(x, rtol=rtol)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/aaronmeurer/Documents/numpy/numpy/linalg/_linalg.py", line 2083, in matrix_rank
return count_nonzero(S > tol, axis=-1)
^^^^^^^
ValueError: operands could not be broadcast together with shapes (4,2) (4,4)
The broadcasting for rtol
should be the same as for tol
. It should broadcast against the stack shape (x.shape[:-2]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right - Here's a fix for it: #25877
Hi @rgommers @ngoldbaum,
This is the last PR for Array API support. It contains:
rtol
parameter fornp.linalg.matrix_rank
, which is just an alternative fortol
, as it provides a relative tolerance component.stable
parameter fornp.sort
andnp.argsort
, which can override the sorting method tostable
(an appropriate warning is displayed if it happens).descending
parameter fornp.sort
andnp.argsort
. It only supportsFalse
, which is also the default value. ForTrue
an exception is raised advising to usenp.flip
. Sorting methods are embedded in C with dispatching to dtype-specific sorting functions and by inspecting them briefly I didn't find a simple way to adddescending
argument. WDYT?