[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an invalid CSR matrix #7750
See scipy/scipy#6719 for context. The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` end up with different dtypes. Alternatively, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.
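A minimal sketch of the mismatch described above, using plain NumPy arrays as hypothetical stand-ins for `yt.indices` and `yt.indptr` (the array contents and the names `indices`, `indptr`, `remapped`, and `fixed` are assumptions for illustration, not code from this PR):

```python
import numpy as np

# Hypothetical stand-ins for the CSR members of `yt` (int32 index arrays).
indices = np.array([0, 1, 1], dtype=np.int32)
indptr = np.array([0, 2, 3], dtype=np.int32)

# Unsorted class labels, so the inverse mapping is a real permutation.
class_mapping = np.array([10, 5])
classes, inverse = np.unique(class_mapping, return_inverse=True)

# `inverse` uses the platform index dtype (np.intp, commonly int64), so
# indexing with it yields an array whose dtype can differ from `indptr`.
remapped = inverse[indices]

# Cast back to the original index dtype so both index arrays agree.
fixed = np.asarray(remapped, dtype=indices.dtype)
print(remapped.dtype, fixed.dtype, indptr.dtype)
```

On platforms where `np.intp` is wider than 32 bits, `remapped` no longer matches `indptr`, while `fixed` does.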
Thanks. Can you add a test, please?
Older versions don't support kwargs for `astype`
@@ -732,7 +732,7 @@ def fit_transform(self, y):
        class_mapping = np.empty(len(tmp), dtype=dtype)
        class_mapping[:] = tmp
        self.classes_, inverse = np.unique(class_mapping, return_inverse=True)
        yt.indices = inverse[yt.indices].astype(yt.indices.dtype, copy=False)
You could use `np.asarray(..., dtype=)` to more closely reflect this operation, I think.
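For reference, a small sketch of how `np.asarray` with a `dtype` argument behaves (the arrays `a` and `b` are hypothetical): it returns the input without copying when the dtype already matches, and converts otherwise, which is the same effect as `astype(..., copy=False)` without needing the keyword argument.

```python
import numpy as np

# dtype already matches: no copy is made, the memory is shared.
a = np.arange(3, dtype=np.int32)
same = np.asarray(a, dtype=np.int32)

# dtype differs: a converted copy is returned.
b = np.arange(3, dtype=np.int64)
converted = np.asarray(b, dtype=np.int32)

print(np.shares_memory(same, a), np.shares_memory(converted, b))
```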
Yes, it looks like that's the sort of test we should perform wherever we do CSR manipulation. Hacky hacky hack hack. Thanks @perimosocordiae.
Tests pass now. Should I squash the commits?
No need.
Maybe add a comment explaining why the assert is needed and why the line is needed? It's a bit non-obvious to me.
Opened #7762 to track the overall problem.
Okay, comments added. I skipped the full CI treatment for the comment-only changes. Conversion to LIL format would test the symptom but not the cause of the error. It's possible that we may add checks in scipy to deal with this in the future, so I'd prefer to not rely on
Thanks :)
…nvalid CSR matrix (scikit-learn#7750) * BUG: MultiLabelBinarizer makes invalid CSR matrix See scipy/scipy#6719 for context. The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes. Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members. * Fixing for old numpy Older versions don't support kwargs for `astype` * Adding tests * line-wrapping * adding comment to tests [ci skip] * added rationale comment [ci skip]