eye operator, for default storage type #9770
assert_almost_equal(arange_out.asnumpy(), np.arange(0, 20))
N_array = np.random.randint(1, high=3, size=3)
M_array = np.random.randint(1, high=3, size=3)
k_array = np.random.randint(-5, high=5, size=3)
Could you run some more trials, at least locally and with larger ranges, just to verify whether there are edge cases?
Also, I think there is the case where M is 0.
src/operator/tensor/init_op.h
Outdated
template<typename DType>
MSHADOW_XINLINE static void Map(int i, DType* out_data, const nnvm::dim_t num_cols,
                                const nnvm::dim_t k) {
  if ((i % num_cols) == ((i / num_cols) + k)) {
Looks like two divides per value-fill. Would it be faster (at least for CPU) to fill with 0s and then "walk" across an offset (using only adds) to fill in the ones?
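The two strategies discussed in this comment can be compared side by side. The following is only a minimal standalone sketch, not the code from this PR: it uses plain loops instead of MXNet's `Kernel<...>::Launch`, `dim_t` is a stand-in alias for `nnvm::dim_t` (assumed here to be a signed 64-bit integer), and the function names `eye_per_element` and `eye_fill_then_walk` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for nnvm::dim_t (assumption: a signed 64-bit integer).
using dim_t = std::int64_t;

// Strategy A: decide every output element with one modulo and one division.
std::vector<float> eye_per_element(dim_t num_rows, dim_t num_cols, dim_t k) {
  assert(num_rows >= 0 && num_cols >= 0);
  const dim_t size = num_rows * num_cols;
  std::vector<float> out(static_cast<std::size_t>(size));
  for (dim_t i = 0; i < size; ++i) {
    // column index == row index + k  <=>  element lies on the k-th diagonal
    const bool on_diag = (i % num_cols) == ((i / num_cols) + k);
    out[static_cast<std::size_t>(i)] = on_diag ? 1.0f : 0.0f;
  }
  return out;
}

// Strategy B: zero-fill once, then walk the diagonal using additions only.
std::vector<float> eye_fill_then_walk(dim_t num_rows, dim_t num_cols, dim_t k) {
  assert(num_rows >= 0 && num_cols >= 0);
  std::vector<float> out(static_cast<std::size_t>(num_rows * num_cols), 0.0f);
  // A positive k shifts the diagonal right; a negative k shifts it down.
  const dim_t first_row = std::max<dim_t>(0, -k);
  const dim_t first_col = std::max<dim_t>(0, k);
  // Number of ones; clamped to 0 when the diagonal lies outside the matrix.
  const dim_t nnz =
      std::max<dim_t>(0, std::min(num_rows - first_row, num_cols - first_col));
  dim_t idx = first_row * num_cols + first_col;
  for (dim_t n = 0; n < nnz; ++n, idx += num_cols + 1) {
    out[static_cast<std::size_t>(idx)] = 1.0f;  // next element: one row down, one column right
  }
  return out;
}
```

Both functions should return the same row-major buffer for any non-negative shape and any `k`; the second touches only `nnz` elements after the initial fill, which is the saving the comment is asking about.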
@ZiyueHuang ping
Thanks for your comments, and sorry for the late response. @piiswrong I increased the size and the number of trials and added the M=0 case to the unit test. I also ran it 1000 times locally (Ubuntu). @cjolivier01 I changed to two kernels, which is about 10 times faster on CPU.
src/operator/tensor/init_op.h
Outdated
if (nnz > 0) {
  Kernel<eye_dns_fill<req_type>, xpu>::Launch(
    s, nnz, out_data.dptr<DType>(),
    std::max((nnvm::dim_t)0, param.k), param.k, num_cols);
nit: static_cast<nnvm::dim_t>(0) or 0LL, although the latter sort of ties dim_t to a specific type.
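For context on why a cast is needed at all: `std::max` takes both arguments as the same template type, so mixing an `int` literal with a wider integer fails template deduction. A minimal sketch of the suggested spelling follows, assuming `param.k` has type `nnvm::dim_t`; here `dim_t` is a stand-in alias (assumed to be `std::int64_t`) and `first_diag_col` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstdint>

// Stand-in for nnvm::dim_t (assumption: std::int64_t).
using dim_t = std::int64_t;

// Column of the first diagonal element: k for k >= 0, otherwise 0.
inline dim_t first_diag_col(dim_t k) {
  // std::max(0, k) would not compile: the arguments deduce to int and dim_t.
  // static_cast keeps the literal's type tied to whatever dim_t is defined as.
  return std::max(static_cast<dim_t>(0), k);
}
```

Writing `0LL` instead only works where `dim_t` happens to be `long long`. On platforms where `std::int64_t` is `long`, `std::max(0LL, k)` fails deduction in the same way, which is the coupling the comment warns about. `std::max<dim_t>(0, k)` is another spelling that avoids the cast.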
* eye * more test * change to two kernels * address comments * Update init_op.h
Description
eye operator, for default storage type
Requested in https://discuss.mxnet.io/t/is-there-an-eye-function-in-the-ndarray-api/526/5
cc @eric-haibin-lin