[MXNET-72] Improve sparse sgd on GPU #10293
    mom_data[data_i] = momentum * mom_data[data_i]
                     - rate * weight_data[data_i]
                     - lr *
                     mshadow_op::clip::Map(rescale_grad * grad_data[i],
Why a line break here?
    mom_data[data_i] = momentum * mom_data[data_i]
                     - rate * weight_data[data_i]
                     - lr *
                     mshadow_op::clip::Map(rescale_grad * grad,
Why a line break here?
No particular reason. Without the line break it would be a 200-character line.
I mean the extra line break between lr * and mshadow_op::clip::Map. These two places are inconsistent with what you have on line 52 of optimizer_op.cu below.
I don't think it's necessary to add or remove that extra line break. Please provide constructive feedback and review comments.
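For readers following the thread, the update that the quoted lines compute can be sketched on the host as below. This is an illustrative sketch, not the PR's GPU kernel: the function name SparseSgdMomUpdate, the helper Clip (standing in for mshadow_op::clip::Map), the row-index layout, the definition rate = lr * wd, the clip bound clip_gradient, and the final weight += mom step are all assumptions, since the excerpt is truncated before those details appear.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Clamp x to [-bound, bound]; a stand-in for mshadow_op::clip::Map (assumed semantics).
inline float Clip(float x, float bound) {
  return std::max(-bound, std::min(x, bound));
}

// Hypothetical host-side version of a row-sparse SGD momentum update.
// grad holds only the stored rows; grad_rows[r] is the dense row index of stored row r.
void SparseSgdMomUpdate(std::vector<float>* weight, std::vector<float>* mom,
                        const std::vector<float>& grad,
                        const std::vector<std::size_t>& grad_rows,
                        std::size_t row_length, float lr, float wd,
                        float momentum, float rescale_grad,
                        float clip_gradient) {
  const float rate = lr * wd;  // assumption: weight-decay factor, not shown in the excerpt
  // One iteration per gradient element, mirroring a one-thread-per-element layout.
  for (std::size_t i = 0; i < grad.size(); ++i) {
    const std::size_t row = grad_rows[i / row_length];
    const std::size_t col = i % row_length;
    const std::size_t data_i = row * row_length + col;
    const float g = Clip(rescale_grad * grad[i], clip_gradient);
    (*mom)[data_i] = momentum * (*mom)[data_i]
                     - rate * (*weight)[data_i]
                     - lr * g;
    // Assumption: standard momentum step that applies the new momentum to the weight.
    (*weight)[data_i] += (*mom)[data_i];
  }
}
```

Rows that have no stored gradient are never visited, so their weights and momenta stay unchanged in this sketch.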
* gpu kernels
* update warning msg
Description
Apply the same optimization as in #10062 to sparse SGD.
@ZiyueHuang @haojin2