Automatically Fusing Functions on CuPy
Akifumi Imanishi
What's CuPy
• An implementation of a NumPy-compatible multi-dimensional array on CUDA
• CuPy enables us to write Python code that runs on the GPU.
• Two basic operations
  • elementwise
    • Applying the function to each element
  • reduction
    • Reducing elements
Problems of CuPy
• Small functions are called many times.
• Communication time between the CPU and the GPU is a bottleneck.
• A mechanism for fusing functions is needed to resolve this.
• ex.) x * y + z * 3 + 5
  • There are 4 kernel calls in total (x * y, z * 3, and the two additions).
  • We want to calculate the expression in 1 kernel call.
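The call count above can be checked with a small pure-Python model. This is only a sketch: `CountingArray` and `fused` are hypothetical stand-ins, not CuPy classes, and each arithmetic operator is assumed to cost exactly one kernel launch.

```python
class CountingArray:
    """Toy stand-in for a GPU array: each arithmetic op counts as one kernel launch."""

    launches = 0  # class-wide launch counter (assumption: one launch per operator)

    def __init__(self, values):
        self.values = list(values)

    def _binary(self, other, op):
        CountingArray.launches += 1
        if isinstance(other, CountingArray):
            return CountingArray(op(a, b) for a, b in zip(self.values, other.values))
        return CountingArray(op(a, other) for a in self.values)

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._binary(other, lambda a, b: a * b)


def fused(x, y, z):
    """Evaluate the whole expression per element in one pass (one 'launch')."""
    CountingArray.launches += 1
    return CountingArray(a * b + c * 3 + 5
                         for a, b, c in zip(x.values, y.values, z.values))


x = CountingArray([1.0, 2.0])
y = CountingArray([3.0, 4.0])
z = CountingArray([5.0, 6.0])

CountingArray.launches = 0
unfused_result = x * y + z * 3 + 5      # x*y, z*3, +, +5
unfused_launches = CountingArray.launches

CountingArray.launches = 0
fused_result = fused(x, y, z)
fused_launches = CountingArray.launches

print(unfused_launches, fused_launches)
```

Both versions compute the same values; only the number of simulated launches differs.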
UI for elementwise kernel
• Converting a Python function to an elementwise kernel.
• ex.)
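Constructing a Data Structure
(Figure: expression tree for x * y + z * 3 + 5. The leaves are x, y, z, 3, and 5; the internal nodes are the two * and the two + operators.)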
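One way such a tree could be built is by calling the Python function on placeholder objects whose operators record what was applied. The sketch below assumes this tracing approach; `Node` and `trace` are hypothetical names for illustration and are not taken from CuPy.

```python
class Node:
    """One node of the expression tree: an operator with children, or a leaf."""

    def __init__(self, op, children=(), name=None):
        self.op = op                    # '+', '*', 'var' or 'const'
        self.children = tuple(children)
        self.name = name                # variable name or constant value (leaves only)

    @staticmethod
    def wrap(value):
        # Plain Python numbers become constant leaves.
        return value if isinstance(value, Node) else Node('const', name=value)

    def __add__(self, other):
        return Node('+', (self, Node.wrap(other)))

    def __radd__(self, other):
        return Node('+', (Node.wrap(other), self))

    def __mul__(self, other):
        return Node('*', (self, Node.wrap(other)))

    def __rmul__(self, other):
        return Node('*', (Node.wrap(other), self))

    def __repr__(self):
        if self.op in ('var', 'const'):
            return str(self.name)
        left, right = self.children
        return '({!r} {} {!r})'.format(left, self.op, right)


def trace(func, *arg_names):
    """Call func on placeholder variables and return the recorded tree."""
    return func(*(Node('var', name=n) for n in arg_names))


def f(x, y, z):
    return x * y + z * 3 + 5


tree = trace(f, 'x', 'y', 'z')
print(tree)
```

The function body runs once, on placeholders only; no array data is touched while the tree is recorded.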
Generating an Elementwise
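CuPy's ElementwiseKernel takes its per-element operation as a C-like source string, so a recorded expression tree can be turned into one kernel body by walking it once. The sketch below is an assumption about how that walk might look: the nested-tuple tree encoding, the `generate_body` helper, the temporary names `v0`, `v1`, ... and the generic type name `T` are all hypothetical.

```python
def generate_body(tree, out_name='out'):
    """Emit one C-like statement per operator node, then assign the final result."""
    lines = []

    def emit(node):
        if not isinstance(node, tuple):      # leaf: variable name or constant
            return str(node)
        op, left, right = node
        lhs, rhs = emit(left), emit(right)   # children first (post-order walk)
        tmp = 'v{}'.format(len(lines))       # fresh temporary for this node
        lines.append('T {} = {} {} {};'.format(tmp, lhs, op, rhs))
        return tmp

    result = emit(tree)
    lines.append('{} = {};'.format(out_name, result))
    return '\n'.join(lines)


# x * y + z * 3 + 5 as a nested (op, left, right) tuple
tree = ('+', ('+', ('*', 'x', 'y'), ('*', 'z', 3)), 5)
body = generate_body(tree)
print(body)
```

All four operators end up in a single body, which is what allows the expression to run as one kernel instead of four.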
UI for reduction kernel
• Converting a Python function to a ReductionKernel.
• ex.)
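A reduction kernel in CuPy is described by a per-element map expression, a reduce expression, and an identity value, so fusing the elementwise steps into the map expression removes the intermediate arrays. The pure-Python sketch below only illustrates that idea; `make_fused_reduction` and the sum-of-squared-differences example are hypothetical and do not use CuPy.

```python
from functools import reduce


def unfused_sum_sq_diff(x, y):
    """Three separate passes, each materializing or consuming an intermediate."""
    diff = [a - b for a, b in zip(x, y)]              # elementwise pass 1
    sq = [d * d for d in diff]                        # elementwise pass 2
    return reduce(lambda acc, v: acc + v, sq, 0.0)    # reduction pass


def make_fused_reduction(map_expr, reduce_expr, identity):
    """Build a single-pass kernel: map each element, then fold it immediately."""
    def kernel(*arrays):
        acc = identity
        for elems in zip(*arrays):
            acc = reduce_expr(acc, map_expr(*elems))
        return acc
    return kernel


fused_sum_sq_diff = make_fused_reduction(
    map_expr=lambda a, b: (a - b) * (a - b),
    reduce_expr=lambda acc, v: acc + v,
    identity=0.0,
)

x = [1.0, 2.0, 3.0]
y = [0.0, 4.0, 1.0]
print(unfused_sum_sq_diff(x, y), fused_sum_sq_diff(x, y))
```

The fused version visits each element once and keeps only the running accumulator.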
Rewrite adam.py by using "fuse"
Results
• chainer/optimizers/adam.py (update_one_gpu)
• chainer/example/mnist/train_mnist.py

Running times
Ufunc        78.656
Elementwise  62.430
Fusion       62.874

Memory usage (MiB)
Ufunc        225
Elementwise  211
Fusion       211
