<!DOCTYPE html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Sample meeting with slides – 28 October 2021</title>
<meta name=viewport content="width=device-width">
<link rel="stylesheet" type="text/css" title="2018" href="https://www.w3.org/StyleSheets/scribe2/public.css">
<link rel="alternate stylesheet" type="text/css" title="2004" href="https://www.w3.org/StyleSheets/base.css">
<link rel="alternate stylesheet" type="text/css" title="2004" href="https://www.w3.org/StyleSheets/public.css">
<link rel="alternate stylesheet" type="text/css" title="2004" href="https://www.w3.org/2004/02/minutes-style.css">
<link rel="alternate stylesheet" type="text/css" title="Fancy" href="https://www.w3.org/StyleSheets/scribe2/fancy.css">
<link rel="alternate stylesheet" type="text/css" title="Typewriter" href="https://www.w3.org/StyleSheets/scribe2/tt-member.css">
<script type=module src="https://w3c.github.io/i-slide/i-slide-2.js?selector=a.islide"></script>
</head>
<body>
<header>
<p><a href="https://www.w3.org/"><img src="https://www.w3.org/StyleSheets/TR/2016/logos/W3C" alt=W3C border=0 height=48 width=72></a></p>
<h1>Sample meeting with slides</h1>
<h2>28 October 2021</h2>
<nav id=links>
<a href="https://github.com/webmachinelearning/meetings/issues/18"><img alt="Agenda." title="Agenda" src="https://www.w3.org/StyleSheets/scribe2/chronometer.png"></a>
<a href="https://www.w3.org/2021/10/28-webmachinelearning-irc"><img alt="IRC log." title="IRC log" src="https://www.w3.org/StyleSheets/scribe2/text-plain.png"></a>
</nav>
</header>
<div id=prelims>
<div id=attendees>
<h2>Attendees</h2>
<dl class=intro>
<dt>Present</dt><dd>Anssi_Kostiainen, Belem_Zhang, Chai_Chaoweeraprasit, Dom, Eric_Meyer, Feng_Dai, Geun-Hyung, Geun-Hyung_Kim, Judy_Brewer, Junwei_Fu, Ningxin_Hu, Rachel_Yager, Rafael_Cintron, Takio_Yamaoka, Wanming, Zoltan_Kis</dd>
<dt>Regrets</dt><dd>-</dd>
<dt>Chair</dt><dd>Anssi</dd>
<dt>Scribe</dt><dd>Anssi, anssik, dom</dd>
</dl>
</div>
<nav id=toc>
<h2>Contents</h2>
<ol>
<li><a href="#8980">Conformance testing of WebNN API</a>
<ol>
<li><a href="#f360">Web Platform Tests</a></li>
</ol>
</li>
<li><a href="#6569">Ethical issues in using Machine Learning on the Web</a></li>
</ol>
</nav>
</div>
<main id=meeting class=meeting>
<h2>Meeting minutes</h2>
<section></section>
<section>
<h3 id=8980>Conformance testing of WebNN API</h3>
<p id=7a65 class="phone s01"><cite>Anssi:</cite> interoperability testing helps ensure compatibility among existing and future implementations<br>
<span id=61f8>… in the context of ML, reaching interop is not necessarily easy given the variety of underlying hardware</span><br>
<span id=1e8b>… Chai is involved in Microsoft DirectML and has experience in this space</span></p>
<p id=0496 class=summary>Slideset: <a href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf">https://<wbr>lists.w3.org/<wbr>Archives/<wbr>Public/<wbr>www-archive/<wbr>2021Oct/<wbr>att-0017/<wbr>Conformance_Testing_of_Machine_Learning_API.pdf</a></p>
<p id=aa18 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=1">[Slide 1]</a></p>
<p id=1973 class="phone s02"><cite>Chai:</cite> conformance testing of ML APIs is quite important</p>
<p id=b318 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=2">[Slide 2]</a></p>
<p id=b5de class="phone s02"><cite>chai:</cite> the problems fall into 3 categories:<br>
<span id=6d76>… ML models need to run on a wide variety of specialized hardware</span><br>
<span id=4311>… my work with DirectML is at the lowest level of the Windows OS, just above the hardware</span><br>
<span id=a42d>… Windows runs on a very broad range of hardware</span><br>
<span id=afae>… esp with specialized accelerators</span><br>
<span id=7749>… they don't share the same architecture and take very different approaches to computation</span><br>
<span id=51d0>… ensuring the quality of results across this hardware is really important</span><br>
<span id=c1af>… another issue is that most modern AI computation relies on floating point calculation</span><br>
<span id=4a10>… FP calculation with real numbers accumulates errors as the computation progresses - that's a fact of life</span><br>
<span id=6166>… there are rounding and truncation problems which create challenges in testing the results of an ML API across hardware</span><br>
<span id=ffff>… this is a daily issue in my work testing DirectML</span></p>
<p id=0518 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=3">[Slide 3]</a></p>
<p id=c97b class="phone s02"><cite>Chai:</cite> Karen Zack's Animals vs Food prompted an actual AI challenge<br>
<span id=b532>… humans don't have much difficulty telling the difference, but while many models can perform the task, they tend to give results with some level of uncertainty</span><br>
<span id=dcba>… showing the importance of reliability across hardware</span></p>
<p id=3399 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=4">[Slide 4]</a></p>
<p id=2615 class="phone s02"><cite>Chai:</cite> when we look at the results of ML models, there are 4 groups of variability<br>
<span id=90f4>… the most obvious one is precision differences - half vs double precision will give different results</span><br>
<span id=6cde>… most models run with single precision float, but many will run with half</span><br>
<span id=2b49>… Another bucket is hardware differences - even looking at CPUs & GPUs, different chipsets may have slightly different ways of computing and calculating FP operations</span><br>
<span id=3e05>… accelerators are often DSP based; some may rely on fixed point calculation, implying conversion to very different types of formats (e.g. 12.12, 10.10)</span><br>
<span id=312b>… A third source of variability is linked to algorithmic differences</span><br>
<span id=f784>… there are different ways of implementing convolutions, leading to different results</span><br>
<span id=02ad>… Finally, there is numerical variability - even on the same hardware, running the same floating point calculation, there may be slight differences across runs</span><br>
<span id=0629>… and that can be amplified by issues of lossy conversion between floating point and fixed point</span><br>
<span id=00d6>… these issues compound one another, so there is no guarantee of reproducible results</span></p>
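The first bucket above, precision differences, is easy to reproduce. The following sketch is an editor's illustration, not code from the slides: it accumulates the same sum while rounding each partial result to half, single, or double precision (via Python's stdlib `struct` formats `'e'` and `'f'`), showing that the choice of accumulator precision alone changes the result.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a Python float (double) to IEEE 754 fp16 ('e') or fp32 ('f')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def accumulate(values, fmt=None):
    """Sum values, optionally rounding each partial sum to emulate a
    lower-precision accumulator."""
    total = 0.0
    for v in values:
        total += v
        if fmt:
            total = round_to(fmt, total)
    return total

values = [0.1] * 1000            # "exact" answer would be 100.0
fp64 = accumulate(values)        # double precision: very close to 100
fp32 = accumulate(values, "f")   # single precision: visibly off
fp16 = accumulate(values, "e")   # half precision: off by much more
```

The error grows as precision drops, because each partial sum is rounded to a coarser grid; near 100, adjacent fp16 values are 0.0625 apart, so every addition of 0.1 incurs a large relative rounding error.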
<p id=ba99 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=5">[Slide 5]</a></p>
<p id=10000 class="phone s02"><cite>Chai:</cite> how do we deal with that in testing?<br>
<span id=305b>… Many test frameworks use fuzzy comparison, which provides an upper bound (called epsilon) on an acceptable margin of difference</span><br>
<span id=63af>… the problem with that approach in ML is that it doesn't deal with the sources of variability we identified</span><br>
<span id=a061>… A better way of comparing floating point values is based on ULP, unit of least precision</span><br>
<span id=cb0a>… the distance measured between consecutive floating point values</span><br>
<span id=be9e>… a comparison between the binary representations of different floating point values, applicable to any floating point format</span><br>
<span id=5680>… Using ULP comparison removes the uncertainty on numerical differences</span><br>
<span id=7806>… it also mitigates hardware variability in terms of architectural differences because it compares the representations</span></p>
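To see why a fixed epsilon copes poorly here, consider this illustrative sketch (an editor's assumption, not from the slides): the same absolute tolerance behaves completely differently depending on the magnitude of the values being compared.

```python
EPS = 1e-6

def fuzzy_equal(a: float, b: float, eps: float = EPS) -> bool:
    """Classic absolute-epsilon fuzzy comparison."""
    return abs(a - b) <= eps

# Near 1.0, eps = 1e-6 spans several representable float32 values: plausible.
print(fuzzy_equal(1.0, 1.0000001))    # True

# Near 1e9, adjacent float32 values are ~64 apart, so a gap of 100 is
# only a couple of ULPs - yet the absolute test rejects it: far too strict.
print(fuzzy_equal(1e9, 1e9 + 100.0))  # False

# Near 1e-12, eps dwarfs the values themselves, so wildly different
# numbers pass: vacuously loose.
print(fuzzy_equal(1e-12, 5e-7))       # True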
<p id=3599 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=6">[Slide 6]</a></p>
<p id=8aad class="phone s02"><cite>Chai:</cite> this piece of code illustrates the ULP comparison<br>
<span id=6c50>… the compare function converts the floating point number into a bitwise value that is used to calculate the difference and how many ULPs it represents</span><br>
<span id=d530>… e.g. here, only a difference of 1 ULP is deemed acceptable</span><br>
<span id=7a72>… We use ULP to test DirectML</span><br>
<span id=46fc>… the actual floating point values from the tests are never the same</span></p>
<p id=7219 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=7">[Slide 7]</a></p>
<p id=66ff class="phone s02"><cite>Chai:</cite> to make the comparison, you need to define a point of reference, which we call the baseline<br>
<span id=99ca>… the baseline is determined by the best known result for the computation, the ideal result</span><br>
<span id=8315>… this serves as a stable invariant</span><br>
<span id=57af>… for DirectML, we have computed standard results on a well-defined CPU with double precision float</span><br>
<span id=c58d>… we use that as our ideal baseline</span><br>
<span id=0e70>… we then define the tolerance in terms of ULP - the acceptable difference between what is and what should be (the baseline)</span><br>
<span id=0374>… the key ideas here are #1 use the baseline, #2 define tolerance in terms of ULP</span></p>
<p id=c799 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=8">[Slide 8]</a></p>
<p id=8e4c class="phone s02"><cite>Chai:</cite> the strategy for constructing tests can be summarized in 5 recommendations:<br>
<span id=1c47>… we recommend testing both the models and the kernels</span><br>
<span id=f267>… each operator should be tested separately, and on top of that, a set of models should exercise the API so the results of the whole model can be checked</span><br>
<span id=acb7>… for object classification models, you would want to compare the top K results (e.g. 99% Chihuahua, 75% muffin)</span><br>
<span id=6742>… making sure e.g. the 3 top answers are similar</span><br>
<span id=e040>… it's possible to have tests passing at the kernel level, but failing at the model level</span><br>
<span id=205d>… 2nd point: define an ideal baseline and a ULP-based tolerance</span><br>
<span id=53ce>… you might have to fine-tune the tolerance for different kernels</span><br>
<span id=8dd3>… e.g. addition should have a very low ULP tolerance, vs square root or convolution</span></p>
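The "compare the top K results" idea above can be sketched as follows; the labels and scores are invented for illustration (echoing the animals-vs-food example), not taken from any real model.

```python
def top_k_labels(scores: dict, k: int = 3) -> list:
    """Labels of the k highest-scoring classes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def top_k_match(baseline: dict, actual: dict, k: int = 3) -> bool:
    # Exact scores may differ across hardware; requiring only that the
    # top-k *labels* agree (in order) tolerates small numerical drift.
    return top_k_labels(baseline, k) == top_k_labels(actual, k)

baseline = {"chihuahua": 0.99, "muffin": 0.75, "labrador": 0.40, "bagel": 0.10}
actual   = {"chihuahua": 0.97, "muffin": 0.78, "labrador": 0.38, "bagel": 0.12}
print(top_k_match(baseline, actual))   # True: same top-3 labels, same order
```

A real harness would also need a policy for near-ties, where hardware variability can legitimately swap two ranks with almost-equal scores.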
<p id=1b91 class="phone s01"><cite>anssi:</cite> thanks for the presentation<br>
<span id=3d4b>… it highlights how different testing in this field is from usual Web API testing</span><br>
<span id=1552>… the closest similarities are probably with GPU and graphics APIs</span><br>
<span id=ec65>… We've had some early experimentation with bringing tests to WPT, the cross-browser platform testing project that is integrated with CI</span></p>
<p id=8ee8 class="phone s03"><cite>RafaelCintron:</cite> any recommendation in terms of ULP tolerance? what does it depend on?</p>
<p id=0caa class="phone s02"><cite>Chai:</cite> simple operations like addition, low tolerance (e.g. 1 ULP)<br>
<span id=3872>… for complex operations, the tolerance needs to be higher</span><br>
<span id=0280>… sometimes, the specific range arises organically e.g. for convolution we've landed around 2-4</span><br>
<span id=c493>… different APIs have different ULP tolerances, although they're likely using similar values</span></p>
<p id=dac4 class=irc><cite><rachel></cite> is precision testing necessary for all applications?</p>
<p id=7e56 class="phone s02"><cite>Chai:</cite> strategically, the best approach is to start with low tolerance (e.g. 1 ULP), and bump it based on real-world experience</p>
<p id=f246 class="phone s04"><cite>Rachel:</cite> [from IRC] is precision testing necessary for all applications?</p>
<p id=bddd class="phone s02"><cite>Chai:</cite> yes and no<br>
<span id=bd45>… you can't test every single model</span><br>
<span id=3505>… we test the kernels, i.e. the implementations of the operators</span><br>
<span id=6aef>… with an extensive enough set of kernel tests, the model itself should end up OK</span><br>
<span id=513c>… there are rare cases where the kernel tests pass, but a given model on a given hardware gives slightly different results</span><br>
<span id=3fda>… but the risk of that is lower if the kernels are well tested</span></p>
<p id=85bb class="phone s05"><cite>Ningxin:</cite> regarding the ideal baseline, for some operators like convolution, there can be different algorithms<br>
<span id=8b06>… what algorithm do you use for the ideal baseline?</span><br>
<span id=a216>… Applying this to WebNN may be more challenging since there is no reference implementation to use as an ideal baseline</span></p>
<p id=10001 class="phone s02"><cite>chai:</cite> for DirectML, we implement the reference implementation using the conceptual algorithm on a CPU with double precision<br>
<span id=6aff>… this is not what you would get from a real-world implementation, but we use it as a reference</span><br>
<span id=1537>… For WebNN, we may end up needing a set of reference implementations to serve as a point of comparison</span><br>
<span id=ede0>… there is no shortcut around that</span><br>
<span id=7eda>… having some open source code available somewhere would be good</span><br>
<span id=82a1>… but no matter what, you have to establish the ideal goalpost</span></p>
<h4 id=f360>Web Platform Tests</h4>
<p id=a7cc class="phone s06"><cite>FengDai:</cite> I work on testing for WebNN API and have a few slides on status for WPT tests</p>
<p id=3bad class=summary>Slideset: fengdaislides</p>
<p id=bef8 class=summary>[slide 3]</p>
<p id=2e6f class="phone s06"><cite>FengDai:</cite> 353 tests available for idlharness<br>
<span id=e485>… we've ported 800 test cases built for the WebNN polyfill to the WPT harness</span><br>
<span id=fb44>… this includes 740 operator tests (340 from ONNX, 400 from Android NNAPI)</span></p>
<p id=0660 class=summary><a href="https://brucedai.github.io/wpt/webnn/">WebNN WPT tests (preview in staging)</a></p>
<p id=5b1e class="phone s06"><cite>FengDai:</cite> for 60 model tests, we use baselines calculated from native frameworks<br>
<span id=3027>… the tests are available as a preview on my GitHub repo</span></p>
<p id=8b07 class=summary>[slide 4]</p>
<p id=2de1 class="phone s01"><cite>Anssi:</cite> thanks for the great work - the pull request is under review, correct?<br>
<span id=5022>… any blocker?</span></p>
<p id=5c01 class="phone s06"><cite>FengDai:</cite> there are different accuracy settings, data types across tests<br>
<span id=2391>… this matches the challenges Chai mentioned</span></p>
<p id=db7f class="phone s01"><cite>Anssi:</cite> a good next step might be to join one of the WG meetings to discuss this in more detail</p>
<p id=c741 class="phone s02"><cite>Chai:</cite> thanks Bruce for the work! WPT right now relies on fuzzy comparison<br>
<span id=88c7>… this means we'll need to change WPT to incorporate ULP comparison</span><br>
<span id=fecd>… hopefully that shouldn't be too much code change</span></p>
<p id=cc92 class="phone s06"><cite>FengDai:</cite> thanks, indeed</p>
</section>
<section>
<h3 id=6569>Ethical issues in using Machine Learning on the Web</h3>
<p id=faff class=summary><a href="https://webmachinelearning.github.io/ethical-webmachinelearning/">Ethical Web Machine Learning Editors draft</a></p>
<p id=cd73 class="phone s01"><cite>Anssi:</cite> this is a document that I put in place a few weeks ago<br>
<span id=6ec4>… the WG per its charter is committed to document ethical issues in using ML on the Web as a WG Note</span><br>
<span id=4970>… this is a first stab</span><br>
<span id=dc9e>… big disclaimer: I'M NOT AN EXPERT IN ETHICS</span></p>
<p id=82fe class=summary><a href="https://webmachinelearning.github.io/ethical-webmachinelearning/">Ethical Web Machine Learning</a></p>
<p id=7321 class="phone s01"><cite>Anssi:</cite> we're looking for people with expertise to help<br>
<span id=062f>… this hasn't been reviewed by the group yet</span><br>
<span id=1d1c>… [reviews the content of the document]</span><br>
<span id=4135>… ML is a powerful technology, enabling compelling new UX that was once thought of as magic and is now becoming commonplace</span><br>
<span id=9a9a>… these technologies are reshaping the world</span><br>
<span id=9113>… the algorithms that underlie ML are largely invisible to users, opaque and sometimes wrong</span><br>
<span id=dcc8>… they cannot be introspected but are sometimes assumed to be always trustworthy</span><br>
<span id=6ff4>… this is why it is important to consider ethical issues in the design phase of the technology</span><br>
<span id=ae02>… it's important that we understand the limitations of the technology</span><br>
<span id=2f8f>… the document then reviews different branches of ethics: information ethics, computer ethics, machine ethics</span><br>
<span id=5a8b>… there is related work in W3C</span><br>
<span id=597d>… e.g. the horizontal review work on privacy, accessibility</span><br>
<span id=13d1>… and the TAG work on ethical web principles</span></p>
<p id=a1d6 class=summary><a href="https://www.w3.org/Privacy/">Privacy-by-design web standards</a></p>
<p id=46cb class=summary><a href="https://www.w3.org/standards/webdesign/accessibility">Accessibility techniques to support social inclusion</a></p>
<p id=b9b4 class=summary><a href="https://w3ctag.github.io/ethical-web-principles/">W3C TAG Ethical Web Principles</a></p>
<p id=15e2 class="phone s01"><cite>Anssi:</cite> the document focuses on ethical issues at the intersection of the Web & ML<br>
<span id=478a>… there are positive aspects to client-side ML: increased privacy, reduced risk of a single point of failure, and distributed control</span><br>
<span id=a4fc>… it makes it possible to bring progressive enhancement to this space</span><br>
<span id=0e03>… Browsers may also help increase transparency, pushing for greater explainability</span><br>
<span id=7bae>… in the spirit of "view source"</span><br>
<span id=a04d>… I've looked at different literature studies in this space</span></p>
<p id=e575 class="phone s04"><cite>Rachel:</cite> I'm interested in this, and suggest including research into the thinking of corporations<br>
<span id=7c18>… many companies have responsible AI efforts, so engaging with them would be interesting</span><br>
<span id=a53b>… the human perspective on this may be a good focus</span><br>
<span id=9c8a>… I can work with the W3C Chapter to bring interested folks from that group into this discussion</span></p>
</section>
</main>
<address>Minutes manually created (not a transcript), formatted by <a
href="https://w3c.github.io/scribe2/scribedoc.html"
>scribe.perl</a> version 243 (Thu Feb 27 00:32:23 2025 UTC).</address>
<div class=diagnostics>
<h2>Diagnostics</h2>
<p class=warning>Succeeded: s/Fang/Feng</p>
<p class=warning>Succeeded: s/fferent/fference/</p>
<p class=warning>Succeeded: s|chaislides|https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf</p>
<p class=warning>Succeeded: s/powerful document/powerful technology</p>
<p class=warning>Maybe present: Anssi, Chai, FengDai, Ningxin, Rachel, RafaelCintron</p>
<p class=warning>All speakers: Anssi, Chai, FengDai, Ningxin, Rachel, RafaelCintron</p>
<p class=warning>Active on IRC: anssik, Chai, dom, Geun-Hyung, ningxin_hu, rachel, RafaelCintron</p>
</div>
</body>
</html>