<!DOCTYPE html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Sample meeting with slides – 28 October 2021</title>
<meta name=viewport content="width=device-width">
<link rel="stylesheet" type="text/css" title="2018" href="https://www.w3.org/StyleSheets/scribe2/public.css">
<link rel="alternate stylesheet" type="text/css" title="2004" href="https://www.w3.org/StyleSheets/base.css">
<link rel="alternate stylesheet" type="text/css" title="2004" href="https://www.w3.org/StyleSheets/public.css">
<link rel="alternate stylesheet" type="text/css" title="2004" href="https://www.w3.org/2004/02/minutes-style.css">
<link rel="alternate stylesheet" type="text/css" title="Fancy" href="https://www.w3.org/StyleSheets/scribe2/fancy.css">
<link rel="alternate stylesheet" type="text/css" title="Typewriter" href="https://www.w3.org/StyleSheets/scribe2/tt-member.css">
<script type=module src="https://w3c.github.io/i-slide/i-slide-2.js?selector=a.islide"></script>
</head>
<body>
<header>
<p><a href="https://www.w3.org/"><img src="https://www.w3.org/StyleSheets/TR/2016/logos/W3C" alt=W3C border=0 height=48 width=72></a></p>
<h1>Sample meeting with slides</h1>
<h2>28 October 2021</h2>
<nav id=links>
<a href="https://github.com/webmachinelearning/meetings/issues/18"><img alt="Agenda." title="Agenda" src="https://www.w3.org/StyleSheets/scribe2/chronometer.png"></a>
<a href="https://www.w3.org/2021/10/28-webmachinelearning-irc"><img alt="IRC log." title="IRC log" src="https://www.w3.org/StyleSheets/scribe2/text-plain.png"></a>
</nav>
</header>
<div id=prelims>
<div id=attendees>
<h2>Attendees</h2>
<dl class=intro>
<dt>Present</dt><dd>Anssi_Kostiainen, Belem_Zhang, Chai_Chaoweeraprasit, Dom, Eric_Meyer, Feng_Dai, Geun-Hyung, Geun-Hyung_Kim, Judy_Brewer, Junwei_Fu, Ningxin_Hu, Rachel_Yager, Rafael_Cintron, Takio_Yamaoka, Wanming, Zoltan_Kis</dd>
<dt>Regrets</dt><dd>-</dd>
<dt>Chair</dt><dd>Anssi</dd>
<dt>Scribe</dt><dd>Anssi, anssik, dom</dd>
</dl>
</div>
<nav id=toc>
<h2>Contents</h2>
<ol>
<li><a href="#8980">Conformance testing of WebNN API</a>
<ol>
<li><a href="#f360">Web Platform Tests</a></li>
</ol>
</li>
<li><a href="#6569">Ethical issues in using Machine Learning on the Web</a></li>
</ol>
</nav>
</div>
<main id=meeting class=meeting>
<h2>Meeting minutes</h2>
<section></section>
<section>
<h3 id=8980>Conformance testing of WebNN API</h3>
<p id=7a65 class="phone s01"><cite>Anssi:</cite> interoperability testing helps ensure compatibility among existing and future implementations<br>
<span id=61f8>… in the context of ML, reaching interop is not necessarily easy given the variety of underlying hardware</span><br>
<span id=1e8b>… Chai is involved in Microsoft DirectML and has experience in this space</span></p>
<p id=0496 class=summary>Slideset: <a href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf">https://<wbr>lists.w3.org/<wbr>Archives/<wbr>Public/<wbr>www-archive/<wbr>2021Oct/<wbr>att-0017/<wbr>Conformance_Testing_of_Machine_Learning_API.pdf</a></p>
<p id=aa18 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=1">[Slide 1]</a></p>
<p id=1973 class="phone s02"><cite>Chai:</cite> conformance testing of ML APIs is quite important</p>
<p id=b318 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=2">[Slide 2]</a></p>
<p id=b5de class="phone s02"><cite>chai:</cite> the problems fall into 3 categories:<br>
<span id=6d76>… ML models need to run on a wide variety of specialized hardware</span><br>
<span id=4311>… my work with DirectML is at the lowest level of the Windows OS, just above the hardware</span><br>
<span id=a42d>… Windows runs on a very broad range of hardware</span><br>
<span id=afae>… esp with specialized accelerators</span><br>
<span id=7749>… they don't share the same architecture and take very different approaches to computation</span><br>
<span id=51d0>… ensuring the quality of results across this hardware is really important</span><br>
<span id=c1af>… another issue is that most modern AI computation relies on floating point calculation</span><br>
<span id=4a10>… FP calculation with real numbers accumulates errors as the computation progresses - that's a fact of life</span><br>
<span id=6166>… there are rounding and truncation problems which create challenges in testing the results of an ML API across hardware</span><br>
<span id=ffff>… this is a daily issue in my work testing DirectML</span></p>
<p id=0518 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=3">[Slide 3]</a></p>
<p id=c97b class="phone s02"><cite>Chai:</cite> Karen Zack's Animals vs Food prompted an actual AI challenge<br>
<span id=b532>… humans don't have much difficulty telling the difference, but while many models can perform the task, they tend to give results with some level of uncertainty</span><br>
<span id=dcba>… showing the importance of reliability across hardware</span></p>
<p id=3399 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=4">[Slide 4]</a></p>
<p id=2615 class="phone s02"><cite>Chai:</cite> when we look at the results of ML models, there are 4 groups of variability<br>
<span id=90f4>… the most obvious one is precision differences - half vs double precision will give different results</span><br>
<span id=6cde>… most models run with single precision float, but many will run with half</span><br>
<span id=2b49>… Another bucket is hardware differences - even looking at CPUs & GPUs, different chipsets may have slightly different ways of computing and calculating FP operations</span><br>
<span id=3e05>… accelerators are often DSP based; some may rely on fixed point calculation, implying conversion to very different types of formats (e.g. 12.12, 10.10)</span><br>
<span id=312b>… A third source of variability is linked to algorithmic differences</span><br>
<span id=f784>… there are different ways of implementing convolutions, leading to different results</span><br>
<span id=02ad>… Finally, there is numerical variability - even on the same hardware, running the same floating point calculation, there may be slight differences across runs</span><br>
<span id=0629>… and that can be amplified by issues of lossy conversion between floating point and fixed point</span><br>
<span id=00d6>… these issues compound one another, so there is no guarantee of reproducible results</span></p>
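The first bucket above, precision differences, is easy to reproduce. The following sketch is an editor's illustration, not code from the slides: it accumulates the same sum while rounding each partial result to half, single, or double precision (via Python's stdlib `struct` formats `'e'` and `'f'`), showing that the choice of accumulator precision alone changes the result.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a Python float (double) to IEEE 754 fp16 ('e') or fp32 ('f')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def accumulate(values, fmt=None):
    """Sum values, optionally rounding each partial sum to emulate a
    lower-precision accumulator."""
    total = 0.0
    for v in values:
        total += v
        if fmt:
            total = round_to(fmt, total)
    return total

values = [0.1] * 1000            # "exact" answer would be 100.0
fp64 = accumulate(values)        # double precision: very close to 100
fp32 = accumulate(values, "f")   # single precision: visibly off
fp16 = accumulate(values, "e")   # half precision: off by much more
```

The error grows as precision drops, because each partial sum is rounded to a coarser grid; near 100, adjacent fp16 values are 0.0625 apart, so every addition of 0.1 incurs a large relative rounding error.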
<p id=ba99 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=5">[Slide 5]</a></p>
<p id=10000 class="phone s02"><cite>Chai:</cite> how do we deal with that in testing?<br>
<span id=305b>… Many test frameworks use fuzzy comparison, which provides an upper bound (called epsilon) on an acceptable margin of difference</span><br>
<span id=63af>… the problem with that approach in ML is that it doesn't deal with the sources of variability we identified</span><br>
<span id=a061>… A better way of comparing floating point values is based on ULP, unit of least precision</span><br>
<span id=cb0a>… the distance measured between consecutive floating point values</span><br>
<span id=be9e>… a comparison between the binary representations of different floating point values, applicable to any floating point format</span><br>
<span id=5680>… Using ULP comparison removes the uncertainty on numerical differences</span><br>
<span id=7806>… it also mitigates hardware variability in terms of architectural differences because it compares the representations</span></p>
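To see why a fixed epsilon copes poorly here, consider this illustrative sketch (an editor's assumption, not from the slides): the same absolute tolerance behaves completely differently depending on the magnitude of the values being compared.

```python
EPS = 1e-6

def fuzzy_equal(a: float, b: float, eps: float = EPS) -> bool:
    """Classic absolute-epsilon fuzzy comparison."""
    return abs(a - b) <= eps

# Near 1.0, eps = 1e-6 spans several representable float32 values: plausible.
print(fuzzy_equal(1.0, 1.0000001))    # True

# Near 1e9, adjacent float32 values are ~64 apart, so a gap of 100 is
# only a couple of ULPs - yet the absolute test rejects it: far too strict.
print(fuzzy_equal(1e9, 1e9 + 100.0))  # False

# Near 1e-12, eps dwarfs the values themselves, so wildly different
# numbers pass: vacuously loose.
print(fuzzy_equal(1e-12, 5e-7))       # True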
<p id=3599 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=6">[Slide 6]</a></p>
<p id=8aad class="phone s02"><cite>Chai:</cite> this piece of code illustrates the ULP comparison<br>
<span id=6c50>… the compare function converts the floating point number into a bitwise value that is used to calculate the difference and how many ULPs it represents</span><br>
<span id=d530>… e.g. here, only a difference of 1 ULP is deemed acceptable</span><br>
<span id=7a72>… We use ULP to test DirectML</span><br>
<span id=46fc>… the actual floating point values from the tests are never the same</span></p>
<p id=7219 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=7">[Slide 7]</a></p>
<p id=66ff class="phone s02"><cite>Chai:</cite> to make the comparison, you need to define a point of reference, which we call the baseline<br>
<span id=99ca>… the baseline is determined by the best known result for the computation, the ideal result</span><br>
<span id=8315>… this serves as a stable invariant</span><br>
<span id=57af>… for DirectML, we have computed standard results on a well-defined CPU with double precision float</span><br>
<span id=c58d>… we use that as our ideal baseline</span><br>
<span id=0e70>… we then define the tolerance in terms of ULP - the acceptable difference between what is and what should be (the baseline)</span><br>
<span id=0374>… the key ideas here are #1 use the baseline, #2 define tolerance in terms of ULP</span></p>
<p id=c799 class=summary><a class=islide data-islide-srcref="" href="https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf#page=8">[Slide 8]</a></p>
<p id=8e4c class="phone s02"><cite>Chai:</cite> the strategy for constructing tests can be summarized in 5 recommendations:<br>
<span id=1c47>… we recommend testing both the models and the kernels</span><br>
<span id=f267>… each operator should be tested separately, and on top of that, a set of models should exercise the API so the results of the whole model can be checked</span><br>
<span id=acb7>… for object classification models, you would want to compare the top K results (e.g. 99% Chihuahua, 75% muffin)</span><br>
<span id=6742>… making sure e.g. the 3 top answers are similar</span><br>
<span id=e040>… it's possible to have tests passing at the kernel level, but failing at the model level</span><br>
<span id=205d>… 2nd point: define an ideal baseline and a ULP-based tolerance</span><br>
<span id=53ce>… you might have to fine-tune the tolerance for different kernels</span><br>
<span id=8dd3>… e.g. addition should have a very low ULP tolerance, vs square root or convolution</span></p>
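The "compare the top K results" idea above can be sketched as follows; the labels and scores are invented for illustration (echoing the animals-vs-food example), not taken from any real model.

```python
def top_k_labels(scores: dict, k: int = 3) -> list:
    """Labels of the k highest-scoring classes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def top_k_match(baseline: dict, actual: dict, k: int = 3) -> bool:
    # Exact scores may differ across hardware; requiring only that the
    # top-k *labels* agree (in order) tolerates small numerical drift.
    return top_k_labels(baseline, k) == top_k_labels(actual, k)

baseline = {"chihuahua": 0.99, "muffin": 0.75, "labrador": 0.40, "bagel": 0.10}
actual   = {"chihuahua": 0.97, "muffin": 0.78, "labrador": 0.38, "bagel": 0.12}
print(top_k_match(baseline, actual))   # True: same top-3 labels, same order
```

A real harness would also need a policy for near-ties, where hardware variability can legitimately swap two ranks with almost-equal scores.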
<p id=1b91 class="phone s01"><cite>anssi:</cite> thanks for the presentation<br>
<span id=3d4b>… it highlights how different testing in this field is from usual Web API testing</span><br>
<span id=1552>… the closest similarities are probably with GPU and graphics APIs</span><br>
<span id=ec65>… We've had some early experimentation with bringing tests to WPT, the cross-browser platform testing project that is integrated with CI</span></p>
<p id=8ee8 class="phone s03"><cite>RafaelCintron:</cite> any recommendation in terms of ULP tolerance? what does it depend on?</p>
<p id=0caa class="phone s02"><cite>Chai:</cite> simple operations like addition, low tolerance (e.g. 1 ULP)<br>
<span id=3872>… for complex operations, the tolerance needs to be higher</span><br>
<span id=0280>… sometimes, the specific range arises organically e.g. for convolution we've landed around 2-4</span><br>
<span id=c493>… different APIs have different ULP tolerances, although they're likely using similar values</span></p>
<p id=dac4 class=irc><cite><rachel></cite> is precision testing necessary for all applications?</p>
<p id=7e56 class="phone s02"><cite>Chai:</cite> strategically, the best approach is to start with low tolerance (e.g. 1 ULP), and bump it based on real-world experience</p>
<p id=f246 class="phone s04"><cite>Rachel:</cite> [from IRC] is precision testing necessary for all applications?</p>
<p id=bddd class="phone s02"><cite>Chai:</cite> yes and no<br>
<span id=bd45>… you can't test every single model</span><br>
<span id=3505>… we test the kernels, i.e. the implementations of the operators</span><br>
<span id=6aef>… with an extensive enough set of kernel tests, the model itself should end up OK</span><br>
<span id=513c>… there are rare cases where the kernel tests pass, but a given model on a given hardware gives slightly different results</span><br>
<span id=3fda>… but the risk of that is lower if the kernels are well tested</span></p>
<p id=85bb class="phone s05"><cite>Ningxin:</cite> regarding the ideal baseline, for some operators like convolution, there can be different algorithms<br>
<span id=8b06>… what algorithm do you use for the ideal baseline?</span><br>
<span id=a216>… Applying this to WebNN may be more challenging since there is no reference implementation to use as an ideal baseline</span></p>
<p id=10001 class="phone s02"><cite>chai:</cite> for DirectML, we implement the reference implementation using the conceptual algorithm on a CPU with double precision<br>
<span id=6aff>… this is not what you would get from a real-world implementation, but we use it as a reference</span><br>
<span id=1537>… For WebNN, we may end up needing a set of reference implementations to serve as a point of comparison</span><br>
<span id=ede0>… there is no shortcut around that</span><br>
<span id=7eda>… having some open source code available somewhere would be good</span><br>
<span id=82a1>… but no matter what, you have to establish the ideal goalpost</span></p>
<h4 id=f360>Web Platform Tests</h4>
<p id=a7cc class="phone s06"><cite>FengDai:</cite> I work on testing for WebNN API and have a few slides on status for WPT tests</p>
<p id=3bad class=summary>Slideset: fengdaislides</p>
<p id=bef8 class=summary>[slide 3]</p>
<p id=2e6f class="phone s06"><cite>FengDai:</cite> 353 tests available for idlharness<br>
<span id=e485>… we've ported 800 test cases built for the WebNN polyfill to the WPT harness</span><br>
<span id=fb44>… this includes 740 operator tests (340 from ONNX, 400 from Android NNAPI)</span></p>
<p id=0660 class=summary><a href="https://brucedai.github.io/wpt/webnn/">WebNN WPT tests (preview in staging)</a></p>
<p id=5b1e class="phone s06"><cite>FengDai:</cite> for 60 model tests, we use baselines calculated from native frameworks<br>
<span id=3027>… the tests are available as a preview on my GitHub repo</span></p>
<p id=8b07 class=summary>[slide 4]</p>
<p id=2de1 class="phone s01"><cite>Anssi:</cite> thanks for the great work - the pull request is under review, correct?<br>
<span id=5022>… any blocker?</span></p>
<p id=5c01 class="phone s06"><cite>FengDai:</cite> there are different accuracy settings, data types across tests<br>
<span id=2391>… this matches the challenges Chai mentioned</span></p>
<p id=db7f class="phone s01"><cite>Anssi:</cite> a good next step might be to join one of the WG meetings to discuss this in more detail</p>
<p id=c741 class="phone s02"><cite>Chai:</cite> thanks Bruce for the work! WPT right now relies on fuzzy comparison<br>
<span id=88c7>… this means we'll need to change WPT to incorporate ULP comparison</span><br>
<span id=fecd>… hopefully that shouldn't be too much code change</span></p>
<p id=cc92 class="phone s06"><cite>FengDai:</cite> thanks, indeed</p>
</section>
<section>
<h3 id=6569>Ethical issues in using Machine Learning on the Web</h3>
<p id=faff class=summary><a href="https://webmachinelearning.github.io/ethical-webmachinelearning/">Ethical Web Machine Learning Editors draft</a></p>
<p id=cd73 class="phone s01"><cite>Anssi:</cite> this is a document that I put in place a few weeks ago<br>
<span id=6ec4>… the WG per its charter is committed to document ethical issues in using ML on the Web as a WG Note</span><br>
<span id=4970>… this is a first stab</span><br>
<span id=dc9e>… big disclaimer: I'M NOT AN EXPERT IN ETHICS</span></p>
<p id=82fe class=summary><a href="https://webmachinelearning.github.io/ethical-webmachinelearning/">Ethical Web Machine Learning</a></p>
<p id=7321 class="phone s01"><cite>Anssi:</cite> we're looking for people with expertise to help<br>
<span id=062f>… this hasn't been reviewed by the group yet</span><br>
<span id=1d1c>… [reviews the content of the document]</span><br>
<span id=4135>… ML is a powerful technology, enabling compelling new UX that was once thought of as magic and is now becoming commonplace</span><br>
<span id=9a9a>… these technologies are reshaping the world</span><br>
<span id=9113>… the algorithms that underlie ML are largely invisible to users, opaque and sometimes wrong</span><br>
<span id=dcc8>… they cannot be introspected but are sometimes assumed to be always trustworthy</span><br>
<span id=6ff4>… this is why it is important to consider ethical issues in the design phase of the technology</span><br>
<span id=ae02>… it's important that we understand the limitations of the technology</span><br>
<span id=2f8f>… the document then reviews different branches of ethics: information ethics, computer ethics, machine ethics</span><br>
<span id=5a8b>… there is related work in W3C</span><br>
<span id=597d>… e.g. the horizontal review work on privacy, accessibility</span><br>
<span id=13d1>… and the TAG work on ethical web principles</span></p>
<p id=a1d6 class=summary><a href="https://www.w3.org/Privacy/">Privacy-by-design web standards</a></p>
<p id=46cb class=summary><a href="https://www.w3.org/standards/webdesign/accessibility">Accessibility techniques to support social inclusion</a></p>
<p id=b9b4 class=summary><a href="https://w3ctag.github.io/ethical-web-principles/">W3C TAG Ethical Web Principles</a></p>
<p id=15e2 class="phone s01"><cite>Anssi:</cite> the document focuses on ethical issues at the intersection of the Web & ML<br>
<span id=478a>… there are positive aspects to client-side ML: increased privacy, reduced risk of a single point of failure, and distributed control</span><br>
<span id=a4fc>… it makes it possible to bring progressive enhancement to this space</span><br>
<span id=0e03>… Browsers may also help increase transparency, pushing for greater explainability</span><br>
<span id=7bae>… in the spirit of "view source"</span><br>
<span id=a04d>… I've looked at different literature studies in this space</span></p>
<p id=e575 class="phone s04"><cite>Rachel:</cite> I'm interested in this, and suggest including research into the thinking of corporations<br>
<span id=7c18>… many companies have responsible AI efforts, so engaging with them would be interesting</span><br>
<span id=a53b>… the human perspective on this may be a good focus</span><br>
<span id=9c8a>… I can work with the W3C Chapter to bring interested folks from that group into this discussion</span></p>
</section>
</main>
<address>Minutes manually created (not a transcript), formatted by <a
href="https://w3c.github.io/scribe2/scribedoc.html"
>scribe.perl</a> version 243 (Thu Feb 27 00:32:23 2025 UTC).</address>
<div class=diagnostics>
<h2>Diagnostics</h2>
<p class=warning>Succeeded: s/Fang/Feng</p>
<p class=warning>Succeeded: s/fferent/fference/</p>
<p class=warning>Succeeded: s|chaislides|https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf</p>
<p class=warning>Succeeded: s/powerful document/powerful technology</p>
<p class=warning>Maybe present: Anssi, Chai, FengDai, Ningxin, Rachel, RafaelCintron</p>
<p class=warning>All speakers: Anssi, Chai, FengDai, Ningxin, Rachel, RafaelCintron</p>
<p class=warning>Active on IRC: anssik, Chai, dom, Geun-Hyung, ningxin_hu, rachel, RafaelCintron</p>
</div>
</body>
</html>