<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Unweaving the Web</title>
    <description>Technical articles about my work at Igalia</description>
    <link>http://blogs.igalia.com/dpino/</link>
    <atom:link href="http://blogs.igalia.com/dpino/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Wed, 30 Nov 2022 15:53:36 +0000</pubDate>
    <lastBuildDate>Wed, 30 Nov 2022 15:53:36 +0000</lastBuildDate>
    <generator>Jekyll v3.4.0</generator>
    
      <item>
        <title>Renderization of Conic gradients</title>
        <description>&lt;p&gt;The &lt;a href=&quot;https://www.w3.org/TR/css-images-4/&quot;&gt;CSS Images Module Level 4&lt;/a&gt; introduced a new type of gradient: &lt;code&gt;conic-gradient&lt;/code&gt;. Until then, there were only two types of gradients available on the Web: &lt;code&gt;linear-gradient&lt;/code&gt; and &lt;code&gt;radial-gradient&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The first browser to ship &lt;code&gt;conic-gradient&lt;/code&gt; support was Google Chrome, around March 2018. A few months later, in September 2018, the feature became available in Safari. Firefox has been missing support until now, although an implementation is on the way and will ship soon. In the case of WebKitGTK (Epiphany) and WPE (Web Platform for Embedded), support &lt;a href=&quot;https://bugs.webkit.org/show_bug.cgi?id=202739&quot;&gt;landed in October 2019&lt;/a&gt;, which I implemented as part of my work at Igalia. The feature has been officially available in WebKitGTK and WPE since version 2.28 (March 2020).&lt;/p&gt;

&lt;p&gt;Before native browser support, &lt;code&gt;conic-gradient&lt;/code&gt; was available as a JavaScript polyfill created by &lt;a href=&quot;https://twitter.com/LeaVerou&quot;&gt;Lea Verou&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gradients on the Web&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Generally speaking, a gradient is a smooth transition of colors defined by two or more stop-colors. In the case of a linear gradient, this transition is defined by a straight line (which may or may not be at an angle).&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-css&quot; data-lang=&quot;css&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;div&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;linear-gradient&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;400&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;background&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;linear-gradient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;right&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;yellow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;lime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;aqua&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;magenta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/linear-gradient.png&quot; title=&quot;Linear gradient&quot; alt=&quot;Linear gradient&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Linear gradient&lt;/figcaption&gt;&lt;/figure&gt;
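
&lt;p&gt;To make the idea of a &quot;smooth transition&quot; between two stop-colors more concrete, here is a minimal sketch in C of linear interpolation between two colors. The &lt;code&gt;Color&lt;/code&gt; type and the &lt;code&gt;lerp_color&lt;/code&gt; helper are names I made up for illustration; the sketch also ignores alpha and color spaces, which real browser engines have to deal with.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;

/* A color with components in the range [0, 1] (hypothetical helper type). */
typedef struct { double r, g, b; } Color;

/* Linearly interpolate between two stop-colors:
 * t = 0 gives a, t = 1 gives b. */
static Color lerp_color(Color a, Color b, double t)
{
    Color c = {
        a.r + (b.r - a.r) * t,
        a.g + (b.g - a.g) * t,
        a.b + (b.b - a.b) * t,
    };
    return c;
}

int main(void)
{
    Color red = { 1.0, 0.0, 0.0 };
    Color yellow = { 1.0, 1.0, 0.0 };

    /* Halfway between the first two stops of the gradient above. */
    Color mid = lerp_color(red, yellow, 0.5);
    printf(&quot;%.2f %.2f %.2f\n&quot;, mid.r, mid.g, mid.b); /* prints &quot;1.00 0.50 0.00&quot; */
    return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;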

&lt;p&gt;In the case of a radial gradient, the transition is defined by a center and a radius. Colors expand evenly in all directions from the center of the circle outward.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-css&quot; data-lang=&quot;css&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;div&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;radial-gradient&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;300&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;300&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;border-radius&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;background&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;radial-gradient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;yellow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;lime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;aqua&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;magenta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/radial-gradient.png&quot; title=&quot;Radial gradient&quot; alt=&quot;Radial gradient&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Radial gradient&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;A conical gradient, although also defined by a center and a radius, isn’t the same as a radial gradient. In a conical gradient, colors sweep around the center of the circle instead of expanding outward from it.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-css&quot; data-lang=&quot;css&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;div&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;conic-gradient&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;300&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;300&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;border-radius&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;background&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;conic-gradient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;yellow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;lime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;aqua&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;magenta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/conic-gradient.png&quot; title=&quot;Conic gradient&quot; alt=&quot;Conic gradient&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Conic gradient&lt;/figcaption&gt;&lt;/figure&gt;
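
&lt;p&gt;One way to picture how colors &quot;spin around&quot; the center is to map every pixel to the fraction of a full turn it sits at, and use that fraction as the position along the list of stop-colors. The following is only a sketch of that idea, not how any browser actually paints the gradient. The &lt;code&gt;conic_position&lt;/code&gt; function is a hypothetical helper, and it assumes screen coordinates (y grows downward) with the gradient starting at the top and advancing clockwise.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

static const double TWO_PI = 6.283185307179586;

/* Fraction of a full turn, in [0, 1), for pixel (x, y) around the
 * center (cx, cy). Assumes y grows downward, 0 is at the top and
 * the angle grows clockwise. */
static double conic_position(double x, double y, double cx, double cy)
{
    double angle = atan2(x - cx, cy - y);
    if (angle &amp;lt; 0.0)
        angle += TWO_PI;
    return angle / TWO_PI;
}

int main(void)
{
    /* A 300x300 box, as in the CSS example above; center at (150, 150). */
    printf(&quot;%.2f\n&quot;, conic_position(150,   0, 150, 150)); /* top:    0.00 */
    printf(&quot;%.2f\n&quot;, conic_position(300, 150, 150, 150)); /* right:  0.25 */
    printf(&quot;%.2f\n&quot;, conic_position(150, 300, 150, 150)); /* bottom: 0.50 */
    printf(&quot;%.2f\n&quot;, conic_position(  0, 150, 150, 150)); /* left:   0.75 */
    return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;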

&lt;p&gt;&lt;strong&gt;Implementation in WebKitGTK and WPE&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the time of implementing support in WebKitGTK and WPE, the feature had already shipped in Safari. That meant WebKit already had support for parsing the conic-gradient specification as defined in &lt;a href=&quot;https://www.w3.org/TR/css-images-4/&quot;&gt;CSS Images Module Level 4&lt;/a&gt; and the data structures to store relevant information were already created. The only piece missing in WebKitGTK and WPE was painting.&lt;/p&gt;

&lt;p&gt;Safari relies on the &lt;a href=&quot;https://developer.apple.com/documentation/coregraphics&quot;&gt;CoreGraphics&lt;/a&gt; library for many of its painting operations, and CoreGraphics provides a primitive for conic gradient painting (&lt;code&gt;CGContextDrawConicGradient&lt;/code&gt;). Something similar happens in Google Chrome, although in this case the graphics library underneath is &lt;a href=&quot;https://skia.org/&quot;&gt;Skia&lt;/a&gt; (&lt;code&gt;CreateTwoPointConicalGradient&lt;/code&gt;). WebKitGTK and WPE use Cairo for many of their graphical operations. In the case of linear and radial gradients, there’s native support in Cairo. However, there isn’t a function for conical gradient painting. This doesn’t mean Cairo cannot be used to paint conical gradients; it just means that doing so is a little more complicated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mesh gradients&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Cairo documentation states that it is possible to paint a conical gradient using a mesh gradient. A mesh gradient is defined by a set of colors and control points. The most basic type of mesh gradient is a Gouraud-shaded triangle mesh.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_begin_patch&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_move_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_line_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;130&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;130&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_line_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;130&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;70&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_end_patch&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/goraud-patch-gradient.png&quot; title=&quot;Gouraud-shaded triangle mesh&quot; alt=&quot;Gouraud-shaded triangle mesh&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Gouraud-shaded triangle mesh&lt;/figcaption&gt;&lt;/figure&gt;
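
&lt;p&gt;The idea behind Gouraud shading is that every point inside the triangle gets a blend of the three corner colors, weighted by how close the point is to each corner. The sketch below computes that blend with barycentric weights for the same triangle and colors used above. It only illustrates the concept; I am not claiming this is how Cairo rasterizes the patch internally, and the &lt;code&gt;Point&lt;/code&gt;, &lt;code&gt;Color&lt;/code&gt;, &lt;code&gt;cross&lt;/code&gt; and &lt;code&gt;gouraud&lt;/code&gt; names are made up for the example.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;

typedef struct { double x, y; } Point;
typedef struct { double r, g, b; } Color;

/* Twice the signed area of the triangle (a, b, c). */
static double cross(Point a, Point b, Point c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Blend the three corner colors using the barycentric weights of p. */
static Color gouraud(Point p, const Point v[3], const Color col[3])
{
    double area = cross(v[0], v[1], v[2]);
    double w0 = cross(p, v[1], v[2]) / area;
    double w1 = cross(v[0], p, v[2]) / area;
    double w2 = 1.0 - w0 - w1;
    Color c = {
        w0 * col[0].r + w1 * col[1].r + w2 * col[2].r,
        w0 * col[0].g + w1 * col[1].g + w2 * col[2].g,
        w0 * col[0].b + w1 * col[1].b + w2 * col[2].b,
    };
    return c;
}

int main(void)
{
    /* Same corners and colors as the Cairo triangle patch above. */
    const Point v[3] = { { 100, 100 }, { 130, 130 }, { 130, 70 } };
    const Color col[3] = { { 1, 0, 0 }, { 0, 1, 0 }, { 0, 0, 1 } };

    /* The centroid is equally close to all three corners. */
    Point centroid = { 120, 100 };
    Color c = gouraud(centroid, v, col);
    printf(&quot;%.2f %.2f %.2f\n&quot;, c.r, c.g, c.b); /* prints &quot;0.33 0.33 0.33&quot; */
    return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;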

&lt;p&gt;A more sophisticated type of mesh gradient patch is a Coons patch. A Coons patch is a quadrilateral defined by 4 cubic Bézier curves and 4 colors, one for each vertex. A cubic Bézier curve is defined by 4 points, and each curve shares its endpoints with its neighbors, so a Coons patch has a total of 12 control points (and 4 colors).&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_begin_patch&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_move_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_curve_to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;69&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;173&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;-15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;115&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_curve_to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;66&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;174&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;47&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;148&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;104&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_curve_to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;65&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;58&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;70&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;69&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;103&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_curve_to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;43&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;63&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// red&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// green&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// blue&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// yellow&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_end_patch&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/coons-patch-gradient.png&quot; title=&quot;Coons patch gradient&quot; alt=&quot;Coons patch gradient&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Coons patch gradient&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;A Coons patch comes in very handy for painting a conical gradient. Consider the first quadrant of a circle: such a quadrant can be easily approximated with a Bézier curve.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_begin_patch&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_move_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_line_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_curve_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;133&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;133&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_line_to&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// red&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// green&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// blue&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// yellow&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_end_patch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/conic-gradient-patch-1.png&quot; title=&quot;Coons patch of the first quadrant of a circle&quot; alt=&quot;Coons patch of the first quadrant of a circle&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Coons patch of the first quadrant of a circle&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;If we use just two colors instead, the result looks much more like a conical gradient.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// red&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// red&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// yellow&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cairo_mesh_pattern_set_corner_color_rgb&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// yellow&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/conic-gradient-patch-2.png&quot; title=&quot;Coons patch of the first quadrant of a circle (2 colors)&quot; alt=&quot;Coons patch of the first quadrant of a circle (2 colors)&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Coons patch of the first quadrant of a circle (2 colors)&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;Repeat this step three more times, with a few more stop colors, and you have a nice conical gradient.&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/conic-gradient-full-example.png&quot; title=&quot;A conic gradient made by composing mesh patches&quot; alt=&quot;A conic gradient made by composing mesh patches&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;A conic gradient made by composing mesh patches&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;Bézier curves as arcs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this point the difficulty of painting a conical gradient has been reduced to calculating the shape of the Bézier curve of each mesh patch.&lt;/p&gt;

&lt;p&gt;Computing the start and end points is straightforward; calculating the position of the other two control points of the Bézier curve is considerably harder.&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2020/06/circle-sector-control-points.png&quot; title=&quot;Bézier curve approximation to a circle quadrant&quot; alt=&quot;Bézier curve approximation to a circle quadrant&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Bézier curve approximation to a circle quadrant&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;Mozillian Michiel Kamermans (&lt;a href=&quot;https://twitter.com/TheRealPomax&quot;&gt;pomax&lt;/a&gt;) has a beautifully written &lt;a href=&quot;https://pomax.github.io/bezierinfo/&quot;&gt;essay on Bézier curves&lt;/a&gt;. The section “Circles and cubic Bézier curves” of that essay discusses how to approximate a circular arc with a Bézier curve. The case of a circular quadrant is particularly interesting because it allows painting a circle with 4 Bézier curves with minimal error. For the quadrant above, the values for each point would be the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;S = (0, r), CP1 = (0.552 * r, r), CP2 = (r, 0.552 * r), E = (r, 0) 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even though in its most basic form a conic gradient is defined by one starting and one ending color, a single Bézier curve is not a good approximation of a semicircle, so painting the circle with only two curves gives poor results (check the interactive examples of pomax’s Bézier curve essay). In that case, the conic gradient is split into four Coons patches, with the middle colors interpolated.&lt;/p&gt;
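
&lt;p&gt;As a rough illustration of what “middle colors interpolated” means, here is a worked example for a gradient going from red to yellow. It assumes plain linear interpolation of the RGB components, which is an assumption made for illustration and not necessarily the exact computation WebKit performs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;color(t) = start + t * (end - start)    with start = red (1, 0, 0), end = yellow (1, 1, 0)

angle   0 (t = 0.00): (1, 0.00, 0)   start color
angle  90 (t = 0.25): (1, 0.25, 0)   interpolated
angle 180 (t = 0.50): (1, 0.50, 0)   interpolated
angle 270 (t = 0.75): (1, 0.75, 0)   interpolated
angle 360 (t = 1.00): (1, 1.00, 0)   end color
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each of the four patches would then use two consecutive rows as the colors of its starting and ending edges.&lt;/p&gt;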

&lt;p&gt;Also, in cases where there are more than 4 colors, each Coons patch will be smaller than a quadrant. A general formula is needed to compute the control points for each sector of the circle, given an angle and a radius. After some math, the following formula can be derived (check the section “Circles and cubic Bézier curves” in pomax’s essay):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;cp1 = {
   x: cx + (r * cos(angleStart)) - f * (r * sin(angleStart)),
   y: cy + (r * sin(angleStart)) + f * (r * cos(angleStart))
}
cp2 = {
   x: cx + (r * cos(angleEnd)) + f * (r * sin(angleEnd)),
   y: cy + (r * sin(angleEnd)) - f * (r * cos(angleEnd))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;f&lt;/code&gt; is a variable computed as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;f = 4 * tan((angleEnd - angleStart) / 4) / 3;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For a 90-degree angle the value of f is approximately 0.552. Thus, if the quadrant above had a radius of 100px, the control points would be CP1(155.2, 0) and CP2(200, 44.8) (taking the top-left corner as the point (0, 0)).&lt;/p&gt;
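
&lt;p&gt;To see where those numbers come from, the formulas can be evaluated step by step. The worked example below assumes a center of (100, 100) and angles measured with the y axis pointing down, so the top of the circle is at -π/2 and the right-hand side at 0. Those conventions are assumptions chosen to match the values quoted above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;cx = cy = 100, r = 100, angleStart = -π/2, angleEnd = 0

f   = 4 * tan((0 - (-π/2)) / 4) / 3
    = 4 * tan(π/8) / 3
    ≈ 4 * 0.41421 / 3 ≈ 0.5523

cp1 = (100 + 100 * cos(-π/2) - f * 100 * sin(-π/2),
       100 + 100 * sin(-π/2) + f * 100 * cos(-π/2))
    = (100 + 0 + 55.23, 100 - 100 + 0) ≈ (155.2, 0)

cp2 = (100 + 100 * cos(0) + f * 100 * sin(0),
       100 + 100 * sin(0) - f * 100 * cos(0))
    = (100 + 100 + 0, 100 + 0 - 55.23) ≈ (200, 44.8)
&lt;/code&gt;&lt;/pre&gt;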

&lt;p&gt;And that’s basically all that is needed. The formula above allows us to approximate the arc of a circular sector with a Bézier curve, which, when set up as a Coons patch, creates a section of a conical gradient. Adding several Coons patches together creates the final conical gradient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrapping up&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It has been a long time since conic gradients for the Web were first drafted. For instance, the current &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1175958&quot;&gt;bug in Firefox’s Bugzilla&lt;/a&gt; was created by Lea Verou five years ago. Fortunately, browsers have started shipping native support, and conical gradients have been available in Chrome and Safari for two years now. In this post I discussed the implementation, mainly rendering, of conic gradients in WebKitGTK and WPE. And since both browsers are WebKit based, they can leverage the implementation efforts led by Apple when bringing support for this feature to Safari. With Firefox shipping conic gradient support soon, this feature will be safe to use in the Web Platform.&lt;/p&gt;

</description>
        <pubDate>Thu, 11 Jun 2020 00:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2020/06/11/renderization-of-conic-gradients/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2020/06/11/renderization-of-conic-gradients/</guid>
        
        <category>igalia</category>
        
        <category>webkit</category>
        
        <category>wpe</category>
        
        <category>web-platform</category>
        
        
      </item>
    
      <item>
        <title>The eXpress Data Path</title>
        <description>
&lt;p&gt;In the &lt;a href=&quot;https://blogs.igalia.com/dpino/2019/01/07/introduction-to-xdp-and-ebpf/&quot;&gt;previous article&lt;/a&gt; I briefly introduced XDP (&lt;em&gt;eXpress Data Path&lt;/em&gt;) and eBPF, the multipurpose in-kernel virtual machine. On the XDP side, I focused only on the motivations behind this new technology: the reasons for rearchitecting the Linux kernel’s networking layer to enable faster packet processing. However, I didn’t go into much detail about how XDP works. In this new blog post I try to go deeper into XDP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XDP: A fast path for packet processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The design of XDP has its roots in a DDoS attack mitigation solution presented by Cloudflare at Netdev 1.1. Cloudflare relies heavily on &lt;code&gt;iptables&lt;/code&gt;, which according to their own metrics is able to handle 1 Mpps on a decent server (Source: &lt;a href=&quot;https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp-stack/&quot;&gt;Why we use the Linux kernel’s TCP stack&lt;/a&gt;). In the event of a DDoS attack, the amount of spoofed traffic can be up to 3 Mpps. Under those circumstances, a Linux box gets flooded with interrupts (IRQs) until it becomes unusable.&lt;/p&gt;

&lt;p&gt;Because Cloudflare wanted to keep the convenience of using &lt;code&gt;iptables&lt;/code&gt; (and the rest of the kernel’s network stack), they couldn’t go with a solution that takes full control of the hardware, such as DPDK. Their solution consisted of implementing what they called a &lt;em&gt;“partial kernel bypass”&lt;/em&gt;. Some queues of the NIC are still attached to the kernel while others are attached to a &lt;em&gt;user-space&lt;/em&gt; program that decides whether a packet should be dropped or not. By dropping packets at the lowest point of the stack, the amount of traffic that reaches the kernel’s networking subsystem is significantly reduced.&lt;/p&gt;

&lt;p&gt;Cloudflare’s solution used the Netmap toolkit to implement its partial kernel bypass (Source: &lt;a href=&quot;https://blog.cloudflare.com/single-rx-queue-kernel-bypass-with-netmap/&quot;&gt;Single Rx queue kernel bypass with Netmap&lt;/a&gt;). However, this idea could be generalized by adding a checkpoint in the Linux kernel network stack, preferably as soon as a packet is received in the NIC. This checkpoint should pass a packet to a &lt;em&gt;user-space&lt;/em&gt; program that will decide what to do with it: drop it or let it continue through the normal path.&lt;/p&gt;

&lt;p&gt;Luckily, Linux already features a mechanism that allows &lt;em&gt;user-space&lt;/em&gt; code execution within the kernel: the eBPF VM. So the solution seemed obvious.&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2019/01/linux-network-stack-with-xdp.png&quot; title=&quot;Linux network stack with XDP&quot; alt=&quot;Linux network stack with XDP&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Linux network stack with XDP&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;Packet operations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every network function, no matter how complex it is, consists of a series of basic operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Firewall&lt;/strong&gt;: read incoming packets, compare them to a table of rules and execute an action: &lt;em&gt;forward&lt;/em&gt; or &lt;em&gt;drop&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;NAT&lt;/strong&gt;: read incoming packets, modify headers and forward packet.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tunnelling&lt;/strong&gt;: read incoming packets, create a new packet, embed the original packet into the new one and forward it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;XDP passes packets to our eBPF program, which decides what to do with them. We can read them or modify them if we need to. We also have access to helper functions to parse packets, compute checksums, and so on, without paying the cost of a system call. And thanks to eBPF Maps we have access to complex data structures for persistent data storage, like tables. We are also able to decide what to do with a packet. Are we going to drop it? Forward it? To control a packet’s processing logic, XDP provides a set of predefined actions:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;XDP_PASS&lt;/strong&gt;: pass the packet to the normal network stack.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;XDP_DROP&lt;/strong&gt;: very fast drop.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;XDP_TX&lt;/strong&gt;: transmit the packet back out of the same interface it arrived on (TX bounce).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;XDP_REDIRECT&lt;/strong&gt;: redirects the packet to another NIC or CPU.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;XDP_ABORTED&lt;/strong&gt;: indicates eBPF program error.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;XDP_PASS&lt;/em&gt;, &lt;em&gt;XDP_TX&lt;/em&gt; and &lt;em&gt;XDP_REDIRECT&lt;/em&gt; are specific cases of a forwarding action, whereas &lt;em&gt;XDP_ABORTED&lt;/em&gt; is actually treated as a packet drop.&lt;/p&gt;

&lt;p&gt;Let’s take a look at one example that uses most of these elements to program a simple network function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: An IPv6 packet filter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The canonical example when introducing XDP is a DDoS filter. Such a network function drops packets if they come from a suspicious origin. In my case, I’m going with something even simpler: a function that filters out all traffic except IPv6.&lt;/p&gt;

&lt;p&gt;The advantage of this simpler function is that we don’t need to manage a list of suspicious addresses. Our program will simply examine the &lt;em&gt;ethertype&lt;/em&gt; value of a packet and either let it continue through the network stack or drop it, depending on whether it is an IPv6 packet or not.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SEC(&quot;prog&quot;)
int xdp_ipv6_filter_program(struct xdp_md *ctx)
{
    /* Start and end of the packet data, taken from the XDP context. */
    void *data_end = (void *)(long)ctx-&amp;gt;data_end;
    void *data     = (void *)(long)ctx-&amp;gt;data;
    struct ethhdr *eth = data;
    u16 eth_type = 0;

    /* parse_eth() writes the ethertype through a pointer. */
    if (!(parse_eth(eth, data_end, &amp;amp;eth_type))) {
        bpf_debug(&quot;Debug: Cannot parse L2\n&quot;);
        return XDP_PASS;
    }

    bpf_debug(&quot;Debug: eth_type:0x%x\n&quot;, ntohs(eth_type));
    /* eth_type is in network byte order; 0x86dd is the IPv6 ethertype. */
    if (eth_type == ntohs(0x86dd)) {
        return XDP_PASS;
    } else {
        return XDP_DROP;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The function &lt;code&gt;xdp_ipv6_filter_program&lt;/code&gt; is our main program. We define a new section in the binary called &lt;em&gt;prog&lt;/em&gt;. This serves as a hook between our program and XDP. Whenever XDP receives a packet, our code will be executed.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ctx&lt;/code&gt; represents a context, a &lt;code&gt;struct&lt;/code&gt; which contains all the data necessary to access a packet. Our program calls &lt;code&gt;parse_eth&lt;/code&gt; to fetch the &lt;code&gt;ethertype&lt;/code&gt;. It then checks whether its value is &lt;em&gt;0x86dd&lt;/em&gt; (the IPv6 ethertype); if so, the packet passes. Otherwise the packet is dropped. In addition, all the &lt;code&gt;ethertype&lt;/code&gt; values are printed for debugging purposes.&lt;/p&gt;
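
&lt;p&gt;The comparison is made against &lt;code&gt;ntohs(0x86dd)&lt;/code&gt; rather than plain &lt;code&gt;0x86dd&lt;/code&gt; because the ethertype field of the Ethernet header is stored in network byte order (big-endian). A small worked example, assuming a little-endian host such as x86-64:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;bytes on the wire:              0x86 0xdd
eth_type read as a host u16:    0xdd86
ntohs(0x86dd):                  0xdd86   (byte swap, so the comparison matches)
ntohs(eth_type):                0x86dd   (the value printed by bpf_debug)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On a big-endian host &lt;code&gt;ntohs&lt;/code&gt; leaves the value untouched and both sides of the comparison are simply 0x86dd.&lt;/p&gt;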

&lt;p&gt;&lt;code&gt;bpf_debug&lt;/code&gt; is in fact a macro defined as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#define bpf_debug(fmt, ...)                          \
    ({                                               \
        char ____fmt[] = fmt;                        \
        bpf_trace_printk(____fmt, sizeof(____fmt),   \
            ##__VA_ARGS__);                          \
    })
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It uses the function &lt;code&gt;bpf_trace_printk&lt;/code&gt; under the hood, a function which prints out messages in &lt;em&gt;/sys/kernel/debug/tracing/trace_pipe&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The function &lt;code&gt;parse_eth&lt;/code&gt; takes a packet’s beginning and end and parses its content.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;static __always_inline
bool parse_eth(struct ethhdr *eth, void *data_end, u16 *eth_type)
{
    u64 offset;

    offset = sizeof(*eth);
    if ((void *)eth + offset &amp;gt; data_end)
        return false;
    *eth_type = eth-&amp;gt;h_proto;
    return true;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running external code in the kernel involves certain risks. For instance, an infinite loop may freeze the kernel, or a program may access memory it should not be allowed to touch. To avoid these potential hazards a verifier is run when the eBPF code is loaded. The verifier walks all possible code paths, checking that our program doesn’t access out-of-range memory and that there are no out-of-bounds jumps. The verifier also ensures the program terminates in finite time.&lt;/p&gt;

&lt;p&gt;The snippets above make up our eBPF program. Now we just need to compile it (full source code is available at: &lt;a href=&quot;https://github.com/dpino/xdp_ipv6_filter&quot;&gt;xdp_ipv6_filter&lt;/a&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ make
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This generates &lt;code&gt;xdp_ipv6_filter.o&lt;/code&gt;, the eBPF object file.&lt;/p&gt;

&lt;p&gt;Now we’re going to load this object file into a network interface. There are two ways to do that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Write a &lt;em&gt;user-space&lt;/em&gt; program that loads the object file and attaches it to a network interface.&lt;/li&gt;
  &lt;li&gt;Use &lt;code&gt;iproute2&lt;/code&gt; to load the object file to an interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this example, I’m going to use the latter method.&lt;/p&gt;

&lt;p&gt;Currently there’s a limited number of network interfaces that support XDP (&lt;em&gt;ixgbe&lt;/em&gt;, &lt;em&gt;i40e&lt;/em&gt;, &lt;em&gt;mlx5&lt;/em&gt;, &lt;em&gt;veth&lt;/em&gt;, &lt;em&gt;tap&lt;/em&gt;, &lt;em&gt;tun&lt;/em&gt;, &lt;em&gt;virtio_net&lt;/em&gt; and others), although the list is growing. Some of these network interfaces support XDP at driver level. That means the XDP hook is implemented at the lowest point in the networking layer, right when the NIC receives a packet in the Rx ring. In other cases, the XDP hook is implemented at a higher point in the network stack. The former method offers better performance, although the latter makes XDP available for any network interface.&lt;/p&gt;

&lt;p&gt;Luckily, &lt;code&gt;veth&lt;/code&gt; interfaces are supported by XDP. I’m going to create a &lt;code&gt;veth&lt;/code&gt; pair and attach the eBPF program to one of its ends. Remember that &lt;code&gt;veth&lt;/code&gt; interfaces always come in pairs. A pair is like a virtual patch cable connecting two interfaces. Whatever is transmitted on one end arrives at the other end and vice versa.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo ip link add dev veth0 type veth peer name veth1
$ sudo ip link set up dev veth0
$ sudo ip link set up dev veth1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now I attach the eBPF program to &lt;code&gt;veth1&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo ip link set dev veth1 xdp object xdp_ipv6_filter.o
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You may have noticed I called the section for the eBPF program &lt;em&gt;“prog”&lt;/em&gt;. That’s the section name &lt;code&gt;iproute2&lt;/code&gt; looks for by default, so giving the section a different name will result in an error unless that name is passed explicitly with the &lt;code&gt;section&lt;/code&gt; option of &lt;code&gt;ip link&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If the program was successfully loaded I should see an &lt;code&gt;xdp&lt;/code&gt; flag in the &lt;code&gt;veth1&lt;/code&gt; interface:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo ip link sh veth1
8: veth1@veth0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 32:05:fc:9a:d8:75 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag bdb81fb6a5cf3154 jited
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To verify my program works as expected, I’m going to push a mix of IPv4 and IPv6 packets to &lt;code&gt;veth0&lt;/code&gt; (&lt;code&gt;ipv4-and-ipv6-data.pcap&lt;/code&gt;). My sample has a total of 20 packets (10 IPv4 and 10 IPv6). Before doing that though, I’m going to launch &lt;code&gt;tcpdump&lt;/code&gt; on &lt;code&gt;veth1&lt;/code&gt;, set up to exit after capturing 10 IPv6 packets.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo tcpdump &quot;ip6&quot; -i veth1 -w captured.pcap -c 10
tcpdump: listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Send packets to &lt;code&gt;veth0&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo tcpreplay -i veth0 ipv4-and-ipv6-data.pcap
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The filtered packets arrived at the other end. The &lt;code&gt;tcpdump&lt;/code&gt; program terminates since all the expected packets were received.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;10 packets captured
10 packets received by filter
0 packets dropped by kernel
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can also print out &lt;code&gt;/sys/kernel/debug/tracing/trace_pipe&lt;/code&gt; to check the ethertype values listed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo cat /sys/kernel/debug/tracing/trace_pipe
tcpreplay-4496  [003] ..s1 15472.046835: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046847: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046855: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046862: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046869: 0: Debug: eth_type:0x86dd
tcpreplay-4496  [003] ..s1 15472.046878: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046885: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046892: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046903: 0: Debug: eth_type:0x800
tcpreplay-4496  [003] ..s1 15472.046911: 0: Debug: eth_type:0x800
...
&lt;/code&gt;&lt;/pre&gt;
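
&lt;p&gt;Although not part of the walkthrough above, it can be handy to clean up once the experiment is over. A minimal sketch, assuming the same interface names used in this example: &lt;code&gt;xdp off&lt;/code&gt; detaches the eBPF program from the interface, and deleting one end of a &lt;code&gt;veth&lt;/code&gt; pair removes its peer as well.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo ip link set dev veth1 xdp off
$ sudo ip link del dev veth0
&lt;/code&gt;&lt;/pre&gt;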

&lt;p&gt;&lt;strong&gt;XDP: The future of in-kernel packet processing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;XDP started as a fast path for certain use cases, especially the ones which could result in an early packet drop (like a DDoS attack prevention solution). However, since a network function is nothing but a combination of basic primitives (reads, writes, forwarding, dropping…), all of them available via XDP/eBPF, it could be possible to use XDP for more than packet dropping. It could be used, in fact, to implement any network function.&lt;/p&gt;

&lt;p&gt;So what started as a fast path is gradually becoming the normal path. We’re now seeing how tools such as &lt;code&gt;iptables&lt;/code&gt; are getting rewritten in XDP/eBPF, keeping their user-level interfaces intact. The enormous performance gains of this new approach make the effort worth it. And since the hunger for more performance never ends, it seems reasonable to think that any other tool that can possibly be written in XDP/eBPF will follow a similar fate.&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2019/01/bpfilter.png&quot; title=&quot;iptables vs nftables vs bpfilter&quot; alt=&quot;iptables vs nftables vs bpfilter&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;iptables vs nftables vs bpfilter&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://cilium.io/blog/2018/04/17/why-is-the-kernel-community-replacing-iptables/&quot;&gt;Why is the kernel community replacing iptables with BPF?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article I took a closer look at XDP. I explained the motivations that led to its design. Through a simple example, I showed how XDP and eBPF work together to perform fast packet processing inside the kernel. XDP provides checkpoints within the kernel’s network stack. An eBPF program can hook to XDP events to perform an operation on a packet (modify its headers, drop it, forward it, etc.).&lt;/p&gt;

&lt;p&gt;XDP offers high-performance packet processing while maintaining interoperability with the rest of the networking subsystem, an advantage over full kernel bypass solutions. I didn’t go into much detail about the internals of XDP and how it interacts with other parts of the networking subsystem though. I encourage checking the first two links in the recommended readings section for a deeper understanding of XDP internals.&lt;/p&gt;

&lt;p&gt;In the next article, the last in the series, I will cover the new AF_XDP socket address family and the implementation of a Snabb bridge for this new interface.&lt;/p&gt;

&lt;p&gt;Recommended readings:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://raw.githubusercontent.com/tohojo/xdp-paper/master/xdp-the-express-data-path.pdf&quot;&gt;The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel&lt;/a&gt; by &lt;strong&gt;Høiland-Jørgensen&lt;/strong&gt;, &lt;strong&gt;Dangaard Brouer&lt;/strong&gt; and others. Paper published at CoNext ‘18 (December 4-7th, 2018). Possibly the best document published so far about XDP.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://vger.kernel.org/lpc_net2018_talks/presentation-lpc2018-xdp-future.pdf&quot;&gt;XDP - challenges and future work&lt;/a&gt; by &lt;strong&gt;Jesper Dangaard Brouer&lt;/strong&gt; &amp;amp; &lt;strong&gt;Toke Høiland-Jørgensen&lt;/strong&gt;. Talk at LPC (Linux Plumbers Conference) Networking Track. Status review of XDP and future plans. Complementary to the paper above.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://jvns.ca/blog/2017/04/07/xdp-bpf-tutorial/&quot;&gt;How to filter packets super fast: XDP &amp;amp; eBPF!&lt;/a&gt; by &lt;strong&gt;Julia Evans&lt;/strong&gt;. Another great tutorial from Julia. These are the notes of an eBPF/XDP tutorial given by Jesper D. Brouer and Magnus Karlsson at NetDev 2.2 2018 (Montreal).&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://developers.redhat.com/blog/2018/12/06/achieving-high-performance-low-latency-networking-with-xdp-part-1/&quot;&gt;Achieving high-performance, low-latency networking with XDP: Part I&lt;/a&gt; by &lt;strong&gt;Paolo Abeni&lt;/strong&gt;. It explains how to load an eBPF program directly with iproute2. If you liked it, check out the &lt;a href=&quot;https://developers.redhat.com/blog/2018/12/17/using-xdp-maps-rhel8/&quot;&gt;follow up&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Update (10/06/2020):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, I never wrote the next article in this series.&lt;/p&gt;

&lt;p&gt;Work on providing an XDP driver for Snabb was started at &lt;a href=&quot;https://github.com/snabbco/snabb/pull/1417&quot;&gt;Snabb#1417&lt;/a&gt; and a first implementation was available by the end of February 2019. The implementation relied heavily on some of the XDP examples available as part of the kernel sources. Performance wasn’t as great as I expected and soon I found myself focused on other things.&lt;/p&gt;

&lt;p&gt;Luckily, a few months later, &lt;a href=&quot;https://twitter.com/eugeneia_&quot;&gt;Max Rottenkolber&lt;/a&gt; took over this task. He implemented XDP support for Snabb from scratch, developing other interesting tools along the way and contributing significantly to the state of XDP in the Linux kernel. Max summarized all this work in a great blog post, which can be considered the successor in spirit to this series. Check it out here: &lt;a href=&quot;https://mr.gy/blog/snabb-xdp.html&quot;&gt;How to XDP (with Snabb)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I also recommend checking out &lt;a href=&quot;https://fosdem.org/2020/schedule/event/vita_high_speed_traffic_encryption_on_x86_64/&quot;&gt;Max’s talk at FOSDEM 2020&lt;/a&gt; about his work on Vita, a high-performance site-to-site VPN gateway based on Snabb. XDP is also covered in his talk, although briefly.&lt;/p&gt;

</description>
        <pubDate>Thu, 10 Jan 2019 10:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2019/01/10/the-express-data-path/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2019/01/10/the-express-data-path/</guid>
        
        <category>networking</category>
        
        <category>igalia</category>
        
        
      </item>
    
      <item>
        <title>A brief introduction to XDP and eBPF</title>
        <description>
&lt;p&gt;In a &lt;a href=&quot;https://blogs.igalia.com/dpino/2019/01/02/build-a-kernel/&quot;&gt;previous post&lt;/a&gt; I explained how to build a kernel with XDP (&lt;em&gt;eXpress Data Path&lt;/em&gt;) support. That feature must be enabled before XDP can be used. XDP is a new Linux kernel component that greatly improves packet processing performance.&lt;/p&gt;

&lt;p&gt;In recent years, we have seen a rise in programming toolkits and techniques that overcome the limitations of the Linux kernel when it comes to high-performance packet processing. One of the most popular techniques is &lt;strong&gt;kernel bypass&lt;/strong&gt;, which means skipping the kernel’s networking layer and doing all packet processing in &lt;em&gt;user-space&lt;/em&gt;. Kernel bypass also involves managing the NIC from &lt;em&gt;user-space&lt;/em&gt;, in other words, relying on a &lt;em&gt;user-space&lt;/em&gt; driver to handle the NIC.&lt;/p&gt;

&lt;p&gt;By giving full control of the NIC to a &lt;em&gt;user-space&lt;/em&gt; program, we reduce the overhead introduced by the kernel (context switching, networking layer processing, interrupts, etc.), which becomes significant when working at speeds of 10Gbps or higher. Kernel bypass, combined with other features (&lt;em&gt;batch packet processing&lt;/em&gt;) and performance tuning adjustments (&lt;em&gt;NUMA awareness&lt;/em&gt;, &lt;em&gt;CPU isolation&lt;/em&gt;, etc.), forms the basis of high-performance &lt;em&gt;user-space&lt;/em&gt; networking. Perhaps the poster child of this new approach to packet processing is Intel’s &lt;a href=&quot;https://www.dpdk.org/&quot;&gt;DPDK&lt;/a&gt; (&lt;em&gt;Data Plane Development Kit&lt;/em&gt;), although other well-known toolkits and techniques include Cisco’s &lt;a href=&quot;https://fd.io/technology/&quot;&gt;VPP&lt;/a&gt; (&lt;em&gt;Vector Packet Processing&lt;/em&gt;), Netmap and of course &lt;a href=&quot;https://github.com/igalia/snabb&quot;&gt;Snabb&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The disadvantages of &lt;em&gt;user-space&lt;/em&gt; networking are several:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;An OS’s kernel is an abstraction layer for hardware resources. Since &lt;em&gt;user-space&lt;/em&gt; programs need to manage their resources directly, they also need to manage their hardware. That often means writing their own drivers.&lt;/li&gt;
  &lt;li&gt;As the &lt;em&gt;kernel-space&lt;/em&gt; is completely skipped, all the networking functionality provided by the kernel is skipped too. &lt;em&gt;User-space&lt;/em&gt; programs need to reimplement functionality that might already be provided by the kernel or the OS.&lt;/li&gt;
  &lt;li&gt;Programs work as sandboxes, which severely limits their ability to interact, and be integrated, with other parts of the OS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Essentially, &lt;em&gt;user-space&lt;/em&gt; networking achieves high-speed performance by moving packet-processing out of the kernel’s realm into &lt;em&gt;user-space&lt;/em&gt;. XDP in fact does the opposite: it moves &lt;em&gt;user-space&lt;/em&gt; networking programs (filters, mappers, routing, etc.) into the kernel’s realm. XDP allows us to execute our network function as soon as a packet hits the NIC, and before it starts moving upwards into the kernel’s networking subsystem, which results in a significant increase in packet-processing speed. But how does the kernel make it possible for a user to execute their programs within the kernel’s realm? Before answering this question we need to take a look at BPF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BPF and eBPF&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Despite its somewhat misleading name, BPF (Berkeley Packet Filter) is in fact a virtual machine model. This VM was originally designed for packet filtering, hence its name.&lt;/p&gt;

&lt;p&gt;One of the most prominent users of BPF is the tool &lt;code&gt;tcpdump&lt;/code&gt;. When capturing packets with &lt;code&gt;tcpdump&lt;/code&gt;, a user can define a packet-filtering expression. Only packets that match that expression will actually be captured. For instance, the expression &lt;em&gt;“tcp dst port 80”&lt;/em&gt; captures all TCP packets whose destination port equals 80. A compiler can reduce this expression to BPF bytecode.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo tcpdump -d &quot;tcp dst port 80&quot;
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 6
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 15
(004) ldh      [56]
(005) jeq      #0x50            jt 14   jf 15
(006) jeq      #0x800           jt 7    jf 15
(007) ldb      [23]
(008) jeq      #0x6             jt 9    jf 15
(009) ldh      [20]
(010) jset     #0x1fff          jt 15   jf 11
(011) ldxb     4*([14]&amp;amp;0xf)
(012) ldh      [x + 16]
(013) jeq      #0x50            jt 14   jf 15
(014) ret      #262144
(015) ret      #0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Basically what the program above does is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Instruction (000)&lt;/strong&gt;: loads the packet’s offset 12, as a 16-bit word, into the accumulator. Offset 12 represents a packet’s ethertype.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Instruction (001)&lt;/strong&gt;: compares the value of the accumulator to 0x86dd, which is the ethertype value for IPv6. If the result is true, the program counter jumps to instruction (002), if not it jumps to (006).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Instruction (006)&lt;/strong&gt;: compares the value to 0x800 (ethertype value of IPv4). If true jump to (007), if not (015).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And so forth, until the packet-filtering program returns a result. This result is generally a boolean. Returning a non-zero value (instruction (014)) means the packet matched, whereas returning a zero value (instruction (015)) means the packet didn’t match.&lt;/p&gt;

&lt;p&gt;The BPF VM and its bytecode were introduced by Steve McCanne and Van Jacobson in late 1992, in their paper &lt;a href=&quot;http://www.tcpdump.org/papers/bpf-usenix93.pdf&quot;&gt;The BSD Packet Filter: A New Architecture for User-level Packet Capture&lt;/a&gt;, which was first presented at the Winter ‘93 Usenix Conference.&lt;/p&gt;

&lt;p&gt;Since BPF is a VM, it defines an environment where programs are executed. Besides a bytecode, it also defines a packet-based memory model (load instructions are implicitly done on the packet being processed), registers (A and X; &lt;em&gt;Accumulator&lt;/em&gt; and &lt;em&gt;Index register&lt;/em&gt;), a scratch memory store and an implicit &lt;em&gt;Program Counter&lt;/em&gt;. Interestingly, BPF’s bytecode was modeled after the MOS Technology 6502 ISA. As Steve McCanne recalls in his &lt;a href=&quot;https://sharkfestus.wireshark.org/sharkfest.11/presentations/McCanne-Sharkfest'11_Keynote_Address.pdf&quot;&gt;Sharkfest ‘11 keynote&lt;/a&gt;, he was familiar with 6502 assembly from his junior high-school days programming on an Apple II, and that influenced him when he designed the BPF bytecode.&lt;/p&gt;
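
&lt;p&gt;To make the VM model more concrete, here is a minimal sketch of an interpreter for the handful of instructions used in the &lt;code&gt;tcpdump -d&lt;/code&gt; listing above. It is not the kernel’s implementation: the opcode names, the &lt;code&gt;struct insn&lt;/code&gt; layout and the &lt;code&gt;run_filter&lt;/code&gt; function are assumptions made for illustration, and jump targets are stored as absolute instruction indexes (the way &lt;code&gt;tcpdump -d&lt;/code&gt; prints them) rather than as relative offsets.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Opcode names are made up for this sketch. Real cBPF packs class, size
 * and mode into a 16-bit code and uses relative jump offsets. */
enum op { LDH_ABS, LDB_ABS, LDXB_MSH, LDH_IND, JEQ, JSET, RET };

struct insn {
    enum op op;
    uint32_t k;     /* packet offset, comparison value or return value */
    uint8_t jt, jf; /* absolute jump targets, as printed by tcpdump -d */
};

/* The sixteen instructions of the &quot;tcp dst port 80&quot; listing. */
const struct insn tcp_dst_port_80[16] = {
    { LDH_ABS, 12, 0, 0 },      /* (000) ethertype */
    { JEQ, 0x86dd, 2, 6 },      /* (001) IPv6? */
    { LDB_ABS, 20, 0, 0 },      /* (002) IPv6 next header */
    { JEQ, 0x6, 4, 15 },        /* (003) TCP? */
    { LDH_ABS, 56, 0, 0 },      /* (004) destination port (IPv6) */
    { JEQ, 0x50, 14, 15 },      /* (005) port 80? */
    { JEQ, 0x800, 7, 15 },      /* (006) IPv4? */
    { LDB_ABS, 23, 0, 0 },      /* (007) IPv4 protocol */
    { JEQ, 0x6, 9, 15 },        /* (008) TCP? */
    { LDH_ABS, 20, 0, 0 },      /* (009) flags and fragment offset */
    { JSET, 0x1fff, 15, 11 },   /* (010) reject non-first fragments */
    { LDXB_MSH, 14, 0, 0 },     /* (011) X = IPv4 header length */
    { LDH_IND, 16, 0, 0 },      /* (012) destination port (IPv4) */
    { JEQ, 0x50, 14, 15 },      /* (013) port 80? */
    { RET, 262144, 0, 0 },      /* (014) match */
    { RET, 0, 0, 0 },           /* (015) no match */
};

/* Runs prog over pkt and returns the filter's verdict (0 means no match).
 * A load beyond the end of the packet rejects the packet. */
uint32_t run_filter(const struct insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t A = 0, X = 0; /* accumulator and index register */
    size_t pc = 0;         /* program counter */

    assert(prog != NULL);
    assert(pkt != NULL);
    while (pc &amp;lt; n) {
        struct insn i = prog[pc++];
        size_t off = i.k;

        switch (i.op) {
        case LDH_IND:
            off += X;
            /* fall through */
        case LDH_ABS:
            if (off + 2 &amp;gt; len)
                return 0;
            A = (uint32_t)pkt[off] &amp;lt;&amp;lt; 8 | pkt[off + 1];
            break;
        case LDB_ABS:
            if (off + 1 &amp;gt; len)
                return 0;
            A = pkt[off];
            break;
        case LDXB_MSH:
            if (off + 1 &amp;gt; len)
                return 0;
            X = 4 * (pkt[off] &amp;amp; 0xf);
            break;
        case JEQ:
            pc = (A == i.k) ? i.jt : i.jf;
            break;
        case JSET:
            pc = (A &amp;amp; i.k) ? i.jt : i.jf;
            break;
        case RET:
            return i.k;
        }
    }
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Calling &lt;code&gt;run_filter(tcp_dst_port_80, 16, pkt, len)&lt;/code&gt; on an unfragmented Ethernet/IPv4/TCP frame whose destination port is 80 walks through instructions (000), (001) and (006) to (014) and returns 262144. Any other destination port ends at instruction (015) and returns 0.&lt;/p&gt;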

&lt;p&gt;The Linux kernel has featured BPF support since v2.5, mainly added by Jay Schulist. There were no major changes in the BPF code until 2011, when Eric Dumazet turned the BPF interpreter into a JIT (Source: &lt;a href=&quot;https://lwn.net/Articles/437981/&quot;&gt;A JIT for packet filters&lt;/a&gt;). Instead of interpreting BPF bytecode, the kernel was now able to translate BPF programs directly to a target architecture: x86, ARM, MIPS, etc.&lt;/p&gt;

&lt;p&gt;Later on, in 2014, Alexei Starovoitov introduced a new BPF JIT. This new JIT was actually a new architecture based on BPF, known as eBPF. Both VMs co-existed for some time I think, but nowadays packet-filtering is implemented on top of eBPF. In fact, a lot of documentation refers now to eBPF as BPF, and the classic BPF is known as cBPF.&lt;/p&gt;

&lt;p&gt;eBPF extends the classic BPF virtual machine in several ways:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Takes advantage of modern 64-bit architectures&lt;/strong&gt;. eBPF uses 64-bit registers and increases the number of available registers from 2 (Accumulator and X register) to 10. eBPF also extends the number of opcodes (&lt;em&gt;BPF_MOV&lt;/em&gt;, &lt;em&gt;BPF_JNE&lt;/em&gt;, &lt;em&gt;BPF_CALL&lt;/em&gt;…).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Decoupled from the networking subsystem&lt;/strong&gt;. BPF was bound to a packet-based data model. Since it was used for packet filtering, its code lived within the networking subsystem. However, the eBPF VM is no longer bound to a data model and it can be used for any purpose. It’s now possible to attach an eBPF program to a tracepoint or to a kprobe. This opens the door to using eBPF for instrumentation, performance analysis and many more uses within other kernel subsystems. The eBPF code now lives in its own path: &lt;a href=&quot;https://github.com/torvalds/linux/tree/master/kernel/bpf&quot;&gt;kernel/bpf&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Global data stores called &lt;strong&gt;Maps&lt;/strong&gt;. Maps are key-value stores that allow the interchange of data between &lt;em&gt;user-space&lt;/em&gt; and &lt;em&gt;kernel-space&lt;/em&gt;. eBPF provides several types of Maps.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Helper functions&lt;/strong&gt;. Such as packet rewrite, checksum calculation or packet cloning. Unlike &lt;em&gt;user-space&lt;/em&gt; programming, these functions get executed inside the kernel. eBPF programs cannot invoke system calls or arbitrary kernel functions; they reach kernel functionality through this fixed set of helpers.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tail-calls&lt;/strong&gt;. eBPF programs are limited to 4096 instructions. The tail-call feature allows an eBPF program to pass control to a new eBPF program, overcoming this limitation (up to 32 programs can be chained).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;eBPF: an example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Linux kernel sources include several eBPF examples. They’re available at &lt;a href=&quot;https://github.com/torvalds/linux/tree/v4.19/samples/bpf&quot;&gt;samples/bpf/&lt;/a&gt;. To compile these examples simply type:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo make samples/bpf/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Instead of coding a new eBPF example, I’m going to reuse one of the samples available in &lt;code&gt;samples/bpf/&lt;/code&gt;. I will go through some parts of the code and explain how it works. The example I chose was the &lt;code&gt;tracex4&lt;/code&gt; program.&lt;/p&gt;

&lt;p&gt;Generally, all the examples at &lt;code&gt;samples/bpf/&lt;/code&gt; consist of 2 files. In this case:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://raw.githubusercontent.com/torvalds/linux/v4.19/samples/bpf/tracex4_kern.c&quot;&gt;tracex4_kern.c&lt;/a&gt;, contains the source code to be executed in the kernel as eBPF bytecode.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://raw.githubusercontent.com/torvalds/linux/v4.19/samples/bpf/tracex4_user.c&quot;&gt;tracex4_user.c&lt;/a&gt;, contains the &lt;em&gt;user-space&lt;/em&gt; program.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We then need to compile &lt;code&gt;tracex4_kern.c&lt;/code&gt; to eBPF bytecode. At the time of writing, &lt;code&gt;gcc&lt;/code&gt; lacks a backend for eBPF. Luckily, &lt;code&gt;clang&lt;/code&gt; can emit eBPF bytecode. The &lt;a href=&quot;https://github.com/torvalds/linux/blob/v4.19/samples/bpf/Makefile#L266&quot;&gt;Makefile&lt;/a&gt; uses &lt;code&gt;clang&lt;/code&gt; to compile &lt;code&gt;tracex4_kern.c&lt;/code&gt; into an object file.&lt;/p&gt;

&lt;p&gt;I mentioned earlier that one of the most interesting features of eBPF is Maps. Maps are key/value stores that allow exchanging data between &lt;em&gt;user-space&lt;/em&gt; and &lt;em&gt;kernel-space&lt;/em&gt; programs. &lt;code&gt;tracex4_kern&lt;/code&gt; defines one map:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;struct pair {
    u64 val;
    u64 ip;
};  

struct bpf_map_def SEC(&quot;maps&quot;) my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(long),
    .value_size = sizeof(struct pair),
    .max_entries = 1000000,
};
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;BPF_MAP_TYPE_HASH is one of the many Map types offered by eBPF. In this case, it’s simply a hash. You may also have noticed the &lt;code&gt;SEC(&quot;maps&quot;)&lt;/code&gt; declaration. SEC is a macro used to create a new section in the binary. Actually the &lt;code&gt;tracex4_kern&lt;/code&gt; example defines two more sections:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SEC(&quot;kprobe/kmem_cache_free&quot;)
int bpf_prog1(struct pt_regs *ctx)
{   
    long ptr = PT_REGS_PARM2(ctx);

    bpf_map_delete_elem(&amp;amp;my_map, &amp;amp;ptr); 
    return 0;
}
    
SEC(&quot;kretprobe/kmem_cache_alloc_node&quot;) 
int bpf_prog2(struct pt_regs *ctx)
{
    long ptr = PT_REGS_RC(ctx);
    long ip = 0;

    // get ip address of kmem_cache_alloc_node() caller
    BPF_KRETPROBE_READ_RET_IP(ip, ctx);

    struct pair v = {
        .val = bpf_ktime_get_ns(),
        .ip = ip,
    };
    
    bpf_map_update_elem(&amp;amp;my_map, &amp;amp;ptr, &amp;amp;v, BPF_ANY);
    return 0;
}   
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These two functions will allow us to delete an entry from a map (&lt;em&gt;kprobe/kmem_cache_free&lt;/em&gt;) and to add a new entry to a map (&lt;em&gt;kretprobe/kmem_cache_alloc_node&lt;/em&gt;). All the function calls in capital letters are actually macros defined at &lt;a href=&quot;https://raw.githubusercontent.com/torvalds/linux/v4.19/tools/testing/selftests/bpf/bpf_helpers.h&quot;&gt;bpf_helpers.h&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If I dump the sections of the object file, I should be able to see these new sections defined:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ objdump -h tracex4_kern.o

tracex4_kern.o:     file format elf64-little

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 kprobe/kmem_cache_free 00000048  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 kretprobe/kmem_cache_alloc_node 000000c0  0000000000000000  0000000000000000  00000088  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 maps          0000001c  0000000000000000  0000000000000000  00000148  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  4 license       00000004  0000000000000000  0000000000000000  00000164  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  5 version       00000004  0000000000000000  0000000000000000  00000168  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  6 .eh_frame     00000050  0000000000000000  0000000000000000  00000170  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then there is &lt;a href=&quot;https://raw.githubusercontent.com/torvalds/linux/v4.19/samples/bpf/tracex4_user.c&quot;&gt;tracex4_user.c&lt;/a&gt;, the main program. Basically, the program listens to &lt;code&gt;kmem_cache_alloc_node&lt;/code&gt; events. When that event happens, the corresponding eBPF code is executed. The code stores a timestamp and the caller’s instruction pointer (IP) in a map, keyed by the address of the allocated object, and the main program prints the map’s contents in a loop. Example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo ./tracex4
obj 0xffff8d6430f60a00 is  2sec old was allocated at ip ffffffff9891ad90
obj 0xffff8d6062ca5e00 is 23sec old was allocated at ip ffffffff98090e8f
obj 0xffff8d5f80161780 is  6sec old was allocated at ip ffffffff98090e8f
&lt;/code&gt;&lt;/pre&gt;
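
&lt;p&gt;The interplay between the two probes and the printing loop can be sketched in plain C, with no kernel involved. This is only an illustration of the data flow, not the sample’s code: the fixed-size &lt;code&gt;table&lt;/code&gt; stands in for &lt;code&gt;my_map&lt;/code&gt;, and &lt;code&gt;on_alloc&lt;/code&gt;, &lt;code&gt;on_free&lt;/code&gt; and &lt;code&gt;report_old_objects&lt;/code&gt; are hypothetical names that play the roles of &lt;code&gt;bpf_prog2&lt;/code&gt;, &lt;code&gt;bpf_prog1&lt;/code&gt; and the &lt;em&gt;user-space&lt;/em&gt; loop. The one-second threshold is an assumption based on the output above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

#define MAX_ENTRIES 8
#define NSEC_PER_SEC 1000000000ULL

struct pair {
    uint64_t val; /* allocation timestamp, in nanoseconds */
    uint64_t ip;  /* instruction pointer of the caller */
};

/* Stand-in for my_map: a tiny table keyed by object address. */
struct entry {
    uint64_t key;
    struct pair value;
    int used;
};
struct entry table[MAX_ENTRIES];

/* Plays the role of bpf_prog2: record when and where an object was
 * allocated. Returns 0 on success, -1 if the table is full. */
int on_alloc(uint64_t ptr, uint64_t now_ns, uint64_t ip)
{
    int slot = -1;

    for (int i = 0; i &amp;lt; MAX_ENTRIES; i++) {
        if (table[i].used &amp;amp;&amp;amp; table[i].key == ptr) {
            slot = i; /* overwrite the existing entry */
            break;
        }
        if (!table[i].used &amp;amp;&amp;amp; slot &amp;lt; 0)
            slot = i; /* remember the first free slot */
    }
    if (slot &amp;lt; 0)
        return -1;
    table[slot].key = ptr;
    table[slot].value.val = now_ns;
    table[slot].value.ip = ip;
    table[slot].used = 1;
    return 0;
}

/* Plays the role of bpf_prog1: forget an object once it is freed. */
void on_free(uint64_t ptr)
{
    for (int i = 0; i &amp;lt; MAX_ENTRIES; i++) {
        if (table[i].used &amp;amp;&amp;amp; table[i].key == ptr)
            table[i].used = 0;
    }
}

/* Plays the role of the user-space loop: print every object that has been
 * alive for at least one second. Returns the number of lines printed. */
int report_old_objects(uint64_t now_ns)
{
    int printed = 0;

    for (int i = 0; i &amp;lt; MAX_ENTRIES; i++) {
        uint64_t age;

        if (!table[i].used)
            continue;
        age = now_ns - table[i].value.val;
        if (age &amp;lt; NSEC_PER_SEC)
            continue;
        printf(&quot;obj 0x%llx is %2llusec old was allocated at ip %llx\n&quot;,
               (unsigned long long)table[i].key,
               (unsigned long long)(age / NSEC_PER_SEC),
               (unsigned long long)table[i].value.ip);
        printed++;
    }
    return printed;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the real sample, the table lives in the kernel and both sides reach it through the map API. The structure is the same though: one probe inserts, the other deletes, and the main program only reads.&lt;/p&gt;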

&lt;p&gt;How are the &lt;em&gt;user-space&lt;/em&gt; program and the eBPF program connected? On initialization, &lt;code&gt;tracex4_user.c&lt;/code&gt; loads the &lt;code&gt;tracex4_kern.o&lt;/code&gt; object file using the &lt;code&gt;load_bpf_file&lt;/code&gt; function.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;int main(int ac, char **argv)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    char filename[256];
    int i;

    snprintf(filename, sizeof(filename), &quot;%s_kern.o&quot;, argv[0]);

    if (setrlimit(RLIMIT_MEMLOCK, &amp;amp;r)) {
        perror(&quot;setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)&quot;);
        return 1;
    }

    if (load_bpf_file(filename)) {
        printf(&quot;%s&quot;, bpf_log_buf);
        return 1;
    }

    for (i = 0; ; i++) {
        print_old_objects(map_fd[1]);
        sleep(1);
    }

    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When &lt;a href=&quot;https://github.com/torvalds/linux/blob/v4.19/samples/bpf/bpf_load.c#L492&quot;&gt;load_bpf_file&lt;/a&gt; is executed, the probes defined in the eBPF file are added to &lt;code&gt;/sys/kernel/debug/tracing/kprobe_events&lt;/code&gt;. We’re listening now to those events and our program can do something when they happen.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/kmem_cache_free kmem_cache_free
r:kprobes/kmem_cache_alloc_node kmem_cache_alloc_node
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All the other programs in &lt;code&gt;samples/bpf/&lt;/code&gt; follow a similar structure. There are always two files:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;XXX_kern.c&lt;/em&gt;: the eBPF program.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;XXX_user.c&lt;/em&gt;: the main program.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The eBPF program defines Maps and functions hooked to a binary section. When the kernel emits a certain type of event (a tracepoint, for instance) our hooks will be executed. Maps are used to exchange data between the kernel program and the &lt;em&gt;user-space&lt;/em&gt; program.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrapping up&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article I have covered BPF and eBPF from a high-level view. I’m aware there are a lot of resources and information nowadays about eBPF, but I felt I needed to explain it in my own words. Please check out the list of recommended readings for further information.&lt;/p&gt;

&lt;p&gt;In the next article I will cover XDP and its relation with eBPF.&lt;/p&gt;

&lt;p&gt;Recommended readings:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lwn.net/Articles/599755/&quot;&gt;BPF: the universal in-kernel virtual machine&lt;/a&gt; by Jonathan Corbet. An introduction to BPF and its evolution towards eBPF.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lwn.net/Articles/740157/&quot;&gt;A thorough introduction to eBPF&lt;/a&gt; by Brendan Gregg. Article by LWN.net. Brendan tweets often about eBPF and maintains a list of resources in his &lt;a href=&quot;http://www.brendangregg.com/&quot;&gt;personal blog&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://jvns.ca/blog/2017/06/28/notes-on-bpf---ebpf/&quot;&gt;Notes on BPF &amp;amp; eBPF&lt;/a&gt; by Julia Evans. Notes on Suchakra Sharma’s presentation “The BSD Packet Filter: A New Architecture for User-level Packet Capture”. The notes are of good quality and really helpful to digest the slides.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ferrisellis.com/posts/ebpf_past_present_future/&quot;&gt;eBPF, part1: Past, Present and Future&lt;/a&gt; by Ferris Ellis. A long read, with a &lt;a href=&quot;https://ferrisellis.com/posts/ebpf_syscall_and_maps/&quot;&gt;follow-up&lt;/a&gt;, but well worth the time. One of the best articles I’ve read so far about eBPF.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 07 Jan 2019 08:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2019/01/07/a-brief-introduction-to-xdp-and-ebpf/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2019/01/07/a-brief-introduction-to-xdp-and-ebpf/</guid>
        
        <category>networking</category>
        
        <category>igalia</category>
        
        
      </item>
    
      <item>
        <title>How to build a kernel with XDP support</title>
        <description>
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt; (2019/01/10): This post explains how to build a kernel with AF_XDP support (rather than XDP support). XDP support in the kernel is made available through &lt;em&gt;CONFIG_BPF_SYSCALL&lt;/em&gt;. XDP has shipped in the kernel since v4.8, and it’s usually enabled by default. AF_XDP is a new socket address family that relies on XDP to pass a packet from &lt;em&gt;kernel-space&lt;/em&gt; directly to &lt;em&gt;user-space&lt;/em&gt; using zero-copy. Thanks to &lt;a href=&quot;https://twitter.com/qeole/status/1083329706827661313&quot;&gt;Quentin Monnet&lt;/a&gt; for the correction.&lt;/p&gt;

&lt;p&gt;This post is the first one of a series about XDP (&lt;em&gt;eXpress Data Path&lt;/em&gt;), the brand-new kernel component for doing fast packet processing.&lt;/p&gt;

&lt;p&gt;Lately I’ve been on a quest to add XDP support to Snabb. This work was actually started by one of our &lt;a href=&quot;https://www.igalia.com/about-us/coding-experience&quot;&gt;Coding Experience&lt;/a&gt; students, &lt;a href=&quot;https://github.com/djkonro&quot;&gt;Konrad Djimeli&lt;/a&gt;. Unfortunately Konrad had to leave before completing his Coding Experience, so I picked up his work and finished it. The posts ahead are an attempt to summarize my findings about XDP. In this one, I explain how to get a kernel ready for starting with XDP.&lt;/p&gt;

&lt;p&gt;Building a kernel with XDP support is simply a matter of enabling that feature in the kernel’s &lt;code&gt;.config&lt;/code&gt; file (&lt;code&gt;CONFIG_XDP_SOCKETS=y&lt;/code&gt;). If you’re familiar with building kernels and installing them, you’re done. Besides enabling this option, there’s nothing else you need to do.&lt;/p&gt;

&lt;p&gt;However, if you’re not familiar with the process of building a kernel, a recap might be handy. In my case, it had been ages since the last time I built a kernel. Here are some instructions:&lt;/p&gt;

&lt;p&gt;First things first, fetch the Linux kernel source code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ git clone https://github.com/torvalds/linux.git 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can choose whether to build &lt;em&gt;master&lt;/em&gt; or any of the latest kernel releases. In my case, I went with v4.19:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ git checkout v4.19 -b v4.19
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now you should run &lt;code&gt;make menuconfig&lt;/code&gt; and pick the features you’d like to build in your kernel. Once you’re done, a configuration file with the selected options will be written in &lt;code&gt;.config&lt;/code&gt;. However, there’s an easier way if what you really need is a &lt;code&gt;.config&lt;/code&gt; file to start with. Simply copy the configuration file of some other kernel in your system (each kernel in &lt;code&gt;/boot&lt;/code&gt; has its corresponding &lt;code&gt;.config&lt;/code&gt; file).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ uname -r
4.15.0-43-generic
$ cp /boot/config-4.15.0-43-generic ./.config
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Edit &lt;code&gt;.config&lt;/code&gt; and add XDP support:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CONFIG_XDP_SOCKETS=y
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And we start building now:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo make -j4 &amp;amp;&amp;amp; sudo make modules_install INSTALL_MOD_STRIP=1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That’s very standard, except for &lt;em&gt;INSTALL_MOD_STRIP&lt;/em&gt;. What’s that for? Once the kernel is built, in some cases the &lt;code&gt;initrd&lt;/code&gt; image is so big it doesn’t fit in &lt;code&gt;/boot&lt;/code&gt;. If you run into that, strip the debugging symbols from the kernel modules to make them smaller. That’s what &lt;em&gt;INSTALL_MOD_STRIP&lt;/em&gt; does. The size of &lt;code&gt;initrd&lt;/code&gt; will be considerably reduced, so it will likely fit in &lt;code&gt;/boot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Lastly, run &lt;code&gt;make install&lt;/code&gt; to actually install your kernel. That command also updates Grub, so no need to run it manually.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo make install
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Optionally, run &lt;code&gt;make headers_install&lt;/code&gt; if you plan to use the kernel’s header files from a user-space program.&lt;/p&gt;

&lt;p&gt;Regarding the brand new kernel, it’s recommended to not make it your default kernel until you’ve verified it works. Once you’ve checked that, make it the default by following these steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;List Grub’s menu entries by running the following command: &lt;code&gt;grep -i &quot;menuentry_id_option&quot; /boot/grub/grub.cfg&lt;/code&gt;. The literal after &lt;code&gt;menuentry_id_option&lt;/code&gt; is the kernel’s identifier. For instance, &lt;code&gt;gnulinux-4.19.0-advanced-886d53cd-5893-4e51-ac10-2282e653e0b9&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Edit &lt;code&gt;/etc/default/grub&lt;/code&gt; and set &lt;code&gt;GRUB_DEFAULT&lt;/code&gt; to the kernel’s identifier. In my case, &lt;code&gt;GRUB_DEFAULT=gnulinux-4.19.0-advanced-886d53cd-5893-4e51-ac10-2282e653e0b9&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Run &lt;code&gt;sudo update-grub&lt;/code&gt;, to apply the changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are ready now to use XDP to process packets. But how to do that will be the subject of a new post.&lt;/p&gt;
</description>
        <pubDate>Wed, 02 Jan 2019 08:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2019/01/02/build-a-kernel/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2019/01/02/build-a-kernel/</guid>
        
        <category>networking</category>
        
        
      </item>
    
      <item>
        <title>YANG alarms</title>
        <description>
&lt;p&gt;Alarm management is a fundamental part of network monitoring. The motivation for defining a standard alarm interface for network devices isn’t new. In the early 90s, ITU-T standardized &lt;a href=&quot;https://www.itu.int/ITU-T/recommendations/rec.aspx?rec=3071&quot;&gt;X.733&lt;/a&gt; (OSI model). This continued in mobile networks with the standardization of &lt;a href=&quot;http://www.tech-invite.com/3m32/tinv-3gpp-32-111-3.html&quot;&gt;Alarm IRP&lt;/a&gt; (&lt;em&gt;Integration Reference Point&lt;/em&gt;) by 3GPP. In TCP/IP networks, SNMP is the preferred choice for network management, along with &lt;em&gt;ad hoc&lt;/em&gt; tools (usually command-line scripts). In SNMP, object information is stored as MIBs (&lt;em&gt;Management Information Base&lt;/em&gt;), formal descriptions of the network objects that can be managed. Usually MIBs have a tree structure.&lt;/p&gt;

&lt;p&gt;The IETF did not standardize an alarm MIB early on. Instead, management systems interpreted the enterprise-specific traps of each MIB to build an alarm list. When &lt;a href=&quot;https://tools.ietf.org/html/rfc3877&quot;&gt;RFC 3877&lt;/a&gt; (&lt;em&gt;Alarm Management Information Base (MIB)&lt;/em&gt;) was finally published, it had to address the existence of these enterprise traps and map them into alarms. This requirement led to a MIB that was not easy to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introducing NETCONF and YANG&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SNMP is still the dominant protocol for network management, although it has started showing its age. In recent years, several alternatives were proposed with the goal of replacing it. Among all proposals, the most promising alternative is NETCONF (&lt;a href=&quot;https://tools.ietf.org/html/rfc6241&quot;&gt;RFC 6241&lt;/a&gt;: &lt;em&gt;Network Configuration Protocol&lt;/em&gt;). NETCONF is, like SNMP, a network management protocol. It provides mechanisms to install, manipulate, and delete the configuration of network devices. NETCONF uses an RPC mechanism to execute its operations, whereas protocol messages are encoded in XML (or JSON).&lt;/p&gt;

&lt;p&gt;The NETMOD WG (&lt;em&gt;NETCONF Data Modeling Working Group&lt;/em&gt;) defines the semantics of operational data, configuration data, notifications and operations, using a data modeling language called YANG (See &lt;a href=&quot;https://tools.ietf.org/html/rfc6020&quot;&gt;RFC 6020&lt;/a&gt; and &lt;a href=&quot;https://tools.ietf.org/html/rfc6021&quot;&gt;RFC 6021&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;YANG is a very rich language. It allows defining much more complex data structures than other modeling languages such as DTD or XML Schema. For instance, YANG features a wide range of primitive data types (&lt;em&gt;uint32&lt;/em&gt;, &lt;em&gt;string&lt;/em&gt;,  &lt;em&gt;boolean&lt;/em&gt;, &lt;em&gt;decimal64&lt;/em&gt;, etc.), simple data (&lt;em&gt;leaf&lt;/em&gt;), structured data elements (&lt;em&gt;container&lt;/em&gt;, &lt;em&gt;list&lt;/em&gt;, &lt;em&gt;leaf-list&lt;/em&gt;), definition of customized types (&lt;em&gt;typedef&lt;/em&gt;), definition of remote procedure calls, references (&lt;em&gt;instance-identifier&lt;/em&gt;, &lt;em&gt;leafref&lt;/em&gt;), notifications, etc.&lt;/p&gt;

&lt;p&gt;Take the following model as an example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;container students {
   list student {
      leaf name {
         type string;
      }
      leaf date-of-birth {
         type yang:date;
      }
   }
}

students {
   student { name &amp;quot;Jane&amp;quot;; date-of-birth &amp;quot;01-01-1995&amp;quot;; }
   student { name &amp;quot;John&amp;quot;; date-of-birth &amp;quot;31-03-1995&amp;quot;; }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That very same model could be written in DTD/XML form as:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot; data-lang=&quot;xml&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;&amp;lt;!DOCTYPE students [&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;&amp;lt;!ELEMENT students (student*)&amp;gt;&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;&amp;lt;!ELEMENT student (name,date-of-birth)&amp;gt;&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;&amp;lt;!ELEMENT name (#PCDATA)&amp;gt;&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;&amp;lt;!ELEMENT date-of-birth (#PCDATA)&amp;gt;&lt;/span&gt;
]&amp;gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;students&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;student&amp;gt;&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Jane&lt;span class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;&amp;lt;date-of-birth&amp;gt;&lt;/span&gt;01-01-1995&lt;span class=&quot;nt&quot;&gt;&amp;lt;/date-of-birth&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;/student&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;student&amp;gt;&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;John&lt;span class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;&amp;lt;date-of-birth&amp;gt;&lt;/span&gt;31-03-1995&lt;span class=&quot;nt&quot;&gt;&amp;lt;/date-of-birth&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;/student&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/students&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;One obvious difference is that field &lt;code&gt;date-of-birth&lt;/code&gt; is encoded as a string in the DTD/XML model. On the contrary, it’s defined as a date in the YANG model. Supporting date as a native data type in the language improves value checking. If &lt;code&gt;date-of-birth&lt;/code&gt; is not a valid date, our YANG library will report the error.&lt;/p&gt;
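
&lt;p&gt;As a rough idea of what that value checking buys us, the sketch below validates a date written like the ones in the example. It is not how Snabb’s YANG library does it: the function name &lt;code&gt;is_valid_date&lt;/code&gt; is hypothetical and the &lt;code&gt;DD-MM-YYYY&lt;/code&gt; layout is assumed from the sample data above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;ctype.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Returns 1 if s is a real calendar date written as DD-MM-YYYY, else 0. */
int is_valid_date(const char *s)
{
    static const int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int d, m, y, leap;

    assert(s != NULL);
    /* Check the overall shape first: two digits, dash, two digits, dash, four digits. */
    if (strlen(s) != 10 || s[2] != '-' || s[5] != '-')
        return 0;
    for (int i = 0; i &amp;lt; 10; i++) {
        if (i != 2 &amp;amp;&amp;amp; i != 5 &amp;amp;&amp;amp; !isdigit((unsigned char)s[i]))
            return 0;
    }
    d = (s[0] - '0') * 10 + (s[1] - '0');
    m = (s[3] - '0') * 10 + (s[4] - '0');
    y = (s[6] - '0') * 1000 + (s[7] - '0') * 100 + (s[8] - '0') * 10 + (s[9] - '0');
    if (m &amp;lt; 1 || m &amp;gt; 12)
        return 0;
    /* February gains one day in leap years. */
    leap = (y % 4 == 0 &amp;amp;&amp;amp; y % 100 != 0) || y % 400 == 0;
    return d &amp;gt;= 1 &amp;amp;&amp;amp; d &amp;lt;= days[m - 1] + (m == 2 &amp;amp;&amp;amp; leap);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With a check like this, both dates in the example are accepted, whereas a value such as &lt;code&gt;31-04-1995&lt;/code&gt; is rejected. A DTD, which only sees &lt;code&gt;#PCDATA&lt;/code&gt;, would accept all three.&lt;/p&gt;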

&lt;p&gt;YANG also allows to compose several YANG modules into one single document. Data types from a different module can be accessed via namespace, as in the example above in the case of &lt;code&gt;yang:date&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Covering all its aspects of YANG would require a blog post on its own, so I will end here this introduction. Summarizing, the two main ideas I’d like to highlight are the following:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;NETCONF is a relatively new network management protocol, aimed to replace SNMP, tightly coupled with YANG.&lt;/li&gt;
  &lt;li&gt;YANG is the data modeling language used to define NETCONF’s data models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;YANG alarms&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The YANG alarms module is defined in &lt;a href=&quot;https://www.ietf.org/archive/id/draft-vallin-netmod-alarm-module-02.txt&quot;&gt;draft-vallin-netmod-alarm-module-02.txt&lt;/a&gt;. Its implementation in Snabb was sponsored, like most lwAFTR-related work, by Deutsche Telekom. The module specification is still a draft, but even in this state it features enough functionality to make an implementation valuable.&lt;/p&gt;

&lt;p&gt;The implementation uses Snabb’s native YANG library and Snabb’s &lt;code&gt;config&lt;/code&gt; tool, a simple implementation of NETCONF. Both tools were mostly developed by Igalia (more precisely by my colleagues &lt;a href=&quot;https://twitter.com/andywingo&quot;&gt;Andy Wingo&lt;/a&gt; and &lt;a href=&quot;https://twitter.com/Tsyesika&quot;&gt;Jessica Tallon&lt;/a&gt;), also as part of the work on Snabb’s lwAFTR.&lt;/p&gt;

&lt;p&gt;At a high level, the YANG alarms module is organized in two parts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Configuration data&lt;/strong&gt;: stores all the attributes and variables that control how the module should operate. For example, &lt;em&gt;max-alarm-status-changes&lt;/em&gt; controls the size of an alarm’s status-change list (default: 32); &lt;em&gt;notify-status-changes&lt;/em&gt; controls whether notifications are sent on alarm status updates.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;State data&lt;/strong&gt;: actually stores alarm information and consists of 4 containers: &lt;em&gt;alarm-list&lt;/em&gt;, &lt;em&gt;alarm-inventory&lt;/em&gt;, &lt;em&gt;shelved-alarms&lt;/em&gt; and &lt;em&gt;summary&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main component of the state data container is the &lt;code&gt;alarm-list&lt;/code&gt; container:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;list alarm {
   key &amp;quot;resource alarm-type-id alarm-type-qualifier&amp;quot;;

   uses common-alarm-parameters;
}

grouping common-alarm-parameters {
   leaf resource {
      type resource;
      mandatory true;
   }
   leaf alarm-type-id {
      type alarm-type-id;
      mandatory true;
   }
   leaf alarm-type-qualifier {
      type alarm-type-qualifier;
   }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The &lt;code&gt;alarm-list&lt;/code&gt; container stores all the active alarms managed in the system. But before going any further, we should define what an alarm is. Basically, an alarm is a persistent indication of a fault that clears only when its triggering condition has been resolved. An alarm in the list is always in one of two states: &lt;strong&gt;raised&lt;/strong&gt; or &lt;strong&gt;cleared&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When an alarm is raised, a new entry is created in &lt;code&gt;alarm-list&lt;/code&gt;. An alarm is identified by the triple &lt;em&gt;{resource, alarm-type-id, alarm-type-qualifier}&lt;/em&gt;, which describes the affected resource, an alarm type identifier and a qualifier that contains other optional information. Besides this, an alarm also stores other information (omitted in the example for simplicity) such as whether the alarm &lt;em&gt;is-cleared&lt;/em&gt;, its &lt;em&gt;last-changed&lt;/em&gt; timestamp, its &lt;em&gt;perceived-severity&lt;/em&gt; and a list of status changes. When an alarm is created, a first item is added to this list. If the alarm later increases or decreases its severity, or changes some other property as defined in the specification, a new status change is added to the list.&lt;/p&gt;
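
&lt;p&gt;As a rough illustration of this bookkeeping, the following standalone Lua sketch keeps an alarm list keyed by the identifying triple and appends a status change only when something actually changes. The names &lt;code&gt;alarm_key&lt;/code&gt;, &lt;code&gt;raise_alarm&lt;/code&gt; and &lt;code&gt;clear_alarm&lt;/code&gt; are hypothetical and are not Snabb’s API; timestamps are passed in explicitly to keep the example deterministic.&lt;/p&gt;

```lua
-- Hypothetical sketch: an alarm list keyed by
-- {resource, alarm-type-id, alarm-type-qualifier}.
local alarm_list = {}

-- Build a string key out of the identifying triple.
local function alarm_key (resource, alarm_type_id, alarm_type_qualifier)
   return table.concat({resource, alarm_type_id, alarm_type_qualifier or ''}, '|')
end

-- Raise an alarm: create the entry if needed and record a status change
-- whenever the alarm is new, was cleared, or changed its severity.
local function raise_alarm (id, severity, now)
   local key = alarm_key(id.resource, id.alarm_type_id, id.alarm_type_qualifier)
   local alarm = alarm_list[key]
   if not alarm then
      alarm = {is_cleared = true, status_change = {}, time_created = now}
      alarm_list[key] = alarm
   end
   if alarm.is_cleared or alarm.perceived_severity ~= severity then
      alarm.is_cleared = false
      alarm.perceived_severity = severity
      alarm.last_changed = now
      table.insert(alarm.status_change, {time = now, perceived_severity = severity})
   end
   return alarm
end

-- Clear an alarm: the entry stays in the list, only its state changes.
local function clear_alarm (id, now)
   local key = alarm_key(id.resource, id.alarm_type_id, id.alarm_type_qualifier)
   local alarm = alarm_list[key]
   if alarm and not alarm.is_cleared then
      alarm.is_cleared = true
      alarm.last_changed = now
      table.insert(alarm.status_change, {time = now, perceived_severity = 'cleared'})
   end
   return alarm
end

local id = {resource = '16446', alarm_type_id = 'arp-resolution'}
local alarm = raise_alarm(id, 'critical', 100)
raise_alarm(id, 'critical', 101)   -- Same severity: no new status change.
raise_alarm(id, 'major', 102)      -- Severity changed: new status change.
clear_alarm(id, 103)
print(#alarm.status_change, alarm.is_cleared)
```

&lt;p&gt;The second call leaves the entry untouched because nothing changed, so the alarm ends up cleared with three status changes.&lt;/p&gt;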

&lt;p&gt;Most of the YANG Alarms module business logic is implemented in &lt;a href=&quot;https://github.com/Igalia/snabb/blob/lwaftr/src/lib/yang/alarms.lua&quot;&gt;lib/yang/alarms.lua&lt;/a&gt;. This library provides an API to define alarms and to raise or clear them. To monitor a particular condition, we simply import the alarms module and create a check point. For instance:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ARP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;maybe_send_arp_request&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next_mac&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next_arp_request_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next_arp_request_time&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next_arp_request_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arp_resolving&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next_ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ARP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;arp_resolving&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;ARP: Resolving &amp;#39;%s&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipv4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ntop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alarm_notification&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;arp_alarm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raise&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;When the condition is not met (&lt;em&gt;self.next_mac&lt;/em&gt; hasn’t been resolved yet and &lt;em&gt;self.next_arp_request_time&lt;/em&gt; has expired), an alarm is raised. But what if this check point is executed repeatedly, for instance every second until an operator fixes the alarm condition? To avoid saturating the alarm list, the standard specifies a minimum interval of 2 minutes before the same alarm is raised again. This interval is managed by the alarms library.&lt;/p&gt;
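
&lt;p&gt;One way to picture this is a small guard around the raise operation. The sketch below is only an assumption about how such throttling could be written, not the code in the alarms library; &lt;code&gt;new_throttle&lt;/code&gt; and &lt;code&gt;throttled_raise&lt;/code&gt; are hypothetical names, and the clock is passed in explicitly.&lt;/p&gt;

```lua
-- Hypothetical sketch: do not re-raise the same alarm more often than
-- once every `interval` seconds.
local function new_throttle (interval)
   return {interval = interval, last_raised = nil, raised_count = 0}
end

-- Returns true if the alarm was actually raised at time `now` (in seconds).
local function throttled_raise (throttle, now)
   local last = throttle.last_raised
   if last ~= nil and throttle.interval > now - last then
      return false   -- Still inside the interval: ignore this raise.
   end
   throttle.last_raised = now
   throttle.raised_count = throttle.raised_count + 1
   return true
end

-- A check point that fires every second for five minutes.
local throttle = new_throttle(120)
for now = 0, 299 do
   throttled_raise(throttle, now)
end
print(throttle.raised_count)
```

&lt;p&gt;Out of 300 attempts only three go through, at seconds 0, 120 and 240.&lt;/p&gt;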

&lt;p&gt;Besides a list of alarms, the module also defines these other containers:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;alarm-inventory&lt;/strong&gt;: It contains all possible alarm types for the system.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;summary&lt;/strong&gt;: A summary of the number of alarms and shelved alarms.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;shelved-alarms&lt;/strong&gt;: A shelved alarm is ignored and won’t emit raise or clear events. Shelved alarms don’t emit notifications either. Shelving an alarm is a convenient way to silence it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When an alarm is raised, cleared or changes its status, a notification is sent. The alarms module specifies three types of notifications:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;alarm-notification&lt;/strong&gt;: Used to report a state change for an alarm. This notification is emitted when an alarm is raised, cleared, or changes its status.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;alarm-inventory-changed&lt;/strong&gt;: Used to report that the list of possible alarms has changed.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;operator-action&lt;/strong&gt;: Used to report that an operator acted upon an alarm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Continuing with the ARP alarm example, here’s what a notification looks like when such an alarm is raised:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo ./snabb alarms listen lwaftr
{&quot;event&quot;:&quot;alarm-notification&quot;,
 &quot;resource&quot;:&quot;16446&quot;, &quot;alarm_type_id&quot;:&quot;arp-resolution&quot;, &quot;alarm_type_qualifier&quot;:&quot;&quot;,
 &quot;perceived_severity&quot;:&quot;critical&quot;, &quot;alarm_text&quot;:&quot;Make sure you can resolve...&quot;}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Upon receiving a notification, an operator or an external program can act on the affected resource signaled by the alarm and fix the condition that triggered it. For instance, when the lwAFTR is unable to resolve the next-hop IPv4 address, the alarm indicates that the host isn’t reachable (the host is down, or there’s no route to that address).&lt;/p&gt;

&lt;p&gt;Lastly, the module also specifies one YANG &lt;em&gt;action&lt;/em&gt; and two YANG &lt;em&gt;RPCs&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;set-operator-state&lt;/strong&gt;: Allows an operator to change the state of an alarm. The specification defines 4 possible operator states: &lt;em&gt;cleared-not-closed&lt;/em&gt;, &lt;em&gt;cleared-closed&lt;/em&gt;, &lt;em&gt;not-cleared-closed&lt;/em&gt; and &lt;em&gt;not-cleared-not-closed&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;purge-alarms&lt;/strong&gt;: Deletes entries from the alarm list according to the supplied criteria. It can be used to delete alarms that are in closed state or are older than a specified time.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;compress-alarms&lt;/strong&gt;: Compresses entries in the alarm list by removing all but the latest state change for all alarms.&lt;/li&gt;
&lt;/ul&gt;
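
&lt;p&gt;To make the last two operations more concrete, here is a minimal sketch over a plain Lua table. It assumes each alarm carries a &lt;code&gt;closed&lt;/code&gt; flag, a &lt;code&gt;last_changed&lt;/code&gt; timestamp and a &lt;code&gt;status_change&lt;/code&gt; array; these field names are chosen for the example, the draft specifies more filters than shown here, and this is not the Snabb implementation.&lt;/p&gt;

```lua
-- Hypothetical sketch of purge-alarms and compress-alarms over an
-- in-memory alarm list.

-- Delete alarms that are closed or whose last change is older than
-- `older_than` seconds. Returns the number of purged alarms.
local function purge_alarms (alarm_list, now, older_than)
   local purged = 0
   for key, alarm in pairs(alarm_list) do
      if alarm.closed or now - alarm.last_changed > older_than then
         -- Assigning nil to an existing key is allowed while traversing.
         alarm_list[key] = nil
         purged = purged + 1
      end
   end
   return purged
end

-- Keep only the latest status change of every alarm.
-- Returns the number of alarms that were compressed.
local function compress_alarms (alarm_list)
   local compressed = 0
   for _, alarm in pairs(alarm_list) do
      local changes = alarm.status_change
      if #changes > 1 then
         alarm.status_change = {changes[#changes]}
         compressed = compressed + 1
      end
   end
   return compressed
end

local alarm_list = {
   a = {closed = true,  last_changed = 90, status_change = {{time = 90}}},
   b = {closed = false, last_changed = 10, status_change = {{time = 5}, {time = 10}}},
   c = {closed = false, last_changed = 95, status_change = {{time = 80}, {time = 95}}},
}
local purged = purge_alarms(alarm_list, 100, 60)
local compressed = compress_alarms(alarm_list)
print(purged, compressed, alarm_list.c.status_change[1].time)
```

&lt;p&gt;Alarm &lt;code&gt;a&lt;/code&gt; is purged because it is closed and &lt;code&gt;b&lt;/code&gt; because it is too old; the surviving alarm &lt;code&gt;c&lt;/code&gt; keeps only its latest status change.&lt;/p&gt;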

&lt;p&gt;&lt;strong&gt;NETCONF side&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding alarms support to Snabb, and more precisely to the lwAFTR, has brought in many good things. First of all, Snabb’s YANG library has added support for more data types such as &lt;em&gt;empty&lt;/em&gt;, &lt;em&gt;identityref&lt;/em&gt; and &lt;em&gt;leafref&lt;/em&gt;. It has also improved parsing and validation of other data types such as &lt;em&gt;ipv4-prefix&lt;/em&gt;, &lt;em&gt;ipv6-prefix&lt;/em&gt; and &lt;em&gt;enum&lt;/em&gt;, in addition to other minor improvements and bug fixes. For the moment, the lwAFTR is the poster child for alarms, but the mechanism is generic enough to be used by other data planes.&lt;/p&gt;

&lt;p&gt;A new program has been added to Snabb, unsurprisingly called &lt;code&gt;alarms&lt;/code&gt;. It consists of five sub-commands:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;listen&lt;/strong&gt;: Listens to a Snabb instance which provides alarms support. The subprogram can send RPC requests to the server program or listen to notifications.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;get-state&lt;/strong&gt;: Sends an XPath request to a target Snabb instance that provides alarms state information.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;set-operator-state&lt;/strong&gt;: User interface to the &lt;em&gt;set-operator-state&lt;/em&gt; action.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;purge-alarms&lt;/strong&gt;: User interface to &lt;em&gt;purge-alarms&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;compress-alarms&lt;/strong&gt;: User interface to &lt;em&gt;compress-alarms&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is an invocation of the &lt;em&gt;get-state&lt;/em&gt; subprogram and an excerpt of its output:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sudo&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snabb&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alarms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lwaftr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;alarm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;list&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;alarm&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;alarm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resolution&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;alarm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;qualifier&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;resource&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21385&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;alarm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;
         &lt;span class=&quot;s2&quot;&gt;&amp;quot;Make sure you can resolve external-interface.next-hop.ip address manually.&amp;quot;&lt;/span&gt;
         &lt;span class=&quot;s2&quot;&gt;&amp;quot;If it cannot be resolved, consider setting the MAC address of the next-hop directly.&amp;quot;&lt;/span&gt;
         &lt;span class=&quot;s2&quot;&gt;&amp;quot;To do it so, set external-interface.next-hop.mac to the value of the MAC address.&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;is&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cleared&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;last&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;changed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;06&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;57&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;perceived&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;severity&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;critical&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;change&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;06&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;57&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;alarm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; 
            &lt;span class=&quot;s2&quot;&gt;&amp;quot;Make sure you can resolve external-interface.next-hop.ip address manually.&amp;quot;&lt;/span&gt;
            &lt;span class=&quot;s2&quot;&gt;&amp;quot;If it cannot be resolved, consider setting the MAC address of the next-hop directly.&amp;quot;&lt;/span&gt;
            &lt;span class=&quot;s2&quot;&gt;&amp;quot;To do it so, set external-interface.next-hop.mac to the value of the MAC address.&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;perceived&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;severity&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;critical&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;created&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;06&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;57&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;last&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;changed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;06&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;57&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;number&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alarms&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The alarms module keeps all its state in one Snabb instance, the leader process. As a reminder, since v3.0 the lwAFTR runs in a multiprocess architecture which consists of:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;1 Leader, which manages changes to the lwAFTR configuration. For instance, changes in softwires (add, remove, update).&lt;/li&gt;
  &lt;li&gt;1 to N Workers, each of which runs a lwAFTR data-plane.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both kinds of processes communicate via an IPC (&lt;em&gt;Inter-process communication&lt;/em&gt;) mechanism, in this case a message channel implemented using sockets. When a worker raises an alarm, a message is sent to the leader through this alarms channel. The leader polls the alarms channel periodically, consuming all the stored messages. The result of processing a message is an action that alters the alarms state, for instance, adding a new alarm to the inventory, raising an alarm, clearing it, etc. All this logic is coded in &lt;a href=&quot;https://github.com/Igalia/snabb/blob/lwaftr/src/lib/ptree/ptree.lua&quot;&gt;lib/ptree/ptree.lua&lt;/a&gt; and &lt;a href=&quot;https://github.com/Igalia/snabb/blob/lwaftr/src/lib/ptree/alarm_coded.lua&quot;&gt;lib/ptree/alarm_coded.lua&lt;/a&gt;.&lt;/p&gt;
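
&lt;p&gt;This flow can be sketched with an in-memory queue standing in for the channel. Everything here is illustrative: the message shapes, &lt;code&gt;channel_put&lt;/code&gt; and &lt;code&gt;leader_poll&lt;/code&gt; are hypothetical names, and the real channel is an IPC mechanism rather than a Lua table.&lt;/p&gt;

```lua
-- Hypothetical sketch: workers enqueue alarm messages and the leader
-- drains them periodically.
local channel = {}   -- Stands in for the alarms channel (a simple FIFO).
local state = {inventory = {}, alarm_list = {}}

-- Worker side: append a message to the channel.
local function channel_put (msg)
   channel[#channel + 1] = msg
end

-- One handler per message type; each one alters the alarms state.
local handlers = {
   add_to_inventory = function (msg)
      state.inventory[msg.alarm_type_id] = true
   end,
   raise_alarm = function (msg)
      state.alarm_list[msg.alarm_type_id] = {is_cleared = false}
   end,
   clear_alarm = function (msg)
      local alarm = state.alarm_list[msg.alarm_type_id]
      if alarm then alarm.is_cleared = true end
   end,
}

-- Leader side: consume every stored message and apply its action.
-- Returns the number of messages processed.
local function leader_poll ()
   local pending = channel
   channel = {}
   for _, msg in ipairs(pending) do
      local handler = assert(handlers[msg.type], 'unknown message type')
      handler(msg)
   end
   return #pending
end

channel_put({type = 'add_to_inventory', alarm_type_id = 'arp-resolution'})
channel_put({type = 'raise_alarm', alarm_type_id = 'arp-resolution'})
local first = leader_poll()
channel_put({type = 'clear_alarm', alarm_type_id = 'arp-resolution'})
local second = leader_poll()
print(first, second, state.alarm_list['arp-resolution'].is_cleared)
```

&lt;p&gt;The first poll consumes two messages and the second poll one, after which the alarm is marked as cleared.&lt;/p&gt;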

&lt;p&gt;Besides alarms, there are also notifications. A notification is a simple message that is emitted under certain circumstances: when an alarm is raised, when its status changes or when a new alarm-type is added to the inventory. Notifications are a native YANG element, not particular to alarms.&lt;/p&gt;

&lt;p&gt;In Snabb, the notifications mechanism is also implemented via sockets. In this case, a socket connects a lwAFTR leader to a series of peers that listen on the socket. When a notification is triggered, a new notification is added to the leader’s list of notifications. The leader process runs a fiber that constantly polls this list. If it finds new entries, the notifications are serialized to JSON objects and sent through the socket. Once a notification is sent, it’s removed from the alarms state. This logic is implemented in &lt;a href=&quot;https://github.com/Igalia/snabb/blob/lwaftr/src/lib/ptree/ptree.lua&quot;&gt;lib/ptree/ptree.lua&lt;/a&gt; and &lt;a href=&quot;https://github.com/Igalia/snabb/blob/lwaftr/src/lib/yang/alarms.lua&quot;&gt;lib/yang/alarms.lua&lt;/a&gt;.&lt;/p&gt;
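
&lt;p&gt;A minimal sketch of that drain-and-serialize step follows. It is not Snabb’s code: the tiny JSON encoder only handles flat tables of strings, &lt;code&gt;flush_notifications&lt;/code&gt; is a hypothetical name, and a Lua table stands in for the socket.&lt;/p&gt;

```lua
-- Hypothetical sketch: drain pending notifications and serialize each
-- one as a JSON object.
local pending = {}   -- The leader's list of notifications.
local sent = {}      -- Stands in for the socket: collects what was written.

-- Encode a Lua string as a JSON string, escaping quotes, backslashes
-- and control characters as \uXXXX sequences.
local function json_string (s)
   local escaped = s:gsub('[%c"\\]', function (c)
      return string.format('\\u%04x', c:byte())
   end)
   return '"' .. escaped .. '"'
end

-- Encode a flat table of string fields, following the given key order.
local function json_object (fields, keys)
   local parts = {}
   for _, key in ipairs(keys) do
      parts[#parts + 1] = json_string(key) .. ':' .. json_string(fields[key])
   end
   return '{' .. table.concat(parts, ',') .. '}'
end

local keys = {'event', 'resource', 'alarm_type_id', 'perceived_severity'}

-- Called periodically by the leader: send everything, then forget it.
local function flush_notifications ()
   for _, notification in ipairs(pending) do
      sent[#sent + 1] = json_object(notification, keys)
   end
   pending = {}
end

pending[#pending + 1] = {event = 'alarm-notification', resource = '16446',
                         alarm_type_id = 'arp-resolution',
                         perceived_severity = 'critical'}
flush_notifications()
print(sent[1])
```

&lt;p&gt;After the flush the pending list is empty again, which mirrors the removal of sent notifications from the alarms state.&lt;/p&gt;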

&lt;p&gt;&lt;strong&gt;Summary and conclusions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;YANG Alarms is a simple mechanism to report error conditions. The main strengths of this module are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;It’s encoded as a YANG module, with all the advantages that entails (common vocabulary and semantics, reusable).&lt;/li&gt;
  &lt;li&gt;Signaling errors by simply printing out messages to &lt;em&gt;stdout&lt;/em&gt; is not reliable, as they can be easily missed. Alarms are stored in memory and keep state that can be consulted later on demand.&lt;/li&gt;
  &lt;li&gt;Active notifications for the most important state changes. This allows hooking in external programs, which then do not need to constantly poll the current state to check whether a change happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the down side, I personally think that the amount of information tracked per alarm is excessive, making the YANG specification more complex than one might think at first. Fortunately, programs interested in supporting this module do not need to implement all the specified features; a subset of them is enough. At the time of writing, the YANG alarms proposal is still a draft, but hopefully it will become a standard after several revisions.&lt;/p&gt;
</description>
        <pubDate>Thu, 13 Sep 2018 06:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2018/09/13/yang-alarms/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2018/09/13/yang-alarms/</guid>
        
        <category>igalia</category>
        
        <category>networking</category>
        
        
      </item>
    
      <item>
        <title>Fast checksum computation</title>
        <description>
&lt;p&gt;An Internet packet generally includes two checksums: a TCP/UDP checksum and an IP checksum. In both cases, the checksum value is calculated using the same algorithm. For instance, the IP header checksum is computed as follows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Set the packet’s IP header checksum to zero.&lt;/li&gt;
  &lt;li&gt;Read the IP header octets as 16-bit words and calculate the accumulated sum.&lt;/li&gt;
  &lt;li&gt;If the sum overflows 16 bits, add the carry back into the sum.&lt;/li&gt;
  &lt;li&gt;Once the sum is done, set the checksum to the one’s complement of the accumulated sum.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is an implementation of this algorithm in Lua:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ffi&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;ffi&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;bit&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;checksum_lua&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;r16&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;kr&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ffi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;uint16_t*&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
   &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
   &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;-- Accumulated sum.&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;do&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;-- Handle odd sizes.&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;-- Add accumulated carry.&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;do&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;carry&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rshift&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;carry&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;break&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;band&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0xffff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;carry&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;-- One&amp;#39;s complement.&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;band&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bnot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0xffff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The IP header checksum is calculated only over the IP header octets. The TCP checksum, however, is calculated over the TCP header, the packet’s payload and an extra header called the pseudo-header.&lt;/p&gt;

&lt;p&gt;A pseudo-header is a 12-byte data structure composed of:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;IP source and destination addresses (8 bytes).&lt;/li&gt;
  &lt;li&gt;Protocol (TCP=0x6 or UDP=0x11), preceded by a zero byte (2 bytes).&lt;/li&gt;
  &lt;li&gt;TCP length (TCP header plus payload), calculated as the IP header’s total length minus the IP header size (2 bytes).&lt;/li&gt;
&lt;/ul&gt;
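
&lt;p&gt;As a rough illustration of the layout above, here is a minimal C sketch that fills a 12-byte buffer with an IPv4 pseudo-header. The function name &lt;code&gt;build_pseudo_header&lt;/code&gt; and its parameters are made up for this example. It assumes addresses are passed as 4-byte arrays in network byte order and that the length fits in 16 bits.&lt;/p&gt;

```c
/* Hypothetical helper, not part of any library: fills `out` (12 bytes)
 * with an IPv4 pseudo-header. `src` and `dst` are 4-byte addresses in
 * network byte order; `l4_len` is the TCP (or UDP) header plus payload
 * length in bytes and is assumed to fit in 16 bits. */
void build_pseudo_header(unsigned char out[12],
                         const unsigned char src[4],
                         const unsigned char dst[4],
                         unsigned char protocol,
                         unsigned int l4_len)
{
    int i;
    for (i = 0; i != 4; i++) {
        out[i] = src[i];                       /* Bytes 0-3: source address. */
        out[4 + i] = dst[i];                   /* Bytes 4-7: destination address. */
    }
    out[8] = 0;                                /* Zero padding. */
    out[9] = protocol;                         /* 0x6 for TCP, 0x11 for UDP. */
    out[10] = (unsigned char) (l4_len / 256);  /* Length, high byte. */
    out[11] = (unsigned char) (l4_len % 256);  /* Length, low byte. */
}
```

&lt;p&gt;The resulting 12 bytes are summed together with the TCP header and payload, but they are never transmitted on the wire.&lt;/p&gt;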

&lt;p&gt;You may wonder what the purpose of the pseudo-header is. David P Reed, often considered the father of UDP, provides a great explanation in this thread: &lt;a href=&quot;http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html&quot;&gt;Purpose of pseudo header in TCP checksum&lt;/a&gt;. Basically, the original goal of the pseudo-header was to take into account IP addresses as part of the TCP checksum, since they’re relevant fields in an end-to-end communication. Back in those days, the original plan for secure TCP communications was to leave source and destination addresses in the clear but encrypt the rest of the TCP fields. That would prevent man-in-the-middle attacks. However NAT, which is essentially a man-in-the-middle, came along and trashed this original plan. In summary, the pseudo-header exists today for legacy reasons.&lt;/p&gt;

&lt;p&gt;Lastly, it’s worth mentioning that the UDP checksum is optional in IPv4, so you will often see it set to zero. However, the field is mandatory in IPv6.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verifying a packet’s checksum&lt;/strong&gt;&lt;/p&gt;

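&lt;p&gt;Verifying a packet’s checksum is easy. On receiving a packet, the receiver sums all the relevant octets, including the checksum field. If the packet is correct the folded sum is 0xffff, since the one’s complement sum of a number and its one’s complement is always all ones (negative zero in one’s complement arithmetic). Complementing that sum, as the checksum function does, therefore yields zero.&lt;/p&gt;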
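
&lt;p&gt;The check can be sketched in C as follows. This is a minimal illustration, not the code of any particular stack: the names &lt;code&gt;ones_complement_sum&lt;/code&gt; and &lt;code&gt;checksum_is_valid&lt;/code&gt; are hypothetical, and bytes are combined as big-endian 16-bit words. The data is accepted when the folded sum, checksum field included, is all ones (0xffff), whose one’s complement is zero.&lt;/p&gt;

```c
/* Sketch of receiver-side verification. Both function names are
 * hypothetical. Bytes are combined as big-endian 16-bit words. */
unsigned int ones_complement_sum(const unsigned char *data, unsigned int len)
{
    unsigned long sum = 0;
    while (len > 1) {                  /* Sum 16-bit words. */
        sum += (unsigned long) data[0] * 256 + data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                      /* Odd trailing byte, padded with zero. */
        sum += (unsigned long) data[0] * 256;
    while (sum >> 16)                  /* Fold carries into the low 16 bits. */
        sum = (sum % 0x10000) + (sum >> 16);
    return (unsigned int) sum;
}

/* Returns 1 when the data, checksum field included, sums to 0xffff. */
int checksum_is_valid(const unsigned char *data, unsigned int len)
{
    return ones_complement_sum(data, len) == 0xffff;
}
```

&lt;p&gt;For a TCP segment the same sum would also have to cover the pseudo-header; the sketch only deals with a single contiguous buffer such as an IP header.&lt;/p&gt;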

&lt;p&gt;From a developer’s perspective there are several tools for verifying the correctness of a packet’s checksum. My preferred tool is Wireshark, which features an option to check the validity of TCP checksums (&lt;em&gt;Edit-&amp;gt;Preferences-&amp;gt;Protocols[TCP]&lt;/em&gt;, then mark &lt;em&gt;Validate the TCP checksum if possible&lt;/em&gt;). When this option is enabled, packets with a wrong checksum are highlighted with a black background.&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2018/06/wrong-checksum-wireshark.png&quot; title=&quot;Bad checksums&quot; alt=&quot;Bad checksums&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Bad checksums&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;Seeing packets with wrong checksums is common when capturing packets with &lt;em&gt;tcpdump&lt;/em&gt; and opening them in Wireshark. The reason the checksums are not correct is that TCP checksumming is generally offloaded to the NIC, since it’s a relatively expensive operation (nowadays, NICs have specialized hardware to do that operation fast). Since &lt;em&gt;tcpdump&lt;/em&gt; captures outgoing packets before they hit the NIC, the checksum value hasn’t been calculated yet and likely contains garbage. It’s possible to check whether checksum offloading is enabled in a NIC by typing:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ ethtool --show-offload &amp;lt;nic&amp;gt; &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; grep checksumming
rx-checksumming: on
tx-checksumming: on&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Another option for verifying checksum values is using &lt;code&gt;tshark&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ tshark -r packets.pcap -V -o tcp.check_checksum:TRUE &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; grep -c &lt;span class=&quot;s2&quot;&gt;&amp;quot;Error/Checksum&amp;quot;&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;20&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Lastly, in case you’d like to fix wrong checksums in a pcap file, it’s possible to do that with &lt;code&gt;tcprewrite&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ tcprewrite -C -i packets.pcap -o packets-fixed.pcap
$ tshark -r packets-fixed.pcap -V -o tcp.check_checksum:TRUE &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; grep -c &lt;span class=&quot;s2&quot;&gt;&amp;quot;Error/Checksum&amp;quot;&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;Fast checksum computation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since TCP checksum computation involves a large chunk of data, improving its performance is important. There are, in fact, several RFCs dedicated exclusively to discussing this topic. &lt;a href=&quot;https://tools.ietf.org/html/rfc1071&quot;&gt;RFC 1071&lt;/a&gt; (&lt;em&gt;Computing the Internet Checksum&lt;/em&gt;) includes a detailed explanation of the algorithm and also explores different techniques for speeding up checksumming. In addition, it features reference implementations for several hardware architectures such as Motorola 68020, Cray and IBM 370.&lt;/p&gt;

&lt;p&gt;Perhaps the fastest way to recompute the checksum of a modified packet is to incrementally update the checksum as the packet gets modified. Take for instance the case of NAT, which modifies source and destination ports and addresses. Those operations affect both the TCP and IP checksums. In the case of the IP checksum, if the source address gets modified we can recompute the new IP checksum as:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new_checksum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;checksum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pkt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;source_ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new_source_ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Or more generically, using the following formula, where &lt;code&gt;HC&lt;/code&gt; is the old checksum, &lt;code&gt;m&lt;/code&gt; the old value of a 16-bit field, &lt;code&gt;m&amp;#39;&lt;/code&gt; its new value, &lt;code&gt;HC&amp;#39;&lt;/code&gt; the updated checksum and &lt;code&gt;-m&lt;/code&gt; the one’s complement of &lt;code&gt;m&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HC&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;one&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;complement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;one&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;complement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This technique is covered in RFC 1071 and further polished over two other RFCs: &lt;a href=&quot;https://tools.ietf.org/html/rfc1141&quot;&gt;RFC 1141&lt;/a&gt; and &lt;a href=&quot;https://tools.ietf.org/html/rfc1624&quot;&gt;RFC 1624&lt;/a&gt; (&lt;em&gt;Incremental Updating of the Internet Checksum&lt;/em&gt;).&lt;/p&gt;
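
&lt;p&gt;As a minimal sketch of the incremental update, the C function below applies equation 3 of RFC 1624 to a single 16-bit field. The name &lt;code&gt;checksum_update&lt;/code&gt; and its parameters are hypothetical, and all three arguments are assumed to be 16-bit values in the same byte order. A 32-bit address would be handled as two consecutive 16-bit updates.&lt;/p&gt;

```c
/* Hypothetical helper following RFC 1624, equation 3:
 *   HC' = ~(~HC + ~m + m')
 * in 16-bit one's complement arithmetic. `hc` is the old checksum,
 * `m` the old value of a 16-bit field and `m_new` its replacement.
 * For 16-bit values, 0xffff - x is the one's complement of x. */
unsigned int checksum_update(unsigned int hc, unsigned int m, unsigned int m_new)
{
    unsigned long sum = (0xffff - hc) + (0xffff - m) + m_new;  /* ~HC + ~m + m' */
    while (sum >> 16)                    /* Fold carries into the low 16 bits. */
        sum = (sum % 0x10000) + (sum >> 16);
    return (unsigned int) (0xffff - sum);                      /* Final complement. */
}
```

&lt;p&gt;For example, with an old checksum of 0xb861, replacing a field value of 0xc0a8 with 0x0a00 gives 0x6f0a, the same value a full recomputation over the modified header would produce.&lt;/p&gt;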

&lt;p&gt;If we decide to recompute the checksum, there are several techniques to do it fast. In its canonical form, the algorithm says octets are summed as 16-bit words. If there’s a carry after an addition, the carry should be added to the accumulated sum. The truth is that it’s not necessary to add octets as 16-bit words. Due to the associative property of addition, it is possible to do parallel addition using larger word sizes such as 32-bit or 64-bit words. In those cases the variable that stores the accumulated sum has to be bigger too. Once the sum is computed, a final step folds the sum into a 16-bit word (adding the carry, if any).&lt;/p&gt;

&lt;p&gt;Here’s an implementation in C using 32-bit words:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;checksum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// Leftover bytes start right after the 32-bit words summed above.&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// Fold sum into 16-bit word.&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mh&quot;&gt;0xffff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;    &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ntohs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Using larger word sizes increases speed as it reduces the total number of operations. What about using this technique on 64-bit integers? It would definitely be possible, but it requires handling the carry in the loop body. In the algorithm above, 32-bit words are summed into a 64-bit word. The carry, if any, is stored in the higher part of &lt;em&gt;sum&lt;/em&gt;, which later gets summed in the folding step.&lt;/p&gt;
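
&lt;p&gt;To make the carry handling concrete, here is a minimal C sketch of how a 64-bit accumulator could absorb the carry inside the loop body. The helper names &lt;code&gt;add_with_carry&lt;/code&gt; and &lt;code&gt;fold64&lt;/code&gt; are hypothetical, and the code assumes &lt;code&gt;unsigned long long&lt;/code&gt; is exactly 64 bits wide.&lt;/p&gt;

```c
/* Sketch: adds one 64-bit word to a 64-bit accumulator with end-around
 * carry. Assumes unsigned long long is exactly 64 bits wide. */
unsigned long long add_with_carry(unsigned long long sum, unsigned long long word)
{
    sum += word;            /* May wrap around modulo 2^64. */
    if (word > sum)         /* Wrapped: the result is smaller than an operand. */
        sum += 1;           /* Add the lost carry back in. */
    return sum;
}

/* Folds a 64-bit one's complement sum down to 16 bits. */
unsigned int fold64(unsigned long long sum)
{
    while (sum >> 16)
        sum = (sum % 0x10000) + (sum >> 16);
    return (unsigned int) sum;
}
```

&lt;p&gt;The extra comparison on every addition is the cost of using an accumulator that is no wider than the words being summed.&lt;/p&gt;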

&lt;p&gt;Using SIMD instructions should allow us to sum larger chunks of data in parallel. For instance, using AVX2’s VPADD (vector-packed addition) instruction it should be possible to sum 16x16-bit words in parallel. The issue here, once again, is handling the carry that each addition may generate. So instead of a 16x16-bit vector, an 8x32-bit vector is used, so that each lane has room for the carry. From a functional point of view this is equivalent to summing using 128-bit words.&lt;/p&gt;

&lt;p&gt;Snabb features implementations of checksum computation, both generic and using SIMD instructions. In the latter case there are versions for the SSE2 and AVX2 instruction sets. Snabb’s philosophy is to do everything in software and rely as little as possible on offloaded NIC functions. Thus checksum computation is something done in code. Snabb’s implementation using AVX2 instructions is available at &lt;a href=&quot;https://github.com/snabbco/snabb/blob/master/src/arch/avx2.c&quot;&gt;src/arch/avx2.c&lt;/a&gt; (Luke pushed a very interesting implementation in machine code as well. See &lt;a href=&quot;https://github.com/snabbco/snabb/pull/899&quot;&gt;PR#899&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Going back to RFC 1071, many of the reference implementations do additions in the main loop taking into account the carry bit. For instance, in the Motorola 68020 implementation that is done using the &lt;em&gt;ADDXL&lt;/em&gt; instruction. In X86 there’s an equivalent add-with-carry instruction (&lt;em&gt;ADC&lt;/em&gt;). Basically this instruction performs a sum of two operands plus the carry-flag.&lt;/p&gt;

&lt;p&gt;Another technique described in RFC 1071, and also used in the reference implementations, is &lt;em&gt;loop unrolling&lt;/em&gt;. Instead of summing one word per iteration, we could sum 2, 4 or 8 words. A loop that sums 64-bit words in strides of 8 consumes 64 bytes (512 bits) per iteration, which means the loop is never entered for packets smaller than that. Unrolling a loop requires adding waterfall code after the loop to handle the leftover data that doesn’t fill a whole stride.&lt;/p&gt;

&lt;p&gt;As an exercise to teach myself more &lt;a href=&quot;https://luajit.org/dynasm_examples.html&quot;&gt;DynASM&lt;/a&gt; and X86-64 assembly, I decided to rewrite the generic checksum algorithm and see if performance improved. The first implementation followed the canonical algorithm, summing words as 16-bit values. Performance was much better than the generic Lua implementation posted at the beginning of this article, but it wasn’t better than Snabb’s C implementation, which does loop unrolling.&lt;/p&gt;

&lt;p&gt;After this initially disappointing result, I decided to apply some of the optimization techniques discussed above. Summing octets as 32-bit words definitely improved performance. The advantage of writing the algorithm in assembly is that I could make use of the ADC instruction. That allowed me to use 64-bit words. Performance improved once again. Finally I tried out several degrees of loop unrolling. With a loop unrolled in strides of 4, the algorithm proved to be better than the SSE2 algorithm for several packet sizes: 64 bytes, 570 bytes and 1520 bytes. It doesn’t beat the AVX2 implementation in the large packet case, but it showed better performance for small and medium sizes.&lt;/p&gt;

&lt;p&gt;And here’s the final implementation:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-asm&quot; data-lang=&quot;asm&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;; Prologue.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;push&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rbp&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;mov&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rbp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rsp&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;; Accumulative sum.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;xor&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;                &lt;span class=&quot;c1&quot;&gt;; Clear out rax. Stores accumulated sum.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;xor&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Clear out r9. Stores value of array.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;xor&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Clear out r8. Stores array index.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;mov&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rsi&lt;/span&gt;                &lt;span class=&quot;c1&quot;&gt;; Rsi (2nd argument; size). Assign rsi to rcx.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;1:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;cmp&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; If remaining size is less than 32.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jl&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;                       &lt;span class=&quot;c1&quot;&gt;; Jump to branch &amp;#39;2&amp;#39;.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;         &lt;span class=&quot;c1&quot;&gt;; Sum acc with qword[0].&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;; Sum with carry qword[1].&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;; Sum with carry qword[2].&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;; Sum with carry qword[3]&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Sum carry-bit into acc.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; Decrease remaining size by 32.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Jump four qwords.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jmp&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;                      &lt;span class=&quot;c1&quot;&gt;; Go to beginning of loop.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;2:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;cmp&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; If remaining size is less than 16.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jl&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;                       &lt;span class=&quot;c1&quot;&gt;; Jump to branch &amp;#39;3&amp;#39;.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;         &lt;span class=&quot;c1&quot;&gt;; Sum acc with qword[0].&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;; Sum with carry qword[1].&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Sum carry-bit into acc.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; Decrease index by 16.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Jump two qwords.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;3:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;cmp&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; If index is less than 8.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jl&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;                       &lt;span class=&quot;c1&quot;&gt;; Jump to branch &amp;#39;4&amp;#39;.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;         &lt;span class=&quot;c1&quot;&gt;; Sum acc with qword[0].&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Sum carry-bit into acc.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Decrease index by 8.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;                   &lt;span class=&quot;c1&quot;&gt;; Next 64-bit.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;4:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;cmp&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; If index is less than 4.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jl&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;                       &lt;span class=&quot;c1&quot;&gt;; Jump to branch &amp;#39;5&amp;#39;.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;mov&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;dword&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;; Fetch 32-bit from data + r8 into r9d.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; Sum acc with r9. Accumulate carry.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Decrease index by 4.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;                   &lt;span class=&quot;c1&quot;&gt;; Next 32-bit.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;5:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;cmp&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; If index is less than 2.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jl&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;                       &lt;span class=&quot;c1&quot;&gt;; Jump to branch &amp;#39;6&amp;#39;.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;movzx&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;; Fetch 16-bit from data + r8 into r9.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; Sum acc with r9. Accumulate carry.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Decrease index by 2.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;                   &lt;span class=&quot;c1&quot;&gt;; Next 16-bit.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;6:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;cmp&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rcx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; If index is less than 1.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;jl&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;                       &lt;span class=&quot;c1&quot;&gt;; Jump to branch &amp;#39;7&amp;#39;.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;movzx&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;byte&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;rdi&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;; Fetch 8-bit from data + r8 into r9.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; Sum acc with r9. Accumulate carry.&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;; Fold 64-bit into 16-bit.&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;7:&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;mov&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;; Assign acc to r9.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;shr&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Shift r9 32-bit. Stores higher part of acc.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0x00000000ffffffff&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;; Clear out higher-part of rax. Stores lower part of acc.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;eax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9d&lt;/span&gt;                &lt;span class=&quot;c1&quot;&gt;; 32-bit sum of acc and r9.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;eax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;; Sum carry to acc.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;mov&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;eax&lt;/span&gt;                &lt;span class=&quot;c1&quot;&gt;; Repeat for 16-bit.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;shr&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;eax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0x0000ffff&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;ax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;r9w&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;adc&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;ax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;; One&amp;#39;s complement.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;                     &lt;span class=&quot;c1&quot;&gt;; One&amp;#39;s complement of rax.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0xffff&lt;/span&gt;             &lt;span class=&quot;c1&quot;&gt;; Clear out higher part of rax.&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;; Epilogue.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;mov&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rsp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rbp&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;pop&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;rbp&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;; Return.&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;ret&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
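
&lt;p&gt;The folding and complement steps at the end of the listing are easy to check in isolation. What follows is a minimal shell sketch, not the code benchmarked below: it sums 16-bit words in network byte order and uses division and modulo in place of the shifts and masks of the assembly version. The function name &lt;code&gt;internet_checksum&lt;/code&gt; and the hex-byte argument format are assumptions made for illustration. The sample bytes are the numerical example from RFC 1071.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/usr/bin/env bash
# Internet checksum over bytes passed as two-digit hex arguments.
internet_checksum() {
    local sum=0 hi lo
    while [ "$#" -gt 0 ]; do
        hi=$1
        lo=${2:-00}    # Pad a trailing odd byte with zero.
        sum=$(( sum + 16#$hi * 256 + 16#$lo ))
        if [ "$#" -ge 2 ]; then shift 2; else shift; fi
    done
    # Fold the carries back in until the sum fits in 16 bits.
    while [ "$sum" -ge 65536 ]; do
        sum=$(( sum / 65536 + sum % 65536 ))
    done
    # One's complement of the 16-bit sum.
    printf "%04x\n" $(( 65535 - sum ))
}

internet_checksum 00 01 f2 03 f4 f5 f6 f7    # prints 220d
&lt;/code&gt;&lt;/pre&gt;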

&lt;p&gt;Benchmark results for several data sizes:&lt;/p&gt;

&lt;p&gt;Data size: 44 bytes&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Algorithm&lt;/th&gt;
      &lt;th&gt;Time per csum&lt;/th&gt;
      &lt;th&gt;Time per byte&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Generic&lt;/td&gt;
      &lt;td&gt;87.77 ns&lt;/td&gt;
      &lt;td&gt;1.99 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSE2&lt;/td&gt;
      &lt;td&gt;86.06 ns&lt;/td&gt;
      &lt;td&gt;1.96 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;AVX2&lt;/td&gt;
      &lt;td&gt;83.20 ns&lt;/td&gt;
      &lt;td&gt;1.89 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;New&lt;/td&gt;
      &lt;td&gt;52.10 ns&lt;/td&gt;
      &lt;td&gt;1.18 ns&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Data size: 550 bytes&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Algorithm&lt;/th&gt;
      &lt;th&gt;Time per csum&lt;/th&gt;
      &lt;th&gt;Time per byte&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Generic&lt;/td&gt;
      &lt;td&gt;1058.04 ns&lt;/td&gt;
      &lt;td&gt;1.92 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSE2&lt;/td&gt;
      &lt;td&gt;510.40 ns&lt;/td&gt;
      &lt;td&gt;0.93 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;AVX2&lt;/td&gt;
      &lt;td&gt;318.42 ns&lt;/td&gt;
      &lt;td&gt;0.58 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;New&lt;/td&gt;
      &lt;td&gt;270.79 ns&lt;/td&gt;
      &lt;td&gt;0.49 ns&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Data size: 1500 bytes&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Algorithm&lt;/th&gt;
      &lt;th&gt;Time per csum&lt;/th&gt;
      &lt;th&gt;Time per byte&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Generic&lt;/td&gt;
      &lt;td&gt;2910.10 ns&lt;/td&gt;
      &lt;td&gt;1.94 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSE2&lt;/td&gt;
      &lt;td&gt;991.04 ns&lt;/td&gt;
      &lt;td&gt;0.66 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;AVX2&lt;/td&gt;
      &lt;td&gt;664.98 ns&lt;/td&gt;
      &lt;td&gt;0.44 ns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;New&lt;/td&gt;
      &lt;td&gt;743.88 ns&lt;/td&gt;
      &lt;td&gt;0.50 ns&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;All in all, it has been a fun exercise. I have learned quite a lot about the Internet checksum algorithm. I also learned how loop unrolling can improve performance dramatically (more than I initially expected). I also found it very interesting how changing the context of a problem, in this case the target programming language, forces you to think about the problem in a different way and enables optimizations that were not possible before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt; (17th November 2022):&lt;/p&gt;

&lt;p&gt;Over the years I have been contacted about whether it would be possible to use the code of this post in other projects.&lt;/p&gt;

&lt;p&gt;The algorithms in this post follow the same logic described in &lt;a href=&quot;https://tools.ietf.org/html/rfc1071&quot;&gt;RFC 1071&lt;/a&gt; and follow-up RFCs, but adapted to larger word sizes (64 and 32 bit). There’s nothing novel about these algorithms. The RFCs already included reference implementations for different architectures.&lt;/p&gt;

&lt;p&gt;So the answer is yes, you can use the code in this post for your projects. Referring to this blog post is enough.&lt;/p&gt;

&lt;p&gt;Projects using (or inspired by) the code of this post:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The x64 asm implementation was eventually merged into &lt;a href=&quot;https://github.com/snabbco/snabb/pull/1275&quot;&gt;Snabb&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/ssharks&quot;&gt;Sjors Hettinga&lt;/a&gt; took inspiration from this post to implement a faster checksum algorithm for the &lt;a href=&quot;https://github.com/zephyrproject-rtos/zephyr/pull/51439#event-7662323468&quot;&gt;Zephyr RTOS&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Thu, 14 Jun 2018 06:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2018/06/14/fast-checksum-computation/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2018/06/14/fast-checksum-computation/</guid>
        
        <category>igalia</category>
        
        <category>networking</category>
        
        
      </item>
    
      <item>
        <title>The B4 network function</title>
        <description>
&lt;p&gt;Some time ago I started a series of blog posts about IPv6 and network namespaces. The purpose of those posts was to prepare the ground for covering a network function called &lt;strong&gt;B4&lt;/strong&gt; (&lt;em&gt;Basic Bridging BroadBand&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;The B4 network function is one of the main components of a &lt;strong&gt;lw4o6 architecture&lt;/strong&gt; (&lt;a href=&quot;https://tools.ietf.org/html/rfc7596&quot;&gt;RFC7596&lt;/a&gt;). The function runs within every &lt;strong&gt;CPE&lt;/strong&gt; (&lt;em&gt;Customer Premises Equipment&lt;/em&gt;, essentially a home router) of a carrier’s network. This function takes care of two things: 1) applying NAPT to the customer’s IPv4 traffic and 2) encapsulating it into IPv6. This is fundamental as lw4o6 proposes an IPv6-only network, which can still provide IPv4 services and connectivity. Besides lw4o6, the B4 function is also present in other architectures such as &lt;strong&gt;DS-Lite&lt;/strong&gt; or &lt;strong&gt;MAP-E&lt;/strong&gt;. In the case of lw4o6 the exact name of this function is lwB4. All these architectures rely on &lt;a href=&quot;https://en.wikipedia.org/wiki/Mapping_of_Address_and_Port&quot;&gt;A+P mapping&lt;/a&gt; techniques and are managed by the &lt;a href=&quot;https://datatracker.ietf.org/wg/softwire/about/&quot;&gt;Softwire WG&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The diagram below shows how a lw4o6 architecture works:&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2017/06/lw4o6.png&quot; title=&quot;lw4o6 chart&quot; alt=&quot;lw4o6 chart&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;lw4o6 chart&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;Packets arriving at the CPE from the customer (IPv4) are shown in red. Packets leaving the CPE to the carrier’s network are shown in blue (IPv6). The counterpart of a lwB4 function is the lwAFTR function, deployed at one of the border routers of the carrier’s network.&lt;/p&gt;

&lt;p&gt;In the article ‘&lt;a href=&quot;https://blogs.igalia.com/dpino/2017/06/05/dive-into-lw4o6/&quot;&gt;Dive into lw4o6&lt;/a&gt;’ I reviewed in detail how a lw4o6 architecture works. Please check out the article if you want to learn more.&lt;/p&gt;

&lt;p&gt;At Igalia we implemented a high-performance lwAFTR network function. This network function has been part of Snabb since at least 2015, and has kept evolving and getting merged back to Snabb through new releases. We kindly thank Deutsche Telekom for their support financing this project, as well as Juniper Networks, who also helped improve the status of Snabb’s lwAFTR.&lt;/p&gt;

&lt;p&gt;While we were developing the lwAFTR network function we tested it heavily through a wide range of tests: end-to-end tests, performance tests, soak tests, etc. However, on some occasions we had to diagnose potential bugs in real deployments. To do that, we needed the other major component of a lw4o6 architecture: the B4 network function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenWRT&lt;/strong&gt;, the Linux-based OS powering many home routers, features a &lt;a href=&quot;http://tools.ietf.org/html/draft-ietf-softwire-map-10&quot;&gt;MAP network function&lt;/a&gt; to help deploy MAP-E architectures. This function can also be used to implement a B4 for DS-Lite or lw4o6. With the invaluable help of my colleagues Adrián and Carlos López I managed to set up OpenWRT on a virtual machine with B4 enabled. However, I was not completely satisfied with the solution.&lt;/p&gt;

&lt;p&gt;That led me to explore another solution very much inspired by an excellent blog post from Marcel Wiget: &lt;a href=&quot;https://marcelwiget.wordpress.com/2015/11/30/lightweight-4over6-b4-client-in-linux-namespace/&quot;&gt;Lightweight 4over6 B4 Client in Linux Namespace&lt;/a&gt;. In this post Marcel describes how to build a B4 network function using standard Linux commands.&lt;/p&gt;

&lt;p&gt;Basically, a B4 function does 2 things:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;NAT44, which can be done with &lt;code&gt;iptables&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;IPv4-in-IPv6 tunneling, which can be done with &lt;code&gt;iproute2&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, Marcel’s B4 network function is isolated into its own network namespace. That’s less of a headache than installing and configuring a virtual machine.&lt;/p&gt;

&lt;p&gt;On the other hand, my deployment had an extra twist compared to a standard lw4o6 deployment. The lwAFTR I was trying to reach was somewhere out on the Internet, not within my ISP’s network. To make things worse, ISPs in Spain are not rolling out IPv6 yet so I needed to use an IPv6 tunnel broker, more precisely Hurricane Electric (In the article ‘&lt;a href=&quot;https://blogs.igalia.com/dpino/2016/04/02/ipv6-tunnel/&quot;&gt;IPv6 tunnel&lt;/a&gt;’ I described how to set up such a tunnel).&lt;/p&gt;

&lt;p&gt;Basically my deployment looked like this:&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2018/02/b4-deployment.png&quot; title=&quot;lwB4-lwAFTR over Internet&quot; alt=&quot;lwB4-lwAFTR over Internet&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;lwB4-lwAFTR over Internet&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;After scratching my head for several days I came up with the following script: &lt;a href=&quot;https://gist.github.com/dpino/3eab3ab7b175d9d28a7814ce4e7bccb3&quot;&gt;b4-to-aftr-over-inet.sh&lt;/a&gt;. I break it down into pieces below for better comprehension.&lt;/p&gt;

&lt;p&gt;Warning: The script requires a Hurricane Electric tunnel up and running in order to work.&lt;/p&gt;

&lt;p&gt;Our B4 would have the following provisioned data:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;B4 IPv6: IPv6 address provided by Hurricane Electric.&lt;/li&gt;
  &lt;li&gt;B4 IPv4: &lt;code&gt;192.0.2.1&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;B4 port-range: &lt;code&gt;4096-8191&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The address of the AFTR is &lt;code&gt;2001:DB8::0001&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Given these settings our B4 is ready to match the following softwire in the lwAFTR’s binding table:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;softwire {
    ipv4 192.0.2.1;
    psid 1;
    b4-ipv6 IFHE (See below);
    br-address 2001:DB8::0001;
    port-set {
        psid-length 4;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In case of doubt about how softwires work, please check ‘&lt;a href=&quot;https://blogs.igalia.com/dpino/2017/06/05/dive-into-lw4o6/&quot;&gt;Dive into lw4o6&lt;/a&gt;’.&lt;/p&gt;
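
&lt;p&gt;As a sanity check, the port range that belongs to a PSID can be derived from the PSID and its length in bits. The sketch below assumes the A+P layout without reserved offset bits, where the PSID occupies the top bits of the 16-bit port number. &lt;code&gt;psid_port_range&lt;/code&gt; is a hypothetical helper written for this example, not part of Snabb.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/usr/bin/env bash
# Print the port range owned by a PSID (assumes no reserved offset bits).
psid_port_range() {
    local psid=$1 psid_length=$2
    local ports_per_set=$(( 65536 / (2 ** psid_length) ))
    local first=$(( psid * ports_per_set ))
    echo "${first}-$(( first + ports_per_set - 1 ))"
}

psid_port_range 1 4    # prints 4096-8191
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With a 4-bit PSID there are 16 port sets of 4096 ports each, and PSID 1 maps to the &lt;code&gt;4096-8191&lt;/code&gt; range provisioned for this B4.&lt;/p&gt;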

&lt;pre&gt;&lt;code&gt;IPHT=&quot;fd24:f64b:aca9:e498::1&quot;
IPNS=&quot;fd24:f64b:aca9:e498::2&quot;
CID=64
IFHT=&quot;veth9&quot;
IFNS=&quot;vpeer9&quot;
IFHE=&quot;sit1&quot;
NS=&quot;ns-b4&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Definition of several constants. &lt;code&gt;IPHT&lt;/code&gt; and &lt;code&gt;IPNS&lt;/code&gt; stand for &lt;em&gt;IP host&lt;/em&gt; and &lt;em&gt;IP namespace&lt;/em&gt;. Our script will create a network namespace, which requires a &lt;em&gt;veth pair&lt;/em&gt; to connect the namespace with the host. &lt;code&gt;IPHT&lt;/code&gt; is a ULA address for the host side, while &lt;code&gt;IPNS&lt;/code&gt; is a ULA address for the network namespace side. Likewise, &lt;code&gt;IFHT&lt;/code&gt; and &lt;code&gt;IFNS&lt;/code&gt; are the interface names for the host and namespace sides respectively.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;IFHE&lt;/code&gt; is the interface of the Hurricane Electric IPv6-in-IPv4 tunnel. We will use the IPv6 address of this interface as IPv6 source address of the B4.&lt;/p&gt;
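
&lt;p&gt;ULA addresses live in the &lt;code&gt;fc00::/7&lt;/code&gt; block, so the two veth addresses can be told apart from global ones by their first group. The following is a small sketch, assuming the first group is written with all four hex digits. &lt;code&gt;is_ula&lt;/code&gt; is a hypothetical helper and not part of the original script.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/usr/bin/env bash
# Succeed when the address starts with fcxx: or fdxx: (fc00::/7).
is_ula() {
    case "$1" in
        [fF][cCdD][0-9a-fA-F][0-9a-fA-F]:*) return 0 ;;
        *) return 1 ;;
    esac
}

for addr in "fd24:f64b:aca9:e498::1" "fd24:f64b:aca9:e498::2" "2001:DB8::0001"; do
    if is_ula "$addr"; then
        echo "$addr is a ULA"
    else
        echo "$addr is not a ULA"
    fi
done
&lt;/code&gt;&lt;/pre&gt;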

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;AFTR_IPV6=&quot;2001:DB8::0001&quot;
IP=&quot;192.0.2.1&quot;
PORTRANGE=&quot;4096-8191&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Softwire-related constants, as described above.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Reset everything
ip li del dev &quot;${IFHT}&quot; &amp;amp;&amp;gt;/dev/null
ip netns del &quot;${NS}&quot; &amp;amp;&amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Removes the namespace and the host-side interface if they exist.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Create a network namespace and enable loopback on it
ip netns add &quot;${NS}&quot;
ip netns exec &quot;${NS}&quot; ip li set dev lo up

# Create the veth pair and move one of the ends to the NS.
ip li add name &quot;${IFHT}&quot; type veth peer name &quot;${IFNS}&quot;
ip li set dev &quot;${IFNS}&quot; netns &quot;${NS}&quot;

# Configure interface ${IFHT} on the host
ip -6 addr add &quot;${IPHT}/${CID}&quot; dev &quot;${IFHT}&quot;
ip li set dev &quot;${IFHT}&quot; up

# Configure interface ${IFNS} on the network namespace.
ip netns exec &quot;${NS}&quot; ip -6 addr add &quot;${IPNS}/${CID}&quot; dev &quot;${IFNS}&quot;
ip netns exec &quot;${NS}&quot; ip li set dev &quot;${IFNS}&quot; up
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The commands above set up the basics of the network namespace. A network namespace is created together with a veth pair, that is, two connected virtual interfaces (think of a patch cable). Each of the veth ends is assigned a private IPv6 address (ULA). One of the ends of the veth pair is moved into the network namespace while the other remains on the host side. In case of doubt, please check this other article I wrote about &lt;a href=&quot;https://blogs.igalia.com/dpino/2016/04/10/network-namespaces/&quot;&gt;network namespaces&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Create IPv4-in-IPv6 tunnel.
ip netns exec &quot;${NS}&quot; ip -6 tunnel add b4tun mode ipip6 local &quot;${IPNS}&quot; remote &quot;${IPHT}&quot; dev &quot;${IFNS}&quot;
ip netns exec &quot;${NS}&quot; ip addr add 10.0.0.1 dev b4tun
ip netns exec &quot;${NS}&quot; ip link set dev b4tun up
# All IPv4 packets go through the tunnel.
ip netns exec &quot;${NS}&quot; ip route add default dev b4tun
# Make ${IFNS} the default gw.
ip netns exec &quot;${NS}&quot; ip -6 route add default dev &quot;${IFNS}&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;From the B4 we will send IPv4 packets that will get encapsulated into IPv6. These packets will eventually leave the host via the Hurricane Electric tunnel. What we do here is create an IPv4-in-IPv6 tunnel (ipip6) called &lt;code&gt;b4tun&lt;/code&gt;. The tunnel has two ends: &lt;code&gt;IPNS&lt;/code&gt; and &lt;code&gt;IPHT&lt;/code&gt;. All IPv4 traffic started from the network namespace gets routed through &lt;code&gt;b4tun&lt;/code&gt;, so it gets encapsulated. If the traffic is native IPv6 traffic it doesn’t need to get encapsulated, thus it’s simply forwarded to &lt;code&gt;IFNS&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Adjust MTU size.
ip netns exec &quot;${NS}&quot; ip li set mtu 1252 dev b4tun
ip netns exec &quot;${NS}&quot; ip li set mtu 1300 dev &quot;${IFNS}&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since packets leaving the CPE get encapsulated into IPv6, we need to make room for the extra header bytes that grow the packet size. Network interfaces normally have a default MTU of 1500 bytes. That’s why we artificially reduce the MTU of both the &lt;code&gt;b4tun&lt;/code&gt; and &lt;code&gt;vpeer9&lt;/code&gt; interfaces to a number lower than 1500. A lower MTU also lowers the MSS (&lt;em&gt;Maximum Segment Size&lt;/em&gt;) advertised by TCP connections started from the namespace, which has an effect similar to MSS Clamping.&lt;/p&gt;
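
&lt;p&gt;The two MTU values are related. Here is a sketch of the arithmetic, assuming the 1252 bytes of &lt;code&gt;b4tun&lt;/code&gt; come from subtracting the 40-byte IPv6 header plus the 8-byte tunnel encapsulation limit option that Linux IPv6 tunnels add by default. This is my reading of the numbers, not something the script states, and &lt;code&gt;tunnel_mtu&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/usr/bin/env bash
IPV6_HEADER=40          # Fixed IPv6 header size in bytes.
ENCAP_LIMIT_OPTION=8    # Assumed: destination option added by the tunnel.

# Print the largest IPv4 packet that fits in a link of the given MTU.
tunnel_mtu() {
    echo $(( $1 - IPV6_HEADER - ENCAP_LIMIT_OPTION ))
}

tunnel_mtu 1300    # prints 1252
&lt;/code&gt;&lt;/pre&gt;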

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# NAT44.
ip netns exec &quot;${NS}&quot; iptables -t nat --flush
ip netns exec &quot;${NS}&quot; iptables -t nat -A POSTROUTING -p tcp  -o b4tun -j SNAT --to $IP:$PORTRANGE
ip netns exec &quot;${NS}&quot; iptables -t nat -A POSTROUTING -p udp  -o b4tun -j SNAT --to $IP:$PORTRANGE
ip netns exec &quot;${NS}&quot; iptables -t nat -A POSTROUTING -p icmp -o b4tun -j SNAT --to $IP:$PORTRANGE
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Outgoing IPv4 packets leaving the B4 get their IPv4 source address and port source-NATed. The block of code above flushes the NAT table of iptables and creates new Source NAT rules for TCP, UDP and ICMP.&lt;/p&gt;
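
&lt;p&gt;When inspecting a packet capture it helps to confirm that the translated source ports really fall inside the provisioned range. Below is a minimal sketch of that check. &lt;code&gt;port_in_range&lt;/code&gt; is a hypothetical helper and the sample ports are made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/usr/bin/env bash
PORTRANGE="4096-8191"

# Succeed when the port lies within a FIRST-LAST range.
port_in_range() {
    local port=$1 range=$2
    local first=${range%-*} last=${range#*-}
    [ "$port" -ge "$first" ] || return 1
    [ "$port" -le "$last" ]
}

for port in 4095 4096 8191 8192; do
    if port_in_range "$port" "$PORTRANGE"; then
        echo "$port: inside $PORTRANGE"
    else
        echo "$port: outside $PORTRANGE"
    fi
done
&lt;/code&gt;&lt;/pre&gt;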

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Enable forwarding and IPv6 NAT
sysctl -w net.ipv6.conf.all.forwarding=1
ip6tables -t nat --flush
# Packets coming into the veth pair in the host side, change their destination address to AFTR.
ip6tables -t nat -A PREROUTING  -i &quot;${IFHT}&quot; -j DNAT --to-destination &quot;${AFTR_IPV6}&quot;
# Outgoing packets change their source address to HE Client address (B4 address).
ip6tables -t nat -A POSTROUTING -o &quot;${IFHE}&quot; -j MASQUERADE
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Outgoing packets leaving our host need to get their source address masqueraded to the IPv6 address assigned to the interface of the Hurricane Electric tunnel endpoint. Likewise, anything that comes into the host should seem to arrive from the lwAFTR, when actually its origin address is the IPv6 address of the other end of the Hurricane Electric tunnel. To overcome this problem I applied NAT66 on both the source and the destination address. Could this be done in a different way, skipping the controversial NAT66? I’m not sure. I think the veth pairs need to be assigned private addresses, so the only way to get the packets routed through the Internet is with NAT66.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Get into NS.
bash=/run/current-system/sw/bin/bash
ip netns exec ${NS} ${bash} --rcfile &amp;lt;(echo &quot;PS1=\&quot;${NS}&amp;gt; \&quot;&quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The last step gets us into the network namespace, from which we will be able to run commands constrained to the environment created in the previous steps.&lt;/p&gt;

&lt;p&gt;I don’t know how useful or reusable this script can be, but in hindsight coming up with this complex setting helped me learn several Linux networking tools. I think I could never have figured all this out without the help and support from my colleagues as well as the guidance from Marcel’s original script and blog post.&lt;/p&gt;
</description>
        <pubDate>Thu, 15 Feb 2018 06:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2018/02/15/the-b4-network-function/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2018/02/15/the-b4-network-function/</guid>
        
        <category>igalia</category>
        
        <category>networking</category>
        
        
      </item>
    
      <item>
        <title>More practical Snabb</title>
        <description>
&lt;p&gt;Some time ago, in a &lt;a href=&quot;https://news.ycombinator.com/item?id=7250505&quot;&gt;Hacker News thread&lt;/a&gt; a user proposed the following use case for Snabb:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I have a ChromeCast on my home network, but I want sandbox/log its traffic. I would want to write some logic to ignore video data, because that’s big. 
But I want to see the metadata and which servers it’s talking to. I want to see when it’s auto-updating itself with new binaries and record them.&lt;/p&gt;

  &lt;p&gt;Is that a good use case for Snabb Switch, or is there is an easier way to accomplish what I want?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I decided to take this request and implement it as a tutorial. Hopefully, the resulting tutorial can be a valuable piece of information highlighting some of Snabb’s strengths:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Fine-grained control of the data-plane.&lt;/li&gt;
  &lt;li&gt;Wide variety of solid libraries for protocol parsing.&lt;/li&gt;
  &lt;li&gt;Rapid prototyping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limiting the project’s scope&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before diving into this project, I broke it down into smaller pieces and checked how much of it was already supported in Snabb. To fully implement this project I’d need:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;To discover Chromecast devices.&lt;/li&gt;
  &lt;li&gt;To identify their network flows.&lt;/li&gt;
  &lt;li&gt;To save the data to disk.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Snabb provides libraries to identify network flows, as well as to capture packets and filter them by content. That pretty much covers items 2) and 3). However, Snabb doesn’t provide any tool or library to fully support item 1). Thus, I’m going to limit the scope of this tutorial to that single feature: &lt;strong&gt;Discover Chromecast and similar devices in a local network&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multicast DNS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A quick look at Chromecast’s Wikipedia article reveals that Chromecast devices rely on a protocol called &lt;strong&gt;Multicast DNS&lt;/strong&gt; (mDNS).&lt;/p&gt;

&lt;p&gt;Multicast DNS is standardized as &lt;a href=&quot;https://tools.ietf.org/html/rfc6762&quot;&gt;RFC6762&lt;/a&gt;. The origin of the protocol goes back to &lt;strong&gt;Apple’s Rendezvous&lt;/strong&gt;, later rebranded as &lt;a href=&quot;https://en.wikipedia.org/wiki/Bonjour_(software)&quot;&gt;Bonjour&lt;/a&gt;. Bonjour is in fact the origin of the more generic concept known as Zeroconf. Zeroconf’s goal is to automatically create usable TCP/IP computer networks when computers or network peripherals are interconnected. It is composed of three main elements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Addressing&lt;/strong&gt;: Self-Assigned Link-Local Addressing (&lt;a href=&quot;https://tools.ietf.org/html/rfc2462&quot;&gt;RFC2462&lt;/a&gt; and &lt;a href=&quot;https://tools.ietf.org/html/rfc3927&quot;&gt;RFC3927&lt;/a&gt;). For IPv4, addresses are automatically assigned in the 169.254.0.0/16 range.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Naming&lt;/strong&gt;: Multicast DNS (&lt;a href=&quot;https://tools.ietf.org/html/rfc6762&quot;&gt;RFC6762&lt;/a&gt;). Host name resolution.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Browsing&lt;/strong&gt;: DNS Service Discovery (&lt;a href=&quot;https://tools.ietf.org/html/rfc6763&quot;&gt;RFC6763&lt;/a&gt;). The ability to discover devices and services in a local network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multicast DNS and DNS-SD are very similar and are often mixed up, although they are not strictly the same thing. The former describes how to do name resolution in a serverless DNS network, while DNS-SD, although a protocol as well, is a specific use of Multicast DNS.&lt;/p&gt;

&lt;p&gt;One of the nicest things about Multicast DNS is that it reuses many of the concepts of DNS. This allowed mDNS to spread quickly and gain fast adoption, since existing software only required minimal changes. What’s more, programmers didn’t need to learn new APIs or study a brand-new protocol.&lt;/p&gt;

&lt;p&gt;Today Multicast DNS is featured in a myriad of small devices, ranging from Google Chromecast to Amazon’s FireTV or Philips Hue lights, as well as software such as Apple’s Bonjour or Spotify.&lt;/p&gt;

&lt;p&gt;This tutorial is going to focus pretty much on mDNS/DNS-SD. Since Multicast DNS reuses many of the ideas of DNS, I am going to review DNS first. Feel free to skip the next section if you are already familiar with DNS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DNS basics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most common use case of DNS is resolving host names to IP addresses:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ dig igalia.com -t A +short
&lt;span class=&quot;m&quot;&gt;91&lt;/span&gt;.117.99.155&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In the command above, the flag ‘-t A’ requests an &lt;em&gt;Address record&lt;/em&gt;. There are actually many different types of DNS records. The most common ones are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;A&lt;/strong&gt; (&lt;em&gt;Address record&lt;/em&gt;). Used to map hostnames to IPv4 addresses.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;AAAA&lt;/strong&gt; (&lt;em&gt;IPv6 address record&lt;/em&gt;). Used to map hostnames to IPv6 addresses.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;PTR&lt;/strong&gt; (&lt;em&gt;Pointer record&lt;/em&gt;). Used for reverse DNS lookups, that is, mapping IP addresses to hostnames.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;SOA&lt;/strong&gt; (&lt;em&gt;Start of zone of authority&lt;/em&gt;). DNS can be seen as a distributed database organized as a hierarchy of subdomains. A DNS zone is a contiguous portion of the domain space for which a server is responsible. The top-level DNS zone is known as the &lt;strong&gt;DNS root zone&lt;/strong&gt;, which is served by 13 logical &lt;a href=&quot;https://en.wikipedia.org/wiki/Root_name_server&quot;&gt;root name servers&lt;/a&gt; (although there are more than 13 instances) and contains the &lt;strong&gt;top-level domains&lt;/strong&gt;: &lt;strong&gt;generic top-level domains&lt;/strong&gt; (.com, .net, etc.) and &lt;strong&gt;country code top-level domains&lt;/strong&gt;. The command below prints out how the domain www.google.com gets resolved (I trimmed down the output for the sake of clarity).&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ dig @8.8.8.8 www.google.com +trace

&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &amp;lt;&amp;lt;&amp;gt;&amp;gt; DiG &lt;span class=&quot;m&quot;&gt;9&lt;/span&gt;.10.3-P4-Ubuntu &amp;lt;&amp;lt;&amp;gt;&amp;gt; @8.8.8.8 www.google.com +trace
&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt; server found&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt; global options: +cmd
.                       &lt;span class=&quot;m&quot;&gt;181853&lt;/span&gt;  IN      NS      k.root-servers.net.
.                       &lt;span class=&quot;m&quot;&gt;181853&lt;/span&gt;  IN      NS      g.root-servers.net.
.                       &lt;span class=&quot;m&quot;&gt;181853&lt;/span&gt;  IN      NS      j.root-servers.net.
.                       &lt;span class=&quot;m&quot;&gt;181853&lt;/span&gt;  IN      RRSIG   NS &lt;span class=&quot;m&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;518400&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;20180117170000&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;20180104160000&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;41824&lt;/span&gt; ....
&lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt; Received &lt;span class=&quot;m&quot;&gt;525&lt;/span&gt; bytes from &lt;span class=&quot;m&quot;&gt;8&lt;/span&gt;.8.8.8#53&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;8&lt;/span&gt;.8.8.8&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;48&lt;/span&gt; ms

com.                    &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      j.gtld-servers.net.
com.                    &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      k.gtld-servers.net.
com.                    &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      l.gtld-servers.net.
com.                    &lt;span class=&quot;m&quot;&gt;86400&lt;/span&gt;   IN      RRSIG   DS &lt;span class=&quot;m&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;86400&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;20180118170000&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;20180105160000&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;41824&lt;/span&gt; ...
&lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt; Received &lt;span class=&quot;m&quot;&gt;1174&lt;/span&gt; bytes from &lt;span class=&quot;m&quot;&gt;199&lt;/span&gt;.7.83.42#53&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;l.root-servers.net&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;44&lt;/span&gt; ms

google.com.             &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      ns2.google.com.
google.com.             &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      ns1.google.com.
google.com.             &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      ns3.google.com.
google.com.             &lt;span class=&quot;m&quot;&gt;172800&lt;/span&gt;  IN      NS      ns4.google.com.

&lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt; Received &lt;span class=&quot;m&quot;&gt;664&lt;/span&gt; bytes from &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.26.92.30#53&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;c.gtld-servers.net&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;44&lt;/span&gt; ms

www.google.com.         &lt;span class=&quot;m&quot;&gt;300&lt;/span&gt;     IN      A       &lt;span class=&quot;m&quot;&gt;216&lt;/span&gt;.58.201.132
&lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt; Received &lt;span class=&quot;m&quot;&gt;48&lt;/span&gt; bytes from &lt;span class=&quot;m&quot;&gt;216&lt;/span&gt;.239.32.10#53&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;ns1.google.com&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;48&lt;/span&gt; ms&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The domain name is resolved part by part. First the root zone is consulted, which returns a list of root name servers. The root server &lt;em&gt;l.root-servers.net&lt;/em&gt; gets consulted to resolve the top-level domain &lt;em&gt;.com&lt;/em&gt;, and returns a list of generic top-level domain name servers. Name server &lt;em&gt;c.gtld-servers.net&lt;/em&gt; is picked and returns another list of name servers for &lt;em&gt;google.com&lt;/em&gt;. Finally &lt;em&gt;www.google.com&lt;/em&gt; gets resolved by &lt;em&gt;ns1.google.com&lt;/em&gt;, which returns the A record containing the IPv4 address of the domain name.&lt;/p&gt;

&lt;p&gt;Using DNS it is also possible to resolve an IP address to a domain name.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ dig -x &lt;span class=&quot;m&quot;&gt;8&lt;/span&gt;.8.4.4 +short
google-public-dns-b.google.com.&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this case, the record type is &lt;strong&gt;PTR&lt;/strong&gt;. The command above is equivalent to:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ dig &lt;span class=&quot;m&quot;&gt;4&lt;/span&gt;.4.8.8.in-addr.arpa -t PTR +short
google-public-dns-b.google.com.&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;When using PTR records for reverse lookups, the target IPv4 address has to be part of the domain &lt;em&gt;in-addr.arpa&lt;/em&gt;. This is a special domain registered under the top-level domain &lt;em&gt;arpa&lt;/em&gt; and it’s used for reverse IPv4 lookup. Reverse lookup is the most common use of PTR records, but in fact PTR records are just pointers to a canonical name and other uses are possible, as we will see later.&lt;/p&gt;
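
&lt;p&gt;As an illustration of how the reverse lookup name is formed, here is a minimal Python sketch (Python rather than the Lua used by Snabb, only to keep the example self-contained). The function name &lt;em&gt;reverse_name&lt;/em&gt; is hypothetical and is not part of any tool mentioned in this post:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import ipaddress


def reverse_name(ipv4):
    # Reverse the four octets and append the special in-addr.arpa domain.
    octets = ipv4.split('.')
    return '.'.join(reversed(octets)) + '.in-addr.arpa'


name = reverse_name('8.8.4.4')
print(name)  # 4.4.8.8.in-addr.arpa

# Cross-check against the standard library helper.
assert name == ipaddress.ip_address('8.8.4.4').reverse_pointer
&lt;/code&gt;&lt;/pre&gt;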

&lt;p&gt;Summarizing:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DNS helps resolve a host name to an IP address. Other types of record resolution are possible.&lt;/li&gt;
  &lt;li&gt;DNS is a server-based protocol: dedicated DNS servers respond to DNS queries.&lt;/li&gt;
  &lt;li&gt;DNS names are grouped in zones or domains, forming a hierarchical structure. Each SOA is responsible for the name resolution within its area.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DNS Service Discovery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike DNS, Multicast DNS doesn’t require a central server. Instead, devices listen on UDP port &lt;strong&gt;5353&lt;/strong&gt; for DNS queries sent to a multicast address. In the case of IPv4, this destination address is &lt;strong&gt;224.0.0.251&lt;/strong&gt;. In addition, the destination Ethernet address of an mDNS request must be &lt;strong&gt;01:00:5E:00:00:FB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Multicast DNS standard defines the domain name &lt;em&gt;local&lt;/em&gt; as a &lt;strong&gt;pseudo-TLD&lt;/strong&gt; (top-level domain) under which hosts and services can register. For instance, a laptop computer might answer to the name &lt;em&gt;mylaptop.local&lt;/em&gt; (replace &lt;em&gt;mylaptop&lt;/em&gt; with your actual laptop’s name).&lt;/p&gt;
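
&lt;p&gt;The Ethernet address is not arbitrary: IPv4 multicast addresses map to Ethernet addresses by taking the fixed prefix 01:00:5e and appending the low 23 bits of the IP address. The following Python sketch shows that mapping; the function name &lt;em&gt;multicast_mac&lt;/em&gt; is made up for this example and the input is assumed to be a valid dotted-quad multicast address:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def multicast_mac(ipv4):
    # Assumption: ipv4 is a dotted-quad address in the multicast range.
    octets = [int(part) for part in ipv4.split('.')]
    # Fixed prefix 01:00:5e, then the low 23 bits of the IPv4 address:
    # the top bit of the second octet is dropped (modulo 128).
    mac = [0x01, 0x00, 0x5e, octets[1] % 128, octets[2], octets[3]]
    return ':'.join('{:02x}'.format(byte) for byte in mac)


print(multicast_mac('224.0.0.251'))  # 01:00:5e:00:00:fb
&lt;/code&gt;&lt;/pre&gt;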

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ dig @224.0.0.251 -p &lt;span class=&quot;m&quot;&gt;5353&lt;/span&gt; mylaptop.local. +short
&lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.0.13&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To discover all the services and devices in a local network, DNS-SD sends a PTR Multicast DNS request asking for the domain name &lt;em&gt;_services._dns-sd._udp.local&lt;/em&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ dig @224.0.0.251 -p &lt;span class=&quot;m&quot;&gt;5353&lt;/span&gt; -t PTR _services._dns-sd._udp.local&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The expected result should be a set of PTR records announcing their domain name. In my case the dig command doesn’t print out any PTR records, but using &lt;em&gt;tcpdump&lt;/em&gt; I can check I’m in fact receiving mDNS responses:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ sudo tcpdump &lt;span class=&quot;s2&quot;&gt;&amp;quot;port 5353&amp;quot;&lt;/span&gt; -t -qns &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt; -e -i wlp3s0
tcpdump: verbose output suppressed, use -v or -vv &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; full protocol decode
listening on wlp3s0, link-type EN10MB &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;Ethernet&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;, capture size &lt;span class=&quot;m&quot;&gt;262144&lt;/span&gt; bytes
&lt;span class=&quot;m&quot;&gt;44&lt;/span&gt;:85:00:4f:b8:fc &amp;gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;:00:5e:00:00:fb, IPv4, length &lt;span class=&quot;m&quot;&gt;99&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.30.58722 &amp;gt; &lt;span class=&quot;m&quot;&gt;224&lt;/span&gt;.0.0.251.5353: UDP, length &lt;span class=&quot;m&quot;&gt;57&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;54&lt;/span&gt;:60:09:fc:d6:04 &amp;gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;:00:5e:00:00:fb, IPv4, length &lt;span class=&quot;m&quot;&gt;82&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.57.5353 &amp;gt; &lt;span class=&quot;m&quot;&gt;224&lt;/span&gt;.0.0.251.5353: UDP, length &lt;span class=&quot;m&quot;&gt;40&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;54&lt;/span&gt;:60:09:fc:d6:04 &amp;gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;:00:5e:00:00:fb, IPv4, length &lt;span class=&quot;m&quot;&gt;299&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.57.5353 &amp;gt; &lt;span class=&quot;m&quot;&gt;224&lt;/span&gt;.0.0.251.5353: UDP, length &lt;span class=&quot;m&quot;&gt;257&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;54&lt;/span&gt;:60:09:fc:d6:04 &amp;gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;:00:5e:00:00:fb, IPv4, length &lt;span class=&quot;m&quot;&gt;119&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.57.5353 &amp;gt; &lt;span class=&quot;m&quot;&gt;224&lt;/span&gt;.0.0.251.5353: UDP, length &lt;span class=&quot;m&quot;&gt;77&lt;/span&gt;
f4:f5:d8:d3:de:dc &amp;gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;:00:5e:00:00:fb, IPv4, length &lt;span class=&quot;m&quot;&gt;299&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.61.5353 &amp;gt; &lt;span class=&quot;m&quot;&gt;224&lt;/span&gt;.0.0.251.5353: UDP, length &lt;span class=&quot;m&quot;&gt;257&lt;/span&gt;
f4:f5:d8:d3:de:dc &amp;gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;:00:5e:00:00:fb, IPv4, length &lt;span class=&quot;m&quot;&gt;186&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.61.5353 &amp;gt; &lt;span class=&quot;m&quot;&gt;224&lt;/span&gt;.0.0.251.5353: UDP, length &lt;span class=&quot;m&quot;&gt;144&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Why &lt;em&gt;dig&lt;/em&gt; doesn’t print out the PTR records is still a mystery to me. So instead of &lt;em&gt;dig&lt;/em&gt; I used &lt;strong&gt;Avahi&lt;/strong&gt;, the free software implementation of mDNS/DNS-SD, to browse the available devices:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ avahi-browse -a
+ wlp3s0 IPv4 dcad2b6c-7a21-10c310-568b-ad83b4a3ea1e          _googlezone._tcp     &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;
+ wlp3s0 IPv4 1ebe35f6-26f1-bc92-318c-9e35fdcbe11d          _googlezone._tcp     &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;
+ wlp3s0 IPv4 Google-Cast-Group-71010755f10ad16b10c231437a5e543d1dc3 _googlecast._tcp     &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;
+ wlp3s0 IPv4 Chromecast-Audio-fd7d2b9d29c92b24db10be10661010eebb9f _googlecast._tcp     &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;
+ wlp3s0 IPv4 Google-Home-d81d02e1e48a1f0b7d2cbac88f2df820  _googlecast._tcp     &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;
+ wlp3s0 IPv4 dcad2b6c7a2110c310-0                            _spotify-connect._tcp &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Each row identifies a service instance name. The structure of a service instance name is the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Service Instance Name = &amp;lt;Instance&amp;gt; . &amp;lt;Service&amp;gt; . &amp;lt;Domain&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, consider the following record &lt;em&gt;“_spotify-connect._tcp.local”&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Domain&lt;/strong&gt;: &lt;em&gt;local&lt;/em&gt;. The pseudo-TLD used by Multicast DNS.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Service&lt;/strong&gt;: &lt;em&gt;_spotify-connect._tcp&lt;/em&gt;. The service name consists of a pair of DNS labels. The first label identifies what the service does (&lt;em&gt;_spotify-connect&lt;/em&gt; is a service that allows a user to continue playing Spotify from a phone to a desktop computer, and vice versa). The second label identifies what protocol the service uses, in this case TCP.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Instance&lt;/strong&gt;: &lt;em&gt;dcad2b6c7a2110c310-0&lt;/em&gt;. A user-friendly name that identifies the service.&lt;/li&gt;
&lt;/ul&gt;
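
&lt;p&gt;The breakdown above can be sketched in a few lines of Python. This is only an illustration, assuming the domain is a single label (such as &lt;em&gt;local&lt;/em&gt;) and the service is always made of two labels; the function name &lt;em&gt;split_instance_name&lt;/em&gt; is hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def split_instance_name(name):
    # Assumption: single-label domain and a two-label service.
    # The instance part may itself contain dots, so split from the right.
    instance, app, proto, domain = name.rstrip('.').rsplit('.', 3)
    return instance, app + '.' + proto, domain


full_name = 'dcad2b6c7a2110c310-0._spotify-connect._tcp.local'
print(split_instance_name(full_name))
# ('dcad2b6c7a2110c310-0', '_spotify-connect._tcp', 'local')
&lt;/code&gt;&lt;/pre&gt;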

&lt;p&gt;Besides a PTR record, an instance also replies with several additional DNS records that might be useful for the requester. These extra records accompany the PTR record in the same response and are embedded in the DNS &lt;em&gt;additional records&lt;/em&gt; section. These extra records are of 3 types:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;SRV&lt;/strong&gt;: Gives the target host and port where the service instance can be reached.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;TXT&lt;/strong&gt;: Gives additional information about this instance, in a structured form using key/value pairs.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;: IPv4 address of the reached instance.&lt;/li&gt;
&lt;/ul&gt;
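
&lt;p&gt;To make the TXT format more concrete: on the wire, TXT record data is a sequence of strings, each prefixed by a one-byte length. The Python sketch below decodes such data into key/value pairs. Both function names are invented for this example, and the sample values are the ones printed for the Spotify instance later in this post:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def parse_txt(rdata):
    # Consume one length-prefixed string at a time until no bytes remain.
    pairs = {}
    while rdata:
        length = rdata[0]
        entry = rdata[1:1 + length].decode('utf-8')
        key, _, value = entry.partition('=')
        pairs[key] = value
        rdata = rdata[1 + length:]
    return pairs


def encode_txt(entries):
    # Helper for this example only: build TXT data from key=value strings.
    return b''.join(bytes([len(e)]) + e.encode('utf-8') for e in entries)


rdata = encode_txt(['CPath=/zc/0', 'VERSION=1.0', 'Stack=SP'])
print(parse_txt(rdata))
# {'CPath': '/zc/0', 'VERSION': '1.0', 'Stack': 'SP'}
&lt;/code&gt;&lt;/pre&gt;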

&lt;p&gt;&lt;strong&gt;Snabb’s DNS-SD&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we have a fair understanding of Multicast DNS and DNS-SD, we can start coding the app in Snabb. As in previous posts, I decided not to paste the code directly here; instead I’ve pushed the code to a &lt;a href=&quot;https://github.com/dpino/snabb/tree/dns-sd&quot;&gt;remote branch&lt;/a&gt; and will comment on the most relevant parts. To check out this repo do:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ git clone https://github.com/snabbco/snabb.git
$ cd snabb
$ git remote add dpino https://github.com/dpino/snabb.git
$ git fetch dpino
$ git checkout -b dns-sd dpino/dns-sd
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Highlights:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The app needs to send a DNS-SD packet through a network interface managed by the OS. I used Snabb’s &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/dnssd.lua#L159&quot;&gt;RawSocket app&lt;/a&gt; to do that.&lt;/li&gt;
  &lt;li&gt;A &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/dnssd.lua#L57&quot;&gt;DNSSD app&lt;/a&gt; emits one DNS-SD request every second. This is done in &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/dnssd.lua#L75&quot;&gt;DNSSD’s pull&lt;/a&gt; method. There’s a helper class called &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/lib/mdns_query.lua&quot;&gt;mDNSQuery&lt;/a&gt; that is in charge of composing this request.&lt;/li&gt;
  &lt;li&gt;The DNSSD app receives responses on its &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/dnssd.lua#L87&quot;&gt;push method&lt;/a&gt;. If the response is a Multicast DNS packet, it will print out all the contained DNS records to stdout.&lt;/li&gt;
  &lt;li&gt;A Multicast DNS packet is composed of a header and a body. The header contains control information such as number of queries, answers, additional records, etc. The body contains DNS records. If the mDNS packet is a response packet, these are the DNS records we would need to print out.&lt;/li&gt;
  &lt;li&gt;To help me handle Multicast DNS packets I created a &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/lib/mdns.lua&quot;&gt;MDNS helper class&lt;/a&gt;. Similarly, I added a &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/lib/dns.lua&quot;&gt;DNS helper class&lt;/a&gt; that helps me parse the necessary DNS records: PTR, SRV, TXT and A records.&lt;/li&gt;
&lt;/ul&gt;
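
&lt;p&gt;To give an idea of what composing a DNS-SD request involves, here is a rough Python sketch of the DNS payload of such a query. It is not the code of the mDNSQuery helper (which is written in Lua); the function names are hypothetical and the header values (ID 0, no flags, one question) are an assumption chosen for simplicity:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import struct


def encode_name(name):
    # Each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip('.').split('.')
    encoded = b''.join(
        bytes([len(label)]) + label.encode('ascii') for label in labels)
    return encoded + b'\x00'


def build_query(name, qtype=12, qclass=1):
    # Header fields: id, flags, questions, answers, authority, additional.
    header = struct.pack('!6H', 0, 0, 1, 0, 0, 0)
    # Question: encoded name, record type (12 is PTR) and class (1 is IN).
    question = encode_name(name) + struct.pack('!2H', qtype, qclass)
    return header + question


payload = build_query('_services._dns-sd._udp.local')
print(len(payload), payload.hex())
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The resulting 46 bytes are only the DNS part of the packet. They would still have to be wrapped in UDP, IPv4 and Ethernet headers addressed to port 5353, 224.0.0.251 and 01:00:5E:00:00:FB respectively.&lt;/p&gt;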

&lt;p&gt;Here is Snabb’s dns-sd command in use:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ sudo ./snabb dnssd --interface wlp3s0
PTR: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;name: _services._dns-sd._udp.local&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; domain-name: _spotify-connect._tcp &lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
SRV: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;target: dcad2b6c7a2110c310-0&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
TXT: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/zc/0&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VERSION&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;.0&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;Stack&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;SP&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
Address: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.55
PTR: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;name: _googlecast._tcp.local&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; domain-name: Chromecast-Audio-fd7d2b9d29c92b24db10be10661010eebb9f&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
SRV: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;target: 1ebe35f6-26f1-bc92-318c-9e35fdcbe11d&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
TXT: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;fd7d2b9d29c92b24db10be10661010eebb9f&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;cd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;224708C2E61AED24676383796588FF7E&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;rm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;8F2EE2757C6626CC&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;ve&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;05&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;md&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;Chromecast Audio&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;ic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/setup/icon.png&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;fn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;Jukebox&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;ca&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;2052&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;st&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;bs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;FA8FCA9E3FC2&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;nf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;rs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
Address: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.57&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Finally I’d like to share some tricks and practices I used when coding the app:&lt;/p&gt;

&lt;p&gt;1) I started small by capturing a DNS-SD request emitted from Avahi. Then I sent that very same packet from Snabb and checked that the response was a Multicast DNS packet:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ avahi-browse -a&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ sudo tcpdump -i wlp3s0 -w mdns.pcap&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Then open &lt;strong&gt;mdns.pcap&lt;/strong&gt; with Wireshark, mark the request packet only and save it to disk. Then use the &lt;strong&gt;od&lt;/strong&gt; command to dump the packet as text:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ od -j &lt;span class=&quot;m&quot;&gt;40&lt;/span&gt; -A x -tx1 mdns_request.pcap
&lt;span class=&quot;m&quot;&gt;000028&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; 5e &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; fb &lt;span class=&quot;m&quot;&gt;44&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;85&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; 4f b8 &lt;span class=&quot;nb&quot;&gt;fc&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;08&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;45&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;000038&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;55&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;32&lt;/span&gt; 7c &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11&lt;/span&gt; 8f 5a c0 a8 &lt;span class=&quot;m&quot;&gt;56&lt;/span&gt; 1e e0 &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;000048&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; fb e3 &lt;span class=&quot;m&quot;&gt;53&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;14&lt;/span&gt; e9 &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;41&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;89&lt;/span&gt; 9d &lt;span class=&quot;m&quot;&gt;25&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;85&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;000058&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;09&lt;/span&gt; 5f &lt;span class=&quot;m&quot;&gt;73&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;65&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;72&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;76&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;69&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;63&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;65&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;73&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;000068&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;07&lt;/span&gt; 5f &lt;span class=&quot;m&quot;&gt;64&lt;/span&gt; 6e &lt;span class=&quot;m&quot;&gt;73&lt;/span&gt; 2d &lt;span class=&quot;m&quot;&gt;73&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;64&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;04&lt;/span&gt; 5f &lt;span class=&quot;m&quot;&gt;75&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;64&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;70&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;05&lt;/span&gt; 6c 6f
&lt;span class=&quot;m&quot;&gt;000078&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;63&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;61&lt;/span&gt; 6c &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; 0c &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;01&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;29&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;10&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;000088&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This dumped packet can be copied raw into Snabb, as is done in &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/lib/mdns.lua#L93&quot;&gt;MDNS’s selftest&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;NOTE: the &lt;strong&gt;text2pcap&lt;/strong&gt; command is also a very convenient tool for converting a packet dumped in text format into a pcap file.&lt;/p&gt;

&lt;p&gt;2) Instead of sending requests on the wire to obtain responses, I saved a bunch of responses to a .pcap file and used the file as input for the DNS parser. In fact, the command supports a &lt;code&gt;--pcap&lt;/code&gt; flag that can be used to print out the DNS records found in a capture file.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ sudo ./snabb dnssd --pcap /home/dpino/avahi-browse.pcap
Reading from file: /home/dpino/avahi-browse.pcap
PTR: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;name: _services._dns-sd._udp.local&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; domain-name: _spotify-connect._tcp&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
PTR: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;name: &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; domain-name: dcad2b6c7a2110c310-0&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
SRV: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;target: dcad2b6c7a2110c310-0&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
TXT: &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/zc/0&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VERSION&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;.0&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;Stack&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;SP&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
Address: &lt;span class=&quot;m&quot;&gt;192&lt;/span&gt;.168.86.55
..._&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;3) When sending a packet to the wire, check that the packet’s header checksums are correct. Wireshark has a mode to verify whether a packet’s header checksums are correct or not, which is very convenient. Snabb comes with protocol libraries to calculate IP, TCP or UDP checksums. Check out &lt;a href=&quot;https://github.com/dpino/snabb/blob/dns-sd/src/program/dnssd/lib/mdns_query.lua#L167&quot;&gt;how mDNSQuery does it&lt;/a&gt;.&lt;/p&gt;
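
&lt;p&gt;For reference, the checksum carried in IP, TCP and UDP headers is the 16-bit ones’ complement sum described in RFC 1071. Here is a minimal sketch of that algorithm in plain Lua. It is not the code used by Snabb’s protocol libraries, and the function name &lt;em&gt;internet_checksum&lt;/em&gt; is made up for this example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;-- Sketch of the RFC 1071 Internet checksum over a byte string.
-- Plain Lua arithmetic only, so it does not depend on Snabb or LuaJIT.
local function internet_checksum(data)
   local sum = 0
   -- Add up the data as big-endian 16-bit words.
   for i = 1, #data - 1, 2 do
      local hi, lo = data:byte(i, i + 1)
      sum = sum + hi * 256 + lo
   end
   -- An odd trailing byte is padded with a zero byte on the right.
   if #data % 2 == 1 then
      sum = sum + data:byte(#data) * 256
   end
   -- Fold the carries back into the lower 16 bits.
   while sum &amp;gt; 0xffff do
      sum = (sum % 0x10000) + math.floor(sum / 0x10000)
   end
   -- The checksum is the ones' complement of the folded sum.
   return 0xffff - sum
end

-- Example bytes taken from RFC 1071.
local bytes = string.char(0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7)
print(string.format(&amp;quot;%04x&amp;quot;, internet_checksum(bytes)))  -- prints 220d&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Running the same function over data that already includes a correct checksum yields 0, which is a quick way to verify a header. Note that real TCP and UDP checksums also cover a pseudo-header, which this sketch leaves out.&lt;/p&gt;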

&lt;p&gt;&lt;strong&gt;Last thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Implementing this tool has helped me to understand DNS better, especially the Multicast DNS/DNS-SD part. I never expected it could be so interesting.&lt;/p&gt;

&lt;p&gt;Going from an idea to a working prototype with Snabb is really fast. It’s one of the advantages of user-space networking and one of the things I enjoy the most. That said, the resulting code turned out bigger than I initially expected. To avoid losing this work, I will try to land the DNS and mDNS libraries into Snabb.&lt;/p&gt;

&lt;p&gt;This post puts an end to this series of practical Snabb posts. I hope you found them as interesting as I found them enjoyable to write. Hopefully these posts will be useful in the future for anyone interested in user-space networking who wants to try out Snabb.&lt;/p&gt;
</description>
        <pubDate>Fri, 12 Jan 2018 06:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2018/01/12/more-practical-snabb/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2018/01/12/more-practical-snabb/</guid>
        
        <category>igalia</category>
        
        <category>networking</category>
        
        
      </item>
    
      <item>
        <title>Practical Snabb</title>
        <description>
&lt;p&gt;In a previous article I introduced Snabb, a toolkit for developing network functions. In this article I want to dive into some practical examples on how to use Snabb for network function programming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The elements of a network function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A network function is any program that does something with traffic data. There’s a certain set of operations that can be performed on any packet: &lt;strong&gt;reading&lt;/strong&gt;, &lt;strong&gt;modifying&lt;/strong&gt; (headers or payload), &lt;strong&gt;creating&lt;/strong&gt; (new packets), &lt;strong&gt;dropping&lt;/strong&gt; or &lt;strong&gt;forwarding&lt;/strong&gt;. Any network function is a combination of these primitives. For instance, a NAT function consists of packet header modification and forwarding.&lt;/p&gt;

&lt;p&gt;Some of the built-in network functions featured in Snabb are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;lwAFTR&lt;/strong&gt; (NAT, encap/decap): Implementation of the lwAFTR network function as specified in &lt;a href=&quot;https://tools.ietf.org/html/rfc7596&quot;&gt;RFC7596&lt;/a&gt;. lwAFTR is a NAT between IPv6 and IPv4 address+port.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;IPSEC&lt;/strong&gt; (processing): encryption of packet payloads using AES instructions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Snabbwall&lt;/strong&gt; (filtering): an &lt;a href=&quot;http://snabbwall.org/&quot;&gt;L7 firewall&lt;/a&gt; that relies on libnDPI for &lt;em&gt;Deep-Packet Inspection&lt;/em&gt;. It also allows L3/L4 filtering using &lt;em&gt;tcpdump&lt;/em&gt;-like expressions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world scenarios&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The downside of bypassing the kernel and taking full control of a NIC is that the NIC cannot be used by any other program. That means the network function run by Snabb acts as a black box: some traffic comes in, gets transformed and is pushed out through the same NIC (or any other NIC controlled by the network function). The advantage is clear: outstanding performance.&lt;/p&gt;

&lt;p&gt;For this reason Snabb is mostly used to develop network functions that run within the ISP’s network, where traffic load is expected to be high. An ISP can spare one or several NICs to run a network function alone since the results pay off (lower hardware costs, custom network function development, good performance, etc).&lt;/p&gt;

&lt;p&gt;Snabb might seem like a less attractive tool in other scenarios. However, that doesn’t mean it cannot be used to program network functions that run on a personal computer or in a less demanding network. Snabb has interfaces for Tap, raw socket and Unix socket programming, which allow Snabb to be used as a program managed by the kernel. In fact, using some of these interfaces is the best way to get started with Snabb if you don’t have natively supported hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building Snabb&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial I’ll cover two examples to help me illustrate how to use Snabb. But before proceeding with the examples, we need to download and build Snabb.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ git clone https://github.com/snabbco/snabb
$ &lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; snabb
$ make&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now we can run the snabb executable, which will print out a list of all the subprograms available:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ &lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; src/
$ sudo ./snabb
Usage: ./snabb &amp;lt;program&amp;gt; ...

This snabb executable has the following programs built &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt;:
  config
  example_replay
  example_spray
  firehose
  ...
  snsh
  wall

For detailed usage of any program run:
  snabb &amp;lt;program&amp;gt; --help

If you rename &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;or copy or symlink&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; this executable with one of
the names above &lt;span class=&quot;k&quot;&gt;then&lt;/span&gt; that program will be chosen automatically.&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;Hello world!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the simplest network functions to build is something that reads packets from a source, filters some of them and forwards the rest to an output. In this case I want to capture traffic from my browser (packets to HTTP or HTTPS). Here is what our &lt;em&gt;hello world!&lt;/em&gt; program looks like:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;#!./snabb snsh&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pcap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;apps.pcap.pcap&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapFilter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;apps.packet_filter.pcap_filter&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PcapFilter&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RawSocket&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;apps.socket.raw&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RawSocket&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iface&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;No listening interface&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fileout&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;output.pcap&amp;quot;&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;nic&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RawSocket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iface&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;filter&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapFilter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;tcp dst port 80 or dst port 443&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;writer&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pcap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PcapWriter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fileout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;nic.tx -&amp;gt; filter.input&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;filter.output -&amp;gt; writer.input&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;configure&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now save the script and run it:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ chmod +x http-filter.snabb 
$ sudo ./http-filter.snabb wlp3s0&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;While the script is running I open a few websites in my browser. Hopefully some packets will be captured into &lt;em&gt;output.pcap&lt;/em&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sudo&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tcpdump&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pcap&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.50062&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;54.239.17.7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;926&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ack&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;229&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;926&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HTTP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GET&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HTTP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.50062&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;54.239.17.7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[.],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ack&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;189&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;237&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.50062&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;54.239.17.7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[.],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ack&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;368&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;245&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.37346&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;93.184.220.29&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;370675941&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;29200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mss&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1460&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sackOK&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1370741706&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ecr&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wscale&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.37346&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;93.184.220.29&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[.],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ack&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2640726891&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;229&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1370741710&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ecr&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2287287426&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.37346&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;93.184.220.29&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;439&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ack&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;229&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1370741729&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ecr&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2287287426&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;439&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HTTP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;POST&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HTTP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IP&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sagan&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.37346&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;93.184.220.29&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Flags&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[.],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ack&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;789&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;win&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;251&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1370741733&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ecr&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2287287449&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Some highlights in this script:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The shebang line (&lt;em&gt;#!./snabb snsh&lt;/em&gt;) refers to Snabb’s shell (&lt;em&gt;snsh&lt;/em&gt;), one of the many subprograms available in Snabb. It allows us to run Snabb scripts, that is, Lua programs that have access to the Snabb environment (engine, apps, libraries, etc).&lt;/li&gt;
  &lt;li&gt;There’s a series of libraries that were not explicitly loaded: &lt;em&gt;config&lt;/em&gt;, &lt;em&gt;engine&lt;/em&gt;, &lt;em&gt;main&lt;/em&gt;, etc. These libraries are part of the Snabb environment and are automatically loaded in every program.&lt;/li&gt;
  &lt;li&gt;The network function instantiates 3 apps: &lt;strong&gt;RawSocket&lt;/strong&gt;, &lt;strong&gt;PcapFilter&lt;/strong&gt; and &lt;strong&gt;PcapWriter&lt;/strong&gt;, initializes them and pipes them together through links forming a graph. This graph is passed to the engine that executes it for 30 seconds.&lt;/li&gt;
&lt;/ul&gt;
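
&lt;p&gt;To make the idea of apps piped together through links a bit more concrete, here is a toy model in plain Lua. It only mimics the shape of the pipeline above (a source, a filter and a writer). The names &lt;em&gt;Source&lt;/em&gt;, &lt;em&gt;Filter&lt;/em&gt;, &lt;em&gt;Sink&lt;/em&gt; and &lt;em&gt;new_link&lt;/em&gt; are hypothetical, and this is not how Snabb’s engine is actually implemented:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;-- Toy model of an app graph: links are plain queues and apps move packets
-- between them. All names are hypothetical; this is not Snabb's engine.
local function new_link() return {} end

-- Source: pushes a fixed list of &amp;quot;packets&amp;quot; (here, just destination ports).
local function Source(ports, output)
   return function ()
      for _, port in ipairs(ports) do table.insert(output, port) end
   end
end

-- Filter: forwards only the packets accepted by the predicate.
local function Filter(accept, input, output)
   return function ()
      while #input &amp;gt; 0 do
         local packet = table.remove(input, 1)
         if accept(packet) then table.insert(output, packet) end
      end
   end
end

-- Sink: collects whatever arrives, like a writer app would.
local function Sink(input, captured)
   return function ()
      while #input &amp;gt; 0 do
         table.insert(captured, table.remove(input, 1))
      end
   end
end

local a, b, captured = new_link(), new_link(), {}
local apps = {
   Source({80, 53, 443, 22}, a),
   Filter(function (port) return port == 80 or port == 443 end, a, b),
   Sink(b, captured),
}
-- One pass of the toy &amp;quot;engine&amp;quot;: run every app in order.
for _, app in ipairs(apps) do app() end
print(table.concat(captured, &amp;quot; &amp;quot;))  -- prints 80 443&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In the toy model each app only knows about its own links, so swapping the filter or adding another stage does not affect the rest of the graph.&lt;/p&gt;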

&lt;p&gt;&lt;strong&gt;Martian packets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s continue with another example: a network function that manages a more complex set of rules to filter out traffic. Since there are more rules I will encapsulate the filtering logic into a custom app.&lt;/p&gt;

&lt;p&gt;The data we’re going to filter are &lt;a href=&quot;https://en.wikipedia.org/wiki/Martian_packet&quot;&gt;martian packets&lt;/a&gt;. According to Wikipedia, a martian packet is &lt;em&gt;“an IP packet seen on the public internet that contains a source or destination address that is reserved for special-use by Internet Assigned Numbers Authority (IANA)”&lt;/em&gt;. For instance, packets with &lt;a href=&quot;https://tools.ietf.org/html/rfc1918&quot;&gt;RFC1918&lt;/a&gt; addresses or multicast addresses seen on the public internet are martian packets.&lt;/p&gt;

&lt;p&gt;Unlike the previous example, I decided not to code this network function as a script, but as a program instead. The network function lives at &lt;em&gt;src/program/martian&lt;/em&gt;. I’ve pushed the final code to a &lt;a href=&quot;https://github.com/dpino/snabb/commits/martian-packets&quot;&gt;branch&lt;/a&gt; in my Snabb repository:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ git remote add dpino https://github.com/dpino/snabb.git
$ git fetch dpino
$ git checkout -b martian-packets dpino/martian-packets&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To run the app:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ sudo ./snabb martian program/martian/test/sample.pcap
link report:
   &lt;span class=&quot;m&quot;&gt;3&lt;/span&gt; sent on filter.output -&amp;gt; writer.input &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;loss rate: &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;%&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;m&quot;&gt;5&lt;/span&gt; sent on reader.output -&amp;gt; filter.input &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;loss rate: &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;%&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The function lets 3 out of the 5 packets in sample.pcap pass. These are the packets in the file:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ sudo tcpdump -qns &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt; -t -e -r program/martian/test/sample.pcap
reading from file program/martian/test/sample.pcap, link-type EN10MB &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;Ethernet&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;:00:01:00:00:00 &amp;gt; fe:ff:20:00:01:00, IPv4, length &lt;span class=&quot;m&quot;&gt;62&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;145&lt;/span&gt;.254.160.237.3372 &amp;gt; &lt;span class=&quot;m&quot;&gt;65&lt;/span&gt;.208.228.223.80: tcp &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;
fe:ff:20:00:01:00 &amp;gt; &lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;:00:01:00:00:00, IPv4, length &lt;span class=&quot;m&quot;&gt;62&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;65&lt;/span&gt;.208.228.223.80 &amp;gt; &lt;span class=&quot;m&quot;&gt;145&lt;/span&gt;.254.160.237.3372: tcp &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;00&lt;/span&gt;:00:01:00:00:00 &amp;gt; fe:ff:20:00:01:00, IPv4, length &lt;span class=&quot;m&quot;&gt;54&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;145&lt;/span&gt;.254.160.237.3372 &amp;gt; &lt;span class=&quot;m&quot;&gt;65&lt;/span&gt;.208.228.223.80: tcp &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;90&lt;/span&gt;:e2:ba:94:2a:bc &amp;gt; &lt;span class=&quot;m&quot;&gt;02&lt;/span&gt;:cf:69:15:81:01, IPv4, length &lt;span class=&quot;m&quot;&gt;242&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;10&lt;/span&gt;.0.1.100 &amp;gt; &lt;span class=&quot;m&quot;&gt;10&lt;/span&gt;.10.0.0: ICMP &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; reply, id &lt;span class=&quot;m&quot;&gt;1024&lt;/span&gt;, seq &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;, length &lt;span class=&quot;m&quot;&gt;208&lt;/span&gt;
&lt;span class=&quot;m&quot;&gt;90&lt;/span&gt;:e2:ba:94:2a:bc &amp;gt; &lt;span class=&quot;m&quot;&gt;02&lt;/span&gt;:cf:69:15:81:01, IPv4, length &lt;span class=&quot;m&quot;&gt;242&lt;/span&gt;: &lt;span class=&quot;m&quot;&gt;10&lt;/span&gt;.0.1.100 &amp;gt; &lt;span class=&quot;m&quot;&gt;10&lt;/span&gt;.10.0.0: ICMP &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; reply, id &lt;span class=&quot;m&quot;&gt;53&lt;/span&gt;, seq &lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;, length &lt;span class=&quot;m&quot;&gt;208&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The last two packets are martian packets. They cannot occur in a public network since their source or destination addresses are private addresses.&lt;/p&gt;

&lt;p&gt;Some highlights about this network function:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Instead of reusing an existing filtering app, I’ve coded my own, called &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L45&quot;&gt;MartianFilter&lt;/a&gt;. This new app is responsible for determining whether a packet is a martian packet or not. This check is done in the &lt;em&gt;push&lt;/em&gt; method of the app.&lt;/li&gt;
  &lt;li&gt;I’ve coded some utility functions to &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L15&quot;&gt;parse CIDR addresses&lt;/a&gt; (such as 100.64.0.0/10) and to check whether an &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L25&quot;&gt;IP address belongs to a network&lt;/a&gt;. Alternatively, I could have used Snabb’s filtering library, which allows filtering packets using &lt;em&gt;tcpdump&lt;/em&gt;-like expressions, for instance &lt;em&gt;“net 100.64.0.0 mask 255.192.0.0”&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;The network function doesn’t read packets from a network interface; instead, it reads them from a .pcap file.&lt;/li&gt;
  &lt;li&gt;Every Snabb program has a &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L95&quot;&gt;run function&lt;/a&gt;, which is the program’s entry point. A Snabb program or library can also add a &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L111&quot;&gt;selftest function&lt;/a&gt;, which is used to unit test the module (&lt;em&gt;$ sudo ./snabb snsh -t program.martian&lt;/em&gt;). Snabb apps, on the other hand, must implement a &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L47&quot;&gt;new method&lt;/a&gt; and at least one of the &lt;a href=&quot;https://github.com/dpino/snabb/blob/martian-packets/src/program/martian/martian.lua#L81&quot;&gt;push&lt;/a&gt; or &lt;em&gt;pull&lt;/em&gt; methods (or both).&lt;/li&gt;
&lt;/ul&gt;
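
&lt;p&gt;To give a rough idea of what the CIDR helpers mentioned above involve, here is a minimal sketch in plain Lua. It is not the code linked above: the names &lt;em&gt;ipv4_to_number&lt;/em&gt;, &lt;em&gt;parse_cidr&lt;/em&gt; and &lt;em&gt;in_network&lt;/em&gt; are made up for this example, and it uses plain arithmetic instead of bit operations so it should run on any Lua interpreter.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;-- Hypothetical helpers, for illustration only.

-- Convert a dotted-quad IPv4 string into a 32-bit number.
local function ipv4_to_number (addr)
   local a, b, c, d = addr:match(&amp;quot;^(%d+)%.(%d+)%.(%d+)%.(%d+)$&amp;quot;)
   assert(a, &amp;quot;not an IPv4 address: &amp;quot; .. addr)
   return ((tonumber(a) * 256 + tonumber(b)) * 256 + tonumber(c)) * 256 + tonumber(d)
end

-- Parse a block such as 100.64.0.0/10 into a base address and a prefix length.
local function parse_cidr (cidr)
   local addr, len = cidr:match(&amp;quot;^(.+)/(%d+)$&amp;quot;)
   assert(addr, &amp;quot;not a CIDR block: &amp;quot; .. cidr)
   return ipv4_to_number(addr), tonumber(len)
end

-- An address belongs to a network when both share the same leading bits.
-- Dividing by 2^(32 - prefix) and flooring discards the host bits.
local function in_network (addr, cidr)
   local base, len = parse_cidr(cidr)
   local host_range = 2 ^ (32 - len)
   return math.floor(ipv4_to_number(addr) / host_range) == math.floor(base / host_range)
end

print(in_network(&amp;quot;100.64.12.1&amp;quot;, &amp;quot;100.64.0.0/10&amp;quot;))  -- prints true
print(in_network(&amp;quot;8.8.8.8&amp;quot;, &amp;quot;100.64.0.0/10&amp;quot;))      -- prints false&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A martian check would then amount to testing the source and destination addresses of a packet against a list of such blocks.&lt;/p&gt;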

&lt;p&gt;Here’s the app’s graph:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;reader&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pcap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PcapReader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filein&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;filter&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MartianFilter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;writer&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pcap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PcapWriter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fileout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;reader.output -&amp;gt; filter.input&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;filter.output -&amp;gt; writer.input&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;And here is what the &lt;em&gt;MartianFilter:push&lt;/em&gt; method looks like:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MartianFilter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;push&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
   &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

   &lt;span class=&quot;kr&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;do&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pkt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;receive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ip_hdr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ipv4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new_from_mem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pkt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IPV4_OFFSET&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IPV4_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is_martian&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip_hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is_martian&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip_hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pkt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;kr&quot;&gt;else&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transmit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pkt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
   &lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;As a rule of thumb, every Snabb program has only one app that feeds packets into the graph, in this case the PcapReader app. Such apps have to implement the &lt;em&gt;pull&lt;/em&gt; method. Apps that want to manipulate packets get a chance to do so in their &lt;em&gt;push&lt;/em&gt; method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Snabb is a very useful tool for coding network functions that need to run at very high speed. For this reason, it’s usually deployed as part of an ISP’s network infrastructure. However, the toolkit is versatile enough to allow us to code any type of application that has to manipulate network traffic.&lt;/p&gt;

&lt;p&gt;In this tutorial I introduced how to start using Snabb to code network functions. In the first example I showed how to download and build Snabb, plus a very simple application that filters HTTP or HTTPS traffic from a network interface. In the second example, I introduced how to code a Snabb program and an app, &lt;em&gt;MartianFilter&lt;/em&gt;. This app exemplifies how to check packets against a set of rules and forward or drop them based on the result. Other more sophisticated network functions, such as firewalling, packet-rate limiting or DDoS attack prevention, behave in a similar manner.&lt;/p&gt;

&lt;p&gt;That’s all for now. I left out another example that consisted of sending and receiving Multicast DNS packets. I’ll likely cover it in a follow-up article.&lt;/p&gt;
</description>
        <pubDate>Tue, 28 Nov 2017 06:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2017/11/28/practical-snabb/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2017/11/28/practical-snabb/</guid>
        
        <category>igalia</category>
        
        <category>networking</category>
        
        
      </item>
    
      <item>
        <title>Snabb explained in less than 10 minutes</title>
        <description>
&lt;p&gt;Last month I attended the 20th edition of &lt;a href=&quot;http://www.esnog.net/gore20.html&quot;&gt;GORE&lt;/a&gt; (the Spanish Network Operators’ Group meeting), where I delivered an introductory &lt;a href=&quot;https://www.youtube.com/watch?v=gEHkxwc6Jzg&quot;&gt;talk about Snabb&lt;/a&gt; (in Spanish). &lt;a href=&quot;https://people.igalia.com/dpino/gore20/network-functions-with-snabb/#/&quot;&gt;Slides of the talk&lt;/a&gt; are also available online (in English).&lt;/p&gt;

&lt;p&gt;Taking advantage of this presentation, I decided to write an introductory article about Snabb, something that could allow anyone to easily understand what Snabb is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Snabb?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Snabb is a toolkit for developing network functions in user-space. This definition refers to two keywords that are worth clarifying: &lt;strong&gt;network functions&lt;/strong&gt; and &lt;strong&gt;user-space&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s a network function?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A network function is any program that does something with network traffic. What kind of things can be done with traffic? For instance: reading packets, modifying their headers, creating new packets, discarding packets or forwarding them. Any network function is a combination of these basic operations. Here are some examples:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Filtering function&lt;/strong&gt; (e.g. firewalling): read incoming packets, compare them to a table of rules and execute an action (&lt;em&gt;forward&lt;/em&gt; or &lt;em&gt;drop&lt;/em&gt;).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Traffic mapping&lt;/strong&gt; (e.g. NAT): read incoming packets, modify their headers and forward them.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Encapsulation&lt;/strong&gt; (e.g. VPN): read incoming packets, create a new packet, embed the original packet into the new one and send it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What’s user-space networking?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the last few years, there has been a new trend in writing network functions. This trend consists of writing the entire network function in user-space, leaving no processing to the kernel.&lt;/p&gt;

&lt;p&gt;Traditionally, when writing network functions, we use the abstractions provided by the OS. The goal of any OS is to create abstractions over hardware that programs can use. This happens at many levels. For instance, when dealing with a hard drive we don’t need to think of heads, cylinders and sectors; we use a higher-level abstraction instead: the filesystem. Networking is another layer abstracted by the OS. As programmers, we don’t deal with the NIC directly; instead, we work with sockets and have access to APIs to deal with the TCP/IP stack.&lt;/p&gt;

&lt;p&gt;However, the addition of higher-level abstractions implicitly adds overhead to the processing of our network function. The first disadvantage is that the function is split between two lands, user-space and kernel-space, and switching between them has a cost. And even if we move as much logic as possible to the kernel, there are inherent costs caused by the kernel’s networking layer.&lt;/p&gt;

&lt;p&gt;The need to skip the kernel and program network functions entirely in user-space was triggered by the continuous improvement of hardware. Today it is possible to buy a 10G NIC for less than 200 euros. Soon the idea of building high-performance network appliances out of commodity hardware seemed feasible. Someone could pick an Intel Xeon, fill the available PCI slots with 10G NICs and expect to have the equivalent of a very expensive Cisco or Juniper router for a fraction of its cost.&lt;/p&gt;

&lt;p&gt;If we drive the hardware described above entirely with Linux, we won’t be able to squeeze out all of its performance. Every packet hitting the NICs has to go through the kernel’s networking layer, and that has a cost caused by all the operations the kernel performs on packets before they’re available to our program. To understand how much of a problem this is, I need to introduce the concept of &lt;strong&gt;budget&lt;/strong&gt; in a network function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Know your network function budget&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If we want to make the most of our hardware, we generally would like to run our network function at line rate, that is, at the maximum speed of the NIC. How much time does that leave per packet? On a 10G NIC, if we are receiving packets of an average size of &lt;strong&gt;550 bytes&lt;/strong&gt; at the maximum speed, then we’re receiving a new packet every &lt;strong&gt;440ns&lt;/strong&gt; (550 bytes are 4400 bits, and at 10 Gbps each bit takes 0.1ns). That’s all the time we have available to run our network function on a packet.&lt;/p&gt;
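
&lt;p&gt;The arithmetic behind that number can be sketched in a few lines of Lua. The function name &lt;em&gt;budget_ns&lt;/em&gt; is made up for this example, and the calculation is a simplification that only counts the bytes of the packet itself, ignoring any extra per-frame overhead on the wire.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;-- Time budget per packet, in nanoseconds, when receiving at line rate.
-- Assumption: only the packet bytes are counted (no framing overhead).
local function budget_ns (packet_bytes, link_gbps)
   -- 1 Gbps is exactly 1 bit per nanosecond.
   return packet_bytes * 8 / link_gbps
end

print(string.format(&amp;quot;%.0f ns per packet&amp;quot;, budget_ns(550, 10)))  -- prints: 440 ns per packet&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;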

&lt;p&gt;Usually a NIC works by placing incoming packets in a queue or buffer. This buffer is actually a &lt;a href=&quot;https://en.wikipedia.org/wiki/Circular_buffer&quot;&gt;ring-buffer&lt;/a&gt;, which means there are two cursors pointing to the buffer, the &lt;strong&gt;Rx&lt;/strong&gt; cursor and the &lt;strong&gt;Tx&lt;/strong&gt; cursor. When a new packet arrives, the packet is written at the Rx position and the cursor gets updated. When a packet leaves the buffer, the packet is read at the Tx position and the cursor gets updated after the read. Our network function fetches packets from the Tx cursor. If it’s too slow processing a packet, the Rx cursor will eventually overtake the Tx cursor. When that happens there’s a &lt;strong&gt;packet drop&lt;/strong&gt; (a packet was overwritten before it was consumed).&lt;/p&gt;
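
&lt;p&gt;The two-cursor mechanism can be illustrated with a toy model in plain Lua. This is only a sketch of the idea described above, not how any real NIC or driver is implemented; the &lt;em&gt;Ring&lt;/em&gt; table and its methods are hypothetical names.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;-- Toy ring buffer with an Rx (write) cursor and a Tx (read) cursor.
local Ring = {}
Ring.__index = Ring

function Ring.new (size)
   return setmetatable({size = size, slots = {}, rx = 0, tx = 0, dropped = 0}, Ring)
end

-- The NIC writes an incoming packet at the Rx cursor.
function Ring:write (pkt)
   if self.rx - self.tx == self.size then
      -- Rx caught up with Tx: the oldest unread packet gets overwritten.
      self.tx = self.tx + 1
      self.dropped = self.dropped + 1
   end
   self.slots[self.rx % self.size] = pkt
   self.rx = self.rx + 1
end

-- The network function reads the next packet at the Tx cursor.
function Ring:read ()
   if self.tx == self.rx then return nil end  -- nothing to consume
   local pkt = self.slots[self.tx % self.size]
   self.tx = self.tx + 1
   return pkt
end

-- Six packets arrive before the consumer reads anything from a 4-slot ring.
local ring = Ring.new(4)
for i = 1, 6 do ring:write(&amp;quot;packet &amp;quot; .. i) end
print(ring.dropped, ring:read())  -- prints: 2, then packet 3&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The first two packets were overwritten before being read, which is exactly the packet drop scenario: the consumer was slower than the producer.&lt;/p&gt;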

&lt;p&gt;Let’s go back to the 440ns number. How much time is that? Kernel hacker Jesper Dangaard Brouer discusses this issue in his excellent talk &lt;a href=&quot;http://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf&quot;&gt;“Network stack challenges at increasing speed”&lt;/a&gt; (I also recommend LWN’s summary of the talk: &lt;a href=&quot;https://lwn.net/Articles/629155/&quot;&gt;Improving Linux networking performance&lt;/a&gt;). Here’s the cost of some common operations (the cost varies depending on the hardware, but the order of magnitude is similar across different hardware settings):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Spinlock (Lock/Unlock): 16ns.&lt;/li&gt;
  &lt;li&gt;L2 cache hit: 4.3ns.&lt;/li&gt;
  &lt;li&gt;L3 cache hit: 7.9ns.&lt;/li&gt;
  &lt;li&gt;Cache miss: 32ns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Taking those numbers into account, 440ns doesn’t seem like a lot of time. The cost of system calls is also prohibitive, so they should be avoided as much as possible.&lt;/p&gt;

&lt;p&gt;Another important thing to notice is that the smaller the size of the packet, the smaller the budget. On a 10G NIC, if we’re receiving packets of &lt;strong&gt;64 bytes&lt;/strong&gt; on average, the smallest Ethernet frame size possible, the same calculation says we are receiving a new packet roughly every &lt;strong&gt;51ns&lt;/strong&gt;. In this scenario two straight cache misses would eat the whole budget.&lt;/p&gt;

&lt;p&gt;In conclusion, at these NIC speeds the additional overhead added by the kernel’s networking layer is not trivial; it is big enough to significantly affect the execution of our network function. Since our budget shrinks, packets are more likely to be dropped at higher speeds or at smaller packet sizes, limiting the overall performance of our network card.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: This is a general picture of the issue of doing high-performance networking in the Linux kernel. The kernel hackers are not ignorant of these problems and have been working on ways to fix them in recent years. In that regard, it is worth mentioning the addition of &lt;a href=&quot;https://www.iovisor.org/technology/xdp&quot;&gt;XDP&lt;/a&gt; (&lt;em&gt;eXpress Data Path&lt;/em&gt;), a kernel abstraction to execute network functions as close to the hardware as possible. But that’s a subject for another post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By-passing the kernel&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User-space networking needs to by-pass the kernel’s networking layer so it can squeeze out all the performance of the underlying hardware. There are several strategies to do that: user-space drivers, PF_RING, Netmap, etc. (Cloudflare has an excellent article on &lt;a href=&quot;https://blog.cloudflare.com/kernel-bypass/&quot;&gt;kernel by-pass&lt;/a&gt; that discusses several of those strategies). Snabb chooses to handle the hardware directly, that is, to provide user-space drivers for the NICs it supports.&lt;/p&gt;

&lt;p&gt;Snabb offers support mostly for Intel cards (although some Solarflare and Mellanox models are also supported). Implementing a driver, either in kernel-space or user-space, is not an easy task. It’s fundamental to have access to the vendor’s datasheet (generally a very large document) to know how to initialize the NIC, how to read packets from it, how to transfer data, etc. Intel provides such datasheets. In fact, a few years ago Intel started a project with a similar goal: &lt;a href=&quot;http://dpdk.org/&quot;&gt;DPDK&lt;/a&gt;. DPDK is an open-source project that implements drivers in user-space. Although originally it only provided drivers for Intel NICs, as the adoption of the project increased, other vendors have started to add drivers for their hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inside Snabb&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/snabbco/snabb&quot;&gt;Snabb&lt;/a&gt; was started in 2012 by free software hacker Luke Gorrie. Snabb provides direct access to high-performance NICs and, in addition, an environment for building and running network functions.&lt;/p&gt;

&lt;p&gt;Snabb is composed of several elements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;An &lt;strong&gt;Engine&lt;/strong&gt;, that runs the network functions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Libraries&lt;/strong&gt;, that ease the development of network functions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Apps&lt;/strong&gt;, reusable software components that generally manipulate packets.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Programs&lt;/strong&gt;, ready-to-use standalone network functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A network function in Snabb is a combination of apps connected together by links. Snabb’s engine is in charge of feeding the app graph with packets and giving every app a chance to execute.&lt;/p&gt;

&lt;p&gt;The engine processes the app graph in breaths. A breath consists of two steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Inhale&lt;/strong&gt;, puts packets into the graph.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Process&lt;/strong&gt;, every app has a chance to receive packets and manipulate them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During the inhale phase, the &lt;em&gt;pull&lt;/em&gt; method of an app gets executed. Apps that implement this method act as packet generators within the app graph. Packets are placed on the app’s outgoing links. Generally there’s only one app of this kind in every graph.&lt;/p&gt;

&lt;p&gt;During the process phase, the &lt;em&gt;push&lt;/em&gt; method of an app gets executed. This gives every app a chance to read packets from its incoming links, do something with them and typically place them on its outgoing links.&lt;/p&gt;
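
&lt;p&gt;The two phases can be sketched with a toy loop in plain Lua. This is a deliberately simplified model for illustration, not Snabb’s actual engine code: the &lt;em&gt;breathe&lt;/em&gt; function, the two apps and the plain table used as a link are all assumptions made for this example.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;-- One engine cycle: every app with a pull method runs first (inhale),
-- then every app with a push method runs (process).
local function breathe (apps)
   for _, app in ipairs(apps) do
      if app.pull then app:pull() end
   end
   for _, app in ipairs(apps) do
      if app.push then app:push() end
   end
end

-- Two toy apps that share a plain Lua table as their link.
local link = {}
local source = { pull = function () table.insert(link, 'packet') end }
local sink = { received = 0 }
function sink:push ()
   while #link &amp;gt; 0 do
      table.remove(link)
      self.received = self.received + 1
   end
end

for _ = 1, 3 do breathe({source, sink}) end
print(sink.received)  -- prints 3&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;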

&lt;p&gt;&lt;strong&gt;Hands-on example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s build a network function that captures packets from a 10G NIC, filters them using a packet-filtering expression and writes the filtered packets to a pcap file. Such a network function would look like this:&lt;/p&gt;

&lt;figure&gt;&lt;img src=&quot;/dpino/files/2017/11/snabb-apps.png&quot; title=&quot;Snabb basic filter&quot; alt=&quot;Snabb basic filter&quot; /&gt;&lt;figcaption style=&quot;text-align: center&quot;&gt;Snabb basic filter&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;In Snabb, the graph above could be coded like this:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

	&lt;span class=&quot;c1&quot;&gt;-- App definition.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;nic&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Intel82599&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;pci&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;0000:04:00.0&amp;quot;&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;filter&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapFilter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;src port 80&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;pcap&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pcap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PcapWriter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;output.pcap&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;c1&quot;&gt;-- Link definition.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;nic.tx        -&amp;gt; filter.input&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;filter.output -&amp;gt; pcap.input&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;configure&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A configuration is created describing the app graph of the network function. The configuration is passed down to Snabb’s engine, which executes it for 10 seconds.&lt;/p&gt;

&lt;p&gt;When Snabb’s engine runs this network function, it first executes the &lt;em&gt;pull&lt;/em&gt; method of each app that implements it to feed packets into the graph links (the &lt;em&gt;inhale step&lt;/em&gt;). During the &lt;em&gt;process step&lt;/em&gt;, the &lt;em&gt;push&lt;/em&gt; method of each app is executed so apps have a chance to fetch packets from their incoming links, do something with them and typically place them on their outgoing links.&lt;/p&gt;

&lt;p&gt;Here’s what the real implementation of the &lt;em&gt;PcapFilter:push&lt;/em&gt; method looks like:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PcapFilter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;push&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;kr&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;do&lt;/span&gt;
		&lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;receive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;kr&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accept_fn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;then&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transmit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;kr&quot;&gt;else&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A packet in Snabb is a really simple data structure. Basically, it consists of a length field and an array of bytes of fixed size.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;	&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  	&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A link is a ring-buffer of packets.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;	&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;packets&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  	&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// the next element to be read&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  	&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  	&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// the next element to be written&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  	&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
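
&lt;p&gt;To see how the &lt;em&gt;read&lt;/em&gt; and &lt;em&gt;write&lt;/em&gt; cursors cooperate, here is a minimal sketch of the ring-buffer operations written in plain Lua. It mirrors the names used in &lt;em&gt;PcapFilter.push&lt;/em&gt; (&lt;em&gt;empty&lt;/em&gt;, &lt;em&gt;receive&lt;/em&gt;, &lt;em&gt;transmit&lt;/em&gt;), but it is only an illustration: it assumes a Lua table instead of the C struct, &lt;em&gt;new_link&lt;/em&gt; and &lt;em&gt;full&lt;/em&gt; are hypothetical helpers, and Snabb’s real implementation differs in its details.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-lua&quot; data-lang=&quot;lua&quot;&gt;-- Simplified sketch of a link as a ring buffer (not Snabb's actual code).
local SIZE = 1024 -- number of slots, as in struct link above

local function new_link ()
	return { packets = {}, read = 0, write = 0 }
end

-- The link is empty when both cursors point to the same slot.
local function empty (l)
	return l.read == l.write
end

-- One slot is kept unused so that a full link differs from an empty one.
local function full (l)
	return (l.write + 1) % SIZE == l.read
end

-- Place packet p into the next free slot. Returns false if the link is full.
local function transmit (l, p)
	if full(l) then return false end
	l.packets[l.write] = p
	l.write = (l.write + 1) % SIZE
	return true
end

-- Take the oldest packet out of the link. Callers must check empty() first.
local function receive (l)
	local p = l.packets[l.read]
	l.packets[l.read] = nil
	l.read = (l.read + 1) % SIZE
	return p
end

local l = new_link()
transmit(l, "first")
transmit(l, "second")
print(receive(l), receive(l), empty(l))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Packets come out in the same order they went in, and both cursors wrap around to the first slot once they reach the end of the array.&lt;/p&gt;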

&lt;p&gt;Every app has zero or more input links and zero or more output links. Links are created at runtime when the graph is defined. In the example above, the nic app has one outgoing link (&lt;span style=&quot;color: blue&quot;&gt;nic.tx&lt;/span&gt;); the filter app has one incoming link (&lt;span style=&quot;color: red&quot;&gt;filter.rx&lt;/span&gt;) and one outgoing link (&lt;span style=&quot;color: blue&quot;&gt;filter.tx&lt;/span&gt;); and the pcap app has one incoming link (&lt;span style=&quot;color: red&quot;&gt;pcap.input&lt;/span&gt;).&lt;/p&gt;

&lt;p&gt;It might be surprising that packets and links are defined in C code instead of Lua. Snabb runs on top of &lt;a href=&quot;http://luajit.org/&quot;&gt;LuaJIT&lt;/a&gt;, an ultra-fast virtual machine for executing Lua programs. LuaJIT implements an &lt;a href=&quot;http://luajit.org/ext_ffi.html&quot;&gt;FFI&lt;/a&gt; (&lt;em&gt;Foreign Function Interface&lt;/em&gt;) to interact with C data types and to call C runtime functions or external libraries directly from Lua code. In Snabb most data structures are defined in C, which allows data to be laid out more compactly.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;local&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ether_header_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ffi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/* All values in network byte order.  */&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;   &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dhost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;   &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;   &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__attribute__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;packed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Calling a C runtime function is easy too.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ffi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cdef&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;syslog&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priority&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ffi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;syslog&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;error:...&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;Wrapping up and last thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article I’ve covered the basics of Snabb. I showed how to use Snabb to build network functions and explained why Snabb is a very convenient toolkit for writing this type of program. Snabb runs very fast because it bypasses the kernel, which makes it very useful for high-performance networking. In addition, Snabb is written in Lua, a high-level language, which greatly lowers the barrier to entry for writing network functions.&lt;/p&gt;

&lt;p&gt;However, there are more things in Snabb that I left out of this article. Snabb comes with a set of programs ready to run. It also comes with a vast collection of apps and libraries which can help to speed up the construction of new network functions.&lt;/p&gt;

&lt;p&gt;You don’t need to own an Intel 10G card to start using Snabb today. Snabb can also be used over TAP interfaces. It won’t be highly performant, but it’s the best way to start with Snabb.&lt;/p&gt;

&lt;p&gt;In a future article I plan to cover a more elaborate example of a network function using TAP interfaces.&lt;/p&gt;
</description>
        <pubDate>Mon, 13 Nov 2017 06:00:00 +0000</pubDate>
        <link>http://blogs.igalia.com/dpino/2017/11/13/snabb-network-toolkit/</link>
        <guid isPermaLink="true">http://blogs.igalia.com/dpino/2017/11/13/snabb-network-toolkit/</guid>
        
        <category>igalia</category>
        
        <category>networking</category>
        
        
      </item>
    
  </channel>
</rss>
