Effect of readahead and file system block reallocation
for LBCAS (LoopBack Content Addressable Storage)

                 @ Linux Symposium 2009,
                 Montreal, Canada, 17/July
    Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf


            Kuniyasu Suzaki †, Toshiki Yagi †,
           Kengo Iijima †, Nguyen Anh Quynh †,
                  Yoshihito Watanabe ††

             † Research Center for Information Security
           ††
Key words

• LBCAS: Loopback Content Addressable Storage
   – Virtual block device (network transparent block device)
• readahead
   – Disk prefetch mechanism in Linux kernel
      • The system call “readahead” is a different function.
• file system block reallocation
   – A kind of defrag tool
   – We developed “ext2/3optimizer”, which reallocates the data blocks of i-nodes.


Today’s talk presents optimization methods that use them.


Today’s Contents
• Motivation
    – What is LBCAS used for?
    – Correlation among LBCAS, file system block reallocation (ext2/3optimizer),
      and disk prefetch (readahead)
•   LBCAS: Loopback Content Addressable Storage
•   Optimization: ext2/3optimizer and readahead
•   Performance Results
•   Conclusions



Motivation
              What is “LBCAS” used for?
• LBCAS was developed for OS Circular.
• OS Circular is a project to distribute bootable disk
  images for virtual machines and real machines.
   – OS Circular project
      • http://openlab.jp/oscircular/




OS Circular (Big Picture)

[Figure: OS suppliers publish block files on an HTTP server and update them in a timely manner. Over the Internet, LBCAS (Loopback Content Addressable Storage) constructs a virtual disk from the block files, so users can try an OS without installation on a virtual machine (KVM, QEMU) or on a real machine.]
Performance Issues (Today’s Main Topic)
• LBCAS is sensitive to access patterns.
     – Performance is affected by the number and size of disk prefetches
       (“readahead” of the Linux kernel).
• The number and size of readahead requests can be optimized by file
  system block reallocation.
     – Defrag tools are not enough, so we developed “ext2/3optimizer”.

[Figure: ③ ext2/3optimizer reallocates the blocks of ext2/3 based on an access profile → ② the number of readahead requests is reduced and their size is extended → ① the performance of LBCAS is increased. The diagram also carries the label “General Technique”; the circled numbers give the presentation order.]
LBCAS: LoopBack Content Addressable Storage
• LBCAS= CAS + LoopBack
   – CAS
      • Indirect addressing by the SHA-1 digest of the block contents
      • Benefit: identical blocks are expressed by the same SHA-1 digest, which
        reduces total storage
      • Mainly used for archives. Example: Venti of Plan 9 [USENIX FAST’02]
   – LoopBack
      • Virtual block device. A file is used as a block device.
      • The abstraction as a file makes the device easy to handle.
• LBCAS saves each block to a file, called a “block file”. The
  file is named by the SHA-1 digest of its contents.
• Block files are managed by a “mapping table” file, which maps each
  physical address range to a SHA-1 file name.
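The content addressing and mapping table described above can be sketched in Python. This is a minimal sketch, not the LBCAS implementation: the function name `build_lbcas` is hypothetical, and it assumes the SHA-1 digest is taken over the uncompressed block (the slides do not say whether hashing happens before or after zlib compression). It shows identical blocks collapsing into a single block file.

```python
import hashlib
import zlib


def build_lbcas(image, block_size=256 * 1024):
    """Split a disk image into fixed-size blocks and save each one as a block file.

    Returns (store, table):
      store: SHA-1 file name -> zlib-compressed block contents
      table: mapping table as a list of (start, end, file name) tuples
    Assumption: the name is the SHA-1 digest of the *uncompressed* block.
    """
    store = {}
    table = []
    for start in range(0, len(image), block_size):
        block = image[start:start + block_size]
        name = hashlib.sha1(block).hexdigest()
        if name not in store:  # identical contents -> same name -> stored once
            store[name] = zlib.compress(block)
        table.append((start, start + len(block) - 1, name))
    return store, table


# Four 64KB blocks, three of which are identical (all zeros).
image = bytes(64 * 1024) * 3 + b"\x01" * (64 * 1024)
store, table = build_lbcas(image, block_size=64 * 1024)
for start, end, name in table:
    print(f"{start:08X}-{end:08X}  {name[:9]}")
print(len(table), "table entries,", len(store), "block files")
```

With the default 256KB block size, each table entry spans 0x40000 bytes (for example 00000000-0003FFFF).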

Block files of LBCAS
[Figure: a block device (ext2, 4KB pages) is split into 256KB blocks; each block is compressed by zlib and saved as a block file named by the SHA-1 digest of its contents. The mapping table (map01.idx) lists the block files, and the block files are reconstructed as a virtual disk with LBCAS.]

Mapping table example:
   Address             File Name
   00000000-0003FFFF   4ad36ffe8…
   00040000-0007FFFF   974daf34a…
   00080000-000BFFFF   2d34ff3e1…
   000C0000-000FFFFF   3310012a…
   …                   …
LBCAS (1/2)
• LBCAS images are made from an existing, normal
  block device.
• The original block device is split into fixed-size blocks (64KB -
  512KB), and each block is compressed by zlib.
• The block files are reconstructed into a loopback file by a FUSE
  wrapper.
   – FUSE is a user-land file system.
      • http://fuse.sf.net
• Each block file is measured (its SHA-1 digest is checked against its
  file name) when it is mapped to the loopback file.
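A read on the reconstructed loopback file can be sketched as follows. This is a hypothetical stand-in for what the FUSE wrapper does, not its code: `read_virtual_disk` and `IntegrityError` are names introduced here, the mapping table is assumed to be sorted by address, and measurement is modeled as recomputing the SHA-1 digest of the uncompressed block and comparing it with the file name (assuming names are digests of the uncompressed contents).

```python
import hashlib
import zlib


class IntegrityError(Exception):
    """Raised when a block file does not match its SHA-1 file name."""


def read_virtual_disk(store, table, offset, length):
    """Return `length` bytes of the virtual disk starting at `offset`.

    store: file name -> zlib-compressed block file
    table: mapping table, (start, end, name) tuples sorted by start address
    Every block file touched by the request is decompressed and measured
    (its SHA-1 digest is recomputed and compared with its name) before use.
    """
    out = bytearray()
    for start, end, name in table:
        if end < offset or start >= offset + length:
            continue  # this block file is outside the requested range
        block = zlib.decompress(store[name])
        if hashlib.sha1(block).hexdigest() != name:
            raise IntegrityError(f"block file {name} failed measurement")
        lo = max(offset, start) - start
        hi = min(offset + length, end + 1) - start
        out += block[lo:hi]
    return bytes(out)


# Two 16-byte blocks; the request straddles the boundary between them.
blocks = [b"A" * 16, b"B" * 16]
store, table = {}, []
for i, block in enumerate(blocks):
    name = hashlib.sha1(block).hexdigest()
    store[name] = zlib.compress(block)
    table.append((i * 16, i * 16 + 15, name))
print(read_virtual_disk(store, table, 10, 10))  # b'AAAAAABBBB'
```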
Construct a virtual disk of LBCAS on a Client PC

Structure of LBCAS
• Storage Cache
   – Suppresses downloads
• Memory Cache
   – Suppresses disk access and
     uncompression
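How the two caches divide the work can be sketched with a small counting model. It is hypothetical: the class name, the LRU policy and the memory-cache capacity of one block file are assumptions, and the storage cache is modeled as keeping every downloaded block file. Each access to a block file is counted as a memory-cache hit (M) or an uncompression (U), and each uncompression is fed either by a download (D) or by the storage cache (S), so D + S = U holds by construction.

```python
from collections import OrderedDict


class LbcasCacheModel:
    """Counts how block-file accesses are served by a two-level cache.

    Hypothetical model: the memory cache keeps the `mem_capacity` most recently
    used uncompressed block files (LRU); the storage cache keeps every
    downloaded, compressed block file on local disk.
    """

    def __init__(self, mem_capacity=1):
        self.mem = OrderedDict()
        self.storage = set()
        self.mem_capacity = mem_capacity
        self.counts = {"D": 0, "S": 0, "U": 0, "M": 0}

    def access(self, name):
        if name in self.mem:
            self.mem.move_to_end(name)
            self.counts["M"] += 1  # no disk access and no uncompression
            return
        if name in self.storage:
            self.counts["S"] += 1  # served from local storage, no download
        else:
            self.counts["D"] += 1  # fetched over HTTP, then kept in storage
            self.storage.add(name)
        self.counts["U"] += 1  # every block file not in memory is uncompressed
        self.mem[name] = True
        if len(self.mem) > self.mem_capacity:
            self.mem.popitem(last=False)  # evict the least recently used file


model = LbcasCacheModel(mem_capacity=1)
for name in ["a", "a", "b", "a"]:
    model.access(name)
print(model.counts)  # {'D': 2, 'S': 1, 'U': 3, 'M': 1}
```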
LBCAS (2/2)
• When a file is updated or created on the original block device, the
  relevant block files are newly created with new SHA-1 file names,
  and the mapping table file is also renewed.
   – Old block files remain reusable.
• HTTP for file delivery
   – The most popular protocol, and well designed for the Internet.
      • Inexpensive Web hosting services, proxies, and mirror servers can be
        used for worldwide deployment.
• Block files are network/storage transparent.
   – If the necessary block files are stored in local storage, no network
     connection is needed.
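Because an update only adds block files with new names, a client can work out what to fetch by comparing the old and new mapping tables. A minimal sketch, assuming mapping tables are lists of (start, end, name) tuples and local storage is a set of file names; `plan_update` and the short names are hypothetical.

```python
def plan_update(old_table, new_table, local_files):
    """Compare two mapping tables of (start, end, name) tuples.

    Returns (changed, to_download):
      changed:     address ranges whose block file name differs
      to_download: block files named in the new table but missing locally
    Block files already in local storage are reused, so no network access is
    needed for them.
    """
    old = {(start, end): name for start, end, name in old_table}
    changed = [(start, end) for start, end, name in new_table
               if old.get((start, end)) != name]
    needed = {name for _, _, name in new_table}
    to_download = sorted(needed - set(local_files))
    return changed, to_download


# Hypothetical short names; only the second block changed after the update.
map01 = [(0, 0x3FFFF, "aaa"), (0x40000, 0x7FFFF, "bbb"), (0x80000, 0xBFFFF, "ccc")]
map02 = [(0, 0x3FFFF, "aaa"), (0x40000, 0x7FFFF, "ddd"), (0x80000, 0xBFFFF, "ccc")]
print(plan_update(map01, map02, {"aaa", "bbb", "ccc"}))
# ([(262144, 524287)], ['ddd'])
```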



Partial Update of LBCAS
[Figure: the block device (ext2, 4KB pages) is split into 256KB block files named by SHA-1 and listed in map01.idx (4ad36ffe8…, 974daf34a…, 2d34ff3e1…, 3310012a…). After an update such as “apt-get install …”, a new mapping table map02.idx lists 4ad36ffe8…, dd4daf34a…, 2d34ff3e1…, 3310012a…: only the changed block gets a new block file, and the same files are reusable by the FUSE driver. Create Once, Use Many.]
Performance Issues
• LBCAS is sensitive to access patterns.
• Two types of block size mismatch
   (1) Between the file system and LBCAS (static mismatch)
      • ext2/3: 4KB block size
      • LBCAS: 64KB-512KB block size
          – Occupancy (rate of necessary data in a block file) is low.
              » Kitagawa [LinuxKongress2006] reported that the occupancy was 30%
                for KNOPPIX 3.8.2 on 256KB LBCAS.
   (2) Between readahead (disk prefetch) and LBCAS (dynamic
     mismatch)
      • readahead: 4KB-128KB coverage size
      • LBCAS: 64KB-512KB block size
         – Many small accesses (“worm-eaten” access to a block file) cause
           redundant downloads and unnecessary uncompression in the LBCAS driver.
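The static mismatch can be quantified with a short calculation. In this sketch (the function name is hypothetical), every LBCAS block file that contains at least one required 4KB block has to be fetched and uncompressed in full, so occupancy is the required volume divided by the uncompressed volume of the fetched block files, the same ratio as the “127MB / uncompressed MB” occupancy in the results.

```python
def occupancy(required_blocks, fs_block=4 * 1024, lbcas_block=256 * 1024):
    """Rate of necessary data in the block files that have to be fetched.

    required_blocks: file-system block numbers (4KB each) read at boot.
    Each LBCAS block file holds lbcas_block // fs_block file-system blocks.
    """
    required = set(required_blocks)
    if not required:
        return 0.0
    per_file = lbcas_block // fs_block
    files = {block // per_file for block in required}
    return len(required) * fs_block / (len(files) * lbcas_block)


scattered = [0, 64, 128, 192]  # one 4KB block in each of four 256KB block files
packed = [0, 1, 2, 3]          # the same amount of data in a single block file
print(occupancy(scattered), occupancy(packed))  # 0.015625 0.0625
```

Packing the same data into fewer block files raises occupancy, which is the effect block reallocation aims for.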

CAUTION for readahead

• Disk prefetch “readahead” vs. the system call “readahead”
   – The system call “readahead” populates the page cache with data
     from a file, so the whole data of a file is stored in the page cache.
     The coverage is the size of the file.
   – It is not directly related to disk prefetch, but it achieves the
     same function from user space.
   – Some boot procedures use the system call “readahead”. The files
     that are loaded into the page cache in advance at boot time are
     listed in “/etc/readahead/boot” and “/etc/readahead/desktop”. We call
     this function “u-readahead” in this presentation.
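u-readahead can be approximated from user space. Python's standard library has no wrapper for the readahead(2) system call, so this sketch substitutes `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, which asks the kernel to start reading a file into the page cache; where `posix_fadvise` is unavailable it falls back to reading the file. The function name is hypothetical, and this is not the boot tool used in the talk.

```python
import os


def prefetch_files(paths):
    """Ask the kernel to load whole files into the page cache ahead of use.

    Returns the number of files that were prefetched; missing or unreadable
    paths are skipped.
    """
    count = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue
        try:
            if hasattr(os, "posix_fadvise"):
                # offset 0 and length 0 cover the whole file
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            else:
                # fallback: reading the file also populates the page cache
                while os.read(fd, 1 << 20):
                    pass
            count += 1
        finally:
            os.close(fd)
    return count
```

Fed with the paths listed in /etc/readahead/boot (assuming one path per line, which is an assumption about the list format), it gives the whole-file coverage described above.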


Block size mismatch
• Solution (increasing locality of reference)
   1. (For static mismatch) Increase occupancy by gathering the necessary
      data into block files.
   2. (For dynamic mismatch) Extend the coverage size of readahead
      through sequential access and a high page-cache hit rate.


• “ext2/3optimizer” repacks the data blocks of an ext2/3 file
  system so that they are in line.
   – The repacking is based on the block access profile at boot time.
   – As a result, ext2/3optimizer reduces the number of block files needed.
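The repacking idea can be illustrated with a toy model. This is not ext2/3optimizer's algorithm: `relocate` and `block_files_touched` are hypothetical helpers, the model ignores where ext2/3 keeps its metadata (superblocks, group descriptors, inode tables), and it simply moves the profiled blocks to the top of the disk in first-access order.

```python
def relocate(profile, total_blocks):
    """Return an old -> new block-number map that packs profiled blocks in line.

    profile: file-system block numbers in the order they were read at boot.
    Profiled blocks move to the top of the disk in first-access order; all other
    blocks follow in their original relative order, so the map is a permutation.
    """
    order = list(dict.fromkeys(profile))  # first-access order, duplicates dropped
    profiled = set(order)
    order += [block for block in range(total_blocks) if block not in profiled]
    return {old: new for new, old in enumerate(order)}


def block_files_touched(blocks, per_file=64):
    """Number of LBCAS block files needed (64 x 4KB = 256KB per block file)."""
    return len({block // per_file for block in blocks})


profile = [5, 700, 1300, 6, 2000, 701]
mapping = relocate(profile, total_blocks=4096)
before = block_files_touched(profile)
after = block_files_touched(mapping[block] for block in profile)
print(before, after)  # 4 1
```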

Occupancy in a block file of LBCAS
•   Occupancy (the rate of necessary data in a block file) depends on how the necessary data are placed.
•   “Worm-eaten” access (readahead) causes redundant downloads of block files.

[Figure: read order ①, ②, ③ passes from files through block search in the ext2/3 file system (4KB), disk access via readahead (4KB~128KB), to block files downloaded by LBCAS (256KB). ② hits the page cache; at ③ the cache is missed and the readahead coverage is shrunk. Occupancy is low, and redundant blocks are downloaded.]
Readahead and LBCAS 1/2
• Readahead is a disk prefetch mechanism. The prefetched data are saved to the page cache.
• The coverage size is extended or shrunk according to the page-cache hit rate.




[Figure: readahead windows. A sequential read from the application proceeds through current_window (from start) while I/O is issued for ahead_window (from ahead_start); as the sequential read continues, the windows are extended up to “max_readahead”.]
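The window behavior can be illustrated with a deliberately simplified model. It is not the kernel's readahead logic: the function name, the 16KB initial window, and the rule “double on a sequential miss, reset on a non-sequential one” are assumptions; only the 128KB upper limit follows the 4KB-128KB readahead coverage mentioned earlier. It returns the size of each readahead issued for a sequence of 4KB page reads.

```python
PAGE_KB = 4


def simulate_readahead(pages, init_pages=4, max_pages=32):
    """Return the size (KB) of each readahead issued for a sequence of page reads.

    pages: 4KB page numbers in the order the application reads them.
    On a page-cache miss, a readahead of `window` pages is issued. The window
    doubles (up to max_pages = 128KB) when the miss continues a sequential read
    and resets to init_pages otherwise.
    """
    cached = set()
    sizes = []
    window = 0
    last = None
    for page in pages:
        sequential = last is not None and page == last + 1
        last = page
        if page in cached:
            continue  # page-cache hit: no I/O
        if sequential and window:
            window = min(window * 2, max_pages)
        else:
            window = init_pages
        cached.update(range(page, page + window))
        sizes.append(window * PAGE_KB)
    return sizes


print(simulate_readahead(range(64)))      # [16, 32, 64, 128, 128]
print(simulate_readahead([0, 100, 200]))  # [16, 16, 16]
```

Sequential reads grow toward the maximum, while scattered (worm-eaten) reads keep issuing small requests, which is the pattern block reallocation tries to avoid.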
Readahead and LBCAS 2/2
•   When a readahead is issued, a part of a block file is required and mapped to the virtual disk.
    The size depends on the coverage size of the readahead.
     – Wide readahead is effective for the LBCAS driver.
•   When the same block file is required consecutively, the block file is kept in the memory
    cache of LBCAS and uncompression is eliminated.
[Figure: LBCAS downloads block files (D3E14…, 3B441…) and maps them to the loopback device. A readahead that covers only part of a block file leaves low occupancy caused by the size mismatch; when the following sequential readahead, extended toward “max_readahead”, requires the same block file D3E14…, it is served from the memory cache.]
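Which block files a readahead request touches depends only on its offset, its length, and the LBCAS block size. This hypothetical helper tallies how many requests span one, two, or three block files, the same breakdown as the ①, ②, ③ columns in the LBCAS function-frequency results; offsets and lengths are assumed to be in bytes.

```python
from collections import Counter


def files_per_request(requests, block_size):
    """Histogram {number of block files spanned: number of requests}.

    requests: (offset, length) pairs in bytes for each readahead request.
    """
    histogram = Counter()
    for offset, length in requests:
        first = offset // block_size
        last = (offset + length - 1) // block_size
        histogram[last - first + 1] += 1
    return histogram


requests = [(0, 64 * 1024), (60 * 1024, 8 * 1024), (0, 132 * 1024)]
print(files_per_request(requests, 64 * 1024))   # Counter({1: 1, 2: 1, 3: 1})
print(files_per_request(requests, 256 * 1024))  # Counter({1: 3})
```

Larger block files make straddling requests rarer, but each straddle-free request may still pull in a mostly unneeded block file.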
Readahead and Block Reallocation

• Readahead can be improved by block reallocation of the
  file system, if the reallocation increases the page-cache hit rate.
• Defrag tools look as if they would work well …
   – Unfortunately, current defrag tools are not suitable, because
     they are designed from the viewpoint of file defragmentation.
• We developed “ext2/3optimizer”, which reallocates the
  data blocks of ext2/3 based on an access profile.
   – It also increases occupancy in a block file.



Access profile and reallocation

[Figure: before reallocation (left), applications access the ext2/3 file system driver through VFS, and a profiler in the driver exports the access profile via /proc/ to ext2/3optimizer in user space; because the blocks on the device are scattered, readahead through the page cache and the loopback block driver is small and many (worm-eaten) access. After reallocation (right), ext2/3optimizer has gathered the blocks, and readahead becomes sequential access.]
Block Relocation: Ext2/3optimizer [LinuxKongress06]
• Data blocks are rearranged to be in line. The structure of the metadata is not changed.
• The arrangement is based on the access profile.
• Features:
       – The normal file system driver is used.
       – Fragmentation occurs from the viewpoint of individual files.
       – The relocation increases page-cache hits, so readahead extends its coverage size.
[Figure: before and after relocation, the i-node keeps the same structure (mode, owner info, size, timestamps, direct blocks, indirect blocks, double indirect, triple indirect); only the data blocks it points to are rearranged, so readahead is widened and occupancy is high.]
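The claim that relocation keeps the metadata structure while fragmenting files from the file's point of view can be checked on a toy example. `remap_inodes` and `count_fragments` are hypothetical helpers: i-nodes are modeled as plain lists of data-block pointers, and the mapping is an assumed old-to-new permutation of block numbers.

```python
def remap_inodes(inodes, mapping):
    """Rewrite each i-node's data-block pointers through an old -> new mapping.

    Only pointer values change; each i-node keeps the same number and order
    of pointers, i.e. the metadata structure is unchanged.
    """
    return {ino: [mapping.get(block, block) for block in blocks]
            for ino, blocks in inodes.items()}


def count_fragments(blocks):
    """Number of contiguous runs in a file's block list (1 = not fragmented)."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


# File 12 is contiguous, but boot reads only its blocks 100 and 102 (plus block
# 200 of file 13), so those blocks are swapped to the top of the disk.
inodes = {12: [100, 101, 102, 103], 13: [200]}
mapping = {100: 0, 0: 100, 200: 1, 1: 200, 102: 2, 2: 102}
after = remap_inodes(inodes, mapping)
print(after)  # {12: [0, 101, 2, 103], 13: [1]}
print(count_fragments(inodes[12]), count_fragments(after[12]))  # 1 4
```

This matches the direction of the DAVL measurement, where fragmentation rose after optimization even though boot-time access became more sequential.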
Performance Analysis

• Confirm the effect of ext2/3optimizer on LBCAS at boot time.
   – Ubuntu 9.04 (kernel 2.6.28) installed on ext3 (8GB) with KVM-60.
      • The ext3 file system was optimized by ext2/3optimizer with the boot profile.
      • The disk image was converted to LBCAS (64KB - 512KB).
• Compared configurations:
   – Normal
   – u-readahead: user-level readahead (system call) at boot time
   – ext2/3optimizer



Static Analysis by DAVL (Disk Allocation Viewer for Linux)

[Figure: disk allocation maps of normal (fragmentation 0.21%) and ext2/3opt (fragmentation 1.11%), distinguishing system blocks, non-contiguous blocks, and contiguous blocks.]
Utilization of I/O
• BootChart showed the utilization of I/O.
   – u-readahead caused an I/O spike.

[Figure: BootChart I/O utilization for normal, u-readahead (I/O spike), and ext2/3opt (reduced I/O).]
Dynamic Analysis: Disk Access at boot time
• Ext2/3optimizer relocates the data blocks required at boot time
  to the top of the virtual disk.

[Figure: disk access at boot time, time (s) vs. address (GB, 0 to 8.0). Red: normal; blue: ext2/3opt.]
Trace of readahead coverage size
[Figure: readahead coverage size (0KB to 128KB) over time (0 to 60 s), one panel each for normal, u-readahead, and ext2/3opt. ext2/3opt shows fewer small readahead requests.]
Frequency for each readahead coverage
• Ext2/3optimizer reduced small “readahead” requests.

[Figure: frequency of readahead requests by request size (0 to 128KB).]
Volume Transition at Each Processing Level
 Volume of files (number, average size):   203MB (2,248 files, Av: 92KB)
 Volume of required blocks:                127MB
                                            normal      u-readahead   ext2/3opt
 Volume of access within readahead          208MB       231MB         140MB
 coverage (increase over required blocks)   (+81MB)     (+104MB)      (+13MB)
 Readahead frequency                        6,379       5,827         2,129 (about 1/3 of normal)
 Readahead average size                     33KB        41KB          67KB (about 2x normal)
 (The slide also carries the annotations “76MB (67%)” and “1/2”.)

•   Volume of downloaded block files in MB (uncompressed MB), and
    occupancy % (= 127MB / uncompressed MB)
LBCAS size       normal               u-readahead             ext2/3opt
64KB             86.1(247), 51.5%     93.4(272), 46.9%        55.3(144), 88.7%
128KB            96.8(290), 43.9%     104(315), 40.3%         55.3(149), 85.3%
256KB            114(358), 35.5%      123(386), 35.0%         55.6(159), 80.0%
512KB            144(474), 26.9%      153(508), 25.1%         55.6(176), 71.8%
Consumed time in LBCAS

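[Figure: two bar charts of time (s) for normal, u-readahead, and ext2/3opt, four bars per method. Annotation: “512KB was not efficient on each optimization.” Values printed on the bars (two series per chart):
   Upper chart   normal:       43 43 42 37 / 13 13 13 20
                 u-readahead:  43 43 45 38 / 14 14 12 19
                 ext2/3opt:    45 45 46 44 / 7 6 6 7
   Lower chart   normal:       5.0 6.5 9.0 14.0 / 5.7 4.6 4.7 3.1
                 u-readahead:  5.2 6.7 7.3 11.4 / 6.6 5.8 2.9 4.5
                 ext2/3opt:    2.5 2.8 3.5 4.8 / 3.6 2.7 1.7 1.1]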
Total download of LBCAS
• Ext2/3opt reduced the necessary block files (256KB).

[Figure: download (MB, 0 to 140) over time (s) for normal (+), u-readahead (□), and ext2/3opt (×). Annotation: the system call “readahead” downloaded the required files in advance; it caused an I/O spike, and the download also included redundant data.]
Frequency of functions in LBCAS
(I/O requests are independent of the LBCAS block size.)
R: requests; D: downloads; S: storage cache; U: uncompressions; M: memory cache
①, ②, ③: files per request, i.e. requests that touched 1, 2, or 3 block files
R = ①+②+③,  D+S = U,  U+M = ①+②×2+③×3

normal (average request size 33KB)
LBCAS     R       D       S       U       M       ①       ②       ③
64KB      6,338   3,958   1,663   5,621   3,647   4,148   1,450   740
128KB     6,381   2,321   1,729   4,050   3,793   4,919   1,462   -
256KB     6,379   1,435   1,748   3,183   3,908   5,667   717     -
512KB     6,395   848     1,769   2,717   4,019   6,054   341     -

u-readahead (average request size 41KB)
LBCAS     R       D       S       U       M       ①       ②       ③
64KB      5,825   4,344   1,172   5,516   3,626   3,537   1,259   1,029
128KB     5,834   2,526   1,200   3,726   3,761   4,181   1,653   -
256KB     5,827   1,544   1,179   2,723   3,908   5,023   804     -
512KB     5,822   1,015   1,172   2,187   4,023   5,434   388     -

ext2/3opt (average request size 67KB): download and uncompress are reduced
LBCAS     R       D       S       U       M       ①       ②       ③
64KB      2,165   2,296   626     2,922   1,311   941     380     844
128KB     2,148   1,189   593     1,782   1,398   1,116   1,032   -
256KB     2,129   634     576     1,210   1,409   1,639   490     -
512KB     2,132   353     517     870     1,520   1,874   258     -
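The three identities stated with the table can be checked row by row. This small helper (its name is hypothetical) reports which identities hold for one row, using the printed values.

```python
def check_row(r, d, s, u, m, one, two, three=0):
    """Check R = (1)+(2)+(3), D+S = U, and U+M = (1)+(2)x2+(3)x3 for one table row."""
    return {
        "R = 1+2+3": r == one + two + three,
        "D+S = U": d + s == u,
        "U+M = 1+2x2+3x3": u + m == one + 2 * two + 3 * three,
    }


# normal, 64KB LBCAS
print(check_row(6338, 3958, 1663, 5621, 3647, 4148, 1450, 740))
# normal, 512KB LBCAS
print(check_row(6395, 848, 1769, 2717, 4019, 6054, 341))
```

Every u-readahead and ext2/3opt row satisfies all three identities. In the normal rows, the 512KB row fails D+S = U as printed (848 + 1,769 = 2,617, not 2,717) and the 256KB row fails the other two, so a few printed cells may contain transcription errors.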
Discussions
• Weak points of ext2/3optimizer
   – The reallocation is customized for booting. Other
     applications may be adversely affected.
      • I guess the boot procedure is special and has no strong relation
        to other applications.
   – The reallocation is customized for a specific version. When a
     part of the boot procedure is updated, the image has to be
     re-optimized.




Conclusions
• “ext2/3optimizer” is a strong tool for exploiting “readahead”,
  because it reallocates the data blocks used by the boot
  procedure.
   – It increased the occupancy (rate of necessary data in a block file)
     of LBCAS block files.
   – It doubled the coverage of readahead and reduced the
     number of readahead requests to half.
• “ext2/3optimizer” is not only for LBCAS. It can also be used for
  normal Linux distributions.


Summary

Some services are available. Just try them!
 http://openlab.jp/oscircular/


  ext2/3optimizer developers
      http://unit.aist.go.jp/itri/knoppix/ext2optimizer/index-en.htm
  DAVL developers
      http://sourceforge.net/projects/davl/
  BootChart
      http://www.bootchart.org/

PDF
Co-training pseudo-labeling for text classification with support vector machi...
PDF
Lung cancer patients survival prediction using outlier detection and optimize...
PPT
Galois Field Theory of Risk: A Perspective, Protocol, and Mathematical Backgr...
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
future_of_ai_comprehensive_20250822032121.pptx
Transform-Your-Supply-Chain-with-AI-Driven-Quality-Engineering.pdf
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
Flame analysis and combustion estimation using large language and vision assi...
Consumable AI The What, Why & How for Small Teams.pdf
NewMind AI Weekly Chronicles – August ’25 Week IV
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
MuleSoft-Compete-Deck for midddleware integrations
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
Custom Battery Pack Design Considerations for Performance and Safety
Training Program for knowledge in solar cell and solar industry
search engine optimization ppt fir known well about this
Build Your First AI Agent with UiPath.pptx
“A New Era of 3D Sensing: Transforming Industries and Creating Opportunities,...
4 layer Arch & Reference Arch of IoT.pdf
Early detection and classification of bone marrow changes in lumbar vertebrae...
Co-training pseudo-labeling for text classification with support vector machi...
Lung cancer patients survival prediction using outlier detection and optimize...
Galois Field Theory of Risk: A Perspective, Protocol, and Mathematical Backgr...

Linux Symposium 2009 Slide Suzaki "Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)"

  • 1. Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)
    @ Linux Symposium 2009, Montreal, Canada, 17/July
    Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf
    Kuniyasu Suzaki †, Toshiki Yagi †, Kengo Iijima †, Nguyen Anh Quynh †, Yoshihito Watanabe ††
    † Research Center for Information Security
  • 2. Key words
    • LBCAS: Loopback Content Addressable Storage
      – A virtual block device (network-transparent block device)
    • readahead
      – The disk prefetch mechanism in the Linux kernel
        • The system call “readahead” is a different function.
    • File system block reallocation
      – A kind of defrag tool
      – We developed “ext2/3optimizer”, which reallocates the data blocks of i-nodes.
    Today’s talk presents optimization methods that use them.
  • 3. Today’s Contents
    • Motivation
      – What is LBCAS used for?
      – Correlation among LBCAS, file system block reallocation (ext2/3optimizer), and disk prefetch (readahead)
    • LBCAS: Loopback Content Addressable Storage
    • Optimization: ext2/3optimizer and readahead
    • Performance Results
    • Conclusions
  • 4. Motivation: What is “LBCAS” used for?
    • LBCAS was developed for OS Circular.
    • OS Circular is a project to distribute bootable disk images for virtual machines and real machines.
      – OS Circular project: http://openlab.jp/oscircular/
  • 5. OS Circular (Big Picture)
    [Figure: OS suppliers update the block files of LBCAS (Loopback Content Addressable Storage) on an HTTP server in a timely manner. Over the Internet, users construct a virtual disk from the block files and try an OS without installation, either on a virtual machine (KVM, QEMU) or on a real machine.]
  • 6. Performance Issues (Today’s Main Topic)
    • LBCAS is sensitive to access patterns.
      – Performance is affected by the number and size of disk prefetches (“readahead” in the Linux kernel).
    • The number and size of readahead requests can be optimized by file system block reallocation.
      – Defrag tools are not enough, so we developed “ext2/3optimizer”.
    [Figure: ext2/3optimizer reallocates the blocks of ext2/3 based on an access profile; as a result, the number of readahead requests is reduced, the size of each readahead is extended, and the performance of LBCAS is increased. Circled numbers ③ ② ① mark the presentation order.]
  • 7. LBCAS: LoopBack Content Addressable Storage
    • LBCAS = CAS + LoopBack
      – CAS
        • Indirect addressing by the SHA-1 digest of block contents
        • Benefit: identical blocks are expressed by the same SHA-1 digest, which reduces total storage.
        • Mainly used for archives. Example: Venti of Plan 9 [USENIX FAST’02]
      – LoopBack
        • A virtual block device: a file is used as a block device.
        • Abstracting the device as a file makes it easy to handle.
    • LBCAS saves each block to a file, called a “block file”. The file is named by the SHA-1 digest of its contents.
    • Block files are managed by a “mapping table” file, which maps physical addresses to SHA-1 file names.
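The CAS idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the LBCAS implementation: it assumes the SHA-1 digest is computed over the uncompressed block contents, keeps block files in a dict instead of on disk, and uses hypothetical function names (`build_lbcas`, `format_table`).

```python
import hashlib
import zlib

BLOCK_SIZE = 256 * 1024  # assumed LBCAS block size; the slides use 64KB to 512KB


def build_lbcas(image: bytes, block_size: int = BLOCK_SIZE):
    """Split a disk image into fixed-size blocks and store them content-addressed.

    Returns (mapping_table, block_files):
      mapping_table: list of (start, end, name) address ranges, like map01.idx
      block_files:   dict of SHA-1 hex name -> zlib-compressed block contents
    """
    mapping_table = []
    block_files = {}
    for start in range(0, len(image), block_size):
        chunk = image[start:start + block_size]
        # Assumption: the digest is taken over the uncompressed block contents.
        name = hashlib.sha1(chunk).hexdigest()
        if name not in block_files:
            # Identical blocks share a single block file, which saves storage.
            block_files[name] = zlib.compress(chunk)
        mapping_table.append((start, start + len(chunk) - 1, name))
    return mapping_table, block_files


def format_table(mapping_table):
    """Render the mapping table as 'start-end name' lines with hex addresses."""
    return [f"{start:08X}-{end:08X} {name}" for start, end, name in mapping_table]
```

Because identical blocks map to the same name, an image with repeated blocks needs fewer block files than blocks, while the mapping table still has one entry per address range.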
  • 8. Block files of LBCAS
    [Figure: an ext2 block device with 4KB pages is split into 256KB blocks. Each block is compressed by zlib and saved as a block file named by the SHA-1 digest of its contents. The mapping table (map01.idx) and the block files are reconstructed as a virtual disk with LBCAS.]
    Example mapping table:
      Address             File Name
      00000000-0003FFFF   4ad36ffe8…
      00040000-0007FFFF   974daf34a…
      00080000-000BFFFF   2d34ff3e1…
      000C0000-000FFFFF   3310012a…
      …                   …
  • 9. LBCAS (1/2)
    • An LBCAS image is made from an existing, normal block device.
    • The original block device is split into fixed-size blocks (64KB to 512KB), and each block is compressed by zlib.
    • The block files are reconstructed into a loopback file by a FUSE wrapper.
      – FUSE is a user-land file system: http://fuse.sf.net
    • Each block file is checked against its SHA-1 file name when it is mapped to the loopback file.
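A read through the reconstructed virtual disk might look like the following sketch. The FUSE layer is left out; `fetch_block_file` is a hypothetical callable standing in for the storage cache or an HTTP download, and the mapping table is simplified to a list of block-file names indexed by block number. The digest is assumed to cover the uncompressed contents.

```python
import hashlib
import zlib


class IntegrityError(Exception):
    """Raised when a block file does not match its SHA-1 file name."""


def read_virtual_disk(offset, length, mapping_table, fetch_block_file, block_size):
    """Read `length` bytes at byte `offset` of the virtual disk.

    mapping_table: list of block-file names, one per block number (simplified).
    fetch_block_file: hypothetical callable, name -> zlib-compressed bytes.
    """
    out = bytearray()
    end = offset + length
    while offset < end:
        index = offset // block_size
        name = mapping_table[index]
        data = zlib.decompress(fetch_block_file(name))
        # Verify the block against its SHA-1 file name before using it.
        if hashlib.sha1(data).hexdigest() != name:
            raise IntegrityError(f"block file {name} does not match its digest")
        inner = offset - index * block_size          # position inside this block
        take = min(end - offset, block_size - inner)  # stop at block or request end
        out += data[inner:inner + take]
        offset += take
    return bytes(out)
```

A read that crosses a block boundary simply continues in the next block file, so one request may need several block files.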
  • 10. Constructing a virtual disk of LBCAS on a client PC OS
  • 11. Structure of LBCAS
    • Storage cache
      – Suppresses downloads
    • Memory cache
      – Suppresses disk access and decompression
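The two caches on this slide can be modeled as a lookup that tries memory first, then local storage, then the network. This is a hypothetical sketch: the class name, the LRU eviction policy, and the number of memory slots are assumptions, and `download` stands in for an HTTP fetch of a block file.

```python
import zlib
from collections import OrderedDict


class BlockFileCache:
    """Memory cache -> storage cache -> download, with simple counters."""

    def __init__(self, download, memory_slots=4):
        self.download = download        # callable: name -> compressed bytes
        self.storage = {}               # stand-in for the on-disk storage cache
        self.memory = OrderedDict()     # decompressed blocks in LRU order
        self.memory_slots = memory_slots
        self.stats = {"download": 0, "storage_hit": 0, "uncompress": 0, "memory_hit": 0}

    def get(self, name):
        if name in self.memory:
            # Memory cache hit: no disk access and no decompression.
            self.memory.move_to_end(name)
            self.stats["memory_hit"] += 1
            return self.memory[name]
        if name in self.storage:
            self.stats["storage_hit"] += 1
        else:
            # Storage cache miss: fetch the block file once and keep it locally.
            self.storage[name] = self.download(name)
            self.stats["download"] += 1
        data = zlib.decompress(self.storage[name])
        self.stats["uncompress"] += 1
        self.memory[name] = data
        if len(self.memory) > self.memory_slots:
            self.memory.popitem(last=False)  # evict the least recently used block
        return data
```

In this model every decompression follows either a download or a storage-cache hit, so the counters always satisfy download + storage_hit = uncompress.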
  • 12. LBCAS (2/2)
    • When a file is updated or created on the original block device, the relevant block files are newly created with new SHA-1 file names, and the mapping table file is renewed.
      – Old block files remain reusable.
    • HTTP is used for file delivery.
      – It is the most popular protocol and is well designed for the Internet.
        • Inexpensive Web hosting services, proxies, and mirror servers can be used for worldwide deployment.
    • Block files are network/storage transparent.
      – If the necessary block files are stored in local storage, no network connection is needed.
  • 13. Partial Update of LBCAS
    [Figure: an ext2 block device (4KB pages) is stored as 256KB block files named by SHA-1 and listed in mapping table map01.idx (4ad36ffe8…, 974daf34a…, 2d34ff3e1…, 3310012a…, …). After an update through the FUSE driver (e.g., apt-get install), the new mapping table map02.idx lists 4ad36ffe8…, dd4daf34a…, 2d34ff3e1…, 3310012a…, …: only the changed block gets a new block file, and the same files are reused. “Create once, use many.”]
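The “create once, use many” property means a client only needs the block files it does not already have. A minimal sketch, assuming each mapping table is reduced to a list of block-file names (the function name is hypothetical, and the names in the usage example are placeholders, not real digests):

```python
def plan_update(old_table, new_table):
    """Split the block files listed in a new mapping table into reused and new ones.

    old_table, new_table: lists of block-file names, one entry per block address.
    """
    old = set(old_table)
    unique_new = list(dict.fromkeys(new_table))  # keep first-seen order, drop repeats
    reused = [name for name in unique_new if name in old]
    to_fetch = [name for name in unique_new if name not in old]
    return reused, to_fetch


# One block changed between the two versions, so only one file must be fetched.
reused, to_fetch = plan_update(["h1", "h2", "h3", "h4"], ["h1", "h5", "h3", "h4"])
```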
  • 14. Performance Issues
    • LBCAS is sensitive to access patterns.
    • There are two types of block size mismatch:
      (1) Between the file system and LBCAS (static mismatch)
        • ext2/3: 4KB block size
        • LBCAS: 64KB to 512KB block size
        – Occupancy (the rate of necessary data in a block file) is low.
          » Kitagawa [LinuxKongress2006] reported that the occupancy was 30% for KNOPPIX 3.8.2 on 256KB LBCAS.
      (2) Between “readahead” (disk prefetch) and LBCAS (dynamic mismatch)
        • readahead: 4KB to 128KB coverage size
        • LBCAS: 64KB to 512KB block size
        – Many small accesses (“worm-eaten” access to a block file) cause redundant downloads and unnecessary decompression in the LBCAS driver.
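Occupancy, as defined here, can be computed from the set of 4KB file system blocks a workload actually needs. This sketch assumes 4KB pages and counts each touched block file at its full uncompressed size; it ignores extra data pulled in by readahead. The function name is hypothetical.

```python
def occupancy(accessed_pages, page_size=4096, block_size=256 * 1024):
    """Rate of necessary data in the LBCAS block files that must be fetched.

    accessed_pages: file system block (page) numbers that are actually needed.
    """
    pages = set(accessed_pages)
    if not pages:
        return 0.0
    touched = {p * page_size // block_size for p in pages}  # block files needed
    needed_bytes = len(pages) * page_size
    fetched_bytes = len(touched) * block_size
    return needed_bytes / fetched_bytes
```

For a 256KB LBCAS block (64 pages), four needed pages spread over four block files give an occupancy of 1/64, while the same four pages packed into one block file give 1/16.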
  • 15. CAUTION about readahead
    • Disk prefetch “readahead” vs. system call “readahead”
      – The system call “readahead” populates the page cache with data from a file. As used here, the whole data of a file is stored in the page cache, so the coverage is the size of the file.
      – It is not directly related to disk prefetch, but it achieves the same function from user space.
      – Some boot procedures use the system call “readahead”. The files whose data are loaded into the page cache in advance at boot time are listed in “/etc/readahead/boot” and “/etc/readahead/desktop”. We call this function “u-readahead” in this presentation.
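u-readahead can be approximated from user space in Python. Python has no direct `readahead(2)` wrapper, so this sketch swaps in `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, which likewise asks the kernel to read file data into the page cache. It is Linux-oriented, the function name is hypothetical, and the list-file format of one path per line is an assumption about files such as /etc/readahead/boot.

```python
import os


def prefetch_files(list_path):
    """u-readahead sketch: warm the page cache for every file named in a list file.

    Assumes one path per line. Uses posix_fadvise(POSIX_FADV_WILLNEED) as a
    stand-in for the readahead(2) system call. Returns the bytes requested.
    """
    total = 0
    with open(list_path) as listing:
        for line in listing:
            path = line.strip()
            if not path:
                continue  # skip blank lines
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # a listed file may have been removed since profiling
            try:
                size = os.fstat(fd).st_size
                # Request the whole file, so the coverage is the size of the file.
                os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
                total += size
            finally:
                os.close(fd)
    return total
```

Because every listed file is requested up front, this approach front-loads I/O at boot, which is consistent with the I/O spike reported for u-readahead later in the deck.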
  • 16. Block size mismatch
    • Solution (increasing locality of reference)
      1. (For the static mismatch) Increase occupancy by reallocating the necessary data together into block files.
      2. (For the dynamic mismatch) Extend the coverage size of readahead through sequential access and a high page cache hit rate.
    • “ext2/3optimizer” repacks the data blocks of an ext2/3 file system so that they are in line.
      – The repacking is based on the block access profile at boot time.
      – As a result, ext2/3optimizer reduces the number of block files required.
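The repacking idea can be sketched as assigning new, contiguous block numbers in the order blocks were first accessed in the boot profile. This is a hypothetical illustration, not ext2/3optimizer's code: it assumes the target area is free and ignores the data that would have to be moved out of the way.

```python
def relocation_map(profile, first_free=0):
    """Assign new, contiguous block numbers in first-access order.

    profile: data block numbers in the order they were read at boot time.
    first_free: first block of the (assumed free) target area.
    """
    new_pos = {}
    for block in profile:
        if block not in new_pos:
            new_pos[block] = first_free + len(new_pos)
    return new_pos


def block_files_touched(blocks, fs_block=4096, lbcas_block=256 * 1024):
    """Number of distinct LBCAS block files that contain the given fs blocks."""
    per_file = lbcas_block // fs_block  # 64 fs blocks per 256KB block file
    return len({b // per_file for b in blocks})


profile = [5, 900, 5, 130, 7000, 64]
new_pos = relocation_map(profile)
```

In this example, five scattered blocks fall into five different 256KB block files before repacking and into a single block file afterwards.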
  • 17. Occupancy in a block file of LBCAS
    • Occupancy (the rate of necessary data in a block file) depends on where the necessary data are placed.
    • “Worm-eaten” access (readahead) causes redundant downloads of block files.
    [Figure: reads in the ext2/3 file system (4KB) go through readahead (4KB to 128KB) to 256KB LBCAS block files; circled numbers ① to ③ give the read order. Some reads hit the page cache; when the cache misses, the readahead coverage is shrunk, the occupancy is low, and redundant block files are downloaded.]
  • 18. Readahead and LBCAS (1/2)
    • Readahead is a disk prefetch mechanism. The prefetched data are saved to the page cache.
    • The coverage size is extended or shrunk according to the page cache hit rate.
    [Figure: for sequential reads from an application, readahead keeps a current_window (from start) and an ahead_window (from ahead_start), and extends them up to “max_readahead”.]
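The window behavior can be illustrated with a toy model. It is deliberately simplified and is not the kernel's readahead algorithm: here a read that continues exactly where the previous one ended doubles the window up to `max_readahead`, and any other read resets it. The 16KB initial window is an assumption; the 128KB maximum matches the upper bound quoted for readahead on slide 14.

```python
def simulate_readahead(reads, initial_kb=16, max_readahead_kb=128):
    """Toy model of readahead coverage (not the kernel's actual algorithm).

    reads: (offset_kb, size_kb) reads issued by an application, in order.
    Returns the window size (KB) used for each read.
    """
    window = initial_kb
    expected = None
    windows = []
    for offset, size in reads:
        if offset == expected:
            # Sequential access: extend the coverage, capped at max_readahead.
            window = min(window * 2, max_readahead_kb)
        else:
            # Non-sequential access: shrink back to the initial coverage.
            window = initial_kb
        windows.append(window)
        expected = offset + size
    return windows


sequential_then_jump = [(0, 4), (4, 4), (8, 4), (12, 4), (16, 4), (1000, 4)]
```

For `sequential_then_jump`, the model returns windows of 16, 32, 64, 128, 128, and then 16KB after the jump, which is the “worm-eaten” pattern that keeps coverage small.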
  • 19. Readahead and LBCAS (2/2)
    • When a readahead is issued, part of a block file is required and mapped to the virtual disk. The size depends on the coverage size of the readahead.
      – Wide readahead is effective for the LBCAS driver.
    • When the same block file is required in sequence, it is kept in the LBCAS memory cache and decompression is skipped.
    [Figure: block files (e.g., D3E14…, 3B441…) are downloaded and mapped to the loopback device. Small readahead windows cause an I/O size mismatch and low occupancy; as the window extends to “max_readahead”, a block file stored in the memory cache (D3E14…) is reused.]
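Whether a readahead request is served from one LBCAS block file or spans several depends only on its offset and size. A small sketch, with hypothetical function names, that counts the block files each request touches:

```python
from collections import Counter


def block_files_for_request(offset, size, lbcas_block=256 * 1024):
    """Indices of the LBCAS block files that one readahead request (bytes) touches."""
    first = offset // lbcas_block
    last = (offset + size - 1) // lbcas_block
    return list(range(first, last + 1))


def files_per_request(requests, lbcas_block):
    """Histogram: how many requests touch 1, 2, 3, ... block files."""
    return Counter(len(block_files_for_request(o, s, lbcas_block)) for o, s in requests)
```

With 64KB LBCAS, a 128KB request that starts 32KB into a block touches three block files; with 128KB or larger blocks, a request of at most 128KB can touch no more than two.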
  • 20. Readahead and Block Reallocation
    • Readahead can be improved by block reallocation of the file system if the page cache hit rate is increased.
    • Defrag tools look as if they would work well…
      – Unfortunately, current defrag tools are not suitable, because they are designed from the viewpoint of defragmenting individual files.
    • We developed “ext2/3optimizer”, which reallocates the data blocks of ext2/3 based on an access profile.
      – It also increases occupancy in a block file.
  • 21. Access profile and reallocation
    [Figure: before reallocation, an application goes through VFS, the ext2/3 file system driver with a profiler, the page cache (memory), and the loopback block driver; readahead is small and many (worm-eaten), and the accessed blocks are scattered on the device. The access profile is passed to user space via /proc/ and used by ext2/3optimizer to reallocate the blocks. After reallocation, the accessed blocks are gathered and readahead is sequential.]
  • 22. Block Relocation: ext2/3optimizer [LinuxKongress06]
    • Changes data blocks so that they are arranged in line. The structure of the metadata is not changed.
    • The arrangement is based on the access profile.
    • Features:
      – The normal file system driver is used.
      – Fragmentation occurs from the viewpoint of individual files.
      – The relocation increases page cache hits, so readahead extends its coverage size.
    [Figure: the i-node fields (Mode, Owner info, Size, Timestamps, Direct Blocks, Indirect Blocks, Double Indirect, Triple Indirect) keep their structure; after relocation, readahead is widened and occupancy is high.]
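Because only block pointers change, relocation can be pictured as rewriting each i-node's list of data block numbers through a relocation map. This sketch flattens direct and indirect pointers into one list per i-node and uses hypothetical names. It also shows why per-file fragmentation can increase: blocks of one file end up interleaved with blocks of other files in boot order.

```python
def rewrite_inode_blocks(inodes, new_pos):
    """Apply a relocation map to each i-node's data block list.

    inodes: dict of i-node number -> list of data block numbers (direct and
    indirect pointers flattened). Only the numbers change, not the i-node layout.
    """
    return {ino: [new_pos.get(b, b) for b in blocks] for ino, blocks in inodes.items()}


def fragments(blocks):
    """Number of contiguous runs in one file's block list."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


# Boot order: first block of i-node 12, then i-node 13, then the second block of 12.
inodes = {12: [100, 101, 102], 13: [500, 501]}
moved = rewrite_inode_blocks(inodes, {100: 0, 500: 1, 101: 2})
```

After the rewrite, both files become fragmented (3 and 2 runs) even though the blocks read at boot now sit next to each other.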
  • 23. Performance Analysis
    • Confirm the effect of ext2/3optimizer on LBCAS for booting.
      – Ubuntu 9.04 (kernel 2.6.28) installed on an 8GB ext3 file system, run with KVM-60.
        • The ext3 file system was optimized by ext2/3optimizer for the boot profile.
        • The disk image was converted to LBCAS (64KB to 512KB).
    • Compared configurations:
      – normal
      – u-readahead: user-level readahead (system call) for booting
      – ext2/3optimizer
  • 24. Static Analysis by DAVL (Disk Allocation Viewer for Linux)
    [Figure: block allocation maps for normal (fragmentation 0.21%) and ext2/3opt (fragmentation 1.11%); legend: system block, non-contiguous block, contiguous block.]
  • 25. Utilization of I/O
    • BootChart showed the utilization of I/O.
      – u-readahead caused an I/O spike.
    [Figure: BootChart I/O utilization for normal, u-readahead (I/O spike), and ext2/3opt (reduced I/O).]
  • 26. Dynamic Analysis: Disk Access at Boot Time
    • ext2/3optimizer relocates the data blocks required at boot time to the top of the virtual disk.
    [Figure: accessed address (GB) over time (s); red: normal, blue: ext2/3opt.]
  • 27. Trace of readahead coverage size
    [Figure: readahead coverage size (0KB to 128KB) over time (0 to 60 s) for normal, u-readahead, and ext2/3opt; ext2/3opt issues fewer small readaheads.]
  • 28. Frequency of each readahead coverage size
    • ext2/3optimizer reduced small “readahead” requests.
    [Figure: frequency vs. request size (0 to 128KB).]
  • 29. Volume Transition at Each Processing Level
      Processing level                                normal           u-readahead      ext2/3opt
      Volume of files (number, average size)          203MB (2,248 files, average 92KB); annotated 76MB (67%)
      Volume of required blocks                       127MB            127MB            127MB
      Volume of access including readahead coverage   208MB (+81MB)    231MB (+104MB)   140MB (+13MB)
      Readahead frequency                             6,379            5,827            2,129
      Average readahead size                          33KB             41KB             67KB
    (Compared with normal, ext2/3opt cuts the readahead frequency to about 1/3 and doubles the average size.)
    • Volume of downloaded block files: MB (uncompressed MB), occupancy % (= 127MB / uncompressed MB)
      LBCAS size   normal               u-readahead          ext2/3opt
      64KB         86.1 (247), 51.5%    93.4 (272), 46.9%    55.3 (144), 88.7%
      128KB        96.8 (290), 43.9%    104 (315), 40.3%     55.3 (149), 85.3%
      256KB        114 (358), 35.5%     123 (386), 35.0%     55.6 (159), 80.0%
      512KB        144 (474), 26.9%     153 (508), 25.1%     55.6 (176), 71.8%
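The occupancy column follows from dividing the 127MB of required blocks by the uncompressed volume downloaded, and the access volume is roughly readahead frequency times average readahead size. As a worked check with values from the tables (the small differences are presumably because the slide values are rounded):

$$
\begin{aligned}
\text{Occupancy} &= \frac{\text{volume of required blocks}}{\text{uncompressed volume of downloaded block files}}, \\
\text{64KB, normal: } \frac{127\,\text{MB}}{247\,\text{MB}} &\approx 0.51, \qquad
\text{64KB, ext2/3opt: } \frac{127\,\text{MB}}{144\,\text{MB}} \approx 0.88, \\
\text{normal: } 6{,}379 \times 33\,\text{KB} &\approx 2.1 \times 10^{5}\,\text{KB} \;(\text{table: } 208\,\text{MB}), \\
\text{ext2/3opt: } 2{,}129 \times 67\,\text{KB} &\approx 1.4 \times 10^{5}\,\text{KB} \;(\text{table: } 140\,\text{MB}).
\end{aligned}
$$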
  • 30. Consumed Time in LBCAS
    [Figure: two bar charts of time (s) consumed in LBCAS for normal, u-readahead, and ext2/3opt at each LBCAS block size (64KB to 512KB).]
    • 512KB was not efficient with any of the methods.
  • 31. Total Download of LBCAS
    • ext2/3opt reduced the necessary block files (256KB LBCAS).
    • The system call “readahead” (u-readahead) downloaded the required files in advance. This caused an I/O spike and also included redundant data.
    [Figure: download (MB, 0 to 140) over time (s) for normal (+), u-readahead (□), and ext2/3opt (×).]
  • 32. I/O Requests Are Independent of LBCAS
    Frequency of each function in LBCAS. Columns: Requests (R), Download (D), Storage Cache (S), Uncompress (U), Memory Cache (M), and the number of requests that touch one, two, or three block files (①, ②, ③).
    Relations: R = ① + ② + ③;  D + S = U;  U + M = ① + ② × 2 + ③ × 3
      normal (average request 33KB)
      LBCAS size   R       D       S       U       M       ①       ②       ③
      64KB         6,338   3,958   1,663   5,621   3,647   4,148   1,450   740
      128KB        6,381   2,321   1,729   4,050   3,793   4,919   1,462   -
      256KB        6,379   1,435   1,748   3,183   3,908   5,667   717     -
      512KB        6,395   848     1,769   2,717   4,019   6,054   341     -
      u-readahead (average request 41KB)
      64KB         5,825   4,344   1,172   5,516   3,626   3,537   1,259   1,029
      128KB        5,834   2,526   1,200   3,726   3,761   4,181   1,653   -
      256KB        5,827   1,544   1,179   2,723   3,908   5,023   804     -
      512KB        5,822   1,015   1,172   2,187   4,023   5,434   388     -
      ext2/3opt (average request 67KB): download and uncompress are reduced
      64KB         2,165   2,296   626     2,922   1,311   941     380     844
      128KB        2,148   1,189   593     1,782   1,398   1,116   1,032   -
      256KB        2,129   634     576     1,210   1,409   1,639   490     -
      512KB        2,132   353     517     870     1,520   1,874   258     -
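The three relations for this table can be checked mechanically for each row. One reading of them: each request touches one to three block files; each block-file access is either a memory-cache hit (M) or a decompression (U); and each decompression follows either a download (D) or a storage-cache hit (S). The function below is a hypothetical helper.

```python
def check_row(R, D, S, U, M, files_per_request):
    """Check the bookkeeping relations for one row of the table.

    files_per_request: request counts touching 1, 2, (3) block files (columns 1-3).
    Missing trailing columns are treated as zero.
    """
    n = list(files_per_request) + [0] * (3 - len(files_per_request))
    return {
        "R == n1+n2+n3": R == n[0] + n[1] + n[2],
        "D+S == U": D + S == U,
        "U+M == n1+2*n2+3*n3": U + M == n[0] + 2 * n[1] + 3 * n[2],
    }
```

Most rows balance exactly; a few, such as normal 256KB (5,667 + 717 = 6,384 against R = 6,379), do not, which suggests a transcription or rounding issue in those cells.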
  • 33. Discussions
    • Weak points of ext2/3optimizer
      – The reallocation is customized for booting, so other applications may be adversely affected.
        • We expect that the boot procedure is special and has no strong relation to other applications.
      – The reallocation is customized for a certain version. When part of the boot procedure is updated, the image has to be re-optimized.
  • 34. Conclusions
    • “ext2/3optimizer” is a strong tool for utilizing “readahead”, because it reallocates the data blocks used by the boot procedure.
      – It increased the occupancy (rate of necessary data in a block file) of LBCAS block files.
      – It doubled the coverage of readahead (33KB to 67KB on average) and reduced the number of readahead requests to about one third (6,379 to 2,129).
    • “ext2/3optimizer” is not limited to LBCAS; it can also be used for normal Linux distributions.
  • 35. Summary
    • Some services are available. Just try them! http://openlab.jp/oscircular/
    • ext2/3optimizer developers: http://unit.aist.go.jp/itri/knoppix/ext2optimizer/index-en.htm
    • DAVL developers: http://sourceforge.net/projects/davl/
    • BootChart: http://www.bootchart.org/