Autor Tema: DwarFS. A fast high compression read-only file system  (Leído 103 veces)


  • Administrador
  • Usuario Héroe
  • *****
  • Mensajes: 8803
DwarFS. A fast high compression read-only file system
« en: 21 de Febrero de 2022, 05:20:58 pm »
I reconverted the img of Debian 11 LXQT and saved more than 480 MB, now the question is how i can make it readable for a installation


DwarFS is a read-only file system with a focus on achieving very high compression ratios in particular for very redundant data.This probably doesn't sound very exciting, because if it's redundant, it should compress well. However, I found that other read-only, compressed file systems don't do a very good job at making use of this redundancy. See here for a comparison with other compressed file systems.
DwarFS also doesn't compromise on speed and for my use cases I've found it to be on par with or perform better than SquashFS. For my primary use case, DwarFS compression is an order of magnitude better than SquashFS compression, it's 6 times faster to build the file system, it's typically faster to access files on DwarFS and it uses less CPU resources.
Distinct features of DwarFS are:

Clustering of files by similarity using a similarity hash function. This makes it easier to exploit the redundancy across file boundaries.
Segmentation analysis across file system blocks in order to reduce the size of the uncompressed file system. This saves memory when using the compressed file system and thus potentially allows for higher cache hit rates as more data can be kept in the cache.
Highly multi-threaded implementation. Both the file system creation tool as well as the FUSE driver are able to make good use of the many cores of your system.
Optional experimental Python scripting support to provide custom filtering and ordering functionality.


The SquashFS, xz, lrzip, zpaq and wimlib tests were all done on an 8 core Intel(R) Xeon(R) E-2286M CPU @ 2.40GHz with 64 GiB of RAM.
The Cromfs and EROFS tests were done with an older version of DwarFS on a 6 core Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz with 64 GiB of RAM.
The systems were mostly idle during all of the tests.

With SquashFS

The source directory contained 1139 different Perl installations from 284 distinct releases, a total of 47.65 GiB of data in 1,927,501 files and 330,733 directories. The source directory was freshly unpacked from a tar archive to an XFS partition on a 970 EVO Plus 2TB NVME drive, so most of its contents were likely cached.
I'm using the same compression type and compression level for SquashFS that is the default setting for DwarFS:
$ time mksquashfs install perl-install.squashfs -comp zstd -Xcompression-level 22
Parallel mksquashfs: Using 16 processors
Creating 4.0 filesystem on perl-install-zstd.squashfs, block size 131072.

Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
compressed data, compressed metadata, compressed fragments,
compressed xattrs, compressed ids
duplicates are removed
Filesystem size 4637597.63 Kbytes (4528.90 Mbytes)
9.29% of uncompressed filesystem size (49922299.04 Kbytes)
Inode table size 19100802 bytes (18653.13 Kbytes)
26.06% of uncompressed inode table size (73307702 bytes)
Directory table size 19128340 bytes (18680.02 Kbytes)
46.28% of uncompressed directory table size (41335540 bytes)
Number of duplicate files found 1780387
Number of inodes 2255794
Number of files 1925061
Number of fragments 28713
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 330733
Number of ids (unique uids + gids) 2
Number of uids 1
mhx (1000)
Number of gids 1
users (100)

real 32m54.713s
user 501m46.382s
sys 0m58.528s

For DwarFS, I'm sticking to the defaults:
$ time mkdwarfs -i install -o perl-install.dwarfs
I 11:33:33.310931 scanning install
I 11:33:39.026712 waiting for background scanners...
I 11:33:50.681305 assigning directory and link inodes...
I 11:33:50.888441 finding duplicate files...
I 11:34:01.120800 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files
I 11:34:01.122608 waiting for inode scanners...
I 11:34:12.839065 assigning device inodes...
I 11:34:12.875520 assigning pipe/socket inodes...
I 11:34:12.910431 building metadata...
I 11:34:12.910524 building blocks...
I 11:34:12.910594 saving names and links...
I 11:34:12.910691 bloom filter size: 32 KiB
I 11:34:12.910760 ordering 144675 inodes using nilsimsa similarity...
I 11:34:12.915555 nilsimsa: depth=20000 (1000), limit=255
I 11:34:13.052525 updating name and link indices...
I 11:34:13.276233 pre-sorted index (660176 name, 366179 path lookups) [360.6ms]
I 11:35:44.039375 144675 inodes ordered [91.13s]
I 11:35:44.041427 waiting for segmenting/blockifying to finish...
I 11:37:38.823902 bloom filter reject rate: 96.017% (TPR=0.244%, lookups=4740563665)
I 11:37:38.823963 segmentation matches: good=454708, bad=6819, total=464247
I 11:37:38.824005 segmentation collisions: L1=0.008%, L2=0.000% [2233254 hashes]
I 11:37:38.824038 saving chunks...
I 11:37:38.860939 saving directories...
I 11:37:41.318747 waiting for compression to finish...
I 11:38:56.046809 compressed 47.65 GiB to 430.9 MiB (ratio=0.00883101)
I 11:38:56.304922 filesystem created without errors [323s]

​waiting for block compression to finish
330733 dirs, 0/2440 soft/hard links, 1927501/1927501 files, 0 other
original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 15.19 GiB
filesystem: 4.261 GiB in 273 blocks (319178 chunks, 144675/144675 inodes)
compressed filesystem: 273 blocks/430.9 MiB written [depth: 20000]

real 5m23.030s
user 78m7.554s
sys 1m47.968s

So in this comparison, mkdwarfs is more than 6 times faster than mksquashfs, both in terms of CPU time and wall clock time.
$ ll perl-install.*fs
-rw-r--r-- 1 mhx users 447230618 Mar 3 20:28 perl-install.dwarfs
-rw-r--r-- 1 mhx users 4748902400 Mar 3 20:10 perl-install.squashfs

In terms of compression ratio, the DwarFS file system is more than 10 times smaller than the SquashFS file system. With DwarFS, the content has been compressed down to less than 0.9% (!) of its original size. This compression ratio only considers the data stored in the individual files, not the actual disk space used. On the original XFS file system, according to du, the source folder uses 52 GiB, so the DwarFS image actually only uses 0.8% of the original space.
Siempre que pasa igual sucede lo mismo