Commit 0cedbaa9 authored by Uli Middelberg's avatar Uli Middelberg

integrated support for aufs & Docker

parent 0fed6ed8
What: /debug/aufs/si_<id>/
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
Under /debug/aufs, a directory named si_<id> is created
per aufs mount, where <id> is a unique id generated
internally.
What: /debug/aufs/si_<id>/plink
Date: Apr 2013
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It has three lines and shows the information about the
pseudo-link. The first line is a single number
representing a number of buckets. The second line is a
number of pseudo-links per buckets (separated by a
blank). The last line is a single number representing a
total number of psedo-links.
When the aufs mount option 'noplink' is specified, it
will show "1\n0\n0\n".
What: /debug/aufs/si_<id>/xib
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the consumed blocks by xib (External Inode Number
Bitmap), its block size and file size.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
What: /debug/aufs/si_<id>/xino0, xino1 ... xinoN
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the consumed blocks by xino (External Inode Number
Translation Table), its link count, block size and file
size.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
What: /debug/aufs/si_<id>/xigen
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the consumed blocks by xigen (External Inode
Generation Table), its block size and file size.
If CONFIG_AUFS_EXPORT is disabled, this entry will not
be created.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
What: /sys/fs/aufs/si_<id>/
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
Under /sys/fs/aufs, a directory named si_<id> is created
per aufs mount, where <id> is a unique id generated
internally.
What: /sys/fs/aufs/si_<id>/br0, br1 ... brN
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the abolute path of a member directory (which
is called branch) in aufs, and its permission.
What: /sys/fs/aufs/si_<id>/brid0, brid1 ... bridN
Date: July 2013
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the id of a member directory (which is called
branch) in aufs.
What: /sys/fs/aufs/si_<id>/xi_path
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the abolute path of XINO (External Inode Number
Bitmap, Translation Table and Generation Table) file
even if it is the default path.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
Aufs3 -- advanced multi layered unification filesystem version 3.x
http://aufs.sf.net
Junjiro R. Okajima
0. Introduction
----------------------------------------
In the early days, aufs was entirely re-designed and re-implemented
Unionfs Version 1.x series. After many original ideas, approaches,
improvements and implementations, it becomes totally different from
Unionfs while keeping the basic features.
Recently, Unionfs Version 2.x series begin taking some of the same
approaches to aufs1's.
Unionfs is being developed by Professor Erez Zadok at Stony Brook
University and his team.
Aufs3 supports linux-3.0 and later.
If you want older kernel version support, try aufs2-2.6.git or
aufs2-standalone.git repository, aufs1 from CVS on SourceForge.
Note: it becomes clear that "Aufs was rejected. Let's give it up."
According to Christoph Hellwig, linux rejects all union-type
filesystems but UnionMount.
<http://marc.info/?l=linux-kernel&m=123938533724484&w=2>
PS. Al Viro seems have a plan to merge aufs as well as overlayfs and
UnionMount, and he pointed out an issue around a directory mutex
lock and aufs addressed it. But it is still unsure whether aufs will
be merged (or any other union solution).
<http://marc.info/?l=linux-kernel&m=136312705029295&w=1>
1. Features
----------------------------------------
- unite several directories into a single virtual filesystem. The member
directory is called as a branch.
- you can specify the permission flags to the branch, which are 'readonly',
'readwrite' and 'whiteout-able.'
- by upper writable branch, internal copyup and whiteout, files/dirs on
readonly branch are modifiable logically.
- dynamic branch manipulation, add, del.
- etc...
Also there are many enhancements in aufs1, such as:
- readdir(3) in userspace.
- keep inode number by external inode number table
- keep the timestamps of file/dir in internal copyup operation
- seekable directory, supporting NFS readdir.
- whiteout is hardlinked in order to reduce the consumption of inodes
on branch
- do not copyup, nor create a whiteout when it is unnecessary
- revert a single systemcall when an error occurs in aufs
- remount interface instead of ioctl
- maintain /etc/mtab by an external command, /sbin/mount.aufs.
- loopback mounted filesystem as a branch
- kernel thread for removing the dir who has a plenty of whiteouts
- support copyup sparse file (a file which has a 'hole' in it)
- default permission flags for branches
- selectable permission flags for ro branch, whether whiteout can
exist or not
- export via NFS.
- support <sysfs>/fs/aufs and <debugfs>/aufs.
- support multiple writable branches, some policies to select one
among multiple writable branches.
- a new semantics for link(2) and rename(2) to support multiple
writable branches.
- no glibc changes are required.
- pseudo hardlink (hardlink over branches)
- allow a direct access manually to a file on branch, e.g. bypassing aufs.
including NFS or remote filesystem branch.
- userspace wrapper for pathconf(3)/fpathconf(3) with _PC_LINK_MAX.
- and more...
Currently these features are dropped temporary from aufs3.
See design/08plan.txt in detail.
- test only the highest one for the directory permission (dirperm1)
- copyup on open (coo=)
- nested mount, i.e. aufs as readonly no-whiteout branch of another aufs
(robr)
- statistics of aufs thread (/sys/fs/aufs/stat)
- delegation mode (dlgt)
a delegation of the internal branch access to support task I/O
accounting, which also supports Linux Security Modules (LSM) mainly
for Suse AppArmor.
- intent.open/create (file open in a single lookup)
Features or just an idea in the future (see also design/*.txt),
- reorder the branch index without del/re-add.
- permanent xino files for NFSD
- an option for refreshing the opened files after add/del branches
- 'move' policy for copy-up between two writable branches, after
checking free space.
- light version, without branch manipulation. (unnecessary?)
- copyup in userspace
- inotify in userspace
- readv/writev
- xattr, acl
2. Download
----------------------------------------
There were three GIT trees for aufs3, aufs3-linux.git,
aufs3-standalone.git, and aufs-util.git. Note that there is no "3" in
"aufs-util.git."
While the aufs-util is always necessary, you need either of aufs3-linux
or aufs3-standalone.
The aufs3-linux tree includes the whole linux mainline GIT tree,
git://git.kernel.org/.../torvalds/linux.git.
And you cannot select CONFIG_AUFS_FS=m for this version, eg. you cannot
build aufs3 as an external kernel module.
On the other hand, the aufs3-standalone tree has only aufs source files
and necessary patches, and you can select CONFIG_AUFS_FS=m.
You will find GIT branches whose name is in form of "aufs3.x" where "x"
represents the linux kernel version, "linux-3.x". For instance,
"aufs3.0" is for linux-3.0. For latest "linux-3.x-rcN", use
"aufs3.x-rcN" branch.
o aufs3-linux tree
$ git clone --reference /your/linux/git/tree \
git://git.code.sf.net/p/aufs/aufs3-linux aufs-aufs3-linux \
aufs3-linux.git
- if you don't have linux GIT tree, then remove "--reference ..."
$ cd aufs3-linux.git
$ git checkout origin/aufs3.0
o aufs3-standalone tree
$ git clone git://git.code.sf.net/p/aufs/aufs3-standalone \
aufs3-standalone.git
$ cd aufs3-standalone.git
$ git checkout origin/aufs3.0
o aufs-util tree
$ git clone git://git.code.sf.net/p/aufs/aufs-util \
aufs-util.git
$ cd aufs-util.git
$ git checkout origin/aufs3.0
Note: The 3.x-rcN branch is to be used with `rc' kernel versions ONLY.
The minor version number, 'x' in '3.x', of aufs may not always
follow the minor version number of the kernel.
Because changes in the kernel that cause the use of a new
minor version number do not always require changes to aufs-util.
Since aufs-util has its own minor version number, you may not be
able to find a GIT branch in aufs-util for your kernel's
exact minor version number.
In this case, you should git-checkout the branch for the
nearest lower number.
For (an unreleased) example:
If you are using "linux-3.10" and the "aufs3.10" branch
does not exist in aufs-util repository, then "aufs3.9", "aufs3.8"
or something numerically smaller is the branch for your kernel.
Also you can view all branches by
$ git branch -a
3. Configuration and Compilation
----------------------------------------
Make sure you have git-checkout'ed the correct branch.
For aufs3-linux tree,
- enable CONFIG_AUFS_FS.
- set other aufs configurations if necessary.
For aufs3-standalone tree,
There are several ways to build.
1.
- apply ./aufs3-kbuild.patch to your kernel source files.
- apply ./aufs3-base.patch too.
- apply ./aufs3-mmap.patch too.
- apply ./aufs3-standalone.patch too, if you have a plan to set
CONFIG_AUFS_FS=m. otherwise you don't need ./aufs3-standalone.patch.
- copy ./{Documentation,fs,include/uapi/linux/aufs_type.h} files to your
kernel source tree. Never copy $PWD/include/uapi/linux/Kbuild.
- enable CONFIG_AUFS_FS, you can select either
=m or =y.
- and build your kernel as usual.
- install the built kernel.
Note: Since linux-3.9, every filesystem module requires an alias
"fs-<fsname>". You should make sure that "fs-aufs" is listed in your
modules.aliases file if you set CONFIG_AUFS_FS=m.
- install the header files too by "make headers_install" to the
directory where you specify. By default, it is $PWD/usr.
"make help" shows a brief note for headers_install.
- and reboot your system.
2.
- module only (CONFIG_AUFS_FS=m).
- apply ./aufs3-base.patch to your kernel source files.
- apply ./aufs3-mmap.patch too.
- apply ./aufs3-standalone.patch too.
- build your kernel, don't forget "make headers_install", and reboot.
- edit ./config.mk and set other aufs configurations if necessary.
Note: You should read $PWD/fs/aufs/Kconfig carefully which describes
every aufs configurations.
- build the module by simple "make".
Note: Since linux-3.9, every filesystem module requires an alias
"fs-<fsname>". You should make sure that "fs-aufs" is listed in your
modules.aliases file.
- you can specify ${KDIR} make variable which points to your kernel
source tree.
- install the files
+ run "make install" to install the aufs module, or copy the built
$PWD/aufs.ko to /lib/modules/... and run depmod -a (or reboot simply).
+ run "make install_headers" (instead of headers_install) to install
the modified aufs header file (you can specify DESTDIR which is
available in aufs standalone version's Makefile only), or copy
$PWD/usr/include/linux/aufs_type.h to /usr/include/linux or wherever
you like manually. By default, the target directory is $PWD/usr.
- no need to apply aufs3-kbuild.patch, nor copying source files to your
kernel source tree.
Note: The header file aufs_type.h is necessary to build aufs-util
as well as "make headers_install" in the kernel source tree.
headers_install is subject to be forgotten, but it is essentially
necessary, not only for building aufs-util.
You may not meet problems without headers_install in some older
version though.
And then,
- read README in aufs-util, build and install it
- note that your distribution may contain an obsoleted version of
aufs_type.h in /usr/include/linux or something. When you build aufs
utilities, make sure that your compiler refers the correct aufs header
file which is built by "make headers_install."
- if you want to use readdir(3) in userspace or pathconf(3) wrapper,
then run "make install_ulib" too. And refer to the aufs manual in
detail.
There several other patches in aufs3-standalone.git. They are all
optional. When you meet some problems, they will help you.
- aufs3-loopback.patch
Supports a nested loopback mount in a branch-fs. This patch is
unnecessary until aufs produces a message like "you may want to try
another patch for loopback file".
- vfs-ino.patch
Modifies a system global kernel internal function get_next_ino() in
order to stop assigning 0 for an inode-number. Not directly related to
aufs, but recommended generally.
- tmpfs-idr.patch
Keeps the tmpfs inode number as the lowest value. Effective to reduce
the size of aufs XINO files for tmpfs branch. Also it prevents the
duplication of inode number, which is important for backup tools and
other utilities. When you find aufs XINO files for tmpfs branch
growing too much, try this patch.
4. Usage
----------------------------------------
At first, make sure aufs-util are installed, and please read the aufs
manual, aufs.5 in aufs-util.git tree.
$ man -l aufs.5
And then,
$ mkdir /tmp/rw /tmp/aufs
# mount -t aufs -o br=/tmp/rw:${HOME} none /tmp/aufs
Here is another example. The result is equivalent.
# mount -t aufs -o br=/tmp/rw=rw:${HOME}=ro none /tmp/aufs
Or
# mount -t aufs -o br:/tmp/rw none /tmp/aufs
# mount -o remount,append:${HOME} /tmp/aufs
Then, you can see whole tree of your home dir through /tmp/aufs. If
you modify a file under /tmp/aufs, the one on your home directory is
not affected, instead the same named file will be newly created under
/tmp/rw. And all of your modification to a file will be applied to
the one under /tmp/rw. This is called the file based Copy on Write
(COW) method.
Aufs mount options are described in aufs.5.
If you run chroot or something and make your aufs as a root directory,
then you need to customize the shutdown script. See the aufs manual in
detail.
Additionally, there are some sample usages of aufs which are a
diskless system with network booting, and LiveCD over NFS.
See sample dir in CVS tree on SourceForge.
5. Contact
----------------------------------------
When you have any problems or strange behaviour in aufs, please let me
know with:
- /proc/mounts (instead of the output of mount(8))
- /sys/module/aufs/*
- /sys/fs/aufs/* (if you have them)
- /debug/aufs/* (if you have them)
- linux kernel version
if your kernel is not plain, for example modified by distributor,
the url where i can download its source is necessary too.
- aufs version which was printed at loading the module or booting the
system, instead of the date you downloaded.
- configuration (define/undefine CONFIG_AUFS_xxx)
- kernel configuration or /proc/config.gz (if you have it)
- behaviour which you think to be incorrect
- actual operation, reproducible one is better
- mailto: aufs-users at lists.sourceforge.net
Usually, I don't watch the Public Areas(Bugs, Support Requests, Patches,
and Feature Requests) on SourceForge. Please join and write to
aufs-users ML.
6. Acknowledgements
----------------------------------------
Thanks to everyone who have tried and are using aufs, whoever
have reported a bug or any feedback.
Especially donators:
Tomas Matejicek(slax.org) made a donation (much more than once).
Since Apr 2010, Tomas M (the author of Slax and Linux Live
scripts) is making "doubling" donations.
Unfortunately I cannot list all of the donators, but I really
appreciate.
It ends Aug 2010, but the ordinary donation URL is still available.
<http://sourceforge.net/donate/index.php?group_id=167503>
Dai Itasaka made a donation (2007/8).
Chuck Smith made a donation (2008/4, 10 and 12).
Henk Schoneveld made a donation (2008/9).
Chih-Wei Huang, ASUS, CTC donated Eee PC 4G (2008/10).
Francois Dupoux made a donation (2008/11).
Bruno Cesar Ribas and Luis Carlos Erpen de Bona, C3SL serves public
aufs2 GIT tree (2009/2).
William Grant made a donation (2009/3).
Patrick Lane made a donation (2009/4).
The Mail Archive (mail-archive.com) made donations (2009/5).
Nippy Networks (Ed Wildgoose) made a donation (2009/7).
New Dream Network, LLC (www.dreamhost.com) made a donation (2009/11).
Pavel Pronskiy made a donation (2011/2).
Iridium and Inmarsat satellite phone retailer (www.mailasail.com), Nippy
Networks (Ed Wildgoose) made a donation for hardware (2011/3).
Max Lekomcev (DOM-TV project) made a donation (2011/7, 12, 2012/3, 6 and
11).
Sam Liddicott made a donation (2011/9).
Era Scarecrow made a donation (2013/4).
Bor Ratajc made a donation (2013/4).
Alessandro Gorreta made a donation (2013/4).
POIRETTE Marc made a donation (2013/4).
Alessandro Gorreta made a donation (2013/4).
lauri kasvandik made a donation (2013/5).
"pemasu from Finland" made a donation (2013/7).
The Parted Magic Project made a donation (2013/9 and 11).
Pavel Barta made a donation (2013/10).
Nikolay Pertsev made a donation (2014/5).
James B made a donation (2014/7).
Stefano Di Biase made a donation (2014/8).
Thank you very much.
Donations are always, including future donations, very important and
helpful for me to keep on developing aufs.
7.
----------------------------------------
If you are an experienced user, no explanation is needed. Aufs is
just a linux filesystem.
Enjoy!
# Local variables: ;
# mode: text;
# End: ;
# Copyright (C) 2005-2014 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Introduction
----------------------------------------
aufs [ei ju: ef es] | [a u f s]
1. abbrev. for "advanced multi-layered unification filesystem".
2. abbrev. for "another unionfs".
3. abbrev. for "auf das" in German which means "on the" in English.
Ex. "Butter aufs Brot"(G) means "butter onto bread"(E).
But "Filesystem aufs Filesystem" is hard to understand.
AUFS is a filesystem with features:
- multi layered stackable unification filesystem, the member directory
is called as a branch.
- branch permission and attribute, 'readonly', 'real-readonly',
'readwrite', 'whiteout-able', 'link-able whiteout' and their
combination.
- internal "file copy-on-write".
- logical deletion, whiteout.
- dynamic branch manipulation, adding, deleting and changing permission.
- allow bypassing aufs, user's direct branch access.
- external inode number translation table and bitmap which maintains the
persistent aufs inode number.
- seekable directory, including NFS readdir.
- file mapping, mmap and sharing pages.
- pseudo-link, hardlink over branches.
- loopback mounted filesystem as a branch.
- several policies to select one among multiple writable branches.
- revert a single systemcall when an error occurs in aufs.
- and more...
Multi Layered Stackable Unification Filesystem
----------------------------------------------------------------------
Most people already knows what it is.
It is a filesystem which unifies several directories and provides a
merged single directory. When users access a file, the access will be
passed/re-directed/converted (sorry, I am not sure which English word is
correct) to the real file on the member filesystem. The member
filesystem is called 'lower filesystem' or 'branch' and has a mode
'readonly' and 'readwrite.' And the deletion for a file on the lower
readonly branch is handled by creating 'whiteout' on the upper writable
branch.
On LKML, there have been discussions about UnionMount (Jan Blunck,
Bharata B Rao and Valerie Aurora) and Unionfs (Erez Zadok). They took
different approaches to implement the merged-view.
The former tries putting it into VFS, and the latter implements as a
separate filesystem.
(If I misunderstand about these implementations, please let me know and
I shall correct it. Because it is a long time ago when I read their
source files last time).
UnionMount's approach will be able to small, but may be hard to share
branches between several UnionMount since the whiteout in it is
implemented in the inode on branch filesystem and always
shared. According to Bharata's post, readdir does not seems to be
finished yet.
There are several missing features known in this implementations such as
- for users, the inode number may change silently. eg. copy-up.
- link(2) may break by copy-up.
- read(2) may get an obsoleted filedata (fstat(2) too).
- fcntl(F_SETLK) may be broken by copy-up.
- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
open(O_RDWR).
Unionfs has a longer history. When I started implementing a stacking filesystem
(Aug 2005), it already existed. It has virtual super_block, inode,
dentry and file objects and they have an array pointing lower same kind
objects. After contributing many patches for Unionfs, I re-started my
project AUFS (Jun 2006).
In AUFS, the structure of filesystem resembles to Unionfs, but I
implemented my own ideas, approaches and enhancements and it became
totally different one.
Comparing DM snapshot and fs based implementation
- the number of bytes to be copied between devices is much smaller.
- the type of filesystem must be one and only.
- the fs must be writable, no readonly fs, even for the lower original
device. so the compression fs will not be usable. but if we use
loopback mount, we may address this issue.
for instance,
mount /cdrom/squashfs.img /sq
losetup /sq/ext2.img
losetup /somewhere/cow
dmsetup "snapshot /dev/loop0 /dev/loop1 ..."
- it will be difficult (or needs more operations) to extract the
difference between the original device and COW.
- DM snapshot-merge may help a lot when users try merging. in the
fs-layer union, users will use rsync(1).
Several characters/aspects of aufs
----------------------------------------------------------------------
Aufs has several characters or aspects.
1. a filesystem, callee of VFS helper
2. sub-VFS, caller of VFS helper for branches
3. a virtual filesystem which maintains persistent inode number
4. reader/writer of files on branches such like an application
1. Callee of VFS Helper
As an ordinary linux filesystem, aufs is a callee of VFS. For instance,
unlink(2) from an application reaches sys_unlink() kernel function and
then vfs_unlink() is called. vfs_unlink() is one of VFS helper and it
calls filesystem specific unlink operation. Actually aufs implements the
unlink operation but it behaves like a redirector.
2. Caller of VFS Helper for Branches
aufs_unlink() passes the unlink request to the branch filesystem as if
it were called from VFS. So the called unlink operation of the branch
filesystem acts as usual. As a caller of VFS helper, aufs should handle
every necessary pre/post operation for the branch filesystem.
- acquire the lock for the parent dir on a branch
- lookup in a branch
- revalidate dentry on a branch
- mnt_want_write() for a branch
- vfs_unlink() for a branch
- mnt_drop_write() for a branch