CRFS stands for 'Coherent Remote File System'. CRFS is a network protocol
that lets clients share access to a file system hosted on a server. Its main
features include:
general-purpose POSIX file system semantics
cache coherency
aggressive metadata performance
efficient and flexible snapshots
robust data integrity protection
The implementation hosted here includes a client module for the Linux kernel
and a UNIX user space server daemon, 'crfsd'. The daemon provides these
features by storing data in a BTRFS file system on disk.
Status as of June, 2008
CRFS is in the initial phase of design and implementation and is proceeding
towards an alpha release. It is not ready for stable use and should not be
trusted to reliably store data.
CRFS is made available at this early stage so that people can participate
in the design and implementation.
The following features will be included in the alpha release and are
immediate priorities, though not necessarily in the given order:
upgrade to a recent BTRFS disk format
re-establish failed TCP connections
clients should use ranges to guide inode allocation
crfsd should verify more on-disk btrfs metadata as it reads
open()/unlink() should enable orphan recovery
crfsd should evict clients that time out and reclaim their ranges
xattr support should be a straightforward operation on items
non-4k page sizes should be supported, which means inline data cleanups
basic client cache coherency should be supported
basic man pages would be nice
O_DIRECT and aio should function, even if they actually copy data and run synchronously
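The open()/unlink() item in the list above presumably refers to the POSIX rule that a file whose last name has been removed stays readable and writable through any descriptor that is still open. A file in that state is an orphan, and a server that restarts while one exists has to find and reclaim it. The sketch below only demonstrates the client-visible behavior on a local POSIX file system. It says nothing about how crfsd implements recovery, and the helper name `read_after_unlink` is hypothetical.

```python
import os
import tempfile


def read_after_unlink(directory, payload):
    """Write payload to a new file, unlink it, then read it back via the open fd."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        os.unlink(path)                      # remove the file's only name
        name_exists = os.path.exists(path)   # the name is gone from the directory
        os.lseek(fd, 0, os.SEEK_SET)         # rewind the still-open descriptor
        data = os.read(fd, len(payload))     # the data remains reachable
    finally:
        os.close(fd)                         # last close lets the storage be freed
    return name_exists, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(read_after_unlink(scratch, b"orphan data"))
```

Running the same check against a CRFS mount, once the feature exists, would be one way to confirm that the client and server preserve these semantics across the network.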
The following features will not be included in the alpha release:
clients sharing cache for reading
snapshots
backups
failover
quotas
untrusted clients
capability handshake
ENOSPC
multi-device (probably)
shared-writable mmap coherency (maybe)
Installation instructions
Download the source and follow the instructions in the
'docs/developer-instructions' file:
$ hg clone http://oss.oracle.com/mercurial/zab/crfs crfs
$ less crfs/docs/developer-instructions
Send mail to crfs-devel and let us know how it goes!