Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is composed entirely of volunteers and interested parties, and has expanded into a large number of related projects for saving online and digital history.
History is littered with hundreds of conflicts over the future of a community, group, location or business that were "resolved" when one of the parties stepped ahead and destroyed what was there. With the original point of contention destroyed, the debates would fall by the wayside. Archive Team believes that by duplicating condemned data, the conversation and debate can continue, as well as the richness and insight gained by keeping the materials. Our projects have ranged in size from a single volunteer downloading the data from a small-but-critical site, to over 100 volunteers stepping forward to acquire terabytes of user-created data to save for future generations.
The main site for Archive Team is at archiveteam.org and contains up-to-date information on various projects, manifestos, plans and walkthroughs.
This collection contains the output of many Archive Team projects, both ongoing and completed. Thanks to the disk space generously provided by the Internet Archive, multi-terabyte datasets can be made available and used by the Wayback Machine, providing a path back to lost websites and work.
Our collection has grown to the point of having sub-collections for each type of data we acquire. If you are seeking to browse the contents of these collections, the Wayback Machine is the best first stop. Otherwise, you are free to dig into the stacks to see what you may find.
The Archive Team Panic Downloads are full pulldowns of currently extant websites, meant to serve as emergency backups for needed sites that are in danger of closing, or which will be missed dearly if suddenly lost due to hard drive crashes or server failures.
ArchiveBot is an IRC bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, records it in a WARC, and then uploads that WARC to ArchiveTeam servers for eventual injection into the Internet Archive (or other archive sites).
To use ArchiveBot, drop by #archivebot on EFNet. To interact with ArchiveBot, you issue commands by typing them into the channel. Note that you will need channel operator permissions in order to issue archiving jobs. The dashboard shows the sites currently being downloaded.
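To make "grabs all content under that URL" concrete, here is a minimal sketch of one possible scope rule. This is not ArchiveBot's actual code: the function name is_under and the rule itself (same scheme and host, path inside the start page's directory) are hypothetical assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def is_under(start_url: str, candidate_url: str) -> bool:
    """Hypothetical scope rule, NOT ArchiveBot's real logic.

    A candidate URL is "under" the start URL if it has the same scheme and
    host, and its path lies inside the directory containing the start page.
    """
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)
    # Different scheme or host: out of scope (hostnames compared case-insensitively).
    if (start.scheme, start.netloc.lower()) != (cand.scheme, cand.netloc.lower()):
        return False
    # Directory of the start page, always ending in "/" so "/blog/" does not match "/blogging".
    base = start.path.rsplit("/", 1)[0] + "/"
    return (cand.path or "/").startswith(base)


print(is_under("https://example.com/blog/", "https://example.com/blog/2009/post.html"))
print(is_under("https://example.com/blog/", "https://example.com/about"))
```

A real crawler would also need rules for query strings, redirects, and page requisites such as images hosted elsewhere; the sketch only shows the basic prefix idea.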
In this section, we begin the specification of HTML 4, starting with the
contract between authors, documents, users, and user agents.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document
are to be interpreted as described in [RFC2119]. However, for
readability, these words do not appear in all uppercase letters in this
specification.
At times, the authors of this specification recommend good practice for
authors and user agents. These recommendations are not normative and
conformance with this specification does not depend on their realization. These
recommendations contain the expression "We recommend ...", "This specification
recommends ...", or some similar wording.
An author is a person or program that writes or generates HTML documents.
An authoring tool is a special case of
an author, namely, it's a program that generates HTML.
We recommend that authors write documents that conform to the strict DTD rather than the other DTDs defined by this
specification. Please see the section on version information for details about the
DTDs defined in HTML 4.
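As an illustration of how a tool might check which DTD a document claims to follow, the sketch below reads the document type declaration with Python's standard html.parser module. It assumes HTML 4.01 formal public identifiers; the names FPI_TO_DTD, DoctypeSniffer, and declared_dtd are hypothetical, and the check only looks at the declaration, not at whether the document actually validates.

```python
import re
from html.parser import HTMLParser

# Assumed HTML 4.01 public identifiers for the three DTDs.
FPI_TO_DTD = {
    "-//W3C//DTD HTML 4.01//EN": "strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "frameset",
}


class DoctypeSniffer(HTMLParser):
    """Records the public identifier of the first DOCTYPE declaration."""

    def __init__(self):
        super().__init__()
        self.fpi = None

    def handle_decl(self, decl):
        # decl looks like: DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "..."
        match = re.match(r'DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]*)"', decl, re.IGNORECASE)
        if match and self.fpi is None:
            self.fpi = match.group(1)


def declared_dtd(markup):
    """Return "strict", "transitional", "frameset", or None if unrecognized."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return FPI_TO_DTD.get(sniffer.fpi)


doc = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
       '"http://www.w3.org/TR/html4/strict.dtd"><html></html>')
print(declared_dtd(doc))
```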
An HTML user agent is any device that interprets HTML documents. User
agents include visual browsers (text-only and graphical), non-visual browsers
(audio, Braille), search robots, proxies, etc.
A conforming user agent for HTML
4 is one that observes the mandatory conditions ("must") set forth in this
specification, including the following points:
A user agent should avoid imposing arbitrary length limits on attribute
value literals (see the section on capacities in the SGML Declaration). For introductory information on
SGML attributes, please consult the section on attribute definitions.
A user agent must ensure that rendering is unchanged by the presence or
absence of start tags and end tags when the HTML DTD indicates that these are
optional. See the section on
element definitions for introductory information on SGML elements.
For reasons of backwards compatibility, we recommend that tools
interpreting HTML 4 continue to support HTML 3.2 (see [HTML32]) and HTML
2.0 (see [RFC1866]).
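The requirement that rendering be unchanged by the presence or absence of optional start and end tags can be illustrated with the P element, whose end tag is optional in the HTML 4 DTD. The following is a deliberately simplified sketch using Python's html.parser: it handles only P, treating a new P start tag (or the end of input) as implicitly closing an open paragraph. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each P element, treating the P end tag as optional.

    Simplified sketch: only P is handled; a new <p> start tag or the end of
    input implicitly closes any paragraph that is still open.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text parts of the open paragraph, or None

    def _close_paragraph(self):
        if self._current is not None:
            self.paragraphs.append("".join(self._current).strip())
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close_paragraph()  # implied </p> before a new paragraph
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()          # flush any buffered text first
        self._close_paragraph()  # end of input closes an open paragraph


def paragraphs(markup):
    parser = ParagraphCollector()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs


explicit = "<p>First paragraph.</p><p>Second paragraph.</p>"
implied = "<p>First paragraph.<p>Second paragraph."
print(paragraphs(explicit) == paragraphs(implied))
```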
Error conditions
This specification does not define how conforming user agents handle
general error conditions,
including how user agents behave when they encounter elements, attributes,
attribute values, or entities not specified in this document.
A deprecated element or attribute is one that has been outdated by newer
constructs. Deprecated elements are defined in the reference manual in
appropriate locations, but are clearly marked as deprecated. Deprecated
elements may become obsolete in future versions of HTML.
User agents should continue to support deprecated
elements for reasons of backward compatibility.
Definitions of elements and attributes clearly indicate which are
deprecated.
This specification includes examples that illustrate how to avoid using
deprecated elements. In most cases these depend on user agent support for style
sheets. In general, authors should use style sheets to achieve stylistic and
formatting effects rather than HTML presentational attributes. HTML
presentational attributes have been deprecated when style sheet alternatives
exist (see, for example, [CSS1]).
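As a rough illustration of replacing presentational attributes with style sheet rules, the sketch below moves a few deprecated attributes into a CSS declaration string. The mapping table is a hypothetical, intentionally incomplete assumption (for instance, it ignores that align means different things on different elements), and to_style is an illustrative name, not part of any standard API.

```python
# Hypothetical, deliberately incomplete mapping from a few deprecated
# HTML 4 presentational attributes to equivalent CSS properties.
PRESENTATIONAL_TO_CSS = {
    "align": "text-align",          # e.g. on P, DIV, H1-H6
    "bgcolor": "background-color",  # e.g. on BODY, TABLE, TD
    "color": "color",               # e.g. on FONT
}


def to_style(attrs):
    """Split attrs into (remaining attributes, CSS declaration string)."""
    kept = {}
    declarations = []
    for name, value in attrs.items():
        prop = PRESENTATIONAL_TO_CSS.get(name.lower())
        if prop is None:
            kept[name] = value  # not presentational: keep the attribute as-is
        else:
            declarations.append(f"{prop}: {value}")
    return kept, "; ".join(declarations)


print(to_style({"align": "center", "id": "intro"}))
```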
An obsolete element or attribute is one for which there is no guarantee of
support by a user agent. Obsolete elements are no longer
defined in the specification, but are listed for historical purposes in the changes section of the reference manual.
Please consult the section on HTML
version information for details about when to use the strict, transitional,
or frameset DTD.
Comments appearing in the
HTML 4 DTD have no normative value; they are informative only.
User agents must not render SGML processing instructions (e.g., <?full
volume>) or comments. For more information about this and other
SGML features that may be legal in HTML but aren't widely supported by HTML
user agents, please consult the section on SGML features with limited support.
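A minimal sketch of this rule using Python's html.parser: only character data is collected as rendered text, while comments and processing instructions (which the parser reports through handle_comment and handle_pi) are discarded. The names RenderedText and rendered_text are hypothetical, and real rendering involves far more than concatenating text.

```python
from html.parser import HTMLParser


class RenderedText(HTMLParser):
    """Collects character data while discarding comments and PIs."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_comment(self, data):
        pass  # comments must not be rendered

    def handle_pi(self, data):
        pass  # SGML processing instructions must not be rendered


def rendered_text(markup):
    parser = RenderedText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


print(rendered_text("<p>Hello<!-- note --><?full volume> world</p>"))
```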
HTML documents are sent over the Internet as a sequence of bytes accompanied
by encoding information (described in the section on character encodings). The structure of the
transmission, termed a message entity, is defined by
[RFC2045] and [RFC2616]. A message entity with a content type of "text/html" represents an
HTML document.
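For instance, a receiver could decide whether a message entity represents an HTML document by parsing its Content-Type header. The sketch below uses Python's standard email.message.Message, whose get_content_type() returns the media type in lower case (and defaults to text/plain when the header is missing or invalid); the function name is_html_entity is hypothetical.

```python
from email.message import Message


def is_html_entity(content_type_header):
    """True if the Content-Type header value denotes text/html."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() lower-cases the type and ignores parameters such as charset.
    return msg.get_content_type() == "text/html"


print(is_html_entity("text/html; charset=UTF-8"))
```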
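The content type for HTML documents is defined as
follows:
Content type name: text
Content subtype name: html
Required parameters: none
Optional parameters: charset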
The optional parameter "charset" refers to the character encoding used to represent the
HTML document as a sequence of bytes. Legal values for this parameter are
defined in the section on character
encodings. Although this parameter is optional, we recommend that it always
be present.
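The sketch below shows why the parameter matters: with a "charset" value, a receiver can turn the byte sequence back into characters directly. It uses Python's email.message.Message, whose get_content_charset() returns the parameter in lower case or None if it is absent. Rather than guessing when the parameter is missing, this hypothetical decode_html_entity function simply raises an error, leaving any other detection mechanism to the caller.

```python
from email.message import Message


def decode_html_entity(body, content_type_header):
    """Decode an HTML message body using the charset parameter of its Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    charset = msg.get_content_charset()  # lower-cased value, or None if absent
    if charset is None:
        # No charset parameter: the encoding must be determined some other way.
        raise ValueError("no charset parameter in Content-Type")
    return body.decode(charset)


print(decode_html_entity("café".encode("utf-8"), 'text/html; charset="UTF-8"'))
```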