If you are not familiar with PDF.js, then first take a look at the README's getting started section.
Below is an overview of how to contribute code to the PDF.js project. The basic workflow is as follows:
Install Node.js and the gulp command-line interface (npm install -g gulp-cli). If you develop on Windows, read Setting up pdf.js Development Environment for Windows. For font testing you will need Python.
Before you make any changes to the code you will probably want to jump down to the "Generating reference images" section to create the reference snapshot images, so you can run the test framework and check for regressions.
If you are familiar with GitHub and creating feature branches you can skip down to the "Run lint and testing" section.
To fork the repository you need to have a GitHub account. Once you have an account you can click the fork button up top. Now that you have your fork you need to clone it (replace {username} with your GitHub username) using
git clone git://github.com/{username}/pdf.js.git
cd pdf.js
and pull additional libraries and tools:
git submodule init
git submodule update
npm install
It is useful to have the upstream repository registered as well using
git remote add upstream git://github.com/mozilla/pdf.js.git
and periodically fetch it using git fetch upstream.
We always work with feature branches. For example, to create and switch to a branch, use:
git checkout -b {branch_name} upstream/master
and replace {branch_name} with a meaningful name that describes your feature or change. For instance,
if you are working on adding support for Type3 fonts, a good branch name would be type3-font-support.
Now that you have a new branch you can edit/create/delete files. Follow the standard Git workflow to stage and locally commit your changes -- there are lots of guides that explain Git.
If the branch contains a lot of small commits, you might be asked to squash the commits. You can use Git's rebase option or follow the instructions on the Squashing Commits page.
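To illustrate what squashing does, here is a self-contained sketch in a throwaway repository. All file names and commit messages are hypothetical, and it uses git reset --soft as a non-interactive stand-in for the usual git rebase -i route, since both produce the same end state (two work-in-progress commits collapsed into one):

```shell
# Demo in a throwaway repository; names and messages are hypothetical.
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email "you@example.com" && git config user.name "You"

# An initial commit, then two small commits a reviewer might ask you to squash.
echo "base" > README && git add README && git commit -q -m "Initial commit"
echo "step 1" >  type3.js && git add type3.js && git commit -q -m "WIP: start Type3 parsing"
echo "step 2" >> type3.js && git add type3.js && git commit -q -m "WIP: fix typo"

# Collapse the two WIP commits into one: move HEAD back two commits,
# keeping their combined changes staged, then commit once.
git reset --soft HEAD~2
git commit -q -m "Add Type3 font support"

git rev-list --count HEAD   # → 2 (initial commit + one squashed commit)
```

In a real feature branch you would instead run git rebase -i upstream/master and mark the extra commits as "squash" or "fixup" in the editor that opens.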
Run lint
Make sure that your code follows our Style Guide and run from the PDF.js folder:
gulp lint
Protip: If you are a Vim user, then install Syntastic, install ESLint globally using npm install -g eslint and add the following line to your .vimrc:
let g:syntastic_javascript_checkers=['eslint']
Now you have automatic linting of your changes to JavaScript files whenever you save.
Run testing
To ensure your changes did not introduce any regressions you need to run the testing framework. There are four basic types of tests:
load test: checks if the PDF file can be loaded without crashing
eq test: a reference test that takes correctly rendered snapshots and compares them to snapshots from the current code
text test: a reference test that takes snapshots of the textLayer overlay and compares them to snapshots from the current code
annotations test: a reference test that takes snapshots of the annotationLayer overlay (and the underlying page) and compares them to snapshots from the current code
fbf test: a forward-back-forward test

Generating reference images
The reference tests require you to generate original snapshots for comparison. The snapshots should be generated before you make any changes. If you have already made some changes, git stash your work. Then make sure you have created a browser_manifest.json file. Copy the example browser manifest located in test/resources/browser_manifests to get started:
cp test/resources/browser_manifests/browser_manifest.json.example test/resources/browser_manifests/browser_manifest.json
Then edit the manifest and make sure it points to the browser(s) you want to use for generating the reference images.
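For reference, the manifest is a JSON array with one entry per browser, naming it and pointing at its executable. The entry below is a sketch; the name and path are placeholders you should adapt to your machine:

```json
[
  {
    "name": "firefox",
    "path": "/usr/bin/firefox"
  }
]
```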
Now we can generate the reference images:
gulp makeref
You can then run the test suite from the PDF.js root folder:
gulp test
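The reference tests described above are driven by entries in test/test_manifest.json. As a hedged sketch of the shape of one eq-test entry (the id, file, and md5 values here are illustrative placeholders, not real entries):

```json
{
  "id": "example-eq",
  "file": "pdfs/example.pdf",
  "md5": "…",
  "rounds": 1,
  "type": "eq"
}
```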
Running unit tests separately
Unit tests are run when gulp test is run, but they can also be run separately in two different ways:
Open the {url-to-pdf.js}/test/unit/unit_test.html page in a browser. If the web server is started using the gulp server command, the URL will be http://localhost:8888/test/unit/unit_test.html.
gulp unittest will run all the tests using the regression test framework.

After lint and all tests pass, push the changes to your fork/branch on GitHub:
git push origin {branch_name}
Create a pull request on GitHub for your feature branch. The code will then be reviewed and tested further by our contributors and test bot.
Note that the translations for PDF.js in the l10n folder are synchronized with the Aurora branch of Mozilla Firefox. This means that we will only accept pull requests that add strings currently missing in the Aurora branch (as it will take at least six weeks before the most recent translations are in the Aurora branch), but keep in mind that the changes will be overwritten when we synchronize again.
In addition to the GitHub pull request workflow, it is highly recommended that you communicate with the PDF.js team, for example via the #pdfjs IRC channel at irc.mozilla.org. That will help to find a reviewer for your patch and speed up the review process. The reviewer will kick off further testing and do a code review.
You can speed up fetching a remote GitHub branch (possibly belonging to another user) using git try {username}:{branch_name}. Add the following to the .git/config file to be able to do that:
[alias]
try = !sh -c 'IFS=\":\" read -ra ARGS <<< \"$0\" && git fetch https://github.com/${ARGS[0]}/pdf.js.git ${ARGS[1]} && git checkout FETCH_HEAD'
If all goes well, a collaborator will merge your changes into the main repository.