dl.google.com: Powered by Go
26 July 2013
Brad Fitzpatrick
Gopher, Google

dl.google.com serves Google's downloads, including the Linux package repositories fetched by, e.g.:

$ apt-get update
each "payload" (~URL) described by a protobuf:
Moving bytes from a source to a destination is a single call:

n, err := io.Copy(dst, src)
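As a minimal, self-contained illustration (not from the talk) of what that one line does:

package main

import (
    "io"
    "log"
    "os"
    "strings"
)

func main() {
    src := strings.NewReader("hello from io.Copy\n")
    // io.Copy shovels bytes from src to os.Stdout until EOF and
    // reports how many bytes it moved.
    n, err := io.Copy(os.Stdout, src)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("copied %d bytes", n)
}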
First step: rewrite the payload_server, not the payload_fetcher (the payload_fetcher is still running at this point). A later step eliminates the payload_fetcher entirely; fast start-up time.

The server is built on the standard net/http package. A minimal "Hello, world" HTTP server:

package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(os.Stdout, "%s details: %+v\n", r.URL.Path, r)
    fmt.Fprintf(w, "Hello, world! at %s\n", r.URL.Path)
}

func main() {
    log.Printf("Running...")
    log.Fatal(http.ListenAndServe("127.0.0.1:8080", http.HandlerFunc(handler)))
}
Serving a directory tree is just http.FileServer:

package main

import (
    "log"
    "net/http"
    "os"
    "path/filepath"
)

func main() {
    log.Printf("Running...")
    log.Fatal(http.ListenAndServe(
        "127.0.0.1:8080",
        http.FileServer(http.Dir(
            filepath.Join(os.Getenv("HOME"), "go", "doc")))))
}
Range requests come for free:

$ curl -H "Range: bytes=5-" http://localhost:8080

And it's not just files: http.ServeContent serves any io.ReadSeeker, handling Range and conditional requests:

package main

import (
    "log"
    "net/http"
    "strings"
    "time"
)

func main() {
    log.Printf("Running...")
    err := http.ListenAndServe("127.0.0.1:8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        http.ServeContent(w, r, "foo.txt", time.Now(),
            strings.NewReader("I am some content.\n"))
    }))
    log.Fatal(err)
}
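As a quick check of that behavior (this snippet is not from the talk; it uses net/http/httptest), the Range request above should yield a 206 Partial Content with just the requested bytes:

package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "strings"
    "time"
)

func main() {
    h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        http.ServeContent(w, r, "foo.txt", time.Now(),
            strings.NewReader("I am some content.\n"))
    })

    // Issue the same Range request as the curl example, in-process.
    req := httptest.NewRequest("GET", "http://example.com/", nil)
    req.Header.Set("Range", "bytes=5-")
    rec := httptest.NewRecorder()
    h.ServeHTTP(rec, req)

    fmt.Println(rec.Code)                          // expect 206
    fmt.Println(rec.Header().Get("Content-Range")) // expect "bytes 5-18/19"
    fmt.Print(rec.Body.String())                   // expect "some content.\n"
}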
For caching, dl.google.com uses groupcache. First, declare who you are and who your peers are.
me := "http://10.0.0.1" peers := groupcache.NewHTTPPool(me) // Whenever peers change: peers.Set("http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3")
This peer interface is pluggable. (e.g. inside Google it's automatic.)
Declare a group. (group of keys, shared between group of peers)
var thumbNails = groupcache.NewGroup("thumbnail", 64<<20, groupcache.GetterFunc(
    func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
        fileName := key
        dest.SetBytes(generateThumbnail(fileName))
        return nil
    }))
Request keys
var data []byte
err := thumbNails.Get(ctx, "big-file.jpg",
    groupcache.AllocatingByteSliceSink(&data))
// ...
http.ServeContent(w, r, "big-file-thumb.jpg", modTime, bytes.NewReader(data))
Payloads are served by composing ReaderAts:

// A SizeReaderAt is a ReaderAt with a Size method.
//
// An io.SectionReader implements SizeReaderAt.
type SizeReaderAt interface {
    Size() int64
    io.ReaderAt
}

// NewMultiReaderAt is like io.MultiReader but produces a ReaderAt
// (and Size), instead of just a reader.
func NewMultiReaderAt(parts ...SizeReaderAt) SizeReaderAt {
    m := &multi{
        parts: make([]offsetAndSource, 0, len(parts)),
    }
    var off int64
    for _, p := range parts {
        m.parts = append(m.parts, offsetAndSource{off, p})
        off += p.Size()
    }
    m.size = off
    return m
}
// NewChunkAlignedReaderAt returns a ReaderAt wrapper that is backed
// by a ReaderAt r of size totalSize where the wrapper guarantees that
// all ReadAt calls are aligned to chunkSize boundaries and of size
// chunkSize (except for the final chunk, which may be shorter).
//
// A chunk-aligned reader is good for caching, letting upper layers have
// any access pattern, but guarantees that the wrapped ReaderAt sees
// only nicely-cacheable access patterns & sizes.
func NewChunkAlignedReaderAt(r SizeReaderAt, chunkSize int) SizeReaderAt {
    // ...
}
r only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless it's the final chunk).
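The body of NewChunkAlignedReaderAt is elided above. As a sketch only (this is not the dl.google.com implementation, and it omits the caching layer that makes the alignment worthwhile), a wrapper meeting that contract could look like this, reusing the SizeReaderAt interface defined above:

// chunkAligned is a hypothetical sketch, not the production code.
// It turns arbitrary ReadAt calls into whole-chunk, chunk-aligned reads
// on the underlying SizeReaderAt.
type chunkAligned struct {
    r         SizeReaderAt
    chunkSize int64
}

// newChunkAlignedSketch mirrors the NewChunkAlignedReaderAt signature above.
func newChunkAlignedSketch(r SizeReaderAt, chunkSize int) SizeReaderAt {
    return &chunkAligned{r: r, chunkSize: int64(chunkSize)}
}

func (c *chunkAligned) Size() int64 { return c.r.Size() }

func (c *chunkAligned) ReadAt(p []byte, off int64) (n int, err error) {
    if off < 0 || off >= c.Size() {
        return 0, io.EOF
    }
    for len(p) > 0 && off < c.Size() {
        // Identify the chunk containing off; the final chunk may be short.
        chunkStart := off - off%c.chunkSize
        chunkLen := c.chunkSize
        if chunkStart+chunkLen > c.Size() {
            chunkLen = c.Size() - chunkStart
        }
        // Read the whole chunk: this is the aligned, full-size access
        // that the wrapper guarantees the underlying reader will see.
        chunk := make([]byte, chunkLen)
        rn, rerr := c.r.ReadAt(chunk, chunkStart)
        if int64(rn) < chunkLen {
            if rerr == nil {
                rerr = io.ErrUnexpectedEOF
            }
            return n, rerr
        }
        // Copy out only the bytes the caller asked for.
        copied := copy(p, chunk[off-chunkStart:])
        n += copied
        p = p[copied:]
        off += int64(copied)
    }
    if len(p) > 0 {
        err = io.EOF
    }
    return n, err
}

The point of the alignment is that those whole-chunk reads are exactly the shape a chunk cache such as groupcache wants to see.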
The rest of the source file behind the NewMultiReaderAt snippet follows: the HTTP handler that serves a concatenation of readers via ServeContent, and the multi type whose ReadAt does the real work.

package main
import (
"io"
"log"
"net/http"
"sort"
"strings"
"time"
)
var modTime = time.Unix(1374708739, 0)
func part(s string) SizeReaderAt {
    return io.NewSectionReader(strings.NewReader(s), 0, int64(len(s)))
}

func handler(w http.ResponseWriter, r *http.Request) {
    sra := NewMultiReaderAt(
        part("Hello, "),
        part(" world! "),
        part("You requested "+r.URL.Path+"\n"),
    )
    rs := io.NewSectionReader(sra, 0, sra.Size())
    http.ServeContent(w, r, "foo.txt", modTime, rs)
}
func main() {
log.Printf("Running...")
http.HandleFunc("/", handler)
log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
type offsetAndSource struct {
off int64
SizeReaderAt
}
type multi struct {
parts []offsetAndSource
size int64
}
func (m *multi) Size() int64 { return m.size }
func (m *multi) ReadAt(p []byte, off int64) (n int, err error) {
wantN := len(p)
// Skip past the requested offset.
skipParts := sort.Search(len(m.parts), func(i int) bool {
// This function returns whether parts[i] will
// contribute any bytes to our output.
part := m.parts[i]
return part.off+part.Size() > off
})
parts := m.parts[skipParts:]
// How far to skip in the first part.
needSkip := off
if len(parts) > 0 {
needSkip -= parts[0].off
}
for len(parts) > 0 && len(p) > 0 {
readP := p
partSize := parts[0].Size()
if int64(len(readP)) > partSize-needSkip {
readP = readP[:partSize-needSkip]
}
pn, err0 := parts[0].ReadAt(readP, needSkip)
if err0 != nil {
return n, err0
}
n += pn
p = p[pn:]
if int64(pn)+needSkip == partSize {
parts = parts[1:]
}
needSkip = 0
}
if n != wantN {
err = io.ErrUnexpectedEOF
}
return
}
In the end: just the payload_server, no payload_fetcher.

groupcache is now open source (github.com/golang/groupcache).

Brad Fitzpatrick
Gopher, Google