The Internet Archive discovers and captures web pages through many different web crawls.
At any given time several distinct crawls are running; some run for months, and others run every day or at longer intervals.
View the web archive through the Wayback Machine.
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, since the URLs for these resources were added into queues that quickly grew bigger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so looking at results by country will be somewhat skewed.
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available “warts and all” for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.
The Wayback Machine - https://web.archive.org/web/20110717011341/http://www.chicagosistercities.com/cities/delhi.php
Committee Chair: Smita Shah
Committee Vice-Chair: Niranjan Shah
ABOUT DELHI – Fast Facts
Chief Minister of Delhi: Sheila Dikshit
Country Location: Southern Asia
Country Population: Over 1 billion
City Population: Over 15 million
Time Zone: IST (UTC+5:30)
Geography: Northern India. It borders the Indian state of Uttar Pradesh on the south and Haryana on the west. Delhi can be divided into three major geographical regions: the Yamuna flood plain, the Delhi ridge and the Gangetic Plains. The Yamuna is the only main river flowing through Delhi. Most of the city, including New Delhi, lies west of the river. East of the river is the urban area of Shahdara.
Demographics: Delhi is a cosmopolitan city due to the multi-ethnic and multi-cultural presence of the Indian bureaucracy, political system and expanding economy. There are more than 160 embassies and an ever-growing population. In 2003 the National Capital Territory of Delhi had a population of 14.1 million people, making Delhi the second largest metropolitan area in India after Mumbai.
Climate: Delhi has a semi-arid climate with a high variation between summer and winter temperatures. Summers are long, from early April until October, with the monsoon season in between. Winter starts in November and peaks in January. The city has a pleasant climate from February to April and from August to November.
History: The area has been settled for 2,500 years. Since the 12th century, Delhi has seen the rise and fall of 7 major powers. The British captured Delhi in 1803 and made it the capital in 1911. Since Independence in 1947, Delhi has prospered as the capital of India. In the past decade, its population has increased by 50%, largely due to rapid economic expansion and increased job opportunities.
Language: Hindi is the principal language that is spoken and written. The secondary spoken language is English. India hosts a multitude of dialects, and most people commonly speak more than one language.
Did You Know?
The Chicago Art Institute happens to be the venue where the famous Hindu monk Swami Vivekananda addressed the Parliament of the World’s Religions in 1893. On September 11, 1995, the Art Institute put up a bronze plaque to commemorate Swami Vivekananda’s historic address. The plaque reads: “On this site between September 11 and 27, 1893, Swami Vivekananda (1863-1902), the first Hindu monk from India to teach Vedanta in America, addressed the World’s Parliament of Religions, held in conjunction with the World’s Columbian Exposition. His unprecedented success opened the way for the dialogue between eastern and western religions.” On November 11, 1995, the stretch of Michigan Avenue that passes in front of the Art Institute was formally conferred the honorary name “Swami Vivekananda Way.”
PAST DELHI – CHICAGO PROGRAMS
2010
September 20
Focus: Business, Government
On the morning of September 20th, the Chicago Council on Global Affairs and the Confederation of Indian Industry, in collaboration with the U.S. Commercial Service - U.S. Department of Commerce, Illinois Department of Commerce and Economic Opportunity, the Delhi Committee of Chicago Sister Cities International, and the Consulate General of India in Chicago hosted the 2nd Annual U.S. – India Business Opportunities Summit at the Swissotel Chicago. The keynote address was made by His Excellency Anand Sharma, Minister of Commerce and Industry, Government of India, and the program included two panel discussions with U.S. and Indian business executives that explored manufacturing and R & D collaborations. A networking luncheon followed the program.
Though a few thousand Indians congregated on the West Coast by the early part of the twentieth century, the first major influx of Indians into Chicago awaited the arrival of graduate students and professionals eligible under the Immigration and Nationality Act of 1965. As with many other immigrant groups, the men arrived first, followed some years later by their families. The Indian population has grown steadily, though the increase owes less to the arrival of new professionals and more to the extended family system prevalent in India. By the end of the twentieth century, Chicago had the third-largest concentration of Indians in the United States. The 1980 census recorded 33,541 Indians in the Chicago metropolitan region; in 2000, the number had grown to 125,208. Many are professionals, particularly prominent in the sciences, medicine, the computer industry, and management. The number of Indian students at universities remains large, but a working-class population is also emerging. As in other large cities, Indians are visible as taxi drivers, shopkeepers, and gas station owners.
When: Thursday, March 17, 2011
Where: The Chicago Club
81 East Van Buren Street
Chicago, IL 60605
Tickets: Members $35, Nonmembers $45
Time
7:30 a.m.: Registration and Continental Breakfast
8:00 a.m.: Remarks and Q & A
9:15 a.m.: Adjournment
Featuring
Rik Geiersbach, Vice President, Corporate Strategy, The Boeing Company
Varun Bajpai, Chief Executive Officer, SBI Macquarie Infrastructure Management
Rajat Gupta, Senior Partner and Leader, Infrastructure and Power Practice, [...]
The 6th Annual Chicago Sister Cities International Festival will transform Daley Plaza into an international village filled with food, music, dance and merchants, August…