Next Generation Cybertools
As part of the NSF's Cyberinfrastructure initiative, Cornell University has a major grant to carry out social science research on very large semi-structured datasets.
The starting point for this research is the Web. The flood of available on-line information – from Web pages to chat logs – has the potential to open up new frontiers in social science research on the collective behavior of individuals. However, there are significant obstacles to realizing this opportunity. The project team, composed of experts from the social sciences and computer science, was drawn together by the enormous promise of a unique and largely untapped dataset: the Internet Archive's collection of 40 billion Web pages. These snapshots of the Web have been captured and archived about every two months for nearly ten years. Large portions of the data are being moved to Cornell’s supercomputing center.
The research program has two major components:
- Social and Information Networks. The study of social and information networks is a topic of broad research interest at Cornell. In this program, sociologists and computer scientists are working together to develop next generation cybertools to use historical Web data to study the diffusion of ideas across time and space.
- The Web Lab. The Web Lab is a joint project of Cornell University and the Internet Archive to provide data and computing tools for research about the Web and the information on the Web. The data is provided from the Web collections of the Internet Archive and the computing facilities are based at the Cornell Theory Center.
The Web Lab supports two groups: researchers in computer science, the social sciences, and the humanities whose interests lie in the information on the Web, and computer scientists who carry out research on the Web itself as an information structure. Although based at Cornell, the collection is designed for use by researchers from other universities and research centers.
Further information
- The NSF's Next Generation Cybertools program
- NSF proposal: Michael Macy, William Arms, Daniel Huttenlocher, Jon Kleinberg, David Strang, Very Large Semi-Structured Datasets for Social Science Research, 2005. (Summary, Research Description, References)
- Cornell University press release
Acknowledgments
This is an NSF Next Generation Cybertools project, grant number SES-0537606. Additional support for the Web Library comes from NSF grants CNS-0403340 and DUE-0127308, Unisys, Microsoft and Dell, and from Cornell University.
This work would not be possible without the forethought and longstanding commitment of the Internet Archive to capture and preserve the content of the Web for future generations.