Overview
Using this web site, you can download the Sloan Digital Sky
Survey (SDSS) data if you have access to a high-speed
wide area network. For example, if your organization is
connected to National LambdaRail or
Internet2's Abilene Network,
you should be able to download the entire SDSS BESTDR5 catalog data
set in less than five hours.
In general, it can be quite challenging to use the available bandwidth
of a high-performance wide area network effectively. This project uses the
UDP-based Data Transfer Protocol (UDT),
which was developed by the National Center for Data Mining
(NCDM) at the University
of Illinois at Chicago to make effective use of the bandwidth available on
high-performance wide area networks.
The project is supported by the National Science Foundation
through the grant SCI II: The TeraFlow Project: High Performance
Flows for Mining Large Distributed Data Archives, Award
SCI-0430781.
Sloan Digital Sky Survey (SDSS)
The SDSS is systematically mapping a
quarter of the entire sky, producing a detailed image of it, and
determining the positions and absolute brightness of more than
100 million celestial objects. It is also measuring the distances
to a million of the nearest galaxies, giving us a
three-dimensional picture of the universe through a volume one
hundred times larger than that explored to date. SDSS is also
recording the distances to 100,000 quasars — the most
distant objects known — giving us unprecedented knowledge of
the distribution of matter to the edge of the visible
universe.
The SDSS completed its first phase of operations, SDSS-I, in June 2005. Over the course of five years, SDSS-I imaged more than 8,000 square degrees of the sky in five band passes, detecting nearly 200 million celestial objects, and measured spectra of more than 675,000 galaxies, 90,000 quasars, and 185,000 stars. These data have supported studies ranging from asteroids and nearby stars to the large-scale structure of the Universe.
The most recent data product is DR6, which was released in June 2007.
For more information about the project, see the SDSS web site, www.sdss.org.
Sector
In the spring of 2005, we began developing Sector, a new P2P distributed
storage system based on UDT. Sector currently runs on the Teraflow Network,
a high-performance, 10 Gb/s wide area testbed that we operate with NSF support.
Several nodes on the testbed each store either the entire SDSS DR6 catalog
or a portion of it.
The downloadable Sector client automatically selects the node or nodes
nearest to you to serve the requested files.
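The replica selection described above can be illustrated with a minimal Python sketch. It assumes the client already knows which nodes hold each file and has measured a round-trip time (RTT) to each reachable node; treating the lowest RTT as "nearest" is an assumption made for this illustration. The replica map, the node and file names, and the `nearest_replica` function are all hypothetical and are not part of the Sector API.

```python
from typing import Dict, List, Optional

# Hypothetical replica map: which testbed nodes hold a copy of each file.
# Node and file names are illustrative, not actual Teraflow Network hosts.
REPLICAS: Dict[str, List[str]] = {
    "catalog/part-000.dat": ["node-a", "node-b"],
    "catalog/part-001.dat": ["node-b", "node-c"],
}


def nearest_replica(path: str,
                    replicas: Dict[str, List[str]],
                    rtt_ms: Dict[str, float]) -> Optional[str]:
    """Return the node holding `path` with the lowest measured RTT.

    Nodes absent from `rtt_ms` are treated as unreachable (for example,
    dropped from the network), so another replica is chosen instead.
    Returns None if no reachable node holds the file.
    """
    candidates = [node for node in replicas.get(path, []) if node in rtt_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda node: rtt_ms[node])


if __name__ == "__main__":
    # Hypothetical RTT measurements, in milliseconds.
    rtt = {"node-a": 2.0, "node-b": 55.0, "node-c": 20.0}
    print(nearest_replica("catalog/part-000.dat", REPLICAS, rtt))  # node-a
    print(nearest_replica("catalog/part-001.dat", REPLICAS, rtt))  # node-c
```

A node that has been dropped from the network simply disappears from `rtt_ms`, so the next-nearest replica is selected; this echoes the robustness argument for P2P file systems.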
We chose to use a P2P distributed file system for the
following reasons. First, data sets these days are often so large that
it is difficult to store them on a single node, and therefore it is
convenient to distribute them across several nodes. Second, you can
usually achieve higher performance by retrieving files from nodes that
are closer to you. Sector automatically retrieves the requested data
from the required node or nodes. Finally, P2P distributed file
systems are more robust than traditional file systems in the sense
that nodes can easily be added or dropped without affecting the
availability of the data.
The core of Sector is a distributed file system built on top of
a P2P routing infrastructure. The client for downloading the SDSS
data is a specific application built on the Sector API. You can
download other data stored on the Teraflow Network simply by
providing the appropriate list of files.
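As a hypothetical illustration of how a client might turn such a file list into transfers, the sketch below parses a plain-text list (one path per line, with blank lines and `#` comments ignored; this format is an assumption) and groups the files by the nearest reachable node that holds them. The replica map, the RTT table, and the function names are invented for the example and do not describe the actual Sector API or file-list format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_file_list(text: str) -> List[str]:
    """Parse a hypothetical file list: one path per line.

    Blank lines and lines starting with '#' are skipped.
    """
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def plan_downloads(paths: List[str],
                   replicas: Dict[str, List[str]],
                   rtt_ms: Dict[str, float]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group requested files by the lowest-RTT reachable node holding them.

    Returns (plan, missing): plan maps node -> files to fetch from it;
    missing lists files that no reachable node holds.
    """
    plan: Dict[str, List[str]] = defaultdict(list)
    missing: List[str] = []
    for path in paths:
        nodes = [node for node in replicas.get(path, []) if node in rtt_ms]
        if nodes:
            plan[min(nodes, key=rtt_ms.__getitem__)].append(path)
        else:
            missing.append(path)
    return dict(plan), missing


if __name__ == "__main__":
    listing = "\n".join([
        "# SDSS subset (example)",
        "catalog/part-000.dat",
        "",
        "catalog/part-001.dat",
        "other/x.dat",
    ])
    replicas = {
        "catalog/part-000.dat": ["node-a", "node-b"],
        "catalog/part-001.dat": ["node-b", "node-c"],
    }
    rtt = {"node-a": 2.0, "node-b": 55.0, "node-c": 20.0}
    plan, missing = plan_downloads(parse_file_list(listing), replicas, rtt)
    print(plan)     # {'node-a': ['catalog/part-000.dat'], 'node-c': ['catalog/part-001.dat']}
    print(missing)  # ['other/x.dat']
```

Grouping by node is one plausible design choice: it lets a client open a single transfer session per node rather than one per file, and it reports any files that no reachable node currently holds.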
©2006 National Center for Data Mining. Last updated on Wednesday August 15, 2007 12:01 AM.