Software
Over the last several years, we have developed a set of related software tools for bulk data transfer over high-speed wide area networks.
Sector
In the spring of 2005, we began developing Sector, a new P2P distributed
storage system built on UDT. Sector currently runs on the Teraflow Testbed,
a high-performance, NSF-supported, 10 Gb/s wide area testbed that we operate.
Various nodes of the testbed each store either the entire SDSS DR6
or a portion of it.
The downloadable P2P Sector software automatically selects
the node or nodes nearest to you to provide the requested files.
We chose to use a P2P distributed file system for the
following reasons. First, data sets these days are often so large that
it is difficult to store them on a single node, and therefore it is
convenient to distribute them across several nodes. Second, you can
usually achieve higher performance by retrieving files from nodes that
are closer to you. Sector automatically retrieves the requested data
from the required node or nodes. Finally, P2P distributed file
systems are more robust than traditional file systems in the sense
that nodes can easily be added or dropped without affecting the
availability of the data.
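The node selection described above can be illustrated with a minimal sketch. It is not Sector's implementation: the Node type, the pick_nearest and plan_download functions, and the use of measured round-trip time as the meaning of "nearest" are all assumptions made for illustration, and the host names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical storage node: an address plus a measured round-trip time."""
    address: str
    rtt_ms: float  # assumption: "nearest" is approximated by lowest RTT


def pick_nearest(replicas, count=1):
    """Return up to `count` nodes, ordered from lowest to highest RTT."""
    return sorted(replicas, key=lambda node: node.rtt_ms)[:count]


def plan_download(file_locations, count=1):
    """Map each requested file to the nearest node(s) holding a copy.

    `file_locations` maps a file name to the list of nodes that store it.
    A file with no known replica maps to an empty list.
    """
    return {
        name: pick_nearest(nodes, count)
        for name, nodes in file_locations.items()
    }


if __name__ == "__main__":
    near = Node("node-1.example.org", 3.0)
    far = Node("node-2.example.org", 40.0)
    # Each file is fetched from whichever replica has the lowest RTT.
    for name, nodes in plan_download({"file-a": [far, near]}).items():
        print(name, [node.address for node in nodes])
```

Because each file is planned independently, a node can be added or dropped simply by changing the replica lists; files that still have at least one replica remain available.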
The core of Sector is a distributed file system built on top of
a P2P routing infrastructure. The client for downloading the SDSS
data is a specific application that uses the Sector API. You can
download other data stored on the Teraflow Testbed by simply
providing the appropriate list of files.
UDT
UDT is an application-level data transport protocol designed
for emerging applications that require the transfer of large
amounts of data distributed over high-speed wide area networks
(e.g., 1 Gb/s or above). UDT uses UDP to transfer data but, unlike
plain UDP, it has its own reliability control and congestion
control mechanisms. UDT is intended not only for private or QoS-enabled
links, but also for shared networks. Furthermore, the current
version of UDT (version 3.0) is designed using a composable
framework that supports multiple congestion control
algorithms.
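To illustrate what a framework supporting multiple congestion control algorithms might look like, here is a minimal sketch of a pluggable design. It is not UDT's actual API or algorithm: the CongestionControl, Aimd, FixedWindow, and Transport names are hypothetical, and the additive-increase, multiplicative-decrease rule is a textbook scheme used only as an example plug-in.

```python
from abc import ABC, abstractmethod


class CongestionControl(ABC):
    """Hypothetical plug-in interface: the transport calls these hooks."""

    def __init__(self, initial_window=16.0):
        self.window = initial_window  # packets allowed in flight

    @abstractmethod
    def on_ack(self):
        """Called when an acknowledgement arrives."""

    @abstractmethod
    def on_loss(self):
        """Called when packet loss is detected."""


class Aimd(CongestionControl):
    """Additive increase, multiplicative decrease (textbook example)."""

    def on_ack(self):
        self.window += 1.0

    def on_loss(self):
        # Halve the window, but never go below one packet.
        self.window = max(1.0, self.window / 2.0)


class FixedWindow(CongestionControl):
    """Keeps the window constant, e.g. for a dedicated link."""

    def on_ack(self):
        pass

    def on_loss(self):
        pass


class Transport:
    """Toy transport that delegates all window decisions to a controller."""

    def __init__(self, controller):
        self.controller = controller

    def handle_event(self, event):
        """Dispatch an "ack" or "loss" event and return the new window."""
        if event == "ack":
            self.controller.on_ack()
        elif event == "loss":
            self.controller.on_loss()
        else:
            raise ValueError(f"unknown event: {event!r}")
        return self.controller.window
```

The transport code never inspects which algorithm is in use, so a different algorithm for a private link versus a shared network can be swapped in by passing a different controller object.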
For more information about UDT, please visit udt.sf.net.
UDT-Gateway
For many end users, it is easier to use a file transfer utility
employing TCP, or a web application employing HTTP and TCP, rather
than to use UDT directly. To support this requirement, we
developed the UDT-Gateway utility. To the user, it appears they
are accessing data using a TCP-based application on the gateway
machine, but, in fact, the data resides on a data server that is
connected to the gateway machine using a high performance network
and UDT. The data server can serve multiple gateway machines.
Specifically, the UDT-Gateway behaves exactly like an HTTP file
server and serves files to clients over ordinary HTTP/TCP
channels. However, the gateway server does not host the files it
serves locally. When a request arrives for a file, the file is
streamed from a central repository via UDT, then streamed to the
end consumer via TCP. In other words, the gateway machine allows
the user to access large data sets using UDT and high performance
networks for all except the “last mile,” which is
handled using more standard networks and TCP.
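The streaming behavior described above can be sketched as follows. This is not the UDT-Gateway source: relay and make_handler are hypothetical names, and open_upstream is an assumed caller-supplied function that returns a readable binary stream for a requested path. In a real gateway that stream would be backed by a UDT connection to the data server; here any file-like object stands in for it.

```python
import io
from http.server import BaseHTTPRequestHandler


def relay(upstream, downstream, chunk_size=64 * 1024):
    """Copy `upstream` to `downstream` chunk by chunk; return bytes copied.

    Only one chunk is held in memory at a time, so the gateway never
    needs to store the whole file locally.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break
        downstream.write(chunk)
        total += len(chunk)
    return total


def make_handler(open_upstream):
    """Build an HTTP handler class that serves files it does not host.

    `open_upstream(path)` is an assumption: it must return a readable
    binary stream (usable as a context manager) or raise FileNotFoundError.
    """

    class GatewayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                upstream = open_upstream(self.path)
            except FileNotFoundError:
                self.send_error(404, "File not found")
                return
            with upstream:
                self.send_response(200)
                self.send_header("Content-Type", "application/octet-stream")
                self.end_headers()
                # Stream from the repository side to the HTTP/TCP client.
                relay(upstream, self.wfile)

    return GatewayHandler


if __name__ == "__main__":
    # In-memory stand-ins for the upstream stream and the client socket.
    source = io.BytesIO(b"hello world")
    sink = io.BytesIO()
    print(relay(source, sink), "bytes relayed")
```

To try the handler, pass the class returned by make_handler to the standard library's http.server.HTTPServer together with a listening address, and supply an open_upstream function suited to your data source.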
©2006 National Center for Data Mining. Last updated on Friday June 23, 2006 11:35 PM.