Nirvana Case Studies | Particle Physics Data Grid

Bing Zhu

Particle Physics Data Grid


The Particle Physics Data Grid has two long-term objectives.

1. The delivery of an infrastructure for very widely distributed analysis of particle physics data at multi-petabyte scales by hundreds to thousands of physicists.

2. The acceleration of the development of network and middleware infrastructure aimed broadly at data-intensive collaborative science.

The program will design, develop, and deploy a network and middleware infrastructure capable of supporting data analysis and data flow patterns common to the many particle physics experiments represented among the proponents. Application-specific software will be adapted to operate in this wide-area environment and to exploit this infrastructure. The result of these collaborative efforts will be the instantiation and delivery of an operating infrastructure for distributed data analysis by participating physics experiments. While the architecture must address the needs of particle physics, its components will be designed from the outset for applicability to any discipline with large-scale distributed data needs.

Among the hypotheses to be tested:

  • That an infrastructure built upon emerging network and middleware technologies can meet the functional and performance requirements of very wide area particle physics data analysis.

  • That specific data flow patterns, including sustained bulk data transfer and distributed data access by large numbers of analysis clients, can be supported concurrently by common middleware technologies exploiting appropriate emerging network technologies and network services.

  • That an infrastructure based upon these emerging technologies can be compatible with commercial middleware technologies such as object databases, object request brokers, and common object services.

Players Involved
The Storage Resource Broker (Nirvana) and Metadata Catalog (MCAT)
Reagan Moore, Bing Zhu, Arcot Rajasekar
Argonne National Laboratory
Brookhaven National Laboratory
California Institute of Technology
Fermi National Accelerator Laboratory
Jefferson Lab
Lawrence Berkeley National Laboratory (LBNL)
Stanford Linear Accelerator Center
University of Wisconsin
University of California, San Diego

Time Line
Ongoing, funded by the US Department of Energy

Nirvana Solution
1. In collaboration with researchers from LBNL, an SRM driver was designed and developed for Nirvana. It allows users to issue the following SRM commands through Nirvana:

  • Stage a file
  • Check the staging status of a file
  • Remove a file from the staging queue
  • Remove a file from a cache area

The 'Sget' command in Nirvana was also modified to make a synchronous call to SRM: it asks SRM to stage the file and, once SRM reports that the file has been staged, transfers the file to the user. SRM was developed by researchers at LBNL.
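The stage-then-transfer behaviour described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `ToyStager`, `StageState`, and `sget` are illustration-only names, the stager is an in-memory stand-in rather than the SRM interface, and staging is simulated by counting status polls. It only shows the four staging operations and how a synchronous get can wait on them.

```python
from enum import Enum


class StageState(Enum):
    QUEUED = "queued"
    STAGED = "staged"


class ToyStager:
    """Hypothetical in-memory stand-in for a staging service (not the SRM API)."""

    def __init__(self, tape, polls_to_stage=2):
        self.tape = dict(tape)      # file name -> bytes held on slow storage
        self.cache = {}             # file name -> bytes staged to the cache area
        self.queue = {}             # file name -> polls left before it is staged
        self.polls_to_stage = polls_to_stage

    def stage(self, name):
        """Request staging; a file already in the cache is left alone."""
        if name not in self.tape:
            raise FileNotFoundError(name)
        if name not in self.cache:
            self.queue.setdefault(name, self.polls_to_stage)

    def status(self, name):
        """Report staging status; each poll advances the simulated staging."""
        if name in self.cache:
            return StageState.STAGED
        if name not in self.queue:
            raise KeyError(f"{name} was never requested")
        self.queue[name] -= 1
        if self.queue[name] <= 0:
            del self.queue[name]
            self.cache[name] = self.tape[name]
            return StageState.STAGED
        return StageState.QUEUED

    def remove_from_queue(self, name):
        """Cancel a pending staging request."""
        self.queue.pop(name, None)

    def remove_from_cache(self, name):
        """Evict a staged file from the cache area."""
        self.cache.pop(name, None)


def sget(stager, name, max_polls=10):
    """Synchronous get: request staging, wait until staged, return the bytes."""
    stager.stage(name)
    for _ in range(max_polls):
        if stager.status(name) is StageState.STAGED:
            return stager.cache[name]
    stager.remove_from_queue(name)   # give up and withdraw the request
    raise TimeoutError(f"{name} not staged after {max_polls} polls")
```

A caller would build a stager over some named files, for example `ToyStager({"run1.db": b"events"})`, and call `sget(stager, "run1.db")`; the call returns only after the simulated staging has finished, which is the synchronous behaviour the paragraph describes.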

2. The GridPortal team at SDSC successfully deployed the GSI-enabled Nirvana to upload files to, and download files from, a storage location. The system combines the ability to:

  • Read data from a user-ID under Globus remote-proxy authentication
  • Import the data into a Nirvana collection
  • Store data in a remote storage system using the Nirvana data handling system
  • Support replication and discovery of files in the collection.
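As a rough illustration of the import, store, replicate, and discover steps listed above, the following Python sketch models a logical collection in memory. `ToyCollection`, its methods, and the resource names are hypothetical and are not the Nirvana interface; the Globus authentication step is deliberately left out.

```python
class ToyCollection:
    """Hypothetical logical collection: logical name -> metadata and replicas."""

    def __init__(self):
        self.entries = {}    # logical name -> {"meta": dict, "replicas": set}
        self.storage = {}    # (resource, logical name) -> bytes

    def import_data(self, name, data, resource, **meta):
        """Register a file in the collection and store it on one resource."""
        self.storage[(resource, name)] = data
        entry = self.entries.setdefault(name, {"meta": {}, "replicas": set()})
        entry["meta"].update(meta)
        entry["replicas"].add(resource)

    def replicate(self, name, target):
        """Copy an existing replica to another resource and record it."""
        entry = self.entries[name]
        source = sorted(entry["replicas"])[0]
        self.storage[(target, name)] = self.storage[(source, name)]
        entry["replicas"].add(target)

    def discover(self, **query):
        """Return logical names whose metadata matches every query item."""
        return sorted(
            name for name, entry in self.entries.items()
            if all(entry["meta"].get(k) == v for k, v in query.items())
        )

    def get(self, name):
        """Read the file from any resource that holds a replica."""
        resource = sorted(self.entries[name]["replicas"])[0]
        return self.storage[(resource, name)]
```

The point of the sketch is the separation between the logical name a user searches for and the physical resources that hold replicas: `discover` works on metadata only, while `replicate` and `get` resolve a logical name to a storage location.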

3. The GridPortal provides a web interface to both the Globus execution environment and to data stored in the Nirvana collections. This has served as a demonstration system for proving the feasibility of web-based interfaces to Globus. The data flows that are driven in this environment are demonstrated below.

4. BaBar and Nirvana researchers have been working together to apply Nirvana technology within BaBar. An MCAT Nirvana server was set up at SLAC to support the BaBar project.

Work on the first BaBar file replication prototype using Nirvana was completed in November 2001. Nirvana was then used to query the MCAT and replicate databases from one test federation to another using bbcp. The prototype was demonstrated during SC2001 and succeeded in replicating data from the source federation to the target federation.
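The catalog-driven replication described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the prototype's code: the catalog is a plain dictionary standing in for an MCAT query, the function names are hypothetical, and `copy` is a caller-supplied callable standing in for a bulk copy tool such as bbcp.

```python
def plan_replication(catalog, source, target):
    """List databases registered under source but missing from target."""
    return sorted(catalog.get(source, set()) - catalog.get(target, set()))


def replicate(catalog, source, target, copy):
    """Copy each missing database, then register it under the target."""
    copied = []
    for db in plan_replication(catalog, source, target):
        copy(source, target, db)                   # stand-in for the bulk copy
        catalog.setdefault(target, set()).add(db)  # record the new replica
        copied.append(db)
    return copied
```

Because the plan is computed from the catalog each time, running `replicate` again after a successful pass finds nothing left to copy, so the operation can be repeated safely.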

The Nirvana data grid will be tested against the BaBar data collections to demonstrate that the required performance levels can be met for metadata registration, metadata manipulation, metadata extraction, data registration, data manipulation, and data extraction.

5. In collaboration with researchers at Jefferson Lab, Nirvana researchers are currently developing WSDL services on top of Nirvana.

The development of WSDL within Nirvana at SDSC has focused on an initial design that identifies which WSDL services to build on Nirvana. Currently, the 'put' and 'get' functions have been implemented as two WSDL services using Java and Apache SOAP; they were demonstrated at the Global Grid Forum (GGF4), February 17-20, 2002, in Toronto.

We are comparing Jefferson Lab's XML web services with Nirvana's WSDL services. The initial goal is to identify a common set of services and their arguments for future integration among Jefferson Lab's system, Nirvana, and LBNL's SRM.