Bringing Data Together: Q&A with Chaitan Baru

Chaitan Baru wears many hats, all of them related to data. He's the director of SDSC's Science R&D Division and also directs the SDSC/Calit2 Synthesis Center. He helps lead national-scale projects developing cyberinfrastructure for an array of science communities, including geoscience (GEON), ecology (NEON), hydrology (CUAHSI Hydrologic Information Systems), earthquake engineering (NEESit), and environmental engineering (CLEANER); see acronyms below. Prior to joining SDSC in 1996, Baru spent time both in industry, at IBM Santa Teresa Lab, and in academia, at the University of Michigan. Baru received his PhD and ME in Electrical Engineering from the University of Florida and his BTech in Electronics Engineering from the Indian Institute of Technology in Madras.

During Hurricane Katrina, thousands of people needed to locate missing loved ones, and you coordinated an urgent SDSC effort to develop the "missing and found" database for the Red Cross. What lessons did you learn, and what has grown out of that experience?

Baru: This was a transforming experience, and showed us in very clear terms that at SDSC we have the end-to-end cyberinfrastructure capabilities and, more importantly, the human expertise to respond immediately with a working solution that gave real, immediate help to people even while this emergency was still unfolding. We also saw the dedication and tireless commitment of our staff to helping out in this disaster, working around the clock. It was amazing how quickly our team developed the capability to "clean" data collected in the field, literally in "real time" as the disaster was taking place.

As the Red Cross saw what we could do, their reliance on us literally grew day by day. We also learned that there are simply no facilities at the state or federal level that can provide the kind of IT services we were delivering during the disaster. Based on this experience and need, and working with the American Red Cross, we've now been asked to host the website at SDSC, which is linked directly from the main Web page of the Red Cross and gives people displaced in a disaster a place to post messages so loved ones can know they're safe.

To carry this capability and experience forward, we also received an NSF grant for exploratory research on Cyberinfrastructure Preparedness for Emergency Response and Relief. And the University of California recently launched a multi-campus research effort called the California Hazards Research Institute, where I'll serve as associate director from UCSD. So SDSC is definitely playing a growing role in disaster preparedness.

You lead the IT component of the groundbreaking NSF GEON "Cyberinfrastructure for the Geosciences" project, which has become a model for other disciplines. Can you tell us some highlights of that project?

Baru: GEON is truly an equal collaboration between Earth scientists and IT researchers, and the results of our work have energized many in the Earth Sciences community as they see the value of GEON's cyberinfrastructure and "data portal," which can deliver discovery, access, and integration of distributed, heterogeneous data sets. These capabilities are changing the way science is done.

Based on these advances, a lot of other projects have been inspired to build on the cyberinfrastructure developed in GEON. For example, the Network for Earthquake Engineering Simulation (NEES) project has deployed a prototype portal based on this technology, and the Hydrologic Information System (HIS) project has extended the GEON software for their specific applications. Two recently funded NSF projects under the Cyberinfrastructure for Environmental Observatories Prototype (CEOP) program are also basing their systems on GEON infrastructure.

The GEON Light Detection And Ranging (LiDAR) Workflow has also become very popular because it gives easy access to massive LiDAR datasets, which give much higher resolution topography of the earth's surface than ever before, and which are just not practical for most scientists to access without this service. This is a classic SDSC example of applying high-performance computing to data, and providing access to this powerful capability through easy-to-use, Web-based portal interfaces. We're using a 32-way node of SDSC's data-oriented DataStar supercomputer and a parallel DB2 database to serve the LiDAR data.

GEON has also inspired a lot of international activity, which we've done in close collaboration with the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA) project. There's now an international GEON node in India and strong collaborations with the Japanese GEO Grid project. And we've held GEON Cyberinfrastructure workshops in Hyderabad, India and Beijing, China, with a joint GEON-GEO Grid workshop being planned for Bangkok in March, and a GEON workshop scheduled for Moscow in June. So the GEON effort is having a big impact, carrying SDSC cyberinfrastructure to many other disciplines and around the world.

Building on experience in GEON, you're playing an important role in emerging biodiversity informatics projects such as NSF's NEON, the National Ecological Observatory Network. What's on the horizon in that area?

Baru: The NSF National Ecological Observatory Network (NEON) is a pioneering project that will collect ecosystem data nationwide at an unprecedented scale. We're currently working on the NEON Diagnostic Testbed, and SDSC is leading the design and implementation of NEON cyberinfrastructure. We'll have to manage data from thousands of sensor streams across the US as well as data from airborne platforms and field data collections. We'll need to provide the capability to monitor all aspects of this system, and the Diagnostic Testbed allows us to work out the complex "end-to-end" challenges of NEON's instrument platforms. The next step will be to build a NEON prototype as a precursor to rolling out the full continental-scale observatory.

We're also involved in an effort with Conservation International and their Tropical Ecological Assessment and Monitoring (TEAM) project. Our UCSD colleague Dr. Peter Arzberger put us in touch with them, and SDSC will provide the cyberinfrastructure for collecting environmental and ecological data in 10 tropical sites across South America and Africa. All of this data will be entered from the field into data systems at SDSC and made accessible through data portals we'll develop.

Both of these projects are designed as long-term efforts that will last for decades. That's one of the exciting challenges in designing this cyberinfrastructure: making sure that the structure we design today is flexible enough to stand the test of time and to evolve over time.

What is your background and how did you come to SDSC?

Baru: My background is in engineering and computer science. During my career I've worked in both academia and industry, which is very helpful in this environment. My expertise is in database systems, and I came to SDSC because I was attracted by the challenges of applying database technologies to problems in scientific data management. That sounded more interesting than applying database technologies to e-commerce applications. I saw this as a fertile area for R&D, and I haven't been disappointed!

When I came to SDSC I was hired by Reagan Moore to work as the Technical Project Manager for the Distributed Object Computation Testbed (DOCT) project, which in many ways was a precursor to NPACI, the National Partnership for Advanced Computational Infrastructure. Along with Mike Wan and Arcot Rajasekar, I was one of the original members of the SRB (Storage Resource Broker) project. The 10 years since then have really been fast-paced!

Tell us about what you enjoy doing outside of work.

Baru: I enjoy spending time with my family in the beautiful outdoors of San Diego. I have an 11-year-old son and a 9-year-old daughter. Last year, the family did the 21-mile Tour of Borrego bicycle ride. It was a blast! We plan to do it again this year. I enjoy hiking with my son on trails along the coast or in the back country. And we all love to go on walks along the beaches in Torrey Pines and Del Mar.

More information:
GEON "Cyberinfrastructure for the Geosciences"
National Ecological Observatory Network (NEON)
Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic Information Systems
Network for Earthquake Engineering Simulation Cyberinfrastructure Center (NEESit)
Collaborative Large-scale Engineering Analysis Network for Environmental Research (CLEANER)