Directing DataCentral: Q&A with Natasha Balac
Natasha Balac is responsible for the leadership and management of the Data Applications and Services group at SDSC. This group supports DataCentral, the first national program of its kind to host and make available significant research and community data collections and databases. Natasha received her Bachelor's degree in Computer Science from Middle Tennessee State University as well as her Master's and Ph.D. in Computer Science from Vanderbilt University. She has been with SDSC since 2003.
You direct DataCentral, a national repository at SDSC for research and education data. What are your goals for DataCentral?
Balac: Interdisciplinary collaborations and community-shared data are becoming increasingly important for the progress of science. In recent years, the rate of data generation has increased to the point that data storage, management and analysis are now among the fundamental challenges scientists face.
Our goal for DataCentral is to provide a wide range of professional services and resources as a national hosting environment for data collections important to the science, engineering, education, social science, art and humanities communities.
There are many important data collections stored on DVD and tape just collecting dust on researchers' shelves due to a lack of resources and/or expertise. Our mission is to make all of these significant collections available to user communities to share, analyze, mine and preserve in order to advance science and education.
SDSC has seen increasing demand from many domain communities for collaborations on data management. Applications include publishing data in digital libraries, sharing large data collections, creating large-scale databases and analyzing and mining the data. We launched DataCentral to support these endeavors and meet user requirements.
How do you get an account on DataCentral and what do you get for it?
Balac: There are two kinds of accounts (or Data Allocations, as we call them): short-term and regular. Short-term allocations are awarded for collections of up to one terabyte that require storage for one year or less. We have streamlined the review process for short-term allocation requests: generally, they are reviewed and made available within a week. If the collection is larger, or the data must be stored for more than one year, we offer regular allocations in medium and large sizes for longer durations. Medium and large allocations are reviewed quarterly by the Data Allocations Committee (DAC).
As I mentioned before, an increasing number of researchers and projects are using SDSC to host their data in a way that facilitates sharing, analysis and preservation. Since DataCentral started, we have received more than 40 applications.
Once a user's request is approved by the DAC, the researcher gains expanded access to SDSC's storage resources and data services. These resources include advanced tools, data storage (disk and tape), related software and expert assistance with all data-related activities, including data collection and database management, hosting, analysis and mining.
We help users create, optimize and port large-scale databases, and share data through the Web and data grids. DataCentral services and expert staff also support the publication, sharing and preservation of community data, all backed by professional 24/7 support. More information about the resources can be found at cloud.sdsc.edu.
DataCentral's staff works with each data user, project or gateway, assisting with their data needs and particular usage scenarios. So far, DataCentral has encountered a broad range of collection usage models, sizes, types, access requirements and preservation approaches.
What kind of collections are in DataCentral? Tell us about a collection that's particularly cool or different.
Balac: That's a tough question. Every one of these collections is unique, special and very valuable. I really can't say that I have a favorite. However, I think what is really great about the collections is their diversity. We have collections from many different disciplines that enable researchers to accomplish fascinating, important work. The collections consist of a variety of data used to study topics as diverse as molecules, genes, neurons, earthquakes, tsunamis, the Earth, galaxies, art, engineering and education.
Your Ph.D. is in Computer Science. What did you write your thesis on? How did you end up at SDSC?
Balac: I received my Ph.D. in Computer Science from Vanderbilt University, with an emphasis on Machine Learning and Artificial Intelligence. My dissertation focused on applying machine learning and data mining techniques to mobile robots. I had a great time building devices from scratch (I love soldering!), playing with several different mobile robots and making them more intelligent.
Mobile robots can collect very large amounts of data through their many sensors. We applied existing and novel learning methods to help them make intelligent decisions about how to accomplish given tasks most efficiently. My Ph.D. adviser is currently working at JPL (Jet Propulsion Laboratory) and is putting some of these techniques on the Mars rovers!
What do you do for fun?
Balac: First and foremost, I like to spend time with my family. In the past year I have enjoyed raising and nurturing two babies: DataCentral and our 10-month-old son, Luke. My husband and I are having a blast watching him grow. We are fortunate to have some of our extended family living nearby. Some of my happiest moments are playing with Luke and my niece, Sophie, who is just nine days younger than Luke.
In my few spare moments I love to travel and walk or run on the beach. I enjoy sewing as a creative outlet, and I love to play tennis. I am originally from Yugoslavia and came to the United States on a tennis scholarship. In my 'previous life' I was a tennis pro. I love playing for fun and also love competing in both singles and doubles matches. It's a great way to have fun, de-stress, get exercise and make great friends!