The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, has formally established a new ‘center of excellence’ to assist researchers in creating workflows to better manage the tremendous amount of data being generated across a wide range of scientific disciplines, from natural sciences to marketing research.
Called the WorDS Center (or Workflows for Data Science Center of Excellence), the initiative leverages more than a decade of experience within SDSC’s Scientific Workflow Automation Technologies Laboratory in developing and validating scientific workflows for computational science, data science, and engineering at the intersection of distributed and parallel computing, big data analysis, and reproducible science, while fostering a collaborative working culture.
A full overview of WorDS services and potential applications can be viewed online.
A data science workflow is the process of combining data and processes into a configurable, structured set of steps that lead to automated computational solutions of an application. The workflows contain a full range of capabilities such as execution management, provenance tracking and reporting tools, integration of distributed computational and data management technologies, and data streaming interfaces. Creating such data science workflows, however, is not without technological challenges.
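As a rough illustration of the definition above, the following minimal Python sketch chains a few configurable steps and records a simple provenance entry for each one. It is not code from the WorDS Center or from any workflow system named in this release; the function names (`fingerprint`, `run_workflow`), the three example steps, and the provenance fields are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, function) pair; the list of steps is the workflow's configuration.
Step = Tuple[str, Callable[[Any], Any]]


def fingerprint(value: Any) -> str:
    """Short, stable hash of a JSON-serializable value, used in provenance records."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run the steps in order, recording one provenance entry per step."""
    provenance: List[Dict[str, Any]] = []
    for name, func in steps:
        started = time.time()
        result = func(data)
        provenance.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
            "seconds": time.time() - started,
        })
        data = result  # the output of one step feeds the next
    return data, provenance


# Hypothetical three-step workflow: clean, transform, summarize.
steps: List[Step] = [
    ("drop_missing", lambda xs: [x for x in xs if x is not None]),
    ("square", lambda xs: [x * x for x in xs]),
    ("total", lambda xs: sum(xs)),
]

result, provenance = run_workflow(steps, [1, None, 2, 3])
print(result)  # 14
for record in provenance:
    print(record["step"], record["input"], "->", record["output"])
```

Because each record stores a hash of its input and output, the output hash of one step matches the input hash of the next, which is one simple way to trace how a result was produced. Production workflow systems add scheduling, distributed execution, and far richer provenance than this sketch attempts.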
“The WorDS Center’s purpose is to allow scientists to focus on their specific areas of research rather than having to solve workflow issues, or the computational challenges that arise as data analysis progresses from task to task,” said Ilkay Altintas, SDSC’s deputy coordinator for research and director of SDSC’s Scientific Workflow Automation Technologies Laboratory, and director of the new WorDS Center. “The amount of potentially valuable information buried in what is commonly known as ‘Big Data’ is of interest to numerous data science applications, and big data workflows have been an active area of research ever since the introduction of scientific workflows in the early 2000s.”
Specifically, the expertise and services in the center will include:
The WorDS Center will be funded by a combination of sponsored agreements and recharge services. In addition to Altintas, the center’s key personnel include the following SDSC researchers: Dr. Jianwu Wang as Assistant Director for Research; Dr. Daniel Crawl as Assistant Director for Development; and Shweta Purawat as New User Applications Specialist.
“We view WorDS as an excellent opportunity to teach a much larger number of researchers how to create efficient and effective workflows,” said Altintas, one of the founders of the Kepler workflow collaboration, which provides researchers with the means to access, arrange, and share data and workflows via a common interface. “The technology behind such systems has matured to the point where this is now an opportune time to establish such a center.”
“The WorDS Center is a natural addition to SDSC’s other centers of excellence, which are part of SDSC’s larger strategic focus to help researchers across all domains, including those who are relatively new to computational science, manage the challenges posed by massive data sets or numerous smaller ones,” said SDSC Director Michael Norman. “The age of data-enabled science is upon us, and it’s here to stay.”
SDSC Centers of Excellence
The WorDS Center joins four other SDSC centers of excellence that are focused on big data management across multiple disciplines, as well as Internet topologies.
As an Organized Research Unit of UC San Diego, SDSC is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia. Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences and biology to astrophysics, bioinformatics, and health IT. With its two newest supercomputers, Trestles and Gordon, and a new system called Comet to be deployed in early 2015, SDSC is a partner in XSEDE (Extreme Science and Engineering Discovery Environment), the most advanced collection of integrated digital resources and services in the world.
Jan Zverina, SDSC Communications
858 534-5111 or firstname.lastname@example.org
Warren R. Froelich, SDSC Communications
858 822-3622 or email@example.com