NSF Funds ‘Big Data’ Innovation Hub for the Western U.S.

SDSC Director Michael Norman Named a Principal Investigator

Published November 2, 2015

Image from a 15-hour forecast of IWV (Integrated Water Vapor), an estimate of the total amount of water in the atmosphere that could become precipitation. CalWater 2015 provided an opportunity earlier this year to test new forecast methods using large observational data sets. Hot colors (red) indicate high values; cool colors (blue) indicate low values. The arrows are wind barbs indicating wind speed and direction. Image courtesy of Andrew Martin/SIO and John Helly/SDSC, SIO

The National Science Foundation (NSF) has announced funding for a ‘Big Data’ Innovation Hub for the Western United States intended to facilitate collaboration among the region’s technology sector and other organizations to address research challenges in areas such as precision medicine, natural resource utilization, hazard management, and metro regional development.

The Western Hub is part of an NSF program announced today that includes four awards totaling more than $5 million to establish regional hubs for data science innovation. The consortia are coordinated by top data scientists at Columbia University (Northeast Hub); Georgia Tech and the University of North Carolina (South Hub); the University of Illinois at Urbana-Champaign (Midwest Hub); and the University of California, San Diego, the University of California, Berkeley, and the University of Washington (West Hub).

Covering all 50 states, the hubs include commitments from 281 organizations – from universities and cities to foundations and Fortune 500 corporations – with the ability to expand further over time. Building upon the White House National Big Data Research and Development Initiative announced in 2012, the awards are made through the Big Data Regional Innovation Hubs (BD Hubs) program, which creates a new framework for multi-sector collaborations among academia, industry and government.

The program calls for creating an infrastructure to define and evaluate those collaborations. The Western BD Hub will connect state and regional organizations including academia, industry, state agencies, and non-profit organizations that regard the potential of large-scale data management and analysis as transforming or adding value to their operations.

Project principal investigators for the Western BD Hub include Michael Norman, director of the San Diego Supercomputer Center (SDSC) at UC San Diego; Michael Franklin, the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Sciences Division at UC Berkeley; and Ed Lazowska, the Bill & Melinda Gates Chair in Computer Science & Engineering at the University of Washington. A full list of project personnel can be found on the BDHUB website.

The NSF has defined big data as large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.

“Partnerships created through the Western BD Hub will focus on development and application of big data technologies, data standards, relevant policies and ethics, and innovative data-intensive discovery techniques,” said SDSC Director Michael Norman. “These will be leveraged with the aim of transforming how data is collected, integrated, stored, analyzed, and shared, all with the goal of assessing risks related to regional and long-term decisions.”

Moreover, partnerships enabled by the BD Hubs program will lead to professional certificate programs and student internships, creating a pipeline of graduates from partner institutions to join industry, public/government agencies, national labs, resource-planning agencies, and regulatory commissions.

“The BD Hubs program represents a unique approach to improving the impact of data science by establishing partnerships among likeminded stakeholders,” said Jim Kurose, NSF’s head of Computer and Information Science and Engineering. “In doing so, it enables teams of data science researchers to come together with domain experts, with cities and municipalities, and with anchor institutions to establish and grow collaborations that will accelerate progress in a wide range of science and education domains with the potential for great societal benefit.”

Big Data “Spokes”

Along with the BD Hubs awards, the NSF posted a solicitation for the next phase of the BD Hubs program. The agency will award approximately $10 million in grants as part of the Big Data Spokes program (BD Spokes) to help initiate research in specific priority areas identified by the BD Hubs. Each BD Spoke will focus on a specific BD Hub priority area and address one or more of three key issues: improving access to data; automating the data lifecycle; and applying data science techniques to solve a domain science problem and/or demonstrate societal impact.

The following are thematic areas which could develop into spokes over the course of the BD Hubs program:

  • Big Data Technology: Widespread interest in big data is fueling a surge of activity and innovation in data management technologies across the entire hardware/software stack. The Western region leads the nation in these efforts through its unique blend of leading universities and national laboratories such as the Lawrence Berkeley National Laboratory and Lawrence Livermore National Laboratory, which develop technology and applications that push the limits of existing technologies. The region also has a developed ecosystem of start-ups and established companies that are at the center of big data research and analysis.
  • Managing Natural Resources and Hazards: Challenges and related opportunities in the region include fresh and salt-water management, land management, plant and animal management, air quality management, and natural disaster management and response related to earthquakes, tsunamis, and wildfires. All of these create a need to catalog, control, and defend the region’s resources, while eliminating or mitigating associated hazards.
  • Precision Medicine: Medicine is undergoing a dramatic change through the aggregation, integration and analysis of big data. In early 2015, President Obama announced the launch of the Precision Medicine Initiative to enhance innovation in biomedical research, with the goal of moving the U.S. into an era where medical treatment is tailored to each patient based on data about multiple factors to individually optimize their prognosis. These factors include genetics and other molecular profiles, individual history and lifestyle, and multiple assays of a patient’s physiological state.
  • Metro Data Science: Cities in the Western U.S., and particularly the areas of Seattle, San Francisco, and San Diego represented by the three BD Hub co-leads, are frequently structured as true metropolitan areas: interconnected and interdependent urban, suburban, and rural regions with complex dynamics between citizens, policy, infrastructure, and the environment. With increasing urbanization come new challenges of creating efficient infrastructures in transportation, utilities, housing, communication, public services, and resource consumption.
  • Data-Enabled Scientific Discovery and Learning: Digitally generated data is streaming in from myriad sources: simulations such as global climate models or earthquake scenarios; networks of powerful sensors on the seafloor or in buildings, roads and bridges; high-bandwidth remote sensing platforms such as satellites and telescopes; high-throughput laboratory instruments; and social science data such as global economic indicators. Applying the data science methodology fields of computer science, statistics, and mathematics to traditional research domains such as the life, environmental, physical, and social sciences will advance discovery and the nation’s ability to extract meaningful value and knowledge from the massive amounts of data.

BD Hubs Leadership Meeting

The announcement of the BD Hubs awards and BD Spokes solicitation comes days before the first national stakeholders meeting of the BD Hubs, to be held on November 3-5 in Arlington, Virginia. This national BD Hubs “charrette” will provide an opportunity for leaders and researchers representing each BD Hub to discuss governance and sustainability models, coordinate ideas for BD Spokes and identify next steps.

The last day of the meeting will include two public webinars. At the first webinar, BD Hubs representatives will publicly present and discuss their plans, as well as mechanisms for governance and coordination among BD Hubs stakeholders. The second webinar will be held in conjunction with the National Data Science Organizer’s Workshop and will discuss the role of the BD Hubs in engaging with grassroots data science groups, such as Meetup groups and non-profits.

About SDSC

As an Organized Research Unit of UC San Diego, SDSC is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia. Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences and biology to astrophysics, bioinformatics, and health IT. SDSC’s Comet joins the Center’s data-intensive Gordon cluster, and both are part of the National Science Foundation’s XSEDE (eXtreme Science and Engineering Discovery Environment) program, the most advanced collection of integrated digital resources and services in the world.