Published November 2, 2015
The National Science Foundation (NSF) has announced funding for a ‘Big Data’ Innovation Hub for the Western United States intended to facilitate collaboration among the region’s technology sector and other organizations to address research challenges in areas such as precision medicine, natural resource utilization, hazard management, and metro regional development.
The Western Hub is part of an NSF program announced today that includes four awards totaling more than $5 million to establish regional hubs for data science innovation. The consortia are coordinated by top data scientists at Columbia University (Northeast Hub); Georgia Tech and the University of North Carolina (South Hub); the University of Illinois at Urbana-Champaign (Midwest Hub); and the University of California, San Diego, the University of California, Berkeley, and the University of Washington (West Hub).
Covering all 50 states, they include commitments from 281 organizations – from universities and cities to foundations and Fortune 500 corporations – with the ability to expand further over time. Building upon the White House National Big Data Research and Development Initiative announced in 2012, the awards are made through the Big Data Regional Innovation Hubs (BD Hubs) program, which creates a new framework for multi-sector collaborations among academia, industry and government.
The program calls for creating an infrastructure to define and evaluate those collaborations. The Western BD Hub will connect state and regional organizations, including academia, industry, state agencies, and non-profit organizations, that see large-scale data management and analysis as a way to transform or add value to their operations.
Project principal investigators for the Western BD Hub include Michael Norman, director of the San Diego Supercomputer Center (SDSC) at UC San Diego; Michael Franklin, the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Sciences Division at UC Berkeley; and Ed Lazowska, the Bill & Melinda Gates Chair in Computer Science & Engineering at the University of Washington. A full list of project personnel can be found on the BDHUB website.
The NSF has defined big data as large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.
“Partnerships created through the Western BD Hub will focus on development and application of big data technologies, data standards, relevant policies and ethics, and innovative data-intensive discovery techniques,” said SDSC Director Michael Norman. “These will be leveraged with the aim of transforming how data is collected, integrated, stored, analyzed, and shared, all with the goal of assessing risks related to regional and long-term decisions.”
Moreover, partnerships enabled by the BD Hubs program will lead to professional certificate programs and student internships, creating a pipeline of graduates from partner institutions to join industry, public/government agencies, national labs, resource-planning agencies, and regulatory commissions.
“The BD Hubs program represents a unique approach to improving the impact of data science by establishing partnerships among likeminded stakeholders,” said Jim Kurose, NSF’s head of Computer and Information Science and Engineering. “In doing so, it enables teams of data science researchers to come together with domain experts, with cities and municipalities, and with anchor institutions to establish and grow collaborations that will accelerate progress in a wide range of science and education domains with the potential for great societal benefit.”
Big Data “Spokes”
Along with the BD Hubs awards, the NSF posted a solicitation for the next phase of the BD Hubs program. The agency will award approximately $10 million in grants as part of the Big Data Spokes program (BD Spokes) to help initiate research in specific priority areas identified by the BD Hubs. Each BD Spoke will focus on a specific BD Hub priority area and address one or more of three key issues: improving access to data; automating the data lifecycle; and applying data science techniques to solve a domain science problem and/or demonstrate societal impact.
The following are thematic areas which could develop into spokes over the course of the BD Hubs program:
BD Hubs Leadership Meeting
The announcement of the BD Hubs awards and BD Spokes solicitation comes days before the first national stakeholders meeting of the BD Hubs, to be held on November 3-5 in Arlington, Virginia. This national BD Hubs “charrette” will provide an opportunity for leaders and researchers representing each BD Hub to discuss governance and sustainability models, coordinate ideas for BD Spokes and identify next steps.
The last day of the meeting will include two public webinars. At the first webinar, BD Hubs representatives will publicly present and discuss their plans, as well as mechanisms for governance and coordination among BD Hubs stakeholders. The second webinar will be held in conjunction with the National Data Science Organizer’s Workshop and will discuss the role of the BD Hubs in engaging with grassroots data science groups, such as Meetup groups and non-profits.
As an Organized Research Unit of UC San Diego, SDSC is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia. Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences and biology to astrophysics, bioinformatics, and health IT. SDSC’s Comet supercomputer joins the Center’s data-intensive Gordon cluster; both are part of the National Science Foundation’s XSEDE (eXtreme Science and Engineering Discovery Environment) program, the most advanced collection of integrated digital resources and services in the world.