NYU has launched a new multimillion-dollar collaboration to enable university researchers to harness the full potential of the data-rich world that characterizes all fields of science and discovery.
This partnership, which also includes the University of California, Berkeley and the University of Washington, will spur collaborations within and across the three campuses and other partners pursuing similar data-intensive science goals.
The new five-year, $37.8 million initiative, with support from the Gordon and Betty Moore Foundation and Alfred P. Sloan Foundation, was announced today at a meeting sponsored by the White House Office of Science and Technology Policy (OSTP) focused on developing innovative partnerships to advance technologies that support advanced data management and data analytic techniques.
At a time when the natural, mathematical, computational, and social sciences are all producing data with relentlessly increasing volume, variety, and velocity, capturing the full potential of a progressively data-rich world has become a daunting hurdle for both data scientists and those who use data science to advance their research.
While data science is already contributing to scientific discovery, substantial systemic challenges need to be overcome to maximize its impact on academic research.
To overcome these challenges, this effort seeks to achieve three core goals:
• Develop meaningful and sustained interactions and collaborations between researchers with backgrounds in specific subjects (such as astrophysics, genetics, economics) and in the methodology fields (such as computer science, statistics, and applied mathematics), with the specific aim of recognizing what it takes to move each of the sciences forward;
• Establish career paths that are long-term and sustainable, using alternative metrics and reward structures to retain a new generation of scientists whose research focuses on the multi-disciplinary analysis of massive, noisy, and complex scientific data and the development of the tools and techniques that enable this analysis; and
• Build on current academic and industrial efforts to work toward an ecosystem of analytical tools and research practices that is sustainable, reusable, extensible, learnable, and easy to translate across research areas, and that enables researchers to spend more time focusing on their science.
“Dramatic expansion in the scale of data collection, analysis, and dissemination could revolutionize the speed and volume of discovery,” said Chris Mentzel, Moore’s Data-Driven Discovery program officer. “However, success ultimately depends on the individuals and teams that combine subject-matter expertise with computational, statistical, and mathematical skills – what we are calling ‘data science.’ ”
“It’s been hard to establish these essential roles as durable and attractive career paths in academic research,” explained Josh Greenberg, who directs the Sloan Foundation’s Digital Information Technology program. “This joint project will work to create examples at the three universities that demonstrate how an institution-wide commitment to data scientists can deliver dramatic gains in scientific productivity.”
The initiative will tap leading researchers at the three institutions, among them some of the best minds in science and academia. Faculty leads include:
• Yann LeCun, Silver Professor of Computer Science and Neural Science at NYU’s Courant Institute of Mathematical Sciences and founding director of NYU’s Center for Data Science;
• Saul Perlmutter, professor of physics at the University of California, Berkeley, astrophysicist at Lawrence Berkeley National Laboratory, and Nobel laureate; and
• Ed Lazowska, Bill & Melinda Gates Chair in Computer Science and Engineering at the University of Washington and director of the University of Washington’s eScience Institute.
The three leaders believe universities are uniquely positioned to empower researchers to harness the deluge of valuable, heterogeneous, and noisy data continuing to come their way – and help navigate the flood of software analysis tools and approaches that are often incompatible, hard to learn, or poorly written by brilliant scientists trying to get their job done.
“As someone whose research science depends on the fluent use of data,” said Perlmutter, lead faculty member at the University of California, Berkeley, “I'm excited that we now have an opportunity to identify the typical data-science barriers, little and big, that slow our progress, and to see which could be mitigated – or, occasionally, just plain solved!”
“We must build on our existing efforts that leverage existing industry tools, generate new working tools and practices, and support the multi-disciplinary experts who develop new approaches and tools needed to fill gaps,” said Lazowska, faculty lead at the University of Washington. “Working together, we believe we're going to shift the culture at our universities – and help accelerate broader uptake – for supporting data-intensive discovery.”
“With the onslaught of data, much of the knowledge in the world is going to be extracted by machines,” said LeCun, faculty lead at NYU. “Universities must find new ways to advance data-science methodologies while facilitating the use of new methods and tools by researchers from every field. Universities also have an opportunity to train new generations of researchers in data-driven science.”
“This initiative isn't just to ‘do’ science—it is to ‘change’ science,” added NYU Physics Professor David Hogg, who is also part of the initiative. “To that end, we have designed a set of programs and positions at NYU that will help scientists in the domains—for example, astronomy, psychology, and sociology—interact with scientists in the methods—applied mathematics, statistics, and computer science—to make both groups of scientists more capable and more successful.
“We hope that by leading by example in these areas, we will encourage other universities in the U.S. and around the world to think about interdisciplinary programs and projects that will create new opportunities for young scientists to make breakthroughs using big, complex, and rich data sets.”
Each of the three universities will contribute additional resources to the investment made by the Moore and Sloan foundations, including new faculty positions, physical space on campus, and research support.
Each of the partner universities has distinguished itself in recent years by pioneering new approaches to discovery in fields as diverse as astronomy, biology, oceanography, and sociology through deep collaborations between researchers in these fields and researchers in data science methodology fields such as computer science, statistics, and applied mathematics.
This new partnership—a coordinated, distributed experiment involving researchers at these leading universities—hopes to establish models that will dramatically accelerate this data science revolution by addressing several specific challenges.
Cross-university teams will organize their efforts around six primary areas: strengthening an ecosystem of tools and software environments, establishing academic careers for data scientists, championing education and training in data science at all levels, promoting and facilitating efforts that are accessible and reproducible, creating physical and intellectual hubs for data science activities, and identifying the scientists’ data-science bottlenecks and needs through directed ethnography.
This partnership will connect with others, practice open science, and share lessons along the way.
Earlier this year, NYU established a Center for Data Science and, this fall, began a two-year master’s degree program in data science.