Physicists Find Way to Address Field’s “Big Data” Gap

A team from NYU, the University of Washington, and the University of California, Berkeley has developed a new kind of interactive workshop in data science for researchers at multiple stages of their careers.

Above, the 2018 Neurohackademy, a two-week version of Neuro Hack Week. © 2018 University of Washington, Credit: Alex Alspaugh/University of Washington.

Physics has become a big-data endeavor.

Each night, high-definition cameras mounted to telescopes collect terabytes of data about objects in the sky. Each day, scientists sequence the genomes of people, animals, plants and microbes for biomedical and evolutionary research. Each year, the Large Hadron Collider produces 30 petabytes of data on particle collisions.

However, scientists are not universally adept in “data science” — the computing and statistical skillsets needed to handle, sort, analyze and draw conclusions from big data. The shortage of know-how in data science can hamper research, medicine and even private industry.

To address this gap, a team from NYU, the University of Washington, and the University of California, Berkeley has developed a new kind of interactive workshop in data science for researchers at multiple stages of their careers.

The course format, called “hack week,” blends elements of traditional lecture-style pedagogy with participant-driven projects. The most recent was a neuroscience-themed event held in July on the University of Washington campus.

As the team reports in a paper published this month in the Proceedings of the National Academy of Sciences, participants rated the hack weeks as opportunities to learn about new concepts, foster new connections, share data openly, and develop skills and work on problems that will positively affect their day-to-day research lives.

“Everyone at the hack week learns by trying things out in real time, and by trying things out on their own research problems,” explains paper co-author David Hogg, a professor of physics and deputy director of NYU’s Center for Data Science. “That means that what they learn is learned efficiently and effectively, and with an assurance of its relevance to their work.”

“Participants bring their research problems, and directly apply what they are learning, as they are learning it, to their projects,” he adds. “This means that everything we do at the hack week is directly contextualized in the work that participants care about.”

“The idea behind hack week was to bring together people who were interested in data science and give them a place to meet, talk and exchange ideas,” says lead author Daniela Huppenkothen, associate director of the University of Washington’s astronomy-focused DIRAC Institute. “But instead of a traditional format with experts lecturing nonexperts, this would allow participants to mingle more and teach one another.”

Hogg and Huppenkothen were involved in the inaugural hack week event, “Astro Data Hack Week,” held at the University of Washington in 2014, followed by Astro Hack Week at NYU in 2015. Those events brought together big-data researchers in astrophysics and cosmology. Since then, the team has held three more Astro Hack Week events, three “Neuro Hack Week” events for neuroscience and two “Geo Hack Week” events for the geosciences.

All hack week events have the same basic design and organizing principles. They usually commence with some structured periods for instruction, and then shift toward time for participant-driven, open-ended projects, as well as peer networking and free discussion. The projects can resemble a hackathon, but with greater emphasis on collaboration and learning than on specific outcomes. Hack week participants tackle their projects in smaller groups, with organizers circulating to observe and provide feedback or encouragement.

The projects range from experiments that the participants brought from their home institutions to ideas that come up during the course. One project from the inaugural Astro Hack Week, for example, eventually became Stingray, a software project to provide algorithms to analyze time-series data in astronomy. At last month’s Neurohackademy, a new two-week version of Neuro Hack Week, one team worked on developing common ways to analyze different types of MRI scans.

The events’ open-ended structure places greater responsibility on the organizers of each hack week.

Their paper includes supplementary materials detailing the hack week experiences and advice for other groups interested in starting their own workshops.

Participants gave hack weeks high scores for promoting open-science principles, under which researchers publicly post and share their datasets, code, and methods. Open-science principles are critical to addressing the challenges that researchers face in making their research more reproducible, says co-author Ariel Rokem, a data scientist with the University of Washington’s eScience Institute and co-organizer of the recent Neurohackademy, along with Tal Yarkoni at the University of Texas at Austin.

“One of our goals with the hack week format is to elevate the quality of science being done,” said Rokem. “The best way to do that is to try out ideas and share what you’ve learned.”

Additional co-authors are Karthik Ram at the Berkeley Institute for Data Science at the University of California, Berkeley and Anthony Arendt and Jake VanderPlas at the University of Washington’s eScience Institute.

The research was funded by the National Institutes of Health; the University of Washington; New York University; the University of California, Berkeley; the Charles and Lisa Simonyi Fund for Arts and Sciences; and the Washington Research Foundation.

Additional contact:
James Urton
University of Washington