As November 3 approaches, new tools help uncover some shadowy secrets—from YouTube's role in radicalization to the hidden trackers on your favorite websites.

In the aftermath of the 2016 US presidential election, tech and social media leaders faced renewed scrutiny for the internet’s role in shaping the election outcome. Viral conspiracy theories muddied voters’ ability to distinguish between fact and fiction, and algorithms helped to create digital echo chambers, where people were only shown information that reflected and reinforced their existing views and opinions. Shadowy online networks were placed under a national microscope as many scrambled to understand the role of social media in fueling political polarization and threatening core tenets of US democracy.

These issues continue to plague the political process in 2020. The internet has irrevocably changed the way voters engage with our democratic institutions, and with the November 3 election just days away, journalists and researchers worldwide are working to understand how online activity could influence this year’s outcome, much as it did in 2016.

At NYU, a number of faculty, students, and alumni have turned their attention to this pressing issue, conducting research into rapidly moving online phenomena and developing tools that will empower journalists, researchers, and the public to better understand how the internet and social media are informing debate, influencing voters, and shaping the political process in the United States and abroad.


Researching Radicalization on YouTube


Between January 2018 and October 2020, “Q says” was mentioned 3,829 times by 52 YouTube channels identified as belonging to alt-right or radical communities. YouTube has always been difficult to research, but a new tool developed by Erik Van Zummeren, a student in NYU Tisch’s Interactive Telecommunications Program (ITP), launches Friday, October 30, and gives journalists and researchers new insight into a previously hidden part of the internet.

This new tool, called Raditube, indexes the automated subtitles of videos from approximately 350 prominent alt-right and far-left channels and uses behind-the-scenes infrastructure, including YouTube’s public API, to make the data accessible and searchable. A search engine over the subtitle text lets users see how a particular term is being discussed within the tracked channels, and gives them access to datasets and data visualizations to inform their research. Research of this kind has largely been limited to text-based platforms like Facebook and Twitter, because it is difficult to parse information from thousands of hours of video content and YouTube’s personalized recommendation engine obscures how the platform works. With Raditube, a researcher could, for example, find videos mentioning QAnon-related conspiracy theories, compare their context, and see how these concepts evolve over time.
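
The article describes Raditube only as a subtitle index behind a search engine, so the following is a generic illustration of that technique (an inverted index over caption segments), not Raditube’s actual code. The `Caption` record, the `SubtitleIndex` class, and the sample channel names are hypothetical, and the sketch assumes the caption text has already been fetched; it does not talk to YouTube at all. It counts, per channel, how many caption segments contain an exact phrase such as “Q says”.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    """One timed subtitle segment (hypothetical record shape)."""
    channel: str
    video_id: str
    start_seconds: float
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class SubtitleIndex:
    """Toy inverted index mapping each token to the caption segments containing it."""

    def __init__(self) -> None:
        self.captions: list[Caption] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, caption: Caption) -> None:
        position = len(self.captions)
        self.captions.append(caption)
        for token in tokenize(caption.text):
            self.postings[token].add(position)

    def search(self, phrase: str) -> list[Caption]:
        terms = tokenize(phrase)
        if not terms:
            return []
        # Cheap filter: segments that contain every term somewhere.
        candidates = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = []
        n = len(terms)
        for position in sorted(candidates):
            tokens = tokenize(self.captions[position].text)
            # Exact check: the terms must appear next to each other, in order.
            if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
                hits.append(self.captions[position])
        return hits

    def mentions_by_channel(self, phrase: str) -> dict[str, int]:
        # Counts matching caption segments per channel, not total occurrences.
        counts: dict[str, int] = defaultdict(int)
        for caption in self.search(phrase):
            counts[caption.channel] += 1
        return dict(counts)


if __name__ == "__main__":
    # Made-up sample captions, for illustration only.
    index = SubtitleIndex()
    index.add(Caption("channel_a", "vid1", 12.0, "and then Q says the plan is working"))
    index.add(Caption("channel_b", "vid2", 3.5, "nobody says Q anymore"))
    index.add(Caption("channel_a", "vid3", 40.0, "Q says trust the plan"))
    print(index.mentions_by_channel("Q says"))  # {'channel_a': 2}
```

The posting-set intersection keeps the phrase check cheap: only segments that already contain every word are rescanned for the exact word order, which is why “nobody says Q anymore” is not counted.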

“Even though a lot of radicalization takes place on YouTube, it remains a place that is rather difficult to research due to its audiovisual character. Yet, we know that a lot of public debate is taking place on YouTube. Many right-wing figures have gained prominence outside of mainstream media, purely on YouTube, yet that content is moderated by just one company. It’s crucial to facilitate access to that information so we can gain a better understanding of how debate is being shaped through the platform and how algorithms help misinformation spread through various communities,” said Van Zummeren.

Right now, the tracked channels are curated by Van Zummeren alone, which can become challenging when channels are removed for inflammatory content—before popping up in new forms—on a daily basis. He is working to create partnerships in the future to allow researchers to curate their own tracked channels and adapt the platform for their individual needs.


Screenshot showing search query results from Raditube


Analyzing Twitter Mentions of Presidential Debates and Town Halls

 

Researchers at the Courant Institute of Mathematical Sciences' AI and Predictive Analytics Research Group have been studying online activity surrounding this fall's presidential and vice-presidential debates and town halls. The group has tracked the candidates' interruptions of each other as well as which of them drew the most attention in Twitter mentions and Google searches.

Researchers found that President Trump was the focus of a higher number of tweets while former Vice President Joseph Biden was the subject of a greater number of Google searches surrounding the first presidential debate, suggesting the debate prompted interest in knowing more about the presidential challenger. They also uncovered a greater increase in positive Twitter mentions of President Trump, relative to Joe Biden, in activity related to the second presidential debate—a finding that sharply contrasted with post-debate polls, which showed Biden winning the exchange by double-digit figures.

Notably, the fly that landed on Vice President Mike Pence’s head during his debate with Senator Kamala Harris—sparking a viral meme and numerous parody accounts—received more mentions on Twitter than did any of the presidential or vice-presidential candidates during either debate.

“While the exchange between the vice-presidential candidates may have produced some memorable moments, they couldn’t compete with the insect they shared, if only briefly, the debate stage with,” says Anasse Bari, a clinical assistant professor in computer science at the Courant Institute and the senior author of the study. “Our results make clear that online activity stemming from live events can be driven by the most inconsequential, and unpredictable, incidents.”

 


Protecting Privacy and User Data


Who is peeking over your shoulder while you work, watch videos, learn, explore, and shop online? That’s what Blacklight, a tool developed by ITP alum Surya Mattu, helps internet users to understand. Plug in any website address and Blacklight will scan the site and reveal the specific user-tracking technologies the site is employing, providing crucial insight into who is accessing your data.

Mattu also published an investigation which revealed how common, free website-building tools offered by ad-tech companies—such as social media and comment plugins—lead to trackers loading on users’ browsers, often without the website operators’ knowledge or disclosure to users. The investigation found that although “website operators may agree to set cookies—small strings of text that identify you—from one outside company...they are not always aware that the code setting those cookies can also load dozens of other trackers along with them, like nesting dolls, each collecting user data.”
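
The “nesting dolls” effect can be pictured as a reachability problem: if you record which loaded resource triggered each further request, every site reachable from one embedded plugin is a site the page operator implicitly let in. This sketch illustrates that idea only and is not how Blacklight is implemented. The `load_graph` input (each initiator URL mapped to the URLs it loaded) is a hypothetical structure that would have to come from something like a browser’s network log, and `site_of` naively treats the last two hostname labels as the site, which is wrong for domains such as `bbc.co.uk`; a real tool would consult the Public Suffix List.

```python
from __future__ import annotations

from collections import deque
from urllib.parse import urlparse


def site_of(url: str) -> str:
    # Naive approximation: the last two hostname labels (misclassifies e.g. *.co.uk).
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def sites_loaded_by(embed_url: str, load_graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk: every site loaded directly or indirectly by embed_url."""
    seen_urls = {embed_url}
    queue = deque([embed_url])
    sites: set[str] = set()
    while queue:
        current = queue.popleft()
        for child in load_graph.get(current, []):
            if child not in seen_urls:
                seen_urls.add(child)
                sites.add(site_of(child))
                queue.append(child)
    return sites


def unexpected_sites(page_url: str, embed_url: str, load_graph: dict[str, list[str]]) -> set[str]:
    """Sites pulled in by the embed, other than the page itself and the embed's own company."""
    return sites_loaded_by(embed_url, load_graph) - {site_of(page_url), site_of(embed_url)}


if __name__ == "__main__":
    # Hypothetical request chain: one social plugin quietly loads two more trackers.
    graph = {
        "https://widgets.example-social.com/plugin.js": [
            "https://cdn.tracker-one.net/t.js",
            "https://widgets.example-social.com/style.css",
        ],
        "https://cdn.tracker-one.net/t.js": ["https://pixel.tracker-two.io/p.gif"],
    }
    print(sorted(unexpected_sites("https://news.example.org/story",
                                  "https://widgets.example-social.com/plugin.js", graph)))
    # ['tracker-one.net', 'tracker-two.io']
```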

Blacklight scanned more than 80,000 of the world’s most popular websites and found more than 5,000 that were identifying users even if they had blocked third-party cookies. It also found more than 12,000 websites using ‘session recording’ to track every user interaction on a page. Some even used ‘key logging’ to capture personal information that users typed into website forms.

"Ad tracking is pervasive on the internet but it is hard to know how it affects us personally. Our goal with Blacklight is to allow people to see how the ad tracking ecosystem operates through the lens of their own browsing habits," said Mattu.  


Improving the Transparency of Online Political Advertising

 

As tech companies introduce new measures to limit the spread of election misinformation and safeguard their platforms, questions about the nature of online political advertising remain: Who exactly is running political Facebook ads in the United States? And what objectives are those groups trying to achieve with their ad spend? A group of researchers is working to shed light on the answers to these questions and more through the NYU Ad Observatory, a non-partisan, independent project which operates from NYU's Tandon School of Engineering as part of the NYU Online Political Ads Transparency Project.


Trump vs. Biden: Facebook political ad spending by week. Source: Facebook Ad Library

The Ad Observatory is corralling data and developing interactive tools to provide crucial insight into how candidates, super PACs, and dark money groups are using Facebook political ads in the 2020 election to achieve specific political goals. For example, the researchers analyzed data from the Facebook Ad Library and found that since July 1, 2020, Donald Trump's campaign has spent $88.2 million on Facebook advertising, compared with Joe Biden's $70 million; the primary objective for both campaigns was to encourage Facebook users to donate.
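
A weekly breakdown like the Trump vs. Biden spending chart comes down to grouping spend records by advertiser and week. This sketch assumes each record carries a date and a single dollar figure; the `AdSpend` record and its field names are hypothetical, and real Facebook Ad Library data may be shaped differently (for example, reporting spend as a range rather than one number). Weeks are taken to start on Monday.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AdSpend:
    """One day's spend by one advertiser (hypothetical record shape)."""
    advertiser: str
    day: date
    usd: float


def week_start(d: date) -> date:
    # date.weekday() is 0 for Monday, so this rolls back to that week's Monday.
    return d - timedelta(days=d.weekday())


def weekly_totals(records: list[AdSpend], since: date) -> dict[tuple[str, date], float]:
    """Sum spend per (advertiser, week) for records on or after `since`."""
    totals: dict[tuple[str, date], float] = defaultdict(float)
    for r in records:
        if r.day >= since:
            totals[(r.advertiser, week_start(r.day))] += r.usd
    return dict(totals)


def total_since(records: list[AdSpend], advertiser: str, since: date) -> float:
    """Cumulative spend for one advertiser on or after `since`."""
    return sum(r.usd for r in records if r.advertiser == advertiser and r.day >= since)
```

Because July 1, 2020 fell on a Wednesday, a cutoff on that date makes the first weekly bucket cover only part of a week; a chart built this way should either label that bucket as partial or start the cutoff on a Monday.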

The project aims to make online political advertising more transparent by archiving political ad data and making it accessible, so that journalists and researchers can better analyze digital campaigns. It also supports other research at NYU, including a cybersecurity analysis of vulnerabilities in the Facebook Ad Library.

"This transparency project emerged somewhat opportunistically. Facebook in particular, but also Google and Twitter, had pressure applied from the US government and other governments around the world to become more transparent and work more with outside parties to combat election interference and disinformation around political advertising," said Damon McCoy, assistant professor of Computer Science and Engineering at NYU Tandon, in a 2019 interview.

"Ideally, this will be effective in training the younger generation to realize it's their responsibility to be able to pick out disinformation campaigns—and if they like them and reshare them, they're potentially harming the society in which they live."