An App That Translates Voice and Sign Language

By Tatiana Polunina and Swapnil Sunil Bhatkar | May 29, 2018


Using NYU IT High Performance Computing’s Brooklyn Research Cluster to Build a Communication Bridge

A team of students from the NYU Tandon School of Engineering recently built a prototype mobile app that translates spoken words into sign language to facilitate communication between deaf and hearing people. The project, called ARSL (Augmented Reality Sign Language), is part of Verizon's Connected Futures challenge, which, in partnership with NYC Media Lab, supports new media and technology projects from universities across New York City.

The Tandon team is led by Zhongheng Li and includes Jacky Chen and Mingfei Huang. Li was inspired by his friend, Fanny, whose parents are deaf. Fanny's family found it difficult to communicate effectively after moving to the United States because there is no universal sign language. To help address this and foster a more unified way of communicating across abilities and languages, Li and his team created an app that aims to empower millions of deaf people across the globe.

To build the prototype, the team leveraged machine learning, augmented reality, computer vision, and cloud computing. They used NYU IT's Brooklyn Research Cluster (BRC), an OpenStack cluster, as the high-performance cloud computing platform to host their deep-learning application programming interface (API), which runs OpenPose alongside TensorFlow-trained image classification models in the cloud. Initially, they wanted to use "Depth Mode" camera features for better recognition, but they quickly realized that not everyone can afford a high-end smartphone with a depth camera.
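The cloud-hosted design above implies a simple contract between the phone and the recognition API: the client uploads a camera frame, and the server (running OpenPose plus a TensorFlow classifier) returns the recognized phrase. The sketch below illustrates what such a request/response exchange might look like; the field names, labels, and helper functions are illustrative assumptions, not the team's actual code.

```python
import base64
import json

# Hypothetical sketch of a client-server contract for a cloud-hosted
# recognition API: the phone packages a raw camera frame as JSON, and the
# server's reply carries the predicted phrase. All names are illustrative.

def build_request(frame_bytes):
    """Package a raw camera frame as a JSON request body."""
    return json.dumps({
        # Base64 keeps the binary frame safe inside a JSON string.
        "image": base64.b64encode(frame_bytes).decode("ascii"),
    })

def parse_response(body):
    """Extract the predicted phrase and its confidence from a JSON reply."""
    payload = json.loads(body)
    return payload["phrase"], payload["confidence"]

# Example round trip with a stand-in frame and a mocked server reply.
request_body = build_request(b"\x00" * 16)  # placeholder pixel bytes
mock_reply = json.dumps({"phrase": "book appointment", "confidence": 0.91})
phrase, confidence = parse_response(mock_reply)
```

Keeping the heavy models behind an API like this is what frees the client from any particular device or camera hardware.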

Instead, they converted red-green-blue (RGB) color images into skeleton images using the OpenPose library, improving accuracy while eliminating the need for a depth-capable camera. As a next step, they leveraged the power and flexibility of cloud computing to enhance their recognition model. By capturing images with ordinary RGB cameras and processing them in the cloud, they avoided depending on any particular device or platform, which allowed them to implement their framework on a variety of technologies, including HoloLens, Microsoft's wearable holographic headset.
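The skeleton-image step can be pictured as rendering the 2-D body keypoints a pose estimator like OpenPose produces onto a blank canvas, yielding a device-independent input for the classifier. The following minimal sketch assumes keypoints are already available; the limb pairs and function names are illustrative (a real OpenPose body model defines 25 keypoints and a fixed set of limb pairs).

```python
import numpy as np

# A few illustrative (joint_a, joint_b) index pairs forming limbs.
LIMB_PAIRS = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]

def render_skeleton(keypoints, size=64):
    """Draw limbs between keypoint pairs on a blank grayscale canvas.

    keypoints: list of (x, y) pixel coordinates, one per joint.
    Returns a (size, size) uint8 image with white skeleton lines.
    """
    canvas = np.zeros((size, size), dtype=np.uint8)
    for a, b in LIMB_PAIRS:
        (x0, y0), (x1, y1) = keypoints[a], keypoints[b]
        # Interpolate points along the limb segment and set them to white.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for t in np.linspace(0.0, 1.0, steps + 1):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            if 0 <= x < size and 0 <= y < size:
                canvas[y, x] = 255
    return canvas

# Example: six joints roughly tracing a head, torso, and two arms.
pose = [(32, 8), (32, 20), (20, 30), (12, 42), (44, 30), (52, 42)]
skeleton = render_skeleton(pose)
```

Because the classifier sees only these synthetic skeleton images, lighting, skin tone, clothing, and camera model all drop out of the problem.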

After exploring other options, such as Amazon Web Services Elastic Compute Cloud (EC2), the team chose the Brooklyn Research Cluster for this project given its high availability, lower cost, and the greater support and resources it offered. As full-time NYU students, they could also take advantage of the BRC's high-performance computing resources, including free-of-charge computing power and storage. The students were provided with three NVIDIA P100 graphics processing units (GPUs), along with the NYU IT HPC team's guidance and support on using these services effectively and detailed instructions on topics that arose during prototype development.

Still in its pilot phase, the app can detect and translate a limited number of spoken phrases. As a proof of concept, it enables a user to book an appointment with NYU Langone using sign-language interpretations of the spoken prompts. This demo video illustrates how the promising app works.