The first-ever Irish language newspaper to be printed anywhere in the world was born not in Ireland, where spoken Irish had declined under British colonial rule, but—like so many expressions of diasporic pride—right here in linguistically diverse New York City. Some 800 languages are spoken in the five boroughs today, with some neighborhoods—especially in Brooklyn and Queens—housing rare enclaves of endangered languages that aren’t spoken much anymore even back in their home communities.
Already a haven for minority languages by the late 19th century, New York was home to an estimated 200,000 to 250,000 native Irish speakers in 1881, when the Irish paper An Gaodhal got its start. Its creator, Micheál Ó Lócháin, a former newspaper reporter from County Galway who found work in the city as a school teacher and then a real-estate agent, was among the 1.5 million children and adults who had left Ireland for a new start in America in waves of migration following the Great Famine of the 1840s and ’50s.
Whereas previous Irish-American newspapers beginning in the mid-19th century may have included a few lines of Irish here or there, An Gaodhal was the first fully bilingual attempt, with full sections of Irish text rendered in Gaelic type (known as the sean-chló) interspersed with different content in English. Often working out of his home at 267 Kosciuszko Street in Bedford-Stuyvesant, Ó Lócháin, a passionate cheerleader for his mother tongue, published the paper up until his death in 1899.
Today, issues of An Gaodhal survive in only about 20 libraries worldwide, with one near-complete run housed in the Hardiman Library at the University of Galway. For decades it has remained what researchers call a “hidden collection”: a valuable repository (in this case, of Irish songs, poetry, and folklore, as well as local Brooklyn history) that is inaccessible to most scholars worldwide.
Nicholas Wolf, a data services and research data management librarian at the NYU Libraries and affiliated Glucksman Ireland House faculty member, is working to change that. Wolf became interested in the history of Irish speakers, an understudied subject, during his graduate research on Ireland’s 19th-century social and cultural history. That work led to his first book, An Irish-Speaking Island (2014), and to 15 years of studying the topic. With Galway-based NYU research scholar Deirdre Ní Chonghaile (a native Irish speaker) and colleagues, he is leading an effort to build a fully digital edition of the newspaper with searchable text and metadata.
In the first phase of the project, supported with Irish government funds and by Glucksman Ireland House, the team created digital images of all of the newspaper’s pages. Now, with funding from the Robert David Lion Gardiner Foundation, the Irish Institute of New York, Glucksman Ireland House, and the University of Galway, they’re embarking on a new challenge: training machines to read the Gaelic type, an AI feat that would make the entire digital edition easily searchable.
“When you have a media object like this in a minority language—something spoken by a small number of people in the world, as Irish is now—digitization isn’t always going to be at the top of the priority list for vendors, and libraries, because the readership today, even among scholars, is going to be smaller,” Wolf says. That’s why the dedicated investment is crucial: Without it, this rare trove of Irish Americana risks being overlooked and forgotten. But once the digital transcriptions are complete, the files can potentially be integrated into existing research databases and accessed from anywhere.
NYU News spoke with Wolf about Irish New York, methods for teaching 21st-century computers to read historic fonts, and librarians’ passion for data preservation.
What was the purpose of An Gaodhal, and who was it trying to reach?
It’s part of what are often called the “ethnic newspapers” of the time. There were a lot of them, in languages other than English. The notion was “we have our community here in America, and we want to talk to that community.” So an Irish-American community audience is first and foremost what Ó Lócháin was envisioning. But there had been other Irish-American newspapers before, and this one is actually even more specialized within that. It was really interested in reaching a subset of that community who were interested in perpetuating or preserving the Irish language for the long term. This idea was just picking up when the newspaper was founded, in the 1880s—there was a growing community of people going to Irish language classes at night, or founding Irish language societies with monthly meetings in places like Boston and New York and New Jersey. So the attendees at those types of things are who the paper is really trying to reach. Ó Lócháin himself taught an Irish language class around this time. He was a bit of a fanatic about it.
What is Irish type? And why does it make digitization more difficult?
The closest analogy that you might be familiar with would be the old Gothic fonts used in printing in German up until the early 20th century. In the case of Irish, the sean-chló, as it’s called, or the old type that was used, was derived from the manuscripts that Irish was written in before that, and so the roots are in handwriting. There’s some correspondence between the characters in these fonts and what you would see in contemporary Irish, so it’s not as though a completely different system was used. But there are some aspects of the Irish language that don’t correspond directly with a typical Romance language system or English language system, and those sounds are conveyed through characters that aren’t present in typical contemporary Latin-character based alphabets. And so to do that in print, writers in Irish had to make slight adjustments to Latin-alphabet based characters to convey the additional letters needed for Irish. It’s not impossible to train a machine to learn this, but it does take a little more work.
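One widely documented example of such an adjustment is the dot placed over a consonant in the old type, which modern Irish spelling writes as the same consonant followed by "h". The Python sketch below shows how a transcription might be converted to modern spelling so that present-day search terms match. It assumes the transcriptions store dotted letters as standard Unicode characters. The function name `modernize` and the handling of the Tironian "et" sign are illustrative choices, not a description of the project's actual pipeline.

```python
import unicodedata

# Assumption: once decomposed, a dotted consonant is a base letter followed by
# the combining dot above (U+0307). Modern spelling uses letter + "h" instead.
LENITABLE = set("bcdfgmpstBCDFGMPST")
DOT_ABOVE = "\u0307"
TIRONIAN_ET = "\u204a"  # shorthand sign commonly used for "agus" (and)


def modernize(text: str) -> str:
    """Rewrite dotted consonants as consonant + 'h' and expand the Tironian et."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if ch == DOT_ABOVE and out and out[-1] in LENITABLE:
            out.append("h")          # dot over consonant becomes a following h
        elif ch == TIRONIAN_ET:
            out.append("agus")
        else:
            out.append(ch)           # accented vowels pass through unchanged
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(modernize("An Gaoḋal"))
    print(modernize("mo ṁáṫair"))
```

Keeping the original dotted transcription alongside the modernized form would let readers search either spelling without losing what was actually printed.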
So how do you train computers to read this font?
We have to produce the training data, which is where the extra work comes in. So we’ll look at a page and transcribe it. Once we have about 70 pages transcribed, we can integrate that into an optical character recognition machine learning process, and try to gauge its success in reading pages it hasn’t seen before. What typically happens is that about 2 to 5% of the transcribed pages are held back, and at the end of every cycle the computer will try to adjust its knowledge or ability to recognize the characters by checking against our validation set. The bottom line is we have to do those transcriptions to give it what is called a “ground truth” to work from—otherwise it would have no way of knowing if it was getting the answer right. Printed text recognition is one of the oldest machine learning challenges, so this is a typical process that’s been around quite a while. But it has not been done extensively with this particular font for this language compared to, say, the English language.
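The hold-back step Wolf describes can be sketched in a few lines of Python. This is a minimal illustration, not the team's code: the function names, the page identifiers, and the 5% default are hypothetical, and character error rate is used here only as one common way to score a model's output against ground truth, since the interview does not say which metric the project uses.

```python
import random


def split_pages(page_ids, held_back=0.05, seed=0):
    """Shuffle transcribed pages and hold a small share back for validation."""
    ids = list(page_ids)
    random.Random(seed).shuffle(ids)           # fixed seed keeps the split repeatable
    n_val = max(1, round(len(ids) * held_back))
    return ids[n_val:], ids[:n_val]            # (training pages, validation pages)


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute or match
        prev = cur
    return prev[-1]


def character_error_rate(predicted, ground_truth):
    """Share of ground-truth characters the model got wrong (lower is better)."""
    if not ground_truth:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical page names standing in for the roughly 70 transcribed pages.
    pages = [f"page_{n:03d}" for n in range(1, 71)]
    train, validation = split_pages(pages, held_back=0.05)
    print(len(train), "training pages;", len(validation), "held back")
```

After each training cycle, the model's reading of the held-back pages would be compared with the human transcription, and a falling error rate on those unseen pages would suggest the model is learning the type rather than memorizing the training pages.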
Who will benefit from being able to access the digital, searchable An Gaodhal files, once the project is complete?
There are definitely a few categories of folks who would benefit. On the one hand, here’s a newspaper that gives us a window into the history of the Irish language, Irish song, or Irish folklore. That’s the media history component for academics—we’re trying to understand the history of certain minority language media. And then linguists might want to look up a certain word in Irish and see all the historical contexts in which it appears, which can be more precise than a modern dictionary. But certainly this could also be useful to the general community who might be looking into genealogy, or even the local history of New York City. If you look up an address in New York, and it happens to be mentioned in this newspaper, you get a sense of what was happening at that location, whether that’s a place where a Celtic society met or the address of a person who sent in a snippet of an Irish language poem. And for genealogy, someone could search for a great-grandparent and see if the name comes up, giving a sense of where they were at a given point in time and their touchpoints with the Irish immigrant community.
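The kind of lookup described here, finding every place a word, name, or address appears along with its surrounding text, is often called a keyword-in-context search. The Python sketch below shows the idea under simple assumptions: the transcribed pages are held in a dictionary keyed by a page identifier, and both the identifiers and the sample text are invented for illustration. A real edition would more likely sit behind a search index or research database.

```python
import re


def keyword_in_context(pages, term, width=30):
    """Return (page_id, snippet) for each whole-word match of term in pages."""
    # Lookarounds stop "Gaeilge" from matching inside a longer word.
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    hits = []
    for page_id, text in pages.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((page_id, text[start:end].replace("\n", " ")))
    return hits


if __name__ == "__main__":
    # Hypothetical sample text; not taken from the newspaper.
    sample = {
        "issue_A_p1": "The society met on Tuesday at 267 Kosciuszko Street.",
        "issue_B_p3": "Classes in Gaeilge resume next month.",
    }
    for page_id, snippet in keyword_in_context(sample, "267 Kosciuszko Street"):
        print(page_id, "...", snippet)
```

The same function serves the linguist looking up a word, the genealogist searching a surname, and the local historian searching an address, because each case is a text match returned with its context.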