Teaching Computers ʻŌlelo Hawaiʻi Prompts Debate on Data Sovereignty
Efforts are underway to teach computers to understand ʻōlelo Hawaiʻi (the Hawaiian language). Artificial intelligence could be a game changer for expanding the use of Hawaiian. But some worry about tech companies gaining control over the language data involved, a concern they call “data sovereignty.”
Hawaiian language researchers have compiled around 400 audio recordings of mānaleo, or native speakers, from the 1970s and ’80s. But transcribing those recordings has been a labor-intensive task, says ʻŌiwi Parker Jones, a research fellow focusing on artificial intelligence at Oxford University.
“That’s a big job, isn’t it? The typing and the listening take many hours of human work to prepare,” Parker Jones says in ʻōlelo Hawaiʻi. “If a computer can do it, it’s as if the computer writes a rough first draft. It may not be completely accurate, but it helps speed up the process.”
He’s working on voice-to-text, or speech recognition, technology to improve access to these recordings.
He says the challenge with indigenous languages like Hawaiian, which are spoken by far fewer people than, say, English, is that developing this kind of technology requires tens of thousands of hours of transcribed audio, if not more.
“You need many hours of audio that has already been transcribed. Audio together with text: maybe several hundred hours are needed to get something good,” Parker Jones says. “In English, they use upwards of a hundred thousand hours.”
In other words, he could build a usable program with several hundred hours of audio, while comparable English systems are trained on hundreds of thousands of hours.
Keoni Mahelona, Chief Technology Officer for Te Hiku Media in Aotearoa (New Zealand), has been a vocal proponent of indigenous communities taking advantage of AI and machine learning technology.
“You can do all sorts of crazy stuff: natural language processing; you can look for words that maybe don’t show up often, idiomatic expressions; you can do automatic part-of-speech tagging,” says Mahelona. “There’s just boundless opportunities.”
Those include career opportunities, economic opportunities, and opportunities to connect with what it means to be Hawaiian, or Māori in the case of Te Hiku.
“What we’re trying to do is enhance access to te reo Māori, te ao Māori, the knowledge that’s embedded in the ʻōlelo,” says Mahelona. “The ʻōlelo itself was quite different. What we’re trying to do is to decolonize te reo Māori, decolonize the sound, decolonize the actual language, and decolonize the digital space.”
Te Hiku Media has positioned itself as a leader in indigenous AI, developing the first automatic speech recognition technology for te reo Māori, the indigenous language of Aotearoa.
One of the biggest concerns going forward is ensuring that indigenous communities, whether Māori, Hawaiian or otherwise, maintain control over the data used to create these programs, and that the benefits flow back to those communities. This concept is known as data sovereignty.
Parker Jones says he doesn’t want to simply give this data away so tech companies can profit. “We don’t want to hand it all over and fill their pockets. We want some of that for ourselves first,” he says in ʻōlelo Hawaiʻi. “We want Hawaiians to be hired for that work, so the money comes back to our community.”
HPR's Kuʻuwehi Hiraishi also shared this story on HPR's The Conversation on Sept. 15, 2021.