{"id":70942,"date":"2024-05-17T10:39:56","date_gmt":"2024-05-17T14:39:56","guid":{"rendered":"https:\/\/news.samsung.com\/us\/?p=70942"},"modified":"2024-06-13T14:53:16","modified_gmt":"2024-06-13T18:53:16","slug":"the-learning-curve-part-2-how-to-build-an-ai-for-diverse-dialects","status":"publish","type":"post","link":"https:\/\/news.samsung.com\/us\/how-to-build-ai-for-diverse-arabic-dialects-samsung-learning-curve-part-2\/","title":{"rendered":"The Learning Curve, Part 2: How to Build an AI for Diverse Dialects"},"content":{"rendered":"<p><a href=\"https:\/\/www.samsung.com\/us\/galaxy-ai\/\" target=\"_blank\" rel=\"noopener\">Galaxy AI<\/a><sup><a href=\"#_ftn1\" name=\"_ftnref1\">1<\/a><\/sup> now supports <a href=\"https:\/\/news.samsung.com\/us\/samsung-galaxy-ai-now-supports-more-languages-latest-update\/\" target=\"_blank\" rel=\"noopener\">16 languages<\/a>, helping more people break down language barriers with real-time and on-device translation. Samsung opened the door to a new era of mobile AI, so we are <a href=\"https:\/\/news.samsung.com\/us\/tag\/the-learning-curve\/\" target=\"_blank\" rel=\"noopener\">visiting Samsung Research centers all over the world<\/a> to learn how <a href=\"https:\/\/www.samsung.com\/us\/galaxy-ai\/\" target=\"_blank\" rel=\"noopener\">Galaxy AI<\/a> came to life and what it took to overcome the challenges of <a href=\"https:\/\/news.samsung.com\/us\/tag\/artificial-intelligence-ai\/\" target=\"_blank\" rel=\"noopener\">AI development<\/a>. While part one of the series examined how to determine what data is needed, this installment looks at the complex task of accounting for dialects.<\/p>\n<p>Teaching a language to an AI model is a complex process, but what if that language is not a single, uniform tongue but a collection of diverse dialects? That was the challenge faced by the team at Samsung R&amp;D Institute Jordan (SRJO). 
In bringing Arabic to Galaxy AI features such as <a href=\"https:\/\/www.samsung.com\/us\/galaxy-ai\/#features\" target=\"_blank\" rel=\"noopener\">Live Translate<\/a><sup><a href=\"#_ftn2\" name=\"_ftnref2\">2<\/a><\/sup>, the team had to cater to the many dialects spoken across the Middle East, each varying in pronunciation, vocabulary, and grammar.<\/p>\n<p>Arabic is one of the six most widely spoken languages in the world, used daily by more than 400 million people<sup><a href=\"#_ftn3\" name=\"_ftnref3\">3<\/a><\/sup>. The language takes two main forms: Fus&#8217;ha (Modern Standard Arabic) and Ammiya (the dialects of Arabic). Fus&#8217;ha is typically used at public and official events and in news broadcasts, while Ammiya is the language of everyday conversation. Arabic is used in more than 20 countries, and around 30 dialects are currently spoken across the region.<\/p>\n<h5>Unwritten Rules<\/h5>\n<p>Recognizing how much these dialects vary, the team at SRJO employed a range of techniques to identify and process the linguistic features unique to each. This approach was crucial to ensuring that Galaxy AI could understand and respond in a way that accurately reflects regional nuances.<\/p>\n<p>\u201cUnlike other languages, the pronunciation of the object in Arabic varies depending on the subject and verb in the sentence,\u201d says Mohammad Hamdan, project leader of the Arabic language development team. 
\u201cOur goal is to develop a model that understands all these dialects and can answer in standard Arabic.\u201d<\/p>\n<p>Text-to-Speech (TTS) is a key component of Galaxy AI\u2019s Live Translate feature, which lets users interact with speakers of different languages by <a href=\"https:\/\/news.samsung.com\/us\/samsung-new-era-galaxy-ai-coming-here-is-glimpse\" target=\"_blank\" rel=\"noopener\">translating spoken words into written text<\/a> and then reading the translation aloud. TTS handles that final, spoken step, and the TTS team faced a challenge stemming from a particular quirk of Arabic.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-70945\" src=\"https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15161546\/samsung-jordan-research-srjo-team.jpg\" alt=\"Samsung R&amp;D Institute Jordan (SRJO) team members standing on steps\" width=\"582\" height=\"389\" srcset=\"https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15161546\/samsung-jordan-research-srjo-team.jpg 582w, https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15161546\/samsung-jordan-research-srjo-team-268x178.jpg 268w, https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15161546\/samsung-jordan-research-srjo-team-437x292.jpg 437w\" sizes=\"auto, (max-width: 582px) 100vw, 582px\" \/><\/p>\n<p>Arabic uses diacritics, marks that guide the pronunciation of words, but they typically appear only in certain contexts, such as religious texts, poetry and books for language learners. Native speakers understand diacritics but can read without them, so the marks are usually left out of everyday writing. This makes it difficult for a machine to convert raw text into phonemes, the basic units of sound that are the building blocks of speech.<\/p>\n<p>\u201cThere is a shortage of high-quality and reliable datasets that accurately represent how diacritics are correctly used,\u201d explains Mohammad Haweeleh, Arabic TTS lead. 
\u201cWe had to design a neural model that can predict and restore those missing diacritics with high accuracy.\u201d<\/p>\n<p>Neural models are loosely inspired by the human brain and learn patterns from large numbers of examples. To predict diacritics, a model needs to study lots of Arabic text, learn the language\u2019s rules and understand how words are used in different contexts. For instance, the pronunciation of a word can vary greatly depending on the action or gender it describes. Extensive training by the team was key to improving the Arabic TTS model\u2019s accuracy.<\/p>\n<h5>Enhancing Understanding<\/h5>\n<p>The SRJO team also had to collect diverse audio recordings of the dialects from various sources and have them transcribed, with particular attention to unique sounds, words and phrases. \u201cWe assembled a team of native speakers in the dialects who were well-versed in the nuances and variations,\u201d says Ayah Hasan, whose team was responsible for database creation. \u201cThey listened to the recordings and manually converted the spoken words into text.\u201d<\/p>\n<p>This work was crucial for enhancing the Automatic Speech Recognition (ASR) process so that <a href=\"https:\/\/www.samsung.com\/us\/galaxy-ai\/\" target=\"_blank\" rel=\"noopener\">Galaxy AI<\/a> could handle the wide variety of Arabic dialects. 
ASR, which converts speech into text, is pivotal to Galaxy AI\u2019s ability to understand and respond in real time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-70941\" src=\"https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15155941\/samsung-mohammad-haweeleh.jpg\" alt=\"Mohammad Haweeleh, Arabic TTS Lead and team\" width=\"592\" height=\"396\" srcset=\"https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15155941\/samsung-mohammad-haweeleh.jpg 592w, https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15155941\/samsung-mohammad-haweeleh-268x178.jpg 268w, https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/15155941\/samsung-mohammad-haweeleh-437x292.jpg 437w\" sizes=\"auto, (max-width: 592px) 100vw, 592px\" \/><\/p>\n<p>\u201cBuilding an ASR system that supports multiple dialects in a single model is a complex undertaking,\u201d says Hamdan, who also served as ASR lead for the project. \u201cIt demands a thorough understanding of the language\u2019s intricacies, careful data selection and advanced modeling techniques.\u201d<\/p>\n<h5>The Culmination of Innovation<\/h5>\n<p>After months of planning, building and testing, the team was ready to release Arabic as a language option for <a href=\"https:\/\/www.samsung.com\/us\/galaxy-ai\/\" target=\"_blank\" rel=\"noopener\">Galaxy AI<\/a>, enabling many more people to communicate across borders. A single team made Galaxy AI services accessible to Arabic speakers, lowering the language and cultural barriers between them and people all over the world. In doing so, they have established new best practices that can be rolled out globally. 
This success is only the beginning: the team continues to refine their models and enhance the quality of Galaxy AI\u2019s language capabilities.<\/p>\n<h5>How to Activate Arabic Live Translate<\/h5>\n<p><iframe loading=\"lazy\" title=\"How to activate Arabic Live Translate | Samsung\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/KOU1HXipelo?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>Arabic is just one of the languages and dialects newly supported by <a href=\"https:\/\/www.samsung.com\/us\/galaxy-ai\/\" target=\"_blank\" rel=\"noopener\">Galaxy AI<\/a> that are available for download from the Settings app. Galaxy AI\u2019s language features, such as Live Translate and Interpreter, are available on <a href=\"https:\/\/www.samsung.com\/us\/mobile\" target=\"_blank\" rel=\"noopener\">Galaxy devices<\/a> running <a href=\"https:\/\/news.samsung.com\/us\/millions-have-tried-samsung-galaxy-ai-now-available-to-even-more-users\/\" target=\"_blank\" rel=\"noopener\">Samsung\u2019s One UI 6.1 update<\/a>.<sup><a href=\"#_ftn4\" name=\"_ftnref4\">4<\/a><\/sup><\/p>\n<p>In the next episode, we go to Vietnam to see how the team there improves the quality of language data. 
Plus, what does it take to train an effective AI model?<\/p>\n\n\t\t<\/div>\n\t\t<\/div>\n\t\t<div class=\"article-content\">\n\t\t<div class=\"article-body\">\n<h6><a href=\"#_ftnref1\" name=\"_ftn1\"><sup>1<\/sup><\/a> Galaxy AI features by Samsung will be provided for free until the end of 2025 on supported Samsung Galaxy devices.<\/h6>\n<h6><a href=\"#_ftnref2\" name=\"_ftn2\"><sup>2<\/sup><\/a> Samsung account log-in required. Calls must be made using the native Samsung phone app. 
Samsung does not make any promises, assurances or guarantees as to the accuracy, completeness or reliability of the output provided by AI features.<\/h6>\n<h6><a href=\"#_ftnref3\" name=\"_ftn3\"><sup>3<\/sup><\/a> UNESCO, <a href=\"https:\/\/www.unesco.org\/en\/world-arabic-language-day\" target=\"_blank\" rel=\"noopener\">World Arabic Language Day 2023<\/a><\/h6>\n<h6><a href=\"#_ftnref4\" name=\"_ftn4\"><sup>4<\/sup><\/a> One UI 6.1 was first released on Galaxy S24 series devices, with a wider rollout to other Galaxy devices including the Galaxy S23 series, S23 FE, S22 series and S21 series.<\/h6>\n","protected":false},"excerpt":{"rendered":"<p>Tales from the Middle East on the complexity of creating AI tools for Arabic, a language with many facets<\/p>\n","protected":false},"author":84,"featured_media":70940,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[29720,29721],"tags":[953,25940,30339,16225,30338,40,30424,30425],"blue-badge":[],"class_list":["post-70942","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-product-mobile","category-product-mobile-smartphones","tag-artificial-intelligence-ai","tag-galaxy","tag-galaxy-ai","tag-innovation","tag-live-translate","tag-mobile","tag-samsung-rd","tag-the-learning-curve"],"acf":{"turn_off_retargeting":false},"fimg_mobile_url":"https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/16094858\/samsung-ayah-hasan-jordan-research-200x200.jpg","fimg_url":"https:\/\/img.us.news.samsung.com\/us\/wp-content\/uploads\/2024\/05\/16094858\/samsung-ayah-hasan-jordan-research-432x286.jpg","primary_category":{"term_id":29721,"name":"Smartphones","slug":"product-mobile-smartphones","term_group":0,"term_taxonomy_id":29721,"taxonomy":"category","description":"","parent":29720,"count":401,"filter":"raw","term_link":"https:\/\/news.samsung.com\/us\/category\/product\/product-mobile\/product-mobile-smartphones\/","term_path":"product\/product-mobile\/product-mobile-smartphones"},"badge":false,"_links":{"self":[{"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/posts\/70942","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/users\/84"}],"replies":[{"embeddable":true,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/comments?post=70942"}],"version-history":[{"count":0,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/posts\/70942\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/media\/70940"}],"wp:attachment":[{"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/media?parent=70942"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/categories?post=70942"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/tags?post=70942"},{"taxonomy":"blue-badge","embeddable":true,"href":"https:\/\/news.samsung.com\/us\/wp-json\/wp\/v2\/blue-badge?post=70942"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}