The backbone of the TTS remains the same across languages: it consists of a feature encoder, an aligner, a decoder module, and, for multi-speaker TTS, a speaker encoder. What differs is the text normalization and the input embedding model. For instance, English is an intonation language and contains homographs, so it requires a grapheme-to-phoneme model and punctuation marks to improve pronunciation. Chinese and Thai are tonal languages and therefore require tone marks. In Thai, the tone mark is written explicitly in the text, with the exception of “คำทับศัพท์” (loanwords transliterated from other languages). Thai is not a homograph language; thus, although the language is highly complex, a Thai syllable can be mapped one-to-one to its pronunciation, with some rare exceptions. The “ไม้ยมก” (the repetition mark “ๆ”) also affects pronunciation: the first utterance of the repeated word is spoken shorter than the second. There is a closed set of “คำควบกล้ำ ทร” (words containing the consonant cluster “ทร”) in which the cluster is pronounced as “ซ”; these can be preprocessed using pattern matching. “คำควบไม่แท้” (a false consonant cluster) is a combination of characters in which the following character is not pronounced. Finally, written Thai has no clear word boundaries within a sentence, which makes it difficult to determine where one word ends and the next begins.
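
As an illustration of the Thai-specific normalization rules described above, the following is a minimal sketch, not the actual front end. It assumes the sentence has already been split into words by an upstream word segmenter (not shown), since matching on raw text can misfire: for example, the string “โทรม” also occurs inside “โทรมา” (“โทร” + “มา”). The lexicon `TR_AS_S` is a small, incomplete, hand-written sample of the closed set, and all function names are hypothetical. The shorter first utterance produced by “ๆ” is not modeled here; the sketch only restores the repeated word.

```python
# Illustrative, incomplete sample of the closed set of words whose "ทร"
# cluster is read as "ซ", mapped to a pronunciation-oriented respelling.
# (Assumption: a real system would use a complete, curated list.)
TR_AS_S = {
    "ทราบ": "ซาบ",
    "ทราย": "ซาย",
    "ทรง": "ซง",
    "ทรุด": "ซุด",
    "โทรม": "โซม",
    "แทรก": "แซก",
}

MAI_YAMOK = "ๆ"  # Thai repetition mark (U+0E46)


def expand_mai_yamok(tokens):
    """Replace each repetition mark with a copy of the preceding token."""
    out = []
    for tok in tokens:
        if tok == MAI_YAMOK and out:
            out.append(out[-1])  # repeat the previous word
        else:
            out.append(tok)  # ordinary token, or a stray leading mark
    return out


def respell_tr_cluster(tokens):
    """Respell tokens found in the lexicon; leave all others unchanged."""
    return [TR_AS_S.get(tok, tok) for tok in tokens]


def normalize_tokens(tokens):
    """Apply both rules to a list of pre-segmented Thai words."""
    return respell_tr_cluster(expand_mai_yamok(tokens))


if __name__ == "__main__":
    print(normalize_tokens(["เด็ก", "ๆ", "ทราบ"]))
```

Working on whole tokens keeps the lookup exact, at the cost of depending on the quality of the word segmentation, which the missing word boundaries in written Thai make non-trivial.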