Nature of Malayalam spelling mistakes

Malayalam uses an alphasyllabary writing system. Each letter you write is the grapheme representation of a phoneme, so in a broad sense Malayalam can be considered a language with one-to-one grapheme-to-phoneme correspondence. In English and similar languages, by contrast, a letter might represent a variety of sounds, and the same sound can be written in different ways. The way a person learns to write a language depends strongly on its writing system.

In Malayalam, since there is one and only one set of characters that can correspond to a syllable, this kind of confusion between letters does not happen. For example, in English, Education, Ship, Machine and Mission all have the sh sound [ʃ], so a person can mix up these spellings. But in Malayalam, if it is the sh sound [ʃ], then it is always ഷ.

Because of this, classifying spelling mistakes by the four edit operations (deletion, insertion, substitution, or transposition) may not be an accurate way to classify errors in Malayalam. Let us try to classify and analyse the spelling mistake patterns of Malayalam.
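To make the limitation of the four edit operations concrete, here is a minimal sketch in Python. It computes the optimal string alignment distance (insertion, deletion, substitution, adjacent transposition) over Unicode code points for a few misspelling pairs of the kind classified in this section. The function name `osa_distance` and the choice to count code points rather than syllables are assumptions made for illustration, not part of any particular spellchecker.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance over Unicode code points.

    Allowed operations: insertion, deletion, substitution and
    transposition of two adjacent code points.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent code points.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Each pair is a single conceptual mistake for the writer.
PAIRS = [
    ("ബൂമി", "ഭൂമി"),            # unaspirated for aspirated
    ("നീലതാമര", "നീലത്താമര"),    # skipped gemination
    ("ഹ്രുദയം", "ഹൃദയം"),        # ്ര + ു written for the ഋ sign
    ("ആദരാജ്ഞലി", "ആദരാഞ്ജലി"),  # ജ്ഞ and ഞ്ജ swapped
]

for wrong, right in PAIRS:
    print(wrong, right, osa_distance(wrong, right))
```

Under these assumptions the four pairs get distances 1, 2, 3 and 2, even though each is one mistake from the writer's point of view. The swap of ജ്ഞ and ഞ്ജ is not even counted as a transposition, because the virama sits between the two consonants.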

  1. Phonetic approximation: The 1:1 grapheme to phoneme correspondence holds in theory, but because of it, any inaccurate utterance of a syllable leads directly to an incorrect spelling. For example, ബൂമി is a relaxed way of reading ഭൂമി, since it takes relatively less effort. Because the relaxed pronunciation is normal, people sometimes suspect that they are writing a word wrongly and try to correct it unnecessarily; പീഢനം->പീഡനം is one such example.
     - Consonants: Consonants in Malayalam come in aspirated, unaspirated, voiced and unvoiced variants, and it is very common to mix them up.
       - Aspirated and unaspirated mix-up: An aspirated consonant can be mistakenly written as an unaspirated consonant. Examples: ധ -> ദ, ഢ -> ഡ. Similarly, an unaspirated consonant can be mistakenly written as an aspirated consonant. Examples: ദ -> ധ, ഡ -> ഢ.
       - Voiced and voiceless mix-up: Voiced consonants like ഗ, ഘ can be mistakenly written as the voiceless forms ക, ഖ, and vice versa.
       - Gemination of consonants is often relaxed or skipped in speech, hence the same appears in writing too. Gemination in the Malayalam script is written by combining two consonants using the virama. നീലതാമര/നീലത്താമര is an example of this kind of mistake. There are a few debatable words too, like സ്വർണം/സ്വർണ്ണം and പാർടി/പാർട്ടി. Another way of indicating consonant stress is Unaspirated Consonant + Virama + Aspirated Consonant. The pairs അദ്ധ്യാപകൻ/അധ്യാപകൻ, തീർഥം/തീർത്ഥം and വിഡ്ഡി/വിഡ്ഢി are examples.
       - Hard and soft variant confusion. Examples: ശ/ഷ, ര/റ, ല/ള.
     - Vowels: Vowel elongation or shortening, gliding vowels and semivowels are the causes of vowel-related mistakes in writing.
       - Each vowel in Malayalam can be a short vowel or a long vowel, and local dialect can lead people to use one for the other. ചിലപ്പൊൾ/ചിലപ്പോൾ is one example. Since many input tools place the short and long vowel forms on very close keystrokes, typing can also cause these errors. In the Inscript keyboard, the short and long vowels are in the normal and shift positions of the same key. In transliteration-based input methods, a long vowel is often typed by repeating a key (i, ii for ി, ീ).
       - The vowel ഋ is close to റി or റു in pronunciation. Example: ഋതു/റിതു. The vowel sign of ഋ, when it appears with a consonant, is close to ്ര. Examples: ഗൃഹം/ഗ്രഹം, ഹൃദയം/ഹ്രുദയം.
       - The gliding vowels ഐ, ഔ get confused with their constituent vowels. കൈ/കഇ/കയ് and ഔ/അഉ/അവ് are examples.
       - In Malayalam, there is a tendency to use എ instead of ഇ because of the reduced effort. Examples: ചിലവ്/ചെലവ്, ഇല/എല, തിരയുക/തെരയുക. Because these variants are so widely used, it is sometimes very difficult to say that one form is wrong. See the discussion about the ‘Standard Malayalam’ at the end of this essay.
     - Chillus: Chillus are pure consonants. A consonant + virama sequence sometimes has no phonetic difference from a chillu. Examples are the combinations കല്പന/കൽപന and നിൽക്കുക/നില്ക്കുക. The chillu ർ is sometimes confused with the ഋ sign, as in പ്രവർത്തി/പ്രവൃത്തി. The chillu form of മ, written ം, can appear either as the anuswara or as ma + virama. Examples: പംപ, പമ്പ. But it is not rare to see പംമ്പ for this. Sometimes the anuswara gets confused with ന്, and പമ്പ becomes പന്പ. There were also a few buggy fonts that used ന്+പ for the മ്പ ligature.

  2. Weak phoneme-grapheme correspondence: Because of the historic or evolutionary nature of the script, Malayalam also has some phonemes that have only a weak relationship with their graphemes.
     - ഹ്മ/മ്മ as in ബ്രഹ്മം/ബ്രമ്മം, ന്ദ/ന്ന as in നന്ദി/നന്നി, and ഹ്ന/ന്ന as in ചിഹ്നം/ചിന്നം are some examples where what you pronounce is not exactly the same as what you write.
     - റ്റ, ന്റ: These two heavily used conjuncts deviate strongly from their constituent letters and their pronunciation. When writing with a pen, people don't make many mistakes, since they just draw the shape of these ligatures. When typing, however, one needs to know the exact key sequence, and people get confused. Common mistakes for these conjuncts are ററ, ൻറ, ൻറ്റ, ൻററ.

  3. Visual similarity: With visual input methods, such as handwriting-based tools or some onscreen keyboards, either the user or the input tool makes mistakes because of visual similarity.
     - ൃ and ്യ often get confused.
     - ജ്ഞ and ഞ്ജ form one very common pair where people get confused: ആദരാജ്ഞലി/ആദരാഞ്ജലി.
     - ത്സ and ഝ are another such combination.
     - Handwriting-based input methods such as the Google handwriting tool are known for recognizing the anuswara ം as zero, English o, O etc.
     - When people don't know how to insert the visarga ഃ, they use the very similar colon key (:) on the keyboard instead. Example: ദുഃഖം/ദു:ഖം.
     - ള്ള, the geminated form of ള, looks very similar to two adjacent ള. This kind of mistake is very frequent among people who learned Malayalam input informally. Two adjacent റ is another such mistake, for റ്റ.
     - Informal, trial-and-error learning of Malayalam input has also introduced other mistakes, such as using an opening parenthesis ‘(’ for ്ര and a closing parenthesis ‘)’ for the ാ sign.

  4. Ambiguity due to regional dialect: A good example of this is the insertion of യ് in verbs, as in കുറക്കുക/കുറയ്ക്കുക and ചിരിക്കുക/ചിരിയ്ക്കുക. It also happens in nominal inflections: പൂച്ചയ്ക്ക്/പൂച്ചക്ക്. The usage of Samvruthokaram to distinguish between a pure consonant and a stressed consonant at the end of a word is a highly debated topic. For example, അവന്/അവനു്/അവനു. All these forms are common, even though the usage of നു് has declined after the script reformation. Since the script reformation was not an absolute transformation, the older form still exists in usage.

  5. Spaces: Malayalam is an agglutinative language. Words can be agglutinated, but nothing prevents people from inserting spaces and writing them as simple words. This should be done carefully, though, since it can alter the meaning. An example is “ആന പുറത്തു കയറി”, “ആനപ്പുറത്തു കയറി”, “ആനപ്പുറത്തുകയറി”, “ആനപ്പുറത്ത് കയറി”. Another example: “മലയാള ഭാഷ”, “മലയാളഭാഷ”. Here, there is no valid word “മലയാള”. The anuswara at the end gets deleted only when the word joins with ഭാഷ as an adjective. A morphology analyser can correctly parse “മലയാളഭാഷ” as മലയാളം<proper-noun><adjective>ഭാഷ<noun>. But since the language has already broken this rule and many people use spaces liberally, a spellchecker would need to handle these cases.

  6. Slip of finger: Accidental insertion or omission of key presses is a common cause of spelling mistakes, and it is the type of error that spellcheckers for alphabetic languages mostly address. This kind of accidental slip can happen in Malayalam too. For Latin-based languages we know the QWERTY keyboard layout, so we can analyse and do optimized checks for such errors. Malayalam input adds another level of mapping on top of QWERTY (Inscript, phonetic, transliteration), so these errors are not easy to analyse. In general, we can expect random characters, or some missing characters, in the query word. An accidental space insertion poses an extra challenge: it splits the word into two, and if spellchecking is done one word at a time, we will miss it.
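Several of the categories above (phonetic approximation, weak phoneme-grapheme correspondence, visual similarity) amount to swapping one known sequence for another. As a minimal sketch, assuming a hand-written table of confusable pairs and a word list to check against, candidate corrections could be generated like this. The table `CONFUSABLES`, the functions `variants` and `suggest`, and the tiny `lexicon` are all hypothetical names and data for illustration; the table is drawn only from the examples in this section and is far from complete.

```python
# Illustrative confusable pairs taken from the examples in this section.
# Each pair is tried in both directions.
CONFUSABLES = [
    ("ധ", "ദ"), ("ഢ", "ഡ"), ("ഭ", "ബ"),  # aspirated / unaspirated
    ("ഗ", "ക"), ("ഘ", "ഖ"),              # voiced / voiceless
    ("ശ", "ഷ"), ("ര", "റ"), ("ല", "ള"),  # hard / soft variants
    ("ൊ", "ോ"), ("ി", "ീ"),              # short / long vowel signs
    ("ൃ", "്രു"),                         # ഋ sign written as ്ര + ു
    ("ജ്ഞ", "ഞ്ജ"),                       # visually similar conjuncts
    ("ററ", "റ്റ"), ("ളള", "ള്ള"),          # adjacent letters for a conjunct
    (":", "ഃ"),                           # colon typed for visarga
]


def variants(word: str) -> set[str]:
    """Return every string obtained by one confusable replacement."""
    out = set()
    for a, b in CONFUSABLES:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                out.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    out.discard(word)
    return out


def suggest(word: str, lexicon: set[str]) -> list[str]:
    """Keep only the variants that are present in the word list."""
    return sorted(v for v in variants(word) if v in lexicon)


# A toy word list; a real one would come from a dictionary or a
# morphology analyser.
lexicon = {"ഭൂമി", "ഹൃദയം", "ദുഃഖം", "പീഡനം", "ആദരാഞ്ജലി"}

for query in ["ബൂമി", "ഹ്രുദയം", "ദു:ഖം", "പീഢനം", "ആദരാജ്ഞലി"]:
    print(query, suggest(query, lexicon))
```

With this toy word list, each query yields exactly one suggestion. The sketch only applies one replacement at a time, and it does not cover gemination, chillu/virama alternation, or space errors, which would need their own rules.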

I must add that the above classification is not based on a systematic study of any test data that I can share. Ideally, this classification should be done with a real sample of Malayalam written on paper and on computer. The sample should then be manually checked for spelling mistakes, the mistakes listed, and the patterns analysed. This exercise would be very beneficial for spellcheck research. In my case, ever since I released my word-list-based spellchecker, noticing spelling errors on the internet (mainly social media) has been my obsession. Sometimes I also tried to point out spelling mistakes to the authors, and that was not a very pleasant experience for me. The above list is based on the patterns I observed this way.

The common Malayalam spelling mistakes and confusables were presented in great depth by the renowned linguist and author Panmana Ramachandran Nair in his books ‘തെറ്റില്ലാത്ത മലയാളം’, ‘തെറ്റും ശരിയും’, ‘ശുദ്ധ മലയാളം’ and ‘നല്ല മലയാളം’.
