Syllable
A syllable is a unit of organization for a sequence of speech sounds. Each syllable can be considered a pronunciation unit, and the pronunciation of a word is built from these units. For example, "മലയാളം" has 4 syllables: മ, ല, യാ, ളം. If you ask a native Malayalam speaker, "How many letters are in the word മലയാളം?", the answer would be 4, which corresponds to the syllable count. The "letter" concept, known as "അക്ഷരം" in Malayalam, often refers to syllables.
Along with a verbal description of syllables in Malayalam, we attempt to formalize a grammar using PEG (Parsing Expression Grammar). That grammar is then used to write a parser that finds the syllables in a given word. A web interface is also provided to try out the system.
Before defining the syllable model, we need to define some terminology.
Definitions
Vowel
Vowels of Malayalam. Any of the set [അആഇഈഉഊഋഎഏഐഒഓഔഅം]
VowelSign
Vowel signs. Any of the set [ാിീുൂൃെേൈൊോൗ]
Consonant
Consonants. Any of the set [കഖഗഘങചഛജഝഞടഠഡഢണതഥദധനപഫബഭമയരലവശഷസഹളഴറ]
Virama
The sign ്.
Visarga
The sign ഃ.
Anuswara
The vowel sign of അം, i.e. ം. It shares some properties with Chillu.
Chillu
Pure consonants, without any vowels. Chillus are any of ൻ, ർ, ൽ, ൾ, ൺ, ൿ, ൔ, ൕ, ൖ. The last 4 chillus are rarely used or archaic, but we can consider them for our modeling. For historical encoding reasons, chillus can also appear in the form base Consonant + Virama + ZWJ. That means ൻ = ന + ് + ZWJ. Chillus never appear at the beginning of a word, but that is not relevant for a syllable analyser.
ZWNJ
Zero Width Non Joiner, \u200C.
ZWJ
Zero Width Joiner, \u200D.
Signs
A term used to address the various signs that modify a Consonant. Any of VowelSign, Virama, Anuswara, Visarga.
Conjunct
Refer to the formal definition we discussed in the previous blog post. We defined it as a Consonant combined with another Conjunct or Consonant using Virama. Example: സ + ് + ത = സ്ത, സ്ത + ് + ര = സ്ത്ര, ദ്ധ + ് + ര = ദ്ധ്ര, ദ്ധ്ര + ് + യ = ദ്ധ്ര്യ. But we need an advanced version. That definition did not support DotReph (ൎ), which combines with a consonant or conjunct to form a Conjunct. To support DotReph as well, we will redefine Conjunct as HalfConsonant Conjunct / Consonant.
DotReph
The sign ൎ. It combines with other consonants as in this example: ൎ + യ = ൎയ in ഭാൎയ.
HalfConsonant
A Consonant followed by Virama (example: പ്, ര്, മ്), or a DotReph.
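The terminology above maps naturally onto Unicode code point ranges. The following is a minimal sketch, not the analyser's actual code: the names classify and normalizeLegacyChillu are hypothetical, and the exact ranges (in particular which vowel signs are included, and treating ക + Virama + ZWJ as a legacy chillu) are assumptions.

```javascript
// Character classes from the definitions, as Unicode ranges (assumed).
const CHARACTER_CLASSES = [
  ["Vowel", /^[\u0D05-\u0D0B\u0D0E-\u0D10\u0D12-\u0D14]$/],
  ["VowelSign", /^[\u0D3E-\u0D43\u0D46-\u0D48\u0D4A-\u0D4C\u0D57]$/],
  ["Consonant", /^[\u0D15-\u0D28\u0D2A-\u0D39]$/],
  ["Virama", /^\u0D4D$/],
  ["Anuswara", /^\u0D02$/],
  ["Visarga", /^\u0D03$/],
  ["DotReph", /^\u0D4E$/],
  ["Chillu", /^[\u0D7A-\u0D7F\u0D54-\u0D56]$/],
  ["ZWNJ", /^\u200C$/],
  ["ZWJ", /^\u200D$/],
];

// Return the class name of a single character, or "Other".
function classify(ch) {
  for (const [name, pattern] of CHARACTER_CLASSES) {
    if (pattern.test(ch)) return name;
  }
  return "Other";
}

// Legacy chillu base consonants mapped to atomic chillu code points.
const LEGACY_CHILLU = {
  "\u0D23": "\u0D7A", // NNA -> chillu NN
  "\u0D28": "\u0D7B", // NA  -> chillu N
  "\u0D30": "\u0D7C", // RA  -> chillu RR
  "\u0D32": "\u0D7D", // LA  -> chillu L
  "\u0D33": "\u0D7E", // LLA -> chillu LL
  "\u0D15": "\u0D7F", // KA  -> chillu K
};

// Rewrite "base Consonant + Virama + ZWJ" sequences as atomic chillus.
function normalizeLegacyChillu(text) {
  return text.replace(
    /([\u0D15\u0D23\u0D28\u0D30\u0D32\u0D33])\u0D4D\u200D/g,
    (match, base) => LEGACY_CHILLU[base]
  );
}
```

Normalizing legacy chillus first keeps later steps simpler, since they then only need to recognize the atomic chillu code points.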
Syllable model
A syllable in Malayalam can be any of the following.
An independent Vowel. Vowels are often found at the beginning of a word. Example: അമ്മ. But for the specific case of syllables, we can relax the rule about being at the start of a word and state generally that a vowel is a syllable. Note that a vowel appearing as a vowel sign is not what we are considering here; vowel signs have their own properties.
A Chillu letter is a syllable.
A Consonant without any Signs is a syllable. For example, in the word തറ, both ത and റ are syllables.
A Consonant or Conjunct with Signs is a syllable. Here the Signs can occur more than once, but not freely. This syllable has the following characteristics:
  Signs can be Virama only if it is the last item of a given word. For example, അത് has അ, ത് as syllables, but അത്ഭുതം has അ, ത്ഭു, തം as syllables.
  Signs can occur 2 times in the following cases: (a) The first Sign is ു and the second is Virama. This combination is also called Samvruthokaram. Example: തു് in അതു്. (b) The first Sign is a VowelSign and the second is Anuswara. Examples: താം, തീം, തും, തോം etc.
A ZWNJ marks a syllable boundary. A ZWNJ inserted between two blocks of text prevents a ligature and also marks a syllable boundary. For example, in തമിഴ്‌നാട്, the ZWNJ inserted after ഴ് and before നാ prevents a possible ഴ്ന Conjunct and hence also signals that the pronunciation should break at that point. It is a bit weird to say that a ZWNJ forms a syllable, since it is just a separator. But while analysing a series of letters from beginning to end, it is technically okay to consider ZWNJ as a syllable block.
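The model above can be approximated with a single sticky regular expression. This is a sketch under stated assumptions, not the PEG grammar itself: the function name syllabify is hypothetical, the Unicode ranges are assumed, input is assumed to use atomic chillus, and allowing an Anuswara or Visarga directly after a Vowel or VowelSign is an assumption that goes slightly beyond the rules listed.

```javascript
// Building blocks as regex source strings (ranges are assumptions).
const VOWEL = "[\\u0D05-\\u0D0B\\u0D0E-\\u0D10\\u0D12-\\u0D14]";
const VOWEL_SIGN = "[\\u0D3E-\\u0D43\\u0D46-\\u0D48\\u0D4A-\\u0D4C\\u0D57]";
const CONSONANT = "[\\u0D15-\\u0D28\\u0D2A-\\u0D39]";
const CHILLU = "[\\u0D7A-\\u0D7F\\u0D54-\\u0D56]";
const VIRAMA = "\\u0D4D";
const DOT_REPH = "\\u0D4E";
const ANUSWARA_OR_VISARGA = "[\\u0D02\\u0D03]";
const ZWNJ = "\\u200C";

// HalfConsonant = Consonant Virama / DotReph
const HALF_CONSONANT = `(?:${CONSONANT}${VIRAMA}|${DOT_REPH})`;
// Conjunct = zero or more HalfConsonants followed by a Consonant
const CONJUNCT = `${HALF_CONSONANT}*${CONSONANT}`;
// Signs: Samvruthokaram, VowelSign (+ Anuswara/Visarga), Virama, or a lone Anuswara/Visarga
const SIGNS = `(?:\\u0D41${VIRAMA}|${VOWEL_SIGN}${ANUSWARA_OR_VISARGA}?|${VIRAMA}|${ANUSWARA_OR_VISARGA})`;

// One syllable, matched exactly at lastIndex because of the sticky flag.
const SYLLABLE = new RegExp(
  `${VOWEL}${ANUSWARA_OR_VISARGA}?|${CHILLU}|${ZWNJ}|${CONJUNCT}${SIGNS}?`,
  "y"
);

// Split a word into syllables; throw if some position cannot be parsed.
function syllabify(word) {
  const syllables = [];
  let pos = 0;
  while (pos < word.length) {
    SYLLABLE.lastIndex = pos;
    const match = SYLLABLE.exec(word);
    if (match === null) {
      throw new Error(`Cannot parse syllable at index ${pos}`);
    }
    syllables.push(match[0]);
    pos += match[0].length;
  }
  return syllables;
}
```

Because the conjunct part is matched greedily, a Virama followed by a consonant is always consumed as part of a conjunct, so a Virama only ends up as a sign when nothing combinable follows it. For instance, syllabify("അത്ഭുതം") yields അ, ത്ഭു, തം, while a duplicated vowel sign makes the function throw.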
Parsing Expression Grammar
You can try this in a PEG evaluator with various conjuncts to see whether they all get parsed. Use https://pegjs.org/online: copy and paste the grammar and try inputs like "ശാസ്ത്രവിഷയങ്ങൾ".
Characteristics of the Grammar
There are a few important characteristics of this grammar.
It does certain validations against the signs. For example, it does not allow a VowelSign, Virama or Anuswara after a Visarga. If that happens, the parser will fail to parse the word. It permits a Virama after a VowelSign, but only for Samvruthokaram (vowel sign = ു).
Among the signs, you can see Virama, but it is permitted only at the end of the word. For example: അത്. If a Virama comes in the middle of a word, it acts as a consonant combiner.
The order of Signs is also enforced. For example, you cannot have a Virama followed by the VowelSign ു, even though the reverse order is permitted.
The above rules make the parser strict in some respects. At the same time, there are some relaxed rules too. There is no maximum limit on the length of a conjunct. A nonsense conjunct like "ക്ച്ട്ത്പ്ബ്ഭ്മ്ങ്ത്കു" will be accepted by the parser. Malayalam has valid conjuncts of up to 5 consonants as far as I know (example: ഗ്ദ്ധ്ര്യ). Usually the longer conjuncts end in one of the consonants യ, ര, ല, വ.
In informal Malayalam, vowel sign duplication is sometimes used to denote elongation. For example, വാടാാാ. Our parser won't accept that.
Syllable boundaries
If you want to know syllable boundaries and don't care about anything else, there is an easy way to find them.
A syllable boundary is after:
A vowel. Note that this is not a vowel sign. Example: അ|റ, ഇ|ര, ഉ|പ്പ്
A vowel sign, if not followed by virama, anuswara or visarga. Example: ത്തി|ൽ, പു|ക
A consonant, if followed by another consonant or a chillu. Example: ത|റ, ഷ്ട|മി, ക|ൽ
A chillu. Example: സ|ർ|പ്പം
An anuswara. Example: കു|ടും|ബം
A visarga. Example: ദുഃ|ഖം
A ZWNJ.
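The boundary rules above can be applied in a single left-to-right pass, looking only at the current and the next character. The following is a rough sketch: the function name splitAtBoundaries is hypothetical, the Unicode ranges are assumptions, and keeping a ZWNJ attached to the end of the chunk before it is one possible reading of the last rule. It does no validation, so it will happily split words that the grammar would reject.

```javascript
// Character tests (Unicode ranges are assumptions).
const IS_VOWEL = /[\u0D05-\u0D0B\u0D0E-\u0D10\u0D12-\u0D14]/;
const IS_VOWEL_SIGN = /[\u0D3E-\u0D43\u0D46-\u0D48\u0D4A-\u0D4C\u0D57]/;
const IS_CONSONANT = /[\u0D15-\u0D28\u0D2A-\u0D39]/;
const IS_CHILLU = /[\u0D7A-\u0D7F\u0D54-\u0D56]/;
const IS_ANUSWARA_OR_VISARGA = /[\u0D02\u0D03]/;
// Virama, anuswara or visarga: these keep a preceding vowel sign open.
const KEEPS_VOWEL_SIGN_OPEN = /[\u0D4D\u0D02\u0D03]/;
const ZWNJ_CHAR = "\u200C";

// Split a word into chunks, cutting after every character that the
// boundary rules mark, and at the end of the word.
function splitAtBoundaries(word) {
  const chunks = [];
  let current = "";
  for (let i = 0; i < word.length; i++) {
    const ch = word[i];
    const next = i + 1 < word.length ? word[i + 1] : "";
    current += ch;
    const boundary =
      next === "" ||
      IS_VOWEL.test(ch) ||
      (IS_VOWEL_SIGN.test(ch) && !KEEPS_VOWEL_SIGN_OPEN.test(next)) ||
      (IS_CONSONANT.test(ch) &&
        (IS_CONSONANT.test(next) || IS_CHILLU.test(next))) ||
      IS_CHILLU.test(ch) ||
      IS_ANUSWARA_OR_VISARGA.test(ch) ||
      ch === ZWNJ_CHAR;
    if (boundary) {
      chunks.push(current);
      current = "";
    }
  }
  return chunks;
}
```

For example, splitAtBoundaries("കുടുംബം") gives കു, ടും, ബം and splitAtBoundaries("സർപ്പം") gives സ, ർ, പ്പം.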
Web interface
I prepared a web interface in case you just want to try out the syllable analyser and don't want to play with PEG.
It comes with a JS API too. Just include the following file in your web application:
https://phon.smc.org.in/syllables/lib/malayalam-syllables.js
Then use the following method to split a word to syllables.
I prepared a codepen project to demonstrate this. See the Pen Malayalam syllable analyser by Santhosh Thottingal (@santhoshtr) on CodePen.
Source code
https://github.com/santhoshtr/malayalam-syllable-analyser
Please report any issues or ideas to improve this model there. Thanks!