
Conjunct


Last updated 4 years ago

A formal grammar for Malayalam conjunct

In Malayalam, a conjunct (കൂട്ടക്ഷരം) is formed by combining two or more consonants using the Virama (ചന്ദ്രക്കല). “ക്ക” is a conjunct with 2 consonants, formed by ക + ് + ക. സ്ത്ര is a conjunct with 3 consonants: സ + ് + ത + ് + ര. ന്ത്ര്യ is a conjunct with 4 consonants: ന + ് + ത + ് + ര + ് + യ. Conjuncts with more than 4 consonants are rare; ഗ്ദ്ധ്ര്യ is formed by 5 consonants.
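At the Unicode level this is a simple structure: consonant code points separated by the Virama, U+0D4D. Below is a small Python sketch that builds the conjuncts above and counts their consonants. It was not part of the original post. The helper names `make_conjunct` and `consonant_count` are made up for illustration, and `consonant_count` assumes its input is already a well-formed conjunct.

```python
# A minimal sketch: a conjunct is consonants joined by the Virama (U+0D4D, ്).
VIRAMA = "\u0d4d"

def make_conjunct(consonants: list[str]) -> str:
    """Join two or more consonants with Virama, e.g. ["ക", "ക"] -> "ക്ക"."""
    if len(consonants) < 2:
        raise ValueError("a conjunct needs at least two consonants")
    return VIRAMA.join(consonants)

def consonant_count(conjunct: str) -> int:
    """Count consonants, assuming the input is a well-formed conjunct."""
    return conjunct.count(VIRAMA) + 1

if __name__ == "__main__":
    for parts in (["ക", "ക"], ["സ", "ത", "ര"], ["ന", "ത", "ര", "യ"]):
        word = make_conjunct(parts)
        # Print the conjunct, its consonant count and its code points
        print(word, consonant_count(word), [f"U+{ord(ch):04X}" for ch in word])
```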

Can we define this formation in a formal grammar?

Let us try. For simplicity, I am using the Parsing Expression Grammar (PEG) formalism, since we can quickly write a parser for it to test and evaluate.

Before that, let us define the conjunct in plain English in a more concise and unambiguous way.

Conjunct: A Consonant combined with another Conjunct or Consonant using Virama

We also need to define Consonant and Virama.

  • Virama: ്

  • Consonants: any of the set [കഖഗഘങചഛജഝഞടഠഡഢണതഥദധനപഫബഭമയരലവശഷസഹളഴറ]
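Both definitions can be written down as Python data. This is a hedged sketch. It rests on one assumption, which the code checks instead of taking it from the post: the listed consonants are exactly the code points U+0D15 to U+0D39 with U+0D29 (ഩ) left out. The names `CONSONANTS`, `RANGE_SET` and `is_consonant` are illustrative only.

```python
# Sketch: the Virama and Consonant definitions as Python data.
VIRAMA = "\u0d4d"  # ്

# The consonant set exactly as listed in the definition
CONSONANTS = set("കഖഗഘങചഛജഝഞടഠഡഢണതഥദധനപഫബഭമയരലവശഷസഹളഴറ")

# Assumption to check: the same set as U+0D15..U+0D39 without U+0D29 (ഩ)
RANGE_SET = {chr(cp) for cp in range(0x0D15, 0x0D3A)} - {"\u0d29"}

def is_consonant(ch: str) -> bool:
    return ch in CONSONANTS

if __name__ == "__main__":
    print(len(CONSONANTS))          # size of the listed set
    print(CONSONANTS == RANGE_SET)  # does the list match the code point range?
```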

Writing this in PEG syntax

You can try this in a PEG evaluator with various conjuncts to see whether they all get parsed. Use https://pegjs.org/online, copy and paste the above grammar, and try inputs like ന്ത്ര്യ.
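If the online tool is not available, you can emulate the grammar by hand. Below is a sketch of a recursive descent parser in Python. It follows the rule Conjunct = Consonant Virama (Conjunct / Consonant) with PEG-style ordered choice. It was written for this page and is not PEG.js output. The function names are hypothetical, and the consonant set is taken as U+0D15 to U+0D39 minus U+0D29, which matches the list above.

```python
# Hand-written stand-in for the PEG grammar: each rule takes the input and a
# position and returns the position after a match, or None if it fails.
from typing import Optional

VIRAMA = "\u0d4d"
# Consonant: U+0D15..U+0D39 except U+0D29, the same letters as listed above
CONSONANTS = {chr(cp) for cp in range(0x0D15, 0x0D3A)} - {"\u0d29"}

def consonant(text: str, pos: int) -> Optional[int]:
    return pos + 1 if pos < len(text) and text[pos] in CONSONANTS else None

def virama(text: str, pos: int) -> Optional[int]:
    return pos + 1 if pos < len(text) and text[pos] == VIRAMA else None

def conjunct(text: str, pos: int) -> Optional[int]:
    # Conjunct = Consonant Virama (Conjunct / Consonant)
    p = consonant(text, pos)
    if p is None:
        return None
    p = virama(text, p)
    if p is None:
        return None
    # Ordered choice: try Conjunct first and fall back to Consonant
    end = conjunct(text, p)
    return end if end is not None else consonant(text, p)

def parses(text: str) -> bool:
    # Like a PEG start rule, the whole input must be consumed
    return conjunct(text, 0) == len(text)

if __name__ == "__main__":
    for word in ["ക്ക", "സ്ത്ര", "ന്ത്ര്യ", "ക്", "ക"]:
        print(word, parses(word))
```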

Let us look at the definition again.

Conjunct = Consonant Virama (Conjunct / Consonant)

It can also be written with the recursion on the other side:

Conjunct = (Conjunct / Consonant) Virama Consonant

This accepts the same strings as our previous expression. We can rewrite our plain English definition accordingly:

Conjunct: A Conjunct or Consonant combined with another Consonant using Virama
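One way to check that both definitions describe the same strings is to generate every string with a given number of consonants under each definition, then compare the sets. The sketch below does this over a two-letter alphabet, ക and ത. That alphabet is an assumption made only to keep the enumeration small. Generation proceeds by size, so, unlike a top-down parser, it does not recurse forever on the left-recursive form. The function names are hypothetical.

```python
# Sketch: compare the strings produced by the two definitions of Conjunct.
# Assumption: a small alphabet of two consonants, ക and ത.
from itertools import product

VIRAMA = "\u0d4d"
ALPHABET = ["\u0d15", "\u0d24"]  # ക, ത

def right_form(n: int) -> set[str]:
    """n-consonant strings derivable from (Conjunct / Consonant) with
    Conjunct = Consonant Virama (Conjunct / Consonant)."""
    if n == 1:
        return set(ALPHABET)  # the Consonant alternative
    return {c + VIRAMA + rest for c in ALPHABET for rest in right_form(n - 1)}

def left_form(n: int) -> set[str]:
    """n-consonant strings derivable from (Conjunct / Consonant) with
    Conjunct = (Conjunct / Consonant) Virama Consonant."""
    if n == 1:
        return set(ALPHABET)
    return {rest + VIRAMA + c for rest in left_form(n - 1) for c in ALPHABET}

def every_sequence(n: int) -> set[str]:
    """All n-consonant sequences joined by Virama, for reference."""
    return {VIRAMA.join(p) for p in product(ALPHABET, repeat=n)}

if __name__ == "__main__":
    for n in range(2, 6):
        print(n, right_form(n) == left_form(n) == every_sequence(n))
```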

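One more thing: I wrote (Conjunct / Consonant) instead of (Consonant / Conjunct). I chose this order intentionally. A PEG tries the alternatives of a choice from left to right and commits to the first one that matches. A Conjunct always starts with a Consonant, so with Consonant first the parser would take that path: it would match a single consonant and stop, leaving the rest of a longer conjunct unparsed. Putting Conjunct before Consonant avoids this.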
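To see the effect of the order, here is a self-contained variant of a hand-written PEG-style parser. Again, it is a sketch and not PEG.js. The order of the choice is controlled by a parameter, `conjunct_first`, a name made up for this example.

```python
# Sketch: PEG-style parser where the order of the choice is a parameter.
from typing import Optional

VIRAMA = "\u0d4d"
CONSONANTS = {chr(cp) for cp in range(0x0D15, 0x0D3A)} - {"\u0d29"}

def consonant(text: str, pos: int) -> Optional[int]:
    return pos + 1 if pos < len(text) and text[pos] in CONSONANTS else None

def conjunct(text: str, pos: int, conjunct_first: bool) -> Optional[int]:
    # Conjunct = Consonant Virama (choice), where the choice is either
    # (Conjunct / Consonant) or (Consonant / Conjunct)
    p = consonant(text, pos)
    if p is None or p >= len(text) or text[p] != VIRAMA:
        return None
    p += 1
    alternatives = [
        lambda q: conjunct(text, q, conjunct_first),
        lambda q: consonant(text, q),
    ]
    if not conjunct_first:
        alternatives.reverse()
    # Ordered choice: the first alternative that succeeds wins and the
    # remaining ones are never tried
    for alt in alternatives:
        end = alt(p)
        if end is not None:
            return end
    return None

def parses(text: str, conjunct_first: bool) -> bool:
    return conjunct(text, 0, conjunct_first) == len(text)

if __name__ == "__main__":
    word = "\u0d28\u0d4d\u0d24\u0d4d\u0d30\u0d4d\u0d2f"  # ന്ത്ര്യ
    print(parses(word, conjunct_first=True), parses(word, conjunct_first=False))
```

With `conjunct_first=False`, the parser matches only ന്ത in ന്ത്ര്യ and stops at position 3, so the whole-input check fails. A two-consonant conjunct such as ക്ക parses with either order, which makes the problem easy to miss.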

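The first definition is a tail recursion, meaning the Conjunct gets expanded towards the end of the expression. Can we rewrite it using left recursion? We can: the second definition does exactly that, expanding the Conjunct at the start of the expression.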

There is a problem with this new definition. Because it is left recursive, it can cause infinite recursion, depending on the parser implementation. The PEG.js parser we used above for testing and evaluation does not support left recursion, because it is a top-down (recursive descent) parser. If you modify the above PEG.js grammar this way in the online evaluation tool, you will see the parser warn about infinite recursion.
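The Python sketch below (not PEG.js) shows the problem in a hand-written top-down parser. The naive translation of the left-recursive rule calls itself at the same position before consuming any input, so it recurses until Python raises RecursionError. The sketch also shows the common workaround, a standard technique and not something from this post: rewrite the rule as repetition, Consonant (Virama Consonant)+. That form accepts the same strings and extends the conjunct left to right in a loop. The function names are hypothetical.

```python
# Sketch: a top-down parser loops on the left-recursive rule; a loop does not.
from typing import Optional

VIRAMA = "\u0d4d"
CONSONANTS = {chr(cp) for cp in range(0x0D15, 0x0D3A)} - {"\u0d29"}

def consonant(text: str, pos: int) -> Optional[int]:
    return pos + 1 if pos < len(text) and text[pos] in CONSONANTS else None

def conjunct_left(text: str, pos: int) -> Optional[int]:
    # Naive translation of Conjunct = (Conjunct / Consonant) Virama Consonant.
    # Its first step calls itself at the same position without consuming
    # any input, so it never returns; Python eventually raises RecursionError.
    start = conjunct_left(text, pos)
    if start is None:
        start = consonant(text, pos)
    if start is None or start >= len(text) or text[start] != VIRAMA:
        return None
    return consonant(text, start + 1)

def conjunct_loop(text: str, pos: int) -> Optional[int]:
    # Same strings without left recursion: Consonant (Virama Consonant)+.
    # Each iteration extends the conjunct found so far by Virama + Consonant.
    end = consonant(text, pos)
    if end is None:
        return None
    pairs = 0
    while end < len(text) and text[end] == VIRAMA:
        nxt = consonant(text, end + 1)
        if nxt is None:
            break
        end, pairs = nxt, pairs + 1
    return end if pairs >= 1 else None

if __name__ == "__main__":
    word = "\u0d28\u0d4d\u0d24\u0d4d\u0d30\u0d4d\u0d2f"  # ന്ത്ര്യ
    print(conjunct_loop(word, 0) == len(word))
    try:
        conjunct_left(word, 0)
    except RecursionError:
        print("left-recursive version: RecursionError")
```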

But if the parser can avoid this issue, nothing stops you from writing the grammar using left recursion. LALR parsers such as GNU Bison support left recursion very well. The big issue is that GNU Flex/Bison, commonly used for writing such grammars, does not support Unicode. You can make it work with some low-level byte manipulation, but I did not try.
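For the byte-level route, note that each Malayalam character is a three-byte UTF-8 sequence. The consonants U+0D15 to U+0D39 encode as E0 B4 95 to E0 B4 B9, and the Virama U+0D4D encodes as E0 B5 8D. Consonant (Virama Consonant)+ is a regular language, so a regular expression over bytes can match a conjunct, and that is the kind of pattern a byte-oriented lexer works with. The sketch uses Python's `re` on `bytes`, not Flex. Carrying these patterns over into Flex rules is an untested assumption.

```python
# Sketch: matching a conjunct on raw UTF-8 bytes, the way a byte-oriented
# lexer would have to. Python's re is used here, not Flex.
import re

# Consonants U+0D15..U+0D39 are E0 B4 95..B9 in UTF-8; U+0D29 (E0 B4 A9)
# is left out to match the consonant set defined earlier.
CONSONANT = rb"\xe0\xb4[\x95-\xa8\xaa-\xb9]"
VIRAMA = rb"\xe0\xb5\x8d"  # U+0D4D
# Conjunct = Consonant (Virama Consonant)+
CONJUNCT = re.compile(CONSONANT + rb"(?:" + VIRAMA + CONSONANT + rb")+")

def utf8_hex(text: str) -> list[str]:
    """The UTF-8 bytes of each character, as space-separated hex."""
    return [ch.encode("utf-8").hex(" ") for ch in text]

def is_conjunct_bytes(data: bytes) -> bool:
    return CONJUNCT.fullmatch(data) is not None

if __name__ == "__main__":
    word = "\u0d28\u0d4d\u0d24\u0d4d\u0d30\u0d4d\u0d2f"  # ന്ത്ര്യ
    print(utf8_hex(word))
    print(is_conjunct_bytes(word.encode("utf-8")))
```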
