# Optical Character Recognition

## Tesseract

Install Tesseract and the Malayalam language pack:

```bash
$ sudo apt install tesseract-ocr tesseract-ocr-mal
```

Make sure Tesseract is available with Malayalam support. The language code `mal` should appear in the list:

```bash
$ tesseract --list-langs
List of available languages (4):
Malayalam
eng
mal
osd
```
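The same check can be scripted, for example at the start of a setup script. The following is a minimal sketch: the function name `has_tesseract_lang` is hypothetical (not part of Tesseract), and it only inspects the text printed by `tesseract --list-langs`, which it reads from standard input.

```bash
# Hypothetical helper: succeeds (exit status 0) if the language code in $1
# appears on a line of its own in the `tesseract --list-langs` output
# supplied on stdin.
has_tesseract_lang() {
    # -x: match the whole line, -F: treat the code as a fixed string, -q: quiet
    grep -qxF -- "$1"
}

# Usage (requires tesseract on PATH; 2>&1 is added in case the installed
# version prints the list to stderr):
#   tesseract --list-langs 2>&1 | has_tesseract_lang mal && echo "Malayalam model found"
```

Matching the whole line keeps a partial code such as `ma` from being mistaken for `mal`.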

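Run OCR on an image. This example assumes `testocr.png` in your home directory is the image to recognize. The second argument is the output base name: Tesseract appends `.txt` to it, so passing `out` produces `out.txt`.

```bash
$ tesseract ~/testocr.png out -l mal
```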

The `out.txt` file will contain the text recognized from the image.
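To recognize several images in one go, the single command can be wrapped in a loop. This is a rough sketch, not a prescribed workflow: the function names `ocr_output_base` and `ocr_dir` are hypothetical, and it assumes the images are PNG files sitting in one directory.

```bash
# Hypothetical helper: derive the Tesseract output base from an image path
# by dropping the file extension. Tesseract appends ".txt" itself.
ocr_output_base() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical helper: run Malayalam OCR on every PNG in a directory
# (defaults to the current directory), writing page.txt next to page.png.
ocr_dir() {
    local dir="${1:-.}" img
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # skip when the glob matches nothing
        tesseract "$img" "$(ocr_output_base "$img")" -l mal
    done
}

# Usage (requires tesseract and the mal language pack):
#   ocr_dir ~/scans
```

Each image gets a text file with the same name next to it, so results are easy to pair with their sources.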

## Tesseract OCR in the browser

{% embed url="<https://ocr.smc.org.in/>" %}

![](https://620052135-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MJkzx1KF5XGsfOgiP-a%2Fuploads%2Fgit-blob-fd457ac5c907bbc2cd15b322b6c86a083bc13000%2Fimage.png?alt=media)

## Links

{% embed url="<https://github.com/harish2704/pottan-ocr>" %}
