Appending additional OCR language dictionaries
The language dictionaries provided within the installation package are:
ara (Arabic)
deu (German)
eng (English)
fra (French)
heb (Hebrew)
ita (Italian)
nld (Dutch; Flemish)
por (Portuguese)
spa (Spanish; Castilian)
vie (Vietnamese)
The OCR engine is not restricted to these languages and can recognize many more.
If the language you wish to recognize is not in the above list, please download the complete OCR languages pack.
It includes more than 120 languages and can be downloaded from https://www.gdpicture.com/download/tesseract_ocr_4x_language_pack.zip
You can also try other language files provided by the Tesseract team here: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017
Once the download has completed, extract the contents of the archive into the folder where your OCR dictionaries are already installed.
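The extraction step can also be scripted. The following is a minimal Python sketch, not part of the product: the function name install_language_pack, its parameters, and the decision to copy only *.traineddata files (ignoring any subfolders inside the archive) are assumptions made for illustration. By default it leaves dictionaries that are already installed untouched.

```python
import zipfile
from pathlib import Path, PurePosixPath

SUFFIX = ".traineddata"  # Tesseract language files are named <code>.traineddata


def install_language_pack(archive_path, dictionaries_dir, overwrite=False):
    """Copy every *.traineddata file from the archive into dictionaries_dir.

    Hypothetical helper: subfolders inside the archive are flattened, and
    existing dictionaries are kept unless overwrite is True.
    Returns the sorted list of language codes that were written.
    """
    target = Path(dictionaries_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            # Zip member names always use forward slashes.
            name = PurePosixPath(member.filename).name
            if member.is_dir() or not name.endswith(SUFFIX):
                continue
            destination = target / name
            if destination.exists() and not overwrite:
                continue  # keep the dictionary that is already installed
            destination.write_bytes(archive.read(member))
            installed.append(name[: -len(SUFFIX)])
    return sorted(installed)
```

For example, install_language_pack("tesseract_ocr_4x_language_pack.zip", ocr_folder) would copy the new language files into ocr_folder, where ocr_folder stands for whatever dictionary folder your installation uses.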
To look up the language name that corresponds to a language code, please visit this page: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-201
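To check which dictionaries a folder currently contains, a short script can list the language files and translate the codes it knows. This is only a sketch under stated assumptions: the helper name list_installed_languages is hypothetical, the name table covers just the ten languages bundled with the installation package, and any other code is reported with None so that it can be looked up on the page above.

```python
from pathlib import Path

# Names of the dictionaries shipped with the installation package.
BUNDLED_LANGUAGES = {
    "ara": "Arabic",
    "deu": "German",
    "eng": "English",
    "fra": "French",
    "heb": "Hebrew",
    "ita": "Italian",
    "nld": "Dutch; Flemish",
    "por": "Portuguese",
    "spa": "Spanish; Castilian",
    "vie": "Vietnamese",
}


def list_installed_languages(dictionaries_dir):
    """Return (code, name) pairs for every <code>.traineddata file found.

    Hypothetical helper: name is None for codes outside the bundled list.
    """
    codes = sorted(p.stem for p in Path(dictionaries_dir).glob("*.traineddata"))
    return [(code, BUNDLED_LANGUAGES.get(code)) for code in codes]
```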
If for any reason you want to use the previous language data files (which do not use the LSTM engine), you can download the complete pack from this link: https://www.gdpicture.com/download/tesseract_ocr_304_language_pack.zip