New LanguageIdentifier class: a single entry point for identifying language.
There are two editions: basic and full.
The basic edition contains the original textcat language models plus "Wikipedia-Experimental-UTF8Only".
The full edition contains all files of the basic edition plus the FULL language pack, "Wikipedia-Experimental-AllEncodings": 27263 language models (280 languages and flavors of Wikipedia, each encoded in every encoding capable of representing at least 90% of the sample text).
PLEASE BEWARE OF A DELAY OF AROUND 40 SECONDS BEFORE THE APPLICATION SHOWS ITS PROMPT WHEN YOU START IT FOR THE FIRST TIME.
This happens because of the huge number of language models being loaded (27263).
The delay drops to around 15 seconds the second time you start the application, because all the files are already cached.
The full matrix of language-encoding compatibility can be found in languageEncodingMatrix.csv (pairs with values above 90% are included in the release).
Sample material can be found in the Samples folder (some languages I know, in popular encodings).
Example of usage (default settings used):
NTextCatLegacy.exe -noprompt < Samples\ukrainian-1251.txt
The first result returned is considered the best. The format is <language>_cp<codepage>, e.g. uk_cp1251.
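If the result needs to be consumed by a script, the label can be split into its two parts. The following is only a minimal sketch in Python, assuming the label always ends in "_cp" followed by a numeric code page; parse_label is a hypothetical helper name and is not part of NTextCat.

```python
import re

# Assumption: a result label looks like "<language>_cp<codepage>", e.g. "uk_cp1251".
# The language part may itself contain hyphens or underscores, so the match is
# anchored on the trailing "_cp<digits>" suffix.
LABEL_RE = re.compile(r"^(?P<language>.+)_cp(?P<codepage>\d+)$")


def parse_label(label):
    """Split a result label into (language, codepage).

    Hypothetical helper for illustration only; not part of NTextCat.
    """
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError("unexpected label format: %r" % label)
    return match.group("language"), int(match.group("codepage"))


if __name__ == "__main__":
    language, codepage = parse_label("uk_cp1251")
    print(language, codepage)
```

Labels that do not follow the assumed shape raise ValueError rather than being guessed at.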