NTextCat 0.1.4

Downloads: 78
Change Set: 7672
Released: Jul 24, 2011
Updated: Aug 8, 2011 by IvanAkcheurov
Dev status: Alpha

Recommended Download

Application Binaries + Language Models
application, 884K, uploaded Jul 24, 2011 - 59 downloads

Other Available Downloads

Application Binaries + 27263 Language Models (280 languages across ~100 compatible encodings)
application, 6719K, uploaded Aug 8, 2011 - 19 downloads

Release Notes

New LanguageIdentifier class: a single entry point for identifying a language.

There are two editions: basic and full.
The basic edition contains the original TextCat language models plus "Wikipedia-Experimental-UTF8Only".

The full edition contains all files of the basic edition plus the FULL language pack, "Wikipedia-Experimental-AllEncodings": 27263 language models (280 languages and flavors of Wikipedia, each encoded in every encoding capable of representing at least 90% of the sample text).

PLEASE BEWARE OF A DELAY OF AROUND 40 SECONDS BEFORE THE APPLICATION SHOWS ITS PROMPT WHEN YOU START IT FOR THE FIRST TIME.
This happens because of the huge number of language models being loaded (27263).
The delay is around 15 seconds the second time you start the application (because all files will already be cached).

The full matrix of language-encoding compatibility can be found in languageEncodingMatrix.csv (pairs with values of >90% are included in the release).
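
The release does not say how the compatibility values were computed. As a rough illustration only, the Python sketch below measures what fraction of a sample's characters a given encoding can represent and applies the 90% threshold. The function names encoding_coverage and is_compatible are hypothetical, and the per-character measure is an assumption, not NTextCat's actual method.

```python
def encoding_coverage(sample: str, encoding: str) -> float:
    """Fraction of characters in `sample` that `encoding` can represent.

    Assumption: coverage is counted per character; the real matrix
    may have been computed differently.
    """
    if not sample:
        return 0.0
    representable = 0
    for ch in sample:
        try:
            ch.encode(encoding)
            representable += 1
        except UnicodeEncodeError:
            # This character has no mapping in the target encoding.
            pass
    return representable / len(sample)


def is_compatible(sample: str, encoding: str, threshold: float = 0.9) -> bool:
    """True when coverage exceeds the threshold (the >90% rule)."""
    return encoding_coverage(sample, encoding) > threshold
```

For example, a Ukrainian sample is fully representable in cp1251 but not at all in ascii, so under this measure only the first pair would be kept.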

Sample material is in the Samples folder (a few languages I know, in popular encodings).


Example of usage (default settings used):

NTextCatLegacy.exe -noprompt < Samples\ukrainian-1251.txt

The first result returned is considered the best. The format is <language>_cp<codepage>, e.g. uk_cp1251.
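
If the result is consumed by a script, a label in this format can be split into its language and code page parts. The Python sketch below is a minimal example that assumes the label looks exactly as described above (a language code, then _cp, then a numeric code page). The name parse_label is hypothetical and not part of NTextCat.

```python
import re

# Assumed label shape: <language>_cp<codepage>, e.g. "uk_cp1251".
LABEL_PATTERN = re.compile(r"^(?P<language>.+)_cp(?P<codepage>\d+)$")


def parse_label(label: str) -> tuple[str, int]:
    """Split a result label into (language, codepage)."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return match.group("language"), int(match.group("codepage"))


language, codepage = parse_label("uk_cp1251")
print(language, codepage)  # uk 1251
```

Labels that do not match this shape raise ValueError rather than being guessed at.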
