NTextCat 0.1.5

Downloads: 395
Change Set: 8843
Released: Aug 28, 2011
Updated: Aug 29, 2011 by IvanAkcheurov
Dev status: Alpha

Recommended Download

Application Basic: Binaries (.Net 4.0) + Language Models
application, 621K, uploaded Aug 29, 2011 - 243 downloads

Other Available Downloads

Application Basic: Binaries (.Net 3.5) + Language Models
application, 622K, uploaded Aug 29, 2011 - 50 downloads
Application Full: Binaries (.Net 4.0) + 27263 Lang Models (280 languages per ~100 encodings)
application, 6799K, uploaded Aug 29, 2011 - 67 downloads
Application Full: Binaries (.Net 3.5) + 27263 Lang Models (280 languages per ~100 encodings)
application, 6801K, uploaded Aug 29, 2011 - 35 downloads

Release Notes

Added basic and full editions compiled for the .NET Framework 3.5 Client Profile. This makes it possible to use NTextCat within SQL Server 2008.

There are two edition types: basic and full.
The basic edition contains the original textcat language models (which have poor encoding coverage) plus the "Wikipedia-Experimental-UTF8Only" models, which can identify the language of UTF-8 text only, or of a string if you call ClassifyText from the API.

The full edition contains all files of the basic edition plus the FULL language pack, "Wikipedia-Experimental-AllEncodings": 27263 language models (280 languages and flavors of Wikipedia, each encoded in every encoding capable of representing at least 90% of the sample text).
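To illustrate the "at least 90% of sample text" criterion, here is a minimal Python sketch. It assumes coverage is measured as the share of characters an encoding can represent; the exact measure used to build the language pack is not stated here, and the function names are made up for illustration.

```python
def encoding_coverage(text: str, encoding: str) -> float:
    """Return the fraction of characters in `text` that `encoding` can represent.

    Assumption: coverage is counted per character; the release notes do not
    say how the 90% figure was actually computed.
    """
    if not text:
        return 0.0
    encodable = 0
    for ch in text:
        try:
            ch.encode(encoding)
            encodable += 1
        except UnicodeEncodeError:
            pass  # this character has no representation in the encoding
    return encodable / len(text)


def usable_encodings(text, encodings, threshold=0.90):
    """Keep the encodings that cover at least `threshold` of the sample."""
    return [enc for enc in encodings if encoding_coverage(text, enc) >= threshold]


sample = "Привіт, світе"  # short Ukrainian sample
print(usable_encodings(sample, ["cp1251", "cp1252", "utf-8"]))
# prints ['cp1251', 'utf-8']
```

Under this reading, a Ukrainian sample would get models for cp1251 and UTF-8 but not for cp1252, which can represent only the comma and the space.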

Full Edition:
PLEASE BEWARE OF A DELAY OF AROUND 40 SECONDS BEFORE THE APPLICATION SHOWS ITS PROMPT WHEN YOU START IT FOR THE FIRST TIME.
This happens because of the huge number of language models being loaded (27263).
The delay drops to around 15 seconds the second time you start the application (because all files are already cached).

The full matrix of language-encoding compatibility can be found in languageEncodingMatrix.csv (pairs with values of >90% are included in the release).
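If you want to list the qualifying pairs yourself, something like the following Python sketch may help. The layout of languageEncodingMatrix.csv is not described here, so the sketch assumes a header row of encoding names, one row per language, and percentage values from 0 to 100; check the real file and adjust before relying on it. The demo data below is invented purely to exercise the function.

```python
import csv
import io


def compatible_pairs(csv_text, threshold=90.0):
    """Return (language, encoding) pairs whose value exceeds `threshold`.

    Assumed layout (hypothetical): first row holds encoding names, first
    column holds language names, cells hold percentages.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    encodings = header[1:]
    pairs = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        language, values = row[0], row[1:]
        for encoding, value in zip(encodings, values):
            if value.strip() and float(value) > threshold:
                pairs.append((language, encoding))
    return pairs


# Invented placeholder values, not taken from the real matrix.
demo = "language,cp1251,cp1252\nuk,100,15.4\nen,100,100\n"
print(compatible_pairs(demo))
# prints [('uk', 'cp1251'), ('en', 'cp1251'), ('en', 'cp1252')]
```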

Sample material can be found in the Evaluation folder (some languages I know, in popular encodings).


Example of usage (default settings used):

NTextCatLegacy.exe -noprompt < Evaluation\ukrainian-1251.txt

The first result returned is considered the best. The format is <language>_cp<codepage>, e.g. uk_cp1251.
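If you post-process the output in a script, the result label can be split into its two parts. The Python sketch below is only an illustration: it assumes one result per output line (not confirmed here), and it treats any label without a _cp<codepage> suffix as a bare language name. The names Detection, parse_label and best_result are hypothetical helpers, not part of NTextCat.

```python
import re
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    language: str
    codepage: Optional[int]  # None when the label carries no _cp suffix


# Matches labels such as "uk_cp1251".
_LABEL = re.compile(r"^(?P<language>.+)_cp(?P<codepage>\d+)$")


def parse_label(label: str) -> Detection:
    """Split a '<language>_cp<codepage>' label into its parts."""
    label = label.strip()
    match = _LABEL.match(label)
    if match is None:
        return Detection(label, None)
    return Detection(match.group("language"), int(match.group("codepage")))


def best_result(output: str) -> Optional[Detection]:
    """Parse the first non-empty line, assuming one result per line."""
    for line in output.splitlines():
        if line.strip():
            return parse_label(line)
    return None


print(parse_label("uk_cp1251"))
# prints Detection(language='uk', codepage=1251)
```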
