- The recommended minimum length of a text snippet has been reduced to 5 words (though in most cases even a single word is identified correctly).
- Much better support for Asian languages (Chinese, Japanese).
- Simplified the API and made it more consistent (see the unit tests for more usage examples):
    var factory = new RankedLanguageIdentifierFactory();
    var identifier = factory.Load("Core14.profile.xml");
    var res = identifier.Identify("your text to get its language identified");
- Fixed NaiveBayesLanguageIdentifier so that it performs as well as RankedLanguageIdentifier.
- NTextCat.exe is now the main command-line interface (its command-line options may change over the next several releases).
- Based on feedback, a set of the 14 most popular languages has been selected and is now the default: Chinese, Danish, Dutch, English, French, German, Italian, Japanese, Korean, Norwegian, Portuguese, Russian, Spanish, Swedish.
- SqlServerClrIntegration is not included in this release yet. It will be reintroduced in an upcoming release, recompiled and verified for SQL Server 2012.
- Fixed a bug in GaussianBag.
- More rigorous testing routines, in preparation for a stable release.