
Speed up factory.Load by "remembering" the XML content?

Nov 1, 2013 at 8:02 PM
Loading a 14 MB XML file (such as the provided Wiki82.profile.xml) with the factory.Load command seems to be incredibly slow, at least when it is called in a loop with hundreds of strings to identify.

How can we speed this up, so that the content of the XML file is loaded only once and reused for every language detection with .Identify?
Dec 26, 2013 at 11:18 PM
Edited Jan 7, 2014 at 11:45 PM
Actually, you only need to load the XML once, using a factory.

The factory creates an identifier that holds the loaded language profile internally. The load happens only once, when the identifier is created.
This is the code that creates an identifier (the whole sample is here):
var factory = new RankedLanguageIdentifierFactory();
var identifier = factory.Load("Core14.profile.xml");
Once you have the identifier, you can run identification as many times as you want without loading the XML profile again. Example:
// FirstOrDefault requires: using System.Linq;
foreach (var nextSnippet in your1000snippets)
{
    // actual recognition happens here, NO XML NEEDS TO BE LOADED
    var languages = identifier.Identify(nextSnippet);

    // null means no candidate language was returned
    var mostCertainLanguage = languages.FirstOrDefault();
    if (mostCertainLanguage != null)
        Console.WriteLine("The language of the text is '{0}' (ISO639-3 code)", mostCertainLanguage.Item1.Iso639_3);
    else
        Console.WriteLine("The language couldn’t be identified with an acceptable degree of certainty");
}
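If the identifier is needed from several places rather than one loop, one option is to keep it in a lazily initialized static field so the profile is still loaded only once. The following is a minimal sketch of that pattern, not part of the library: FakeIdentifier and IdentifierCache are hypothetical names, and FakeIdentifier merely stands in for whatever factory.Load returns so that the example runs on its own. Whether the real identifier can safely be shared across threads is not stated here, so treat that as something to verify.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the identifier returned by factory.Load(...).
sealed class FakeIdentifier
{
    // Counts how many times the (simulated) profile load ran.
    public static int LoadCount;

    public FakeIdentifier(string profilePath)
    {
        LoadCount++; // simulates the expensive XML load
    }

    // Placeholder result; a real identifier would analyse the text.
    public string Identify(string text)
    {
        return "eng";
    }
}

// Hypothetical cache: holds one identifier for the lifetime of the process.
static class IdentifierCache
{
    // Lazy<T> runs the loader at most once, on first access to Value.
    private static readonly Lazy<FakeIdentifier> Cached =
        new Lazy<FakeIdentifier>(
            () => new FakeIdentifier("Core14.profile.xml"),
            LazyThreadSafetyMode.ExecutionAndPublication);

    public static FakeIdentifier Instance
    {
        get { return Cached.Value; }
    }
}

static class Program
{
    static void Main()
    {
        foreach (var snippet in new[] { "first text", "second text", "third text" })
        {
            // Every iteration reuses the same cached instance.
            IdentifierCache.Instance.Identify(snippet);
        }

        Console.WriteLine(FakeIdentifier.LoadCount); // prints 1
    }
}
```

In real code, the lambda passed to Lazy<T> would call factory.Load as shown above, and the field type would be whatever type Load returns.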
Marked as answer by IvanAkcheurov on 1/7/2014 at 3:45 PM