Wiki82.profile.xml

Mar 13, 2015 at 5:34 PM
We have integrated the .NET version of NTextCat into our software to identify languages. As part of the identification process, NTextCat uses Wiki82.profile.xml. Within this file there appear to be several ambiguous language codes. I have tried to match them as follows:

zh_classical = Chinese Classical (ZH) I believe?
zh_yue = Chinese (ZH)?
Simple = Unsure
roa_rup = Romance Languages; Aromanian; Arumanian; Macedo-Romanian (RUP)?
be_x_old = Belarusian (BE)?
Bpy = Unsure
lmo = unsure
pms = unsure
sh = unsure

To verify all items, I used the following link:

http://www.loc.gov/standards/iso639-2/php/code_list.php

Does anyone have any thoughts as to what these codes are referencing?
Coordinator
Apr 22, 2015 at 8:46 AM
Hi,
The Wiki82 profile was generated from the Wikipedia language editions, so its codes are Wikipedia language codes. You can find the codes and descriptions here:
http://en.wikipedia.org/wiki/List_of_Wikipedias#List
Additionally, you can remove the language profiles you don't need from the XML (and rename the file to something like Wikipedia.37.profile.xml).
Best Regards,
Ivan Akcheurov
Apr 22, 2015 at 2:23 PM
Thanks, Ivan, for the link. Is there a single place to get the definitions of the codes used in this file? For example, what is zh_classical? I have found the following page, which I think defines most of the codes, but a single source for this information would be exactly what I need: http://www.loc.gov/standards/iso639-2/php/code_list.php
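For the codes listed in the first post, a small lookup table may help. The sketch below is in Python and is not part of NTextCat. The names WIKI_CODE_NAMES and describe are made up for illustration, and the language names follow the Wikipedia editions these codes refer to (Wikipedia writes them with hyphens, e.g. zh-classical, while the profile file appears to use underscores).

```python
# Hypothetical lookup table, not part of NTextCat.
# Keys are Wikipedia language codes with "-" written as "_",
# matching the spelling used in the first post.
WIKI_CODE_NAMES = {
    "zh_classical": "Classical Chinese",
    "zh_yue": "Cantonese",
    "simple": "Simple English",
    "roa_rup": "Aromanian",
    "be_x_old": "Belarusian (Taraškievica orthography)",
    "bpy": "Bishnupriya Manipuri",
    "lmo": "Lombard",
    "pms": "Piedmontese",
    "sh": "Serbo-Croatian",
}


def describe(code: str) -> str:
    """Return a readable name for a Wikipedia-style code, or "unknown"."""
    # Normalize case and separator so "Bpy" and "zh-yue" both resolve.
    key = code.strip().lower().replace("-", "_")
    return WIKI_CODE_NAMES.get(key, "unknown")


if __name__ == "__main__":
    for code in ("zh_classical", "Bpy", "zh-yue", "xx"):
        print(code, "=>", describe(code))
```

Codes not covered here can be checked against the List of Wikipedias page linked above. Several of them (simple, be_x_old, zh_classical) are not ISO 639-2 codes, which is why the loc.gov list does not resolve them.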