

Open Document Standards and Language Identification

13 February 2007
LSA Technology Advisory Committee (TAC)


Background

Through the organization Ecma International, Microsoft has proposed that ISO, the International Organization for Standardization, adopt a standard for the encoding of electronic documents under the heading "ECMA-376 Office Open XML File Formats" (OOXML). This is a large and complex standard which, among other things, specifies mechanisms through which documents can be tagged with the languages in which their content is written.

The proposed standard does not unambiguously call for the use of ISO 639 language codes for language identification. Furthermore, in at least some parts of the standard specification, a mechanism for identifying languages is described which would only allow for 256 distinct codes.

From the perspective of the linguistics community, a language identification mechanism that allows for only 256 unique codes is woefully inadequate. Because the proposed standard is intended for use in future versions of Microsoft Word, this mechanism, if adopted, has the potential to affect a large segment of the linguistic community.
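The scale of the problem can be illustrated with a short calculation. The following is a minimal sketch, not a description of the OOXML mechanism itself: it assumes that the 256-code limit corresponds to a single 8-bit field, and it uses a hypothetical inventory size (`EXAMPLE_LANGUAGE_COUNT`) chosen only to show the order of magnitude involved. The one fact it relies on is that ISO 639-3 identifiers are three-letter codes.

```python
import string

# Assumption: the 256-code limit corresponds to a single 8-bit field.
BITS_PER_FIELD = 8
one_byte_capacity = 2 ** BITS_PER_FIELD

# ISO 639-3 identifiers are three lowercase Latin letters (e.g. "eng").
three_letter_capacity = len(string.ascii_lowercase) ** 3


def can_identify_all(language_count: int, capacity: int) -> bool:
    """Return True if every language could receive its own distinct code."""
    return language_count <= capacity


# Hypothetical inventory size, chosen only to illustrate the scale involved.
EXAMPLE_LANGUAGE_COUNT = 7000

print(one_byte_capacity)      # 256
print(three_letter_capacity)  # 17576
print(can_identify_all(EXAMPLE_LANGUAGE_COUNT, one_byte_capacity))      # False
print(can_identify_all(EXAMPLE_LANGUAGE_COUNT, three_letter_capacity))  # True
```

Under these assumptions, a 256-value field falls short by more than an order of magnitude, while the three-letter code space leaves ample room.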

The issues here are complex, and it is not completely clear whether this part of the standard specification was included intentionally or was merely an oversight on the part of the specification's authors. Nevertheless, it seemed worthwhile for the LSA to make a statement on the issue, indicating its objections to the parts of the standard specification that do not conform to the ISO 639 standards. Accordingly, on February 13, 2007, the attached letter was sent to the appropriate representative at the American National Standards Institute (ANSI), which serves as the US delegate to ISO.

Because ISO is an international organization composed of delegates from its member countries, each of which has one vote, linguists outside the USA may want to forward the text of this letter to the relevant ISO representatives in their countries if they agree with the LSA's position on this matter. This will serve to amplify the view that the OOXML standard should use existing ISO language code standards (especially the newly adopted ISO 639-3 standard, which aims to comprehensively cover all of the languages of the world) rather than create a new standard, especially one that allows for only 256 unique codes.

Information on who to contact with respect to this matter for a number of countries other than the US can be found here: http://www.grokdoc.net/index.php/EOOXML_Contacts

If you have any questions regarding this, please feel free to contact Jeff Good (jcgood@buffalo.edu), the current chair of the LSA Technology Advisory Committee.


Sara Desautels
ANSI Program Manager
ISO/IEC JTC1/SC 34 Information technology--Document description and processing languages

Dear Ms. Desautels,

I am writing you as President of the Linguistic Society of America (LSA), on behalf of the Executive Committee of the Society and its members. The LSA understands that the ECMA 376 Office Open XML (OOXML) standard is being proposed for adoption as an ISO/IEC standard by JTC1/SC34. The LSA has reviewed the OOXML standard in relation to use of language identifiers and requests that any ISO/IEC standard for OOXML incorporate revisions to consistently specify the use of the recommendations in IETF BCP 47 for language tags in OOXML documents. A detailed explanation follows.

The LSA has reviewed the ECMA 376 Office Open XML standard in relation to internationalization and, specifically, metadata elements for language identification. As observed in §4.2 of SC34/N0809, WordprocessingML and DrawingML use language identifiers for each paragraph and run. The specifications for these in §4:2.18.51 and §4:5.1.12.72, however, are vague, unnecessarily inconsistent, and underrepresent the world's languages. To be specific:

In the specification of ST_TextLanguageID in §4:5.1.12.72, the type is said to be a restriction of the XML Schema string data type, yet no restriction of any sort is, in fact, described. The ST_TextLanguageID allows any string, and no convention for the form and semantics of these strings is specified.

In summary, then, if ECMA 376 is considered as a proposed ISO/IEC standard, then the LSA requests that it be revised to unambiguously specify the use of the recommendations in IETF BCP 47 for language tags per the specific changes described above.

Sincerely yours,
Stephen R. Anderson
Dorothy R. Diebold Professor of Linguistics
Yale University
President (2007), Linguistic Society of America
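
As an illustration of the letter's point about ST_TextLanguageID, the sketch below contrasts an unrestricted string with a string checked against a convention. It is not part of the letter as sent and is not taken from ECMA 376 or from BCP 47: the pattern covers only a simplified subset of BCP 47 tag shapes (a language subtag, an optional script, and an optional region), and the names `SIMPLE_TAG` and `parse_simple_tag` are hypothetical.

```python
import re
from typing import Optional

# Simplified subset only: language[-Script][-REGION].
# Full BCP 47 syntax allows more (variants, extensions, private use, etc.).
SIMPLE_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?$",
    re.IGNORECASE,
)


def parse_simple_tag(tag: str) -> Optional[dict]:
    """Split a tag into subtags, or return None if it does not fit the subset."""
    match = SIMPLE_TAG.match(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    # Tags are case-insensitive; normalize to the conventional casing.
    return {
        "language": match.group("language").lower(),
        "script": script.capitalize() if script else None,
        "region": region.upper() if region else None,
    }


print(parse_simple_tag("en-us"))
print(parse_simple_tag("zh-hant-tw"))
print(parse_simple_tag("not a language tag"))
```

A type that accepts any string would admit all three inputs equally. With even this reduced convention, the first two are given a defined form and meaning and the third is rejected.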
