This page contains information relating to the specification of CDL, the Character Description Language. CDL is an XML application, designed for precise and compact description and indexing of any and all 漢/汉 Han (Chinese, Japanese, Korean, and Vietnamese) characters, encoded and unencoded.
The basic elements of CDL are a standard grid space, and a set of basic stroke types. Using these simple elements, CDL provides a framework for describing characters and components, and for reuse of character and component descriptions in the descriptions of other characters and components. CDL also supports a variant mechanism, for associating any number of CDL descriptions with any Unicode codepoint.
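The reuse idea described above can be pictured with a small Python sketch. This is not the CDL XML schema: the class names, the stroke-type label "h", the 128-unit grid extent, and the bounding-box convention are all assumptions made for illustration. It shows only how one description might be placed, scaled, inside another.

```python
from dataclasses import dataclass, field

GRID = 128  # assumed grid extent, for illustration only


@dataclass
class Stroke:
    kind: str     # hypothetical stroke-type label, e.g. "h" for horizontal
    points: list  # list of (x, y) pairs in grid coordinates


@dataclass
class Component:
    char: str     # character whose existing description is reused
    box: tuple    # (x0, y0, x1, y1): where the reused description is placed


@dataclass
class Description:
    char: str
    parts: list = field(default_factory=list)  # Strokes and Components


def flatten(desc, table):
    """Expand nested components into plain strokes in desc's own grid."""
    out = []
    for part in desc.parts:
        if isinstance(part, Stroke):
            out.append(part)
            continue
        x0, y0, x1, y1 = part.box
        sx, sy = (x1 - x0) / GRID, (y1 - y0) / GRID
        # Recursively flatten the reused description, then map each point
        # from the full grid into the component's bounding box.
        for s in flatten(table[part.char], table):
            pts = [(x0 + x * sx, y0 + y * sy) for x, y in s.points]
            out.append(Stroke(s.kind, pts))
    return out


# 一 is described directly; 二 is described by reusing 一 twice.
table = {
    "一": Description("一", [Stroke("h", [(0, 64), (128, 64)])]),
    "二": Description("二", [
        Component("一", (32, 0, 96, 64)),
        Component("一", (0, 64, 128, 128)),
    ]),
}

strokes = flatten(table["二"], table)
for s in strokes:
    print(s.kind, s.points)
```

A variant mechanism could be modelled in the same spirit by mapping one codepoint to a list of such descriptions rather than to a single one, though how CDL itself represents variants is not covered by this sketch.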
by 毕晓普 (畢曉普) & 曲理查
News in brief (reverse chronological):
In Unicode/ISO parlance, certain blocks of 漢 Hàn characters are called "CJK Unified Ideographs". CJK (a trademark of the RLG) stands for "Chinese, Japanese, and Korean", and is sometimes extended to CJKV, "Chinese, Japanese, Korean, and Vietnamese". Scripts in all four of these locales make use of Chinese-derived characters. These characters are "Unified" in that locale-specific differences in character forms have been ignored for encoding purposes. These characters are "Chinese-derived" in that the principles for character creation originated in China (more than 3,000 years ago). These characters are termed 漢 "Hàn" because of the legacy of the influential Hàn Dynasty script reforms (c. 121 A.D.). Of course, each locale has characters that are unique to it, and different locales may also have different stylistic conventions (typeface expectations).
The term "ideograph" is used in information-technology circles to signify 'the uniquely CJKV script entity'; that is, "ideographs" constitute a certain subset of the "characters" to be found in Asian texts. Japanese Kana (Hiragana and Katakana) are also "characters" in Japanese, but are not considered "ideographs" (though they derive historically from such "ideographs").
The term "ideograph" in info-tech might also be understood to indicate that the phonological information conveyed is somewhat imprecise relative to strictly "phonographic" scripts (those using alphabets and isographic syllabaries to convey specific sound values). "A syllabary being a system for writing the elements of the syllable canon of a language, the syllabograph would be a graphical element of a syllabary. When there is a one-to-one correspondence between syllable type and syllabograph, this is an isographic syllabary. In that it sometimes has multiple representations of a given syllable type, the Chinese writing system might be termed an imperfect or heterographic syllabary. Chinese characters, the elements of a heterographic syllabary, might be termed heterographic syllabographs, or heterosyllabographs. No matter what they are called, there is clearly some degree of imprecision in the Chinese script, in terms of its ability to convey specific sound values." [Cook, 2003:195]
At any rate, these are some of the distinctions that the Unicode/ISO "ideograph" (or "ideogram", as it is sometimes written) might be understood to convey. The term "ideograph" is not used in info-tech to indicate pure "idea writing" (circumventing graphic representation of speech), nor is it used to indicate the small set of 指事 zhǐshì 'indicative of the deed' Chinese characters.
In naming CDL, we use "character" with its common English meaning, intentionally avoiding the uncommonly understood (or commonly misunderstood) information-technology terms "ideograph" and "glyph". Arguably, there are some other good reasons not to call CDL a "CJK Ideographic Glyph Description Language".
Finally, if you are not bothered by the jargon and prefer to think of CDL as a C(IG)DL "CJK (Ideographic Glyph) Description Language", please feel free to do so.
Last updated: Friday, April 27, 2007
Copyright © 2003-2007 Wenlin Institute, Inc. All Rights Reserved.