CDL

Character Description Language

字形描述语言 (字描语)


An XML application for describing Han (CJKV) characters


This page contains information relating to the specification of CDL, the Character Description Language. CDL is an XML application, designed for precise and compact description and indexing of any and all 漢/汉 Han (Chinese, Japanese, Korean, and Vietnamese) characters, encoded and unencoded.

The basic elements of CDL are a standard grid space and a set of basic stroke types. Using these simple elements, CDL provides a framework for describing characters and components, and for reusing character and component descriptions in the descriptions of other characters and components. CDL also supports a variant mechanism for associating any number of CDL descriptions with any Unicode codepoint.
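The ideas in the paragraph above (a grid, stroke types, reuse of descriptions, and variants per codepoint) can be illustrated with a minimal Python sketch. This is not the CDL schema or any Wenlin API: the grid extent of 128, the stroke label "h", and the names `Stroke`, `ComponentRef`, `Description`, `flatten`, and `variants` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Point = Tuple[int, int]
GRID = 128  # assumed grid extent, chosen for illustration only


@dataclass(frozen=True)
class Stroke:
    kind: str                   # hypothetical stroke-type label, e.g. "h"
    points: Tuple[Point, ...]   # skeleton points in grid coordinates


@dataclass(frozen=True)
class ComponentRef:
    name: str                        # name of the description being reused
    box: Tuple[int, int, int, int]   # (x0, y0, x1, y1) placement on the grid


@dataclass
class Description:
    name: str
    parts: List[Union[Stroke, ComponentRef]]


def flatten(desc: Description, library: Dict[str, Description]) -> List[Stroke]:
    """Expand component references recursively into plain strokes."""
    strokes: List[Stroke] = []
    for part in desc.parts:
        if isinstance(part, Stroke):
            strokes.append(part)
            continue
        x0, y0, x1, y1 = part.box
        for s in flatten(library[part.name], library):
            # Scale the reused strokes from the full grid into the target box.
            pts = tuple(
                (x0 + x * (x1 - x0) // GRID, y0 + y * (y1 - y0) // GRID)
                for x, y in s.points
            )
            strokes.append(Stroke(s.kind, pts))
    return strokes


# 一: a single horizontal stroke across the middle of the grid.
yi = Description("一", [Stroke("h", ((0, 64), (128, 64)))])

# 二: described by reusing 一 twice, in two different boxes.
er = Description("二", [
    ComponentRef("一", (16, 0, 112, 64)),
    ComponentRef("一", (0, 64, 128, 128)),
])
# A second, purely illustrative description of the same character.
er_alt = Description("二 (alt)", [
    ComponentRef("一", (32, 0, 96, 64)),
    ComponentRef("一", (0, 64, 128, 128)),
])

library = {d.name: d for d in (yi, er, er_alt)}

# Variant mechanism: any number of descriptions per Unicode code point.
variants: Dict[int, List[Description]] = {0x4E00: [yi], 0x4E8C: [er, er_alt]}

for stroke in flatten(er, library):
    print(stroke.kind, stroke.points)
```

In this sketch a description stores either strokes or references to other descriptions, so a component is written once and placed many times; the variant table is simply a mapping from a codepoint to a list of alternative descriptions.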



Current drafts of core CDL documents

by 毕晓普 (畢曉普) & 曲理查




CDL Status

News in brief (reverse chronological):


Related Links:


Two passages from The Unicode Standard 4.0
Two papers from Academia Sinica's Info-Tech Laboratory:





Jargon notes:

In Unicode/ISO parlance, certain blocks of 漢 Hàn characters are called "CJK Unified Ideographs". CJK (a trademark of the RLG) stands for "Chinese, Japanese, and Korean", and is sometimes extended to CJKV, "Chinese, Japanese, Korean, and Vietnamese". Scripts in all four of these locales make use of Chinese-derived characters. These characters are "Unified" in that locale-specific differences in character forms have been ignored for encoding purposes. These characters are "Chinese-derived" in that the principles for character creation originated in China (more than 3,000 years ago). These characters are termed 漢 "Hàn" because of the legacy of the influential Hàn Dynasty (c. 121 A.D.) script reforms. Of course, each locale also has characters that are unique to it, and stylistic conventions (typeface expectations) may differ from locale to locale.

The term "ideograph" is used in information-technology circles to signify 'the uniquely CJKV script entity'; that is, "ideographs" constitute a certain subset of the "characters" to be found in Asian texts. Japanese Kana (Hiragana and Katakana) are also "characters" in Japanese, but are not considered "ideographs" (though they were themselves derived from "ideographs").

The term "ideograph" in info-tech might also be understood to indicate that the phonological information conveyed is somewhat imprecise relative to strictly "phonographic" scripts (those using alphabets and isographic syllabaries to convey specific sound values). "A syllabary being a system for writing the elements of the syllable canon of a language, the syllabograph would be a graphical element of a syllabary. When there is a one-to-one correspondence between syllable type and syllabograph, this is an isographic syllabary. In that it sometimes has multiple representations of a given syllable type, the Chinese writing system might be termed an imperfect or heterographic syllabary. Chinese characters, the elements of a heterographic syllabary, might be termed heterographic syllabographs, or heterosyllabographs. No matter what they are called, there is clearly some degree of imprecision in the Chinese script, in terms of its ability to convey specific sound values." [Cook, 2003:195]

At any rate, these are some of the distinctions that the Unicode/ISO "ideograph" (or "ideogram", as it is sometimes written) might be understood to convey. The term "ideograph" is not used in info-tech to indicate pure "idea writing" (circumventing graphic representation of speech), nor is it used to indicate the small set of 指事 zhǐshì 'indicative of the deed' Chinese characters.

In naming CDL, we use "character" with its common English meaning, intentionally avoiding the uncommonly understood (or commonly misunderstood) information-technology terms "ideograph" and "glyph". Arguably, there are some other good reasons not to call CDL a "CJK Ideographic Glyph Description Language".

  1. In terms of a "character" vs. "glyph" distinction: CDL descriptions lie somewhere between the abstract character (script entity class) and the concrete glyph (instantiated member of a script entity class). The underlying stroke-based CDL description is rather abstract in that it specifies only a skeleton trajectory (rather than a complete outline), which the CDL interpreter fleshes out. Once interpreted and rasterized, the abstract CDL character becomes a concrete CDL glyph.
  2. In terms of an "ideograph" vs. "character" (e.g. Kanji vs. Kana) distinction: CDL is truly a "character description language" in that its basic principles are applicable to any script entity in any script (not simply "ideographs"). Wenlin's own stroked Latin font, for example, is implemented using CDL technology: base characters and diacritics are composed of curved and straight segments; these basic components may then be combined to describe precomposed Latin script entities.
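The step from abstract skeleton to concrete glyph described in point 1 can be sketched as follows. This is only an illustration of the general idea, not the Wenlin CDL interpreter: the function names `dist_to_segment` and `rasterize`, the `half_width` parameter, and the 8 by 8 bitmap are assumptions. Every pixel that lies within `half_width` of the skeleton is switched on, so one skeleton yields different glyphs under different rendering choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rasterize(skeleton: Sequence[Point], size: int,
              half_width: float) -> List[List[bool]]:
    """Turn a skeleton polyline into a size-by-size bitmap."""
    segments = list(zip(skeleton, skeleton[1:]))
    return [
        [
            any(dist_to_segment((x, y), a, b) <= half_width
                for a, b in segments)
            for x in range(size)
        ]
        for y in range(size)
    ]


# One abstract skeleton (a horizontal stroke), two concrete renderings.
skeleton = [(1, 4), (6, 4)]
thin = rasterize(skeleton, size=8, half_width=0.5)
bold = rasterize(skeleton, size=8, half_width=1.25)

for row in bold:
    print("".join("#" if on else "." for on in row))
```

The description itself carries no thickness or outline; those are supplied at interpretation time, which is why the same description sits between "character" and "glyph".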

Finally, if you are not bothered by the jargon and prefer to think of CDL as a C(IG)DL "CJK (Ideographic Glyph) Description Language", please feel free to do so.


Standards Organizations:


Last updated: Friday, April 27, 2007



Copyright © 2003-2007 Wenlin Institute, Inc. All Rights Reserved.
