Double-byte character set

From Wikipedia, the free encyclopedia

A double-byte character set (DBCS) is a character encoding in which either every character (including control characters) is encoded in two bytes, or only those graphic characters that cannot be represented in an accompanying single-byte character set (SBCS) are encoded in two bytes; Han characters generally make up most of these two-byte characters. A DBCS supports national languages that contain many unique characters or symbols: one byte can represent at most 256 characters, while two bytes can represent up to 65,536. Examples of such languages include Japanese and Chinese. Korean Hangul does not contain as many characters, but KS X 1001 supports both Hangul and Hanja, and uses two bytes per character.
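
The arithmetic above can be checked with a short Python sketch. The helper name code_space is illustrative, and the example assumes that Python's built-in "euc_kr" codec is an acceptable stand-in for the KS X 1001 repertoire.

```python
def code_space(num_bytes: int) -> int:
    """Number of distinct values that fit in num_bytes 8-bit bytes."""
    return 2 ** (8 * num_bytes)

print(code_space(1))  # 256
print(code_space(2))  # 65536

# Assumption: the "euc_kr" codec carries the KS X 1001 characters.
hangul = "한".encode("euc_kr")  # a Hangul syllable
hanja = "漢".encode("euc_kr")   # a Hanja (Han) character
print(len(hangul), len(hanja))  # 2 2
```

Both the Hangul syllable and the Hanja character occupy two bytes, in line with the statement that KS X 1001 uses two bytes per character.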

In CJK computing

The term DBCS traditionally refers to a character encoding where each graphic character is encoded in two bytes.

In an 8-bit code, such as Big5 or Shift JIS, a character from the DBCS is represented with a lead (first) byte that has its most significant bit set (i.e., a value of 0x80 or above, which does not fit in seven bits), and the DBCS is paired with a single-byte character set (SBCS). For the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with half-width characters and the DBCS with full-width characters. In a 7-bit code such as ISO-2022-JP, escape sequences or shift codes are used to switch between the SBCS and DBCS.
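
As a rough illustration of the two approaches, the following Python sketch encodes the same two-character string with the standard "shift_jis" and "iso2022_jp" codecs. The helper has_high_bit is a hypothetical name introduced here, not part of any library.

```python
def has_high_bit(byte: int) -> bool:
    """True when the most significant bit of an 8-bit value is set (>= 0x80)."""
    return byte & 0x80 != 0

sample = "A漢"  # one half-width ASCII letter, one full-width Han character

# 8-bit code: the lead byte of the double-byte character has its high bit set.
sjis = sample.encode("shift_jis")
print(sjis.hex(" "))                     # 41 8a bf
print([has_high_bit(b) for b in sjis])   # [False, True, True]

# 7-bit code: every byte stays below 0x80, and an escape sequence
# (ESC $ B) switches from the single-byte set to the double-byte set.
jis7 = sample.encode("iso2022_jp")
print(all(b < 0x80 for b in jis7))       # True
print(b"\x1b$B" in jis7)                 # True
```

Only the lead byte is a reliable marker in Shift JIS: the trail byte happens to have its high bit set in this example, but other Shift JIS trail bytes fall in the 7-bit range.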

Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022. For example, "DBCS" can sometimes mean a double-byte encoding that is specifically not Extended Unix Code (EUC).

This original meaning of DBCS is different from what some consider correct usage today. Some insist that these character encodings be properly called multi-byte character sets (MBCS) or variable-width encodings, because character encodings such as EUC-JP, EUC-KR, EUC-TW, GB 18030, and UTF-8 use more than two bytes for some characters and a single byte for others.
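
A minimal Python sketch of why "variable-width" is the more accurate label: it collects the distinct byte lengths that a small sample produces under two of the encodings named above. The function name byte_widths and the sample string are illustrative choices, not taken from any standard.

```python
def byte_widths(text: str, encoding: str) -> set[int]:
    """Distinct per-character byte lengths of text in the given encoding."""
    return {len(ch.encode(encoding)) for ch in text}

sample = "A漢😀"  # ASCII letter, Han character, supplementary-plane emoji

utf8_widths = byte_widths(sample, "utf-8")
gb_widths = byte_widths(sample, "gb18030")

print(sorted(utf8_widths))  # [1, 3, 4]
print(sorted(gb_widths))    # [1, 2, 4]
```

An encoding with more than one entry in the result is variable-width for that sample; a strict double-byte encoding would yield only the value 2.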

Ambiguity

Some people use DBCS to mean the UTF-16 and UTF-8 encodings, while others use it to mean older (pre-Unicode) character encodings that use more than one byte per character. Shift JIS, GB 2312 and Big5 are a few character encodings that can use more than one byte per character, but applying the term DBCS even to these is incorrect, because they are really variable-width encodings (as are both UTF-16 and UTF-8). Some IBM mainframes do have true DBCS code pages, which contain only the double-byte portion of a multi-byte code page.
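
The point that UTF-16 is itself variable-width can be illustrated with a short Python sketch that splits a string into 16-bit code units. The helper utf16_code_units is a hypothetical name, and the big-endian form without a byte order mark is chosen only to keep the output easy to read.

```python
def utf16_code_units(text: str) -> list[int]:
    """Split text into 16-bit UTF-16 code units (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

units = utf16_code_units("漢😀")
print([hex(u) for u in units])  # ['0x6f22', '0xd83d', '0xde00']

# Two characters, but three code units: the emoji needs a surrogate pair,
# so it occupies four bytes rather than two.
print(len("漢😀"), len(units))  # 2 3
```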

The term "DBCS enablement", when used for software internationalization, is ambiguous. It can mean either writing software for East Asian markets using older, code-page-based technology, or using Unicode. Sometimes the term also implies translation into an East Asian language. Usually "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means internationalizing it by using the mutually incompatible character encodings of the various East Asian countries. Since Unicode, unlike many other character encodings, supports all the major languages in East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually only desired when much older operating systems or applications do not support Unicode.
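
The practical difference can be sketched in Python by trying to encode a string that mixes Japanese and Korean text. The helper can_encode and the sample strings are illustrative assumptions; the codec names are those of Python's standard library.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

mixed = "日本語 한국어"  # Japanese and Korean in one string

print(can_encode("日本語", "shift_jis"))  # True: Japanese fits its own code page
print(can_encode("한국어", "euc_kr"))     # True: Korean fits its own code page
print(can_encode(mixed, "shift_jis"))     # False: no Hangul in Shift JIS
print(can_encode(mixed, "big5"))          # False: no Hangul in Big5
print(can_encode(mixed, "utf-8"))         # True: Unicode covers both
```

With legacy code pages, software has to select an encoding per market; a Unicode encoding handles the mixed string without that selection step.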

TBCS

A triple-byte character set (TBCS) is a character encoding in which characters (including control characters) are encoded in three bytes.

This page was last edited on 29 February 2024, at 04:36
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.