-UNICODE-

.

-as of [24 AUGUST 2024]

.

*UNICODE CONSORTIUM*

.

“UNICODE TRANSFORMATION FORMAT”

.

-[unicode] is a ‘computing industry standard’ for the consistent (‘encoding’ / ‘representation’ / ‘handling’) of ‘text’ expressed in most of the world’s [writing systems]-

.

(the standard is maintained by the ‘unicode consortium’, and as of ‘may 2019’ the most recent version (unicode 12.1) contains a repertoire of ‘137,994 characters’ covering 150 modern + historic scripts, as well as ‘multiple symbol sets’ and ‘emoji’)

.

(the ‘character repertoire’ of the ‘unicode standard’ is synchronized with ‘ISO/IEC 10646’, and both are ‘code-for-code identical’)

.

(the ‘unicode standard’ consists of…)

*a set of ‘code charts’ for ‘visual reference’*

*an ‘encoding method’ and set of ‘standard character encodings’*

*a set of ‘reference data files’*

.

(…and a number of related items, such as…)

*character properties*

*rules for ‘normalization’*

*de-composition*

*collation*

*rendering*

.

*‘bi-directional display order’ for the correct display of text containing both ‘right-to-left scripts’ (such as ‘arabic’ + ‘hebrew’) and ‘left-to-right scripts’*
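
(the ‘normalization’ and ‘de-composition’ items listed above can be sketched with python’s standard ‘unicodedata’ module; the sample character ‘é’ is an illustrative assumption, not something from the original notes)

```python
import unicodedata

# Illustrative sample (an assumption): "é" written two different ways.
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

# The two strings look the same when rendered but compare as different.
looks_equal_but_differs = composed != decomposed

# NFD applies canonical decomposition; NFC recomposes afterwards.
nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", decomposed)

print(len(composed), len(decomposed))
print(nfd == decomposed, nfc == composed)
```

(normalizing both sides to the same form before comparing is one common way to treat such strings as equal)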

.

(unicode’s success at unifying ‘character sets’ has led to its ‘widespread’ + ‘predominant’ use in the ‘internationalization’ + ‘localization’ of ‘computer software’)

.

(the ‘standard’ has been implemented in many recent technologies, including…)

*modern operating systems*

*XML*

*java* 
(and other ‘programming languages’)

*the ‘.NET framework’*

.

(‘unicode’ can be implemented by different ‘character encodings’)

(the ‘unicode standard’ defines (‘UTF-8’ / ‘UTF-16’ / ‘UTF-32’), and several other ‘encodings’ are in use)

(the most commonly used ‘encodings’ are ‘UTF-8’, ‘UTF-16’, and ‘UCS-2’ (a precursor of ‘UTF-16’, without full support for ‘unicode’))

(‘GB18030’ is standardized in ‘china’ and implements ‘unicode’ fully, while not being an ‘official unicode standard’)

(‘UTF-8’ (the dominant encoding on the ‘world wide web’, used in over 93% of websites) uses ‘1 byte’ for each of the ‘first 128 code points’, and up to ‘4 bytes’ for other characters)

(the first 128 unicode code points are the ‘ASCII characters’, which means that any ‘ASCII text’ is also a ‘UTF-8 text’)
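
(a minimal ‘python’ sketch of the ‘UTF-8’ byte counts described above; the sample characters and the helper name ‘utf8_length’ are illustrative assumptions, not part of the original notes)

```python
# Sample code points (assumed for illustration), one per UTF-8 length.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # U+0041, U+00E9, U+20AC, U+1F600

def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses for a single code point."""
    return len(ch.encode("utf-8"))

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")

# Any ASCII text produces identical bytes under the ASCII and UTF-8 codecs.
ascii_text = "plain ASCII text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```

(the four samples were picked so that each one lands on a different length, from ‘1 byte’ up to ‘4 bytes’)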

(‘UCS-2’ uses ‘2 bytes’ (/ ‘16 bits’) for each ‘character’ but can only encode the ‘first 65,536 code points’ (the so-called ‘basic multi-lingual plane’ (BMP)))

(with ‘1,114,112 code points’ on ‘17 planes’ being possible, and with over ‘137,000 code points’ defined so far, ‘UCS-2’ can represent fewer than half of all ‘encoded unicode characters’)

(therefore, ‘UCS-2’ is outdated, though still widely used in ‘software’)
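
(the arithmetic behind the ‘17 planes’ and the ‘UCS-2’ limit can be checked with a short sketch; the constant names and the helper ‘fits_in_ucs2’ are hypothetical, introduced only for illustration)

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16          # 65,536 code points, the size of the BMP

total_code_points = PLANES * CODE_POINTS_PER_PLANE   # 1,114,112
highest_code_point = total_code_points - 1           # U+10FFFF

def fits_in_ucs2(ch: str) -> bool:
    """True if the code point lies in the BMP, i.e. fits in one 16-bit unit."""
    return ord(ch) < CODE_POINTS_PER_PLANE

print(total_code_points, hex(highest_code_point))
print(fits_in_ucs2("A"), fits_in_ucs2("\U0001F600"))

# Upper bound on the share of the 137,994 characters (Unicode 12.1 figure
# quoted in these notes) that a 65,536-slot encoding could cover.
ucs2_share_upper_bound = CODE_POINTS_PER_PLANE / 137_994
print(round(ucs2_share_upper_bound, 3))
```

(the last figure is only an upper bound, since not every code point in the BMP is an assigned character)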

(‘UTF-16’ extends ‘UCS-2’, by using the same ‘16-bit encoding’ as ‘UCS-2’ for the ‘basic multi-lingual plane’, and a ‘4-byte encoding’ for the other ‘planes’)

(as long as it contains no ‘code points’ in the ‘reserved range’ U+D800–U+DFFF, a ‘UCS-2 text’ is a ‘valid UTF-16 text’)
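
(a hedged sketch of how the ‘4-byte encoding’ uses the ‘reserved range’ U+D800–U+DFFF: a code point outside the BMP is split into two 16-bit ‘surrogates’; the function name ‘to_surrogate_pair’ and the sample code point U+1F600 are assumptions for illustration, and the result is cross-checked against python’s built-in ‘utf-16-be’ codec)

```python
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> U+DC00..U+DFFF
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")

# Cross-check: the built-in codec should emit the same two 16-bit units.
encoded = "\U0001F600".encode("utf-16-be")
manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")
print(encoded == manual)

# A BMP character still takes a single 16-bit unit, as in UCS-2.
bmp_size = len("A".encode("utf-16-be"))
```

(because both halves of a pair fall inside the reserved range, a decoder can tell a surrogate apart from an ordinary BMP character)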

(‘UTF-32’ (also referred to as ‘UCS-4’) uses ‘4 bytes’ for each ‘character’)

(like UCS-2, the # of ‘bytes per character’ is ‘fixed’, facilitating ‘character indexing’; but unlike ‘UCS-2’, ‘UTF-32’ is able to encode all ‘unicode code points’)

.

(however, because each character uses ‘4 bytes’, ‘UTF-32’ takes significantly more space than other encodings, and is not widely used)
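
(the space trade-off and the ‘character indexing’ benefit can be compared in a small sketch; the sample text and the helper ‘code_point_at’ are illustrative assumptions, and the ‘-le’ codec variants are used so that no byte-order mark is added to the counts)

```python
# Illustrative sample (an assumption): ASCII letters, a BMP symbol, and an emoji.
text = "abc \u20ac \U0001F600"   # 7 code points

# Encoded size in bytes under each encoding form.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)

def code_point_at(data: bytes, index: int) -> str:
    """Fetch the index-th code point from UTF-32 data by fixed-width slicing."""
    start = index * 4                      # every code point occupies 4 bytes
    return data[start:start + 4].decode("utf-32-le")

utf32_data = text.encode("utf-32-le")
last = code_point_at(utf32_data, 6)
print(f"U+{ord(last):04X}")
```

(fixed-width slicing like this works for ‘UTF-32’ because every code point occupies exactly ‘4 bytes’; the same trick on ‘UTF-8’ or ‘UTF-16’ data would need a scan from the start)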
