C# 4.0 in a Nutshell by Joseph Albahari & Ben Albahari
Author: Joseph Albahari & Ben Albahari
Language: eng
Format: epub
Tags: COMPUTERS / Programming Languages / Visual BASIC
ISBN: 9781449380458
Publisher: O'Reilly Media
Published: 2010-01-19T16:00:00+00:00
Note
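A StreamReader or StreamWriter can throw an exception if it encounters bytes that do not have a valid string translation for their encoding. Whether it throws or substitutes a replacement character depends on how the encoding object is configured.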
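A minimal sketch of this behavior, assuming a strict decoder is requested explicitly: the two-argument UTF8Encoding constructor with throwOnInvalidBytes set to true should make a StreamReader throw a DecoderFallbackException when it meets a byte that is never valid in UTF-8 (255 here). The class and method names are illustrative and do not come from the book.

```csharp
using System;
using System.IO;
using System.Text;

static class StrictDecodingDemo
{
    // Returns true if reading the data with a strict UTF-8 decoder throws.
    public static bool ThrowsOnInvalidBytes (byte[] data)
    {
        // throwOnInvalidBytes: true asks for an exception instead of
        // silent replacement of invalid byte sequences.
        var strictUtf8 = new UTF8Encoding (encoderShouldEmitUTF8Identifier: false,
                                           throwOnInvalidBytes: true);
        try
        {
            using (var reader = new StreamReader (new MemoryStream (data), strictUtf8))
                reader.ReadToEnd();
            return false;
        }
        catch (DecoderFallbackException)
        {
            return true;
        }
    }

    static void Main()
    {
        byte[] valid   = { 98, 117, 116, 226, 128, 148 };  // "but" + em dash in UTF-8
        byte[] invalid = { 98, 255, 116 };                 // 255 never occurs in UTF-8

        Console.WriteLine (ThrowsOnInvalidBytes (valid));
        Console.WriteLine (ThrowsOnInvalidBytes (invalid));
    }
}
```

Passing Encoding.UTF8 instead of the strict instance would be expected to substitute a replacement character rather than throw.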
The simplest of the encodings is ASCII, because each character is represented by one byte. The ASCII encoding maps the first 128 characters of the Unicode set (code points 0 to 127) into its single byte, covering what you see on a U.S.-style keyboard. Most other characters, including specialized symbols and non-English characters, cannot be represented and are converted to the □ character. The default UTF-8 encoding can map all allocated Unicode characters, but it is more complex. The first 128 characters encode to a single byte, for ASCII compatibility; the remaining characters encode to a variable number of bytes (most commonly two or three). Consider this:
using (TextWriter w = File.CreateText ("but.txt"))    // Use default UTF-8
  w.WriteLine ("but—");                               // encoding.

using (Stream s = File.OpenRead ("but.txt"))
  for (int b; (b = s.ReadByte()) > -1;)
    Console.WriteLine (b);
The word “but” is followed not by a stock-standard hyphen, but by the longer em dash (—) character, U+2014. This is the one that won’t get you into trouble with your book editor! Let’s examine the output:
98     // b
117    // u
116    // t
226    // em dash byte 1     Note that the byte values
128    // em dash byte 2     are >= 128 for each part
148    // em dash byte 3     of the multibyte sequence.
13     // <CR>
10     // <LF>
Because the em dash is outside the first 128 characters of the Unicode set, it requires more than a single byte to encode in UTF-8 (in this case, three). UTF-8 is efficient with the Western alphabet, as most popular characters consume just one byte. It also downgrades easily to ASCII simply by ignoring all bytes above 127. Its disadvantage is that seeking within a stream is troublesome, since a character’s position does not correspond to its byte position in the stream. An alternative is UTF-16 (labeled just “Unicode” in the Encoding class). Here’s how we write the same string with UTF-16:
using (Stream s = File.Create ("but.txt"))
using (TextWriter w = new StreamWriter (s, Encoding.Unicode))
  w.WriteLine ("but—");

foreach (byte b in File.ReadAllBytes ("but.txt"))
  Console.WriteLine (b);
The output is then:
255    // Byte-order mark 1
254    // Byte-order mark 2
98     // 'b' byte 1
0      // 'b' byte 2
117    // 'u' byte 1
0      // 'u' byte 2
116    // 't' byte 1
0      // 't' byte 2
20     // '—' byte 1
32     // '—' byte 2
13     // <CR> byte 1
0      // <CR> byte 2
10     // <LF> byte 1
0      // <LF> byte 2
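The two encodings can also be compared in memory, without writing a file. The following is a sketch under that assumption, using Encoding.GetBytes on the same "but" plus em dash text (written as the escape \u2014, and without the trailing newline). The class name and the Encode helper are illustrative. Unlike a StreamWriter created with Encoding.Unicode, GetBytes does not emit a byte-order mark, so the UTF-16 result starts directly with the bytes for 'b'.

```csharp
using System;
using System.Text;

static class EncodingComparison
{
    // Encodes text with the given encoding; no byte-order mark is added.
    public static byte[] Encode (Encoding encoding, string text)
    {
        return encoding.GetBytes (text);
    }

    static void Main()
    {
        string text = "but\u2014";   // "but" followed by an em dash

        // Print each byte as a decimal number, separated by spaces.
        Console.WriteLine (string.Join (" ", Encode (Encoding.UTF8, text)));
        Console.WriteLine (string.Join (" ", Encode (Encoding.Unicode, text)));
    }
}
```

For this string, UTF-8 needs six bytes (three for the letters, three for the em dash), while UTF-16 needs eight (two per character).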
Technically, UTF-16 uses either two or four bytes per character (there are close to a million Unicode characters allocated or reserved, so 2 bytes is not always enough). However, because the C# char type is itself only 16 bits wide, a UTF-16 encoding will always use exactly two bytes per .NET char. This makes it easy to jump to a particular character index within a stream.
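To illustrate the distinction between a Unicode character and a .NET char, here is a small sketch comparing the em dash with a character outside the 16-bit range. The choice of U+1D11E (musical symbol G clef) and the helper name Utf16ByteCount are assumptions made for the example.

```csharp
using System;
using System.Text;

static class SurrogateDemo
{
    // Number of bytes the text occupies in UTF-16, excluding any byte-order mark.
    public static int Utf16ByteCount (string text)
    {
        return Encoding.Unicode.GetByteCount (text);
    }

    static void Main()
    {
        string emDash = "\u2014";       // fits in a single 16-bit char
        string clef   = "\U0001D11E";   // needs two chars (a surrogate pair)

        Console.WriteLine (emDash.Length + " char(s), " + Utf16ByteCount (emDash) + " bytes");
        Console.WriteLine (clef.Length + " char(s), " + Utf16ByteCount (clef) + " bytes");
        Console.WriteLine (char.IsSurrogatePair (clef, 0));
    }
}
```

The clef is one Unicode character but two .NET chars, so it takes four bytes in UTF-16. The byte count per char stays at two in both cases, which is what makes indexing by char straightforward.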
UTF-16 uses a two-byte prefix
