UTF16 RFC + UTF8 Default Reading and Writing ...

May 20, 2009 at 2:14 PM

This RFC explains UTF16:

http://www.ietf.org/rfc/rfc2781.txt

Somewhere in there you will find that UTF16 uses one or two 16-bit units per character.
So in theory a character could use all 32 bits, but that's not the case: reserved ranges (the surrogate values 0xD800 to 0xDFFF) determine how each 16-bit unit is interpreted.
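To make the ranges concrete, here is a rough Python sketch of the encoding steps as I read them in RFC 2781. The function name encode_utf16 is my own, not something from the RFC or from .NET, so treat it as an illustration only.

```python
def encode_utf16(code_point):
    """Return the 16-bit units for one Unicode code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can encode")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are reserved, not characters")
    if code_point < 0x10000:
        # Fits in a single 16-bit unit.
        return [code_point]
    # Otherwise split the remaining 20 bits across two surrogate units.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)     # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)    # bottom 10 bits
    return [high, low]


print([hex(u) for u in encode_utf16(0x41)])
print([hex(u) for u in encode_utf16(0x1D11E)])
```

Only 20 of the 32 bits in a pair carry data; the fixed prefixes mark each unit as the high or low half, which is why a reader can tell them apart.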


Re: UTF8 being the Default encoding and NOT UTF16

Easy: create a StreamReader and a StreamWriter with only the file name as a parameter and check the encoding property on each (CurrentEncoding on StreamReader, Encoding on StreamWriter).

You will see that in both cases it's UTF8, not UTF16 as the book states.

Case closed... now can I get a refund?

May 21, 2009 at 6:48 PM

I think I did not convey my original point correctly. I was not talking about reading and writing streams; I was talking about the internal representation of a character.

"UTF-16 is often used natively, as in the Microsoft.Net char type, the Windows WCHAR type, and other common types. Most common Unicode code points take only one UTF-16 code point (2 bytes). Unicode supplementary characters U+10000 and greater still require two UTF-16 surrogate code points."

http://msdn.microsoft.com/en-us/library/zs0350fy.aspx

Also mentioned on page 125 of the book.

May 21, 2009 at 8:09 PM

We are both on the same wavelength; the original post targets both issues.

Issue 1: size of a UTF16 char on disk --> 2 or 4 bytes

Issue 2: default encoding when using StreamReader/StreamWriter --> UTF8
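
For issue 1, a quick way to check the byte counts is to encode a few characters and measure the result. This is a Python sketch, not .NET, so it says nothing about the StreamReader/StreamWriter defaults in issue 2; the helper name encoded_size and the sample characters are my own choices. I use "utf-16-le" so that no byte order mark is added to the count.

```python
def encoded_size(text, encoding):
    """Number of bytes the text occupies in the given encoding (hypothetical helper)."""
    return len(text.encode(encoding))


samples = {
    "A": "A",                    # U+0041, basic Latin
    "e-acute": "\u00e9",         # U+00E9, still below U+10000
    "G clef": "\U0001D11E",      # U+1D11E, a supplementary character
}

for name, ch in samples.items():
    print(name, "UTF16:", encoded_size(ch, "utf-16-le"),
          "UTF8:", encoded_size(ch, "utf-8"))
```

Characters below U+10000 come out as 2 bytes in UTF16 and the supplementary one as 4, while the same text in UTF8 ranges from 1 to 4 bytes.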