Okay, time for some code to really test this stuff.

First run it with Unicode off: it cuts the string at é.
Now run it with Unicode on: it just jumps over é. (This is the behavior I want.)

Conclusion: both times it reads the ASCII string as UTF-8, so the result should have been the same.

=> Why do you try to read it as a UTF-8 string? With the Unicode option switched off, the string "Mélissa" is not stored as a UTF-8 string. In your case just use the default format, because the ASCII or Unicode character set (#PB_ASCII or #PB_Unicode) will be used depending on the compiler option!

=> Pay attention to the fact that PeekS(..., #PB_xxxxx) will return the string in ASCII or Unicode depending on the compiler option which was used, regardless of the source string format #PB_xxxxx!

=> Use the optional flags (#PB_ASCII, #PB_UTF8, #PB_Unicode, #PB_UTF16, ...) when reading strings from files or network streams and when writing strings to files or network streams.

=> I suggest always using UTF-8 for file encoding and "Create unicode executable" as the compiler option! Today there are only rare application areas where you should use plain ASCII file encoding with "Create unicode executable" switched off.

=> If you try these tests with different fonts in the editor and the debugger output, the results can differ if the debugger font cannot display the same characters as the editor font. The Unicode option and UTF-8 encoding make no sense if you use a non-Unicode font to display strings.

You can still process ASCII data, read ASCII data converted on the fly to Unicode and create ASCII and UTF8 buffers from Unicode strings, but the string model...

True, well, it's some code found on the forum which my dad (whom I'm trying to get to use PureBasic) used for downloading web pages into a string, since most web pages are UTF-8 encoded, I guess. But then apparently he came across a web page which used ASCII and had this é character somewhere in the text. Since é is code 233, in the extended range above 7-bit ASCII, that single byte cannot be read as valid UTF-8. And that caused some very strange behavior in his program. So that's why I started thinking about this.

Sure, UTF-8 is backwards compatible with ASCII, so I just changed his code to read it as ASCII and the problem was solved. But maybe it would be better for PureBasic to throw an error, or just jump over é.
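For readers who want to see the byte-level problem outside PureBasic, here is a minimal sketch in Python (not PureBasic, and not the code from this thread). It assumes the downloaded page stores é as the single byte 233 (0xE9), as in Latin-1. The helper name `decode_page` is hypothetical. The sketch shows the three outcomes discussed above: a strict UTF-8 read fails, a lenient read jumps over é, and reading with the correct encoding recovers the text.

```python
# Illustration only: Python, not PureBasic. decode_page is a hypothetical helper.

# "Mélissa" as a single-byte (Latin-1) string: é is the lone byte 0xE9 (233).
RAW = "M\u00e9lissa".encode("latin-1")


def decode_page(data: bytes) -> str:
    """Try UTF-8 first; fall back to Latin-1 if the bytes are not valid UTF-8.

    Assumption: any non-UTF-8 page is Latin-1. Latin-1 accepts every byte,
    so this fallback never fails, but it may mis-decode other encodings.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


# Strict UTF-8 decoding rejects the lone 0xE9 byte ("throw an error").
try:
    RAW.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding drops the invalid byte ("just jump over é").
skipped = RAW.decode("utf-8", errors="ignore")

# Decoding with the encoding the data actually uses keeps the é.
recovered = decode_page(RAW)
```

The fallback mirrors the fix described in the thread (read the data with the encoding it really uses), while `errors="ignore"` corresponds to the "jump over é" behavior.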