Bug Id:
Reporter: Pvt_Ryan
Product/Version: Crimson Editor / Version 3.72 (beta, prior to r241)
Status: Confirmed
Severity: Enhancement
Duplicate Of: - none -
Summary: php & HTML default to ASCII and cannot detect UTF-8 unless reloaded
Report Time: April 10, 2007 10:34:13 AM
Assignment: - none -
Resolution: Open
Priority: Low
Dependencies: - none -


Votes
For: 1 (100%)
Against: 0 (0%)
Total: 1

April 10, 2007 10:34:13 AM Pvt_Ryan
--------------
Arantor Wrote:
--------------
One of the recent queries raised on crimsoneditor.com was regarding defaulting to UTF-8 (or not) on file load. The original query is posted below for anyone who wants to see what exactly was asked, but my take on it is as follows.

CE currently allows UTF-8 files to be loaded (and presumably saved) as UTF-8 as well as ASCII. It also offers big- and little-endian Unicode, and UTF-8 with and without a BOM, which is where the query below came in.

It defaults to ASCII, and doesn't seem to have any mechanism either to detect when it should use UTF-8 or to let you override that with your own default.

So, can we include somewhere a dropdown (combobox) in Preferences (under File, presumably) to specify the default choice?
--------------
--------------
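
For anyone following the request, here is a rough sketch of how an editor could pick an encoding on load: look for a BOM first, then check whether the bytes are valid UTF-8, and only then fall back to a user-chosen default. This is not Crimson Editor's code. The function name guess_encoding and the default and legacy parameters are made up for illustration, and default stands in for the Preferences dropdown proposed above.

```python
import codecs

# Hypothetical helper, not part of Crimson Editor.
# BOM -> codec name; the "utf-16" codec reads the BOM to pick endianness.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, default: str = "ascii", legacy: str = "latin-1") -> str:
    """Guess a text encoding for raw file bytes.

    default: assumed user preference for files that are plain ASCII.
    legacy:  assumed 8-bit fallback when the bytes are not valid UTF-8.
    """
    # 1. An explicit BOM wins.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Pure ASCII is ambiguous, so honour the configured default.
    if data.isascii():
        return default
    # 3. Non-ASCII bytes that decode strictly as UTF-8 are almost
    #    certainly UTF-8 without a BOM.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return legacy


if __name__ == "__main__":
    print(guess_encoding(b"<?php echo 'hi'; ?>"))
    print(guess_encoding("עברית".encode("utf-8")))
```

The strict UTF-8 check in step 3 is a heuristic, not a guarantee, but random 8-bit text rarely happens to be valid UTF-8, which is why BOM-less detection tends to work in practice.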



July 5, 2007 12:24:01 PM Raf
"CE currently allows UTF-8 files to be loaded (and presumably saved) as UTF-8 as well as ASCII"

This is true, as long as there are no characters being used that are outside the ASCII charset. That is, the file will still be UTF-8, but any "funny" characters will be saved as question marks.

July 13, 2007 04:14:10 PM Ankit Singla
When you say "saved as ?", Raf, do you mean that they get saved as the character "?", or that they show up as "?" but get saved as the original character?

July 14, 2007 11:20:02 AM Raf
Yes, I mean they actually get saved as question marks. Even if I copy some characters from something encoded as UTF-8, such as these Hebrew characters I got off Wikipedia (עברית), they get pasted into CE as question marks. The same with Greek characters, or anything outside of ISO-8859-1, it seems. I tried this with CE's UTF-8 encoding type with and without the BOM and it's the same. The result is also the same when opening a file saved as UTF-8 elsewhere, like Notepad (which, incidentally, saves with a BOM).
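
The symptom described above looks like what a lossy conversion to an 8-bit charset produces. The snippet below is only an illustration in Python, not a confirmed diagnosis of what CE does internally. It shows the same Hebrew word turning into question marks when forced into ASCII or ISO-8859-1, surviving a UTF-8 round trip, and gaining the three BOM bytes when written the way Notepad writes UTF-8.

```python
text = "עברית"  # five Hebrew letters, all outside ASCII and ISO-8859-1

# Lossy: every character the target charset cannot represent becomes "?".
ascii_bytes = text.encode("ascii", errors="replace")
latin1_bytes = text.encode("latin-1", errors="replace")

# Lossless: UTF-8 without and with a BOM.
utf8_bytes = text.encode("utf-8")
utf8_bom_bytes = text.encode("utf-8-sig")

print(ascii_bytes)                        # b'?????'
print(latin1_bytes)                       # b'?????'
print(utf8_bytes.decode("utf-8") == text) # True
print(utf8_bom_bytes[:3])                 # b'\xef\xbb\xbf'
```

Once the characters have been replaced this way, the original text is gone from the bytes, which matches the report that the question marks are what actually gets saved.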