Session vars...

Animal: Dog
Count: 1

[Another page that uses this session]

UTF-8

This file displays OK if saved as regular ANSI despite the header saying it's UTF-8, because the basic ASCII characters (the single-byte range that ANSI and UTF-8 share) are encoded identically in both. However, if I start typing other 'out of range' chars, like the copyright symbol ("©"), then it's not going to show correctly unless the document is actually saved as UTF-8.

Presumably linked stylesheets don't need to be encoded as UTF-8 unless they contain unicode characters? Unlikely?!


Penders: "Just to note, the document will need to be saved as UTF-8 encoding as well (without a BOM - signature) - so your editor will need to support this. And presumably if you are using a CMS to store content in a DB then the DB will need to support/save as UTF-8?"

Why does it need to be saved in UTF-8? The source file I'm using is in ANSI and it displays in HTML with the headers in UTF-8 because I read some values from DB that are in UTF-8. In fact, if I encode the source file to UTF-8 and I use session_start or something that uses the headers, it sends some weird characters before and fails. So all my source files are encoded in ANSI even though it displays UTF-8 characters in Portuguese and so on.

Why does it need to be saved in UTF-8?

If you are typing any Unicode characters directly into your HTML/PHP document then you will need to save the file as UTF-8. I guess you don't have any 'special' chars in the document itself? You are lucky in this respect, as the (single byte) characters you are using are a subset of UTF-8.

The document is ANSI but you are telling the browser to display it as UTF-8. Try typing the copyright symbol (ALT+0169 on Windows), save it as ANSI, tell the browser it's UTF-8 and it won't display correctly. Save it as ANSI, display it as ANSI - OK. Save it as UTF-8, display it as UTF-8 - OK.
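
The save/display combinations above can be checked at the byte level. A minimal sketch in Python (not PHP, chosen only because it makes the raw bytes easy to inspect), assuming 'ANSI' means Windows-1252 (cp1252), which is an assumption about the editor and not something stated in the thread:

```python
# Assumption: "ANSI" is Windows-1252 (cp1252).
copyright_sign = "\u00a9"  # the copyright symbol
plain = "Animal: Dog"      # only basic ASCII characters

ansi_bytes = copyright_sign.encode("cp1252")  # one byte: 0xA9
utf8_bytes = copyright_sign.encode("utf-8")   # two bytes: 0xC2 0xA9

# Saved as ANSI, displayed as ANSI: OK
ansi_as_ansi = ansi_bytes.decode("cp1252")

# Saved as UTF-8, displayed as UTF-8: OK
utf8_as_utf8 = utf8_bytes.decode("utf-8")

# Saved as ANSI, displayed as UTF-8: a lone 0xA9 is not valid UTF-8,
# so the decoder substitutes the replacement character U+FFFD.
ansi_as_utf8 = ansi_bytes.decode("utf-8", errors="replace")

# Plain ASCII text produces the same bytes either way,
# which is why an all-ASCII 'ANSI' file survives being served as UTF-8.
same_bytes = plain.encode("cp1252") == plain.encode("utf-8")

print(ansi_bytes, utf8_bytes, ascii(ansi_as_utf8), same_bytes)
```

The last check is the reason the page copes: as long as the source file stays within the ASCII range, the two encodings are byte-for-byte identical.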

I read some values from DB that are in UTF-8.

They are in UTF-8 and you are displaying them as UTF-8 - OK. The rest of the 'ANSI' document shares the same codes as UTF-8, but only because I guess you are not using any out of the ordinary characters, just the regular a-z, A-Z, 0-9 and basic punctuation (the ASCII range, which is encoded identically in both). Start using curly quotes etc. and it will be a problem - but otherwise OK.

A test: These two 'ANSI' characters "Ï€" are in fact the single UTF-8 character for the 'Greek Small Letter Pi' (U+03C0). Change the character encoding in your browser to UTF-8 and you will see the UTF-8 character as intended. The other characters remain the same, yet they are ANSI (but share the same codes as UTF-8). This is how your webpage is coping.
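
The same test can be reproduced outside the browser. A small Python sketch, again assuming 'ANSI' is Windows-1252 (cp1252):

```python
# Assumption: "ANSI" is Windows-1252 (cp1252).
pi = "\u03c0"                  # Greek Small Letter Pi
pi_bytes = pi.encode("utf-8")  # two bytes: 0xCF 0x80

# Read as Windows-1252, those two bytes are two separate characters:
# U+00CF (I with diaeresis) and U+20AC (euro sign).
as_ansi = pi_bytes.decode("cp1252")

# Read as UTF-8, they are a single character again.
as_utf8 = pi_bytes.decode("utf-8")

print(pi_bytes, len(as_ansi), len(as_utf8))
```

Nothing about the bytes changes between the two views; only the encoding the reader assumes does.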

If I encode the source file to UTF-8 and I use session_start or something that uses the headers, it sends some weird characters before and fails.

Do you get something like:
Warning: session_start(): Cannot send session cache limiter - headers already sent...

This sounds as if you are including the BOM (Byte Order Mark) when you save the file as UTF-8? This must be omitted. The BOM occupies the first 3 bytes of the file (EF BB BF), although it is invisible to you in your text editor when viewed as UTF-8. Importantly, it sits before your "<?php ...". Unfortunately, as far as I'm aware, PHP does not understand the BOM. It will treat those bytes as output (the weird characters) sent before the headers, and session_start() will consequently fail.

In Notepad++ this is Format > Encode in UTF-8 without BOM
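
If your editor gives no indication either way, the BOM can be checked for (and removed) by looking at the raw bytes. A minimal sketch in Python; the helper names `has_utf8_bom` and `strip_utf8_bom` are made up for illustration, and the sample bytes stand in for a real file's contents:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Sample bytes standing in for a PHP file saved as UTF-8 with a BOM.
with_bom = codecs.BOM_UTF8 + b"<?php session_start(); ?>"
cleaned = strip_utf8_bom(with_bom)

print(has_utf8_bom(with_bom), has_utf8_bom(cleaned))
print(cleaned)
```

To apply this to a real file you would read and write it in binary mode ("rb" / "wb"), so that nothing decodes or re-encodes the contents along the way.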

See this recent thread for more info on the BOM (and removing it): http://www.webmasterworld.com/html/3591542.htm
