Asked  1 Year ago    Answers:  5   Viewed   6 times

Suppose I'm encoding my files as UTF-8.

Within a PHP script, a string is compared:

$string = "あ";
$string = utf8_encode($string); // Do I need this step?
if (preg_match('/あ/u', $string))
// Do something if it matches...

Is that string really UTF-8 without the utf8_encode() function? If you save your files as UTF-8, do you still need this function?

 Answers

4

If you read the manual entry for utf8_encode, it converts an ISO-8859-1 encoded string to UTF-8. The function name is a horrible misnomer, as it suggests some sort of automagic encoding that is necessary. That is not the case. If your source code is saved as UTF-8 and you assign "あ" to $string, then $string holds the character "あ" encoded in UTF-8. No further action is necessary. In fact, trying to convert the UTF-8 string (incorrectly) from ISO-8859-1 to UTF-8 will garble it.
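
To make the garbling concrete, here is a minimal sketch. The helper latin1_to_utf8() is a hypothetical stand-in written for this example; it only mimics the byte-level behaviour the manual describes for utf8_encode() (every input byte is treated as one ISO-8859-1 character), so the snippet needs no extension and avoids calling the real function.

```php
<?php
// Hypothetical helper (not part of PHP): treats each byte as an
// ISO-8859-1 code point and re-encodes it as UTF-8, which is the
// conversion utf8_encode() is documented to perform.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= $code < 0x80
            ? $byte                                               // ASCII stays as-is
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F)); // 2-byte sequence
    }
    return $out;
}

$original = "\xE3\x81\x82";             // "あ" already encoded as UTF-8
$garbled  = latin1_to_utf8($original);  // "encoding" it a second time

echo bin2hex($original), "\n";  // e38182
echo bin2hex($garbled), "\n";   // c3a3c281c282
```

The three bytes of one character become six bytes that decode as three unrelated characters, so a /u pattern for "あ" no longer matches.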

To elaborate a little more, your source code is read as a byte sequence. PHP interprets the stuff that is important to it (all the keywords and operators and so on) in ASCII. UTF-8 is backwards compatible with ASCII. That means all the "normal" ASCII characters are represented using the same byte in both ASCII and UTF-8. So a " is interpreted as a " by PHP regardless of whether the file is supposed to be saved in ASCII or UTF-8. Anything between quotes, PHP simply takes as the literal byte sequence. So PHP sees your "あ" as "11100011 10000001 10000010". It doesn't care what exactly is between the quotes, it'll just use it as-is.
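
A small sketch to inspect those bytes yourself. It assumes PHP 7.4 or later (for the "\u{...}" escape and the arrow function); to_bits() is a hypothetical helper name. The escape is used instead of a literal "あ" so that the result does not depend on how this snippet's file happens to be saved.

```php
<?php
// "\u{3042}" produces the UTF-8 bytes of U+3042 ("あ"),
// independent of the encoding of this source file.
$string = "\u{3042}";

// Hypothetical helper: render each byte of a string as 8 binary digits.
function to_bits(string $bytes): string
{
    return implode(' ', array_map(
        fn(string $b): string => sprintf('%08b', ord($b)),
        str_split($bytes)
    ));
}

echo strlen($string), "\n";   // 3 (strlen counts bytes, not characters)
echo to_bits($string), "\n";  // 11100011 10000001 10000010

// The /u pattern matches directly; no utf8_encode() step is involved.
var_dump(preg_match('/\x{3042}/u', $string)); // int(1)
```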

Saturday, May 29, 2021
 
Xavio
 
4

It's a pass by reference. The variable inside the function "points" to the same data as the variable from the calling context.

function foo(&$bar)
{
  $bar = 1;
}

$x = 0;
foo($x);
echo $x; // 1
Saturday, May 29, 2021
 
LOKESH
 
3
var $ = "some value we don't care about";

 // v=====normal plain old function
(function ($) {
 //        ^=======receives jQuery object as the $ parameter

    //majority of code here, where $ === jQuery...

    $('.myclass').do().crazy().things();


})(jQuery);
 //  ^=======immediately invoked, and passed the jQuery object


 // out here, $ is undisturbed
alert( $ ); // "some value we don't care about"
Friday, June 25, 2021
 
RahulG
 
5

Did you save the PHP file without a BOM? If not, try that. See "Potential issues with the UTF-8 BOM".
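
If you are not sure whether your editor wrote a BOM, a check along these lines may help. This is only a sketch: the constant and both function names are made up for the example, and it works on a string you have already read (for instance with file_get_contents()).

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: does the byte string start with a UTF-8 BOM?
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: return the string without a leading BOM.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

$withBom = UTF8_BOM . "hello";

var_dump(has_utf8_bom($withBom));          // bool(true)
var_dump(has_utf8_bom("hello"));           // bool(false)
echo bin2hex(strip_utf8_bom($withBom)), "\n"; // 68656c6c6f
```

A BOM in front of the opening PHP tag counts as output, which is one common reason a later header() call fails with "headers already sent".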


Also try 'utf8' in single quotes, without SET CHARACTER_SET:

mysql_query("SET NAMES 'utf8'");

and send charset utf-8 in the HTTP Content-Type header:

header("content-type: text/html; charset=utf-8");
Saturday, August 14, 2021
 
MKM
 
1

I agree with the previous answers that UTF-8 is a good choice for most applications.

Beware the traps that might be awaiting you, though! You'll want to be careful that you use a consistent character encoding throughout your system (input forms, output web pages, other front ends that might access or change the data).

I have spent some unpleasant hours trying to figure out why a simple β or é was mangled on my web page, only to find that something somewhere had goofed up an encoding. I've even seen cases of text that gets run through multiple encoders--once turning a single quotation mark into eight bytes.

Bottom line, don't assume the correct translation will be done; be explicit about character encoding throughout your project.
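
One way to be explicit is to validate text at the boundaries of your system instead of trusting it. A minimal sketch, assuming UTF-8 is the encoding you have settled on; is_valid_utf8() is a hypothetical helper name, and it relies on preg_match() with the /u modifier failing on byte strings that are not valid UTF-8.

```php
<?php
// Hypothetical helper: an empty pattern with /u matches any valid
// UTF-8 string and makes preg_match() return false on invalid input.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(is_valid_utf8("\u{00E9}"));  // bool(true)  "é" as UTF-8 (C3 A9)
var_dump(is_valid_utf8("\xE9"));      // bool(false) "é" as ISO-8859-1
var_dump(is_valid_utf8("plain ASCII")); // bool(true)
```

Note that this only catches input that is not UTF-8 at all. Text that was already run through an encoder twice is still valid UTF-8, so it passes this check even though it displays wrongly.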

Edit: I see in your update you've already started to discover this particular joy. :)

Tuesday, December 28, 2021
 