PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode PHP Internals News

    • テクノロジー

PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode


Thursday, March 3rd 2022, 09:02 GMT

London, UK



In this episode of "PHP Internals News" I chat with Rowan Tommins (GitHub, Website, Twitter) about the "Deprecate and Remove utf8_encode and utf8_decode" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news


Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 98. Today I'm talking with Rowan Tommins about the "Deprecate and remove UTF8_encode and UTF8_decode" RFC that he's proposing. Hi, Rowan, would you please introduce yourself?


Rowan Tommins 0:38

Hi, I'm Rowan Tommins. I'm a PHP software architect by day and try and contribute back to the community and have been hanging around in the internals mailing list for about 10 years and contributed to make the language better, where I can.


Derick Rethans 0:57

Excellent. Yeah, that's how I started out as well, many, many more years before that, to be honest. This RFC, what problem is this trying to solve?


Rowan Tommins 1:08

PHP has these two functions, utf8_encode and utf8_decode, which, in themselves, they're not broken. They do what they are designed to do. But they are very frequently misunderstood. Mostly because of their name. And because Character Encodings in general, are not very well understood. People use them wrong, and end up getting in all sorts of pickles that are worse than if the functions weren't there in first place.


Derick Rethans 1:37

What are you proposing with the RFC then?


Rowan Tommins 1:39

Fundamentally, I'm proposing to remove the functions. As of PHP 8.2, there will be a deprecation notice whenever you use them, and then in 9.0, they would be gone forever, and you wouldn't be able to use them by mistake, because they just wouldn't be there.


Derick Rethans 1:56

I reckon there's going to be a way to actually do what people originally intended to do with it at some point, right?


Rowan Tommins 2:02

So yeah, there are alternatives to these functions, which are much clearer in what you're doing, and much more flexible in what you can do with them so that they cover the cases that these functions sound like they're going to do, but don't actually do when you understand what they're really doing.


Derick Rethans 2:20

I think we'll get back to that a little bit later on. You're wanting to deprecate these functions. But what do these functions actually do?


Rowan Tommins 2:27

What they actually do is convert between a character encoding called Latin-1, ISO 8859-1, and UTF-8. So utf8_encode converts from Latin-1 into UTF-8, utf8_decode does the opposite. And that's all they do. Their names make it sound like they're some kind of fix all the UTF 8 things in my text. But they are actually just these one very specific conversion, which is occasionally useful, but not clear from their names.


Derick Rethans 3:01

It's certainly how I have seen it used in the past, where people just throw everything and the kitchen sink at it, and expecting it to be valid UTF 8, and then at the end, decode. I mean, the decoding was not even part much of this, right? It's just throw everything at it, and t

PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode


Thursday, March 3rd 2022, 09:02 GMT

London, UK



In this episode of "PHP Internals News" I chat with Rowan Tommins (GitHub, Website, Twitter) about the "Deprecate and Remove utf8_encode and utf8_decode" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news


Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 98. Today I'm talking with Rowan Tommins about the "Deprecate and remove UTF8_encode and UTF8_decode" RFC that he's proposing. Hi, Rowan, would you please introduce yourself?


Rowan Tommins 0:38

Hi, I'm Rowan Tommins. I'm a PHP software architect by day and try and contribute back to the community and have been hanging around in the internals mailing list for about 10 years and contributed to make the language better, where I can.


Derick Rethans 0:57

Excellent. Yeah, that's how I started out as well, many, many more years before that, to be honest. This RFC, what problem is this trying to solve?


Rowan Tommins 1:08

PHP has these two functions, utf8_encode and utf8_decode, which, in themselves, they're not broken. They do what they are designed to do. But they are very frequently misunderstood. Mostly because of their name. And because Character Encodings in general, are not very well understood. People use them wrong, and end up getting in all sorts of pickles that are worse than if the functions weren't there in first place.


Derick Rethans 1:37

What are you proposing with the RFC then?


Rowan Tommins 1:39

Fundamentally, I'm proposing to remove the functions. As of PHP 8.2, there will be a deprecation notice whenever you use them, and then in 9.0, they would be gone forever, and you wouldn't be able to use them by mistake, because they just wouldn't be there.


Derick Rethans 1:56

I reckon there's going to be a way to actually do what people originally intended to do with it at some point, right?


Rowan Tommins 2:02

So yeah, there are alternatives to these functions, which are much clearer in what you're doing, and much more flexible in what you can do with them so that they cover the cases that these functions sound like they're going to do, but don't actually do when you understand what they're really doing.


Derick Rethans 2:20

I think we'll get back to that a little bit later on. You're wanting to deprecate these functions. But what do these functions actually do?


Rowan Tommins 2:27

What they actually do is convert between a character encoding called Latin-1, ISO 8859-1, and UTF-8. So utf8_encode converts from Latin-1 into UTF-8, utf8_decode does the opposite. And that's all they do. Their names make it sound like they're some kind of fix all the UTF 8 things in my text. But they are actually just these one very specific conversion, which is occasionally useful, but not clear from their names.


Derick Rethans 3:01

It's certainly how I have seen it used in the past, where people just throw everything and the kitchen sink at it, and expecting it to be valid UTF 8, and then at the end, decode. I mean, the decoding was not even part much of this, right? It's just throw everything at it, and t

テクノロジーのトップPodcast

ゆるコンピュータ科学ラジオ
ゆるコンピュータ科学ラジオ
Rebuild
Tatsuhiko Miyagawa
backspace.fm
backspace.fm
GoriPod - ゴリミーポッドキャスト
g.O.R.i
Joi Ito's Podcast
伊藤穰一
Off Topic // オフトピック
Off Topic