03.02.02
What does it do? It closes all unclosed tags in posts and comments, so that a single mistake does not, for example, italicize your whole page.
This balanceTags code is courtesy of Leonard Lin @ http://randomfoo.net - praises to him.
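To illustrate the idea (this is not the balanceTags code itself), here is a minimal Python sketch that tracks open tags on a stack and appends whatever closing tags are still missing at the end of a post. The function name `close_open_tags` and the short `VOID_TAGS` list are assumptions made for this example. It only appends closers at the end; it does not repair badly nested tags in the middle of the text.

```python
import re

# Tags that never take a closing tag (a short, assumed list for this sketch).
VOID_TAGS = {"br", "hr", "img", "input"}

# Matches an opening, closing, or self-closing tag and captures its name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")


def close_open_tags(html: str) -> str:
    """Append closing tags for every tag still open at the end of html."""
    stack = []
    for match in TAG_RE.finditer(html):
        closing, name, self_closing = match.group(1), match.group(2).lower(), match.group(3)
        if name in VOID_TAGS or self_closing:
            continue
        if closing:
            # Pop back to the most recent matching opener, if there is one.
            if name in stack:
                while stack.pop() != name:
                    pass
        else:
            stack.append(name)
    # Close the remaining tags, innermost first.
    return html + "".join(f"</{name}>" for name in reversed(stack))


print(close_open_tags("<i>oops, forgot to close <b>this"))
```

A comment that is already balanced passes through unchanged, so running the filter on every post should be harmless.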
tj_edit - email - url
Hi, Michel.
What charsets does the HTML entities to Unicode convertor support?
05.02.02 @ 02:05:09 336
michel v - email - url
It converts named HTML entities to their Unicode numeric character references, and that's about it. I guess this means ISO-8859-1, but it surely covers more than that.
TJ, if you could test typing some Japanese text, I would be glad. Not sure about that.
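For readers who want to see what "named entity to numeric reference" means in practice, here is a rough Python sketch. It is not the b2 filter; the function name `named_to_numeric` is made up for this example, and the entity table is the one shipped in Python's standard `html.entities` module.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as &eacute; or &mdash;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace known named entities with decimal numeric references."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            # Unknown name: leave the text exactly as it was typed.
            return match.group(0)
        return f"&#{codepoint};"
    return ENTITY_RE.sub(replace, text)


print(named_to_numeric("caf&eacute; &mdash; &copy; 2002"))
```

Note that this only rewrites entity names. It does nothing to the raw bytes of the text, so it is not a charset conversion.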
05.02.02 @ 10:31:26 688
tj_edit - email - url
Michel V:
I ask because mapping the charsets to Unicode (utf-8, in this case) is --in theory-- easily done. But in practice, not so easily done.
Some of the big boys [IBM] have developed applications in C or Perl to do just this: convert documents from charset_x to utf-8 (or utf-16 | 32).
But those applications don't work as interfaces to content management systems. They just batch process the selected docs.
Likewise, you can download as shareware and/or Open Source various charset convertors.
But in my experience thus far, and I would happily be corrected, the ones that work right work only for Latin-1 and the major European languages | charsets. So that's a bummer.
The Asian | subcontinent languages/charsets are not truly supported in the versions I've tried | tested so far. Again, I would be happily corrected / given new apps to try.
But even that would not give us an interface that would convert charset-x to utf-8 on the fly. That would be really cool!
But it seems to be quite a challenge.
Here are some common encodings for B2/Cafelog sites:
iso-8859-1 (Western Europe),
iso-8859-2 (Central Europe),
iso-8859-4 (Baltic Rim),
iso-8859-5 (Cyrillic),
windows-1250 (Central Europe),
windows-1251 (Cyrillic),
windows-1252 (Western Europe),
windows-1257 (Baltic Rim).
Latin-1 -- iso-8859-1 -- is relatively transparent (a non-issue).
How could these other charsets be automatically mapped/converted to utf-8 (Unicode)?
That's the question for which I do NOT have an answer.
It might depend on how much Unicode support PHP has [at this time].
If it covers those cases, great! If not, I have no idea how to proceed. ;-(
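As a point of comparison only, and not as an answer for PHP, here is how the mapping could look in a language that ships codecs for all of the charsets listed above. This sketch uses Python. The function name `to_utf8` is an assumption, and it presumes the source charset of the bytes is already known, which is usually the hard part.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode bytes from a known source charset into UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_charset.
    """
    return raw.decode(source_charset).encode("utf-8")


# Example: bytes as they would arrive from a windows-1251 (Cyrillic) form.
cyrillic_bytes = "Привет".encode("windows-1251")
print(to_utf8(cyrillic_bytes, "windows-1251").decode("utf-8"))
```

Decoding with the wrong charset often succeeds silently and yields the wrong characters, so a real converter would need the charset declared by the page or the form rather than guessed.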
Below, some shift_jis | Japanese. This, I'm told, is a particularly difficult/ugly charset to convert. If it looks like pure gibberish, don't be surprised!
ˆâ“`žq‘gŠ·H•i Genetically Engineered Food
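A small Python sketch of why Shift_JIS text turns into gibberish like this: the bytes are fine, but they get read with a Western charset. The helper names and the sample string are assumptions for illustration. iso-8859-1 is used as the "wrong" charset because it maps every byte value, so the misreading never fails outright.

```python
def misread(raw: bytes, wrong_charset: str = "iso-8859-1") -> str:
    """Decode bytes with the wrong charset, producing gibberish."""
    return raw.decode(wrong_charset)


def sjis_to_utf8(raw: bytes) -> bytes:
    """Decode Shift_JIS bytes correctly and re-encode them as UTF-8."""
    return raw.decode("shift_jis").encode("utf-8")


sample = "日本語"                      # hypothetical sample text
raw = sample.encode("shift_jis")       # what a Shift_JIS page would send
garbled = misread(raw)                 # what a Latin-1 reader would show

print(garbled != sample)               # the misread text differs from the original
# The misreading loses nothing as long as the raw bytes survive:
print(garbled.encode("iso-8859-1") == raw)
print(sjis_to_utf8(raw).decode("utf-8") == sample)
```

Once a filter rewrites or drops some of those bytes, the original text can no longer be recovered.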
05.02.02 @ 17:46:17 990
michel v - email - url
Actually, there are 2 misunderstandings here.
1. I don't mean it translates between charsets: I gave up on that at some point. It translates named entities to stuff like &#8212;, that is, HTML numeric entities.
2. Errrr, I meant posting Japanese text in the v0.6 testdrive blog. Though I've just tested, and the Unicode filter kills the Japanese text I pasted from sega.jp. It looks like I'll have to provide an option to disable it for users of non-Latin charsets. It could also be a bug on my end, since I'm using windows-1252 on the computer. I'm hoping for the latter.
05.02.02 @ 19:24:01 058
michel v - email - url
By the way, TJ, I'm online these days on Y!IM as 'cafelog', but I never see you.
05.02.02 @ 19:24:38 058
tj_edit - email - url
Michel:
My fault! I was jumping from entities to charsets.
As for your "giving up on it" [translating between charsets for web-based applications], HEY:
it's an internationalization (i18n) issue. If you solve it by yourself, great! You'll be famous.
But I suspect that a team of well-funded smart people will crack it. I know of some bits and pieces of related projects, but nothing that would apply directly to content management systems | blogs. [That is, the problems described above].
I was just hoping you had heard something I had not.
As for YIM, my bad. I'll make sure it's on.
Best to all,
TJH
06.02.02 @ 00:09:25 256
Chief - email - url
Please forgive me for typing this here. I've been having a comment-posting bug on my install of b2, and you had suggested it might be a browser problem. So I thought I would post here to see if the problem occurred here as well. That way I can let you know in the forum.
15.02.02 @ 15:18:56 888