b2

[ This is a test blog, with posts about the development of b2, and comments ]

[ Bugs/suggestions ? Check the Forums ! ]


In addition to the HTML entities to Unicode converter, there is now an HTML corrector in b2 (in v0.6pre).
What does it do? It closes any unclosed tags in posts and comments, so that a single mistake does not make your whole page italicized, for example.
This balanceTags code is courtesy of Leonard Lin @ http://randomfoo.net - praise to him.
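
For readers curious about the idea, here is a minimal sketch of tag balancing in Python. It is not Leonard Lin's balanceTags code and not what b2 ships; the function name `balance_tags`, the regular expression, and the short list of void tags are all assumptions made for illustration. It keeps a stack of open tags, closes anything left open, and drops closing tags that have no opener.

```python
import re

# Tags that never take a closing tag (an assumed, partial list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>")


def balance_tags(text):
    """Close unclosed tags and drop closing tags that have no opener."""
    stack = []  # names of currently open tags, innermost last
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        closing, name, rest = m.group(1), m.group(2).lower(), m.group(3)
        if name in VOID_TAGS or rest.rstrip().endswith("/"):
            out.append(m.group(0))  # void or self-closing tag: pass through
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close every tag opened after the matching one, then the match.
            while stack:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # Otherwise: a closing tag with no matching opener is dropped.
    out.append(text[pos:])
    # Close whatever is still open, innermost first.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```

With this sketch, a comment such as `an <i>italic mistake` comes back with the missing `</i>` appended, so the rest of the page is not affected.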
michel v @ 02:20:42 472
7 comments, no trackback, no pingback


:: comments


tj_edit - email - url
Hi, Michel.

What charsets does the HTML entities to Unicode converter support?

05.02.02 @ 02:05:09 461


michel v - email - url
It converts named HTML entities to their Unicode numeric character references, and that's about it. I guess this means ISO-8859-1, but it surely covers more than that ;)
TJ, if you could test typing some Japanese text, I would be glad. I'm not sure how it handles that.
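
A rough sketch of that kind of conversion, in Python rather than b2's PHP, and not the actual b2 filter: the function name `named_to_numeric` and the choice to leave `&amp;`, `&lt;`, `&gt;` and `&quot;` untouched are assumptions. It relies on the standard `html.entities.name2codepoint` table, which maps HTML 4 entity names to code points.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Markup-significant entities are left alone (an assumption of this sketch).
KEEP = {"amp", "lt", "gt", "quot"}


def named_to_numeric(text):
    """Replace named entities such as &eacute; with numeric ones such as &#233;."""
    def repl(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # unknown or protected name: unchanged
        return "&#%d;" % name2codepoint[name]
    return ENTITY_RE.sub(repl, text)
```

Note that this only rewrites entity names; it never looks at the charset of the surrounding text, which is why it says nothing about Japanese input.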
05.02.02 @ 10:31:26 813


tj_edit - email - url
Michel V:

I ask because mapping the charsets to Unicode (utf-8, in this case) is --in theory-- easily done. But in practice, not so easily done.

Some of the big boys [IBM] have developed applications in C or Perl to do just this: convert documents from charset_x to utf-8 (or utf-16 | 32).

But those applications don't work as interfaces to content management systems. They just batch process the selected docs.

Likewise, you can download various charset converters as shareware and/or Open Source.

But in my experience thus far, and I would happily be corrected, the ones that work right work only for Latin-1 and the major European languages | charsets. So that's a bummer.

The Asian | subcontinent languages/charsets are not truly supported in the versions I've tried | tested so far. Again, I would be happily corrected / given new apps to try.

But even that would not give us an interface that would convert charset-x to utf-8 on the fly. That would be really cool!

But it seems to be quite a challenge.

Here are some common encodings for B2/Cafelog sites:
iso-8859-1 (Western Europe),
iso-8859-2 (Central Europe),
iso-8859-4 (Baltic Rim),
iso-8859-5 (Cyrillic),
windows-1250 (Central Europe),
windows-1251 (Cyrillic),
windows-1252 (Western Europe),
windows-1257 (Baltic Rim).

Latin-1 -- iso-8859-1 -- is relatively transparent (a non-issue).

How could these other charsets be automatically mapped/converted to utf-8 (Unicode)?

That's the question for which I do NOT have an answer.

It might depend on how much Unicode support PHP has [at this time].

If it covers those cases, great! If not, I have no idea how to proceed. ;-(
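
For what it's worth, when the source charset is known, the conversion itself is a decode followed by an encode. The sketch below is in Python, not PHP, and makes no claim about what PHP supported at the time; the helper name `to_utf8` is hypothetical. It only shows the mechanical step for the charsets listed above, not a blog interface that detects the charset on the fly.

```python
def to_utf8(raw, source_charset):
    """Convert bytes in a known charset to UTF-8 bytes.

    The caller must supply the right charset; nothing is detected here.
    """
    return raw.decode(source_charset).encode("utf-8")


# The charsets from the list above, as Python codec names.
CHARSETS = [
    "iso-8859-1", "iso-8859-2", "iso-8859-4", "iso-8859-5",
    "windows-1250", "windows-1251", "windows-1252", "windows-1257",
]

# The same Cyrillic word stored in two different legacy charsets
# ends up as identical UTF-8 once each is decoded with its own charset.
word = "Привет"
from_1251 = to_utf8(word.encode("windows-1251"), "windows-1251")
from_8859_5 = to_utf8(word.encode("iso-8859-5"), "iso-8859-5")
```

The hard part described above remains: the bytes alone do not say which charset they are in, so a wrong guess still decodes without error and produces the wrong letters.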

Below, some shift_jis | Japanese. This, I'm told, is a particularly difficult/ugly charset to convert. If it looks like pure gibberish, don't be surprised!

ˆâ“`žq‘gŠ·H•i Genetically Engineered Food
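
The sample above looks like what you get when Shift_JIS bytes are displayed as windows-1252. A small Python sketch of that effect follows; the helper names `misread` and `recover` are hypothetical, and the recovery only works when every byte survives the wrong decoding, which is not guaranteed for text pasted through a web form.

```python
def misread(text, actual="shift_jis", assumed="windows-1252"):
    """Encode text in its real charset, then decode the bytes as another."""
    return text.encode(actual).decode(assumed)


def recover(garbled, actual="shift_jis", assumed="windows-1252"):
    """Undo misread(), provided no bytes were lost along the way."""
    return garbled.encode(assumed).decode(actual)


original = "遺伝子"           # three kanji, two bytes each in Shift_JIS
garbled = misread(original)   # six Latin-looking characters
restored = recover(garbled)   # back to the three kanji
```

Some byte values have no character in windows-1252, so a page or form that drops them makes the damage permanent.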
05.02.02 @ 17:46:17 115


michel v - email - url
Actually there are 2 misunderstandings here ;)
1. I don't mean it translates between charsets: I gave up on that at some point. It translates to HTML numeric entities, stuff like &#8212;.
2. Errrr, I meant posting Japanese text in the v0.6 testdrive blog. Though I've just tested, and the Unicode filter kills the Japanese text I pasted from sega.jp - looks like I'll have to provide an option to disable it for users of non-Latin charsets. It could also be a bug on my end, since I'm using windows-1252 on this computer. I hope it's the latter.
05.02.02 @ 19:24:01 183


michel v - email - url
By the way, TJ, I'm online these days on Y!IM as 'cafelog', but I never see you :(
05.02.02 @ 19:24:38 183


tj_edit - email - url

My fault! I was jumping from entities to charsets.

As for you "giving up on it" [translating between charsets for web-based applications], HEY:

it's an internationalization (i18n) issue. If you solve it by yourself, great! You'll be famous.

But I suspect that a team of well-funded smart people will crack it. I know of some bits and pieces of related projects, but nothing that would apply directly to content management systems | blogs. [That is, the problems described above].

I was just hoping you had heard something I had not.

As for YIM, my bad. I'll make sure it's on.

Best to all,

06.02.02 @ 00:09:25 381


Chief - email - url
Please forgive me for typing this here. But I've been having a comment-posting bug on my install of b2, and you had suggested it might be a browser problem. So I thought I would post here to see if the problem occurred here as well. That way I can let you know in the forum.
15.02.02 @ 15:18:56 013



[powered by b2.]

What is b2 ?
A classy news/weblog tool (aka logware).

How does it work ?
You type something and hit "blog this" and in the next second it's on your page(s). You can write extended entries, or even entries that span multiple pages. You can also use BloggerAPI clients to post to your b2 weblog.

What's original in b2 ?
Pages are generated dynamically from the MySQL database, so no clumsy 'rebuilding' is involved. It also means faster search/display capabilities, and the ability to serve your news in different 'templates' without any hassle.

Requirements ?
A server that can run PHP4, and a MySQL database (you can install b2 in an already existing database, and you can put several b2's in one database).

Where can I download it ?
b2 0.6 is the latest public release.
You can also visit the CVS server for the latest code, at your own risk.
See the ReadMe file for requirements and installation instructions.

Contact info ?
E-mail: m@tidakada.com
Forums: over there. :)

They are powered by b2:

e-mail me when you install b2 on your site, include your URL to be linked here.


Recently updated b2 weblogs:

To be included in that list whenever you post to your weblog, please use b2 v0.6 or later, and then e-mail update@tidakada.com with: your site's name, URL, e-mail, and a password. You will then receive an e-mail with an ID string that you'll have to paste in your b2config.php file. And then you'll be linked there :)

