Forums > Bugs > Mystery characters

-Chris (7757) on 4/6/2003 10:26 PM

Argh. My entry of The Games '92 suffers from the "mysterious character" bug in the title. I'm absolutely positive that this showed up correctly when I submitted the game. Sigh. Could somebody please correct this?

Brian Hirt (10408) on 4/14/2003 5:38 PM

I'll look into this, but it's not a bug per se. MobyGames is just storing the information submitted, which in this case appears to be in an encoding MobyGames doesn't support. All the information in MobyGames is stored in a character set called Western (ISO-8859-1). That's what it gets displayed back as, too. For some reason certain browsers are submitting information in a different character set, causing this problem. I'm trying to find out if there is some way we can tell browsers what character set to submit information in.

What browser, language and operating system are you using?

What's the proper name of the game? I'll fix it.

-Chris (7757) on 4/14/2003 9:02 PM

The correct title should be The Games '92 - España, with a wave (wosname?) above the "n".

I'm running the 6th edition Explorer on XP, German language setting.

If I understand this correctly, information is stored in MobyGames in the character set that it was submitted in, and cannot be corrected automatically. The strange thing is that the mystery characters turned up from one day to the next in texts that had formerly been displayed correctly for me. Is there not a way to, like, undo this change?

Brian Hirt (10408) on 4/14/2003 9:52 PM

I just looked at the game audit. It seems like the title got funky when Jeanne changed the title from "The Games '92 - España" to "Games '92 - España, The". Chris, this would explain why it looked correct and then mysteriously looked different later.

Jeanne, which browser are you using? I'm just looking for clues here.

quizzley7 (21493) on 4/15/2003 12:26 AM

Here's the deal:

The browsers are uploading UTF-8 encoded text. UTF-8 is the same thing as ISO-8859-1 for characters 0x00-0x7F, but it is different for 0x80 and beyond. So here's what's happening:

1) User uses a browser that uploads UTF-8, and submits text that contains a character greater than 0x7F, such as ñ (0xF1 in ISO-8859-1). In UTF-8, this character gets translated to the two bytes 0xC3 0xB1. The algorithm for converting from ISO-8859-1 to UTF-8 is pretty simple... a google search will turn it up for you quickly.

2) The user's text is stored in the database.

3) A page request retrieves the text from the database.

4) The HTML generated by MobyGames is delivered back to the user's browser. The encoding is assumed to be, or set to, ISO-8859-1. The browser dutifully displays Ã± (the bytes 0xC3 0xB1, each shown as its own ISO-8859-1 character), rather than interpreting them as UTF-8.
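
For anyone who wants to see steps 1 to 4 in code, here is a minimal Python sketch. The function name latin1_to_utf8 is made up for illustration, and this is not what MobyGames actually runs (that is Perl). It hand-rolls the ISO-8859-1 to UTF-8 conversion mentioned in step 1, then shows what an ISO-8859-1 display makes of the result.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 by hand (each input byte is one code point)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # 0x00-0x7F: identical in both encodings
        else:
            out.append(0xC0 | (b >> 6))    # lead byte, pattern 110xxxxx
            out.append(0x80 | (b & 0x3F))  # continuation byte, pattern 10xxxxxx
    return bytes(out)


# Step 1: the browser sends UTF-8, so ñ (0xF1) arrives as 0xC3 0xB1.
stored = latin1_to_utf8("España".encode("iso-8859-1"))
print(stored)                       # b'Espa\xc3\xb1a'

# Step 4: the page is displayed as ISO-8859-1, one character per byte.
print(stored.decode("iso-8859-1"))  # EspaÃ±a
```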

You can verify this for yourself. Go to a page that has the goofy characters in it. Change your browser's character set for that page from ISO-8859-1 to UTF-8. The characters magically appear correct!
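
The same check can be done outside the browser. A rough Python sketch (undo_mojibake is a hypothetical helper name): take the goofy text, turn it back into the bytes it came from, and read those bytes as UTF-8 instead.

```python
def undo_mojibake(text: str) -> str:
    """Reinterpret text that was decoded as ISO-8859-1 but is really UTF-8."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of mix-up, so leave it alone


print(undo_mojibake("Games '92 - EspaÃ±a, The"))  # Games '92 - España, The
print(undo_mojibake("España"))                    # España
```

Text that was fine to begin with comes back unchanged, because a lone 0xF1 followed by an ASCII letter is not valid UTF-8.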

Thus, you can either force all uploaded text to be ISO-8859-1 somehow, or make all of MobyGames return pages that specify UTF-8 encoding. I think UTF-8 would be the better way to go, since it can represent the entire Unicode character set, versus only 256 characters for ISO-8859-1.
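
If you go the first route, one possible server-side approach (an assumption on my part, sketched in Python with a made-up function name to_latin1) is to treat each submitted value as UTF-8 when it decodes cleanly and convert it down to ISO-8859-1 before storing it. It is only a heuristic: a genuine ISO-8859-1 value that happens to form valid UTF-8, such as a literal "Ã±", would be misread.

```python
def to_latin1(raw: bytes) -> bytes:
    """Return ISO-8859-1 bytes for a form value that may have arrived as UTF-8."""
    try:
        text = raw.decode("utf-8")  # succeeds for plain ASCII and for real UTF-8
    except UnicodeDecodeError:
        return raw                  # assume it was already ISO-8859-1
    # Characters that ISO-8859-1 cannot hold become numeric references, e.g. &#8364;
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


print(to_latin1(b"Espa\xc3\xb1a"))    # b'Espa\xf1a'
print(to_latin1(b"Espa\xf1a"))        # b'Espa\xf1a'
print(to_latin1("€".encode("utf-8"))) # b'&#8364;'
```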

Brian Hirt (10408) on 4/15/2003 12:39 AM

I would like to use UTF-8, and it's something I've looked into, but it needs to be done at a much lower level than the web page. I need to be able to order and search on strings in the database, and examine strings in the code. Converting the database to UTF-8 will increase the size of the database, add more I/O, slow down string queries and consume more disk space. I've read mixed reviews from PostgreSQL users about having a UTF-8 database. Also, the language I program in (Perl) didn't properly deal with UTF-8 until pretty recently. I think long term, I'll probably need to switch, but short term I want to figure out a way to make things work within the current framework.
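
A quick way to gauge the size difference on a sample string, sketched in Python rather than Perl and using the title from this thread as the only data point: every character above 0x7F costs one extra byte in UTF-8, while ASCII characters cost the same. It says nothing about index size or query speed.

```python
title = "Games '92 - España, The"

latin1 = title.encode("iso-8859-1")  # one byte per character
utf8 = title.encode("utf-8")         # ñ takes two bytes, the rest take one

print(len(latin1), len(utf8))        # 23 24
```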

The main thing I'm trying to figure out right now is why some browsers submit UTF-8 and others submit ISO-8859-1. If my browser was submitting UTF-8 I would not have been able to correct the spelling of the entry. There has to be a way to tell the browser what kind of encoding information should be submitted in. I know there is a way to tell it which encoding to display in, and we tell it 8859-1.

Simply changing the web page to UTF-8 won't fix the problem, because then the 99% of the database that is stored as ISO-8859-1 won't display correctly.
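
To make that concrete, here is a small Python sketch (not the site's code; migrate_value is a hypothetical name). It shows what a legacy ISO-8859-1 value looks like when read as UTF-8, and the kind of per-value conversion pass that a switch would need, assuming anything that is not valid UTF-8 is legacy ISO-8859-1.

```python
legacy = b"Espa\xf1a"  # ñ stored as the single ISO-8859-1 byte 0xF1

# Read as UTF-8, the lone 0xF1 is invalid and becomes U+FFFD (the replacement character).
print(ascii(legacy.decode("utf-8", errors="replace")))  # 'Espa\ufffda'


def migrate_value(raw: bytes) -> bytes:
    """Return UTF-8 bytes for a stored value that is either UTF-8 or ISO-8859-1."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")


print(migrate_value(legacy))              # b'Espa\xc3\xb1a'
print(migrate_value(b"Espa\xc3\xb1a"))    # b'Espa\xc3\xb1a'
```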

quizzley7 (21493) on 4/15/2003 2:07 AM

Hmm, how about the ACCEPT-CHARSET attribute to the FORM element?

http://www.htmlhelp.com/reference/html40/forms/form.html

Chris Martin (1155) on 4/18/2003 12:52 AM

The main thing I'm trying to figure out right now is why some browsers submit UTF-8 and others submit ISO-8859-1.

Wouldn't that depend on the country that the computer is set to? For instance, if a user is in Germany and types on a German computer (utilizing the German character set), but then submits something in English (still using the German character set), wouldn't that automatically use the German character set during the submission process?

Jeanne (75931) on 4/15/2003 11:54 AM

I'm using whatever is standard in Microsoft Internet Explorer 6.0 in English. I assume it's Western something or other. Where do I look to find out?

Brian Hirt (10408) on 4/15/2003 2:35 PM

hmm... i guess i'll just try the setting quizzley7 mentioned and see what happens. However, i was reading on w3c.org that all data should be submitted in the same encoding as the page that was sent to the browser, which makes the mysterious characters more mysterious.

-Chris (7757) on 4/16/2003 12:54 PM

Sorry for being nitpicking, but there's a space missing between the comma and the "The" in the title...

http://www.mobygames.com/game/sheet/gameId,8810/

Brian Hirt (10408) on 4/16/2003 2:11 PM

fixed