Switched to a new database and now some characters have turned to junk!?
Posted: 24 April 2008 01:53 PM   [ Ignore ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

http://test.ksba.org/insurance

You can see on the page above that some apostrophes, ampersands, commas and other characters have been turned into a black diamond with a question mark in the center.

Can anyone explain why this is happening and how to fix it? This happened the last time I moved to a new database with my new host and I had to manually go back in and edit every article and I REALLY don’t want to do that again.

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 24 April 2008 02:06 PM   [ Ignore ]   [ # 1 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  15489
Joined  05-15-2004

I would like to say “an encoding issue, what else?”, but your case looks rather weird. None of the usual tell-tale signs. I actually only see two apostrophes, and they are not mangled in any recognizable way. Do you have a page with more such characters?

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Profile
MSG
 
 
Posted: 24 April 2008 02:10 PM   [ Ignore ]   [ # 2 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Yup, here are few more pages:

http://test.ksba.org/news
http://test.ksba.org/board-team-development
http://test.ksba.org/governmental-relations

Seems like it may just be ampersands and apostrophes from the pages that I’ve seen…

I figured it was an encoding issue too, but to be honest, I’m not familiar with all the different encoding values and such. I really don’t know what I would need to change to alleviate this problem now and going forward so that it doesn’t happen again?

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 24 April 2008 03:53 PM   [ Ignore ]   [ # 3 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  15489
Joined  05-15-2004

It’s also dashes, but it’s still weird. I would like you to try one thing for me (after making a backup, of course.) Bring up a page in your browser where you see it happening, and copy the character. Then go to CP Home ›  Admin ›  Utilities ›  Find and Replace, and paste the character in the upper field, and the “correct” replacement (dash, quote, whatever) in the lower. Choose where the replacement should take place, and run it. Then reload the site and tell me if that changes anything.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Profile
MSG
 
 
Posted: 24 April 2008 07:17 PM   [ Ignore ]   [ # 4 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Thanks for the tips. I’ll give those a try tomorrow at work. I’m curious though, since the problem isn’t just with one particular character, if the find/replace does work, I could essentially have some commas (or other characters) in the wrong place?

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 24 April 2008 11:44 PM   [ Ignore ]   [ # 5 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  15489
Joined  05-15-2004
deronsizemore - 24 April 2008 07:17 PM

Thanks for the tips. I’ll give those a try tomorrow at work. I’m curious though, since the problem isn’t just with one particular character, if the find/replace does work, I could essentially have some commas (or other characters) in the wrong place?

I don’t think so. Just because you are seeing the same, genereic character does not mean it is the same character. It’s just that the system can’t display it, because it’s missing fromt he installed fonts, so it displays this generic character.

But: As always with any database issues, we make a backup first, so we can easily roll back if something happens.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Profile
MSG
 
 
Posted: 25 April 2008 06:40 AM   [ Ignore ]   [ # 6 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Ok, we’ll see what happens. I guess I just misunderstood how the find and replace worked. I thought if I copied/pasted that generic character (diamond with question mark) into the find field and then put a comma in the replace field, it would go out and find every instance of that generic character and replace it with a comma, which wouldn’t work since other characters besides commas are getting messed up.

I’ll post back with my results hopefully in a little bit. smile

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 25 April 2008 06:50 AM   [ Ignore ]   [ # 7 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  12433
Joined  04-29-2002

Deron, just as a warning. Make sure you back up your database.

 Signature 

Quick Reference - EE Trial Options - EE Wiki - Docs for updating a build

Profile
MSG
 
 
Posted: 25 April 2008 08:05 AM   [ Ignore ]   [ # 8 ]  
Lab Assistant
RankRank
Total Posts:  131
Joined  04-02-2006

Deron,

Why not make a copy of the old database backup and open it in a *plain text* editor.
You’ll probably find a statement like CHARSET right at the beginning.
This tells you the Character Set that the old database was using.

If you look at the Database Control Panel on the new server then you will see
(1) If you are allowed to create a new database, or if one has been created for you and you can’t change it.
(2) If the database has already been created for you, and you can’t change it,
then the Control Panel will tell you what Character Set the new database is using.

This is the information that you need—(1) CHARSETs of old database and new database, also
(2) whether your new host allows you to create a new database with the same CHARSET as the old database.

Profile
 
 
Posted: 25 April 2008 09:38 AM   [ Ignore ]   [ # 9 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Thanks everyone for the suggestions. We’ve got multiple backups created of the database and our host also has a backup, so hopefully we’re squared away in that regard.

KeithW:

I found this information. Our old database was UTF-8 and the new database was Latin1 after I backed each of them up and viewed in a plain text editor. So, we had our host delete the new database with the wrote CHARSET and then we had them create a new database. At that time, we went in and imported our old database with the correct CHARSET and during the import process we made sure to select UTF-8. After these changes, we are still having this same problem. So… back to the drawing board on this one. Our host is working with the database right so I can’t do much, but I’m still planning on giving Ingmar’s suggests for find/replace a try whenever I can.

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 25 April 2008 10:23 AM   [ Ignore ]   [ # 10 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Our host seems to have solved this problem. I’m going to attempt to explain what happened and apologize in advance if my terminology is wrong because as I stated above, I don’t have much experience with SQL and the different charsets and what not.

So, the situation was that we went from MySQL Version 3.23.53 to 4.1. This required that we simply move our data to a new database. When our host imported the data from the old database to the new database, some special characters (dashes, commas, colons, etc) were rendering as black diamonds with white question marks in the center.

According to our host, the problem ended up being that initially, they imported the data from “latin_general” and converted it to UTF-8, but evidently, the original database wasn’t in “latin_general.” After quite a bit of testing they tried importing the data as “latin1_swedish_ci” and converted to UTF-8 and that fixed all of the problems with the special characters.

Maybe someone will understand that better than I do? Like I said, it’s all a bit confusing to me, but it seems to be fixed now and that’s all that I really care about. If anyone can put what really happened in baby talk so that I can understand it, I’d appreciate it. wink

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 25 April 2008 03:30 PM   [ Ignore ]   [ # 11 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  15489
Joined  05-15-2004
deronsizemore - 25 April 2008 10:23 AM

According to our host, the problem ended up being that initially, they imported the data from “latin_general” and converted it to UTF-8, but evidently, the original database wasn’t in “latin_general.” After quite a bit of testing they tried importing the data as “latin1_swedish_ci” and converted to UTF-8 and that fixed all of the problems with the special characters.

Well, it didn’t look quite like your regular run of the mill encoding issue.

Maybe someone will understand that better than I do? Like I said, it’s all a bit confusing to me, but it seems to be fixed now and that’s all that I really care about.

To be honest, it’s a bit confusing to me, and I am considered the resident expert on such issues smile But as you say: whatever works, ultimately.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Profile
MSG
 
 
Posted: 25 April 2008 09:57 PM   [ Ignore ]   [ # 12 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Ingmar,

The weird thing is that I had this very same problem when I moved my blog from my old host: A Small Orange to Engine Hosting. I had no idea what was going on at that time (even less than now) and I ended up manually going into the templates and entries and changing these weird diamond characters to the correct characters. Spent a lot of time on that and think I still missed some of my older articles.

Seems to just be an issue with changing servers or database versions. I took a look at the databases for my three sites on EngineHosting. My first site (my blog which was moved from the old host) is set to Latin1_Swedish_ci and the other two of my sites which were create initially with EngineHosting are set to UTF8_general_ci. So, I’m not sure what relevance that has to anything, but it seems that when transferring it has to be set to swedish, but even then I still had this problem which like I said above, I just changed each one manually.

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 26 April 2008 04:15 AM   [ Ignore ]   [ # 13 ]  
Lab Assistant
RankRank
Total Posts:  131
Joined  04-02-2006

Surely all that is relevant is the character set, usually Latin1 or UTF8.
The order of the characters for sorting (swedish or general) should not be relevant.

When you (or your provider) backs up a database, if the character set of the backup
is different from the original character set of the database, then there’s
a possibility that characters will be garbled. If you keep the same character set
when backing up and when recreating the database on a new server, then there’s
much less potential for disaster wink

Assuming that all providers understand database character sets
is like assuming that all providers will backup your data and
provide you with a copy when you need it :
if the user doesn’t ask the provider the right questions, then
the user is likely to find that that the data is mangled or lost wink

Profile
 
 
Posted: 26 April 2008 10:42 AM   [ Ignore ]   [ # 14 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  1570
Joined  08-02-2006

Hi Keith, thanks for the explanation.

I’m curious though. My only experience with backing up my database comes from the EE admin panel (which has since been removed) and through phpMyAdmin. Through phpMyAdmin, I click on the correct database and then click the “export” tab. Can you tell me where I will specify the character set of the backup so that I can match it up with the character set of the current database so I can try to avoid these problems in the future? I don’t see an option to specify character set when exporting.

Thanks

 Signature 

Deron Sizemore
==========
Random Jabber | Sizcons | NiceStylesheet | Kentucky Golfing | LogoGala
Twitter: Deron Sizemore | LogoGala

Profile
 
 
Posted: 26 April 2008 08:10 PM   [ Ignore ]   [ # 15 ]  
Lab Assistant
RankRank
Total Posts:  131
Joined  04-02-2006

I’ve had the same problem (a provider messing up the character set of a database when moving it to a new server).
In my case some of the tables in the database were in English, and some were in Japanese.

Many hosting companies do not allow users to create a database—they give the user a set number of databases,
often only one database, and they decide the default character set when they create the database (the user has
no control over this). With MySQL 3.x or 4.0, it’s not possible to specify a different character set for tables
in the database, and the database character set (CHARSET) is likely to be set to Latin1 by default.

However it *is* possible to specify a different CHARSET for displaying the data in phpMyAdmin.
If you set the display character set to UTF-8 Japanese, for example, and the “MySQL connection”
setting is also UTF-8, then the actual data being stored in the database or retrieved from the database
is UTF-8, even though the provider has specified the MySQL CHARSET as “Latin1”.

Now suppose that you (or your provider) go to back up your data. 
For MySQL 3.x and 4.0, if the database character set has been preset to “Latin1”, then phpMyAdmin
thinks that the data is “Latin1” unless you override it. If you try to back up the UTF-8 Japanese tables
without changing the connection CHARSET to UTF-8 (the same as when the data was stored) then
the data is going to be mangled.

I think that probably even if you change the connection CHARSET to UTF-8, the UTF-8 backup file is still
going to contain a statement that the database was created as “Latin1”—because that’s what MySQL says it is.

Likewise when you (or your provider) use phpMyAdmin to restore the data or move it to a new server.
For MySQL 4.1 or later, even if you have no control over the default CHARSET of the database (because
the hosting company creates the single database you get) you can set the CHARSET of tables in the database.
But unless you explicitly set the table CHARSET (and display/connection CHARSET, if necessary)
to the same as the CHARSET of the (UTF-8) backup when you create the table (and INSERT the data)
the data may get mangled at this step.

The ideal solution is to set up Apache, PHP and MySQL on your own (company or home) Windows, Mac, or LINUX machine.
Use the same, or similar, versions of PHP and MySQL as your host. Then you can test any proposed changes locally
rather than on a system that is live. And you can test backups from the host, to make sure that your backup
has not been lost or mangled.  HTH

Profile
 
 
Posted: 27 April 2008 10:15 AM   [ Ignore ]   [ # 16 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  23547
Joined  05-20-2002

Very nice overview- from a couple of folks.  It can be a giant pain when this gets off- if you can set it up from the start to be consistent across the board, life is made much easier.

Think I’ll go ahead and close this out, since the original issue is solved.  If someone does come across it in the future and has a related problem, make sure to link back.

 Signature 

AKA rob1

Help Request TipsPro Network

Profile
 
 
   
 
 
Post Marker Legend
New Topic New posts Hot Topic Hot Topic with new posts New Poll New Poll Moved Topic Moved Topic Sticky Topic Sticky topic
Old Topic No new posts Hot Old Topic Hot Topic with no new posts Old Poll Old Poll Closed Topic Closed Topic Announcement Announcements
Theme
Change Theme
Visitor Statistics
The most visitors ever was 1149, on July 16, 2007 09:33 AM
Total Registered Members: 65087 Total Logged-in Users: 34
Total Topics: 82226 Total Anonymous Users: 23
Total Replies: 441924 Total Guests: 214
Total Posts: 524150    
Members ( View Memberlist )