You can see on the page above that some apostrophes, ampersands, commas and other characters have been turned into a black diamond with a question mark in the center.
Can anyone explain why this is happening and how to fix it? This happened the last time I moved to a new database with my new host and I had to manually go back in and edit every article and I REALLY don’t want to do that again.
I would like to say “an encoding issue, what else?”, but your case looks rather weird. None of the usual tell-tale signs. I actually only see two apostrophes, and they are not mangled in any recognizable way. Do you have a page with more such characters?
Seems like it may just be ampersands and apostrophes from the pages that I’ve seen…
I figured it was an encoding issue too, but to be honest, I’m not familiar with all the different encoding values and such. I really don’t know what I would need to change to alleviate this problem now and going forward so that it doesn’t happen again?
It’s also dashes, but it’s still weird. I would like you to try one thing for me (after making a backup, of course.) Bring up a page in your browser where you see it happening, and copy the character. Then go to CP Home › Admin › Utilities › Find and Replace, and paste the character in the upper field, and the “correct” replacement (dash, quote, whatever) in the lower. Choose where the replacement should take place, and run it. Then reload the site and tell me if that changes anything.
Thanks for the tips. I’ll give those a try tomorrow at work. I’m curious though, since the problem isn’t just with one particular character, if the find/replace does work, I could essentially have some commas (or other characters) in the wrong place?
Thanks for the tips. I’ll give those a try tomorrow at work. I’m curious though, since the problem isn’t just with one particular character, if the find/replace does work, I could essentially have some commas (or other characters) in the wrong place?
I don’t think so. Just because you are seeing the same, genereic character does not mean it is the same character. It’s just that the system can’t display it, because it’s missing fromt he installed fonts, so it displays this generic character.
But: As always with any database issues, we make a backup first, so we can easily roll back if something happens.
Ok, we’ll see what happens. I guess I just misunderstood how the find and replace worked. I thought if I copied/pasted that generic character (diamond with question mark) into the find field and then put a comma in the replace field, it would go out and find every instance of that generic character and replace it with a comma, which wouldn’t work since other characters besides commas are getting messed up.
I’ll post back with my results hopefully in a little bit.
Why not make a copy of the old database backup and open it in a *plain text* editor.
You’ll probably find a statement like CHARSET right at the beginning.
This tells you the Character Set that the old database was using.
If you look at the Database Control Panel on the new server then you will see
(1) If you are allowed to create a new database, or if one has been created for you and you can’t change it.
(2) If the database has already been created for you, and you can’t change it,
then the Control Panel will tell you what Character Set the new database is using.
This is the information that you need—(1) CHARSETs of old database and new database, also
(2) whether your new host allows you to create a new database with the same CHARSET as the old database.
Thanks everyone for the suggestions. We’ve got multiple backups created of the database and our host also has a backup, so hopefully we’re squared away in that regard.
KeithW:
I found this information. Our old database was UTF-8 and the new database was Latin1 after I backed each of them up and viewed in a plain text editor. So, we had our host delete the new database with the wrote CHARSET and then we had them create a new database. At that time, we went in and imported our old database with the correct CHARSET and during the import process we made sure to select UTF-8. After these changes, we are still having this same problem. So… back to the drawing board on this one. Our host is working with the database right so I can’t do much, but I’m still planning on giving Ingmar’s suggests for find/replace a try whenever I can.
Our host seems to have solved this problem. I’m going to attempt to explain what happened and apologize in advance if my terminology is wrong because as I stated above, I don’t have much experience with SQL and the different charsets and what not.
So, the situation was that we went from MySQL Version 3.23.53 to 4.1. This required that we simply move our data to a new database. When our host imported the data from the old database to the new database, some special characters (dashes, commas, colons, etc) were rendering as black diamonds with white question marks in the center.
According to our host, the problem ended up being that initially, they imported the data from “latin_general” and converted it to UTF-8, but evidently, the original database wasn’t in “latin_general.” After quite a bit of testing they tried importing the data as “latin1_swedish_ci” and converted to UTF-8 and that fixed all of the problems with the special characters.
Maybe someone will understand that better than I do? Like I said, it’s all a bit confusing to me, but it seems to be fixed now and that’s all that I really care about. If anyone can put what really happened in baby talk so that I can understand it, I’d appreciate it.
According to our host, the problem ended up being that initially, they imported the data from “latin_general” and converted it to UTF-8, but evidently, the original database wasn’t in “latin_general.” After quite a bit of testing they tried importing the data as “latin1_swedish_ci” and converted to UTF-8 and that fixed all of the problems with the special characters.
Well, it didn’t look quite like your regular run of the mill encoding issue.
Maybe someone will understand that better than I do? Like I said, it’s all a bit confusing to me, but it seems to be fixed now and that’s all that I really care about.
To be honest, it’s a bit confusing to me, and I am considered the resident expert on such issues But as you say: whatever works, ultimately.
The weird thing is that I had this very same problem when I moved my blog from my old host: A Small Orange to Engine Hosting. I had no idea what was going on at that time (even less than now) and I ended up manually going into the templates and entries and changing these weird diamond characters to the correct characters. Spent a lot of time on that and think I still missed some of my older articles.
Seems to just be an issue with changing servers or database versions. I took a look at the databases for my three sites on EngineHosting. My first site (my blog which was moved from the old host) is set to Latin1_Swedish_ci and the other two of my sites which were create initially with EngineHosting are set to UTF8_general_ci. So, I’m not sure what relevance that has to anything, but it seems that when transferring it has to be set to swedish, but even then I still had this problem which like I said above, I just changed each one manually.
Surely all that is relevant is the character set, usually Latin1 or UTF8.
The order of the characters for sorting (swedish or general) should not be relevant.
When you (or your provider) backs up a database, if the character set of the backup
is different from the original character set of the database, then there’s
a possibility that characters will be garbled. If you keep the same character set
when backing up and when recreating the database on a new server, then there’s
much less potential for disaster
Assuming that all providers understand database character sets
is like assuming that all providers will backup your data and
provide you with a copy when you need it :
if the user doesn’t ask the provider the right questions, then
the user is likely to find that that the data is mangled or lost
I’m curious though. My only experience with backing up my database comes from the EE admin panel (which has since been removed) and through phpMyAdmin. Through phpMyAdmin, I click on the correct database and then click the “export” tab. Can you tell me where I will specify the character set of the backup so that I can match it up with the character set of the current database so I can try to avoid these problems in the future? I don’t see an option to specify character set when exporting.
I’ve had the same problem (a provider messing up the character set of a database when moving it to a new server).
In my case some of the tables in the database were in English, and some were in Japanese.
Many hosting companies do not allow users to create a database—they give the user a set number of databases,
often only one database, and they decide the default character set when they create the database (the user has
no control over this). With MySQL 3.x or 4.0, it’s not possible to specify a different character set for tables
in the database, and the database character set (CHARSET) is likely to be set to Latin1 by default.
However it *is* possible to specify a different CHARSET for displaying the data in phpMyAdmin.
If you set the display character set to UTF-8 Japanese, for example, and the “MySQL connection”
setting is also UTF-8, then the actual data being stored in the database or retrieved from the database
is UTF-8, even though the provider has specified the MySQL CHARSET as “Latin1”.
Now suppose that you (or your provider) go to back up your data.
For MySQL 3.x and 4.0, if the database character set has been preset to “Latin1”, then phpMyAdmin
thinks that the data is “Latin1” unless you override it. If you try to back up the UTF-8 Japanese tables
without changing the connection CHARSET to UTF-8 (the same as when the data was stored) then
the data is going to be mangled.
I think that probably even if you change the connection CHARSET to UTF-8, the UTF-8 backup file is still
going to contain a statement that the database was created as “Latin1”—because that’s what MySQL says it is.
Likewise when you (or your provider) use phpMyAdmin to restore the data or move it to a new server.
For MySQL 4.1 or later, even if you have no control over the default CHARSET of the database (because
the hosting company creates the single database you get) you can set the CHARSET of tables in the database.
But unless you explicitly set the table CHARSET (and display/connection CHARSET, if necessary)
to the same as the CHARSET of the (UTF-8) backup when you create the table (and INSERT the data)
the data may get mangled at this step.
The ideal solution is to set up Apache, PHP and MySQL on your own (company or home) Windows, Mac, or LINUX machine.
Use the same, or similar, versions of PHP and MySQL as your host. Then you can test any proposed changes locally
rather than on a system that is live. And you can test backups from the host, to make sure that your backup
has not been lost or mangled. HTH
Very nice overview- from a couple of folks. It can be a giant pain when this gets off- if you can set it up from the start to be consistent across the board, life is made much easier.
Think I’ll go ahead and close this out, since the original issue is solved. If someone does come across it in the future and has a related problem, make sure to link back.