Encoding and Character Sets in Koha: Revision 4 / KOHA

http://wiki.koha.org/doku.php?id=encodingscratchpad

Keep in mind that encodings are set separately for the database, the table, AND the session. Even if you successfully loaded the DB and tables as UTF-8, a latin1 session will not be able to display all of the data in the DB. Make sure you check the output of "SHOW VARIABLES..." as the same user that connects when Koha is accessed through Apache.
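To make the session problem concrete, here is a minimal Python sketch. It is only a rough model, not MySQL itself: it assumes the server converts stored text to the session character set and substitutes "?" for anything that character set cannot represent. The function name view_through_session and the sample value are hypothetical.

```python
def view_through_session(text: str, session_charset: str) -> str:
    """Roughly model a server converting stored text to the session charset.

    Characters the session charset cannot represent are replaced with '?'.
    This is an illustration in Python, not the database's actual code path.
    """
    return text.encode(session_charset, errors="replace").decode(session_charset)


stored = "Čakovec, Zürich"  # hypothetical value, stored correctly as UTF-8

print(view_through_session(stored, "utf-8"))    # session matches the data
print(view_through_session(stored, "latin-1"))  # 'Č' cannot be shown in latin1
```

Note that "ü" survives because Latin-1 contains it, while "Č" does not. This is why a latin1 session can look fine on some records and broken on others.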

Notes on conversion:

Running a Latin-1 to UTF-8 conversion directly on the mysqldump will likely make any MARC records it touches unparseable. I suggest that, as part of your process, you export the MARC bib and authority records separately, fix them using MARC::Record and the techniques you've already identified, and then import them back into your 2.2.9 test database. After that you can fix a mysqldump of the non-MARC tables.
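One likely reason a blind conversion breaks MARC records: assuming the records are stored in ISO 2709 form, the leader and directory carry lengths counted in bytes, and re-encoding an accented character from Latin-1 to UTF-8 changes the byte count without updating those numbers. The sketch below builds a tiny one-field record by hand to show the mismatch. The helper names are hypothetical, and a real record should be built with a MARC library rather than like this.

```python
FT = b"\x1e"  # ISO 2709 field terminator
RT = b"\x1d"  # ISO 2709 record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(tag: str, value: str, encoding: str) -> bytes:
    """Build a minimal one-field ISO 2709 record (illustration only)."""
    # Data field: two blank indicators, subfield $a, the text, terminator.
    field = b"  " + SF + b"a" + value.encode(encoding) + FT
    # Directory entry: tag (3), field length (4), start offset (5).
    directory = f"{tag}{len(field):04d}{0:05d}".encode("ascii") + FT
    base = 24 + len(directory)           # leader is always 24 bytes
    total = base + len(field) + len(RT)  # record length in bytes
    leader = f"{total:05d}nam  22{base:05d} a 4500".encode("ascii")
    return leader + directory + field + RT


def declared_length(record: bytes) -> int:
    """Read the record length stored in leader positions 0-4."""
    return int(record[:5])


def naive_latin1_to_utf8(record: bytes) -> bytes:
    """What a blind conversion of the whole dump does to the raw bytes."""
    return record.decode("latin-1").encode("utf-8")


original = build_record("500", "café", "latin-1")
converted = naive_latin1_to_utf8(original)

print(declared_length(original), len(original))    # lengths agree
print(declared_length(converted), len(converted))  # lengths no longer agree
```

After the naive conversion the record is one byte longer than its leader claims, so a parser that trusts the leader and directory reads the wrong byte ranges. Going through MARC::Record avoids this because the lengths are recomputed when the record is written back out.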

Very briefly, the MarcToUTF8Record routine in Koha 3's C4::Charset module should give you some ideas. You can use it as the core of a routine that converts a file containing a mix of Latin-1 records and UTF-8 records to UTF-8. As it stands, it will not correctly handle a single MARC record that contains both Latin-1 and UTF-8 data, but it could be modified to test each field and subfield to see whether it contains UTF-8 or Latin-1.
http://git.koha.org/cgi-bin/gitweb.cgi?p=Koha;a=blob;f=C4/Charset.pm
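The per-field test suggested above can be sketched as follows. This is not the MarcToUTF8Record implementation, just a common heuristic written in Python for illustration: try a strict UTF-8 decode first and fall back to Latin-1 only if that fails. The function names are hypothetical.

```python
def field_to_text(raw: bytes) -> str:
    """Decode one field or subfield value that is either UTF-8 or Latin-1."""
    try:
        # Strict decoding fails on most Latin-1 text containing accented
        # characters, because those bytes rarely form valid UTF-8 sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


def normalize_fields(raw_fields: list[bytes]) -> list[bytes]:
    """Return every value re-encoded as UTF-8, whatever it started as."""
    return [field_to_text(raw).encode("utf-8") for raw in raw_fields]


mixed = [
    "Bilješke".encode("utf-8"),  # already UTF-8
    "café".encode("latin-1"),    # Latin-1 bytes
    b"plain ASCII",              # valid in both encodings
]

for value in normalize_fields(mixed):
    print(value.decode("utf-8"))
```

The heuristic is not perfect: a Latin-1 value whose bytes happen to form valid UTF-8 will be treated as UTF-8. That is rare in real text, but it is worth spot-checking the converted records.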