software.tarxjf.info
Posted by dis.k in c/freepost. 8 votes, 2 comments

The code is probably very ugly; this is the first time I have written something this long. Corrections are very welcome. I already suspect some inefficiencies around file saving.

Why: I have a multilingual music library (4 different alphabets), which includes songs that I ripped a decade ago when I was still using Windows.

Windows saved text using its own codepages (like windows-1253), which are not the same as the ISO codepages, and not the same as UTF-8. In addition, I think that the ripping software back then didn’t even store encoding information in the metadata, so I had a lot of garbage displayed on PCs and portable music players, which read those files as the ‘default’ iso-8859-1/Latin-1.
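To make the mismatch concrete, here is a minimal sketch (not taken from the script itself) of how a Greek title stored as windows-1253 bytes turns into garbage when a player assumes Latin-1, and how the same bytes can be read back correctly. The variable names are only for illustration; `cp1253` is Python's codec name for windows-1253.

```python
# What the ripping software wrote: Greek text as windows-1253 bytes,
# with no encoding information stored next to it.
original = "Υπόγεια Ρεύματα"
stored_bytes = original.encode("cp1253")

# What a player shows when it assumes the 'default' Latin-1.
garbled = stored_bytes.decode("latin-1")
print(garbled)

# Latin-1 maps every byte to exactly one character, so the original
# bytes can be recovered from the garbage and decoded properly.
repaired = garbled.encode("latin-1").decode("cp1253")
print(repaired)
```

Nothing is lost in the garbled form, which is why the tags can be fixed after the fact as long as the right codepage is known or guessed.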

I got fed up with this situation, so I tried to find a way to automate fixing the tags. If I couldn’t automate it, I would just delete my music library and be done with it. Fortunately, it worked!

PS: Not tested with non-alphabetic scripts. Here be dragons!

Is this meant to read windows-1253, convert to utf-8, and update the mp3 metadata?

Is this meant to read windows-1253

It will try to guess the correct encoding using Unicode, Dammit from BeautifulSoup4. In my tests, it could handle three codepages (Greek Windows, Turkish ISO, Finnish ISO) in the same run, but it got confused when there were four. So I first run the script against the fourth group (only a few albums) with -c iso-8859-7, and then against the whole library (it will skip anything that doesn’t need changes).

MISS: artist "ÕÐÏÃÅÉÁ ÑÅÕÌÁÔÁ" looks like koi8-r. It will become "упоцеиа яеулата"
CORR: artist "Õðüãåéá Ñåýìáôá" looks like iso-8859-7. It will become "Υπόγεια Ρεύματα"
CORR: album "þu anda! þimdi" looks like iso-8859-9. It will become "şu anda! şimdi"
CORR: artist "Social Waste" looks like it is ascii. No change will be made
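The lines above can be reproduced with a small sketch using only the standard library. It assumes the tag value was read as Latin-1 and that the codepage is given explicitly (as with -c), so it does not reproduce the guessing done by Unicode, Dammit. The function name `propose_fix` is hypothetical and is not from the script.

```python
from typing import Optional


def propose_fix(value: str, codepage: str) -> Optional[str]:
    """Return the tag re-read with `codepage`, or None if nothing needs to change.

    Assumes `value` is the tag as a player would show it, i.e. read as Latin-1.
    """
    if value.isascii():
        # Pure ASCII is identical in all of these codepages: skip it.
        return None
    return value.encode("latin-1").decode(codepage)


# The same bytes decode without any error under both codepages, so a
# wrong guess goes unnoticed unless a human looks at the result.
miss = propose_fix("ÕÐÏÃÅÉÁ ÑÅÕÌÁÔÁ", "koi8-r")
corr = propose_fix("ÕÐÏÃÅÉÁ ÑÅÕÌÁÔÁ", "iso-8859-7")
print(miss)
print(corr)

print(propose_fix("þu anda! þimdi", "iso-8859-9"))
print(propose_fix("Social Waste", "iso-8859-7"))
```

Since a wrong codepage still yields valid text, forcing the codepage for a small, known group of albums is a reasonable way around a misdetection.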

I left some commented-out debug output in the code, which can be used to see what exactly goes on detection-wise.

convert to utf-8, and update the mp3 metadata?

yes, hopefully.