The code is probably very ugly; this is the first time I have written something this long. Corrections are very welcome. I already suspect some inefficiencies around file saving.
Why: I have a multilingual music library (4 different alphabets), which includes songs that I ripped a decade ago when I was still using Windows.
Windows saved text using its own codepages (like windows-1253), which are not the same as the ISO codepages and not the same as UTF-8. In addition, I think that the ripping software back then didn’t even store encoding information in the metadata, so I had a lot of garbage displayed on PCs and portable music players, which read those files as the ‘default’ iso-8859-1/Latin-1.
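To illustrate the kind of garbage described above, here is a minimal sketch using only the Python standard library. It is not taken from the script; the sample word and variable names are made up for the example. It encodes a Greek word as windows-1253, misreads the bytes as iso-8859-1 the way a player would, and then reverses the mistake.

```python
# Illustration only: how a windows-1253 tag turns into garbage when read as Latin-1.
original = "Καλημέρα"  # hypothetical tag text

# What the old ripping software wrote to disk (no encoding information stored).
raw_bytes = original.encode("windows-1253")

# What a player shows when it assumes the 'default' iso-8859-1.
garbled = raw_bytes.decode("iso-8859-1")

# Reversing the mistake: get the original bytes back, then decode them correctly.
repaired = garbled.encode("iso-8859-1").decode("windows-1253")

print(garbled)
print(repaired)
```

The round trip works because iso-8859-1 maps every byte value to exactly one character, so no information is lost when the bytes are misread. The hard part in practice is knowing which codepage to use in the last step.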
I got fed up with this situation, so I tried to find a way to automate fixing the tags. If I couldn’t automate it, I would just delete my music library and be done with it. Fortunately, it worked!
PS: Not tested with non-alphabetic scripts. Here be dragons!
Is this meant to read windows-1253, convert to utf-8, and update the mp3 metadata?
It will try to guess the correct encoding using Unicode, Dammit from BeautifulSoup4. In my test, it could handle three codepages (Greek Windows, Turkish ISO, Finnish ISO) at the same time, but it got confused with four. So I first run the script against the fourth group (a few albums only) with -c iso-8859-7, and then against the whole library (it will skip anything that doesn’t need changes). I left some commented-out debug output in the code, which can be used to see exactly what goes on detection-wise.
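A rough sketch of what a forced-encoding pass could look like, assuming the tag text was misread as iso-8859-1. This is not the script's actual code and it does not use Unicode, Dammit; the function name repair_tag and its skip rules are hypothetical and only show the idea of forcing one codepage and leaving everything else alone.

```python
from typing import Optional


def repair_tag(text: str, forced_encoding: str) -> Optional[str]:
    """Return the repaired text, or None if the tag should be left alone.

    Hypothetical helper: assumes the text was wrongly decoded as iso-8859-1.
    """
    if text.isascii():
        # Plain ASCII looks the same in all of these codepages: nothing to fix.
        return None
    try:
        # Recover the bytes that were originally stored in the file.
        raw = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Characters outside Latin-1 mean the tag is already proper Unicode.
        return None
    try:
        return raw.decode(forced_encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the forced codepage: skip the tag.
        return None


# Usage: a Greek title stored as iso-8859-7 and misread as Latin-1.
garbled = "Καλημέρα".encode("iso-8859-7").decode("iso-8859-1")
print(repair_tag(garbled, "iso-8859-7"))
```

Note that a forced pass like this cannot tell a genuinely Latin-1 tag (say, one with Finnish letters) from a misread Greek one, so it would mangle the former. That is the reason for pointing it only at the few albums known to be in that codepage.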
yes, hopefully.