If, like me, you deal with your typical Japanese administrative office on a regular basis, you probably receive your fair share of documents, some of them occasionally packaged as a Zip archive…
If, also like me, you are not using a Windows machine but a Mac running OS X or some flavour of Linux, you routinely end up with files bearing such poetic names as “Åuäwà ò_ï∂ä÷åWèëóﬁíÒèoìÕ.pdf”, “äwà ê\êøìÕÅyÉfÅ[É^Åz.xls” etc. This is because each system stores Japanese characters in filenames differently, and the Zip format has no reliable way to record which encoding a filename uses: Windows tools typically write names in the system code page (Shift_JIS on Japanese systems) without saying so. Not a big problem if you have one file, but tedious if the archive contains 300 of them.
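The sample names above look like what you get when Shift_JIS bytes are displayed as Mac Roman, an older Mac encoding. The following minimal Python 3 sketch reproduces that round trip, assuming exactly those two encodings (other tools may guess differently, for instance cp437); `garble`, `ungarble` and the example filename are hypothetical:

```python
def garble(name, stored="shift_jis", shown="mac_roman"):
    """Simulate a tool that reads `stored`-encoded bytes as if they were `shown`."""
    return name.encode(stored).decode(shown)


def ungarble(garbled, stored="shift_jis", shown="mac_roman"):
    """Undo the misreading: re-encode as `shown` to get the bytes back, then decode as `stored`."""
    return garbled.encode(shown).decode(stored)


original = "学位論文.pdf"  # hypothetical example filename
mangled = garble(original)
print(mangled)  # a string of accented Latin letters and symbols
assert ungarble(mangled) == original
```

The fix works because no information is lost: the bytes are intact, only the label saying how to read them is wrong.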
In the spirit of sharing the fruit of my latest productivity-sink effort to fix that problem, I present you with a small script that takes such a Zip archive as input and correctly extracts all the files (with properly encoded filenames).
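The idea behind such a script can be sketched in a few lines of Python 3. This is an illustrative reconstruction of the technique, not the script itself. It relies on the fact that Python's `zipfile` module decodes any filename lacking the UTF-8 flag (general-purpose bit 0x800) as cp437, so re-encoding the name as cp437 recovers the raw bytes, which can then be decoded with the right encoding. The default here is cp932, Microsoft's variant of Shift_JIS used by Japanese Windows (an assumption); `extract`, `fixed_name` and `UTF8_FLAG` are hypothetical names.

```python
#!/usr/bin/env python3
"""Sketch: extract a Zip archive whose filenames were stored in a legacy
Japanese encoding (e.g. Shift_JIS/cp932) without the UTF-8 flag."""
import os
import shutil
import sys
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename is stored as UTF-8


def fixed_name(info, encoding):
    """Return the entry name decoded with `encoding` instead of cp437."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded it as UTF-8
    # Without the flag, zipfile decoded the raw bytes as cp437. Encoding back
    # to cp437 recovers the original bytes, which we then decode properly.
    return info.filename.encode("cp437").decode(encoding)


def extract(path, encoding="cp932"):
    """Extract `path` into the archive's own folder, fixing filenames."""
    dest = os.path.dirname(os.path.abspath(path))
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            name = fixed_name(info, encoding)
            target = os.path.normpath(os.path.join(dest, name))
            # Refuse entries that would land outside the destination folder.
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"unsafe path in archive: {name!r}")
            if name.endswith("/"):  # directory entry
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        extract(*sys.argv[1:])  # archive path, optional encoding
    else:
        print("usage: unzip.py archive.zip [encoding]", file=sys.stderr)
```

On Python 3.11 and later, `zipfile.ZipFile(path, metadata_encoding="cp932")` performs the same decoding directly; the manual cp437 round trip above also works on older versions. The path check is there because an archive could otherwise write outside the target folder.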
To use: open a terminal window and run the command:
unzip.py yourziparchive.zip
(with your own zip file name, obviously)
That’s it! Your files should now be extracted into the same folder as the original archive.
By default, the script assumes the Zip file was created on Windows and that filenames are encoded in SJIS (Shift_JIS), but if you know this not to be the case, you can pass a different source encoding as a second argument:
unzip.py yourziparchive.zip encoding
where encoding is the source encoding (‘SJIS’, ‘UTF-8’, etc.).
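If you are unsure which encoding to pass, one approach is to preview the filenames under a few candidate encodings before extracting. The sketch below is a hypothetical helper, not part of the original script; it assumes Python 3 and the same cp437 behaviour of `zipfile` for names without the UTF-8 flag, and the `CANDIDATES` shortlist is an assumption.

```python
import sys
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename is stored as UTF-8
# Hypothetical shortlist of encodings worth trying for Japanese archives.
CANDIDATES = ("cp932", "utf-8", "euc_jp")


def preview(path, candidates=CANDIDATES):
    """Map each candidate encoding to the decoded names, skipping any
    encoding that cannot decode every legacy (non-UTF-8-flagged) name."""
    with zipfile.ZipFile(path) as zf:
        # Recover raw bytes: zipfile decoded unflagged names as cp437.
        raw = [i.filename.encode("cp437")
               for i in zf.infolist() if not i.flag_bits & UTF8_FLAG]
    result = {}
    for enc in candidates:
        try:
            result[enc] = [r.decode(enc) for r in raw]
        except UnicodeDecodeError:
            pass  # this encoding cannot be the right one
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    for enc, names in preview(sys.argv[1]).items():
        print(f"--- {enc}")
        print("\n".join(names))
```

Decoding without errors does not prove an encoding is correct (short byte sequences can be valid in several encodings), so the output is meant for eyeballing, not for automatic selection.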
You need to have Python installed (it’s there by default on recent installs of OS X).
This script comes with no support whatsoever and is merely provided as a small present to the next unfortunate person who would otherwise spend 20 minutes searching for such a tool on the web. Feel free to post suggestions and improvements in the comments.