Recovering Japanese Filenames from Zip Archives on OS X

If, like me, you deal with your typical Japanese administrative office on a regular basis, you probably receive your fair share of documents, some of them occasionally packaged as a Zip archive.

If, also like me, you are not using a Windows machine but a Mac running OS X or some flavour of Linux, you routinely end up with files bearing such poetic names as “Åuäwà ò_ï∂ä÷åWèëófiíÒèoìÕ.pdf”, “äwà ê\êøìÕÅyÉfÅ[É^Åz.xls”, etc. This comes from an incompatibility between the ways each system encodes Japanese characters[1], combined with the fact that the Zip format originally had no way of recording which encoding a filename uses. Not a big problem if you have one file; a bit tedious if the archive contains 300 of them.
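To see what is going on, it helps to reproduce the damage. My reading of the names above (an assumption, though the bytes do line up) is that they are Shift JIS bytes displayed as if they were Mac Roman. For instance, 学位 in Shift JIS is the four bytes 8A 77 88 CA, which Mac Roman shows as “äwà” followed by a space, just like the start of the second example. The helpers garble and recover below are hypothetical names, purely for illustration:

```python
# Minimal sketch, assuming the garbled names are Shift JIS bytes read as Mac Roman.
def garble(name: str, actual: str = "shift_jis", assumed: str = "mac_roman") -> str:
    """Simulate the bug: bytes written in one encoding, read back in another."""
    return name.encode(actual).decode(assumed)


def recover(garbled: str, actual: str = "shift_jis", assumed: str = "mac_roman") -> str:
    """Undo it: turn the text back into its original bytes, then decode them properly."""
    return garbled.encode(assumed).decode(actual)


if __name__ == "__main__":
    original = "学位論文.pdf"
    broken = garble(original)
    print(broken)           # garbled, like the names above
    print(recover(broken))  # 学位論文.pdf
```

The repair works because no information is actually lost: the wrong decoding is reversible, so re-encoding with the wrong codec gives back the original bytes.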

In the spirit of sharing the fruit of my latest productivity-sink effort to fix that problem, I present you with a small script that takes such a Zip archive as input and correctly extracts all the files, with properly encoded filenames:

Download unzip.py

To use it, open a Terminal window[2] and run:

unzip.py yourziparchive.zip

(with your own zip file name, obviously)

That’s it! Your files should be extracted into the same folder as the original archive.
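For the curious, here is a minimal sketch of how such a script can work in Python 3. It is not the unzip.py linked above, just an illustration under one assumption about Python’s zipfile module: entry names stored without the UTF-8 flag are decoded as cp437, so encoding them back to cp437 recovers the raw bytes, which can then be decoded as Shift JIS. The file name unzip_sketch.py and the helpers fix_name and extract are hypothetical.

```python
import shutil
import sys
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose flag bit 11: the name is stored as UTF-8


def fix_name(info: zipfile.ZipInfo, encoding: str) -> str:
    """Recover the real filename of one archive entry."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded this one correctly
    # zipfile decoded the raw bytes as cp437; reverse that to get the bytes back,
    # then decode them with the encoding the archive was actually made with.
    return info.filename.encode("cp437").decode(encoding)


def extract(archive: str, encoding: str = "shift_jis") -> None:
    """Extract every entry next to the archive, fixing names along the way."""
    dest = Path(archive).resolve().parent
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = fix_name(info, encoding)
            target = (dest / name).resolve()
            if dest not in target.parents:  # refuse "../" tricks and absolute paths
                raise ValueError(f"unsafe path in archive: {name}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: unzip_sketch.py ARCHIVE [ENCODING]")
    else:
        extract(sys.argv[1], *sys.argv[2:3])
```

This sketch targets OS X and Linux. On Windows, zipfile converts backslashes in entry names to slashes, which would corrupt Shift JIS characters whose second byte happens to be 0x5C. With a wrong encoding, the decode step may raise UnicodeDecodeError instead of silently writing garbage.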

Extra Notes:

By default, the script assumes the Zip file was made on Windows and that filenames are encoded in SJIS (Shift JIS). If you know this not to be the case, you can try giving it a different source encoding:

unzip.py yourziparchive.zip encoding

where encoding is the source encoding (e.g. ‘SJIS’, ‘UTF-8’).
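If you are not sure which encoding to pass, one way to choose is to preview a few names under several candidates before extracting anything. A rough sketch, again assuming Python 3’s zipfile decodes unflagged names as cp437; the shortlist is my own guess, and cp932 is Microsoft’s variant of Shift JIS used by Japanese Windows. The preview helper and the CANDIDATES tuple are hypothetical:

```python
import sys
import zipfile

# Hypothetical shortlist of encodings to try; extend as needed.
CANDIDATES = ("shift_jis", "cp932", "euc_jp", "utf-8")


def preview(archive: str, candidates=CANDIDATES, limit: int = 5) -> dict:
    """Map each candidate encoding to the first few names it decodes without error."""
    results = {}
    with zipfile.ZipFile(archive) as zf:
        # Only entries without the UTF-8 flag (bit 11) suffer from the problem.
        infos = [i for i in zf.infolist() if not i.flag_bits & 0x800][:limit]
        raw = [i.filename.encode("cp437") for i in infos]  # original name bytes
        for enc in candidates:
            try:
                results[enc] = [b.decode(enc) for b in raw]
            except UnicodeDecodeError:
                pass  # these bytes are not valid in this encoding
    return results


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for enc, names in preview(sys.argv[1]).items():
            print(f"{enc}: {names}")
```

Whichever line looks like real Japanese is the encoding to try as the second argument. An encoding that decodes without error can still be the wrong one, so the printed names are what you should judge by.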

You need to have Python installed (it’s there by default on recent installs of OS X).

This script comes with no support whatsoever and is merely provided as a small present to the next unfortunate person who would otherwise spend 20 minutes searching for such a tool on the web. Feel free to post suggestions and improvements in the comments.

  1. To be specific: Windows seems to use good old, antiquated, Japanese-only SJIS, whereas OS X and others prefer spiffy universal encodings like UTF-8.
  2. If I have lost you at ‘terminal’: sorry, there is not much I can do to help… this script is probably not for you.
