Create ZIP Archives containing Unicode filenames with Java

January 5, 2010 by Michael

It is harder than I thought to create a simple ZIP archive from within Java that contains entries with Unicode names.

I’m actually too lazy to read all the specs, but they say something to the effect that the entry names in a ZIP archive are encoded using “Cp437”. The built-in Java compression API has nothing to offer for setting the encoding, so I tried Apache Commons Compress.

The manual says the following about interop:

For maximum interop it is probably best to set the encoding to UTF-8, enable the language encoding flag and create Unicode extra fields when writing ZIPs. Such archives should be extracted correctly by java.util.zip, 7Zip, WinZIP, PKWARE tools and most likely InfoZIP tools. They will be unusable with Windows’ “compressed folders” feature and bigger than archives without the Unicode extra fields, though.

That didn’t work for me.

After some cursing, this is my solution:

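import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

// fc is assumed to be a file chooser (e.g. a JFileChooser) holding the target file
final ZipArchiveOutputStream zout = new ZipArchiveOutputStream(new BufferedOutputStream(new FileOutputStream(fc.getSelectedFile())));
zout.setEncoding("Cp437");
zout.setFallbackToUTF8(true);
zout.setUseLanguageEncodingFlag(true);
zout.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE);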

I am specifying the encoding explicitly, but instead of using UTF-8, which didn’t work for my UTF-8 strings (wtf??), I’m using the Cp437 from the specs plus some other magic options, and it works for me in 7-Zip, WinZip and even Windows’ “compressed folders”.
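
A rough way to see which names would take the UTF-8 fallback path is to ask the JDK whether Cp437 can encode them. The following is a minimal sketch using only java.nio.charset. The class and method names (Cp437Check, fitsCp437) are made up for illustration, and it assumes the JDK at hand ships the IBM437 charset (full JDK distributions normally do, stripped-down runtimes may not). Going by the Commons Compress documentation, names that fail this check should be the ones written as UTF-8 when setFallbackToUTF8(true) is set.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of Commons Compress or of the original solution.
class Cp437Check {

    // Returns true if every character of the name has a Cp437 representation.
    static boolean fitsCp437(String name) {
        // A fresh encoder per call, because CharsetEncoder instances are stateful.
        CharsetEncoder encoder = Charset.forName("Cp437").newEncoder();
        return encoder.canEncode(name);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's file encoding.
        String umlauts = "\u00f6\u00e4\u00fc.txt";   // German umlauts o, a, u
        String polish = "\u0119\u00f3\u0105\u015b\u0142\u017c\u017a\u0107\u0144.xlsx"; // Polish letters
        System.out.println("umlaut name fits Cp437: " + fitsCp437(umlauts));
        System.out.println("Polish name fits Cp437: " + fitsCp437(polish));
    }
}
```

German umlauts such as öäü are part of Cp437, while most of the letters in “ęóąśłżźćń” are not, so the first name passes the check and the second does not.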

Edit: Unfortunately, in Mac OS X’s Unzip utility, the non-Cp437 characters are broken. If anyone has a good idea, feel free to leave a comment.
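
One thing that may be worth trying on newer JDKs: since Java 7, java.util.zip.ZipOutputStream and ZipInputStream have constructors that take a Charset. The sketch below writes a one-entry archive in memory with UTF-8 names, reads the name back, and inspects bit 11 of the general purpose flag in the first local file header, which is the “language encoding flag” the Commons Compress manual mentions. The class and method names (Utf8ZipSketch, zipSingleEntry, firstEntryName, hasLanguageEncodingFlag) are hypothetical, and this has not been verified against Mac OS X’s Unzip utility or Windows’ “compressed folders”, so treat it as a starting point rather than a fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch using only the JDK (Java 8+ because of UncheckedIOException).
class Utf8ZipSketch {

    // Writes a single-entry archive in memory, encoding the entry name as UTF-8.
    static byte[] zipSingleEntry(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(content);
            zout.closeEntry();
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the name of the first entry, decoding it as UTF-8.
    static String firstEntryName(byte[] archive) {
        try (ZipInputStream zin = new ZipInputStream(
                new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry entry = zin.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if bit 11 (0x0800) of the general purpose flag in the first local header is set.
    static boolean hasLanguageEncodingFlag(byte[] archive) {
        if (archive.length < 8) {
            return false;
        }
        // Local file header layout: 4-byte signature, 2-byte version, 2-byte little-endian flags.
        int flags = (archive[6] & 0xFF) | ((archive[7] & 0xFF) << 8);
        return (flags & 0x0800) != 0;
    }

    public static void main(String[] args) {
        // Unicode escapes for the Polish sample name from the comments below.
        String name = "\u0119\u00f3\u0105\u015b\u0142\u017c\u017a\u0107\u0144.xlsx";
        byte[] archive = zipSingleEntry(name, "demo".getBytes(StandardCharsets.UTF_8));
        System.out.println("name survives round trip: " + name.equals(firstEntryName(archive)));
        System.out.println("language encoding flag set: " + hasLanguageEncodingFlag(archive));
    }
}
```

Whether a given unzip tool honours the flag is a separate question; the sketch only shows what the JDK writes and reads back.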

12 comments

  1. Ricardo wrote:

    Hello,

    Your post has been very useful to me. My zip files now have proper filename encoding. Thanks for sharing your knowledge.

    Posted on May 6, 2010 at 10:14 AM | Permalink
  2. Michael wrote:

    You’re very welcome!

    Posted on May 6, 2010 at 10:39 AM | Permalink
  3. Ricardo Costa wrote:

    Thank you very much, my friend. I had spent too much time trying to solve my filename encoding problem.
    Congratulations.

    Posted on June 17, 2010 at 4:12 PM | Permalink
  4. Michael wrote:

    Thank you for your kind comment 🙂

    Posted on June 17, 2010 at 5:32 PM | Permalink
  5. yogav wrote:

    It does not seem to work for me. Would you consider publishing your full code?

    Posted on February 13, 2011 at 10:02 PM | Permalink
  6. Michael wrote:

    yogav: This is the full code for configuring the output stream. Everything else is in the API.

    Posted on March 9, 2011 at 8:24 AM | Permalink
  7. Z wrote:

    Hi! Thank you for this piece of code, I’m having the same problem (encoding localized files in Zip archives), and I saw you mentioned your solution working with Windows’ Compressed Folders.
    Sadly, this is not the case with my files: after using your code, the file “ęóąśłżźćń.xlsx” still decompresses as “-Ö+¦-à+¢+é+++¦-ç+ä.xlsx”.
    Would you happen to have any idea how to address this, or is this reproducible by you?

    Posted on September 23, 2015 at 10:44 AM | Permalink
  8. Michael wrote:

    Hi,

    Sadly, I can only confirm your findings with the given filename.

    Sorry.

    Posted on September 23, 2015 at 11:18 AM | Permalink
  9. Z wrote:

    Thank you for trying and another thank you for replying this quickly 🙂
    Have a nice day!

    Posted on September 23, 2015 at 12:58 PM | Permalink
  10. Michael wrote:

    You’re welcome! I can only guess that I just tried öäü and the like, which was all I needed 5 years ago.

    You too!

    Posted on September 23, 2015 at 12:59 PM | Permalink
  11. Anirudh Mittal wrote:

    I tried this solution but now I cannot even open the zip file.

    Also will this work in both Windows and Mac?

    Currently I use the plain Java output stream, and with that my MacBook’s zip extractor shows the names fine, but it is Windows that messes up the file names.

    Posted on October 16, 2015 at 12:42 AM | Permalink
  12. Siva wrote:

    It shows an error when using this code. Can you post the full code?

    Posted on August 16, 2016 at 1:35 PM | Permalink
