[Cryptography] Cryptographic archive format

Jon Callas jon at callas.org
Mon Dec 21 16:15:36 EST 2020

> On Dec 21, 2020, at 10:29 AM, Phillip Hallam-Baker <phill at hallambaker.com> wrote:
> [Yes, I have researched ZIP, yes, it is easier to chuck away the legacy and start from scratch, no, it is not useful to discuss these design parameters.]

Also research tar files, and other zip-related things like jar, etc. There are lots of them around.

Also research things like "bundles" in NeXTSTEP and its descendants, meaning macOS, iOS, etc. A bundle is a directory that contains subdirectories, and the whole thing is treated as if it were a single file. An "app," for example, is a bundle.

Also go look at chroot jails, and all their relatives like "containers" and even things like Docker, Kubernetes, etc.

> OK so that is why I need a new format, it is simply not worth the effort to back-haul the necessary changes into an existing scheme and the systems won't be compatible anyway. Another difference is that the new format is not a compression format for good reasons I won't go into here. the TL;DR; is that compression is better applied at the application file level these days. There is no value to compression in an archive format when people are archiving bundles of JPGs. It might make sense to automatically encode HTML in Brotli but for most purposes the value is likely small.

I think you've argued why compression is not needed, not anything else. The compressed archive formats have two pieces, the archive and the compression. This is another reason why I think you should look at tar.

> Now the (hard) question: 
> What precautions do I need to take in recording file paths in the archive?

Actually, what you need are precautions in undoing the archive. As you note next, someone could create a malicious archive.

> The risk here is that someone crafts a malicious file path and sticks it into an archive so that the files end up overwriting the system files.

Yes. This is why you need to work on pulling things out of the archive, not putting them in.

I think a reasonable rule is that when you unpack, you make a subdirectory and everything goes in it. It's legal to go down (making a subdirectory in your base directory for the extract) and never up. If you do that, then you need a predicate that answers whether the hypothetical target is in that subtree. If it is not, you put the file in the base directory (or make an error directory and put all your errors there).
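
A predicate like that might be sketched in Python as follows. This is only an illustration under assumptions: the names is_within, extraction_target, and the "_errors" directory are hypothetical, and resolving paths with os.path.realpath is one possible way to decide containment, not something prescribed here. It also assumes the check runs immediately before each file is written, since symlinks created earlier in the same extraction can change the answer.

```python
import os


def is_within(base_dir: str, member_path: str) -> bool:
    """Return True if member_path, joined onto base_dir, stays in that subtree."""
    base = os.path.realpath(base_dir)
    # os.path.join discards base when member_path is absolute, so an absolute
    # member resolves outside the subtree and fails the check below.
    target = os.path.realpath(os.path.join(base, member_path))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def extraction_target(base_dir: str, member_path: str,
                      error_dir: str = "_errors") -> str:
    """Pick where a member should be written (hypothetical policy).

    Members that stay inside the subtree keep their relative location.
    Anything else is flattened to a single name inside an error directory.
    """
    base = os.path.realpath(base_dir)
    if is_within(base, member_path):
        return os.path.realpath(os.path.join(base, member_path))
    # Drop empty, "." and ".." components so the fallback name cannot climb.
    parts = [p for p in member_path.replace("\\", "/").split("/")
             if p not in ("", ".", "..")]
    flat_name = "_".join(parts) or "unnamed"
    return os.path.join(base, error_dir, flat_name)
```

With this sketch, a member named "docs/readme.txt" lands under the base directory as usual, while "../../etc/passwd" would be redirected to "_errors/etc_passwd" inside the base directory instead of climbing out of it.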

I think you're making it harder than it needs to be. Yes, you need the archive to work within a subtree and never go outside it, but I'm not sure you need anything else.

And really, I'm sure you can steal this from somewhere.

