Archiving OS X Mavericks tags (and other data) with git

October 25, 2013 by Michael

For the last 6 months I’ve been archiving all my paperwork (OCR’ing and then trashing it) to a personal documents repository.

There are some document managers out there, but every single one felt like overkill to me, so I just stick to a pretty simple directory structure, which is enough for me.

Although I need those documents across devices, I didn’t want to use a cloud service to sync them. git does a pretty good job here.

With Mac OS X Mavericks comes a great new feature: tagging. We’ve certainly all used tags somewhere on the internet, and I really like this kind of taxonomy. It’s way better than a fixed folder structure.

So I can now tag all my documents without the need for an external program.

But what about sync? Those tags are stored in the extended file attributes of the Mac OS X filesystem (along with other things, for example whether the file was downloaded from the web or came from an email). git does not include those extended attributes in a repository, so they will be lost.

xattr to the rescue. xattr can dump all extended attributes for all files in a directory and can also write them back.

I use the following pre-commit hook to dump all extended attributes of my archive to a file named .metadata:

#!/bin/sh
 
xattr -lrx . > .metadata
git add -f .metadata

This can be a problem if only tags were modified, because git sees no changes and nothing will be committed. This can be handled by allowing an empty commit:

git commit --allow-empty -m "New Tags"

To restore them I use the following post-merge hook, which is also executed after a pull (I’m pretty much only doing pulls on this repository anyway).

#!/usr/bin/env ruby
# Be careful, this can be something you don't want:
# strip all existing extended attributes
system('xattr', '-cr', '.')

# "<file>: <attribute>:" starts the hex dump of one attribute
pattern_header = /([^\0]+): (.+):/
# "<offset>  <hex bytes>  |<printable>|"; the offset is hexadecimal,
# so \d{8} would skip lines such as 000000A0
pattern_data = /\A\h{8}\s+((?:\h{2} )+)/

data, current_file, current_attribute = '', nil, nil

# write the collected hex data back to the current file
flush = lambda do
  next if current_file.nil? || data.empty?
  # argument list instead of a shell string, so quotes in file names are safe
  system('xattr', '-wx', current_attribute, data, current_file)
end

File.readlines('.metadata').each do |line|
  if current_file && (m = pattern_data.match(line))
    # collect hex data
    data += m[1].delete(' ')
  elsif (m = pattern_header.match(line))
    # starting hex data for a new attribute: write out the previous one
    flush.call
    data, current_file, current_attribute = '', m[1], m[2]
  end
end
# the last attribute has no following header, so write it out here
flush.call

This hook is pretty simple. One can surely think of better ways to store (and/or parse) the data and to add some error handling, but this works quite well for my purpose.
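As one possible direction for the parsing, here is a minimal sketch that separates reading the dump from writing the attributes. The helper name parse_xattr_dump, the constants, and the sample bytes are made up for illustration and are not part of my hook. It assumes the dump has the "file: attribute:" header lines and hex lines that the post-merge hook above expects.

```ruby
# Hypothetical helper, not part of the original hook: turn the text of
# a .metadata dump into a list of { file, attribute, hex } hashes.
HEADER = /\A(.+): (.+):\s*\z/            # "<file>: <attribute>:"
DATA   = /\A\h{8}\s+((?:\h{2} )+)/       # "<hex offset>  <hex bytes> ..."

def parse_xattr_dump(text)
  entries = []
  text.each_line do |line|
    if entries.any? && (m = DATA.match(line))
      # hex line: append its bytes to the most recent attribute
      entries.last[:hex] << m[1].delete(' ')
    elsif (m = HEADER.match(line))
      # header line: start a new attribute
      entries << { file: m[1], attribute: m[2], hex: String.new }
    end
  end
  entries
end

# Made-up sample data in the assumed dump layout (the bytes are not a real tag list).
SAMPLE = <<~'DUMP'
./invoice.pdf: com.apple.metadata:_kMDItemUserTags:
00000000  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F  |................|
00000010  10 11                                            |..|
./notes.txt: com.example.note:
00000000  68 69                                            |hi|
DUMP

parse_xattr_dump(SAMPLE).each do |entry|
  puts "#{entry[:file]} #{entry[:attribute]} #{entry[:hex]}"
end
```

With the entries in hand, the restore step could loop over them and call xattr -wx once per entry, which also makes it easier to add error handling around each write.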

This setup also stores every extended attribute. If you’re only interested in tags, then only sync the “com.apple.metadata:_kMDItemUserTags” attribute.
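One way to do that, sketched under assumptions: filter the dump before it is written to .metadata, so that only the blocks for the tags attribute survive. The helper name filter_dump and the sample data are hypothetical, and the sketch assumes every attribute block starts with a "file: attribute:" header line as in the dump format used above.

```ruby
TAGS_ATTRIBUTE = 'com.apple.metadata:_kMDItemUserTags'

# Hypothetical helper: keep only the header and hex lines of blocks
# whose attribute name equals `attribute`.
def filter_dump(text, attribute = TAGS_ATTRIBUTE)
  keep = false
  kept = text.each_line.select do |line|
    if (m = /\A.+: (.+):\s*\z/.match(line))
      # a header line decides whether the block below it is kept
      keep = (m[1] == attribute)
    end
    keep
  end
  kept.join
end

# Made-up sample data (the bytes are not a real tag list).
FILTER_SAMPLE = <<~'DUMP'
./invoice.pdf: com.apple.metadata:_kMDItemUserTags:
00000000  00 01 02 03                                      |....|
./invoice.pdf: com.example.other:
00000000  68 69                                            |hi|
DUMP

puts filter_dump(FILTER_SAMPLE)
```

In a Ruby pre-commit hook this could be used by passing the output of xattr -lrx . through filter_dump before writing .metadata. The restore side would then also need to clear only that attribute instead of running xattr -cr, otherwise the other attributes are stripped and never written back.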

2 comments

  1. resource_fork wrote:

    The solution works quite well, thanks. pre-commit does gather all the attributes; however, post-merge does not restore the com.apple.ResourceFork attr. I am, in particular, trying to restore the file icon. I can see that it is saved as ResourceFork, but not restored. Would you have any insights into it?

    Posted on November 12, 2015 at 6:55 PM | Permalink
  2. Michael wrote:

    Hi,
    thanks for your feedback!
    Never cared about the icons, though. So, sorry, but no.

    Posted on November 13, 2015 at 8:25 AM | Permalink