In my role as a Spring library developer at Neo4j, I spent the last year, together with Gerrit, creating the next version of Spring Data Neo4j. Our name so far has been Spring Data Neo4j⚡️RX but in the end, it will be SDN 6.
Anyway. Part of the module is our Neo4j Cypher-DSL. After working with jOOQ, a fantastic tool for writing SQL in Java, and seeing what our friends at VMware are doing with an internal SQL DSL for Spring Data JDBC, I never wanted to create Cypher queries via string operations in our mapping code ever again.
So, we gave it a shot and started modeling a Cypher-DSL after openCypher, but with Neo4j extensions supported.
You’ll find the result these days at neo4j-contrib/cypher-dsl.
Wait, what? This repository is nearly ten years old.
Yes, that is correct. My friend Michael started it back in the day. There are only a few things where you won’t find him involved. He even created jequel, a SQL DSL, and was an author of this paper: On designing safe and flexible embedded DSLs with Java 5, which in turn had an influence on jOOQ.
Therefore, when Michael offered that Gerrit and I could extract our Cypher-DSL from SDN/RX into a new home under the coordinates org.neo4j:neo4j-cypher-dsl, I was more than happy.
Now comes the catch: It would have been easy to just delete the main branch, create a new one, dump our stuff into it and call it a day. But I actually wanted to honor history, that of the original project as well as ours. We always tried to have meaningful commits and put a lot of effort into commit messages, and I didn’t want to lose that for the times when things are not working.
Adding content from one repository into an unrelated one is much easier than it sounds:
# Get yourself a fresh copy of the target
git clone git@wherever/whatever.git targetrepo
cd targetrepo
# Add the source repo as a new remote
git remote add sourceRepo git@wherever/somethingelse.git
# Fetch and merge the branch in question from the sourceRepo as unrelated history into the target
git pull sourceRepo master --allow-unrelated-histories
Done.
But then, one does get everything from the source. Not what I wanted.
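To see that effect on throwaway data, here is a minimal sketch that builds two unrelated repositories in a temporary directory and pulls one into the other. All repository, branch and file names are made up for the demo. The `--no-rebase` and `--no-edit` flags are my addition: newer Git versions refuse to pull divergent branches without an explicit strategy, and the demo should not open an editor.

```shell
#!/bin/sh
set -eu

# Throwaway playground; every name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Fixed identity so the demo does not depend on the global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: create a repository on branch "main" with one commit per given file.
make_repo() {
    repo=$1; shift
    git init -q "$repo"
    git -C "$repo" symbolic-ref HEAD refs/heads/main
    for f in "$@"; do
        mkdir -p "$repo/$(dirname "$f")"
        echo "content of $f" > "$repo/$f"
        git -C "$repo" add "$f"
        git -C "$repo" commit -q -m "Add $f"
    done
}

make_repo targetrepo README.md
make_repo sourcerepo dsl/Cypher.java unrelated/Other.java

cd targetrepo
git remote add sourceRepo "$work/sourcerepo"
git pull -q --no-rebase --no-edit --allow-unrelated-histories sourceRepo main

# Everything from the source is now part of the target, wanted or not.
commits=$(git rev-list --count HEAD)
echo "commits in target: $commits"
git ls-files
```

The target ends up with its own commit, both source commits and one merge commit, and `unrelated/Other.java` came along for the ride.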
The original repository needed some preparation.
git filter-branch to the rescue. filter-branch works with the “snapshot” model of commits in a repository, where each commit is a snapshot of the tree, and rewrites these commits. This is in contrast to git rebase, which actually works with diffs. The command will apply filters to the snapshots and create new commits, creating a new, parallel graph. It won’t care about conflicts.
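The snapshot model can be made visible with a few plumbing commands. The following is a small sketch on a throwaway repository (file names are made up): the second commit changes only one file, yet its tree still lists both, and the diff is something Git computes from two snapshots on demand.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Fixed identity so the demo does not depend on the global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q snapshots
cd snapshots

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"

echo changed > a.txt
git commit -q -am "Change a only"

# A commit object references exactly one tree (the snapshot) plus its parent(s).
git cat-file -p HEAD

# The snapshot of the second commit still contains b.txt, although only a.txt changed.
git ls-tree --name-only 'HEAD^{tree}'

# The diff is derived by comparing two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "changed between the two snapshots: $changed"
```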
Manish has a great post about the whole topic: Understanding Git Filter-branch and the Git Storage Model.
For my use case above, the built-in subdirectory-filter was most appropriate. It makes a given subdirectory the new repository root, keeping the history of that subdirectory. Let’s see:
# Clone the source, I don't want to mess with my original copy
git clone git@wherever/somethingelse.git sourceRepo
cd sourceRepo
# Remove the origin, just in case I screw up AND accidentally push things
git remote rm origin
# Execute the subdirectory filter for the openCypher DSL
git filter-branch --subdirectory-filter neo4j-opencypher-dsl -- --all
Turns out, this worked well, despite this warning:
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as ‘git filter-repo’
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
I ended up with a rewritten repo, containing only the subdirectory I was interested in as new root. I could have stopped here, but I noticed that some of my history was missing: The filtering only looks at the actual snapshots of the files in question, not at the history you get when using --follow. As we had already moved those files around a bit, I lost all the valuable information.
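The loss can be reproduced on a tiny throwaway repository. In this sketch (directory and file names are made up), a file is created in one directory, the directory is moved, and the file is changed afterwards. `git log --follow` sees all three commits, while the subdirectory filter only keeps the commits that touched the new location.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Fixed identity so the demo does not depend on the global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the warning (and its delay) quoted above.
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q moved
cd moved
git symbolic-ref HEAD refs/heads/main

mkdir old
echo v1 > old/a.txt
git add old
git commit -q -m "Add a.txt in old location"

git mv old sub
git commit -q -m "Move old to sub"

echo v2 > sub/a.txt
git commit -q -am "Change a.txt in new location"

# With --follow, the history crosses the rename.
with_follow=$(git log --follow --format=%s -- sub/a.txt | wc -l | tr -d ' ')

# The subdirectory filter only considers commits touching sub/.
git filter-branch --subdirectory-filter sub -- --all > /dev/null 2>&1
after_filter=$(git rev-list --count HEAD)

echo "commits seen with --follow:          $with_follow"
echo "commits left after subdirectory filter: $after_filter"
```

The commit that originally added the file in its old location is gone after filtering.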
Well, let’s read the above warning again and we find filter-repo. filter-repo can be installed on a Mac, for example, with brew install git-filter-repo, and it turns out it does exactly what I want, given I know vaguely the original places of the stuff I want to have in my new root:
# Use git filter-repo to make some content the new repository root
git filter-repo --force \
  --path neo4j-opencypher-dsl \
  --path spring-data-neo4j-rx/src/main/java/org/springframework/data/neo4j/core/cypher \
  --path spring-data-neo4j-rx/src/main/java/org/neo4j/springframework/data/core/cypher \
  --path spring-data-neo4j-rx/src/test/java/org/springframework/data/neo4j/core/cypher \
  --path spring-data-neo4j-rx/src/test/java/org/neo4j/springframework/data/core/cypher \
  --path-rename neo4j-opencypher-dsl/:
This takes a couple of paths into consideration, tracks their history and renames the one path (the blank after the : makes it the new root). Turns out that git-filter-branch
.
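For illustration, here is the same idea on a throwaway repository: a minimal sketch, assuming git-filter-repo is on the PATH (the script skips the filtering otherwise). The directory names `old`, `sub` and `other` are made up. Both the old and the new location are passed via `--path`, and only the new one is renamed to the root.

```shell
#!/bin/sh
set -eu

filtered_count=skipped

if command -v git-filter-repo > /dev/null 2>&1; then
    work=$(mktemp -d)
    cd "$work"

    # Fixed identity so the demo does not depend on the global Git config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

    git init -q moved
    cd moved
    git symbolic-ref HEAD refs/heads/main

    mkdir old other
    echo v1 > old/a.txt
    echo x > other/b.txt
    git add old other
    git commit -q -m "Add a.txt in old location"

    git mv old sub
    git commit -q -m "Move old to sub"

    echo v2 > sub/a.txt
    git commit -q -am "Change a.txt in new location"

    # Keep both the old and the new location, make the new one the root.
    git filter-repo --force \
        --path old \
        --path sub \
        --path-rename sub/: > /dev/null

    filtered_count=$(git rev-list --count HEAD)
    git log --format=%s
else
    echo "git-filter-repo not installed, skipping"
fi

echo "commits after filter-repo: $filtered_count"
```

In this sketch all three commits survive, including the one from before the move, while `other/` disappears from the history.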
With the source repository prepared in that way, I cleaned up some meta and build information, added one more commit and incorporated it into the target as described in the first step.
I’m writing this down because I found it highly useful and also because we are gonna decompose the repository of SDN/RX further. Gerrit described our plans in his post Goodbye SDN⚡️RX. We will do something similar with SDN/RX and Spring Data Neo4j. While we have to manually transplant our Spring Boot starter into the Spring Boot project via PRs, we want to keep the history of SDN/RX for the target repo.
Long story short: While I was skeptical at first about ripping the work of a year apart and distributing it across a couple of projects, I now see it more as a positive decomposing of things (thanks Nigel for that analogy).
Featured image courtesy of Nathan Dumlao on Unsplash.