Spring Data and Spring Data Neo4j join the “No OFFSET Movement!”
Markus Winand has blogged about tool support for keyset pagination nearly a decade ago and of course my friend Lukas Eder has picked up that topic and did not only blog about it several times but implemented tool support with the the synthetic seek-clause of jOOQ. As the seek clause in jOOQ is excellent for relational databases, I’m gonna refrain now for calling the following the “very best way” of doing keyset based pagination and leave that to others.
So what is this about? The Spring Data Commons project – that is the base project for a broad variety of store implementations such as JPA, MongoDB, Redis and certainly Neo4j – added infrastructure support for keyset-based scrolling.
How does that look like from a Spring Data repositories point of view?
import java.util.UUID; import org.springframework.data.domain.ScrollPosition; import org.springframework.data.domain.Sort; import org.springframework.data.domain.Window; import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity; import org.springframework.data.neo4j.repository.Neo4jRepository; public interface ScrollingRepository extends Neo4jRepository<ScrollingEntity, UUID> { Window<ScrollingEntity> findTop4By(Sort sort, ScrollPosition position); } |
Keyset based pagination drops the notion of offset completely. It is dependent on an associated sort object, in this case given through the first parameter, the Sort
object. As with all pagination efforts, we need to know how many items per page shall be retrieved. In this case, 4. This is determined by the derived finder method. The scroll position (the second parameter) determines the offset.
The above will be possible from the next Spring Data releases for the MongoDB- and Neo4j implementations. Some stores might offer additional support on their data access templates, Neo4j does not as of writing (we just added the feature just days prior to the current RC).
The beauty of the above is: For you as a user, the calling of this just works the same for all the stores. Imagine the following simple entity:
@Node public class ScrollingEntity { @Id @GeneratedValue private UUID id; @Property("foobar") private String a; private Integer b; private LocalDateTime c; } |
And some test data (here, Neo4j is being used):
Connected to Neo4j using Bolt protocol version 5.0 at neo4j://localhost:7687 as user neo4j. Type :help for a list of available commands or :exit to exit the shell. Note that Cypher queries must end with a semicolon. neo4j@neo4j> match (n:ScrollingEntity) return n order by n.b asc, n.a desc; +-----------------------------------------------------------------------------------------------------------------+ | n | +-----------------------------------------------------------------------------------------------------------------+ | (:ScrollingEntity {b: 0, foobar: "A0", c: 2023-03-20T13:12:25.201, id: "c2c2ebe4-5a02-4d77-a53b-1abbc80aaad9"}) | | (:ScrollingEntity {b: 1, foobar: "B0", c: 2023-03-21T13:12:29.201, id: "f4f84ed4-632d-431e-bb1a-b829bc2eaf5d"}) | | (:ScrollingEntity {b: 2, foobar: "C0", c: 2023-03-22T13:12:39.201, id: "f1c088f8-0b7b-456b-99b3-db5a0199dec6"}) | | (:ScrollingEntity {b: 3, foobar: "D0", c: 2023-03-23T13:12:31.201, id: "3b223485-e81b-4be8-8dbd-50277d313a8b"}) | | (:ScrollingEntity {b: 3, foobar: "D0", c: 2023-03-23T13:12:31.201, id: "1f525d3d-cdfe-40a6-964b-1fbfc08fae99"}) | | (:ScrollingEntity {b: 4, foobar: "E0", c: 2023-03-24T13:12:41.201, id: "572b780e-256f-41b7-87de-4a130bc3814b"}) | | (:ScrollingEntity {b: 5, foobar: "F0", c: 2023-03-25T13:12:25.201, id: "457ec454-a9af-421c-a9c1-7f5ce95310c5"}) | | (:ScrollingEntity {b: 6, foobar: "G0", c: 2023-03-26T13:12:55.201, id: "b423c34b-6952-4b73-b06b-d039cf7c7e7b"}) | | (:ScrollingEntity {b: 7, foobar: "H0", c: 2023-03-27T13:13:00.201, id: "ca90cd25-a676-44d4-a4c2-2db32443bf2f"}) | | (:ScrollingEntity {b: 8, foobar: "I0", c: 2023-03-28T13:12:57.201, id: "59a5dfb2-0e17-4eeb-aecd-95bb555e0117"}) | +-----------------------------------------------------------------------------------------------------------------+ 10 rows ready to start consuming query after 55 ms, results consumed after another 3 ms neo4j@neo4j> |
All of that can be consumed in the simplest way like that:
import static org.assertj.core.api.Assertions.assertThat; import java.util.ArrayList; import org.junit.jupiter.api.Test; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.domain.KeysetScrollPosition; import org.springframework.data.domain.Sort; import org.springframework.data.domain.WindowIterator; import org.springframework.data.neo4j.integration.imperative.repositories.ScrollingRepository; import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity; class KeysetBasedScrollingIT { @Test void forwardWithDuplicatesIteratorIteration(@Autowired ScrollingRepository repository) { var sort = Sort.by(Sort.Order.asc("b"), Sort.Order.desc("a")); var it = WindowIterator .of(pos -> repository.findTop4By(sort, pos)) .startingAt(KeysetScrollPosition.initial()); var content = new ArrayList<ScrollingEntity>(); while (it.hasNext()) { var next = it.next(); content.add(next); } assertThat(content).hasSize(10); assertThat(content.stream().map(ScrollingEntity::getId) .distinct().toList()).hasSize(10); } } |
Here, the content is sorted by b
in ascending and by a
in descending order, 4 pieces at a time. The WindowIterator
starts at the initial position and tanks in a window providing function. That window will scroll over the dataset. What queries are generated?
The initial query looks like this, retrieving n+1 elements sorted in the order specified:
MATCH (scrollingEntity:`ScrollingEntity`) RETURN scrollingEntity ORDER BY scrollingEntity.b, scrollingEntity.foobar DESC, scrollingEntity.id LIMIT 5 |
Why n+1 elements? It’s an easy way to judge if there are more elements available to scroll further or not, without going through an additional counting query.
The next query looks like this:
MATCH (scrollingEntity:`ScrollingEntity`) WHERE ((scrollingEntity.b > $pcdsl01 OR (scrollingEntity.b = $pcdsl01 AND scrollingEntity.foobar < $pcdsl02)) OR (scrollingEntity.b = $pcdsl01 AND scrollingEntity.foobar = $pcdsl02 AND scrollingEntity.id > $pcdsl03)) RETURN scrollingEntity ORDER BY scrollingEntity.b ASC, scrollingEntity.foobar DESC, scrollingEntity.id ASC LIMIT 5 |
This has now 3 parameters:
:param pcdsl01 => 3 :param pcdsl02 => "D0" :param pcdsl03 => "282ac053-c821-47f4-9a0e-0d12e0b91808" |
The next page starts at an element whose b attribute is greater 3 or is equal to 3 and as a footer attribute lower than D0. If you look closely, the 4th and 5th poses a nice test case for our tooling and made it hopefully clear why we add the 3rd condition here: To add uniqueness to the keyset on which we paginate.
The Spring Data Commons and Spring Data Neo4j implementation of the keyset based pagination is quite sophisticated in that sense that it allows for different directions for different columns to sort by, in contrast to a simple tuple comparison (in the above case something along the lines where [n.b, n.foobar] >= [$pcdsl01, $pcdsl02, $pcdsl03]
(also not a real tuple in Neo4j but a list comparison).
Above I presented using this feature with the new WindowIterator
, but you can also control this manually like this:
import static org.assertj.core.api.Assertions.assertThat; import java.util.function.Function; import org.assertj.core.data.Index; import org.junit.jupiter.api.Test; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.domain.KeysetScrollPosition; import org.springframework.data.domain.Sort; import org.springframework.data.neo4j.integration.imperative.repositories.ScrollingRepository; import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity; class KeysetBasedScrollingIT { @Test void forwardWithDuplicatesManualIteration(@Autowired ScrollingRepository repository) { var duplicates = repository.findAllByAOrderById("D0"); assertThat(duplicates).hasSize(2); var sort = Sort.by(Sort.Order.asc("b"), Sort.Order.desc("a")); var window = repository.findTop4By(sort, KeysetScrollPosition.initial()); assertThat(window.hasNext()).isTrue(); assertThat(window) .hasSize(4) .extracting(Function.identity()) .satisfies(e -> assertThat(e.getId()).isEqualTo(duplicates.get(0).getId()), Index.atIndex(3)) .extracting(ScrollingEntity::getA) .containsExactly("A0", "B0", "C0", "D0"); window = repository.findTop4By(sort, window.positionAt(window.size() - 1)); assertThat(window.hasNext()).isTrue(); assertThat(window) .hasSize(4) .extracting(Function.identity()) .satisfies(e -> assertThat(e.getId()).isEqualTo(duplicates.get(1).getId()), Index.atIndex(0)) .extracting(ScrollingEntity::getA) .containsExactly("D0", "E0", "F0", "G0"); window = repository.findTop4By(sort, window.positionAt(window.size() - 1)); assertThat(window.isLast()).isTrue(); assertThat(window).extracting(ScrollingEntity::getA) .containsExactly("H0", "I0"); } } |
The key classes in that API are org.springframework.data.domain.Window
and the org.springframework.data.domain.KeysetScrollPosition
. The first one contains data and information at which position the scrolling window is positioned at any time, the latter is a value object for the current keys.
Last but not least, the API has full support for scrolling backwards as well. Scrolling backwards requires inverting the order for each column individually. At the end of the post I’ll add a link how we did this in Spring Data Neo4j without going through the pain of fiddling around with strings. Just inverting the operator in the generated order is however not enough when going backwards. By doing so only, the window would jump right back to the beginning. As we only know the keys at which the window arrived and not the keys n positions backward, we can solve this issue by inverting the sort order as whole too, match and collect and than recreating the sort on the client side again (we have some ideas for a more sophisticated Cypher based solution, though).
Implementing this feature for Spring Data Neo4j has been quite a nice experience. All our query generation goes through the Cypher-DSL which is in essence a builder for Cypher. Here we made use of iteratively building conditions:
var resultingCondition = Conditions.noCondition(); // This is the next equality pair if previous sort key was equal var nextEquals = Conditions.noCondition(); // This is the condition for when all the sort orderedKeys are equal, and we must filter via id var allEqualsWithArtificialSort = Conditions.noCondition(); for (Map.Entry<String, Object> entry : orderedKeys.entrySet()) { var k = entry.getKey(); var v = entry.getValue(); if (v == null || (v instanceof Value value && value.isNull())) { throw new IllegalStateException("Cannot resume from KeysetScrollPosition. Offending key: '%s' is 'null'".formatted(k)); } var parameter = Cypher.anonParameter(v); Expression expression; var scrollDirection = scrollPosition.getDirection(); if (Constants.NAME_OF_ADDITIONAL_SORT.equals(k)) { expression = entity.getIdExpression(); var comparatorFunction = getComparatorFunction(scrollDirection == KeysetScrollPosition.Direction.Forward ? Sort.Direction.ASC : Sort.Direction.DESC, scrollDirection); allEqualsWithArtificialSort = allEqualsWithArtificialSort.and(comparatorFunction.apply(expression, parameter)); } else { var p = propertyAndDirection.get(k); expression = p.property.isIdProperty() ? entity.getIdExpression() : root.property(k); var comparatorFunction = getComparatorFunction(p.order.getDirection(), scrollDirection); resultingCondition = resultingCondition.or(nextEquals.and(comparatorFunction.apply(expression, parameter))); nextEquals = expression.eq(parameter); allEqualsWithArtificialSort = allEqualsWithArtificialSort.and(nextEquals); } } resultingCondition = resultingCondition.or(allEqualsWithArtificialSort); |
Also, getting the comparator right is type safe:
static BiFunction<Expression, Expression, Condition> getComparatorFunction( Sort.Direction sortDirection, KeysetScrollPosition.Direction scrollDirection ) { if (scrollDirection == KeysetScrollPosition.Direction.Backward) { return sortDirection.isAscending() ? Expression::lte : Expression::gte; } return sortDirection.isAscending() ? Expression::gt : Expression::lt; } |
Of course a keyset based pagination is not a total silver bullet and can also be outright slow if done wrong. For example you should make sure you have proper indexes on all columns you want to paginate over (which is true for Neo4j and relation, and I guess also for MongoDB). For Neo4j I shamelessly recommend using Neo4j Migrations to have proper control over your Neo4j indexes and constraints, across all Neo4j versions from 3.5 to the latest release in Neo4j Aura. Also, while you can totally go wild in your sort for the pagination and have dozens of columns, the conditions generated will grow exponentially. If this is what you want, be our guest but don’t complain 🙂
If you use this feature in a sane way, it will be faster than just doing offset based paginations in Neo4j and I hope you find it as useful as I enjoyed creating it together with Mark Paluch and Christoph Strobl.