info.michael-simons.eu

Run your integration tests against Testcontainers with GraalVM native image

I have started working with a small team on an exciting project at Neo4j. The project is about database connectivity (what else) and we use Testcontainers in our integration tests, asserting the actual network connectivity and eventually the API.

The thing we are creating should of course also work when being compiled as part of an application into a native executable by GraalVM. For a bunch of older projects with a somewhat convoluted test-setup I used to create dedicated, small applications that produce some output. I compiled these apps into a native binary and used a scripted run to eventually assert the output.

For this new project I wanted to take the opportunity to use the Maven plugin for GraalVM Native Image building and its test capabilities directly.
The plugin works great and the maintainers are, like the whole GraalVM team, quick to fix any issue. We already use it on several occasions to produce native binaries as part of a distribution, but so far not for testing.

I personally find the linked documentation insufficient for creating a proper test setup. Especially the section “Testing support” does not work for me: Neither the latest Surefire nor Failsafe plugins bring the required dependency org.junit.platform:junit-platform-launcher. This artifact contains a test execution listener, org.junit.platform.launcher.listeners.UniqueIdTrackingListener, that tracks each executed test and stores its unique id in a file. The GraalVM plugin uses that file to discover the tests it needs to run in native mode. If the file is not generated, it will yell at you with

[ERROR] Test configuration file wasn’t found. Make sure that test execution wasn’t skipped.
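
To make the mechanism a bit more tangible, here is a minimal sketch of the idea in plain Java. It is not the JUnit implementation: the class TrackingListener, its method names and the temp file name are made up for illustration, and only the shape of the unique id follows what JUnit Jupiter typically produces. The first phase records the id of every executed test and writes one id per line; a later phase reads that file back to know which tests to run again:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Conceptual illustration only; this is NOT the JUnit implementation.
public class UniqueIdTrackingSketch {

	// Hypothetical stand-in for a test execution listener.
	public static final class TrackingListener {

		private final List<String> executedIds = new ArrayList<>();

		// Called once for every test that was actually executed.
		public void executionFinished(String uniqueId) {
			this.executedIds.add(uniqueId);
		}

		// Called at the end of the test run: writes one unique id per line.
		public void testPlanExecutionFinished(Path outputFile) throws IOException {
			Files.write(outputFile, this.executedIds);
		}
	}

	// What a later build step could do: read the ids back to know which tests to run again.
	public static List<String> readTrackedIds(Path file) throws IOException {
		return Files.readAllLines(file);
	}

	public static void main(String[] args) throws IOException {
		var listener = new TrackingListener();
		listener.executionFinished(
			"[engine:junit-jupiter]/[class:demo.RepositoryIT]/[method:repositoryShouldWork()]");

		// Hypothetical file name, the real listener picks its own
		var file = Files.createTempFile("tracked-unique-ids", ".txt");
		listener.testPlanExecutionFinished(file);

		System.out.println(readTrackedIds(file));
	}
}
```

If the first phase never runs, or the listener is not active, there is no file for the second phase to read, which is the situation the error message above complains about.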

I can’t share the thing I am actually testing right now, so here’s something similar. The code below uses the Neo4j Java Driver and creates a data access object interacting with the Neo4j database:

package demo;
 
import java.util.Map;
import java.util.stream.Collectors;
 
import org.neo4j.driver.Driver;
 
public record Movie(String id, String title) {
 
	public static final class Repository {
 
		private final Driver driver;
 
		public Repository(Driver driver) {
			this.driver = driver;
		}
 
		public Movie createOrUpdate(String title) {
			return this.driver.executableQuery("MERGE (n:Movie {title: $title}) RETURN n")
				.withParameters(Map.of("title", title))
				.execute(Collectors.mapping(r -> {
					var node = r.get("n").asNode();
					return new Movie(node.elementId(), node.get("title").asString());
				}, Collectors.toList()))
				.stream()
				.findFirst()
				.orElseThrow();
		}
 
	}
}

And the integration test looks like this. There are no surprises in it: It is disabled without Docker support and uses a life cycle per class, so that I can keep a reusable test container around for all tests, plus a sample test:

package demo;
 
import static org.junit.jupiter.api.Assertions.assertNotNull;
 
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.testcontainers.containers.Neo4jContainer;
import org.testcontainers.junit.jupiter.Testcontainers;
 
@Testcontainers(disabledWithoutDocker = true)
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
class RepositoryIT {
 
	protected final Neo4jContainer<?> neo4j = new Neo4jContainer<>("neo4j:5.13.0")
		.waitingFor(Neo4jContainer.WAIT_FOR_BOLT)
		.withReuse(true);
 
	protected Driver driver;
 
	@BeforeAll
	void startNeo4j() {
		this.neo4j.start();
		this.driver = GraphDatabase.driver(this.neo4j.getBoltUrl(),
				AuthTokens.basic("neo4j", this.neo4j.getAdminPassword()));
	}
 
	@AfterAll
	void closeDriver() {
		if (this.driver == null) {
			return;
		}
		this.driver.close();
	}
 
	@Test
	void repositoryShouldWork() {
 
		var repository = new Movie.Repository(driver);
		var newMovie = repository.createOrUpdate("Event Horizon");
		assertNotNull(newMovie.id());
	}
 
}

Now, let’s walk through the pom.xml. For my dependencies, I usually check whether they have a BOM file and import those into dependency management, so that I can rely on other projects organising their releases and transitive dependencies properly. Here I have the latest JUnit and Testcontainers plus Slf4j. From the latter I’ll use only the simple logger later, so that I can see Testcontainers logging:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.junit</groupId>
            <artifactId>junit-bom</artifactId>
            <version>5.10.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-bom</artifactId>
            <version>2.0.9</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers-bom</artifactId>
            <version>1.19.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

The relevant test dependencies then look like this. I kinda grouped them together, but please note how I explicitly include the JUnit launcher mentioned above. It is not part of the JUnit core dependencies and, at least for me, was not brought in transitively by either Failsafe or Surefire:

<dependencies>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.platform</groupId>
        <artifactId>junit-platform-launcher</artifactId>
        <scope>test</scope>
    </dependency>
 
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>neo4j</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

“But Michael, didn’t you think about the fact that you won’t usually get the latest Failsafe and Surefire plugins with default Maven?” Of course I did. This is how I configure Failsafe. Take note here how I set a (Java) system property for the integration tests. While the UniqueIdTrackingListener is on the class path, it is disabled by default and must be enabled with the property below (yes, I did read the sources for that). The rest is just the usual dance for setting up integration tests:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-failsafe-plugin</artifactId>
    <version>3.2.1</version>
    <configuration>
        <systemPropertyVariables>
            <junit.platform.listeners.uid.tracking.enabled>true</junit.platform.listeners.uid.tracking.enabled>
        </systemPropertyVariables>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Now onto the GraalVM native-maven-plugin. I usually wrap this into a dedicated profile to be activated with a system property, as in the listing below. The documentation says that one must use <extensions>true</extensions> as part of the configuration in order to use the recommended JUnit Platform test listener mode, but that didn’t work for me. I guess it should in theory avoid having to set the above system property.

The next important part in the listing, at least when you want to use Testcontainers, is enabling the GraalVM Reachability Metadata Repository. This repository contains the required configuration shims for quite a number of libraries, including Testcontainers. If you don’t enable it, Testcontainers won’t work in native mode:

<profile>
    <id>native-image</id>
    <activation>
        <property>
            <name>native</name>
        </property>
    </activation>
    <build>
        <plugins>
            <plugin>
                <groupId>org.graalvm.buildtools</groupId>
                <artifactId>native-maven-plugin</artifactId>
                <version>0.9.28</version>
                <extensions>true</extensions>
                <configuration>
                    <metadataRepository>
                        <enabled>true</enabled>
                    </metadataRepository>
                </configuration>
                <executions>
                    <execution>
                        <id>test-native</id>
                        <goals>
                            <goal>test</goal>
                        </goals>
                        <phase>verify</phase>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</profile>

One cannot praise the people from VMware’s Spring and Spring Boot teams enough for bringing their knowledge of so many libraries into that repository.

With that in place, you can run your integration tests on the JVM and as a native image like this:

mvn -Dnative verify

You first see the usual dance of integration tests, then the GraalVM compilation

[1/8] Initializing...                                                                                    (5,9s @ 0,25GB)
 Java version: 17.0.9+11-LTS, vendor version: Oracle GraalVM 17.0.9+11.1
 Graal compiler: optimization level: 2, target machine: armv8-a, PGO: off
 C compiler: cc (apple, arm64, 15.0.0)
 Garbage collector: Serial GC (max heap size: 80% of RAM)
 1 user-specific feature(s)
 - org.graalvm.junit.platform.JUnitPlatformFeature
[junit-platform-native] Running in 'test discovery' mode. Note that this is a fallback mode.
[2/8] Performing analysis...  [*****]                                                                   (21,0s @ 1,48GB)
  13.384 (87,45%) of 15.304 types reachable
  22.808 (62,52%) of 36.483 fields reachable
  73.115 (60,68%) of 120.500 methods reachable
   4.250 types, 1.370 fields, and 3.463 methods registered for reflection
      99 types,   102 fields, and   102 methods registered for JNI access
       5 native libraries: -framework CoreServices, -framework Foundation, dl, pthread, z
[3/8] Building universe...                                                                               (2,6s @ 1,40GB)
[4/8] Parsing methods...      [**]                                                                       (2,6s @ 1,62GB)
[5/8] Inlining methods...     [***]                                                                      (1,4s @ 1,49GB)
[6/8] Compiling methods...    [******]                                                                  (37,1s @ 2,99GB)
[7/8] Layouting methods...    [**]                                                                       (4,0s @ 4,12GB)
[8/8] Creating image...       [**]                                                                       (4,4s @ 1,49GB)
  37,40MB (58,34%) for code area:    40.731 compilation units
  25,78MB (40,21%) for image heap:  338.915 objects and 80 resources
 951,36kB ( 1,45%) for other data
  64,11MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                                Top 10 object types in image heap:
  14,74MB java.base                                            7,80MB byte[] for code metadata
   5,94MB testcontainers-1.19.1.jar                            3,21MB byte[] for java.lang.String
   3,85MB java.xml                                             2,46MB java.lang.String
   3,79MB svm.jar (Native Image)                               2,44MB byte[] for general heap data
   1,12MB neo4j-java-driver-5.13.0.jar                         2,39MB java.lang.Class
   1,08MB netty-buffer-4.1.99.Final.jar                        1,50MB byte[] for embedded resources
 938,35kB docker-java-transport-zerodep-3.3.3.jar            879,17kB byte[] for reflection metadata
 683,00kB netty-transport-4.1.99.Final.jar                   627,38kB com.oracle.svm.core.hub.DynamicHubCompanion
 655,80kB netty-common-4.1.99.Final.jar                      438,19kB java.util.HashMap$Node
 490,98kB jna-5.12.1.jar                                     384,69kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
   3,95MB for 58 more packages                                 3,40MB for 2755 more object types
------------------------------------------------------------------------------------------------------------------------


And shortly after that:

[main] INFO org.testcontainers.DockerClientFactory - Checking the system...
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker server version should be at least 1.6.0
[main] INFO tc.neo4j:5.13.0 - Creating container for image: neo4j:5.13.0
[main] INFO tc.neo4j:5.13.0 - Reusing container with ID: 56cc1b02f9b0ebfcc8670f5cdc54b5b3a85a4720e8d810548986369a557482ca and hash: aa81fad313c4f8e37e5b14246fe863c7dbc26db6
[main] INFO tc.neo4j:5.13.0 - Reusing existing container (56cc1b02f9b0ebfcc8670f5cdc54b5b3a85a4720e8d810548986369a557482ca) and not creating a new one
[main] INFO tc.neo4j:5.13.0 - Container neo4j:5.13.0 started in PT0.280264S
demo.RepositoryIT > repositoryShouldWork() SUCCESSFUL
 
 
Test run finished after 608 ms
[         2 containers found      ]
[         0 containers skipped    ]
[         2 containers started    ]
[         0 containers aborted    ]
[         2 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

What I like here is the fact that I don’t have anything special in my test classes, no weird hierarchies nor any additional annotation. In the project we are working on, we have all our integration tests in a separate Maven module, as we want to make sure we are testing the packaged jar properly (we also have integration tests for both the Java class path and module path in separate Maven modules). This setup now gives us the additional advantage that the packaging of our library is tested under native image, too. This will let you discover issues such as missing resources in the native image as well.

Anyway, the whole project is shared as a gist; it’s only three files.
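
As an illustration of the missing-resources point, here is a small sketch of a helper an integration test could call. The class name, the method name and the resource path are hypothetical and not part of the project described here. On the JVM, the lookup is served from the packaged jar; in a native image it only succeeds if the resource was actually included in the image, so the same test running in both modes would surface the difference:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the project described in this post.
public class ResourceCheckSketch {

	// Loads a class path resource as text and fails loudly if it is missing.
	public static String loadRequiredResource(Class<?> anchor, String name) throws IOException {
		// getResourceAsStream returns null when the resource cannot be found
		try (InputStream in = anchor.getResourceAsStream(name)) {
			if (in == null) {
				throw new IllegalStateException("Resource " + name + " not found");
			}
			return new String(in.readAllBytes(), StandardCharsets.UTF_8);
		}
	}

	public static void main(String[] args) throws IOException {
		try {
			// "/demo/does-not-exist.txt" is a made-up name to demonstrate the failure path
			loadRequiredResource(ResourceCheckSketch.class, "/demo/does-not-exist.txt");
		} catch (IllegalStateException e) {
			System.out.println(e.getMessage());
		}
	}
}
```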

The image of this post was generated with Dall-E by my friend Michael.

Filed in English posts, Java | Comments (1) »

25-Oct-23

Why would a Neo4j person be so fond of an embedded, relational database?

I have been working for Neo4j since 2018. At Neo4j I maintain both Spring Data Neo4j and Neo4j-OGM, both object mappers and entity managers for our database product. This is a great job in a great company with awesome colleagues such as my friends Gerrit and Michael.

Some other projects I created on the job are the Cypher-DSL, a builder for our query language Cypher, which is used extensively inside Spring Data Neo4j, in products of our partner Graphaware and by a whole bunch of other customers. Headed for its 100th star is Neo4j-Migrations, a database migration toolkit. Last but not least, I did create the original Neo4j Testcontainers module and was recognized as a Testcontainers Champion for that.

However, I got “socialized” in a previous lifetime in an SQL shop (actually, an Oracle shop), and while I swore more than 20 years ago during my studies that I would never do anything with SQL, there I was. For whatever reason, my head actually works pretty well with the relational model and the questions I can answer with it. I spent about 15 years in that company doing all kinds of things, such as geospatial applications, applications with energy forecasts based on past measurements and stuff like that. What all of that had in common was SQL.

Just a couple of months prior to joining Neo4j, I did a talk under the title Live with your SQL-fetish and choose the right tool for the job, in which I presented jOOQ, way back before it became the hot topic. Anyhow, a lot of the dataset from that talk I eventually used in Neo4j-specific talks too… Taking my musical habits into a knowledge graph.

What I am trying to say here is: I am deeply involved in the Graph ecosystem, we have a great product and tools, and I’m glad I have a part in those. But I also think that other query languages have their benefits and also that other projects and companies are doing great work, too.

So, DuckDB, what is it? DuckDB is “an in-process SQL OLAP database management system” and you can go ahead and try it out in your browser, because it can be compiled to WASM.

DuckDB’s query engine is vector based. Every bit of data that flows through is a collection of vectors on which algorithms often can be applied in parallel; see Execution Format. That’s the groundwork of DuckDB’s fast, analytical queries. From Why DuckDB: Online analytical processing (OLAP) workloads are characterized by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables. In the vectorized query execution engine queries are interpreted and processed in large batches of values in one operation. DuckDB can query foreign stores, such as Postgres and SQLite, and there are scenarios in which the engine is actually faster while doing this than native Postgres.

When going through their SQL documentation you will be surprised how much you get essentially for free in one small executable. It’s all there: CTEs, Window functions, ASOF joins, Pivot and many things that make SQL friendlier. Back then we had to run big Oracle installations for similar things.

If you follow me on social media you might notice that my focus in private shifted over the last years; I have been doing a lot of sport and training, and fewer side projects that involve any big setups. This is where a small tool that runs without a server installation comes in super handy: I was able to define a neat schema for my photovoltaic systems with a bunch of views and now have a good enough dashboard.

My biking page has been remodeled to be a static page these days but uses a DuckDB database beneath, see biking.michael-simons.eu.

At the moment, I neither need nor want more. And as sad as it might be, I don’t have a graph problem in either of those applications. I want to aggregate measurements, do analytics and that’s it. It’s a lot of time-series that I’m working with and graph doesn’t really help me there.

Those are use cases that are unrelated to work.

But I often have use for DuckDB at work, too. Neo4j can natively ingest CSV files. You need to write some Cypher to massage them into the graph you want, but still. That CSV must be properly formatted.

DuckDB on the other hand can read from CSV, Parquet, JSON and other formats as if they were tables. These sources can be in files or at URLs. It does not require you to actually create a schema and persist data in its own store, but can just query things. Querying data without persisting it might be an unusual technique for a database and seems counter-intuitive at first look, but it is useful in the right situations. DuckDB does not need to persist the content, but you are free to create views over it. You can happily join a CSV file with a JSON file and literally copy the result to a new CSV file that is well suited for Neo4j.

This can all be done either in a CLI or actually as part of a pipeline in a script.

DuckDB replaced a bunch of tools in my daily usage, such as xsv and, to some extent, jq. DuckDB’s JSON processing capabilities allow you to query and normalize many complex and denormalized JSON documents, but in some cases it’s not enough… It’s just that you can do so many “interesting” things in JSON 😉

Anyhow, a long read, but maybe an explanation why you did see so many toots by me that dealt with DuckDB or even my profile at DuckDB snippets.

I think it’s valuable knowing other technology and also mastering more than one (query) language. I value both my Cypher knowledge and everything I learned about SQL in the past. Hopefully, I can teach some about both, with my current work and any future publication.

And last but not least: Mark Needham, Michael Hunger and myself have worked together to bring out “DuckDB in Action”. The Manning Early Access Program (MEAP) started October 2023 and the book will be released in 2024. We have more than 50% of the content ready and we would love to hear your feedback:

Cover of DuckDB in Action by Needham, Hunger, Simons

Filed in English posts | Comments (2) »

05-Oct-23

Integrate the AuthManager of Neo4j’s Java Driver with Spring Boot

The following post is more or less a dump of code. Since version 5.8 the official Neo4j drivers support expiration of authentication tokens (see Introduce AuthToken rotation and session auth support). The PR states: “The feature might also be referred to as a refresh or re-auth. In practice, it allows replacing the current token with a new token during the driver’s lifetime. The main objective of this feature is to allow token rotation for the same identity. As such, it is not intended for a change of identity.”

It’s up to you whether you are going to change identities with it or not, but in theory you can. Personal opinion: It’s actually one of the main reasons I would integrate it into any backend application that is remotely doing anything related to multitenancy. Why? The impersonation feature of the driver that also exists does not work with credentials checking by default, so go figure: The one thing you want to have in a backend application (one driver instance transparently checking privileges for different tenants authenticated via a token, be it bearer or username/password) either is discouraged but works, or does not work.

Normally, I would suggest using an org.springframework.boot.autoconfigure.neo4j.ConfigBuilderCustomizer for changing anything related to the driver’s config, as it would spare me duplicating all the crap below (as described in my post Tailor-Made Neo4j Connectivity With Spring Boot 2.4+), but sadly, the org.neo4j.driver.AuthTokenManager is not configurable via the config. I have therefore opened a pull request over at Spring Boot to allow the detection of an AuthTokenManager bean, which hopefully will make it into Spring Boot and render the stuff below unnecessary (see Spring Boot PR #36650). For now, I suggest duplicating a couple of pieces from Spring Boot which turn environment properties into configuration, so that you can still completely rely on the standard properties. The relevant piece in the config is the driver method that, for now, is required to add an AuthTokenManager to the driver. You are completely free to create one using the factory methods the driver provides or to create a custom implementation of the interface. Some ideas are mentioned in the inline comments.

import java.io.File;
import java.net.URI;
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.Locale;
import java.util.concurrent.TimeUnit;
 
import org.neo4j.driver.AuthTokenManagers;
import org.neo4j.driver.Config;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.springframework.boot.autoconfigure.neo4j.Neo4jConnectionDetails;
import org.springframework.boot.autoconfigure.neo4j.Neo4jProperties;
import org.springframework.boot.context.properties.source.InvalidConfigurationPropertyValueException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
 
@Configuration(proxyBeanMethods = false)
public class Neo4jCustomAuthConfig {
 
   /**
    *
    * @param connectionDetails I'm using the Spring Boot 3.1 abstraction over all service connection details here, 
    *                          so that the cool new container integration I described last week in
    *                          <a href="https://info.michael-simons.eu/2023/07/27/the-best-way-to-use-testcontainers-from-your-spring-boot-tests/">The best way to use Testcontainers from your Spring Boot Tests</a> still applies
    *                          If you are not on Spring Boot 3.1, this class is not available. Remove that argument and
    *                          just use {@link Neo4jProperties#getUri()} then.
    * @param neo4jProperties   Injected so that pool and other connection settings are still configured from the default properties / environment
    * @return The driver to be used in the application and further down the stack in Spring Data Neo4j
    */
   @Bean
   Driver driver(Neo4jConnectionDetails connectionDetails, Neo4jProperties neo4jProperties) {
 
      // Right now, the factory for AuthTokenManagers only supports expiration based tokens.
      // This is mostly useful for anything token related. You could hook this into Spring Security
      // for example and pass on any JWT token.
      // Another option is using the username and password like here and grab an additional expiration date, i.e. from
      // the config or the environment. When the expiration date is reached, the supplier passed to the factory
      // method will be asked for a new token. This can be a new token or a new username and password configuration.
      // Take note that there is no way to actively trigger an expiration.
      // This would require changes in the Neo4j-Java-Driver:
      // <a href="https://github.com/neo4j/neo4j-java-driver/issues/new">Open an issue</a>.
      var authManager = AuthTokenManagers.expirationBased(
         // Here I'm just using the token from the connection. This must be ofc something else for anything that should make sense
         () -> connectionDetails.getAuthToken()
            .expiringAt(ZonedDateTime.now().plusHours(1).toInstant().toEpochMilli())
      );
 
      // You can totally run your own AuthManager, too
      /*
      authManager = new AuthTokenManager() {
         @Override
         public CompletionStage<AuthToken> getToken() {
            return CompletableFuture.completedFuture(connectionDetails.getAuthToken());
         }
 
         @Override
         public void onExpired(AuthToken authToken) {
            // React accordingly
         }
      };
      */
 
      var uri = connectionDetails.getUri(); // or for older boot versions neo4jProperties.getUri()
      var config = doAllTheStuffSpringBootCouldDoIfAuthManagerWasConfigurableViaConfig(uri, neo4jProperties);
 
      return GraphDatabase.driver(uri, authManager, config);
   }
 
   // Everything below is a verbatim copy from spring boot for the most relevant pieces
   // that can be configured via properties.
   // As of now, pick what you need or add what's missing.
 
   Config doAllTheStuffSpringBootCouldDoIfAuthManagerWasConfigurableViaConfig(URI uri, Neo4jProperties neo4jProperties) {
 
      var builder = Config.builder();
 
      var scheme = uri.getScheme().toLowerCase(Locale.ROOT);
      if (scheme.equals("bolt") || scheme.equals("neo4j")) {
         var securityProperties = neo4jProperties.getSecurity();
         if (securityProperties.isEncrypted()) {
            builder.withEncryption();
         } else {
            builder.withoutEncryption();
         }
         builder.withTrustStrategy(mapTrustStrategy(securityProperties));
      }
      builder.withConnectionTimeout(neo4jProperties.getConnectionTimeout().toMillis(), TimeUnit.MILLISECONDS);
      builder.withMaxTransactionRetryTime(neo4jProperties.getMaxTransactionRetryTime().toMillis(), TimeUnit.MILLISECONDS);
 
      var pool = neo4jProperties.getPool();
 
      if (pool.isLogLeakedSessions()) {
         builder.withLeakedSessionsLogging();
      }
      builder.withMaxConnectionPoolSize(pool.getMaxConnectionPoolSize());
      Duration idleTimeBeforeConnectionTest = pool.getIdleTimeBeforeConnectionTest();
      if (idleTimeBeforeConnectionTest != null) {
         builder.withConnectionLivenessCheckTimeout(idleTimeBeforeConnectionTest.toMillis(), TimeUnit.MILLISECONDS);
      }
      builder.withMaxConnectionLifetime(pool.getMaxConnectionLifetime().toMillis(), TimeUnit.MILLISECONDS);
      builder.withConnectionAcquisitionTimeout(pool.getConnectionAcquisitionTimeout().toMillis(),
         TimeUnit.MILLISECONDS);
      if (pool.isMetricsEnabled()) {
         builder.withDriverMetrics();
      } else {
         builder.withoutDriverMetrics();
      }
 
      return builder.build();
   }
 
   private Config.TrustStrategy mapTrustStrategy(Neo4jProperties.Security securityProperties) {
 
      String propertyName = "spring.neo4j.security.trust-strategy";
      Neo4jProperties.Security.TrustStrategy strategy = securityProperties.getTrustStrategy();
      Config.TrustStrategy trustStrategy = createTrustStrategy(securityProperties, propertyName, strategy);
      if (securityProperties.isHostnameVerificationEnabled()) {
         trustStrategy.withHostnameVerification();
      } else {
         trustStrategy.withoutHostnameVerification();
      }
      return trustStrategy;
   }
 
   private Config.TrustStrategy createTrustStrategy(Neo4jProperties.Security securityProperties, String propertyName,
      Neo4jProperties.Security.TrustStrategy strategy) {
      switch (strategy) {
         case TRUST_ALL_CERTIFICATES:
            return Config.TrustStrategy.trustAllCertificates();
         case TRUST_SYSTEM_CA_SIGNED_CERTIFICATES:
            return Config.TrustStrategy.trustSystemCertificates();
         case TRUST_CUSTOM_CA_SIGNED_CERTIFICATES:
            File certFile = securityProperties.getCertFile();
            if (certFile == null || !certFile.isFile()) {
               throw new InvalidConfigurationPropertyValueException(propertyName, strategy.name(),
                  "Configured trust strategy requires a certificate file.");
            }
            return Config.TrustStrategy.trustCustomCertificateSignedBy(certFile);
         default:
            throw new InvalidConfigurationPropertyValueException(propertyName, strategy.name(),
               "Unknown strategy.");
      }
   }
}

Happy coding.

Title picture by Collin on Unsplash.

31-Jul-23

The best way to use Testcontainers from your Spring Boot tests!

After a long blog hiatus, I was in the mood to try out one of these “The best way to XYZ” posts for once.

While Spring Boot 3 and Spring Framework 6 releases have focused a lot on revamping the application context and annotation processing for GraalVM native image compatibility (and “boring” tasks like Java EE to Jakarta EE migrations), Spring Boot 3.1 and the corresponding framework edition come with a lot of cool changes.
While I was evaluating them for my team (and actually, for a new Spring Infographic coming out later this year), I especially dove into the new `@ServiceConnection` and the related infrastructure.

@ServiceConnection comes together with a hierarchy of interfaces, starting at ConnectionDetails. You might wonder what the fuss about that marker interface is, especially when you come from a relatively standardised JDBC abstraction: It makes it possible to abstract connections from a second angle, namely where the configuration values come from. Do they come from property sources or from something else, in this case from information that Testcontainers provides? ConnectionDetails is just the entry point to JdbcConnectionDetails or other more specific ones, such as Neo4jConnectionDetails. Below those, concrete classes exist to connect to services.

The nice thing is: You don’t have to deal with that a lot, because there are many existing implementations:

Cassandra
Couchbase
Elasticsearch
Generic JDBC or specialised JDBC such as MariaDB, MySQL, Oracle, PostgreSQL
Kafka
MongoDB
Neo4j
RabbitMQ
Redpanda

More about those features in Spring Boot 3.1’s ConnectionDetails abstraction and Improved Testcontainers Support in Spring Boot 3.1.

We will focus on the latter. What has been pestering me for a while: If you use Testcontainers JUnit 5 extensions to integrate containers with Spring Boot test, you end up in a scenario in which two systems try to manage resources over a lifetime, which is not ideal.
If you have solved that, you must learn about @DynamicPropertySource and friends, as nicely demonstrated by Maciej.

With some good combinations of @TestConfiguration, the aforementioned new stuff and some good chunk of explicitness, this is a solved issue now.

Take this application as an example (I smashed everything into one class so that it is visible as a whole, not as a best practice, but well, I don’t care if you do program in production like this, I have seen worse):

import java.util.List;
import java.util.UUID;
 
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.neo4j.core.schema.GeneratedValue;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.config.EnableNeo4jRepositories;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
 
@SpringBootApplication
@EnableNeo4jRepositories(considerNestedRepositories = true)
public class MyApplication {
 
	public static void main(String[] args) {
		SpringApplication.run(MyApplication.class, args);
	}
 
	@Node
	public record Movie(@Id @GeneratedValue(GeneratedValue.UUIDGenerator.class) String id, String title) {
 
		Movie(String title) {
			this(UUID.randomUUID().toString(), title);
		}
	}
 
	interface MovieRepository extends Neo4jRepository<Movie, String> {
	}
 
	@RestController
	static class MovieController {
 
		private final MovieRepository movieRepository;
 
		public MovieController(MovieRepository movieRepository) {
			this.movieRepository = movieRepository;
		}
 
		@GetMapping("/movies")
		public List<Movie> getMovies() {
			return movieRepository.findAll();
		}
	}
}

You will want the following dependencies in your test scope:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-test</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-testcontainers</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.testcontainers</groupId>
	<artifactId>neo4j</artifactId>
	<scope>test</scope>
</dependency>

We use @TestConfiguration to provide an additional test configuration. The advantage of @TestConfiguration over @Configuration is twofold: Unlike regular @Configuration classes it does not prevent auto-detection of @SpringBootConfiguration. And: It must be imported explicitly unless it is an inner static class of a test class. The code below then has one @Bean method annotated with @ServiceConnection. The method returns a Neo4jContainer Testcontainer. That container is marked as reusable. As we don’t close that resource by default, we let Testcontainers take care of cleaning it up. When marked as reusable, it will be kept alive and around, meaning a second test run will be much faster (see my video about that topic; if you don’t like it, there’s cycling and food in it as well, and fwiw, I also put this into written words over at Foojay.io). The container also carries a special label, which is irrelevant for this config, but will be used later. We will also address the @RestartScope annotation later.

This definition provides enough information for the context so that Spring can bring up that container, rewire all the connections for Neo4j to it and everything just works.

import java.util.Map;
 
import org.springframework.boot.devtools.restart.RestartScope;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.Neo4jContainer;
 
@TestConfiguration(proxyBeanMethods = false)
public class ContainerConfig {
 
	@Bean
	@ServiceConnection
	@RestartScope
	public Neo4jContainer<?> neo4jContainer() {
		return new Neo4jContainer<>("neo4j:5")
			.withLabels(Map.of("com.testcontainers.desktop.service", "neo4j"))
			.withReuse(true);
	}
}

Putting this into action might look like this. Note how the config is imported and that there is no messing with properties.

import static org.assertj.core.api.Assertions.assertThat;
 
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
 
@SpringBootTest
@Import(ContainerConfig.class)
class MyApplicationTests {
 
	@Test
	void repositoryIsConnectedAndUsable(
		@Autowired MyApplication.MovieRepository movieRepository
	) {
		var movie = movieRepository.save(new MyApplication.Movie("Barbieheimer"));
		assertThat(movie.id()).isNotNull();
	}
}

To be fair, there are a bunch of other ways – also in the official docs, but I like this by far the best. It’s clear and concise, by being pretty explicit and sticking to one set of annotations (from Spring).

Now about @RestartScope. That annotation is there for a reason: You might have Spring’s devtools on the class path, which will restart the context if necessary. When the context restarts, the container will restart too, defeating the reusable flag. The annotation keeps the original bean around. Why is this relevant? The new Testcontainers support works really well with the concept of “developer services” as introduced by Quarkus. Originally, we only wanted to do test driven development, but then something happened, things got rushed and in the end, explorative work is fun: So bringing up your application together with a running instance of a database or service feels a lot like “batteries included” and can make you very productive.

Spring Boot supports this now, too, but keeps it (by default) very explicit and also restricted to testing scope. org.springframework.boot.SpringApplication has a new from method; the object it returns offers a with method that augments an automatically configured application with additional config. We can use the above ContainerConfig in an additional main class living in our test scope like this:

import org.springframework.boot.SpringApplication;
 
public class MyApplicationWithDevServices {
 
	public static void main(String[] args) {
		SpringApplication.from(MyApplication::main)
			.with(ContainerConfig.class)
			.run(args);
	}
}

Starting this app, a request to http://localhost:8080/movies immediately works, connected against a Neo4j instance running in a container.

Now the best for last, what about that ominous label I added to the container? I am a happy Testcontainers Cloud user and I have their service running on my machine. This automatically redirects any Testcontainers container request to the cloud and I don’t need Docker on my machine.

There’s also the possibility to define fixed port mappings for containers running both in the cloud and locally, as described in Set fixed ports to easily debug development services.

I have the following configuration on my machine:

more /Users/msimons/.config/testcontainers/services/neo4j.toml 
 
# This example selects neo4j instances and forwards port 7687 to 7687 on the client.
# Same for the Neo4j HTTP port
# Instances are found by selecting containers with label "com.testcontainers.desktop.service=neo4j".
 
# ports defines which ports to proxy.
# local-port indicates which port to listen on the client machine. System ports (0 to 1023) are not supported.
# container-port indicates which port to proxy. If unset, container-port will default to local-port.
ports = [
  {local-port = 7687, container-port = 7687},
  {local-port = 7474, container-port = 7474}
]

This allows me to access the Neo4j instance started by MyApplicationWithDevServices above under the well known Neo4j ports, allowing things like this:

# Use Cypher-Shell to create some data
cypher-shell -uneo4j -ppassword "CREATE (:Movie {id: randomUuid(), title: 'Dune 2'})"
# 0 rows
# ready to start consuming query after 15 ms, results consumed after another 0 ms
# Added 1 nodes, Set 2 properties, Added 1 labels
 
# Request the data from the application running with dev services
http localhost:8080/movies                                                           
 
# HTTP/1.1 200 
# Connection: keep-alive
# Content-Type: application/json
# Date: Thu, 27 Jul 2023 13:32:58 GMT
# Keep-Alive: timeout=60
# Transfer-Encoding: chunked
#
# [
#    {
#        "id": "824ec97e-0a97-4516-8189-f0bf5eb215fe",
#        "title": "Dune 2"
#    }
# ]

And with this, happy coding.

Update: I’m happy that Sergei personally read my post and rightfully pointed out to me that having fixed ports is possible with both local and cloud Testcontainers. I edited the instructions accordingly. Thanks, buddy!

27-Jul-23

Spring Data and Spring Data Neo4j join the “No OFFSET Movement!”

Markus Winand blogged about tool support for keyset pagination nearly a decade ago, and of course my friend Lukas Eder picked up that topic and not only blogged about it several times but also implemented tool support with the synthetic seek clause of jOOQ. As the seek clause in jOOQ is excellent for relational databases, I’m gonna refrain from calling the following the “very best way” of doing keyset based pagination and leave that to others.

So what is this about? The Spring Data Commons project – that is the base project for a broad variety of store implementations such as JPA, MongoDB, Redis and certainly Neo4j – added infrastructure support for keyset-based scrolling.

What does that look like from a Spring Data repositories point of view?

import java.util.UUID;
 
import org.springframework.data.domain.ScrollPosition;
import org.springframework.data.domain.Sort;
import org.springframework.data.domain.Window;
import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity;
import org.springframework.data.neo4j.repository.Neo4jRepository;
 
public interface ScrollingRepository extends Neo4jRepository<ScrollingEntity, UUID> {
 
	Window<ScrollingEntity> findTop4By(Sort sort, ScrollPosition position);
}

Keyset based pagination drops the notion of offset completely. It depends on an associated sort order, in this case given through the first parameter, the Sort object. As with all pagination efforts, we need to know how many items per page shall be retrieved. In this case, 4. This is determined by the derived finder method (findTop4By). The scroll position (the second parameter) takes the place of the offset: It carries the keys of the last item seen, from which the next window starts.

The above will be possible from the next Spring Data releases for the MongoDB and Neo4j implementations. Some stores might offer additional support on their data access templates; Neo4j does not as of writing (we added the feature just days prior to the current RC).

The beauty of the above is: For you as a user, the calling of this just works the same for all the stores. Imagine the following simple entity:

@Node
public class ScrollingEntity {
 
	@Id
	@GeneratedValue
	private UUID id;
 
	@Property("foobar")
	private String a;
 
	private Integer b;
 
	private LocalDateTime c;
}

And some test data (here, Neo4j is being used):

Connected to Neo4j using Bolt protocol version 5.0 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> match (n:ScrollingEntity) return n order by n.b asc, n.a desc;
+-----------------------------------------------------------------------------------------------------------------+
| n                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------+
| (:ScrollingEntity {b: 0, foobar: "A0", c: 2023-03-20T13:12:25.201, id: "c2c2ebe4-5a02-4d77-a53b-1abbc80aaad9"}) |
| (:ScrollingEntity {b: 1, foobar: "B0", c: 2023-03-21T13:12:29.201, id: "f4f84ed4-632d-431e-bb1a-b829bc2eaf5d"}) |
| (:ScrollingEntity {b: 2, foobar: "C0", c: 2023-03-22T13:12:39.201, id: "f1c088f8-0b7b-456b-99b3-db5a0199dec6"}) |
| (:ScrollingEntity {b: 3, foobar: "D0", c: 2023-03-23T13:12:31.201, id: "3b223485-e81b-4be8-8dbd-50277d313a8b"}) |
| (:ScrollingEntity {b: 3, foobar: "D0", c: 2023-03-23T13:12:31.201, id: "1f525d3d-cdfe-40a6-964b-1fbfc08fae99"}) |
| (:ScrollingEntity {b: 4, foobar: "E0", c: 2023-03-24T13:12:41.201, id: "572b780e-256f-41b7-87de-4a130bc3814b"}) |
| (:ScrollingEntity {b: 5, foobar: "F0", c: 2023-03-25T13:12:25.201, id: "457ec454-a9af-421c-a9c1-7f5ce95310c5"}) |
| (:ScrollingEntity {b: 6, foobar: "G0", c: 2023-03-26T13:12:55.201, id: "b423c34b-6952-4b73-b06b-d039cf7c7e7b"}) |
| (:ScrollingEntity {b: 7, foobar: "H0", c: 2023-03-27T13:13:00.201, id: "ca90cd25-a676-44d4-a4c2-2db32443bf2f"}) |
| (:ScrollingEntity {b: 8, foobar: "I0", c: 2023-03-28T13:12:57.201, id: "59a5dfb2-0e17-4eeb-aecd-95bb555e0117"}) |
+-----------------------------------------------------------------------------------------------------------------+
 
10 rows
ready to start consuming query after 55 ms, results consumed after another 3 ms
neo4j@neo4j>

All of that can be consumed in the simplest way like this:

import static org.assertj.core.api.Assertions.assertThat;
 
import java.util.ArrayList;
 
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.KeysetScrollPosition;
import org.springframework.data.domain.Sort;
import org.springframework.data.domain.WindowIterator;
import org.springframework.data.neo4j.integration.imperative.repositories.ScrollingRepository;
import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity;
 
class KeysetBasedScrollingIT {
 
	@Test
	void forwardWithDuplicatesIteratorIteration(@Autowired ScrollingRepository repository) {
 
		var sort = Sort.by(Sort.Order.asc("b"), Sort.Order.desc("a"));
		var it = WindowIterator
				.of(pos -> repository.findTop4By(sort, pos))
				.startingAt(KeysetScrollPosition.initial());
 
		var content = new ArrayList<ScrollingEntity>();
		while (it.hasNext()) {
			var next = it.next();
			content.add(next);
		}
 
		assertThat(content).hasSize(10);
		assertThat(content.stream().map(ScrollingEntity::getId)
				.distinct().toList()).hasSize(10);
	}
}

Here, the content is sorted by b in ascending and by a in descending order, 4 pieces at a time. The WindowIterator starts at the initial position and takes a window-providing function. That window will scroll over the dataset. What queries are generated?

The initial query looks like this, retrieving n+1 elements sorted in the order specified:

MATCH (scrollingEntity:`ScrollingEntity`)
RETURN scrollingEntity 
ORDER BY scrollingEntity.b, scrollingEntity.foobar DESC, scrollingEntity.id LIMIT 5

Why n+1 elements? It’s an easy way to judge if there are more elements available to scroll further or not, without going through an additional counting query.
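
The n+1 idea can be sketched in plain Java, independent of any store. This is only an illustration under assumptions: LimitPlusOneSketch, Page and firstPage are hypothetical names made up for this example and are not part of Spring Data; Page is a simplified stand-in for what a Window tells you.

```java
import java.util.List;

final class LimitPlusOneSketch {

	// Hypothetical, simplified stand-in for a window: the content plus a "more available" flag
	record Page<T>(List<T> content, boolean hasNext) {
	}

	// Reads pageSize + 1 items from an already sorted source. The extra item is never
	// returned, it only tells us whether scrolling further makes sense.
	static <T> Page<T> firstPage(List<T> sortedSource, int pageSize) {
		var probe = sortedSource.stream().limit(pageSize + 1L).toList();
		var hasNext = probe.size() > pageSize;
		var content = hasNext ? probe.subList(0, pageSize) : probe;
		return new Page<>(List.copyOf(content), hasNext);
	}

	public static void main(String[] args) {
		// Six items, page size four: the fifth item is fetched but only used as a signal
		System.out.println(firstPage(List.of(1, 2, 3, 4, 5, 6), 4));
		// Exactly four items: nothing beyond the page, so no further scrolling
		System.out.println(firstPage(List.of(1, 2, 3, 4), 4));
	}
}
```

The point of the design is that one slightly larger query replaces a separate counting query; the price is one item fetched that is thrown away.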

The next query looks like this:

MATCH (scrollingEntity:`ScrollingEntity`)
WHERE ((scrollingEntity.b > $pcdsl01
    OR (scrollingEntity.b = $pcdsl01
      AND scrollingEntity.foobar < $pcdsl02))
  OR (scrollingEntity.b = $pcdsl01
    AND scrollingEntity.foobar = $pcdsl02
    AND scrollingEntity.id > $pcdsl03))
RETURN scrollingEntity 
ORDER BY scrollingEntity.b ASC, scrollingEntity.foobar DESC, scrollingEntity.id ASC LIMIT 5

This has now 3 parameters:

:param pcdsl01 => 3
:param pcdsl02 => "D0"
:param pcdsl03 => "282ac053-c821-47f4-9a0e-0d12e0b91808"

The next page starts at an element whose b attribute is greater than 3, or is equal to 3 and whose foobar attribute is lower than D0. If you look closely at the test data, the 4th and 5th elements pose a nice test case for our tooling and hopefully make it clear why we add the 3rd condition here: To add uniqueness to the keyset on which we paginate.
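
To make the role of the third condition tangible, here is a minimal in-memory sketch of the same predicate in plain Java. It is not how Spring Data Neo4j does it (that happens in Cypher, as shown above); KeysetPredicateSketch, Row, isAfter and nextWindow are hypothetical names, and the short ids are made up for readability.

```java
import java.util.Comparator;
import java.util.List;

final class KeysetPredicateSketch {

	// Simplified stand-in for the entity; foobar is the stored name of attribute a
	record Row(int b, String foobar, String id) {
	}

	// Same order as the generated ORDER BY: b ASC, foobar DESC, id ASC
	static final Comparator<Row> ORDER = Comparator.comparingInt(Row::b)
			.thenComparing(Row::foobar, Comparator.reverseOrder())
			.thenComparing(Row::id);

	// Made-up sample data with two rows that are equal in b and foobar
	static final List<Row> SAMPLE = List.of(
			new Row(0, "A0", "a"), new Row(1, "B0", "b"), new Row(2, "C0", "c"),
			new Row(3, "D0", "d-1"), new Row(3, "D0", "d-2"),
			new Row(4, "E0", "e"), new Row(5, "F0", "f"), new Row(6, "G0", "g"));

	// Mirrors the three OR-ed branches of the WHERE clause
	static boolean isAfter(Row row, Row last) {
		return row.b() > last.b()
				|| (row.b() == last.b() && row.foobar().compareTo(last.foobar()) < 0)
				|| (row.b() == last.b() && row.foobar().equals(last.foobar())
						&& row.id().compareTo(last.id()) > 0);
	}

	static List<Row> nextWindow(List<Row> all, Row last, int size) {
		return all.stream().filter(row -> isAfter(row, last)).sorted(ORDER).limit(size).toList();
	}

	public static void main(String[] args) {
		// The first window ended at the first of the two duplicates
		var last = new Row(3, "D0", "d-1");
		nextWindow(SAMPLE, last, 4).forEach(System.out::println);
	}
}
```

Without the branch on id, the row with id d-2 would match neither of the first two branches and would silently be skipped, which is exactly the duplicate case from the test data.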

The Spring Data Commons and Spring Data Neo4j implementation of keyset based pagination is quite sophisticated in the sense that it allows for different directions for the different columns to sort by, in contrast to a simple tuple comparison (in the above case something along the lines of WHERE [n.b, n.foobar, n.id] >= [$pcdsl01, $pcdsl02, $pcdsl03], which is also not a real tuple in Neo4j but a list comparison).

Above I presented using this feature with the new WindowIterator, but you can also control this manually like this:

import static org.assertj.core.api.Assertions.assertThat;
 
import java.util.function.Function;
 
import org.assertj.core.data.Index;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.KeysetScrollPosition;
import org.springframework.data.domain.Sort;
import org.springframework.data.neo4j.integration.imperative.repositories.ScrollingRepository;
import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity;
 
class KeysetBasedScrollingIT {
 
	@Test
	void forwardWithDuplicatesManualIteration(@Autowired ScrollingRepository repository) {
 
		var duplicates = repository.findAllByAOrderById("D0");
		assertThat(duplicates).hasSize(2);
 
		var sort = Sort.by(Sort.Order.asc("b"), Sort.Order.desc("a"));
		var window = repository.findTop4By(sort, KeysetScrollPosition.initial());
		assertThat(window.hasNext()).isTrue();
		assertThat(window)
				.hasSize(4)
				.extracting(Function.identity())
				.satisfies(e -> assertThat(e.getId()).isEqualTo(duplicates.get(0).getId()), Index.atIndex(3))
				.extracting(ScrollingEntity::getA)
				.containsExactly("A0", "B0", "C0", "D0");
 
		window = repository.findTop4By(sort, window.positionAt(window.size() - 1));
		assertThat(window.hasNext()).isTrue();
		assertThat(window)
				.hasSize(4)
				.extracting(Function.identity())
				.satisfies(e -> assertThat(e.getId()).isEqualTo(duplicates.get(1).getId()), Index.atIndex(0))
				.extracting(ScrollingEntity::getA)
				.containsExactly("D0", "E0", "F0", "G0");
 
		window = repository.findTop4By(sort, window.positionAt(window.size() - 1));
		assertThat(window.isLast()).isTrue();
		assertThat(window).extracting(ScrollingEntity::getA)
				.containsExactly("H0", "I0");
	}
}

The key classes in that API are org.springframework.data.domain.Window and the org.springframework.data.domain.KeysetScrollPosition. The first one contains data and information at which position the scrolling window is positioned at any time, the latter is a value object for the current keys.

Last but not least, the API has full support for scrolling backwards as well. Scrolling backwards requires inverting the order for each column individually. At the end of the post I’ll add a link to how we did this in Spring Data Neo4j without going through the pain of fiddling around with strings. Just inverting the operators in the generated condition is however not enough when going backwards. By doing only that, the window would jump right back to the beginning. As we only know the keys at which the window arrived and not the keys n positions backward, we solve this issue by inverting the sort order as a whole too, matching and collecting, and then recreating the sort on the client side again (we have some ideas for a more sophisticated Cypher based solution, though).
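
Why inverting the operator alone is not enough can be shown with a few integers. The following is a sketch under simplifying assumptions: a single ascending sort key, a strict comparison, and hypothetical names (BackwardScrollSketch, naivePrevious, previous) that do not exist in Spring Data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class BackwardScrollSketch {

	// Only the operator is inverted: the result is the very first page, not the previous one
	static List<Integer> naivePrevious(List<Integer> all, int firstKeyOfWindow, int size) {
		return all.stream().filter(v -> v < firstKeyOfWindow).sorted().limit(size).toList();
	}

	// Operator and sort order are inverted, then the original order is restored on the client
	static List<Integer> previous(List<Integer> all, int firstKeyOfWindow, int size) {
		var window = new ArrayList<>(all.stream()
				.filter(v -> v < firstKeyOfWindow)
				.sorted(Comparator.reverseOrder())
				.limit(size)
				.toList());
		Collections.reverse(window);
		return List.copyOf(window);
	}

	public static void main(String[] args) {
		var all = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
		// The current window starts at key 9, we want the four items before it
		System.out.println(naivePrevious(all, 9, 4)); // [1, 2, 3, 4]
		System.out.println(previous(all, 9, 4)); // [5, 6, 7, 8]
	}
}
```

With several sort columns the same idea applies per column, which is why each direction has to be inverted individually before the result is reversed again.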

Implementing this feature for Spring Data Neo4j has been quite a nice experience. All our query generation goes through the Cypher-DSL which is in essence a builder for Cypher. Here we made use of iteratively building conditions:

var resultingCondition = Conditions.noCondition();
// This is the next equality pair if previous sort key was equal
var nextEquals = Conditions.noCondition();
// This is the condition for when all the sort orderedKeys are equal, and we must filter via id
var allEqualsWithArtificialSort = Conditions.noCondition();
 
for (Map.Entry<String, Object> entry : orderedKeys.entrySet()) {
 
	var k = entry.getKey();
	var v = entry.getValue();
	if (v == null || (v instanceof Value value && value.isNull())) {
		throw new IllegalStateException("Cannot resume from KeysetScrollPosition. Offending key: '%s' is 'null'".formatted(k));
	}
	var parameter = Cypher.anonParameter(v);
 
	Expression expression;
 
	var scrollDirection = scrollPosition.getDirection();
	if (Constants.NAME_OF_ADDITIONAL_SORT.equals(k)) {
		expression = entity.getIdExpression();
		var comparatorFunction = getComparatorFunction(scrollDirection == KeysetScrollPosition.Direction.Forward ?
				Sort.Direction.ASC : Sort.Direction.DESC, scrollDirection);
		allEqualsWithArtificialSort = allEqualsWithArtificialSort.and(comparatorFunction.apply(expression, parameter));
	} else {
		var p = propertyAndDirection.get(k);
		expression = p.property.isIdProperty() ? entity.getIdExpression() : root.property(k);
 
		var comparatorFunction = getComparatorFunction(p.order.getDirection(), scrollDirection);
		resultingCondition = resultingCondition.or(nextEquals.and(comparatorFunction.apply(expression, parameter)));
		nextEquals = expression.eq(parameter);
		allEqualsWithArtificialSort = allEqualsWithArtificialSort.and(nextEquals);
	}
}
resultingCondition = resultingCondition.or(allEqualsWithArtificialSort);

Also, getting the comparator right is type safe:

static BiFunction<Expression, Expression, Condition> getComparatorFunction(
	Sort.Direction sortDirection, KeysetScrollPosition.Direction scrollDirection
) {
	if (scrollDirection == KeysetScrollPosition.Direction.Backward) {
		return sortDirection.isAscending() ? Expression::lte : Expression::gte;
	}
	return sortDirection.isAscending() ? Expression::gt : Expression::lt;
}
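
As a rough illustration of what comes out of this: for a forward scroll over a single ascending property plus the additional id sort, the loop should end up with a condition of the shape name > $lastName OR (name = $lastName AND id > $lastId), assuming the empty condition is simply dropped when combined. The following plain Java sketch evaluates that same predicate on in-memory data. The Person class and the method names are hypothetical and only exist for this example; the real condition is built with the Cypher-DSL and evaluated by the database:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration in plain Java, not the Cypher-DSL based implementation.
final class KeysetConditionSketch {

	// Minimal stand-in for an entity with one sortable property and an id.
	static final class Person {
		final String name;
		final long id;

		Person(String name, long id) {
			this.name = name;
			this.id = id;
		}
	}

	// name > lastName OR (name = lastName AND id > lastId)
	static boolean comesAfter(Person candidate, String lastName, long lastId) {
		int byName = candidate.name.compareTo(lastName);
		return byName > 0 || (byName == 0 && candidate.id > lastId);
	}

	// Returns the ids of the next window, assuming `sorted` is ordered by name, then id.
	static List<Long> nextIds(List<Person> sorted, String lastName, long lastId, int limit) {
		List<Long> ids = new ArrayList<>();
		for (Person p : sorted) {
			if (ids.size() < limit && comesAfter(p, lastName, lastId)) {
				ids.add(p.id);
			}
		}
		return ids;
	}
}
```

The id comparison only kicks in when the name is equal to the last seen one, which is what keeps the window stable when several entries share the same sort value.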

Of course, keyset-based pagination is not a silver bullet and can also be outright slow if done wrong. For example, you should make sure you have proper indexes on all columns you want to paginate over (which is true for Neo4j and relational databases, and I guess also for MongoDB). For Neo4j I shamelessly recommend using Neo4j Migrations to have proper control over your Neo4j indexes and constraints, across all Neo4j versions from 3.5 to the latest release in Neo4j Aura. Also, while you can totally go wild in your sort for the pagination and have dozens of columns, the conditions generated will grow exponentially. If this is what you want, be our guest but don’t complain 🙂

If you use this feature in a sane way, it will be faster than just doing offset-based pagination in Neo4j, and I hope you find it as useful as I enjoyed creating it together with Mark Paluch and Christoph Strobl.

Filed in English posts, Java

20-Mar-23