Why would a Neo4j person be so fond of an embedded, relational database?

I have been working for Neo4j since 2018. At Neo4j I maintain both Spring Data Neo4j and Neo4j-OGM, which are object mappers and entity managers for our database product. This is a great job in a great company with awesome colleagues such as my friends Gerrit and Michael.

Some other projects I created on the job are the Cypher-DSL, a builder for our query language Cypher, which is used extensively inside Spring Data Neo4j, in products of our partner Graphaware and by a whole bunch of other customers. Headed for its 100th star is Neo4j-Migrations, a database migration toolkit. Last but not least, I created the original Neo4j Testcontainers module and was recognized as a Testcontainers Champion for that.

However, I was “socialized” in a previous lifetime in an SQL shop (actually, an Oracle shop), and while I swore more than 20 years ago during my studies that I would never do anything with SQL, there I was. For whatever reason, my head works pretty well with the relational model and the questions I can answer with it. I spent about 15 years in that company doing all kinds of things, such as geospatial applications, applications with energy forecasts based on past measurements and the like. What all of that had in common was SQL.

Just a couple of months prior to joining Neo4j, I gave a talk titled Live with your SQL-fetish and choose the right tool for the job, in which I presented jOOQ, way back before it became the hot topic. Anyhow, a lot of the dataset from that talk eventually made it into Neo4j-specific talks too… taking my musical habits into a knowledge graph.

What I am trying to say here is: I am deeply involved in the graph ecosystem, we have a great product and tools, and I’m glad to have a part in those. But I also think that other query languages have their benefits and that other projects and companies are doing great work, too.

So, DuckDB, what is it? DuckDB is an “in-process SQL OLAP database management system” and you can go ahead and try it out in your browser, because it can be compiled to WASM.

DuckDB’s query engine is vector-based. Every bit of data that flows through it is a collection of vectors on which algorithms can often be applied in parallel; see Execution Format. That’s the groundwork of DuckDB’s fast, analytical queries. From Why DuckDB: “Online analytical processing (OLAP) workloads are characterized by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables.” In the vectorized query execution engine, queries are interpreted and processed in large batches of values in one operation. DuckDB can also query foreign stores, such as Postgres and SQLite, and there are scenarios in which the engine is actually faster while doing this than native Postgres.

When going through their SQL documentation you will be surprised how much you get essentially for free in one small executable. It’s all there: CTEs, window functions, ASOF joins, PIVOT and many other things that make SQL friendlier. Back then we had to run big Oracle installations for similar things.
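To give an idea of how little ceremony this needs, here is a minimal, hedged sketch using DuckDB’s JDBC driver (the org.duckdb:duckdb_jdbc artifact): an in-memory database and a window function computing a running total. The table and its contents are made up for illustration:

import java.sql.DriverManager;
 
public class WindowFunctionDemo {
 
	public static void main(String[] args) throws Exception {
		// "jdbc:duckdb:" opens an in-memory database, no server required
		try (var con = DriverManager.getConnection("jdbc:duckdb:");
			var stmt = con.createStatement()) {
			stmt.execute("CREATE TABLE rides (ridden_on DATE, distance DOUBLE)");
			stmt.execute("INSERT INTO rides VALUES (DATE '2023-10-01', 42.0), (DATE '2023-10-02', 23.5), (DATE '2023-10-03', 61.2)");
			// A window function computing a running total over the ride dates
			try (var result = stmt.executeQuery("""
					SELECT ridden_on, distance,
					       sum(distance) OVER (ORDER BY ridden_on) AS running_total
					FROM rides ORDER BY ridden_on""")) {
				while (result.next()) {
					System.out.printf("%s %6.1f %6.1f%n", result.getDate(1), result.getDouble(2), result.getDouble(3));
				}
			}
		}
	}
}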

If you follow me on social media you might have noticed that my private focus has shifted over the last years; I have been doing a lot of sport and training, and fewer side projects that involve any big setups. This is where a small tool that runs without a server installation comes in super handy: I was able to define a neat schema for my photovoltaic systems with a bunch of views and now have a good-enough dashboard.

My biking page has been remodeled into a static page these days, but it uses a DuckDB database underneath; see biking.michael-simons.eu.

At the moment, I neither need nor want more. And as sad as it might be, I don’t have a graph problem in either of those applications. I want to aggregate measurements, do analytics and that’s it. It’s a lot of time-series data that I’m working with, and graph doesn’t really help me there.

Those are use cases that are unrelated to work.

But I often have use for DuckDB at work, too. Neo4j can natively ingest CSV files. You need to write some Cypher to massage them into the graph you want, but still: that CSV must be properly formatted.
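For illustration, a hedged sketch of what such an ingestion could look like via the Java driver; the file name, the CSV headers and the graph model are made up, only LOAD CSV itself and the driver API are the real thing:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.GraphDatabase;
 
public class CsvIngestion {
 
	public static void main(String[] args) {
		try (var driver = GraphDatabase.driver("neo4j://localhost:7687", AuthTokens.basic("neo4j", "password"));
			var session = driver.session()) {
			// Massage each row of the (hypothetical) CSV into nodes and relationships
			session.run("""
					LOAD CSV WITH HEADERS FROM 'file:///albums.csv' AS row
					MERGE (artist:Artist {name: row.artist})
					MERGE (album:Album {title: row.album})
					MERGE (artist)-[:RELEASED]->(album)""");
		}
	}
}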

DuckDB on the other hand can read from CSV, Parquet, JSON and other formats as if they were tables. These sources can be files or URLs. It does not require you to actually create a schema and persist data in its own store; it can just query things. Querying data without persisting it might be unusual for a database and seems counter-intuitive at first glance, but it is useful in the right situations. DuckDB does not need to persist the content, but you are free to create views over it. You can happily join a CSV file with a JSON file and literally copy the result to a new CSV file that is well suited for Neo4j.
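A minimal sketch of that last idea, again via JDBC and with hypothetical file and column names; DuckDB treats both files as tables, and COPY writes the join result straight to a new CSV:

import java.sql.DriverManager;
 
public class PrepareImport {
 
	public static void main(String[] args) throws Exception {
		try (var con = DriverManager.getConnection("jdbc:duckdb:");
			var stmt = con.createStatement()) {
			// Both sources are queried in place, nothing is persisted in DuckDB itself
			stmt.execute("""
					COPY (
						SELECT p.name, p.country, m.value
						FROM read_csv_auto('people.csv') p
						JOIN read_json_auto('measurements.json') m ON m.person = p.name
					) TO 'for_neo4j.csv' (HEADER, DELIMITER ',')""");
		}
	}
}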

This can all be done either in a CLI or actually as part of a pipeline in a script.

DuckDB replaced a bunch of tools in my daily usage, such as xsv and, to some extent, jq. DuckDB’s JSON processing capabilities allow you to query and normalize many complex and denormalized JSON documents, but in some cases it’s not enough… It’s just that you can do so many “interesting” things in JSON 😉

Anyhow, a long read, but maybe an explanation why you have seen so many toots by me that dealt with DuckDB, or even my profile at DuckDB snippets.

I think it’s valuable to know other technologies and to master more than one (query) language. I value both my Cypher knowledge and everything I learned about SQL in the past. Hopefully, I can teach a bit about both, with my current work and any future publication.

And last but not least: Mark Needham, Michael Hunger and I have worked together to bring out “DuckDB in Action”. The Manning Early Access Program (MEAP) started in October 2023 and the book will be released in 2024. We have more than 50% of the content ready and we would love to hear your feedback:

| Comments (2) »

05-Oct-23


Integrate the AuthManager of Neo4j’s Java Driver with Spring Boot

The following post is more or less a dump of code. Since version 5.8 the official Neo4j Java driver supports expiration of authentication tokens (see Introduce AuthToken rotation and session auth support). The PR states: “The feature might also be referred to as a refresh or re-auth. In practice, it allows replacing the current token with a new token during the driver’s lifetime. The main objective of this feature is to allow token rotation for the same identity. As such, it is not intended for a change of identity.”

It’s up to you whether you are gonna change identities with it or not, but in theory you can. Personal opinion: it’s actually one of the main reasons I would integrate it into any backend application that is remotely doing anything multi-tenant. Why? The driver’s impersonation feature, which also exists, does not work with credential checking by default. So go figure: the one thing you want in a backend application (one driver instance transparently checking privileges for different tenants authenticated via a token, be it bearer or username/password) either is discouraged but works, or does not work.

Normally, I would suggest using a org.springframework.boot.autoconfigure.neo4j.ConfigBuilderCustomizer for changing anything related to the driver’s config, as it would spare me duplicating all the crap below (as described in my post Tailor-Made Neo4j Connectivity With Spring Boot 2.4+), but sadly, the org.neo4j.driver.AuthTokenManager is not configurable via the config. I have therefore opened a pull request over at Spring Boot to allow the detection of an AuthTokenManager bean, which hopefully will make it into Spring Boot, rendering the stuff below unnecessary (see Spring Boot PR #36650). For now, I suggest duplicating a couple of pieces from Spring Boot that turn environment properties into configuration, so that you can still completely rely on the standard properties. The relevant piece in the config is the driver method which – for now – is required to add an AuthTokenManager to the driver. You are completely free to create one using the factory methods the driver provides or to create a custom implementation of the interface. Some ideas are mentioned in the inline comments.

import java.io.File;
import java.net.URI;
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.Locale;
import java.util.concurrent.TimeUnit;
 
import org.neo4j.driver.AuthTokenManagers;
import org.neo4j.driver.Config;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.springframework.boot.autoconfigure.neo4j.Neo4jConnectionDetails;
import org.springframework.boot.autoconfigure.neo4j.Neo4jProperties;
import org.springframework.boot.context.properties.source.InvalidConfigurationPropertyValueException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
 
@Configuration(proxyBeanMethods = false)
public class Neo4jCustomAuthConfig {
 
   /**
    *
    * @param connectionDetails I'm using the Spring Boot 3.1 abstraction over all service connection details here, 
    *                          so that the cool new container integration I described last week in
    *                          <a href="https://info.michael-simons.eu/2023/07/27/the-best-way-to-use-testcontainers-from-your-spring-boot-tests/">The best way to use Testcontainers from your Spring Boot Tests</a> still applies
    *                          If you are not on Spring Boot 3.1, this class is not available. Remove that argument and
    *                          just use {@link Neo4jProperties#getUri()} then.
   * @param neo4jProperties   Injected so that pool and other connection settings are still configured from the default properties / environment
    * @return The driver to be used in the application and further down the stack in Spring Data Neo4j
    */
   @Bean
   Driver driver(Neo4jConnectionDetails connectionDetails, Neo4jProperties neo4jProperties) {
 
      // Right now, the factory for AuthTokenManagers only supports expiration based tokens.
      // This is mostly useful for anything token related. You could hook this into Spring Security
      // for example and pass on any JWT token.
      // Another option is using the username and password like here and grab an additional expiration date, i.e. from
      // the config or the environment. When the expiration date is reached, the supplier passed to the factory
      // method will be asked for a new token. This can be a new token or a new username and password configuration.
      // Take note that there is no way to actively trigger an expiration.
      // This would require changes in the Neo4j-Java-Driver:
      // <a href="https://github.com/neo4j/neo4j-java-driver/issues/new">Open an issue</a>.
      var authManager = AuthTokenManagers.expirationBased(
      // Here I'm just using the token from the connection. This must of course be something else for anything that should make sense
         () -> connectionDetails.getAuthToken()
            .expiringAt(ZonedDateTime.now().plusHours(1).toInstant().toEpochMilli())
      );
 
      // You can totally run your own AuthManager, too
      /*
      authManager = new AuthTokenManager() {
         @Override
         public CompletionStage<AuthToken> getToken() {
            return CompletableFuture.completedFuture(connectionDetails.getAuthToken());
         }
 
         @Override
         public void onExpired(AuthToken authToken) {
            // React accordingly
         }
      };
      */
 
      var uri = connectionDetails.getUri(); // or for older boot versions neo4jProperties.getUri()
      var config = doAllTheStuffSpringBootCouldDoIfAuthManagerWasConfigurableViaConfig(uri, neo4jProperties);
 
      return GraphDatabase.driver(uri, authManager, config);
   }
 
   // Everything below is a verbatim copy of the most relevant pieces from Spring Boot
   // that can be configured via properties.
   // As of now, pick what you need or add what's missing.
 
   Config doAllTheStuffSpringBootCouldDoIfAuthManagerWasConfigurableViaConfig(URI uri, Neo4jProperties neo4jProperties) {
 
      var builder = Config.builder();
 
      var scheme = uri.getScheme().toLowerCase(Locale.ROOT);
      if (scheme.equals("bolt") || scheme.equals("neo4j")) {
         var securityProperties = neo4jProperties.getSecurity();
         if (securityProperties.isEncrypted()) {
            builder.withEncryption();
         } else {
            builder.withoutEncryption();
         }
         builder.withTrustStrategy(mapTrustStrategy(securityProperties));
      }
      builder.withConnectionTimeout(neo4jProperties.getConnectionTimeout().toMillis(), TimeUnit.MILLISECONDS);
      builder.withMaxTransactionRetryTime(neo4jProperties.getMaxTransactionRetryTime().toMillis(), TimeUnit.MILLISECONDS);
 
      var pool = neo4jProperties.getPool();
 
      if (pool.isLogLeakedSessions()) {
         builder.withLeakedSessionsLogging();
      }
      builder.withMaxConnectionPoolSize(pool.getMaxConnectionPoolSize());
      Duration idleTimeBeforeConnectionTest = pool.getIdleTimeBeforeConnectionTest();
      if (idleTimeBeforeConnectionTest != null) {
         builder.withConnectionLivenessCheckTimeout(idleTimeBeforeConnectionTest.toMillis(), TimeUnit.MILLISECONDS);
      }
      builder.withMaxConnectionLifetime(pool.getMaxConnectionLifetime().toMillis(), TimeUnit.MILLISECONDS);
      builder.withConnectionAcquisitionTimeout(pool.getConnectionAcquisitionTimeout().toMillis(),
         TimeUnit.MILLISECONDS);
      if (pool.isMetricsEnabled()) {
         builder.withDriverMetrics();
      } else {
         builder.withoutDriverMetrics();
      }
 
      return builder.build();
   }
 
   private Config.TrustStrategy mapTrustStrategy(Neo4jProperties.Security securityProperties) {
 
      String propertyName = "spring.neo4j.security.trust-strategy";
      Neo4jProperties.Security.TrustStrategy strategy = securityProperties.getTrustStrategy();
      Config.TrustStrategy trustStrategy = createTrustStrategy(securityProperties, propertyName, strategy);
      if (securityProperties.isHostnameVerificationEnabled()) {
         trustStrategy.withHostnameVerification();
      } else {
         trustStrategy.withoutHostnameVerification();
      }
      return trustStrategy;
   }
 
   private Config.TrustStrategy createTrustStrategy(Neo4jProperties.Security securityProperties, String propertyName,
      Neo4jProperties.Security.TrustStrategy strategy) {
      switch (strategy) {
         case TRUST_ALL_CERTIFICATES:
            return Config.TrustStrategy.trustAllCertificates();
         case TRUST_SYSTEM_CA_SIGNED_CERTIFICATES:
            return Config.TrustStrategy.trustSystemCertificates();
         case TRUST_CUSTOM_CA_SIGNED_CERTIFICATES:
            File certFile = securityProperties.getCertFile();
            if (certFile == null || !certFile.isFile()) {
               throw new InvalidConfigurationPropertyValueException(propertyName, strategy.name(),
                  "Configured trust strategy requires a certificate file.");
            }
            return Config.TrustStrategy.trustCustomCertificateSignedBy(certFile);
         default:
            throw new InvalidConfigurationPropertyValueException(propertyName, strategy.name(),
               "Unknown strategy.");
      }
   }
}

Happy coding.

Title picture from Collin at Unsplash.

| Comments (0) »

31-Jul-23


The best way to use Testcontainers from your Spring Boot tests!

After a long time of blog hiatus, I was in the mood to try out one of these “The best way to XYZ” posts for once.

While Spring Boot 3 and Spring Framework 6 releases have focused a lot on revamping the application context and annotation processing for GraalVM native image compatibility (and “boring” tasks like Java EE to Jakarta EE migrations), Spring Boot 3.1 and the corresponding framework edition come with a lot of cool changes.
While I was evaluating them for my team (and actually, for a new Spring Infographic coming out later this year), I especially dove into the new `@ServiceConnection` and the related infrastructure.

@ServiceConnection comes together with a hierarchy of interfaces, starting at ConnectionDetails. You might wonder what the fuss is about that marker interface, especially when you come from a relatively standardised JDBC abstraction: it makes it possible to abstract connections away from a second angle. Do configuration values come from property sources or from something else, in this case from information that Testcontainers provides? ConnectionDetails is just the entry point to JdbcConnectionDetails and other more specific ones, such as Neo4jConnectionDetails. Below those, concrete classes exist to connect to services.
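As a hedged sketch of how small such an implementation can be (the two methods shown are the same ones the driver configuration in the AuthManager post above relies on; the environment variable names are made up):

import java.net.URI;
 
import org.neo4j.driver.AuthToken;
import org.neo4j.driver.AuthTokens;
import org.springframework.boot.autoconfigure.neo4j.Neo4jConnectionDetails;
 
// Resolves the connection from environment variables instead of properties or Testcontainers
class EnvironmentNeo4jConnectionDetails implements Neo4jConnectionDetails {
 
	@Override
	public URI getUri() {
		return URI.create(System.getenv().getOrDefault("NEO4J_URI", "neo4j://localhost:7687"));
	}
 
	@Override
	public AuthToken getAuthToken() {
		return AuthTokens.basic("neo4j", System.getenv("NEO4J_PASSWORD"));
	}
}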

The nice thing is: You don’t have to deal with that a lot, because there are many existing implementations:

  • Cassandra
  • Couchbase
  • Elasticsearch
  • Generic JDBC or specialised JDBC such as MariaDB, MySQL, Oracle, PostgreSQL
  • Kafka
  • MongoDB
  • Neo4j
  • RabbitMQ
  • Redpanda

More about those features in Spring Boot 3.1’s ConnectionDetails abstraction and Improved Testcontainers Support in Spring Boot 3.1.

We will focus on the latter. What has been pestering me for a while: if you use the Testcontainers JUnit 5 extensions to integrate containers with Spring Boot tests, you end up in a scenario in which two systems try to manage resources and their lifetimes, which is not ideal.
If you have solved that, you must learn about @DynamicPropertySource and friends, as Maciej demonstrated nicely.

With a good combination of @TestConfiguration, the aforementioned new infrastructure and a good chunk of explicitness, this is now a solved issue.

Take this application as an example (I smashed everything into one class so that it is visible as a whole, not as a best practice; and well, I don’t care if you program like this in production, I have seen worse):

import java.util.List;
import java.util.UUID;
 
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.neo4j.core.schema.GeneratedValue;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.config.EnableNeo4jRepositories;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
 
@SpringBootApplication
@EnableNeo4jRepositories(considerNestedRepositories = true)
public class MyApplication {
 
	public static void main(String[] args) {
		SpringApplication.run(MyApplication.class, args);
	}
 
	@Node
	public record Movie(@Id @GeneratedValue(GeneratedValue.UUIDGenerator.class) String id, String title) {
 
		Movie(String title) {
			this(UUID.randomUUID().toString(), title);
		}
	}
 
	interface MovieRepository extends Neo4jRepository<Movie, String> {
	}
 
	@RestController
	static class MovieController {
 
		private final MovieRepository movieRepository;
 
		public MovieController(MovieRepository movieRepository) {
			this.movieRepository = movieRepository;
		}
 
		@GetMapping("/movies")
		public List<Movie> getMovies() {
			return movieRepository.findAll();
		}
	}
}

You will want the following dependencies in your test scope:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-test</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-testcontainers</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.testcontainers</groupId>
	<artifactId>neo4j</artifactId>
	<scope>test</scope>
</dependency>

We use @TestConfiguration to provide an additional test configuration. The advantage of @TestConfiguration over @Configuration is twofold: unlike regular @Configuration classes it does not prevent auto-detection of @SpringBootConfiguration, and it must be imported explicitly unless it is an inner static class of a test class. The code below then has one @Bean method annotated with @ServiceConnection. The method returns a Neo4jContainer Testcontainer. That container is marked as reusable. As we don’t close that resource by default, we let Testcontainers take care of cleaning it up. When marked as reusable, it will be kept alive and around, meaning a second test run will be much faster (see my video about that topic; if you don’t like it, there’s cycling and food in it as well, and fwiw, I also put this into written words over at Foojay.io). The container also carries a special label, which is irrelevant for this config but will be used later. We will also address the @RestartScope annotation further down.

This definition provides enough information for the context so that Spring can bring up that container, rewire all the connections for Neo4j to it and everything just works.

import java.util.Map;
 
import org.springframework.boot.devtools.restart.RestartScope;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.Neo4jContainer;
 
@TestConfiguration(proxyBeanMethods = false)
public class ContainerConfig {
 
	@Bean
	@ServiceConnection
	@RestartScope
	public Neo4jContainer<?> neo4jContainer() {
		return new Neo4jContainer<>("neo4j:5")
			.withLabels(Map.of("com.testcontainers.desktop.service", "neo4j"))
			.withReuse(true);
	}
}

Putting this into action might look like this. Notice how the config is imported and the absence of any messing with properties.

import static org.assertj.core.api.Assertions.assertThat;
 
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
 
@SpringBootTest
@Import(ContainerConfig.class)
class MyApplicationTests {
 
	@Test
	void repositoryIsConnectedAndUsable(
		@Autowired MyApplication.MovieRepository movieRepository
	) {
		var movie = movieRepository.save(new MyApplication.Movie("Barbieheimer"));
		assertThat(movie.id()).isNotNull();
	}
}

To be fair, there are a bunch of other ways – also in the official docs – but I like this one by far the best. It’s clear and concise, by being pretty explicit and sticking to one set of annotations (from Spring).

Now about @RestartScope. That annotation is there for a reason: you might have Spring’s devtools on the class path, which will restart the context if necessary. When the context restarts, the container would restart too, defeating the reusable flag. The annotation keeps the original bean around. Why is this relevant? The new Testcontainers support works really well with the concept of “developer services” as introduced by Quarkus. Originally, we only wanted to do test-driven development, but then something happens, things get rushed, and in the end, explorative work is fun: bringing up your application together with a running instance of a database or service feels a lot like “batteries included” and can make you very productive.

Spring Boot supports this now, too, but keeps it (by default) very explicit and also restricted to the testing scope. org.springframework.boot.SpringApplication has a new with method used to augment an automatically configured application with additional config. We can use the above ContainerConfig in an additional main class living in our test scope like this:

import org.springframework.boot.SpringApplication;
 
public class MyApplicationWithDevServices {
 
	public static void main(String[] args) {
		SpringApplication.from(MyApplication::main)
			.with(ContainerConfig.class)
			.run(args);
	}
}

Starting this app, a request to http://localhost:8080/movies immediately works, connected against a Neo4j instance running in a container.

Now the best for last: what about that ominous label I added to the container? I am a happy Testcontainers Cloud user and have their service running on my machine. It automatically redirects any Testcontainers container request to the cloud, so I don’t need Docker on my machine.

There’s also the possibility to define fixed port mappings for containers running both in the cloud and locally, as described in Set fixed ports to easily debug development services.

I have the following configuration on my machine:

more /Users/msimons/.config/testcontainers/services/neo4j.toml 
 
# This example selects neo4j instances and forwards port 7687 to 7687 on the client.
# Same for the Neo4j HTTP port
# Instances are found by selecting containers with label "com.testcontainers.desktop.service=neo4j".
 
# ports defines which ports to proxy.
# local-port indicates which port to listen on the client machine. System ports (0 to 1023) are not supported.
# container-port indicates which port to proxy. If unset, container-port will default to local-port.
ports = [
  {local-port = 7687, container-port = 7687},
  {local-port = 7474, container-port = 7474}
]

This allows me to access the Neo4j instance started by MyApplicationWithDevServices above on the well-known Neo4j ports, allowing things like this:

# Use Cypher-Shell to create some data
cypher-shell -uneo4j -ppassword "CREATE (:Movie {id: randomUuid(), title: 'Dune 2'})"
# 0 rows
# ready to start consuming query after 15 ms, results consumed after another 0 ms
# Added 1 nodes, Set 2 properties, Added 1 labels
 
# Request the data from the application running with dev services
http localhost:8080/movies                                                           
 
# HTTP/1.1 200 
# Connection: keep-alive
# Content-Type: application/json
# Date: Thu, 27 Jul 2023 13:32:58 GMT
# Keep-Alive: timeout=60
# Transfer-Encoding: chunked
#
# [
#    {
#        "id": "824ec97e-0a97-4516-8189-f0bf5eb215fe",
#        "title": "Dune 2"
#    }
# ]

And with this, happy coding.

Update: I’m happy that Sergei personally read my post and rightfully pointed out to me that having fixed ports is possible with local and cloud Testcontainers. I edited the instructions accordingly. Thanks, buddy!

| Comments (3) »

27-Jul-23


Spring Data and Spring Data Neo4j join the “No OFFSET Movement!”

Markus Winand blogged about tool support for keyset pagination nearly a decade ago, and of course my friend Lukas Eder picked up that topic and not only blogged about it several times but implemented tool support with the synthetic seek clause of jOOQ. As the seek clause in jOOQ is excellent for relational databases, I’m gonna refrain from calling the following the “very best way” of doing keyset-based pagination and leave that to others.

So what is this about? The Spring Data Commons project – that is the base project for a broad variety of store implementations such as JPA, MongoDB, Redis and certainly Neo4j – added infrastructure support for keyset-based scrolling.

What does that look like from a Spring Data repository’s point of view?

import java.util.UUID;
 
import org.springframework.data.domain.ScrollPosition;
import org.springframework.data.domain.Sort;
import org.springframework.data.domain.Window;
import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity;
import org.springframework.data.neo4j.repository.Neo4jRepository;
 
public interface ScrollingRepository extends Neo4jRepository<ScrollingEntity, UUID> {
 
	Window<ScrollingEntity> findTop4By(Sort sort, ScrollPosition position);
}

Keyset-based pagination drops the notion of offset completely. It depends on an associated sort, in this case given through the first parameter, the Sort object. As with all pagination efforts, we need to know how many items per page shall be retrieved, in this case 4, which is determined by the derived finder method. The scroll position (the second parameter) determines where the next window starts.

The above will be possible from the next Spring Data releases on for the MongoDB and Neo4j implementations. Some stores might offer additional support on their data access templates; Neo4j does not as of writing (we added the feature just days prior to the current RC).

The beauty of the above is: For you as a user, the calling of this just works the same for all the stores. Imagine the following simple entity:

@Node
public class ScrollingEntity {
 
	@Id
	@GeneratedValue
	private UUID id;
 
	@Property("foobar")
	private String a;
 
	private Integer b;
 
	private LocalDateTime c;
}

And some test data (here, Neo4j is being used):

Connected to Neo4j using Bolt protocol version 5.0 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> match (n:ScrollingEntity) return n order by n.b asc, n.a desc;
+-----------------------------------------------------------------------------------------------------------------+
| n                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------+
| (:ScrollingEntity {b: 0, foobar: "A0", c: 2023-03-20T13:12:25.201, id: "c2c2ebe4-5a02-4d77-a53b-1abbc80aaad9"}) |
| (:ScrollingEntity {b: 1, foobar: "B0", c: 2023-03-21T13:12:29.201, id: "f4f84ed4-632d-431e-bb1a-b829bc2eaf5d"}) |
| (:ScrollingEntity {b: 2, foobar: "C0", c: 2023-03-22T13:12:39.201, id: "f1c088f8-0b7b-456b-99b3-db5a0199dec6"}) |
| (:ScrollingEntity {b: 3, foobar: "D0", c: 2023-03-23T13:12:31.201, id: "3b223485-e81b-4be8-8dbd-50277d313a8b"}) |
| (:ScrollingEntity {b: 3, foobar: "D0", c: 2023-03-23T13:12:31.201, id: "1f525d3d-cdfe-40a6-964b-1fbfc08fae99"}) |
| (:ScrollingEntity {b: 4, foobar: "E0", c: 2023-03-24T13:12:41.201, id: "572b780e-256f-41b7-87de-4a130bc3814b"}) |
| (:ScrollingEntity {b: 5, foobar: "F0", c: 2023-03-25T13:12:25.201, id: "457ec454-a9af-421c-a9c1-7f5ce95310c5"}) |
| (:ScrollingEntity {b: 6, foobar: "G0", c: 2023-03-26T13:12:55.201, id: "b423c34b-6952-4b73-b06b-d039cf7c7e7b"}) |
| (:ScrollingEntity {b: 7, foobar: "H0", c: 2023-03-27T13:13:00.201, id: "ca90cd25-a676-44d4-a4c2-2db32443bf2f"}) |
| (:ScrollingEntity {b: 8, foobar: "I0", c: 2023-03-28T13:12:57.201, id: "59a5dfb2-0e17-4eeb-aecd-95bb555e0117"}) |
+-----------------------------------------------------------------------------------------------------------------+
 
10 rows
ready to start consuming query after 55 ms, results consumed after another 3 ms
neo4j@neo4j>

All of that can be consumed in the simplest way like this:

import static org.assertj.core.api.Assertions.assertThat;
 
import java.util.ArrayList;
 
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.KeysetScrollPosition;
import org.springframework.data.domain.Sort;
import org.springframework.data.domain.WindowIterator;
import org.springframework.data.neo4j.integration.imperative.repositories.ScrollingRepository;
import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity;
 
class KeysetBasedScrollingIT {
 
	@Test
	void forwardWithDuplicatesIteratorIteration(@Autowired ScrollingRepository repository) {
 
		var sort = Sort.by(Sort.Order.asc("b"), Sort.Order.desc("a"));
		var it = WindowIterator
				.of(pos -> repository.findTop4By(sort, pos))
				.startingAt(KeysetScrollPosition.initial());
 
		var content = new ArrayList<ScrollingEntity>();
		while (it.hasNext()) {
			var next = it.next();
			content.add(next);
		}
 
		assertThat(content).hasSize(10);
		assertThat(content.stream().map(ScrollingEntity::getId)
				.distinct().toList()).hasSize(10);
	}
}

Here, the content is sorted by b in ascending and by a in descending order, 4 pieces at a time. The WindowIterator starts at the initial position and takes in a window-providing function. That window will scroll over the dataset. What queries are generated?

The initial query looks like this, retrieving n+1 elements sorted in the order specified:

MATCH (scrollingEntity:`ScrollingEntity`)
RETURN scrollingEntity 
ORDER BY scrollingEntity.b, scrollingEntity.foobar DESC, scrollingEntity.id LIMIT 5

Why n+1 elements? It’s an easy way to judge whether there are more elements available to scroll further, without going through an additional counting query.
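In plain Java, the trick boils down to something like this sketch (not the actual Spring Data internals):

import java.util.List;
 
class WindowSupport {
 
	// Fetch limit + 1 rows; an extra row proves that there is a next window
	static <T> boolean hasNext(List<T> rows, int limit) {
		return rows.size() > limit;
	}
 
	// The extra row only signals that scrolling can continue, it is not part of the window
	static <T> List<T> windowContent(List<T> rows, int limit) {
		return rows.size() > limit ? rows.subList(0, limit) : rows;
	}
}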

The next query looks like this:

MATCH (scrollingEntity:`ScrollingEntity`)
WHERE ((scrollingEntity.b > $pcdsl01
    OR (scrollingEntity.b = $pcdsl01
      AND scrollingEntity.foobar < $pcdsl02))
  OR (scrollingEntity.b = $pcdsl01
    AND scrollingEntity.foobar = $pcdsl02
    AND scrollingEntity.id > $pcdsl03))
RETURN scrollingEntity 
ORDER BY scrollingEntity.b ASC, scrollingEntity.foobar DESC, scrollingEntity.id ASC LIMIT 5

This now has 3 parameters:

:param pcdsl01 => 3
:param pcdsl02 => "D0"
:param pcdsl03 => "282ac053-c821-47f4-9a0e-0d12e0b91808"

The next page starts at an element whose b attribute is greater than 3, or is equal to 3 and has a foobar attribute lower than D0. If you look closely, the 4th and 5th rows pose a nice test case for our tooling and hopefully make it clear why we add the 3rd condition here: to add uniqueness to the keyset on which we paginate.

The Spring Data Commons and Spring Data Neo4j implementation of keyset-based pagination is quite sophisticated in the sense that it allows different directions for different columns to sort by, in contrast to a simple tuple comparison (in the above case something along the lines of WHERE [n.b, n.foobar, n.id] >= [$pcdsl01, $pcdsl02, $pcdsl03], which is also not a real tuple in Neo4j but a list comparison).

Above I presented using this feature with the new WindowIterator, but you can also control this manually like this:

import static org.assertj.core.api.Assertions.assertThat;
 
import java.util.function.Function;
 
import org.assertj.core.data.Index;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.KeysetScrollPosition;
import org.springframework.data.domain.Sort;
import org.springframework.data.neo4j.integration.imperative.repositories.ScrollingRepository;
import org.springframework.data.neo4j.integration.shared.common.ScrollingEntity;
 
class KeysetBasedScrollingIT {
 
	@Test
	void forwardWithDuplicatesManualIteration(@Autowired ScrollingRepository repository) {
 
		var duplicates = repository.findAllByAOrderById("D0");
		assertThat(duplicates).hasSize(2);
 
		var sort = Sort.by(Sort.Order.asc("b"), Sort.Order.desc("a"));
		var window = repository.findTop4By(sort, KeysetScrollPosition.initial());
		assertThat(window.hasNext()).isTrue();
		assertThat(window)
				.hasSize(4)
				.extracting(Function.identity())
				.satisfies(e -> assertThat(e.getId()).isEqualTo(duplicates.get(0).getId()), Index.atIndex(3))
				.extracting(ScrollingEntity::getA)
				.containsExactly("A0", "B0", "C0", "D0");
 
		window = repository.findTop4By(sort, window.positionAt(window.size() - 1));
		assertThat(window.hasNext()).isTrue();
		assertThat(window)
				.hasSize(4)
				.extracting(Function.identity())
				.satisfies(e -> assertThat(e.getId()).isEqualTo(duplicates.get(1).getId()), Index.atIndex(0))
				.extracting(ScrollingEntity::getA)
				.containsExactly("D0", "E0", "F0", "G0");
 
		window = repository.findTop4By(sort, window.positionAt(window.size() - 1));
		assertThat(window.isLast()).isTrue();
		assertThat(window).extracting(ScrollingEntity::getA)
				.containsExactly("H0", "I0");
	}
}

The key classes in that API are org.springframework.data.domain.Window and org.springframework.data.domain.KeysetScrollPosition. The former contains the data and the information about where the scrolling window is positioned at any time; the latter is a value object for the current keys.

Last but not least, the API has full support for scrolling backwards as well. Scrolling backwards requires inverting the order for each column individually. At the end of the post I’ll add a link showing how we did this in Spring Data Neo4j without going through the pain of fiddling around with strings. Just inverting the operator in the generated order is however not enough when going backwards; by doing only that, the window would jump right back to the beginning. As we only know the keys at which the window arrived, and not the keys n positions backwards, we can solve this issue by inverting the sort order as a whole too, matching and collecting, and then recreating the sort on the client side again (we have some ideas for a more sophisticated Cypher-based solution, though).
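A sketch of that backward-scrolling idea in isolation; fetchWithInvertedSort is a hypothetical stand-in for the generated query with both the keyset condition and the sort order inverted:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;
 
class BackwardScrolling {
 
	static <T> List<T> previousWindow(IntFunction<List<T>> fetchWithInvertedSort, int limit) {
		var rows = fetchWithInvertedSort.apply(limit + 1); // n + 1 again, to know whether more windows exist
		var content = new ArrayList<>(rows.size() > limit ? rows.subList(0, limit) : rows);
		Collections.reverse(content); // recreate the requested order on the client side
		return content;
	}
}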

Implementing this feature for Spring Data Neo4j has been quite a nice experience. All our query generation goes through the Cypher-DSL, which is in essence a builder for Cypher. Here we made use of iteratively building conditions:

var resultingCondition = Conditions.noCondition();
// This is the next equality pair if previous sort key was equal
var nextEquals = Conditions.noCondition();
// This is the condition for when all the sort orderedKeys are equal, and we must filter via id
var allEqualsWithArtificialSort = Conditions.noCondition();
 
for (Map.Entry<String, Object> entry : orderedKeys.entrySet()) {
 
	var k = entry.getKey();
	var v = entry.getValue();
	if (v == null || (v instanceof Value value && value.isNull())) {
		throw new IllegalStateException("Cannot resume from KeysetScrollPosition. Offending key: '%s' is 'null'".formatted(k));
	}
	var parameter = Cypher.anonParameter(v);
 
	Expression expression;
 
	var scrollDirection = scrollPosition.getDirection();
	if (Constants.NAME_OF_ADDITIONAL_SORT.equals(k)) {
		expression = entity.getIdExpression();
		var comparatorFunction = getComparatorFunction(scrollDirection == KeysetScrollPosition.Direction.Forward ?
				Sort.Direction.ASC : Sort.Direction.DESC, scrollDirection);
		allEqualsWithArtificialSort = allEqualsWithArtificialSort.and(comparatorFunction.apply(expression, parameter));
	} else {
		var p = propertyAndDirection.get(k);
		expression = p.property.isIdProperty() ? entity.getIdExpression() : root.property(k);
 
		var comparatorFunction = getComparatorFunction(p.order.getDirection(), scrollDirection);
		resultingCondition = resultingCondition.or(nextEquals.and(comparatorFunction.apply(expression, parameter)));
		nextEquals = expression.eq(parameter);
		allEqualsWithArtificialSort = allEqualsWithArtificialSort.and(nextEquals);
	}
}
resultingCondition = resultingCondition.or(allEqualsWithArtificialSort);

Also, getting the comparator right is type safe:

static BiFunction<Expression, Expression, Condition> getComparatorFunction(
		Sort.Direction sortDirection, KeysetScrollPosition.Direction scrollDirection) {
	if (scrollDirection == KeysetScrollPosition.Direction.Backward) {
		return sortDirection.isAscending() ? Expression::lte : Expression::gte;
	}
	return sortDirection.isAscending() ? Expression::gt : Expression::lt;
}

Of course keyset-based pagination is not a total silver bullet and can also be outright slow if done wrong. For example, you should make sure you have proper indexes on all columns you want to paginate over (which is true for Neo4j and relational databases alike, and I guess also for MongoDB). For Neo4j I shamelessly recommend using Neo4j-Migrations to have proper control over your Neo4j indexes and constraints, across all Neo4j versions from 3.5 to the latest release in Neo4j Aura. Also, while you can totally go wild in your sort for the pagination and have dozens of columns, the conditions generated will grow exponentially. If this is what you want, be our guest, but don’t complain 🙂
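For the example entity above, a hedged sketch of what such an index could look like (the index name and the choice of a single composite index are my assumptions, not official guidance):

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.GraphDatabase;
 
public class CreateScrollIndex {
 
	public static void main(String[] args) {
		try (var driver = GraphDatabase.driver("neo4j://localhost:7687", AuthTokens.basic("neo4j", "password"));
			var session = driver.session()) {
			// One composite index covering all keyset columns, including the unique tie-breaker
			session.run("""
					CREATE INDEX scrolling_entity_keyset IF NOT EXISTS
					FOR (n:ScrollingEntity) ON (n.b, n.foobar, n.id)""");
		}
	}
}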

If you use this feature in a sane way, it will be faster than offset-based pagination in Neo4j, and I hope you find it as useful as I enjoyed creating it together with Mark Paluch and Christoph Strobl.

| Comments (2) »

20-Mar-23


Weekly digest 2022-50

While listening to the beautiful Kahedi Radio Show, I’m having a hard time putting down words this week; I feel done for the year. With regard to music, one interesting documentary I watched last week was this:

It’s about the ever increasing prices of concert tickets and about the business of Ticketmaster, Livenation and Eventim. And wow, it’s incredible how everything in the world that used to be nice has been turned into a business and investment strategy. Makes me sad.

Last Sunday I clocked in my 12th half marathon (21.1 km) of 2022, running through frosty meadows, which was actually quite beautiful:



I finally found some nice and warm running bibs that fit me, the Saucony Solstice in large. I have an inner leg length of roughly 84 cm and I’m rather thin… which makes most trousers kind of problematic. These fit well. And while I was at it, I spent a bit more money on trail-ish shoes, as I signed up for the Abdij Cross in January and then a full marathon in March, the Heuvelland Marathon. Looking forward to both very much. Let’s see if I can continue 2023 like 2022. Apparently I’m among the 1% most active users on Strava this year.

Java and Community

There will be a small Java / EuregJug community gathering in Aachen at the Christmas market, you might wanna watch this space:

I was super happy that Øredev published a video about my talk on GraalVM:

And because I just felt like it, I created a video demonstrating that today’s debugging tools aren’t your uncle’s tools anymore. The code you see in my video is an actual issue we worked on, and in that case it really was essential to have proper debugging at hand. Of course, it is still fine to just investigate with a random Sysout, I often do this myself, but as with everything: it’s just not black and white.

I was told by my kid that I should definitely ask you to subscribe to my channel, hit the bell and leave a like. Dunno in which order, though 😉

What else? I read up on a couple of things I wanted to understand fully. One of them has been Project Loom, aka virtual threads in Java. To learn about this properly, read JEP 425, then watch this presentation from Devoxx BE 2022 by Ron Pressler and Alan Bateman first (those are the owners of the JEP) and then this one by Mario Fusco, for some great insights on when and when not to use this stuff.

As always, do something nice for yourself on the weekend. Try to wind down and don’t stuff everything in just before the holidays. The latter is often pointless and doesn’t change a thing.

| Comments (0) »

16-Dec-22