Some thoughts about user-defined functions (UDFs) in databases

Last week I came across a Twitter thread about using stored procedures, or in other words, user-defined functions in databases.

Every myth about them was already debunked some 13 years ago in this excellent post: Mythbusters: Stored Procedures Edition. I can completely get behind it, but I'd also like to add some personal thoughts and share some memories:

In the past I worked for years, in fact over a decade, on big systems in the utilities sector. IHL evolved from an Oracle Forms 6i application into a Java desktop application and, from what I hear, into a web application over the last 24 years. While the clients changed, the database model was pretty consistent through the years (a relational model, actually, and maybe you understand why this paper under the title What comes around by Andrew Pavlo and Michael Stonebraker resonates with me, even though I work at Neo4j these days).
Even in the first decade of the 2000s we followed the Pink database paradigm, even though we didn't call it that. A lot of our API schema was enriched or even defined by various stored procedures, written in PL/SQL and *gosh* Java (yes, the Oracle database could do this even back in 2010; these days, powered by GraalVM, you can even run proper JavaScript in the thing). Why was this useful? We had a lot of complicated computations, such as "if that power rod drops, in what radius will it fall, where will the lines end, does it kill a cow or hit a Kindergarten?", and we could use the result straight in a view without first having to grab all the necessary data, compute, and write back… Furthermore, it allowed us to move quickly all the time: instead of rolling out new client versions, we could just roll out improved PL/SQL code, without any downtime.

The thing I find most funny is that one of the loudest arguments against stored procedures is "but you cannot test them…": Of course you can. But, and this is my main point: I would bet money that a fair share of the people who bring up this argument are the same ones who don't write tests anyway, because of $reasons, such as "my manager does not allow it", "no time", etc…

Of course, if you have fallen for the "we need to rewrite our app every year, replace all the frameworks each quarter" mindset, then neither my ranting, the myth-buster post above nor the linked paper will convince you that there might be a different, more stable solution, and I can keep on using this slide more often in the future.

Anyway, this week I realized that the 1.x release of DuckDB also supports UDFs, or macros as they call them, and I was like: sh*t, we don't have that in the book… Maybe a good opportunity to set a reminder for a 2nd edition, if we ever sell that many… Nevertheless, I needed to try this. And what better target than my sports page, which received its 3rd major overhaul since 2014: biking.michael-simons.eu/. I wanted to add my age group to my favorite achievements. In Germany, there are some halfway complicated rules to compute it, and I did so in a stored procedure. That diff creates the function and calls it in a view (which defines the API); the only thing I have to touch in the app is the actual view layer, as the client code is just one statement that couldn't be simpler (con.execute('FROM v_reoccurring_events').fetchall()), and I can now flex a bit like this:



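In case you have never seen a DuckDB macro, here is a minimal, self-contained sketch of the general shape, driven through DuckDB's JDBC driver (org.duckdb:duckdb_jdbc) since Java is the house language of this blog. The macro, the table, the view and the age-group buckets below are all made up for illustration; the real rules and schema live in the diff linked above:

import java.sql.DriverManager;

public class AgeGroupDemo {

	public static void main(String[] args) throws Exception {
		// In-memory DuckDB via the official JDBC driver on the class path
		try (var con = DriverManager.getConnection("jdbc:duckdb:");
				var stmt = con.createStatement()) {

			// A scalar macro; the buckets are invented for this sketch, not the actual German rules
			stmt.execute("""
					CREATE OR REPLACE MACRO age_group(age) AS
					CASE WHEN age < 30 THEN 'Open' ELSE 'AK ' || CAST((age // 5) * 5 AS VARCHAR) END""");

			// Table and view names are made up; the view is the "API" the application talks to
			stmt.execute("CREATE TABLE results (event VARCHAR, age_at_event INTEGER)");
			stmt.execute("INSERT INTO results VALUES ('Some bike marathon', 45)");
			stmt.execute("CREATE VIEW v_results AS SELECT event, age_group(age_at_event) AS age_group FROM results");

			try (var rs = stmt.executeQuery("FROM v_results")) {
				while (rs.next()) {
					System.out.printf("%s: %s%n", rs.getString("event"), rs.getString("age_group"));
				}
			}
		}
	}
}

The view is the only contract the application sees; swapping the macro for a smarter implementation later does not touch the client code at all.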
What about Neo4j? I am so glad you asked: Of course we offer a framework to enhance our query language Cypher with user-defined procedures and functions, and I happen to own the example repository: neo4j-procedure-template, which includes a proper test setup (btw, one of the *rare* cases in which I would actually *not* use Testcontainers but our dedicated testing framework, because the overall turnaround is faster (you skip the repackaging)).
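If you have never written one, such an extension is surprisingly small. Here is a minimal sketch in the spirit of the template repository; the function name and its behaviour are invented for illustration, only the annotations from org.neo4j.procedure are the actual extension API:

package example;

import java.util.Locale;

import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserFunction;

public class ExampleFunctions {

	// Registered under the fully qualified name `example.shout` and callable
	// from Cypher like any built-in function: RETURN example.shout('hello')
	@UserFunction("example.shout")
	@Description("example.shout(value) - returns the given string in upper case.")
	public String shout(@Name("value") String value) {
		return value == null ? null : value.toUpperCase(Locale.ROOT);
	}
}

Packaged as a jar and dropped into the database's plugins directory, or loaded via the test harness mentioned above, it becomes available to every Cypher query.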

For Neo4j, extensibility is key: we do evolve Cypher (and GQL) with great care, but often not as fast as a customer, the necessities of a deal or a support case might require. Neo4j APOC started as a set of curated functions and procedures, and we have promoted it ever since. From Neo4j 5 onwards, parts of it effectively became a standard library, delivered with the product.

Is there something I personally would add? Actually, yes: we only have compiled procedures at the moment, written in any JVM language. This is OK to some extent, but I'd love to see an enhancement here so that I could *just* do a create or replace function in our database as well. If it were any of the seasonal holidays that bring gifts, maybe even in something other than Java. A while back I started an experiment using GraalVM for that, Neo4j polyglot stored procedures, and I would love to revive it at some point.

My little rant does not even come close to the two I read last month (1, 2), but I hope you enjoyed it nevertheless. If you take one thing away: there's always more than one solution to a problem. Making better use of a database product you most likely pay for anyway is a properly good idea. Be it optimizing your model (there are probably a ton of books on SQL, fewer for Neo4j, hence I can really recommend The Definitive Guide to Neo4j by my friends Christophe and Luanne; it has excellent content on modeling), understanding its query language and capabilities better, or enhancing it to your needs: it will be beneficial in the long run.

To quote one of the rants above: Happy Apathetic coding and don’t forget to step outside for a proper run sometime.

Title picture by Jan Antonin Kolar on Unsplash.

| Comments (1) »

06-Jul-24


Run your integration tests against Testcontainers with GraalVM native image

I have started working with a small team on an exciting project at Neo4j. The project is about database connectivity (what else) and we use Testcontainers in our integration tests, asserting the actual network connectivity and eventually the API.

The thing we are creating should of course also work when being compiled as part of an application into a native executable by GraalVM. For a bunch of older projects with a somewhat convoluted test-setup I used to create dedicated, small applications that produce some output. I compiled these apps into a native binary and used a scripted run to eventually assert the output.

For this new project I wanted to take the opportunity to use the Maven plugin for GraalVM Native Image building and its test capabilities directly.
The plugin works great and the maintainers are—like the whole GraalVM team—quite quick at fixing any issue. We have already used it on several occasions to produce native binaries as part of a distribution, but so far not for testing.

I personally find the documentation linked above not sufficient to create a proper test setup. Especially the section "Testing support" does not work for me: neither the latest Surefire nor Failsafe plugins bring in the required dependency org.junit.platform:junit-platform-launcher. This artifact contains a test execution listener, org.junit.platform.launcher.listeners.UniqueIdTrackingListener, that tracks each executed test and stores its unique id in a file. The GraalVM plugin will use that file to discover the tests it needs to run in native mode. If the file is not generated, it will yell at you with

[ERROR] Test configuration file wasn’t found. Make sure that test execution wasn’t skipped.

I can’t share the thing I am actually testing right now, so here’s something similar. The code below uses the Neo4j Java Driver and creates a data access object interacting with the Neo4j database:

package demo;
 
import java.util.Map;
import java.util.stream.Collectors;
 
import org.neo4j.driver.Driver;
 
public record Movie(String id, String title) {
 
	public static final class Repository {
 
		private final Driver driver;
 
		public Repository(Driver driver) {
			this.driver = driver;
		}
 
		public Movie createOrUpdate(String title) {
			return this.driver.executableQuery("MERGE (n:Movie {title: $title}) RETURN n")
				.withParameters(Map.of("title", title))
				.execute(Collectors.mapping(r -> {
					var node = r.get("n").asNode();
					return new Movie(node.elementId(), node.get("title").asString());
				}, Collectors.toList()))
				.stream()
				.findFirst()
				.orElseThrow();
		}
 
	}
}

and the integration test looks like this. There are no surprises in it: it is disabled without Docker support, uses a per-class lifecycle so that I can keep a reusable test container around for all tests, and contains a sample test:

package demo;
 
import static org.junit.jupiter.api.Assertions.assertNotNull;
 
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.testcontainers.containers.Neo4jContainer;
import org.testcontainers.junit.jupiter.Testcontainers;
 
@Testcontainers(disabledWithoutDocker = true)
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
class RepositoryIT {
 
	protected final Neo4jContainer<?> neo4j = new Neo4jContainer<>("neo4j:5.13.0")
		.waitingFor(Neo4jContainer.WAIT_FOR_BOLT)
		.withReuse(true);
 
	protected Driver driver;
 
	@BeforeAll
	void startNeo4j() {
		this.neo4j.start();
		this.driver = GraphDatabase.driver(this.neo4j.getBoltUrl(),
				AuthTokens.basic("neo4j", this.neo4j.getAdminPassword()));
	}
 
	@AfterAll
	void closeDriver() {
		if (this.driver == null) {
			return;
		}
		this.driver.close();
	}
 
	@Test
	void repositoryShouldWork() {
 
		var repository = new Movie.Repository(driver);
		var newMovie = repository.createOrUpdate("Event Horizon");
		assertNotNull(newMovie.id());
	}
 
}

Now, let's walk through the pom.xml. For my dependencies, I usually check if they have a BOM file and import those into dependency management, so that I can rely on other projects organising their releases and transitive dependencies properly. Here I have the latest JUnit and Testcontainers plus SLF4J. From the latter I'll only use the simple logger later, so that I can see Testcontainers' logging:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.junit</groupId>
            <artifactId>junit-bom</artifactId>
            <version>5.10.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-bom</artifactId>
            <version>2.0.9</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers-bom</artifactId>
            <version>1.19.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

The relevant test dependencies then look like this. I kinda grouped them together, but please note how I include the JUnit launcher mentioned above explicitly. It is not part of the JUnit core dependencies and, at least for me, was not brought in transitively by either Failsafe or Surefire:

<dependencies>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.platform</groupId>
        <artifactId>junit-platform-launcher</artifactId>
        <scope>test</scope>
    </dependency>
 
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>neo4j</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

“But Michael, didn’t you think about the fact that you won’t usually get the latest Failsafe and Surefire plugins with default Maven?”—Of course I did. This is how I configure Failsafe. Take note how I set a (Java) system property for the integration tests: while the UniqueIdTrackingListener is on the class path, it is disabled by default and must be enabled with the property below (yes, I did read the sources for that). The rest is just the usual dance for setting up integration tests:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-failsafe-plugin</artifactId>
    <version>3.2.1</version>
    <configuration>
        <systemPropertyVariables>
            <junit.platform.listeners.uid.tracking.enabled>true</junit.platform.listeners.uid.tracking.enabled>
        </systemPropertyVariables>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Now onto the GraalVM native-maven-plugin. I usually wrap this in a dedicated profile to be activated with a system property, as in the listing below. The documentation says that one must use <extensions>true</extensions> as part of the configuration in order to use the recommended JUnit Platform test listener mode, but that didn't work for me. I guess it should in theory avoid having to set the above system property.

The next important part in the listing—at least when you want to use Testcontainers—is enabling the GraalVM Reachability Metadata Repository. This repository contains the required configuration shims for quite a number of libraries, including Testcontainers. If you don't enable it, Testcontainers won't work in native mode:

<profile>
    <id>native-image</id>
    <activation>
        <property>
            <name>native</name>
        </property>
    </activation>
    <build>
        <plugins>
            <plugin>
                <groupId>org.graalvm.buildtools</groupId>
                <artifactId>native-maven-plugin</artifactId>
                <version>0.9.28</version>
                <extensions>true</extensions>
                <configuration>
                    <metadataRepository>
                        <enabled>true</enabled>
                    </metadataRepository>
                </configuration>
                <executions>
                    <execution>
                        <id>test-native</id>
                        <goals>
                            <goal>test</goal>
                        </goals>
                        <phase>verify</phase>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</profile>

One cannot praise the people from VMware's Spring and Spring Boot teams enough for bringing their knowledge of so many libraries into that repository.

With that in place, you can run your integration tests on the JVM and as a native image like this:

mvn -Dnative verify

You first see the usual dance of the integration tests, then the GraalVM compilation:

[1/8] Initializing...                                                                                    (5,9s @ 0,25GB)
 Java version: 17.0.9+11-LTS, vendor version: Oracle GraalVM 17.0.9+11.1
 Graal compiler: optimization level: 2, target machine: armv8-a, PGO: off
 C compiler: cc (apple, arm64, 15.0.0)
 Garbage collector: Serial GC (max heap size: 80% of RAM)
 1 user-specific feature(s)
 - org.graalvm.junit.platform.JUnitPlatformFeature
[junit-platform-native] Running in 'test discovery' mode. Note that this is a fallback mode.
[2/8] Performing analysis...  [*****]                                                                   (21,0s @ 1,48GB)
  13.384 (87,45%) of 15.304 types reachable
  22.808 (62,52%) of 36.483 fields reachable
  73.115 (60,68%) of 120.500 methods reachable
   4.250 types, 1.370 fields, and 3.463 methods registered for reflection
      99 types,   102 fields, and   102 methods registered for JNI access
       5 native libraries: -framework CoreServices, -framework Foundation, dl, pthread, z
[3/8] Building universe...                                                                               (2,6s @ 1,40GB)
[4/8] Parsing methods...      [**]                                                                       (2,6s @ 1,62GB)
[5/8] Inlining methods...     [***]                                                                      (1,4s @ 1,49GB)
[6/8] Compiling methods...    [******]                                                                  (37,1s @ 2,99GB)
[7/8] Layouting methods...    [**]                                                                       (4,0s @ 4,12GB)
[8/8] Creating image...       [**]                                                                       (4,4s @ 1,49GB)
  37,40MB (58,34%) for code area:    40.731 compilation units
  25,78MB (40,21%) for image heap:  338.915 objects and 80 resources
 951,36kB ( 1,45%) for other data
  64,11MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                                Top 10 object types in image heap:
  14,74MB java.base                                            7,80MB byte[] for code metadata
   5,94MB testcontainers-1.19.1.jar                            3,21MB byte[] for java.lang.String
   3,85MB java.xml                                             2,46MB java.lang.String
   3,79MB svm.jar (Native Image)                               2,44MB byte[] for general heap data
   1,12MB neo4j-java-driver-5.13.0.jar                         2,39MB java.lang.Class
   1,08MB netty-buffer-4.1.99.Final.jar                        1,50MB byte[] for embedded resources
 938,35kB docker-java-transport-zerodep-3.3.3.jar            879,17kB byte[] for reflection metadata
 683,00kB netty-transport-4.1.99.Final.jar                   627,38kB com.oracle.svm.core.hub.DynamicHubCompanion
 655,80kB netty-common-4.1.99.Final.jar                      438,19kB java.util.HashMap$Node
 490,98kB jna-5.12.1.jar                                     384,69kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
   3,95MB for 58 more packages                                 3,40MB for 2755 more object types
------------------------------------------------------------------------------------------------------------------------

And shortly after that:

[main] INFO org.testcontainers.DockerClientFactory - Checking the system...
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker server version should be at least 1.6.0
[main] INFO tc.neo4j:5.13.0 - Creating container for image: neo4j:5.13.0
[main] INFO tc.neo4j:5.13.0 - Reusing container with ID: 56cc1b02f9b0ebfcc8670f5cdc54b5b3a85a4720e8d810548986369a557482ca and hash: aa81fad313c4f8e37e5b14246fe863c7dbc26db6
[main] INFO tc.neo4j:5.13.0 - Reusing existing container (56cc1b02f9b0ebfcc8670f5cdc54b5b3a85a4720e8d810548986369a557482ca) and not creating a new one
[main] INFO tc.neo4j:5.13.0 - Container neo4j:5.13.0 started in PT0.280264S
demo.RepositoryIT > repositoryShouldWork() SUCCESSFUL
 
 
Test run finished after 608 ms
[         2 containers found      ]
[         0 containers skipped    ]
[         2 containers started    ]
[         0 containers aborted    ]
[         2 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

What I like here is the fact that I don't have anything special in my test classes: no weird hierarchies nor any additional annotations. In the project we are working on, we keep all our integration tests in a separate Maven module, as we want to make sure we are testing the packaged jar proper (also because we have integration tests for both the Java class path and module path in separate Maven modules). This setup now gives us the additional advantage that the packaging of our library is tested under native image as well, which lets you discover issues such as missing resources in the native image early.

The whole project is shared as a gist; it's only three files anyway.

The image of this post was generated with Dall-E by my friend Michael.

| Comments (1) »

25-Oct-23


Why would a Neo4j person be so fond of an embedded, relational database?

I have been working for Neo4j since 2018. At Neo4j I maintain both Spring Data Neo4j and Neo4j-OGM, object mappers and entity managers for our database product. This is a great job in a great company with awesome colleagues such as my friends Gerrit and Michael.

Some other projects I created on the job are the Cypher-DSL, a builder for our query language Cypher, which is used extensively inside Spring Data Neo4j, in products of our partner Graphaware and by a whole bunch of other customers. Headed for its 100th star is Neo4j-Migrations, a database migration toolkit. Last but not least, I created the original Neo4j Testcontainers module and was recognized as a Testcontainers Champion for that.

However, in a previous lifetime I got "socialized" in an SQL shop (actually, an Oracle shop), and while more than 20 years ago, during my studies, I swore I would never do anything with SQL, there I was. For whatever reason, my head actually works pretty well with the relational model and the questions I can answer with it. I spent about 15 years in that company doing all kinds of things, such as geospatial applications, applications with energy forecasts based on past measurements and stuff like that. What all of that had in common was SQL.

Just a couple of months prior to joining Neo4j, I did a talk under the title Live with your SQL-fetish and choose the right tool for the job, in which I presented jOOQ, way back before it became a hot topic. Anyhow, a lot of the dataset from that talk I eventually used in Neo4j-specific talks, too, taking my musical habits into a knowledge graph.

What I am trying to say here is: I am deeply involved in the graph ecosystem, we have a great product and tools, and I'm glad I have a part in those. But I also think that other query languages have their benefits and that other projects and companies are doing great work, too.

So, DuckDB, what is it? DuckDB is an "in-process SQL OLAP database management system", and you can go ahead and try it out in your browser, because it can be compiled to WASM.

DuckDB's query engine is vector-based: every bit of data that flows through it is a collection of vectors on which algorithms can often be applied in parallel; see Execution Format. That's the groundwork for DuckDB's fast, analytical queries. From Why DuckDB: "Online analytical processing (OLAP) workloads are characterized by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables." In its vectorized query execution engine, queries are interpreted and processed in large batches of values in one operation. DuckDB can also query foreign stores such as Postgres and SQLite, and there are scenarios in which its engine is actually faster at this than native Postgres.

When going through their SQL documentation you will be surprised how much you get essentially for free in one small executable. It's all there: CTEs, window functions, ASOF joins, PIVOT and many things that make SQL friendlier. Back in the day we had to run a big Oracle installation for similar things.

If you follow me on social media you might have noticed that my focus in private has shifted over the last years; I have been doing a lot of sport and training, and less of the side projects that involve any big setups. This is where a small tool that runs without a server installation comes in super handy: I was able to define a neat schema for my photovoltaic systems with a bunch of views and now have a good-enough dashboard.

My biking page has been remodeled to be a static page these days but uses a DuckDB database beneath, see biking.michael-simons.eu.

At the moment, I neither need nor want more. And as sad as it might be, I don't have a graph problem in either of those applications. I want to aggregate measurements, do analytics, and that's it. It's a lot of time series I'm working with, and graph doesn't really help me there.

Those are use cases that are unrelated to work.

But I often have use for DuckDB at work, too. Neo4j can natively ingest CSV files. You need to write some Cypher to massage them into the graph you want, but still: that CSV must be properly formatted.

DuckDB, on the other hand, can read from CSV, Parquet, JSON and other formats as if they were tables. These sources can be files or URLs. It does not require you to actually create a schema and persist the data in its own store; it can just query things. Querying data without persisting it might be unusual for a database and seems counter-intuitive at first look, but it is useful in the right situations. DuckDB does not need to persist the content, but you are free to create views over the sources. You can happily join a CSV file with a JSON file and literally copy the result to a new CSV file that is well suited for Neo4j.
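A rough sketch of that kind of massaging, again through DuckDB's JDBC driver and with made-up file and column names, could look like this:

import java.sql.DriverManager;

public class CsvMassaging {

	public static void main(String[] args) throws Exception {
		try (var con = DriverManager.getConnection("jdbc:duckdb:");
				var stmt = con.createStatement()) {

			// Join a CSV file with a JSON document as if both were tables and write the
			// result into a new CSV that is easy to ingest from Neo4j. The file names
			// (people.csv, orgs.json) and the columns are assumptions for this sketch.
			stmt.execute("""
					COPY (
						SELECT p.name, p.org_id, o.org_name
						FROM read_csv_auto('people.csv') p
						JOIN read_json_auto('orgs.json') o ON o.id = p.org_id
					) TO 'people_with_orgs.csv' (HEADER, DELIMITER ',')""");
		}
	}
}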

This can all be done either in a CLI or actually as part of a pipeline in a script.

DuckDB replaced a bunch of tools in my daily usage, such as xiv and, to some extent, jq. DuckDB's JSON processing capabilities allow you to query and normalize many complex and denormalized JSON documents, but in some cases it's not enough… It's just that you can do so many "interesting" things in JSON 😉

Anyhow, a long read, but maybe an explanation of why you have seen so many toots from me dealing with DuckDB, or even my profile at DuckDB snippets.

I think it's valuable to know other technologies and to master more than one (query) language. I value both my Cypher knowledge and everything I learned about SQL in the past. Hopefully, I can teach a bit about both, with my current work and any future publications.

And last but not least: Mark Needham, Michael Hunger and I have worked together to bring out "DuckDB in Action". The Manning Early Access Program (MEAP) started in October 2023 and the book will be released in 2024. We have more than 50% of the content ready and we would love to hear your feedback:

| Comments (2) »

05-Oct-23


Integrate the AuthManager of Neo4j’s Java Driver with Spring Boot

The following post is more or less a dump of code. Since version 5.8, the official Neo4j drivers support expiration of authentication tokens (see Introduce AuthToken rotation and session auth support). The PR states: “The feature might also be referred to as a refresh or re-auth. In practice, it allows replacing the current token with a new token during the driver’s lifetime. The main objective of this feature is to allow token rotation for the same identity. As such, it is not intended for a change of identity.”

It's up to you whether you are going to change identities with it or not, but in theory you can. Personal opinion: it's actually one of the main reasons I would integrate it into any backend application that does anything remotely related to multitenancy. Why? The impersonation feature that also exists in the driver does not work with credential checking by default, so go figure: the one thing you want in a backend application (one driver instance transparently checking privileges for different tenants authenticated via a token, be it bearer or username/password) either works but is discouraged, or does not work.

Normally, I would suggest using an org.springframework.boot.autoconfigure.neo4j.ConfigBuilderCustomizer for changing anything related to the driver's config, as it would spare me duplicating all the crap below (as described in my post Tailor-Made Neo4j Connectivity With Spring Boot 2.4+), but sadly, the org.neo4j.driver.AuthTokenManager is not configurable via the config. I have therefore opened a pull request over at Spring Boot to allow the detection of an AuthTokenManager bean, which hopefully will make it into Spring Boot, rendering the stuff below unnecessary (see Spring Boot PR #36650). For now, I suggest duplicating a couple of pieces from Spring Boot which turn environment properties into configuration, so that you can still completely rely on the standard properties. The relevant piece in the config is the driver method that – for now – is required to add an AuthTokenManager to the driver. You are completely free to create one using the factory methods the driver provides or to create a custom implementation of the interface. Some ideas are mentioned in the inline comments.

import java.io.File;
import java.net.URI;
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.Locale;
import java.util.concurrent.TimeUnit;
 
import org.neo4j.driver.AuthTokenManagers;
import org.neo4j.driver.Config;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.springframework.boot.autoconfigure.neo4j.Neo4jConnectionDetails;
import org.springframework.boot.autoconfigure.neo4j.Neo4jProperties;
import org.springframework.boot.context.properties.source.InvalidConfigurationPropertyValueException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
 
@Configuration(proxyBeanMethods = false)
public class Neo4jCustomAuthConfig {
 
   /**
    *
    * @param connectionDetails I'm using the Spring Boot 3.1 abstraction over all service connection details here, 
    *                          so that the cool new container integration I described last week in
    *                          <a href="https://info.michael-simons.eu/2023/07/27/the-best-way-to-use-testcontainers-from-your-spring-boot-tests/">The best way to use Testcontainers from your Spring Boot Tests</a> still applies
    *                          If you are not on Spring Boot 3.1, this class is not available. Remove that argument and
    *                          just use {@link Neo4jProperties#getUri()} then.
    * @param neo4jProperties   Injected so that pool and other connection settings are still configured from the default properties / environment
    * @return The driver to be used in the application and further down the stack in Spring Data Neo4j
    */
   @Bean
   Driver driver(Neo4jConnectionDetails connectionDetails, Neo4jProperties neo4jProperties) {
 
      // Right now, the factory for AuthTokenManagers only supports expiration based tokens.
      // This is mostly useful for anything token related. You could hook this into Spring Security
      // for example and pass on any JWT token.
      // Another option is using the username and password like here and grab an additional expiration date, i.e. from
      // the config or the environment. When the expiration date is reached, the supplier passed to the factory
      // method will be asked for a new token. This can be a new token or a new username and password configuration.
      // Take note that there is no way to actively trigger an expiration.
      // This would require changes in the Neo4j-Java-Driver:
      // <a href="https://github.com/neo4j/neo4j-java-driver/issues/new">Open an issue</a>.
      var authManager = AuthTokenManagers.expirationBased(
         // Here I'm just using the token from the connection. This must be ofc something else for anything that should make sense
         () -> connectionDetails.getAuthToken()
            .expiringAt(ZonedDateTime.now().plusHours(1).toInstant().toEpochMilli())
      );
 
      // You can totally run your own AuthManager, too
      /*
      authManager = new AuthTokenManager() {
         @Override
         public CompletionStage<AuthToken> getToken() {
            return CompletableFuture.completedFuture(connectionDetails.getAuthToken());
         }
 
         @Override
         public void onExpired(AuthToken authToken) {
            // React accordingly
         }
      }
      */
 
      var uri = connectionDetails.getUri(); // or for older boot versions neo4jProperties.getUri()
      var config = doAllTheStuffSpringBootCouldDoIfAuthManagerWasConfigurableViaConfig(uri, neo4jProperties);
 
      return GraphDatabase.driver(uri, authManager, config);
   }
 
   // Everything below is a verbatim copy from Spring Boot for the most relevant pieces
   // that can be configured via properties.
   // As of now, pick what you need or add what's missing.
 
   Config doAllTheStuffSpringBootCouldDoIfAuthManagerWasConfigurableViaConfig(URI uri, Neo4jProperties neo4jProperties) {
 
      var builder = Config.builder();
 
      var scheme = uri.getScheme().toLowerCase(Locale.ROOT);
      if (scheme.equals("bolt") || scheme.equals("neo4j")) {
         var securityProperties = neo4jProperties.getSecurity();
         if (securityProperties.isEncrypted()) {
            builder.withEncryption();
         } else {
            builder.withoutEncryption();
         }
         builder.withTrustStrategy(mapTrustStrategy(securityProperties));
      }
      builder.withConnectionTimeout(neo4jProperties.getConnectionTimeout().toMillis(), TimeUnit.MILLISECONDS);
      builder.withMaxTransactionRetryTime(neo4jProperties.getMaxTransactionRetryTime().toMillis(), TimeUnit.MILLISECONDS);
 
      var pool = neo4jProperties.getPool();
 
      if (pool.isLogLeakedSessions()) {
         builder.withLeakedSessionsLogging();
      }
      builder.withMaxConnectionPoolSize(pool.getMaxConnectionPoolSize());
      Duration idleTimeBeforeConnectionTest = pool.getIdleTimeBeforeConnectionTest();
      if (idleTimeBeforeConnectionTest != null) {
         builder.withConnectionLivenessCheckTimeout(idleTimeBeforeConnectionTest.toMillis(), TimeUnit.MILLISECONDS);
      }
      builder.withMaxConnectionLifetime(pool.getMaxConnectionLifetime().toMillis(), TimeUnit.MILLISECONDS);
      builder.withConnectionAcquisitionTimeout(pool.getConnectionAcquisitionTimeout().toMillis(),
         TimeUnit.MILLISECONDS);
      if (pool.isMetricsEnabled()) {
         builder.withDriverMetrics();
      } else {
         builder.withoutDriverMetrics();
      }
 
      return builder.build();
   }
 
   private Config.TrustStrategy mapTrustStrategy(Neo4jProperties.Security securityProperties) {
 
      String propertyName = "spring.neo4j.security.trust-strategy";
      Neo4jProperties.Security.TrustStrategy strategy = securityProperties.getTrustStrategy();
      Config.TrustStrategy trustStrategy = createTrustStrategy(securityProperties, propertyName, strategy);
      if (securityProperties.isHostnameVerificationEnabled()) {
         trustStrategy.withHostnameVerification();
      } else {
         trustStrategy.withoutHostnameVerification();
      }
      return trustStrategy;
   }
 
   private Config.TrustStrategy createTrustStrategy(Neo4jProperties.Security securityProperties, String propertyName,
      Neo4jProperties.Security.TrustStrategy strategy) {
      switch (strategy) {
         case TRUST_ALL_CERTIFICATES:
            return Config.TrustStrategy.trustAllCertificates();
         case TRUST_SYSTEM_CA_SIGNED_CERTIFICATES:
            return Config.TrustStrategy.trustSystemCertificates();
         case TRUST_CUSTOM_CA_SIGNED_CERTIFICATES:
            File certFile = securityProperties.getCertFile();
            if (certFile == null || !certFile.isFile()) {
               throw new InvalidConfigurationPropertyValueException(propertyName, strategy.name(),
                  "Configured trust strategy requires a certificate file.");
            }
            return Config.TrustStrategy.trustCustomCertificateSignedBy(certFile);
         default:
            throw new InvalidConfigurationPropertyValueException(propertyName, strategy.name(),
               "Unknown strategy.");
      }
   }
}

Happy coding.

Title picture from Collin at Unsplash.

| Comments (0) »

31-Jul-23


The best way to use Testcontainers from your Spring Boot tests!

After a long blog hiatus, I was in the mood to try out one of these “The best way to XYZ” posts for once.

While Spring Boot 3 and Spring Framework 6 releases have focused a lot on revamping the application context and annotation processing for GraalVM native image compatibility (and “boring” tasks like Java EE to Jakarta EE migrations), Spring Boot 3.1 and the corresponding framework edition come with a lot of cool changes.
While I was evaluating them for my team (and actually, for a new Spring Infographic coming out later this year), I especially dove into the new `@ServiceConnection` and the related infrastructure.

@ServiceConnection comes together with a hierarchy of interfaces, starting at ConnectionDetails. You might wonder what the fuss is about that marker interface, especially when you come from a relatively standardised JDBC abstraction: it makes it possible to abstract connections away from a second angle. Do configuration values come from property sources or from something else, in this case from information that Testcontainers provides? ConnectionDetails is just the entry point to JdbcConnectionDetails and other, more specific ones, such as Neo4jConnectionDetails. Below those, concrete classes exist to connect to the services.
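To make that a bit more tangible, here is a hand-rolled sketch of such a bean. The URI and credentials are placeholders, and in practice you rarely write this yourself, because the Testcontainers support shown further below contributes an equivalent bean from the running container:

import java.net.URI;

import org.neo4j.driver.AuthToken;
import org.neo4j.driver.AuthTokens;
import org.springframework.boot.autoconfigure.neo4j.Neo4jConnectionDetails;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;

@TestConfiguration(proxyBeanMethods = false)
public class HardcodedConnectionDetailsConfig {

	// A connection-detail bean sourced from "something else" than property sources;
	// the values below are of course just placeholders.
	@Bean
	Neo4jConnectionDetails neo4jConnectionDetails() {
		return new Neo4jConnectionDetails() {

			@Override
			public URI getUri() {
				return URI.create("neo4j://db.example.com:7687");
			}

			@Override
			public AuthToken getAuthToken() {
				return AuthTokens.basic("neo4j", "verysecret");
			}
		};
	}
}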

The nice thing is: You don’t have to deal with that a lot, because there are many existing implementations:

  • Cassandra
  • Couchbase
  • Elasticsearch
  • Generic JDBC or specialised JDBC such as MariaDB, MySQL, Oracle, PostgreSQL
  • Kafka
  • MongoDB
  • Neo4j
  • RabbitMQ
  • Redpanda

More about those features in Spring Boot 3.1’s ConnectionDetails abstraction and Improved Testcontainers Support in Spring Boot 3.1.

We will focus on the latter. What has been pestering me for a while: if you use the Testcontainers JUnit 5 extension to integrate containers with Spring Boot tests, you end up in a scenario in which two systems try to manage resource lifetimes, which is not ideal.
Once you have solved that, you must learn about @DynamicPropertySource and friends, as nicely demonstrated by Maciej.
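For reference, a sketch from memory of that pre-3.1 style, manually bridging the container's coordinates into Spring's environment via the standard spring.neo4j.* properties:

import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.Neo4jContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@SpringBootTest
@Testcontainers
class OldStyleIT {

	@Container
	static final Neo4jContainer<?> neo4j = new Neo4jContainer<>("neo4j:5");

	// Manually bridge the container's runtime coordinates into Spring's environment
	@DynamicPropertySource
	static void neo4jProperties(DynamicPropertyRegistry registry) {
		registry.add("spring.neo4j.uri", neo4j::getBoltUrl);
		registry.add("spring.neo4j.authentication.username", () -> "neo4j");
		registry.add("spring.neo4j.authentication.password", neo4j::getAdminPassword);
	}

	@Test
	void contextLoads() {
	}
}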

With a good combination of @TestConfiguration, the aforementioned new stuff and a good chunk of explicitness, this is a solved issue now.

Take this application as an example (I smashed everything into one class so that it is visible as a whole, not as a best practice, but well, I don't care if you program like this in production, I have seen worse):

import java.util.List;
import java.util.UUID;
 
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.neo4j.core.schema.GeneratedValue;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.config.EnableNeo4jRepositories;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
 
@SpringBootApplication
@EnableNeo4jRepositories(considerNestedRepositories = true)
public class MyApplication {
 
	public static void main(String[] args) {
		SpringApplication.run(MyApplication.class, args);
	}
 
	@Node
	public record Movie(@Id @GeneratedValue(GeneratedValue.UUIDGenerator.class) String id, String title) {
 
		Movie(String title) {
			this(UUID.randomUUID().toString(), title);
		}
	}
 
	interface MovieRepository extends Neo4jRepository<Movie, String> {
	}
 
	@RestController
	static class MovieController {
 
		private final MovieRepository movieRepository;
 
		public MovieController(MovieRepository movieRepository) {
			this.movieRepository = movieRepository;
		}
 
		@GetMapping("/movies")
		public List<Movie> getMovies() {
			return movieRepository.findAll();
		}
	}
}

You will want the following dependencies in your test scope:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-test</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-testcontainers</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.testcontainers</groupId>
	<artifactId>neo4j</artifactId>
	<scope>test</scope>
</dependency>

We use @TestConfiguration to provide an additional test configuration. The advantage of @TestConfiguration over @Configuration is twofold: unlike regular @Configuration classes, it does not prevent auto-detection of @SpringBootConfiguration, and it must be imported explicitly unless it is a static inner class of a test class. The code below then has one @Bean method annotated with @ServiceConnection. The method returns a Neo4jContainer Testcontainer. That container is marked as reusable: as we don't close that resource ourselves, we let Testcontainers take care of cleaning it up, and when marked as reusable, it will be kept alive and around, meaning a second test run will be much faster (see my video about that topic; if you don't like it, there's cycling and food in it as well, and fwiw, I also put this into written words over at Foojay.io). The container also carries a special label, which is irrelevant for this config but will be used later. We will also address the @RestartScope annotation further down.

This definition provides enough information for the context so that Spring can bring up that container, rewire all the connections for Neo4j to it and everything just works.

import java.util.Map;
 
import org.springframework.boot.devtools.restart.RestartScope;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.Neo4jContainer;
 
@TestConfiguration(proxyBeanMethods = false)
public class ContainerConfig {
 
	@Bean
	@ServiceConnection
	@RestartScope
	public Neo4jContainer<?> neo4jContainer() {
		return new Neo4jContainer<>("neo4j:5")
			.withLabels(Map.of("com.testcontainers.desktop.service", "neo4j"))
			.withReuse(true);
	}
}

Putting this into action might look like this. Note how the config is imported explicitly and the complete absence of messing with properties.

import static org.assertj.core.api.Assertions.assertThat;
 
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
 
@SpringBootTest
@Import(ContainerConfig.class)
class MyApplicationTests {
 
	@Test
	void repositoryIsConnectedAndUsable(
		@Autowired MyApplication.MovieRepository movieRepository
	) {
		var movie = movieRepository.save(new MyApplication.Movie("Barbieheimer"));
		assertThat(movie.id()).isNotNull();
	}
}

To be fair, there are a bunch of other ways, also in the official docs, but I like this one by far the best. It's clear and concise, being pretty explicit and sticking to one set of annotations (from Spring).

Now about @RestartScope. That annotation is there for a reason: you might have Spring's devtools on the class path, which will restart the context if necessary. When the context restarts, the container would restart too, defeating the reusable flag. The annotation keeps the original bean around. Why is this relevant? The new Testcontainers support works really well with the concept of "developer services" as introduced by Quarkus. Originally we might only want to do test-driven development, but then something happens, things get rushed, and in the end, explorative work is fun: bringing up your application together with a running instance of a database or service feels a lot like "batteries included" and can make you very productive.

Spring Boot supports this now, too, but keeps it (by default) very explicit and also restricted to the testing scope. org.springframework.boot.SpringApplication has a new with method used to augment an automatically configured application with additional config. We can use the above ContainerConfig in an additional main class living in our test scope like this:

import org.springframework.boot.SpringApplication;
 
public class MyApplicationWithDevServices {
 
	public static void main(String[] args) {
		SpringApplication.from(MyApplication::main)
			.with(ContainerConfig.class)
			.run(args);
	}
}

Starting this app, a request to http://localhost:8080/movies immediately works, connected against a Neo4j instance running in a container.

Now, the best for last: what about that ominous label I added to the container? I am a happy Testcontainers Cloud user and I have their service running on my machine. This automatically redirects any Testcontainers container request to the cloud and I don't need Docker on my machine.

There's also the possibility to define fixed port mappings for containers running both in the cloud and locally, as described in Set fixed ports to easily debug development services.

I have the following configuration on my machine:

more /Users/msimons/.config/testcontainers/services/neo4j.toml 
 
# This example selects neo4j instances and forwards port 7687 to 7687 on the client.
# Same for the Neo4j HTTP port
# Instances are found by selecting containers with label "com.testcontainers.desktop.service=neo4j".
 
# ports defines which ports to proxy.
# local-port indicates which port to listen on the client machine. System ports (0 to 1023) are not supported.
# container-port indicates which port to proxy. If unset, container-port will default to local-port.
ports = [
  {local-port = 7687, container-port = 7687},
  {local-port = 7474, container-port = 7474}
]

This allows me to access the Neo4j instance started by MyApplicationWithDevServices above under the well-known Neo4j ports, allowing things like this:

# Use Cypher-Shell to create some data
cypher-shell -uneo4j -ppassword "CREATE (:Movie {id: randomUuid(), title: 'Dune 2'})"
# 0 rows
# ready to start consuming query after 15 ms, results consumed after another 0 ms
# Added 1 nodes, Set 2 properties, Added 1 labels
 
# Request the data from the application running with dev services
http localhost:8080/movies                                                           
 
# HTTP/1.1 200 
# Connection: keep-alive
# Content-Type: application/json
# Date: Thu, 27 Jul 2023 13:32:58 GMT
# Keep-Alive: timeout=60
# Transfer-Encoding: chunked
#
# [
#    {
#        "id": "824ec97e-0a97-4516-8189-f0bf5eb215fe",
#        "title": "Dune 2"
#    }
# ]

And with this, happy coding.

Update: I'm happy that Sergei personally read my post and rightfully pointed out to me that fixed ports are possible with both local and cloud Testcontainers. I edited the instructions accordingly. Thanks, buddy!

| Comments (3) »

27-Jul-23