Running free… A developer's story of development.

This is gonna be the first of four parts accompanying my talk of the same name: Running free… A developer's story of development, which I was going to give in March 2020 at the inaugural launch of dev.next. I'm grateful that Venkat Subramaniam gave me the opportunity to do this.

Sadly, dev.next was cancelled, or rather postponed, due to the 2019–20 coronavirus pandemic.

I had created a readable version of the talk in the hope that someone finds it motivating or can draw something out of it for themselves. I'm happy to have it now and I'm gonna share it in five parts over the coming weeks. If you find it useful, I'd love to hear your feedback.

Overview

Running free… A developer's story of development.



This is “Running free”, a talk about a developer's story of development. In this post I will tell you bits and pieces of my story that I hope you might find useful.

This talk needs a couple of warnings. The first – and shorter – warning: it's a personal thing, the standard “about me” page doesn't apply. The whole talk is probably an about page.

I am gonna share a lot of personal stories and, more importantly, personal recipes and strategies to cope with and tackle various things. It should be obvious that these approaches may or may not work for you. They work in most cases for me, but your mileage will vary.

Am I the right person for this?



I'd say it depends… On a good day, of course I am. I'm confident in what I'm doing, and I'm aware that I'm good at what I'm doing. On a bad day, probably not.

There are actually tons of days where I

  • Feel bad about myself
  • Feel bad about the stuff I do
  • Overwork
  • Stress myself, my family and my colleagues out (Hello, Gerrit…)

But… let’s focus on the good days and keep it with Dajana here:



While I actually have training and am certified to train other people for a job in computer science, I am not a personal coach and have never been trained in coaching people. I can only share my experiences the same way a lot of people have shared their experiences with me.

The talk will contain a couple of quotes from people I like and whose input helped me to rethink a couple of things. People and the network you’ll build over the course of a career are much more important than technology. Always remember that: Value relationships, in both directions.

So, let’s get started: who is this developer?

About me (Standard Edition)



Let’s start with the usual “About me” slide. My name is Michael Simons. What do we have here:

I work for Neo4j, the graph database company. We have a lot of cool things out there. First of all, the graph database of the same name, and then the thing I'm working on with my friend Gerrit Meier: the Neo4j-OGM and Spring Data integration.

Of course I have to brag a bit that I'm one of the Java Champions. It's one of those things that actually made me really proud. How did it come to that?

The slide explains it as well: I'm the current lead of a Java user group named EuregJUG, which I have been running with great success and which is now in its fifth year. In that role I'm also actively involved with the JavaLand conference. I'm also the author of a couple of books: the first German Spring Boot 2 book and “Arc42 by example”.

Before I was at Neo4j, I worked most of my professional life at ENERKO INFORMATIK in Aachen, Germany. In between I briefly worked as a senior consultant for the well known German consultancy INNOQ.

In good moments I am able to honestly say that I'm proud of all of these things. In others, my imposter syndrome kicks in and doesn't let me realize that this is actually something.

At this point you'd usually find the company slide or slides, depending on how much marketing and legal material one is obliged to show, but I like to quote Sébastien from Red Hat here:

We are not just developers.



Sébastien runs Riviera Dev in France, and they have a track with exactly that name, about exactly that topic.
We spoke briefly about this talk, and the track name immediately clicked with me for a couple of reasons.

Hopefully it is clear that thinking of yourself as a Java or .NET developer, a Go or Rust developer, or a developer of whatever language you prefer these days, narrows the number of options and choices you might have.
So from a professional point of view it is of course a good choice to think “I'm a developer”.

But thinking of oneself as just a developer, and possibly focusing on gaining appreciation from that single source, makes you vulnerable and insecure.

First of all, we are humans, with different needs, hopefully different interests and, resulting from that, different inputs.

About me (Additional Edition)



So, I am a father of two, a husband to a beautiful wife, I cycle, run, read and do a lot more things. And I am probably a few other things, too.

The things I do change over time; sometimes, but less often, the things I am change, too. It's important to distinguish between something you are and something you do. Those are not the same (and please don't get me started on “I have a wife” vs “I am a husband”, or “I have two kids” vs “I am a parent”)!

Sometimes it is appropriate to include an additional slide like this one, sometimes it isn’t. I think this time it was.

Keep a slide like this in mind next time you feel bad about code, a project or work in general.

We are not…



Actually, I think we could avoid a whole category of problems in our job if we remembered a couple of easy things here:

  • We are not the code we wrote nor the project we maintain.
  • We are not the architecture of a system we are responsible for.

If we don't keep these things in mind, we're in danger of drawing all our self-worth from external sources. This makes reasoning and speaking about technical problems so much harder, even when the other people in the room are good colleagues and your project has a good discussion culture.

Why? Because getting self-worth and appreciation from only a single thing is a single point of failure and, as such, a pain in the ass. Even if it does not fail, it bothers and bugs us.

As always, there are smarter people in the room. When I tweeted about these thoughts – almost exactly a year ago – Roman Kennke, Java VM hacker and Principal Software Engineer at Red Hat, responded like this:

Don’t rely on any outside appreciation



I appreciate it very much that Roman replied with “can”. I mean, this is end-boss-level hard. Every one of us likes a casual “thank you” or a “well done”, even more so some verbalization that something we did was helpful.

In all probability, not only the younger people in the audience but anyone who is in one form or another on social media looks for hearts, stars, thumbs up, followers and whatnot.

Stuff like this becomes addictive and you will have a hard time getting away from it. I'm not saying it's completely bad, not at all, but mind the dosage.

Find things that you enjoy, preferably a broad range of things, and make yourself less fragile and more resilient.

For me, those couple of statements already required some kind of focus shift: getting my head and my thoughts away from recurring patterns and thoughts. A cognitive therapy, if you like, so let's start with that.

Continue with part 2: Miles are my meditation.


17-Mar-20


Polyglot programming on the GraalVM

(*) As of now R, Ruby, Python and JavaScript 😉

GraalVM consists of a Java compiler, a VM and a JDK based on HotSpot/OpenJDK, implemented in Java. These days it's all about additional execution modes compared to the standard JVM, mainly ahead-of-time compilation and the creation of native images. Native images are an important piece for creating Java-based applications that are easier to scale in very elastic setups and large deployments.

However, GraalVM has a couple more distinctive features:

  • Truffle Language Implementation framework and the GraalVM SDK, to implement additional programming language runtimes
  • LLVM Runtime and JavaScript Runtime

The content of this post is based on the Truffle framework and with it, the polyglot aspect of GraalVM.

There have been a couple of great talks about Graal and especially Polyglot. Have a look at Polyglot Applications with GraalVM from @mesirii and JS, Rust, Python and the Holy Graal by @larsr_h.

The GraalVM is polyglot in many different ways: You can embed a supported guest language in a host language through a so-called polyglot context, but you can also access all supported languages from a supported dynamic language by running the GraalVM version of the interpreter with the --polyglot flag.

Inside the GraalVM manual you'll find the polyglot reference as well as the embedding reference. The embedding reference is mostly about running supported languages from Java programs; this won't be our main concern in this post, which is about the polyglot reference. The examples in the manual are designed around first selecting a start language and then a target language; they follow the pattern that a host language executes some simple code in the guest language.
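Just to make that pattern concrete before we leave the embedding side behind, here is a minimal sketch in Java (nothing Neo4j-specific yet; it only assumes the polyglot SDK classes that ship with GraalVM) of a host executing a bit of JavaScript through such a context:

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class EmbeddingSketch {

    public static void main(String[] args) {
        // A polyglot context gives the Java host access to the installed guest languages.
        try (Context context = Context.create()) {
            // Evaluate a tiny piece of JavaScript and bring the result back into Java.
            Value result = context.eval("js", "6 * 7");
            System.out.println(result.asInt()); // prints 42
        }
    }
}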

My friend Michael however nerd-sniped me with a different idea: How to access Neo4j from R? Or how to use the multi-database features of Neo4j 4.0 from a language for which we (I work at Neo4j on the drivers team) haven't yet come up with a driver?

GraalVM polyglot interoperability for the win. You find information about GraalVM’s polyglot interoperability per language inside the manual, for example here for Ruby.

In short: The interpreters for the supported languages all come with an API to interact with all other supported languages.

That allows us to use the Neo4j Java driver in its current 4.0.0 mainline from R, Python and Ruby, and, for completeness, from JavaScript as well (the 4.0 JavaScript driver for Neo4j is already there).

Michael and I have set up a repository named “neo4j-graalvm-polyglot-examples” that demonstrates this approach.

Setting up Neo4j

These examples are about accessing Neo4j from various languages. I find the easiest way to have an instance up and running in no time is Docker.

docker run --publish=7474:7474 --publish=7687:7687 -e 'NEO4J_AUTH=neo4j/secret' neo4j:4.0.1

gives you a running instance. After that, you can open http://localhost:7474/browser/?cmd=play&arg=movies and install our Movie graph to have a dataset to work with.

Other options include our Desktop edition which you find among other downloads here: https://neo4j.com/download/

Setting up GraalVM

Michael Hunger uses SDKMAN! for downloading and installing GraalVM; I went to the GraalVM Downloads page at GitHub and got the JDK 11 edition of GraalVM 20.0.0.

After downloading and installing, you should have a valid GRAALVM_HOME and a JAVA_HOME pointing to the former:

echo $GRAALVM_HOME
/Library/Java/JavaVirtualMachines/graalvm-ce-java11-20.0.0/Contents/Home
echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/graalvm-ce-java11-20.0.0/Contents/Home
java -version
openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.0.0 (build 11.0.6+9-jvmci-20.0-b02, mixed mode, sharing)

GraalVM comes with gu, the GraalVM Component Updater. gu is used to install additional packages. We use it to install R, Ruby and Python packages as well as the native image tool for GraalVM:

$GRAALVM_HOME/bin/gu install R
$GRAALVM_HOME/bin/gu install Ruby     
$GRAALVM_HOME/bin/gu install Python
$GRAALVM_HOME/bin/gu install native-image

The list of installed components should now look like this:

$GRAALVM_HOME/bin/gu list                
 
ComponentId              Version             Component name      Origin 
--------------------------------------------------------------------------------
graalvm                  20.0.0              GraalVM Core        
R                        20.0.0              FastR               github.com
llvm-toolchain           20.0.0              LLVM.org toolchain  github.com
native-image             20.0.0              Native Image        github.com
python                   20.0.0              Graal.Python        github.com
ruby                     20.0.0              TruffleRuby         github.com

JavaScript and Node.js come with GraalVM by default.

Running the examples

The examples live in neo4j-graalvm-polyglot-examples. Clone this repository via standard Git means. The Neo4j driver lives under the Maven coordinates org.neo4j.driver:neo4j-java-driver. The driver has a single dependency on the Reactive Streams API. To make the download easier, the repository comes with a Gradle build that works both under Windows and Linux. Get and export the required dependencies via

./gradlew downloadDependencies
export CLASSPATH=lib/neo4j-java-driver-4.0.0.jar:lib/reactive-streams-1.0.2.jar

All examples can now be run like this:

# R
$GRAALVM_HOME/bin/Rscript --jvm --vm.cp=$CLASSPATH neo4j-graalvm-fastr-example.R
 
# Javascript
$GRAALVM_HOME/bin/node --jvm --vm.cp=$CLASSPATH neo4j-graalvm-javascript-example.js
 
# Python
$GRAALVM_HOME/bin/graalpython --jvm --vm.cp=$CLASSPATH neo4j-graalvm-python-example.py
 
# Ruby
$GRAALVM_HOME/bin/truffleruby --jvm --vm.cp=$CLASSPATH neo4j-graalvm-ruby-example.rb

They all execute the following query:

MATCH (:Person {name:$name})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActor) 
RETURN DISTINCT coActor

The query has one parameter called name and gives you all the actors that acted in the same movies as the requested actor.

I take neo4j-graalvm-fastr-example.R as an example, but the idea is the same for all the examples above.

First, you have to import the required classes via GraalVM’s Java API:

graphDatabase <- java.type('org.neo4j.driver.GraphDatabase')
authTokens <- java.type('org.neo4j.driver.AuthTokens')
config <- java.type('org.neo4j.driver.Config')

With the static factory method, a driver instance connected to the database is created:

driver <- graphDatabase$driver('bolt://localhost:7687', authTokens$basic('neo4j', 'secret'), config$defaultConfig())

It looks similar in all the other languages. The Neo4j drivers are session-oriented. That means the driver instance is a long-living object you keep around, and the session is used for your tasks. The driver takes care of connection pooling.
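Because it is the plain Java driver doing the work underneath, the usual Java lifecycle applies here, too. A minimal sketch of that idiom on the Java side (connection details as in the examples above):

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Config;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class DriverLifecycleSketch {

    public static void main(String[] args) {
        // Create the driver once and keep it around; it maintains the connection pool.
        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "secret"), Config.defaultConfig());

        // Open a short-lived session per unit of work and close it when you're done.
        try (Session session = driver.session()) {
            String greeting = session.run("RETURN 'Hello from Java' AS greeting")
                    .single().get("greeting").asString();
            System.out.println(greeting);
        }

        // Close the driver (and with it the pool) on application shutdown.
        driver.close();
    }
}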

Executing the above query looks like this in R:

query <- '
    MATCH (:Person {name:$name})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActor) 
    RETURN DISTINCT coActor
'
 
session <- driver$session()
# The R list (which behaves like an associative array) is automatically converted to a Java Map 
coActorsRecords <- session$run(query, list(name="Tom Hanks"))$list()

This gives you a list of records that can be processed further.
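The elements of that list are the Java driver's Record instances, so everything the Java API offers on them is available from the guest languages as well. As a sketch, this is what processing them looks like on the Java side (in R, the same accessors are reached via the $ operator):

import java.util.List;

import org.neo4j.driver.Record;
import org.neo4j.driver.types.Node;

class RecordProcessingSketch {

    // Assumes a List<Record> as returned by session.run(...).list()
    static void printCoActors(List<Record> coActorsRecords) {
        for (Record coActorRecord : coActorsRecords) {
            // Each record contains the "coActor" node returned by the query above.
            Node coActor = coActorRecord.get("coActor").asNode();
            System.out.println(coActor.get("name").asString());
        }
    }
}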

Recap

The GraalVM is a fascinating piece of software. We are not combining trivial libraries in those examples, but loading a driver that manages a connection pool based on an embedded Netty.

While it is of course preferable to have everything available “natively” in your language, a polyglot environment like this gives you the opportunity to use foreign functions without much effort. It wouldn't be that hard for someone with actual R knowledge to wrap our Java driver with idiomatic code and make it feel like a first-class R citizen.

Another use case is having access to all features of Neo4j 4.0, for example the multi-database feature. This is not yet available in the Python driver or the community-driven Ruby driver. Here are examples of how to use them: Python and Ruby.

The official, native Neo4j drivers that already fully support Neo4j 4.0 can be found here:

The Python driver will be released in the coming weeks with full 4.0 support.


06-Mar-20


Spring Data Neo4j, Neo4j-OGM and OSIV

TL;DR: Don’t use Open Session in View with a Neo4j cluster

If you use Spring Data Neo4j and Neo4j-OGM in your Spring Boot application connected to a Neo4j cluster (via bolt+routing://, neo4j:// or multiple bolt URIs), explicitly configure spring.data.neo4j.open-in-view to false, like this: spring.data.neo4j.open-in-view=false

What is Open Session in View?

Vlad Mihalcea has written extensively about why the Open Session in View pattern is an anti-pattern these days.

Vlad is well known for his work in the JPA and especially the Hibernate world. What the hell does this have to do with Spring Data Neo4j and Neo4j’s Object Graph Mapping (Neo4j-OGM)? It turns out: a lot. We have to bring a couple of things together.

Neo4j routing

This is maybe unexpected, but the first thing we have to understand is Neo4j’s routing mechanism. Neo4j databases are able to run as clusters. The instances of those clusters use the Raft consensus algorithm. Such a cluster setup involves leaders, followers and read replicas.

My colleague David Allen has a nice post out there on how to query a Neo4j cluster: Querying Neo4j Clusters.

The important stanza for this post is:

The leader is responsible for coordinating the cluster and accepting all writes. Followers help scale the read workload ability of the cluster and provide for high-availability of data. Should one of the cluster machines fail, and you still have a majority, you can still process reads and writes. If your cluster loses the majority it can only serve (stale) reads and has no leader anymore.

Optionally, you can have any number of caches in the form of read replicas. They are read-only copies of your database for scaling out read-query load. They are not officially part of the cluster, but rather are “tag along” copies that get replicated transactions from the main cluster.

When you connect to a cluster, one of the Neo4j drivers does the job of routing for you every time you open a connection via the bolt+routing:// or neo4j:// scheme. It is said that “the driver is connected to a cluster”. The driver has a routing table, knowing leaders, followers and read replicas.

The driver does not parse Cypher on the client side. The only information it has available to pick a host from the routing table is whether you want a read or a read-write session or transaction. Therefore it is very important to make that choice consciously! Otherwise all requests will go to the leader, as that instance will be able to answer all of them.
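With the plain Java driver, making that choice consciously looks roughly like this (a sketch against the 4.0 API; the default access mode is write, so reads have to be requested explicitly):

import org.neo4j.driver.AccessMode;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.SessionConfig;

public class ReadSessionSketch {

    public static void main(String[] args) {
        Driver driver = GraphDatabase.driver("neo4j://localhost:7687",
                AuthTokens.basic("neo4j", "secret"));

        // Explicitly ask for a read session, so the routing table may pick a follower or a read replica.
        SessionConfig readConfig = SessionConfig.builder()
                .withDefaultAccessMode(AccessMode.READ)
                .build();

        try (Session session = driver.session(readConfig)) {
            long numberOfMovies = session.readTransaction(tx ->
                    tx.run("MATCH (m:Movie) RETURN count(m)").single().get(0).asLong());
            System.out.println(numberOfMovies);
        }

        driver.close();
    }
}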

Neo4j-OGM

Neo4j-OGM is able to support Neo4j clusters and routing via its Bolt transport. There are a couple of convenience methods on the Neo4j-OGM session that let you specify whether it’s a read or a read-write statement you want to run. Neo4j-OGM actually does look into your Cypher and issues a warning if you ask for a read-only transaction but have something like MERGE, CREATE or SET in your Cypher.

So up to here, it’s all good.

Entering Spring’s @Transactional

Spring’s @Transactional is a great piece of code. It instructs the framework to execute your code in a transactional around aspect: A transaction is opened before your code and closed afterwards, committing if your code ran without error and rolling back otherwise. A transaction manager takes care that this also works with an explicit TransactionTemplate.
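As a sketch of that programmatic route (the class name and query here are purely illustrative), a TransactionTemplate built on the application’s Neo4j transaction manager wraps a callback in exactly such a transaction:

import java.util.Collections;

import org.neo4j.ogm.session.Session;
import org.springframework.transaction.support.TransactionTemplate;

public class ProgrammaticTransactionSketch {

    private final TransactionTemplate transactionTemplate;
    private final Session session;

    public ProgrammaticTransactionSketch(TransactionTemplate transactionTemplate, Session session) {
        this.transactionTemplate = transactionTemplate;
        this.session = session;
    }

    public Long countMovies() {
        // The template opens a transaction around the callback, commits on success
        // and rolls back if the callback throws an exception.
        return transactionTemplate.execute(status ->
                session.queryForObject(Long.class, "MATCH (m:Movie) RETURN count(m)", Collections.emptyMap()));
    }
}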

The annotation has an attribute named readOnly. In contrast to what a superficial look might suggest, it doesn’t prevent writes from happening against a database. And how could it? @Transactional is a general-purpose mechanism, working with JPA, Neo4j and other databases. It would need a mechanism to inspect your queries for each of those. readOnly is merely an indicator passed on to the underlying store so that the store can configure its characteristics as needed. Vlad has written down what it does for Hibernate.

For Neo4j-OGM it just configures the default mode of the Driver’s session: read only or read-write. As we learned above, this is super important to prevent the leader from being hammered by all the queries.

Spring Data Neo4j and Neo4j-OGM support this in all of the following scenarios, given the following repository and service:

import java.util.Collection;
import java.util.Collections;
import java.util.List;
 
import org.neo4j.ogm.session.Session;
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
 
public interface MovieRepository extends Neo4jRepository<Movie, Long> {
 
	@Transactional(readOnly = true)
	@Query("MATCH (m:Movie) RETURN m.tagline as value")
	List<SomeResult> findCustomResultWithTxOnRepository();
 
	// Not read only
	@Query("MATCH (m:Movie) RETURN m.tagline as value")
	List<SomeResult> findCustomResult();
}
 
 
@Service
public class SomeService {
 
	private final MovieRepository movieRepository;
 
	private final Session session;
 
	public SomeService(MovieRepository movieRepository, Session session) {
		this.movieRepository = movieRepository;
		this.session = session;
	}
 
	@Transactional(readOnly = true)
	public Collection<SomeResult> findCustomResultWithTransactionOnService() {
 
		return movieRepository.findCustomResult();
	}
 
	@Transactional(readOnly = true)
	public Collection<Movie> findMoviesViaLoadAllOnSession() {
 
		return session.loadAll(Movie.class);
	}
 
	@Transactional(readOnly = true)
	public Collection<Movie> findMoviesViaCustomQueryOnSession() {
 
		return (Collection) session.query(Movie.class, "MATCH (m:Movie) RETURN m", Collections.emptyMap());
	}
}

All of those understand that none of the queries needs to go to a leader.

Entering OSIV

For Neo4j-OGM, there’s also an OpenSessionInViewInterceptor that gets configured by Spring Boot by default in a web application. It makes sure there’s an ongoing Neo4j-OGM session, along with its caches and mapping, in place for the whole request.

Why is this bad in a cluster scenario? Because tied to the Neo4j-OGM session there’s a driver session. To allow the user to do anything they want in the request, those sessions need to be read-write sessions. Once marked as read-write, you cannot turn them read-only ever again.

That means if you call any of the methods sketched above from a REST endpoint, there will already be an ongoing session and the readOnly indicator will be ignored!

So if you connect a Spring Data Neo4j + Neo4j-OGM application to a Neo4j cluster: Go to your Spring Boot configuration (properties, YAML or environment) and explicitly configure:

spring.data.neo4j.open-in-view=false

I have raised an issue with the Spring Boot team to discuss changing the default: Open session in view is problematic with Neo4j.


03-Feb-20


Short end of the year status update.

Last year – 2018 – I ended up with 21 posts on my blog; this one here, the last of the decade on this blog, will be the 11th of 2019. What happened?

I have been working more than ever, and have been happier than ever. Together with Gerrit Meier, I managed to bring the number of open issues on Neo4j-OGM down below ten (without just closing them) while at the same time designing two new projects: a Spring Boot starter for Neo4j without any object mapping at all, and SDN/RX, a new object mapping framework with a focus on reactive database access.

I started the year with a lot of JUG talks (see my Speakerdeck), in the Netherlands, in Switzerland and a couple in Germany. In between, we did a couple of talks at the EuregJUG as well.

So, busy times – not much left over for blogging, at least on this platform.

I posted several Neo4j related posts on Medium. I personally like this one the best: How to choose an unique identifier for your database entities. I think it applies to other systems, too. You’ll find more interesting content at our Medium Neo4j magazine.

Late November, Ralf nerd-sniped me into participating in Advent of Code and wow, I got hit for good. For 13 days in December I managed to come up with not only two stars each day, but a blog post as well. Find my solutions at Ralf’s fan site. Christina just bluntly told me after the tenth day: “Stop it, your mood is getting increasingly worse.” And stopping was a good thing to do.

I submitted a couple of talks to Calls for Papers for next year, mostly about reactive programming with Neo4j and Spring Data Neo4j RX, and with a bit of luck, I’ll be at some conferences.

The one thing I’m very much looking forward to, however, is dev.next in March 2020:

2019 has been one of the most rewarding years I have had in my professional and private life. While a casual reader of my blog might think “well, that guy is just doing fine”, that’s not always true. I often struggle, make mistakes, and bad things happen. I’m gonna talk about some conscious decisions – like the one above, to stop doing things (AoC) – and some things that might look like decisions but might have been just a couple of lucky coincidences.

Anyway, until then, have a good holiday season at the end of the year and all the best for 2020.

Featured photo by Trent Erwin on Unsplash.


23-Dec-19


Query all the things

Over the last weekend, I had some time on my hands and revisited my biking application: https://biking.michael-simons.eu. That application turned 10 this year; it started as a Sinatra-based Ruby application in 2009 and helped my career a lot from late 2013 and early 2014 on. Back then, I was using Spring for a couple of projects, both private and at work, and discovered the early pre-1.0 releases of Spring Boot.

I found that super interesting. With the impending release of Java 8 in 2014, I had found my personal project to work on that year. I had written extensive blog posts about that application (see the entry Developing a web application with Spring Boot, AngularJS and Java 8). These posts culminated not only in an application I used nearly every day, but also in my very first book, Arc42 by example (which is also on Packt these days).

Furthermore, the application brought me very close to the Spring Boot community. By following Spring Boot with this application, upgrading and evolving it with every release, I was able to spot issues, report them and sometimes fix them. It literally changed my career.

Having said all that: Why rewrite a major part of my application? As I said, the point back then was also to learn more about Java 8, about the Streams API, lambdas and whatnot. I really went nuts with the Streams API.

Building up an analytic query

The main feature I use is tracking my mileage. I wanted it to be as simple as possible: have a look at the odometer of each bike, add the number with a date to the bike, done. The application shall then display the difference between the value recorded in one month and the next. So, that’s basic arithmetic, but if you want to compute the data for the following chart (available here), good luck with that:



What do we see? The upper chart shows the mileage I rode per bike per month, with a summarising line over all bikes per month. The lower chart shows the lowest and highest total value per month, the average monthly value and the overall average per month over all years (the red line).

I don’t aggregate the values upfront; I computed them all from the periods mentioned above.

Here’s just the tip of what I needed to do to even have the data to work with: building the period values themselves (the difference between one month and the next), stored in a transient attribute:

public synchronized Map<LocalDate, Integer> getPeriods() {
    if (this.periods == null) {
        this.periods = IntStream.range(1, this.milages.size()).collect(TreeMap::new, (map, i) -> {
            final MilageEntity left = milages.get(i - 1);
            map.put(
                    left.getRecordedOn(),
                    milages.get(i).getAmount().subtract(left.getAmount()).intValue()
            );
        }, TreeMap::putAll);
    }
    return this.periods;
}

And then working with it, like this to summarise those periods:

public static Map<LocalDate, Integer> summarizePeriods(final List<BikeEntity> bikes, final Predicate<Map.Entry<LocalDate, Integer>> entryFilter) {
   return bikes.stream()
           .filter(BikeEntity::hasMilages)
           .flatMap(bike -> bike.getPeriods().entrySet().stream())
           .filter(Optional.ofNullable(entryFilter).orElse(entry -> true))
           .collect(
                   Collectors.groupingBy(
                           Map.Entry::getKey,
                           Collectors.reducing(0, Map.Entry::getValue, Integer::sum)
                   )
           );
}

There have been a couple more of those, plus a ton of test code. This is nice when you want to play around with the Streams API and such, but to be honest: really?

I have been speaking about SQL and jOOQ since 2016. Back then, I had already been working at ENERKO Informatik for a long time. We had been a database shop right from the start. It was there that I learned that it is a good thing to put your database to work for you and not the other way round. At times, I was not convinced, but these days I believe in that approach more than ever before. And that is true regardless of whether you use a relational or a graph database.

Here’s one of the later installments of that content:

So let’s have a look at what is needed for the charts. First of all, the individual monthly values: This can be solved using a window function, in this case, the lead function:

SELECT 
  bikes.name, 
  bikes.color, 
  milages.recorded_on, 
  (lead(milages.amount) OVER (PARTITION BY bikes.id ORDER BY milages.recorded_on) - milages.amount) VALUE
FROM bikes
  JOIN milages
    ON milages.bike_id = bikes.id
ORDER BY milages.recorded_on ASC

This query joins the bikes and their recorded milages and moves a window over sets of two rows following each other, the order being when they have been recorded. lead gives the following row at offset 1 by default. And just with that, I can compute the difference. 5 lines, not counting the additional selected columns. That was easy.

I can create a SQL view from that, if I want. The query is actually needed for all the charts.

I cannot add another aggregate function on top of the window function, at least not in H2, so I need to manually aggregate the values per bike per month into one monthly value.

This is easy. Assume for a moment there’s a table-like thing monthlymilages containing the above query. We do this:

SELECT 
  monthlymilages.recorded_on, 
  SUM(monthlymilages.value) VALUE
FROM monthlymilages
WHERE VALUE IS NOT NULL
GROUP BY monthlymilages.recorded_on

sum is an aggregate function, so we need to group by the month.

Now, let’s pretend that this is a table-like thing named aggregatedmonthlymilages. We can then select everything we want by slapping a bunch of aggregates on it:

SELECT 
  EXTRACT(MONTH FROM aggregatedmonthlymilages.recorded_on) MONTH, 
  round(MIN((aggregatedmonthlymilages.value))) minimum, 
  round(MAX((aggregatedmonthlymilages.value))) maximum, 
  round(avg((aggregatedmonthlymilages.value))) average
FROM aggregatedmonthlymilages
GROUP BY MONTH
ORDER BY MONTH ASC

And that’s just it. Views are a bit static: I would probably need additional columns to join them and I would also have to maintain them. I prefer something called a common table expression. That sounds complicated, but it is simple to use with the WITH clause:

WITH 
  monthlymilages AS (
    SELECT 
      bikes.name, 
      bikes.color, 
      milages.recorded_on, 
      (lead(milages.amount) OVER (PARTITION BY bikes.id ORDER BY milages.recorded_on) - milages.amount) VALUE
    FROM bikes
      JOIN milages
        ON milages.bike_id = bikes.id
    ORDER BY milages.recorded_on ASC
  ), 
  aggregatedmonthlymilages AS (
    SELECT 
      monthlymilages.recorded_on, 
      SUM(monthlymilages.value) VALUE
    FROM monthlymilages
    WHERE VALUE IS NOT NULL
    GROUP BY monthlymilages.recorded_on
  )
SELECT 
  EXTRACT(MONTH FROM aggregatedmonthlymilages.recorded_on) MONTH, 
  round(MIN((aggregatedmonthlymilages.value))) minimum, 
  round(MAX((aggregatedmonthlymilages.value))) maximum, 
  round(avg((aggregatedmonthlymilages.value))) average
FROM aggregatedmonthlymilages
GROUP BY MONTH
ORDER BY MONTH ASC

And that’s all that’s needed for the monthly average chart. 28 readable lines. When you see the query fully upfront, it may be intimidating, but I think shown with the steps leading to it, it is not.

So, in a first commit, I removed all the application-side computation and added a new `StatisticService` that encapsulates away all database access that does not go through JPA: Introduce SQL based statistics service.

Migrations and more possibilities

Of course I wanted to keep the tests, and those tests should now use the same database that I run in production. I’m actually using H2 (file based) in production, so I didn’t need to go for Testcontainers. H2, really? Yes, really. The application runs at a standard hosting provider, I didn’t want to bother with a database installation, and H2 actually offers all the modern SQL features mentioned above. Luckily, as they are standard SQL, I could switch to another database anytime anyway.

But for the tests, I need a schema and data. As I didn’t want the tests to be dependent on the schema derived from my entity classes, I also added Flyway and a bunch of simple DDL scripts, which you’ll find in the commit as well.

Now, with migrations in place that I can actually control (nearly impossible with Hibernate auto DDL), I could address something else: I used the relation between bikes and milages to compute the last mileage of a bike as a property I expose as JSON. So I would either need an eager association or use “open session in view” for Hibernate to make that work at the web layer. In that commit I added a computed column that I fill with the value I want. The column is now updated when adding a new milage, and a bit of denormalisation like that is fine.

Also: Be really careful if you have “open session in view” in your Spring application: If at some point you decide to disable it and rely on your @WebMvcTests using mocked repositories, those tests won’t fail, as they don’t need a Hibernate session at all.

Now jOOQ

So, I had the queries and I could finish, right? But if I changed the schema, my tests would break and I would need to go through all the query strings to fix things. That’s not something I want.

So, in “Introduce jOOQ” I added jOOQ to the build, including a run of the migrations into a temporary database beforehand, which generates a Java-based schema representation for me, pretty much exactly as described in the slides at the start.

Please, have a look at the slides for an introduction. For this post, I’m gonna focus on the benefits I got.

First, I can store fragments of SQL in variables, starting with the computation of the monthly values as a table-like thing:

/**
 * The computed monthly milage value.
 */
private static final Field<BigDecimal> MONTHLY_MILAGE_VALUE =
        lead(MILAGES.AMOUNT).over(partitionBy(BIKES.ID).orderBy(MILAGES.RECORDED_ON)).minus(MILAGES.AMOUNT).as(ALIAS_FOR_VALUE);
 
/**
 * The cte for computing the monthly milage value.
 */
private static final CommonTableExpression<Record4<String, String, LocalDate, BigDecimal>> MONTHLY_MILAGES =
        name("monthlyMilages").as(DSL
                .select(BIKES.NAME, BIKES.COLOR, MILAGES.RECORDED_ON, MONTHLY_MILAGE_VALUE)
                .from(BIKES).join(MILAGES).onKey()
                .orderBy(MILAGES.RECORDED_ON.asc()));

This is compiled Java code based on the database schema. If the schema changes, the code won’t compile. There is no string concatenation going on. The code knows both SQL and my schema.

For the shared fragments, I used static instances. The code gets nicer with JDK 11’s var keyword:

var aggregatedMonthlyValue = sum(MONTHLY_MILAGES.field(MONTHLY_MILAGE_VALUE)).as(ALIAS_FOR_VALUE);
var aggregatedMonthlyMilages = name("aggregatedMonthlyMilages").as(DSL
    .select(MONTHLY_MILAGES.field(MILAGES.RECORDED_ON), aggregatedMonthlyValue)
    .from(MONTHLY_MILAGES)
    .where(MONTHLY_MILAGE_VALUE.isNotNull())
    .groupBy(MONTHLY_MILAGES.field(MILAGES.RECORDED_ON)));
 
 
var value = aggregatedMonthlyMilages.field(aggregatedMonthlyValue);
var minimum = round(min(value)).as("minimum");
var maximum = round(max(value)).as("maximum");
var average = round(avg(value)).as("average");
var month = extract(aggregatedMonthlyMilages.field(MILAGES.RECORDED_ON), DatePart.MONTH).as("month");
this.database
        .with(MONTHLY_MILAGES)
        .with(aggregatedMonthlyMilages)
        .select(month, minimum, maximum, average)
        .from(aggregatedMonthlyMilages)
        .groupBy(month).orderBy(month.asc())
        .fetch();

And there’s the query again, this time type-safe and in Java, with the added benefit that jOOQ knows which database I use and can potentially translate some methods and clauses for me that work differently in different databases.

What about your day job with that database company?

So, I have been working for Neo4j for nearly two years now, and I have given some talks about using a graph database and why you would use one, but I never said to use it for everything. I am, however, very consistent in my opinion on query languages: Use them, they are powerful!

Luckily, in this project here I can do both: SQL in my application and Cypher, Neo4j’s query language, that currently inspires the GQL standard.

How is this possible? I use jQAssistant to verify the architecture of my application. With the changes in the application, I needed to adapt the rules. The rules are written in Cypher, as jQAssistant stores the structure of the project as a… graph 🙂

So this is how the rule for package dependencies looks:

MATCH (db:Package:Database)
WITH db
MATCH (a:Main:Artifact)
MATCH (a) -[:CONTAINS]-> (p1:Package) -[:DEPENDS_ON]-> (p2:Package) <-[:CONTAINS]- (a)
WHERE not p1:Config
  and not (p1) -[:CONTAINS]-> (p2)
  and not p2:Support
  and p2 <> db
  and not (db) - [:CONTAINS*] -> (p2)
RETURN p1, p2

I have one Java package where the jOOQ classes are stored. That is my database package, which I select first. That’s the first part. The second part of this query – after the WITH clause, which has a different meaning than in SQL (here it acts more like a pipe) – makes sure that each top-level package only depends on its inner packages or on the support or database package.

Conclusion

In terms of lines of code, I now have a few more lines, mainly because I added some more domain objects to pass the series data around. If I subtract those, I save about 1000 lines of Java code to compute and test my statistics.

In terms of the domain, I now have better statistics than before: I also have a table storing assorted trips – trips done on rental bikes etc. I didn’t add them to the original statistics… because they were too complicated to add.

Just querying them and adding them to the sums was way easier. I’m quite happy with the result and I would do it again.

I really recommend that when you are using a database, relational or graph, and your application requires statistics, recommendations, detection of clusters and whatnot, you use the database at hand. Don’t try to compute stuff on your own. You hopefully chose a product for a purpose, so make use of it.

I dedicate this post to my former colleagues Silke and Rainer at ENERKO Informatik. I’m happy that I have all that knowledge about query languages and a sense of when it is useful to apply them and when not.


06-Nov-19