Query all the things

Over the last weekend, I had some time on my hands and revisited my biking application: https://biking.michael-simons.eu. That application turned 10 this year: It started as a Sinatra based Ruby application in 2009 and helped me a lot in my career during late 2013 and early 2014. Back then, I was using Spring for a couple of projects, both private and at work, and discovered the early pre 1.0 releases of Spring Boot.

I found that super interesting. With the impending release of Java 8 in 2014, I had found my personal project to work on that year. I wrote extensive blog posts about that application (see the entry Developing a web application with Spring Boot, AngularJS and Java 8). Those posts culminated not only in an application I used nearly every day, but also in my very first book, Arc42 by example (which is also available on Packt these days).

Furthermore, the application brought me very close to the Spring Boot community. By following Spring Boot with this application, upgrading and evolving it with every release, I was able to spot issues, report them and sometimes fix them. It literally changed my career.

Having said all that: Why rewrite a major part of my application? As I said, part of the point back then was to learn more about Java 8, about the Stream API, lambdas and whatnot. I really went nuts with the Stream API.

Building up an analytic query

The main feature I use is tracking my mileage. I wanted it to be as simple as possible: Have a look at the odometer of each bike, add the number with a date to the bike, done. The application should then display the difference between the value recorded in one month and the next. So, that’s basic arithmetic, but if you want to compute the data for the following chart (available here), good luck with that:



What do we see? The upper chart shows the mileage I rode per bike per month, with a summarising line over all bikes per month. The lower chart shows the lowest and highest total value per month, the average monthly value and the overall average over all years per month (the red line).

I don’t store aggregated values: All of them were computed from the periods mentioned above.

Here’s just the tip of what I needed to do to even have the data to work with: Building the period values themselves (the difference between one month and the next), stored in a transient attribute:

public synchronized Map<LocalDate, Integer> getPeriods() {
    if (this.periods == null) {
        // Pair each milage with its predecessor and store the difference,
        // keyed by the date of the earlier recording.
        this.periods = IntStream.range(1, this.milages.size()).collect(TreeMap::new, (map, i) -> {
            final MilageEntity left = milages.get(i - 1);
            map.put(
                    left.getRecordedOn(),
                    milages.get(i).getAmount().subtract(left.getAmount()).intValue()
            );
        }, TreeMap::putAll);
    }
    return this.periods;
}

And then working with that, for example like this, to summarise those periods:

public static Map<LocalDate, Integer> summarizePeriods(final List<BikeEntity> bikes, final Predicate<Map.Entry<LocalDate, Integer>> entryFilter) {
   return bikes.stream()
           .filter(BikeEntity::hasMilages)
           .flatMap(bike -> bike.getPeriods().entrySet().stream())
           .filter(Optional.ofNullable(entryFilter).orElse(entry -> true))
           .collect(
                   Collectors.groupingBy(
                           Map.Entry::getKey,
                           Collectors.reducing(0, Map.Entry::getValue, Integer::sum)
                   )
           );
}
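
For illustration, a call might have looked like this (the list of bikes and the filter predicate are assumptions, not code from the project):

// Hypothetical usage: sum up all periods recorded in 2018, across all bikes.
Map<LocalDate, Integer> totals2018 = summarizePeriods(
        bikes, entry -> entry.getKey().getYear() == 2018);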

There have been a couple more of those, plus a ton of test code. This is nice when you want to play around with the Stream API and such, but to be honest: really?

I have been speaking about SQL and jOOQ since 2016. Back then, I had already been working at ENERKO Informatik for a long time. We had been a database shop right from the start. It was there that I learned that it is a good thing to put your database to work for you, and not the other way round. At times, I was not convinced, but these days I believe in that approach more than ever. And that is true regardless of whether you use a relational or a graph database.

Here’s one of the later installments of that content:

So let’s have a look at what is needed for the charts. First of all, the individual monthly values: This can be solved using a window function, in this case, the lead function:

SELECT 
  bikes.name, 
  bikes.color, 
  milages.recorded_on, 
  (lead(milages.amount) OVER (PARTITION BY bikes.id ORDER BY milages.recorded_on) - milages.amount) VALUE
FROM bikes
  JOIN milages
    ON milages.bike_id = bikes.id
ORDER BY milages.recorded_on ASC

This query joins the bikes with their recorded milages and moves a window over pairs of consecutive rows per bike, ordered by when they have been recorded. lead gives the following row at offset 1 by default, so the difference to the next recording can be computed right in place. For example, a bike with amounts of 100 in January, 150 in February and 170 in March ends up with the values 50 for January and 20 for February. 5 lines, without the additionally selected columns. That was easy.

I can create a SQL view from that, if I want. The query is actually needed for all the charts.

I cannot add another aggregate function on top of the window function, at least not in H2, so I need to manually aggregate the values per bike per month into one monthly value.

This is easy. Assume for a moment there’s a table-like thing named monthlymilages containing the above query. We do this:

SELECT 
  monthlymilages.recorded_on, 
  SUM(monthlymilages.value) VALUE
FROM monthlymilages
WHERE VALUE IS NOT NULL
GROUP BY monthlymilages.recorded_on

sum is an aggregate function, so we need to group by the month.

Now, let’s pretend that this is a table-like thing named aggregatedmonthlymilages. We then can select everything we want by slapping a bunch of aggregates on it:

SELECT 
  EXTRACT(MONTH FROM aggregatedmonthlymilages.recorded_on) MONTH, 
  round(MIN((aggregatedmonthlymilages.value))) minimum, 
  round(MAX((aggregatedmonthlymilages.value))) maximum, 
  round(avg((aggregatedmonthlymilages.value))) average
FROM aggregatedmonthlymilages
GROUP BY MONTH
ORDER BY MONTH ASC

And that’s just it. Views are a bit static, I would probably need additional columns to join them, and I would also have to maintain them. I prefer something called a Common Table Expression. That sounds complicated, but it is simple to use with the WITH clause:

WITH 
  monthlymilages AS (
    SELECT 
      bikes.name, 
      bikes.color, 
      milages.recorded_on, 
      (lead(milages.amount) OVER (PARTITION BY bikes.id ORDER BY milages.recorded_on) - milages.amount) VALUE
    FROM bikes
      JOIN milages
        ON milages.bike_id = bikes.id
    ORDER BY milages.recorded_on ASC
  ), 
  aggregatedmonthlymilages AS (
    SELECT 
      monthlymilages.recorded_on, 
      SUM(monthlymilages.value) VALUE
    FROM monthlymilages
    WHERE VALUE IS NOT NULL
    GROUP BY monthlymilages.recorded_on
  )
SELECT 
  EXTRACT(MONTH FROM aggregatedmonthlymilages.recorded_on) MONTH, 
  round(MIN((aggregatedmonthlymilages.value))) minimum, 
  round(MAX((aggregatedmonthlymilages.value))) maximum, 
  round(avg((aggregatedmonthlymilages.value))) average
FROM aggregatedmonthlymilages
GROUP BY MONTH
ORDER BY MONTH ASC

And that’s all that’s needed for the monthly average chart. 28 readable lines. When you see the query fully upfront, it may be intimidating, but shown with the steps leading up to it, I think it is not.

So, in a first commit, I removed all the application-side computation and added a new `StatisticService` that encapsulates all database access that does not go through JPA: Introduce SQL based statistics service.

Migrations and more possibilities

Of course I wanted to keep the tests, and those tests should now use the same database that I run in production. I’m actually using H2 (file based) in production, so I didn’t have the need to go for Testcontainers. H2, really? Yes, really. The application runs on a standard hosting provider and I didn’t want to bother with a database installation, and H2 actually offers all the modern SQL features mentioned above. Luckily, as they are standard SQL, I could switch to another database later on anyway.

But for the tests, I need a schema and data. As I didn’t want the tests to be dependent on the schema derived from my entity classes, I also added Flyway and a bunch of simple DDL scripts, which you will find in the commit as well.

Now, with migrations in place that I can actually control (nearly impossible with Hibernate’s auto DDL), I could address something else: I had used the relation between bikes and milages to compute the last mileage of a bike as a property exposed as JSON. So I would either need an eager association or have to use “open session in view” for Hibernate to make that work at the web layer. In that commit I added a computed column that I fill with the value I want. The column is now updated when a new milage is added, and that bit of denormalisation is fine with me.
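
Conceptually, keeping that column in sync happens right where a new milage is recorded. Here is a minimal sketch of the idea; the field and method names are assumptions, not necessarily the ones in the project:

// Inside BikeEntity (a sketch): the denormalised lastMilage column is
// updated together with the newly recorded milage.
public MilageEntity addMilage(final LocalDate recordedOn, final BigDecimal amount) {
    final MilageEntity newMilage = new MilageEntity(this, recordedOn, amount);
    this.milages.add(newMilage);
    // Keep the computed column in sync, so the web layer never needs the association.
    this.lastMilage = amount;
    return newMilage;
}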

Also: Be really careful if you have “open session in view” enabled in your Spring application: If at some point you decide to disable it and rely on your @WebMvcTests using mocked repositories, those tests won’t fail, as they don’t need a Hibernate session at all.
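
To make that pitfall concrete, here is a minimal sketch of such a test. The controller, repository and endpoint are made up for illustration, they are not the actual classes of the application; the point is that the mocked repository never opens a Hibernate session, so a disabled “open session in view” cannot surface here:

import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.test.web.servlet.MockMvc;

@WebMvcTest(BikesController.class) // hypothetical controller
class BikesControllerTest {

    @Autowired
    private MockMvc mockMvc;

    @MockBean // hypothetical repository; as a mock, it never touches Hibernate
    private BikeRepository bikeRepository;

    @Test
    void shouldListBikes() throws Exception {
        when(bikeRepository.findAll()).thenReturn(List.of());

        // Passes with or without open-session-in-view, so lazy loading problems
        // in the real web layer stay invisible to this kind of test.
        mockMvc.perform(get("/api/bikes")).andExpect(status().isOk());
    }
}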

Now jOOQ

So, I had the queries and I could finish, right? Well, if I changed the schema, my tests would break and I would need to go through all the query strings to fix things. That’s not something I want.

So, in “Introduce jOOQ” I added jOOQ to the build, including a temporary migration into a temporary database beforehand, which generates a Java based schema for me, pretty much exactly as described in the slides at the start.

Please, have a look at the slides for an introduction. For this post, I’m gonna focus on the benefits I got.

First, I can store fragments of SQL in variables. Here is the computation of the monthly values, as a table-like thing:

/**
 * The computed monthly milage value.
 */
private static final Field<BigDecimal> MONTHLY_MILAGE_VALUE =
        lead(MILAGES.AMOUNT).over(partitionBy(BIKES.ID).orderBy(MILAGES.RECORDED_ON)).minus(MILAGES.AMOUNT).as(ALIAS_FOR_VALUE);
 
/**
 * The cte for computing the monthly milage value.
 */
private static final CommonTableExpression<Record4<String, String, LocalDate, BigDecimal>> MONTHLY_MILAGES =
        name("monthlyMilages").as(DSL
                .select(BIKES.NAME, BIKES.COLOR, MILAGES.RECORDED_ON, MONTHLY_MILAGE_VALUE)
                .from(BIKES).join(MILAGES).onKey()
                .orderBy(MILAGES.RECORDED_ON.asc()));

This is compiled Java code based on the database schema. If the schema changes, the code won’t compile. There is no string concatenation going on. The code knows both SQL and my schema.

For the shared fragments, I used static instances. The code gets nicer with JDK 11’s var keyword:

var aggregatedMonthlyValue = sum(MONTHLY_MILAGES.field(MONTHLY_MILAGE_VALUE)).as(ALIAS_FOR_VALUE);
var aggregatedMonthlyMilages = name("aggregatedMonthlyMilages").as(DSL
    .select(MONTHLY_MILAGES.field(MILAGES.RECORDED_ON), aggregatedMonthlyValue)
    .from(MONTHLY_MILAGES)
    .where(MONTHLY_MILAGE_VALUE.isNotNull())
    .groupBy(MONTHLY_MILAGES.field(MILAGES.RECORDED_ON)));
 
 
var value = aggregatedMonthlyMilages.field(aggregatedMonthlyValue);
var minimum = round(min(value)).as("minimum");
var maximum = round(max(value)).as("maximum");
var average = round(avg(value)).as("average");
var month = extract(aggregatedMonthlyMilages.field(MILAGES.RECORDED_ON), DatePart.MONTH).as("month");
this.database
        .with(MONTHLY_MILAGES)
        .with(aggregatedMonthlyMilages)
        .select(month, minimum, maximum, average)
        .from(aggregatedMonthlyMilages)
        .groupBy(month).orderBy(month.asc())
        .fetch();

And there’s the query again, this time type safe and in Java, with the added benefit that jOOQ knows which database I use and can potentially translate some methods and clauses for me that work differently in different databases.
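
Consuming the result is equally type safe. A small sketch of how the records could be mapped onto a value object; MonthlyStatistic is a made-up class for illustration, not one from the project:

// Map each record into a simple value object; the fields declared above double
// as type-safe accessors into the record.
var statistics = this.database
        .with(MONTHLY_MILAGES)
        .with(aggregatedMonthlyMilages)
        .select(month, minimum, maximum, average)
        .from(aggregatedMonthlyMilages)
        .groupBy(month).orderBy(month.asc())
        .fetch(record -> new MonthlyStatistic(
                record.get(month), record.get(minimum),
                record.get(maximum), record.get(average)));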

What about your day job with that database company?

So, I have been working for Neo4j for nearly 2 years now, and I have given some talks about using a graph database and why you could use one, but I never said to use it for everything. I am however very consistent with my opinion on query languages: Use them, they are powerful!

Luckily, in this project here I can do both: SQL in my application and Cypher, Neo4j’s query language, which currently inspires the GQL standard.

How is this possible? I use jQAssistant to verify the architecture of my application. With the changes in the application, I needed to adapt the rules. Rules written in Cypher, as jQAssistant stores the project’s structure as a… graph 🙂

So this is how the rule for package dependencies looks:

MATCH (db:Package:Database)
WITH db
MATCH (a:Main:Artifact)
MATCH (a) -[:CONTAINS]-> (p1:Package) -[:DEPENDS_ON]-> (p2:Package) <-[:CONTAINS]- (a)
WHERE not p1:Config
  and not (p1) -[:CONTAINS]-> (p2)
  and not p2:Support
  and p2 <> db
  and not (db) - [:CONTAINS*] -> (p2)
RETURN p1, p2

I have one Java package where the jOOQ classes are stored. That is my database package, which I select first. That’s the first part. The second part of this query – after the WITH clause, which has a different meaning than in SQL (here it acts more like a pipe) – makes sure that each top level package only depends on inner packages or on the support or database package.

Conclusion

In terms of lines of code, I now have a few more lines, mainly because I added some more domain objects to pass the series data around. If I subtract those, I saved about 1000 lines of Java code to compute and test my statistics.

In terms of the domain, I now have better statistics than before: I also have a table storing assorted trips, trips done on rental bikes etc. I never added them to the original statistics… because it was too complicated to do.

Just querying them and adding them to the sums was way easier. I’m quite happy with the result and I would do it again.

I really recommend this: When you are using a database, relational or graph, and your application requires statistics, recommendations, detection of clusters and whatnot, use the database at your hands. Don’t try to compute stuff on your own. You hopefully chose a product for a purpose, so make use of it.

I dedicate this post to my former colleagues Silke and Rainer at ENERKO Informatik. I’m happy that I have all that knowledge about query languages and a sense for when it’s useful to use them and when not.


06-Nov-19


What’s a Bill of Materials (BOM) file?

Wikipedia says:

A bill of materials or product structure (sometimes bill of material, BOM or associated list) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product. A BOM may be used for communication between manufacturing partners or confined to a single manufacturing plant. A bill of materials is often tied to a production order

https://en.wikipedia.org/wiki/Bill_of_materials

In the world of Java and Maven that boils down to some important Maven concepts.

  • The packaging type
  • The dependency management tag
  • The fact that build files can import other build files

The packaging type

Maven build descriptors (or “project object models”, pom for short, take note of the p) can have different packaging: pom, jar, maven-plugin, ejb, war, ear, and rar. The default is jar.

The type pom doesn’t produce an artefact but is the artefact itself. It is used for parent poms in multi module setups for example.

Poms can also be imported into other poms, into their dependency management, to be precise:

Dependency management

The manual has you covered. Here are the important details from Dependency Management:

For BOMs we are not talking about the <dependencies />-tag, but about <dependencyManagement/>. Inside the dependency management tag, dependencies, their version numbers and exclusions are specified, much as you would do inside the top-level dependencies tag.

The dependency management tag however does not declare dependencies for the project itself. Declaring them still needs to be done, but the versions can now be omitted. For examples, look at the link above.

Now two important facts: The dependency management propagates to child modules and one can import POMs with packaging pom into dependency management.

Import other people’s dependencies

Have a look at Project Reactor’s BOM: reactor-bom.

Project Reactor doesn’t consist of one single artefact, nor of multiple artefacts sharing the same version number. Instead of forcing users to keep track of all those version numbers and import single dependencies, one imports the whole BOM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.projectreactor</groupId>
      <artifactId>reactor-bom</artifactId>
      <version>Californium-SR32</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

After that, you pick what you want in dependencies, without version numbers:

<dependencies>
  <dependency>
    <groupId>io.projectreactor</groupId>
    <artifactId>reactor-core</artifactId>
  </dependency>
</dependencies>

So now you have the bill of materials, from which you can pick.

Doing more cool things with it

Now, Maven allows you to define properties inside <properties/>. Those properties can be reused in dependency declarations and inside the dependency management.

Instead of hardcoding the version inside the example above, you would do:

<properties>
  <reactor-bom.version>Californium-SR9</reactor-bom.version>
</properties>
 
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.projectreactor</groupId>
      <artifactId>reactor-bom</artifactId>
      <version>${reactor-bom.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

Now you can switch to another version of the dependency or in this case of the whole dependency management by just setting one property.

This is

How Spring Boot works

Spring Boot uses exactly that mechanism for all supported dependencies. To upgrade Neo4j-OGM, for example, you would define the following in your Maven POM file:

<properties>
  <neo4j-ogm.version>3.1.13</neo4j-ogm.version>
</properties>

This updates the version of all dependencies of Neo4j-OGM, and not only the one you might have remembered.

It works exactly the same with projects using a BOM, for example Jackson:

<properties>
  <jackson.version>2.9.9.20190807</jackson.version> 
</properties>

In a Gradle project, things need a second file, gradle.properties. I have written about it before. For the above example, that file would look like this:

jackson.version = 2.9.9.20190807

Here’s the full example.

TL;DR

Many projects use BOMs these days. Import those whenever possible and then pick the dependencies you need from the projects, without specifying individual versions.
In the case of Spring Boot, never override managed dependencies in your own dependencies, but use a property instead. That goes both for plain dependencies as well as for BOMs.


22-Aug-19


Spring Security 5.1: Upgrade / Rehash stored passwords

In this post from early 2018, I described the new password infrastructure of Spring Security 5.0. As the post got some feedback over the last couple of days, I jumped back into the Spring Security code and noticed this method here: PasswordEncoder#upgradeEncoding. Hm, what does it do?

It turns out it works together with the additional, new interfaces UserDetailsPasswordService and ReactiveUserDetailsPasswordService.

TL;DR version: The delegating password encoder checks by default whether the password hash’s id is present and matches the id used for encoding (see the linked post above). If so, all is fine. Otherwise, the password needs an upgrade.
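
A minimal sketch of that check, using the default delegating encoder from PasswordEncoderFactories (the concrete hashes are just examples):

import org.springframework.security.crypto.factory.PasswordEncoderFactories;
import org.springframework.security.crypto.password.PasswordEncoder;

public class UpgradeEncodingDemo {

    public static void main(String... args) {
        PasswordEncoder encoder = PasswordEncoderFactories.createDelegatingPasswordEncoder();

        // A hash whose "{id}" prefix is missing or doesn't match the default "bcrypt"
        // is reported as needing an upgrade.
        System.out.println(encoder.upgradeEncoding("{noop}secret")); // true

        // A freshly encoded password already carries the "{bcrypt}" prefix.
        System.out.println(encoder.upgradeEncoding(encoder.encode("secret"))); // false
    }
}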

In my example application, I used an in-memory user details service with a user “michael” and password “SIMONS”, stored as a ROT-13 “hash” (I’m still waiting to see BSPasswordEncoder in production somewhere). The hash naturally has no prefix. While I of course never used a password encoder like this for real, I have migrated several applications’ hashed passwords from SHA-1 to bcrypt and PBKDF2 with a similar mechanism.

With the latest commit I could remove all my custom ceremony for an upgrade (see Upgrade to Spring Boot 2.1.7 and rely on PasswordEncoder#upgradeEncoding). As the InMemoryUserDetailsManager is also a UserDetailsPasswordService, Spring Boot wires everything up smoothly and nicely for me.

The InMemoryUserDetailsManager of course just changes the in-memory hash.

A fictional, JdbcTemplate-based UserDetailsPasswordService could look like this:

public class JdbcUserDetailsPasswordService implements UserDetailsPasswordService {
    private final UserDetailsService userDetailsService;
 
    private final JdbcTemplate jdbcTemplate;
 
    public JdbcUserDetailsPasswordService(UserDetailsService userDetailsService, JdbcTemplate jdbcTemplate) {
        this.userDetailsService = userDetailsService;
        this.jdbcTemplate = jdbcTemplate;
    }
 
    @Override
    public UserDetails updatePassword(UserDetails user, String newPassword) {
        jdbcTemplate.update("UPDATE users SET hashed_password = ? WHERE username = ?", newPassword, user.getUsername());
        return userDetailsService.loadUserByUsername(user.getUsername()); // reload the whole user details
    }
}

The most important thing here is the fact that UserDetailsPasswordService#updatePassword actually gets the newly hashed password! A clear separation of concerns: the update mechanism doesn’t need to know anything about the new hash that is used, it just has to store the string version of it. Apart from that, my version here reloads the complete UserDetails from a UserDetailsService not shown here. That implementation would be JdbcTemplate-based, too, in this case.

I really like this addition:


15-Aug-19


Oh, all that entitlement…

There was a tiny tweet yesterday, resurrecting a talk from Christin:

Here’s the direct link to the talk: “Hibernate should be to programmers what cake mixes are to bakers: beneath their dignity.”.

I’d like to emphasize that I would have maybe thought so some 15 years back, too.
Today I cannot understand this whole setup of patronizing so many things at once.

The tweet linked above caused many threads. You have to dig through them a bit, but the most interesting ones are the discussions around Gavin’s, Gunnar’s and Johannes’ answers.

I am not a Hibernate fanboy, I’m a regular Hibernate user though. I’m however very much a database fanboy, as long as there is a great, declarative query language (PSA: I work for Neo4j, use Cypher a lot and work in a team that creates an Object Graph Mapper (OGM)).

I firmly believe that there are no silver bullets and that having only one idea is a super dangerous situation:

Having said that, I actually don’t want to take the discussion about SQL and Cypher vs ORM and OGM any further.

I’d like to point out that I find several things with the heated discussion yesterday more than odd:

We all started somewhen and somewhere. With no knowledge, some knowledge, good or bad intuition. But we started and learned stuff. As a matter of fact, I just wrote about it the other week. One very important step on the way to becoming a senior something is actually realizing that other people may have less or different knowledge, and working with that, helping them, instead of being condescending towards them.

Second: I like cake mixes. I also like fresh, hand made apple pie.

Most years, I take the time with the kids to bake a cake for my wife’s birthday. You know what: We go and buy a cake mix and the missing, liquidish ingredients, mix the stuff up and spend the time we saved on decorating the cake, our table and whatnot.

On the other hand, going out for coffee, either in a café or visiting friends who are great at baking: Most awesome! And sometimes, when I’m really in the mood, I do all the baking myself, with everything that’s necessary.

Having the possibility to choose between these options is entitlement not everyone has: There are enough people who don’t have these simple choices, for whatever reasons.

Leaving the analogies behind: In now nearly 20 years of working in IT, I have always had choices and good options. I could try out various frameworks, far beyond the “getting started guide”. Not in my spare time, but on the job. That is such a big advantage. Yes, I know there are voices saying that people have to invest their own time to learn, but you know what? Being able to do that… well, that is entitlement, too. It’s ok to only work a regular 8 hour shift.

Today I understand better that people just want to use things. Many people work in projects under such high pressure that there’s just no room to use everything the right way, and even less room to recreate all the stuff by themselves.

I think we should be happy that we can choose from so many actually good, ready-to-use ingredients, be it Hibernate, Spring, Grails, Spring Data, Java EE or whatever these are called in the Ruby, PHP, JavaScript etc. worlds. We can be even happier when we work in places that give us the time to get things right, either by using frameworks correctly or by adding the custom bells and whistles needed.

Feel free to rant about all the stuff you want. But rant about actual bugs and inefficiencies and don’t start from a patronizing analogy.

Featured image: Dilyara Garifullina on Unsplash


13-Jul-19


Never not learning…

Sometimes my teammate Gerrit teases me “Don’t you know this and that as a Java Champion?”… Of course I don’t. Honestly, the older I get, the better I know that I know nothing.

This week has been interesting. First, I stumbled upon something like this:

A shortcut to System properties

public interface Stuff {
 
    String SOMETHING_ENABLED_KEY = "stuff.is.enabled";
    static boolean isSomethingEnabled() {
 
        return Boolean.getBoolean(SOMETHING_ENABLED_KEY);
    }
}

And I was like: “How the hell do I set SOMETHING_ENABLED_KEY to true?!”, only having Boolean#parseBoolean(String s) in mind, which “parses the string argument as a boolean.”

Boolean#getBoolean(String s) actually goes to the system properties and checks for the existence of a property named s, as described in the docs: “Returns true if and only if the system property named by the argument exists and is equal to the string ‘true’.”

I honestly didn’t expect such a shortcut in a wrapper class. To my surprise, those exist for other wrappers like java.lang.Long, too.
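
A small, runnable sketch contrasting the two methods (the property names are just examples):

public class SystemPropertyShortcuts {

    public static void main(String... args) {
        // Boolean.getBoolean reads the *system property* with that name:
        System.setProperty("stuff.is.enabled", "true");
        System.out.println(Boolean.getBoolean("stuff.is.enabled")); // true

        // Boolean.parseBoolean works on the string itself:
        System.out.println(Boolean.parseBoolean("true"));           // true

        // The same kind of shortcut exists on other wrappers, e.g. java.lang.Long:
        System.setProperty("stuff.timeout", "42");
        System.out.println(Long.getLong("stuff.timeout"));          // 42
    }
}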

See related twitter thread:

But, the week just got more interesting.

Things that are and aren’t enums (aka isEnum())

See this gist right here: EnumTests.java. To understand this, you have to know a bit more about enum types.

An enum type is a special data type that enables for a variable to be a set of predefined constants. The variable must be equal to one of the values that have been predefined for it.

I guess many people stop reading there and don’t go further into the planet example.

Enums are very versatile things. They can work as strategies and are one of the easiest and actually safest ways to have singletons in Java. They can also implement interfaces. I use this here in ParameterConversion.java to encapsulate a strategy and provide one defined instance only. Joshua Bloch talks about this in detail in Effective Java, Third Edition.
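
A tiny sketch of that pattern; the interface and the names are made up for illustration, ParameterConversion itself looks different:

// An enum implementing an interface: each constant is a strategy and a singleton at once.
interface DistanceConverter {
    double toKilometres(double value);
}

enum DistanceConverters implements DistanceConverter {
    IDENTITY {
        @Override
        public double toKilometres(double value) {
            return value;
        }
    },
    FROM_MILES {
        @Override
        public double toKilometres(double value) {
            return value * 1.609344;
        }
    }
}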

But anyway, that’s not the point. Instances of an enum (that is, one of the “constants”) can override the enum’s methods. As soon as you do this, they are realised as anonymous inner classes. And with that, asking EnumType.ENUM_CONSTANT.getClass().isEnum() will return false.

Brian Goetz is totally right here, even though it’s embarrassing for me:

I could have asked “EnumType.class, are you an enum?” and that correctly holds true. The enum constants however don’t promise that. If you want to check whether a thing is an instance of an enum type, then just do this: EnumType.ENUM_CONSTANT instanceof Enum, or, if you have an object of unknown type, do this: Enum.class.isAssignableFrom(aThing.getClass()).
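
Put into a small, runnable sketch (a made-up enum, not the one from the gist):

public class EnumChecks {

    enum Switch {
        ON {
            @Override
            public String toString() {
                return "on"; // overriding a method makes ON an anonymous subclass of Switch
            }
        },
        OFF
    }

    public static void main(String... args) {
        System.out.println(Switch.OFF.getClass().isEnum());                    // true
        System.out.println(Switch.ON.getClass().isEnum());                     // false, anonymous subclass

        // The reliable checks:
        System.out.println(Switch.ON instanceof Enum);                         // true
        System.out.println(Enum.class.isAssignableFrom(Switch.ON.getClass())); // true
        System.out.println(Switch.ON.getDeclaringClass().isEnum());            // true
    }
}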

I learned so much from the answers to my tweet here, you should check them out:

Twitter’s UI messes the threading up a bit, so I highlight that one here, from Joseph Darcy:

This old Oracle Blog is a great read: So you want to change the Java Programming Language…

And now what?

On to the next week, onto more learning, more development. Don’t let titles or honours impress you too much. We all learn new stuff, even in old things, each day. And I do think that’s a very good thing.

Title photo: Kimberly Farmer on Unsplash


05-Jul-19