Query all the things

Over the last weekend, I had some time on my hands and revisited my biking application: https://biking.michael-simons.eu. That application turned 10 this year: It started as a Sinatra based Ruby application in 2009 and helped me a lot in my career during late 2013 and early 2014. Back then, I was using Spring for a couple of projects, both private and at work, and discovered the early pre 1.0 releases of Spring Boot.

I found that super interesting. With the impending release of Java 8 in 2014, I had found my personal project to work on that year. I wrote extensive blog posts about that application (see the entry Developing a web application with Spring Boot, AngularJS and Java 8). Those posts culminated not only in an application I used nearly every day, but also in my very first book, Arc42 by example (which is also available on Packt these days).

Furthermore, the application brought me very close to the Spring Boot community. By following Spring Boot with this application, upgrading and evolving it with every release, I was able to spot issues, report them and sometimes fix them. It literally changed my career.

Having said all that: Why rewrite a major part of my application? As I said, part of the point back then was to learn more about Java 8, about the Stream API, lambdas and whatnot. I really went nuts with the Stream API.

Building up an analytic query

The main feature I use is tracking my mileage. I wanted it to be as simple as possible: Have a look at the odometer of each bike, add the number with a date to the bike, done. The application should then display the difference between the value recorded in one month and the next. So, that’s basic arithmetic, but if you want to compute the data for the following chart (available here), good luck with that:



What do we see? The upper chart shows the mileage I rode per bike per month, with a summarising line over all bikes per month. The lower chart shows the lowest and highest total value per month, the average monthly value and the overall average over all years per month (the red line).

I don’t store aggregated values: All of them were computed from the periods mentioned above.

Here’s just the tip of what I needed to do to even have the data to work with: Building the period values themselves (the difference between one month and the next), stored in a transient attribute:

public synchronized Map<LocalDate, Integer> getPeriods() {
    if (this.periods == null) {
        // Pair each milage with its predecessor and store the difference,
        // keyed by the date of the earlier recording.
        this.periods = IntStream.range(1, this.milages.size()).collect(TreeMap::new, (map, i) -> {
            final MilageEntity left = milages.get(i - 1);
            map.put(
                    left.getRecordedOn(),
                    milages.get(i).getAmount().subtract(left.getAmount()).intValue()
            );
        }, TreeMap::putAll);
    }
    return this.periods;
}

And then working with that, for example like this, to summarise those periods:

public static Map<LocalDate, Integer> summarizePeriods(final List<BikeEntity> bikes, final Predicate<Map.Entry<LocalDate, Integer>> entryFilter) {
   return bikes.stream()
           .filter(BikeEntity::hasMilages)
           .flatMap(bike -> bike.getPeriods().entrySet().stream())
           .filter(Optional.ofNullable(entryFilter).orElse(entry -> true))
           .collect(
                   Collectors.groupingBy(
                           Map.Entry::getKey,
                           Collectors.reducing(0, Map.Entry::getValue, Integer::sum)
                   )
           );
}
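
For illustration, a call might have looked like this (the list of bikes and the filter predicate are assumptions, not code from the project):

// Hypothetical usage: sum up all periods recorded in 2018, across all bikes.
Map<LocalDate, Integer> totals2018 = summarizePeriods(
        bikes, entry -> entry.getKey().getYear() == 2018);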

There have been a couple more of those, plus a ton of test code. This is nice when you want to play around with the Stream API and such, but to be honest: really?

I have been speaking about SQL and jOOQ since 2016. Back then, I had already been working at ENERKO Informatik for a long time. We had been a database shop right from the start. It was there that I learned that it is a good thing to put your database to work for you, and not the other way round. At times, I was not convinced, but these days I believe in that approach more than ever. And that is true regardless of whether you use a relational or a graph database.

Here’s one of the later installments of that content:

So let’s have a look at what is needed for the charts. First of all, the individual monthly values: This can be solved using a window function, in this case, the lead function:

SELECT 
  bikes.name, 
  bikes.color, 
  milages.recorded_on, 
  (lead(milages.amount) OVER (PARTITION BY bikes.id ORDER BY milages.recorded_on) - milages.amount) VALUE
FROM bikes
  JOIN milages
    ON milages.bike_id = bikes.id
ORDER BY milages.recorded_on ASC

This query joins the bikes with their recorded milages and moves a window over pairs of consecutive rows per bike, ordered by when they have been recorded. lead gives the following row at offset 1 by default, so the difference to the next recording can be computed right in place. For example, a bike with amounts of 100 in January, 150 in February and 170 in March ends up with the values 50 for January and 20 for February. 5 lines, without the additionally selected columns. That was easy.

I can create a SQL view from that, if I want. The query is actually needed for all the charts.

I cannot add another aggregate function on top of the window function, at least not in H2, so I need to manually aggregate the values per bike per month into one monthly value.

This is easy. Assume for a moment there’s a table-like thing named monthlymilages containing the above query. We do this:

SELECT 
  monthlymilages.recorded_on, 
  SUM(monthlymilages.value) VALUE
FROM monthlymilages
WHERE VALUE IS NOT NULL
GROUP BY monthlymilages.recorded_on

sum is an aggregate function, so we need to group by the month.

Now, let’s pretend that this is a table-like thing named aggregatedmonthlymilages. We then can select everything we want by slapping a bunch of aggregates on it:

SELECT 
  EXTRACT(MONTH FROM aggregatedmonthlymilages.recorded_on) MONTH, 
  round(MIN((aggregatedmonthlymilages.value))) minimum, 
  round(MAX((aggregatedmonthlymilages.value))) maximum, 
  round(avg((aggregatedmonthlymilages.value))) average
FROM aggregatedmonthlymilages
GROUP BY MONTH
ORDER BY MONTH ASC

And that’s just it. Views are a bit static, I would probably need additional columns to join them, and I would also have to maintain them. I prefer something called a Common Table Expression. That sounds complicated, but it is simple to use with the WITH clause:

WITH 
  monthlymilages AS (
    SELECT 
      bikes.name, 
      bikes.color, 
      milages.recorded_on, 
      (lead(milages.amount) OVER (PARTITION BY bikes.id ORDER BY milages.recorded_on) - milages.amount) VALUE
    FROM bikes
      JOIN milages
        ON milages.bike_id = bikes.id
    ORDER BY milages.recorded_on ASC
  ), 
  aggregatedmonthlymilages AS (
    SELECT 
      monthlymilages.recorded_on, 
      SUM(monthlymilages.value) VALUE
    FROM monthlymilages
    WHERE VALUE IS NOT NULL
    GROUP BY monthlymilages.recorded_on
  )
SELECT 
  EXTRACT(MONTH FROM aggregatedmonthlymilages.recorded_on) MONTH, 
  round(MIN((aggregatedmonthlymilages.value))) minimum, 
  round(MAX((aggregatedmonthlymilages.value))) maximum, 
  round(avg((aggregatedmonthlymilages.value))) average
FROM aggregatedmonthlymilages
GROUP BY MONTH
ORDER BY MONTH ASC

And that’s all that’s needed for the monthly average chart. 28 readable lines. When you see the query fully upfront, it may be intimidating, but shown with the steps leading up to it, I think it is not.

So, in a first commit, I removed all the application-side computation and added a new `StatisticService` that encapsulates all database access that does not go through JPA: Introduce SQL based statistics service.

Migrations and more possibilities

Of course I wanted to keep the tests, and those tests should now use the same database that I run in production. I’m actually using H2 (file based) in production, so I didn’t have the need to go for Testcontainers. H2, really? Yes, really. The application runs on a standard hosting provider and I didn’t want to bother with a database installation, and H2 actually offers all the modern SQL features mentioned above. Luckily, as they are standard SQL, I could switch to another database later on anyway.

But for the tests, I need a schema and data. As I didn’t want the tests to be dependent on the schema derived from my entity classes, I also added Flyway and a bunch of simple DDL scripts, which you will find in the commit as well.

Now, with migrations in place that I can actually control (nearly impossible with Hibernate’s auto DDL), I could address something else: I had used the relation between bikes and milages to compute the last mileage of a bike as a property exposed as JSON. So I would either need an eager association or have to use “open session in view” for Hibernate to make that work at the web layer. In that commit I added a computed column that I fill with the value I want. The column is now updated when a new milage is added, and that bit of denormalisation is fine with me.
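
Conceptually, keeping that column in sync happens right where a new milage is recorded. Here is a minimal sketch of the idea; the field and method names are assumptions, not necessarily the ones in the project:

// Inside BikeEntity (a sketch): the denormalised lastMilage column is
// updated together with the newly recorded milage.
public MilageEntity addMilage(final LocalDate recordedOn, final BigDecimal amount) {
    final MilageEntity newMilage = new MilageEntity(this, recordedOn, amount);
    this.milages.add(newMilage);
    // Keep the computed column in sync, so the web layer never needs the association.
    this.lastMilage = amount;
    return newMilage;
}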

Also: Be really careful if you have “open session in view” enabled in your Spring application: If at some point you decide to disable it and rely on your @WebMvcTests using mocked repositories, those tests won’t fail, as they don’t need a Hibernate session at all.
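
To make that pitfall concrete, here is a minimal sketch of such a test. The controller, repository and endpoint are made up for illustration, they are not the actual classes of the application; the point is that the mocked repository never opens a Hibernate session, so a disabled “open session in view” cannot surface here:

import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.test.web.servlet.MockMvc;

@WebMvcTest(BikesController.class) // hypothetical controller
class BikesControllerTest {

    @Autowired
    private MockMvc mockMvc;

    @MockBean // hypothetical repository; as a mock, it never touches Hibernate
    private BikeRepository bikeRepository;

    @Test
    void shouldListBikes() throws Exception {
        when(bikeRepository.findAll()).thenReturn(List.of());

        // Passes with or without open-session-in-view, so lazy loading problems
        // in the real web layer stay invisible to this kind of test.
        mockMvc.perform(get("/api/bikes")).andExpect(status().isOk());
    }
}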

Now jOOQ

So, I had the queries and I could finish, right? Well, if I changed the schema, my tests would break and I would need to go through all the query strings to fix things. That’s not something I want.

So, in “Introduce jOOQ” I added jOOQ to the build, including a temporary migration into a temporary database beforehand, which generates a Java based schema for me, pretty much exactly as described in the slides at the start.

Please, have a look at the slides for an introduction. For this post, I’m gonna focus on the benefits I got.

First, I can store fragments of SQL in variables. Here is the computation of the monthly values, as a table-like thing:

/**
 * The computed monthly milage value.
 */
private static final Field<BigDecimal> MONTHLY_MILAGE_VALUE =
        lead(MILAGES.AMOUNT).over(partitionBy(BIKES.ID).orderBy(MILAGES.RECORDED_ON)).minus(MILAGES.AMOUNT).as(ALIAS_FOR_VALUE);
 
/**
 * The cte for computing the monthly milage value.
 */
private static final CommonTableExpression<Record4<String, String, LocalDate, BigDecimal>> MONTHLY_MILAGES =
        name("monthlyMilages").as(DSL
                .select(BIKES.NAME, BIKES.COLOR, MILAGES.RECORDED_ON, MONTHLY_MILAGE_VALUE)
                .from(BIKES).join(MILAGES).onKey()
                .orderBy(MILAGES.RECORDED_ON.asc()));

This is compiled Java code based on the database schema. If the schema changes, the code won’t compile. There is no string concatenation going on. The code knows both SQL and my schema.

For the shared fragments, I used static instances. The code gets nicer with JDK 11’s var keyword:

var aggregatedMonthlyValue = sum(MONTHLY_MILAGES.field(MONTHLY_MILAGE_VALUE)).as(ALIAS_FOR_VALUE);
var aggregatedMonthlyMilages = name("aggregatedMonthlyMilages").as(DSL
    .select(MONTHLY_MILAGES.field(MILAGES.RECORDED_ON), aggregatedMonthlyValue)
    .from(MONTHLY_MILAGES)
    .where(MONTHLY_MILAGE_VALUE.isNotNull())
    .groupBy(MONTHLY_MILAGES.field(MILAGES.RECORDED_ON)));
 
 
var value = aggregatedMonthlyMilages.field(aggregatedMonthlyValue);
var minimum = round(min(value)).as("minimum");
var maximum = round(max(value)).as("maximum");
var average = round(avg(value)).as("average");
var month = extract(aggregatedMonthlyMilages.field(MILAGES.RECORDED_ON), DatePart.MONTH).as("month");
this.database
        .with(MONTHLY_MILAGES)
        .with(aggregatedMonthlyMilages)
        .select(month, minimum, maximum, average)
        .from(aggregatedMonthlyMilages)
        .groupBy(month).orderBy(month.asc())
        .fetch();

And there’s the query again, this time type safe and in Java, with the added benefit that jOOQ knows which database I use and can potentially translate some methods and clauses for me that work differently in different databases.
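
Consuming the result is equally type safe. A small sketch of how the records could be mapped onto a value object; MonthlyStatistic is a made-up class for illustration, not one from the project:

// Map each record into a simple value object; the fields declared above double
// as type-safe accessors into the record.
var statistics = this.database
        .with(MONTHLY_MILAGES)
        .with(aggregatedMonthlyMilages)
        .select(month, minimum, maximum, average)
        .from(aggregatedMonthlyMilages)
        .groupBy(month).orderBy(month.asc())
        .fetch(record -> new MonthlyStatistic(
                record.get(month), record.get(minimum),
                record.get(maximum), record.get(average)));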

What about your day job with that database company?

So, I have been working for Neo4j for nearly 2 years now, and I have given some talks about using a graph database and why you could use one, but I never said to use it for everything. I am however very consistent with my opinion on query languages: Use them, they are powerful!

Luckily, in this project here I can do both: SQL in my application and Cypher, Neo4j’s query language, which currently inspires the GQL standard.

How is this possible? I use jQAssistant to verify the architecture of my application. With the changes in the application, I needed to adapt the rules. Rules written in Cypher, as jQAssistant stores the project’s structure as a… graph 🙂

So this is how the rule for package dependencies looks:

MATCH (db:Package:Database)
WITH db
MATCH (a:Main:Artifact)
MATCH (a) -[:CONTAINS]-> (p1:Package) -[:DEPENDS_ON]-> (p2:Package) <-[:CONTAINS]- (a)
WHERE not p1:Config
  and not (p1) -[:CONTAINS]-> (p2)
  and not p2:Support
  and p2 <> db
  and not (db) - [:CONTAINS*] -> (p2)
RETURN p1, p2

I have one Java package where the jOOQ classes are stored. That is my database package, which I select first. That’s the first part. The second part of this query – after the WITH clause, which has a different meaning than in SQL (here it acts more like a pipe) – makes sure that each top level package only depends on inner packages or on the support or database package.

Conclusion

In terms of lines of code, I now have a few more lines, mainly because I added some more domain objects to pass the series data around. If I subtract those, I saved about 1000 lines of Java code to compute and test my statistics.

In terms of the domain, I now have better statistics than before: I also have a table storing assorted trips, trips done on rental bikes etc. I never added them to the original statistics… because it was too complicated to do.

Just querying them and adding them to the sums was way easier. I’m quite happy with the result and I would do it again.

I really recommend this: When you are using a database, relational or graph, and your application requires statistics, recommendations, detection of clusters and whatnot, use the database at your hands. Don’t try to compute stuff on your own. You hopefully chose a product for a purpose, so make use of it.

I dedicate this post to my former colleagues Silke and Rainer at ENERKO Informatik. I’m happy that I have all that knowledge about query languages and a sense for when it’s useful to use them and when not.


06-Nov-19


What’s a Bill of Materials (BOM) file?

Wikipedia says:

A bill of materials or product structure (sometimes bill of material, BOM or associated list) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product. A BOM may be used for communication between manufacturing partners or confined to a single manufacturing plant. A bill of materials is often tied to a production order

https://en.wikipedia.org/wiki/Bill_of_materials

In the world of Java and Maven that boils down to some important Maven concepts.

  • The packaging type
  • The dependency management tag
  • The fact that build files can import other build files

The packaging type

Maven build descriptors (or “project object models”, pom for short, take note of the p) can have different packaging: pom, jar, maven-plugin, ejb, war, ear, and rar. The default is jar.

The type pom doesn’t produce an artefact but is the artefact itself. It is used for parent poms in multi module setups for example.

Poms can also be imported into other poms, into their dependency management, to be precise:

Dependency management

The manual has you covered. Here are the important details from Dependency Management:

For BOMs we are not talking about the <dependencies />-tag, but about <dependencyManagement/>. Inside the dependency management tag, dependencies, their version numbers and exclusions are specified, much as you would do inside the top-level dependencies tag.

The dependency management tag however does not declare dependencies for the project itself. Declaring them still needs to be done, but the versions can now be omitted. For examples, look at the link above.

Now two important facts: The dependency management propagates to child modules and one can import POMs with packaging pom into dependency management.

Import other people’s dependencies

Have a look at Project Reactor’s BOM: reactor-bom.

Project Reactor doesn’t consist of one single artefact, nor of multiple artefacts sharing the same version number. Instead of forcing users to keep track of all those version numbers and import single dependencies, one imports the whole BOM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.projectreactor</groupId>
      <artifactId>reactor-bom</artifactId>
      <version>Californium-SR32</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

After that, you pick what you want in dependencies, without version numbers:

<dependencies>
  <dependency>
    <groupId>io.projectreactor</groupId>
    <artifactId>reactor-core</artifactId>
  </dependency>
</dependencies>

So now you have the bill of materials, from which you can pick.

Doing more cool things with it

Now, Maven allows you to define properties inside <properties/>. Those properties can be reused in dependency declarations and inside the dependency management.

Instead of hardcoding the version inside the example above, you would do:

<properties>
  <reactor-bom.version>Californium-SR9</reactor-bom.version>
</properties>
 
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.projectreactor</groupId>
      <artifactId>reactor-bom</artifactId>
      <version>${reactor-bom.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

Now you can switch to another version of the dependency or in this case of the whole dependency management by just setting one property.

This is

How Spring Boot works

Spring Boot uses exactly that mechanism for all supported dependencies. To upgrade Neo4j-OGM, for example, you would define the following in your Maven POM file:

<properties>
  <neo4j-ogm.version>3.1.13</neo4j-ogm.version>
</properties>

This updates the version of all dependencies of Neo4j-OGM, and not only the one you might have remembered.

It works exactly the same with projects using a BOM, for example Jackson:

<properties>
  <jackson.version>2.9.9.20190807</jackson.version> 
</properties>

In a Gradle project, things need a second file, gradle.properties. I have written about it before. For the above example, that file would look like this:

jackson.version = 2.9.9.20190807

Here’s the full example.

TL;DR

Many projects use BOMs these days. Import those whenever possible and then pick the dependencies you need from the projects, without specifying individual versions.
In the case of Spring Boot, never override managed dependencies in your own dependencies, but use a property instead. That goes both for plain dependencies as well as for BOMs.


22-Aug-19


Spring Security 5.1: Upgrade / Rehash stored passwords

In this post from early 2018, I described the new password infrastructure of Spring Security 5.0. As the post got some feedback over the last couple of days, I jumped back into the Spring Security code and noticed this method here: PasswordEncoder#upgradeEncoding. Hm, what does it do?

It turns out it works together with the additional, new interfaces UserDetailsPasswordService and ReactiveUserDetailsPasswordService.

TL;DR version: The delegating password encoder checks by default whether the password hash’s id is present and matches the id used for encoding (see the linked post above). If so, all is fine. Otherwise, the password needs an upgrade.
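
A minimal sketch of that check, using the default delegating encoder from PasswordEncoderFactories (the concrete hashes are just examples):

import org.springframework.security.crypto.factory.PasswordEncoderFactories;
import org.springframework.security.crypto.password.PasswordEncoder;

public class UpgradeEncodingDemo {

    public static void main(String... args) {
        PasswordEncoder encoder = PasswordEncoderFactories.createDelegatingPasswordEncoder();

        // A hash whose "{id}" prefix is missing or doesn't match the default "bcrypt"
        // is reported as needing an upgrade.
        System.out.println(encoder.upgradeEncoding("{noop}secret")); // true

        // A freshly encoded password already carries the "{bcrypt}" prefix.
        System.out.println(encoder.upgradeEncoding(encoder.encode("secret"))); // false
    }
}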

In my example application, I used an in-memory user details service with a user “michael” and password “SIMONS”, stored as a ROT-13 “hash” (I’m still waiting to see BSPasswordEncoder in production somewhere). The hash naturally has no prefix. While I of course never used a password encoder like this for real, I have migrated several applications’ hashed passwords from SHA-1 to bcrypt and PBKDF2 with a similar mechanism.

With the latest commit I could remove all my custom ceremony for an upgrade (see Upgrade to Spring Boot 2.1.7 and rely on PasswordEncoder#upgradeEncoding). As the InMemoryUserDetailsManager is also a UserDetailsPasswordService, Spring Boot wires everything up smoothly and nicely for me.

The InMemoryUserDetailsManager of course just changes the in-memory hash.

A fictional, JdbcTemplate-based UserDetailsPasswordService could look like this:

public class JdbcUserDetailsPasswordService implements UserDetailsPasswordService {
    private final UserDetailsService userDetailsService;
 
    private final JdbcTemplate jdbcTemplate;
 
    public JdbcUserDetailsPasswordService(UserDetailsService userDetailsService, JdbcTemplate jdbcTemplate) {
        this.userDetailsService = userDetailsService;
        this.jdbcTemplate = jdbcTemplate;
    }
 
    @Override
    public UserDetails updatePassword(UserDetails user, String newPassword) {
        jdbcTemplate.update("UPDATE users SET hashed_password = ? WHERE username = ?", newPassword, user.getUsername());
        return userDetailsService.loadUserByUsername(user.getUsername()); // reload the whole user details
    }
}

The most important thing here is the fact that UserDetailsPasswordService#updatePassword actually gets the newly hashed password! A clear separation of concerns: the update mechanism doesn’t need to know anything about the new hash that is used, it just has to store the string version of it. Apart from that, my version here reloads the complete UserDetails from a UserDetailsService not shown here. That implementation would be JdbcTemplate-based, too, in this case.

I really like this addition:


15-Aug-19


Oh, all that entitlement…

There was a tiny tweet yesterday, resurrecting a talk from Christin:

Here’s the direct link to the talk: “Hibernate should be to programmers what cake mixes are to bakers: beneath their dignity.”.

I’d like to emphasize that I would have maybe thought so some 15 years back, too.
Today I cannot understand this whole setup of patronizing so many things at once.

The tweet linked above caused many threads. You have to dig through them a bit, but the most interesting ones are the discussions around Gavin’s, Gunnar’s and Johannes’ answers.

I am not a Hibernate fanboy, I’m a regular Hibernate user though. I’m however very much a database fanboy, as long as there is a great, declarative query language (PSA: I work for Neo4j, use Cypher a lot and work in a team that creates an Object Graph Mapper (OGM)).

I firmly believe that there are no silver bullets and that having only one idea is a super dangerous situation:

Having said that, I actually don’t want to take the discussion about SQL and Cypher vs ORM and OGM any further.

I’d like to point out that I find several things with the heated discussion yesterday more than odd:

We all started somewhen and somewhere. With no knowledge, some knowledge, good or bad intuition. But we started and learned stuff. As a matter of fact, I just wrote about it the other week. One very important step on the way to becoming a senior something is actually realizing that other people may have less or different knowledge, and working with that, helping them, instead of being condescending towards them.

Second: I like cake mixes. I also like fresh, hand made apple pie.

Most years, I take the time with the kids to bake a cake for my wife’s birthday. You know what: We go and buy a cake mix and the missing, liquidish ingredients, mix the stuff up and spend the time we saved on decorating the cake, our table and whatnot.

On the other hand, going out for coffee, either in a café or visiting friends who are great at baking: Most awesome! And sometimes, when I’m really in the mood, I do all the baking myself, with everything that’s necessary.

Having the possibility to choose between these options is entitlement not everyone has: There are enough people who don’t have these simple choices, for whatever reasons.

Leaving the analogies behind: In now nearly 20 years of working in IT, I have always had choices and good options. I could try out various frameworks, far beyond the “getting started guide”. Not in my spare time, but on the job. That is such a big advantage. Yes, I know there are voices saying that people have to invest their own time to learn, but you know what? Being able to do that… well, that is entitlement, too. It’s ok to only work a regular 8 hour shift.

Today I understand better that people just want to use things. Many people work in projects under such high pressure that there’s just no room to use everything the right way, and even less room to recreate all the stuff by themselves.

I think we should be happy that we can choose from so many actually good, ready-to-use ingredients, be it Hibernate, Spring, Grails, Spring Data, Java EE or whatever these are called in the Ruby, PHP, JavaScript etc. worlds. We can be even happier when we work in places that give us the time to get things right, either by using frameworks correctly or by adding the custom bells and whistles needed.

Feel free to rant about all the stuff you want. But rant about actual bugs and inefficiencies and don’t start from a patronizing analogy.

Featured image: Dilyara Garifullina on Unsplash


13-Jul-19


Never not learning…

Sometimes my teammate Gerrit teases me “Don’t you know this and that as a Java Champion?”… Of course I don’t. Honestly, the older I get, the better I know that I know nothing.

This week has been interesting. First, I stumbled upon something like this:

A shortcut to System properties

public interface Stuff {
 
    String SOMETHING_ENABLED_KEY = "stuff.is.enabled";
    static boolean isSomethingEnabled() {
 
        return Boolean.getBoolean(SOMETHING_ENABLED_KEY);
    }
}

And I was like: “How the hell do I set SOMETHING_ENABLED_KEY to true?!”, only having Boolean#parseBoolean(String s) in mind, which “parses the string argument as a boolean.”

Boolean#getBoolean(String s) actually goes to the system properties and checks for the existence of a property named s, as described in the docs: “Returns true if and only if the system property named by the argument exists and is equal to the string ‘true’.”

I honestly didn’t expect such a shortcut in a wrapper class. To my surprise, those exist for other wrappers like java.lang.Long, too.
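
A small, runnable sketch contrasting the two methods (the property names are just examples):

public class SystemPropertyShortcuts {

    public static void main(String... args) {
        // Boolean.getBoolean reads the *system property* with that name:
        System.setProperty("stuff.is.enabled", "true");
        System.out.println(Boolean.getBoolean("stuff.is.enabled")); // true

        // Boolean.parseBoolean works on the string itself:
        System.out.println(Boolean.parseBoolean("true"));           // true

        // The same kind of shortcut exists on other wrappers, e.g. java.lang.Long:
        System.setProperty("stuff.timeout", "42");
        System.out.println(Long.getLong("stuff.timeout"));          // 42
    }
}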

See related twitter thread:

But, the week just got more interesting.

Things that are and aren’t enums (aka isEnum())

See this gist right here: EnumTests.java. To understand this, you have to know a bit more about enum types.

An enum type is a special data type that enables for a variable to be a set of predefined constants. The variable must be equal to one of the values that have been predefined for it.

I guess many people stop reading there and don’t go further into the planet example.

Enums are very versatile things. They can work as strategies and are one of the easiest and actually safest ways to have singletons in Java. They can also implement interfaces. I use this here in ParameterConversion.java to encapsulate a strategy and provide one defined instance only. Joshua Bloch talks about this in detail in Effective Java, Third Edition.
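
A tiny sketch of that pattern; the interface and the names are made up for illustration, ParameterConversion itself looks different:

// An enum implementing an interface: each constant is a strategy and a singleton at once.
interface DistanceConverter {
    double toKilometres(double value);
}

enum DistanceConverters implements DistanceConverter {
    IDENTITY {
        @Override
        public double toKilometres(double value) {
            return value;
        }
    },
    FROM_MILES {
        @Override
        public double toKilometres(double value) {
            return value * 1.609344;
        }
    }
}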

But anyway, that’s not the point. Instances of an enum (that is, one of the “constants”) can override the enum’s methods. As soon as you do this, they are realised as anonymous inner classes. And with that, asking EnumType.ENUM_CONSTANT.getClass().isEnum() will return false.

Brian Goetz is totally right here, even though it’s embarrassing for me:

I could have asked “EnumType.class, are you an enum?” and that correctly holds true. The enum constants however don’t promise that. If you want to check whether a thing is an instance of an enum type, then just do this: EnumType.ENUM_CONSTANT instanceof Enum, or, if you have an object of unknown type, do this: Enum.class.isAssignableFrom(aThing.getClass()).
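
Put into a small, runnable sketch (a made-up enum, not the one from the gist):

public class EnumChecks {

    enum Switch {
        ON {
            @Override
            public String toString() {
                return "on"; // overriding a method makes ON an anonymous subclass of Switch
            }
        },
        OFF
    }

    public static void main(String... args) {
        System.out.println(Switch.OFF.getClass().isEnum());                    // true
        System.out.println(Switch.ON.getClass().isEnum());                     // false, anonymous subclass

        // The reliable checks:
        System.out.println(Switch.ON instanceof Enum);                         // true
        System.out.println(Enum.class.isAssignableFrom(Switch.ON.getClass())); // true
        System.out.println(Switch.ON.getDeclaringClass().isEnum());            // true
    }
}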

I learned so much from the answers to my tweet here, you should check them out:

Twitter’s UI messes the threading up a bit, so I highlight that one here, from Joseph Darcy:

This old Oracle Blog is a great read: So you want to change the Java Programming Language…

And now what?

On to the next week, onto more learning, more development. Don’t let titles or honours impress you too much. We all learn new stuff, even in old things, each day. And I do think that’s a very good thing.

Title photo: Kimberly Farmer on Unsplash


05-Jul-19