Releasing Maven based projects to Maven central

I published my first library on Maven central around 2013: a server-side embedding tool for webpages based on OEmbed. I remember how happy and proud I was to have published something in binary form “for all eternity”.

Maven central is the canonical, default artifact repository for the build tool of the same name, Apache Maven. Unless configured otherwise, Maven tries to resolve dependencies from there.

The company sponsoring and running Maven central is Sonatype. Back in 2013 (and also in 2018), account approval for releasing things to central was manual and involved verifying ownership of the reverse DNS name. All of that is centered around preventing the hijacking of coordinates (read along here), so that people cannot trick other people into using malicious software.

The whole process to be a producer starts here: The central repository: Producers. It’s explained in great detail.

Releasing to central also involves going through something called staging repositories and the process associated with them. It can be done through a UI (OSS Sonatype) or via plugins for Maven.

This week, another company, JFrog, announced that they are shutting down Bintray / JCenter. I was aware that Bintray and JCenter are around and are often used within Gradle projects. Apart from that, I only used one of JFrog’s products, Artifactory, in a company as a local Maven central mirror and local deploy and release target.

People seem to have used JCenter and Bintray because the release process seems less strict and they found the Maven central way of doing things too hard. Other voices have been raised that Maven central is often slow. I cannot confirm the latter, though.

I am writing down the following remarks to demonstrate that it is not that hard to publish your libraries on Maven central after getting across the initial setup.

First of all, read the “Producers” guide linked above to get your coordinates registered.

I did not go through the UI of Sonatype for quite some time. The libraries I put myself onto central are all released via the Maven release plugin.

To get this up and running, your pom.xml has to fit some requirements. Especially, the meta data has to be complete. This should be the first step.

Further requirements are: Javadoc and sources must be present. The artifacts must not come in a snapshot form and they must not depend on anything that is not on Maven central. Also, the artifacts must be signed via GPG (see all of this here).

For my personal needs I have configured a local release profile in my ~/.m2/settings.xml, along with my username on oss.sonatype.org and the encrypted password. The credentials go into the list of <servers> like this:

<server>
  <id>ossrh</id>
  <username>g.reizt</username>
  <password>XXXXX</password>
</server>

The GPG credentials also go into settings.xml:

<profile>
    <id>gpg</id>
    <properties>
        <gpg.keyname>KEYNAME</gpg.keyname>
        <!-- <gpg.passphrase>XXXX</gpg.passphrase> -->
        <!-- Or better via an agent -->
    </properties>
</profile>

This turns out to be one of the hardest parts to get right. I always have to look this up for CI or a new machine.

So, for the project’s pom: Make sure you follow the requirements for the meta data and configure the necessary plugins for JavaDoc, sources and signature.

The libraries I put on central have basically all this information:

<build>
	<plugins>
		<plugin>
			<groupId>org.sonatype.plugins</groupId>
			<artifactId>nexus-staging-maven-plugin</artifactId>
			<version>${nexus-staging-maven-plugin.version}</version>
			<extensions>true</extensions>
			<configuration>
				<serverId>ossrh</serverId>
				<nexusUrl>https://oss.sonatype.org/</nexusUrl>
				<autoReleaseAfterClose>true</autoReleaseAfterClose>
			</configuration>
		</plugin>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-source-plugin</artifactId>
			<version>${maven-source-plugin.version}</version>
			<executions>
				<execution>
					<id>attach-sources</id>
					<goals>
						<goal>jar-no-fork</goal>
					</goals>
				</execution>
			</executions>
		</plugin>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-javadoc-plugin</artifactId>
			<executions>
				<execution>
					<id>attach-javadocs</id>
					<goals>
						<goal>jar</goal>
					</goals>
				</execution>
			</executions>
			<configuration>
				<detectOfflineLinks>false</detectOfflineLinks>
				<detectJavaApiLink>false</detectJavaApiLink>
				<source>${java.version}</source>
				<tags>
					<tag>
						<name>soundtrack</name>
						<placement>X</placement>
					</tag>
				</tags>
			</configuration>
		</plugin>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-release-plugin</artifactId>
			<version>${maven-release-plugin.version}</version>
			<configuration>
				<autoVersionSubmodules>true</autoVersionSubmodules>
				<useReleaseProfile>false</useReleaseProfile>
				<releaseProfiles>release</releaseProfiles>
				<tagNameFormat>@{project.version}</tagNameFormat>
				<goals>deploy</goals>
			</configuration>
		</plugin>
	</plugins>
</build>
 
<profiles>
	<profile>
		<id>release</id>
		<build>
			<plugins>
				<plugin>
					<groupId>org.apache.maven.plugins</groupId>
					<artifactId>maven-gpg-plugin</artifactId>
					<version>${maven-gpg-plugin.version}</version>
					<executions>
						<execution>
							<id>sign-artifacts</id>
							<phase>verify</phase>
							<goals>
								<goal>sign</goal>
							</goals>
						</execution>
					</executions>
				</plugin>
			</plugins>
		</build>
	</profile>
</profiles>

org.sonatype.plugins:nexus-staging-maven-plugin does all the heavy lifting behind the scenes as it hooks up to the release phases. I keep the gpg plugin in a separate profile so that users of my libraries are not pestered with it when they just want to build some stuff locally.

After all this is done, you can release your things in a two step process via `mvn release:prepare` followed by `mvn release:perform`. The tooling will guide you through setting the current version and updating your project to a new snapshot version.

I won’t go into the discussion whether the repeated test runs are meaningful or not, or whether not using the release plugin at all makes sense or not. I currently maintain projects that are run with CI friendly versions and released to central via other tooling, and a couple of things released as described above.


05-Feb-21


Do some puzzles sometimes

Wait, I hear you say “This guy writing there, didn’t he write about not having time and energy for side projects?”:

Well, yes, sadly that’s the case and I did cancel a couple of things for good (BTW, I am looking for someone kind to take over the leadership of Aachen’s Java User Group EuregJUG, maybe there will be a time again for meetings).

On the other hand, I try not to go completely nuts and the last couple of weeks have not made this easy. It’s grey, cold (which I don’t even mind), but wet, wet, wet, muddy, muddy and then some: Running and especially cycling is a bit hard. Normally that would keep me on the track.

Instead, I started working on Advent of Code puzzles again. I have a dedicated repository for my solutions, find it at michael-simons/aoc. Why more coding?

What I really like about the puzzles is the fact that they have absolutely nothing to do with frameworks, annotations, microservices, DDD platforms, build systems or moving JSON between endpoints. Just plain, logical puzzles to tinker with. A bit like doing a crossword each day.

In my repository, I tackled every puzzle with what’s available in a language, no libraries. Everything is set up in a way that someone who wants to run a solution needs to install only one thing, the language. The repository contains Java (of course), Kotlin, Rust, Ruby, Go, SQL, Cypher, PHP, Scala, Typescript and some other things. I guess I managed to be more idiomatic in some languages than others, though. In most of the puzzles you’ll realize that you need various ideas, mathematical concepts and algorithms again and again. It’s helpful to compare how easy or hard it is to implement those in various languages.

Up until this week I did run some circles around languages like Clojure or Lisp in general or things like Haskell. I find them intimidating at first sight and I don’t have a university background with knowledge about their theoretical concepts.

I started to read up on Clojure a bit and while I do not yet understand everything, I was able to create the following script

(def input (clojure.string/trim-newline (slurp "input.txt")))
 
(def freq (frequencies input))
(def starOne
    (- (get freq \() (get freq \))))
(println starOne)
 
(def starTwo
    (count
        (take-while (fn [p] (not= p -1))
            (reductions (fn [sum num] (+ sum num)) 0 (map {\( 1 \) -1} input)))))
(println starTwo)

It reads an input file into memory and uses the frequencies function to compute the frequencies of the different characters in it. It assigns the difference between the occurrences of `(` and `)` to a variable named starOne and prints it.

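The second part maps every opening bracket to 1 and every closing bracket to -1, sums them up step by step (the reductions) and counts the running sums until one of them hits -1. As the sequence of sums starts with the initial 0, that count is the position of the character that first leads to -1.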
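For comparison, here is a minimal Java sketch of the same two steps. The class and method names are made up for illustration, and unlike the Clojure script it takes the input as a string instead of reading input.txt. It also returns an empty result when the sum never reaches -1, which is an assumption of this sketch and not something the script above handles.

```java
import java.util.OptionalInt;

// Hypothetical Java counterpart of the Clojure script; all names are illustrative.
final class ParenthesesPuzzle {

    // Star one: occurrences of '(' minus occurrences of ')'.
    static int starOne(String input) {
        int result = 0;
        for (char c : input.toCharArray()) {
            if (c == '(') {
                result++;
            } else if (c == ')') {
                result--;
            }
        }
        return result;
    }

    // Star two: 1-based position of the character that first brings the running sum to -1.
    static OptionalInt starTwo(String input) {
        int sum = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                sum++;
            } else if (c == ')') {
                sum--;
            }
            if (sum == -1) {
                return OptionalInt.of(i + 1);
            }
        }
        // Assumption of this sketch: no result if the sum never hits -1.
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(starOne("(()(()("));          // prints 3
        System.out.println(starTwo("()())").getAsInt()); // prints 5
    }
}
```

The loop in starTwo does by hand what reductions and take-while express declaratively in the Clojure version.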

Many important things are already in there: Reading files, working with maps and lists, applying functions to every item in a list, calling functions and defining anonymous functions. I can work with that.

Fast forward a couple of days, having a look at I Was Told There Would Be No Math. Well, the math is actually super simple in that one. Read a file with lines like 2x3x4. They give you the dimensions of a parcel (length, width, height).

Compute, according to some rules, the paper and ribbon needed to wrap those parcels or presents. Paper area is given by “find the surface area of the box, which is 2*l*w + 2*w*h + 2*h*l. The elves also need a little extra paper for each present: the area of the smallest side.”

Being the good object-oriented person and happy Java 15 user (with preview features) that I am, I started to create a record to model that thing:

record Present(int l, int w, int h) {
    int surfaceArea() {
        return 2 * (l * w + w * h + h * l);
    }
 
    int slack() {
        return Math.min(l * w, Math.min(w * h, h * l));
    }
 
    int volume() {
        return l * w * h;
    }
 
}

I mean, basic math, right? The solution to the first question is just summing surface area and slack up like `var starOne = presents.stream().mapToInt(p -> p.surfaceArea() + p.slack()).sum();`. Easy, right?

Always thinking in objects primes you to things. Here to length, width, height. I should have realized what I was doing when I computed the smallest area (computed 3 areas and chose the smallest one): It doesn’t matter which value is assigned to length, width and height: I can just sort them, take the two smallest and multiply them. I realized that when I computed the smallest perimeter of the parcel:

int smallestPerimeter() {
    return Stream.of(l, w, h).sorted().limit(2).mapToInt(v -> 2 * v).sum();
}

Enter the Clojure solution. It felt very clumsy to define a type just for that.

Here’s what I came up with instead

(use '[clojure.string :only (trim-newline split-lines split)])
 
(def input 
    "I use again slurp to read the file and two library functions
     to trim the newlines and split the whole thing into a list. 
     An anonymous function is used on each line to split line by 
     the letter `x` and map the values to an int. Those ints are then
     sorted and the variable `input` will be a lazy list of int arrays."
    (map (fn [v] (sort (map bigint (split v #"x"))))
    (split-lines (trim-newline (slurp "input.txt")))))
 
(defn paper
    "As I know that the array is sorted, I can deconstruct it into the 
     3 values contained. The riddle for the paper is that the smallest area 
     is in there 3 and not 2 times. As the smallest area is defined by the first
     two elements, we multiply them 3 instead of 2 times like the rest."
  [dimensions]
  (let [[l w h] dimensions]
    (+ (* 3 l w) (* 2 l h) (* 2 w h))))
 
(defn ribbon  
    "Same idea as above: The smallest perimeter is defined by the 2 smallest values.
     The volume is of course the product of all 3."
  [dimensions]
  (let [[l w h] dimensions]
    (+ (* 2 l) (* 2 w) (* l w h))))
 
(println (reduce + (map paper input)))
(println (reduce + (map ribbon input)))

Find my prose inside the program.

I do like the approach of not using too many data structures.
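The same “sort first, then compute” idea can be carried back to Java without a dedicated type. The following is only a sketch under that assumption: the class and method names are made up, and it works on a plain sorted array instead of the record shown earlier.

```java
import java.util.Arrays;

// Hypothetical sketch: presents as sorted arrays instead of a dedicated type.
final class SortedPresents {

    // Parses a line like "2x3x4" into its three dimensions, sorted ascending.
    static long[] parse(String line) {
        long[] dimensions = Arrays.stream(line.trim().split("x"))
                .mapToLong(Long::parseLong)
                .sorted()
                .toArray();
        if (dimensions.length != 3) {
            throw new IllegalArgumentException("Expected three dimensions: " + line);
        }
        return dimensions;
    }

    // Surface area plus slack: the smallest side (a * b) is counted 3 instead of 2 times.
    static long paper(long[] d) {
        long a = d[0], b = d[1], c = d[2];
        return 3 * a * b + 2 * a * c + 2 * b * c;
    }

    // Smallest perimeter (two smallest values) plus the volume.
    static long ribbon(long[] d) {
        long a = d[0], b = d[1], c = d[2];
        return 2 * a + 2 * b + a * b * c;
    }

    public static void main(String[] args) {
        long[] present = parse("2x3x4");
        System.out.println(paper(present));  // prints 58
        System.out.println(ribbon(present)); // prints 34
    }
}
```

Because the array is sorted once while parsing, neither method needs to search for the smallest side or perimeter again.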

In the end: It is good to have a look outside the box sometimes and reset your brain with fresh ideas.

Thanks to Tim, Stefan and Jan for the multiple times in which you brought Clojure into my bubble.

Title picture by Bannon Morrissy on Unsplash


03-Feb-21


Minecraft terminology for Java developers

My eldest kid – 11 at the time of writing – has been into Minecraft for some time now. I tried to motivate him at various ages to do some kind of programming with me. We tried Scratch, Lego Mindstorms (one of the few sets that is gathering dust) and a couple of other things. We didn’t have much fun with any of those. The nicest things we have done together have actually been a couple of the Advent of Code challenges, for which I tried to reduce the setup to a bare minimum (aka a text editor).

I don’t blame the kid for the failure at all… I am not a good teacher, but what is worse, I don’t have much interest myself in tinkering with a game and never had (basically the same with regards to cycling, I prefer actually doing the thing), so I was quite happy that the kid himself wants to do stuff.

A bit of terminology

I was a bit surprised how many different kinds of Minecraft clients and servers are out there:

First of all, there’s the Java based, “vanilla” server you can download here: original, “vanilla” Minecraft Java server. This edition of the server does not support custom plugins. It does however support Minecraft Forge. Forge is a modification loader for the vanilla server. You would program against the Forge API and that API encapsulates away the interaction with the server code.

Only the Minecraft Java edition client can connect to the Java server. This is a bit sad, as many friends of my kid would only have access to a gaming console. The Minecraft edition available on Switch, Xbox or PlayStation does not support the Java server. They only connect to something called “Bedrock” edition.

Custom mods with MCreator

Back to Forge: Forge as the modloader must be installed into your Java Minecraft server via an installer that fits the server version. You get the installers here. After that you can pick and choose from a plethora of already existing mods at curseforge.

As far as I understood this, mods can also be installed client side only, but I don’t see the point of that when playing with multiple people on a server.

Anyway, we want to create our own mods. Mods can change many aspects of the game. They can include new commands, new recipes, blocks, biomes, creatures, enchantments and more.

Adam Ness pointed me to MCreator. From the site: “MCreator is open source software used to make Minecraft Java Edition mods, Minecraft Bedrock Edition Add-Ons, and data packs using an intuitive easy-to-learn interface or with an integrated code editor.” This was exactly what I was looking for. A full fledged IDE but dedicated to Minecraft and the environment.

After the download, MCreator will set up a workspace for you. It uses Gradle behind the scenes to set up everything, including a development client and server. The nice thing here: You or your kid won’t need a Minecraft client license at that point. The documentation of the thing is stellar, it has an exhaustive Wiki and a good Knowledgebase.

MCreator contains tooling to package the mod and distribute it.

Custom plugins for the Spigot Minecraft server

I could stop here, as MCreator and the ability of mods to change a vanilla server is everything my kid was looking for, but for completeness and for fellow Java developers, here’s more.

There once was a fork of the Minecraft server called “Bukkit” but a little fluster cluck happened and now there’s Spigot. Spigot is something like a fork of the Minecraft server, but depends on the original binary (read the first link above to understand why). It does everything that the vanilla server does, but more. In particular, it supports full fledged plugins.

To get started with developing plugins of any kind – btw, I know of one that uses Neo4j and Neo4j-OGM, yes, I helped the plugin author as part of my day job already – you first need to build the Spigot server. They don’t offer prebuilt binaries due to the license mess (I found binaries at getbukkit.org but I am unsure about that being legit).

Building Spigot is pretty much trivial when you have a recent JDK installed on your machine. Grab the Spigot BuildTools.jar, save it somewhere and run `java -jar BuildTools.jar` in that directory to get a new server. I am unsure if you need to have Git installed on your machine (I have it on all machines), but the build tools use it to clone a bunch of repositories.

After a while, you’ll find spigot-1.x.y.jar inside the same directory (at the time of writing, 1.16.4). This is your fresh Minecraft server supporting custom plugins.

How to write plugins? You can start off with a basic Maven project. The Spigot Plugin API lives under the following coordinates in the provided scope: org.spigotmc:spigot-api:1.16.4-R0.1-SNAPSHOT. It is not on central, so you would need to add the Spigot-Repo, too:

<repositories>
    <repository>
        <id>spigot-repo</id>
        <url>https://hub.spigotmc.org/nexus/content/repositories/snapshots/</url>
    </repository>
</repositories>
 
<dependencies>
    <dependency>
        <groupId>org.spigotmc</groupId>
        <artifactId>spigot-api</artifactId>
        <version>1.16.4-R0.1-SNAPSHOT</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

The Spigot wiki gives more details.

The server versioning and naming is a mess, not to speak of the licensing issues. But apart from that, I found the API pleasant to use: Spigot Java API docs. It gives full control over basically everything you can do in the game and I was able to create a simple plugin very fast.

To build the whole thing, you would of course need Maven or Gradle and some idea of how to set this up. I did install the Plugman plugin. This allows loading / unloading / reloading other plugins without restarting the server every time, which shortens the feedback loop.

Summary

To sum this up: The various forks, versions and unclear naming of things regarding Minecraft are intimidating. People without much knowledge will eventually stumble from one YouTube “tutorial” to the next and download all sorts of things of varying quality… After managing that initial step, things are not that bad and one can do neat things.

MCreator gives modders of all ages great tooling to express their ideas about custom interactions in Minecraft without too many diversions in terms of installing things. In a classroom situation or a scenario where a non-developer modder wants to try out things, I would recommend that.

Java developers will probably enjoy the Spigot API more. There’s even an IntelliJ plugin that creates full projects for you with the required setup.


03-Jan-21


Music 2020. Wrapped.

Everyone and their dog is posting their Spotify Wrapped thing. It’s 2020, I still don’t have Spotify, but despite my increasing age, I still listen to a ton of music.

When I started to work remotely back in 2018, one of the biggest perks for me was – apart from not having to commute – to be able to listen to whatever thing I currently like as loud as I want without headphones. Well, that changed a bit during the course of the COVID-19 pandemic as my wife is now working remotely as well, but alas, it turned out, the volume knob is still working.

So, no Spotify for me. But let’s see what MariaDB, the database powering the scrobble engine running for dailyfratze.de, is up to. How do I fill this data? I have a custom iTunes script written ages ago that calls a REST endpoint with the stuff I’m listening to. Pretty basic, actually.

I have been working for Neo4j for 2.5 years now and I honestly love the company for manifold reasons. However, it seems that it is considered rude to post SQL in the company Slack and we should prefer to use only Cypher 😉 Well, this post will contain a lot of SQL and use the schema I used a couple of times in this SQL talk of mine.

Interested in Cypher? Cypher is a language for querying Neo4j, the graph database by the vendor of the same name.

You can do awesome stuff with Cypher and you’ll find talks by me as well about that topic, but today I’ll keep it to a 1990’s joke: MATCH (n) RETURN n SKIP $no LIMIT /* no */ $ /* no */ limit 😉

General database stats

39497 tracks by 9661 artists and 161141 played tracks by 9 different users. First plays stored April 27 in 2005.

We will make use of the rank function to compute the exact position of things we are interested in.

Top 10 tracks in 2020

A simple approach without rank would be something like this:

SELECT a.artist,
       t.name,
       COUNT(*)
FROM plays p
JOIN tracks t ON t.id = p.track_id
JOIN artists a ON a.id = t.artist_id
WHERE p.user_id = 1
AND YEAR(p.played_on) = 2020
GROUP BY a.artist, t.name
ORDER BY COUNT(*) DESC
LIMIT 10

but that would already fill several places with tracks that have the same absolute count. This is where the rank() and dense_rank() functions come into play.

Both functions assign a rank to the rows in a row set based on their given order. Both can do this over partitions or windows of the data or over the whole data. Therefore these analytic functions are often called window functions. Both variants assign the same rank to rows having the same value. Thus, two tracks that have been played the same number of times will receive the same rank. However, rank will skip n ranks if there are 1 + n items in a rank, whereas the dense_rank function will not. I want consecutive ranks, that is: all tracks sharing the highest play count will be in first place, the tracks with the next highest count in second place and so forth.
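To make the difference between the two functions tangible outside of SQL, here is a small Java sketch that mimics both on a list of play counts that is already sorted in descending order. The class name, the method names and the sample counts are made up for illustration; this is not how the database implements its window functions.

```java
import java.util.Arrays;

// Hypothetical illustration of SQL's rank() and dense_rank(); all names and data are made up.
final class Ranks {

    // Like rank(): ties share a rank, the ranks after a tie are skipped.
    // Expects counts sorted in descending order.
    static int[] rank(int[] counts) {
        int[] ranks = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            boolean tie = i > 0 && counts[i] == counts[i - 1];
            ranks[i] = tie ? ranks[i - 1] : i + 1;
        }
        return ranks;
    }

    // Like dense_rank(): ties share a rank, the following rank is consecutive.
    // Expects counts sorted in descending order.
    static int[] denseRank(int[] counts) {
        int[] ranks = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            if (i == 0) {
                ranks[i] = 1;
            } else {
                ranks[i] = counts[i] == counts[i - 1] ? ranks[i - 1] : ranks[i - 1] + 1;
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] playCounts = {10, 10, 8, 7, 7, 5};
        System.out.println(Arrays.toString(rank(playCounts)));      // prints [1, 1, 3, 4, 4, 6]
        System.out.println(Arrays.toString(denseRank(playCounts))); // prints [1, 1, 2, 3, 3, 4]
    }
}
```

Filtering the dense ranks for values of at most 3 keeps five of the six sample rows, which is the same effect that makes a top-n query return more than n tracks.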

Let’s give it a shot. We see that a query creating a window function over the dense rank of the count gives us 8 tracks in total when I ask for the top 5 places:



And what can I say: German Hip-Hop/Punk band Antilopen Gang has been on my radar for 2 years now, but in 2020, they have become my meds and therapy. If you had ever told me that I would totally fall in love with German Hip-Hop in my early 40s, I would have said you’re mad, but there we are: “Wünsch Dir nix”, so fitting for 2020:

Also in the top 5, Patientenkollektiv. Such goose bumps:

We will see the Antilopen later on. I was a bit surprised by S&M2, a new album in 2020. A retake of Metallica’s symphonic metal approach with the San Francisco Symphony orchestra. That version of The Unforgiven III is not something I would have ever expected from James. An incredible performance:

Last but not least: Ozzy Osbourne. This guy has reached Lemmy Kilmister undying level.

Back to SQL. There are no partitions in the above query. Partitions would come in handy if I would like to see my top 1 track for each of the last years in one query. Let’s give it a try:



It’s basically the same query but notice now how I create the rank: dense_rank() OVER (partition by year(played_on) ORDER BY count(*) DESC) AS rank. The rank is computed now for each year in which I played tracks, separately.

But wait, 2019. What did I drink?

Albums

Let’s be safe and let us aggregate that stuff. Yes, I do still listen to whole albums. It is basically the same query, but group by album, not by single tracks. And I excluded compilations. Apart from that, the query is hardly different:



Antilopen Gang with 3 albums. Holy crap. But yes, they did release two albums in 2020, “Abbruch Abbruch” and “Adrenochrom”. The latter is a reply to some people in the music circus going lunatic and believing a lot of shit. I haven’t listened to any album in the last 10 years as often as this one. It is streamable on all major platforms.

Let’s play “Dinge” from Deichkind’s “Wer sagt denn das?”:

For me 2020 proved that I don’t need too many things. Some stuff is essential for me: Feeling secure and snug with my family, working in a good company that also makes me feel safe. Ok, bicycles are my personal issue, but that’s a different topic…

Back to Deichkind: Electropunk. Punk is a good keyword, but I only spot “5, 6, 7, 8 Bullenstaat” by Die Ärzte in the above list. 2020 had some more punk. Let’s filter the above list to albums that have been released in 2020:



And we will see Madsen, Ferris MC and Slime. Madsen, a “Deutsch Rock” band, released the Punk Rock Album of 2020 (which is 200% more Punk than Die Ärzte these days), and Ferris MC, who played for a decade with Deichkind, joined forces with Swiss und die anderen and dropped an incredible Rap-Punk-Rock piece. Slime are Slime. The subtitle of this blog, “120 Dezibel”, is a quote from “Missglückte Asimetrie” and one reason I will always have music in my life:

Ich dreh auf und die Erde steht still bei 120 Dezibel
Alles was ich brauch und will sind 120 Dezibel
Ich kann euch alle nicht mehr hören bei 120 Dezibel
Nichts was mich noch stört bei 120 Dezibel
120 Dezibel
120 Dezibel

Let’s have a look at Madsen. They got some stand-ins for “Alte weiße Männer”:

And one of our favorite songs among the adults and kids, “Quarantäne für immer”

Listen to “Sorry, kein Sorry” by Ferris. After that, give it a go with Slime. That band is as old as me:

And back to the database and the

Artists

What have been my preferred artists in 2020? I expect no surprises here. The query will be very similar to before, only the grouping changes again (it becomes simpler):



We see Antilopen Gang again and on the next two ranks, if I didn’t restrict to the top 10, we would have seen Juse Ju and Fatoni, two more German rappers who are also somewhere near Antilopen Gang. A graph database like Neo4j would show this connection and probably discover the “Anti Alles Aktion” on its own. Want to learn how? Have a look at Going from relational databases to databases with relations with Neo4j.

People who know me a bit longer know that I have been to more than one Heavy Metal festival and certainly to more than one Grindcore gig. That much German hip-hop in my playlists? I would have never thought. Ok, there’s still the usual suspects like Motörhead (I love running with Motörhead in my ears), the mighty Black Sabbath and even Body Count had a decent album out this year.

Can my database answer the change in artist or preference as well? Hmm, we would need the current year’s rank of something and the previous one. I think we can do this.

But in the meantime, enjoy Faith Alone 2020 by Bad Religion:

So be prepared for the with-with clause or “Common Table Expression”. The keen eye did already see that I used subqueries in my queries above. Why? To filter on the rank (top 5 or top 10 only). I cannot do this in the same select in which the rank is computed. Therefore I nested the query and made it accessible that way.

The subquery works, but is kinda hard to read and cannot be reused. A relation inside a with clause is somewhat like a named subquery or a view that only exists during that query. Fun fact: CTEs can refer to themselves, thus become recursive.

Anyway, I think they read nicely and they remind me a lot of the with clause in Cypher, which is used to stick together multiple segments of a query into one pipeline.

But show me the code:

WITH rank_per_year AS (
  SELECT YEAR(p.played_on)  AS YEAR,
         a.artist,
         dense_rank() OVER (partition BY YEAR(played_on) ORDER BY COUNT(*) DESC) AS rank
  FROM plays p
  JOIN tracks t ON t.id = p.track_id
  JOIN artists a ON a.id = t.artist_id
  WHERE p.user_id = 1
  AND YEAR(p.played_on) BETWEEN 2015 AND  2020
  AND t.compilation = 'f'
  GROUP BY YEAR(p.played_on), a.artist
) 
SELECT YEAR, artist, rank, 
       ifnull(
         lag(rank) OVER (partition BY artist ORDER BY YEAR ASC) - rank, 
         'new'
       )  AS `change`
FROM rank_per_year
WHERE rank <= 5
ORDER BY YEAR ASC, rank ASC;

We do compute the rank per year and artist (“group by year and artist”) for years 2015 to 2020, partitioned by the year and give this whole thing a name (“rank_per_year”). This is a new relation that can now be used in a select clause, like we do.

In that select clause, we do find lag. lag is a window function that can go n rows backward over a partition that is ordered. The partition here is defined by the artist and in that, ordered by year. lag picks the value of the rank of the previous year. rank is a variable in that case, coming from the CTE named “rank_per_year”, not from the window function of the same name!

From that lagged value we subtract the current one and get the change from the previous to this year. As an artist can be among the top 5 artists for the first time in a year, we need to check whether the previous rank is null. That’s what ifnull is for. Neat, isn’t it? And the result? Here we go (I added some blank lines manually):
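For readers who think more easily in loops than in window functions, the lag and ifnull combination can be sketched in Java like this. It is only an illustration under assumptions: the Row type, the method name and the sample rows are made up, and the rows are expected to be ordered by year, just as the window is ordered inside each artist partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of lag(rank) OVER (partition BY artist ORDER BY year); names and data are made up.
final class RankChange {

    // One row of a relation shaped like rank_per_year.
    static final class Row {
        final int year;
        final String artist;
        final int rank;

        Row(int year, String artist, int rank) {
            this.year = year;
            this.artist = artist;
            this.rank = rank;
        }
    }

    // Returns one change value per row; expects rows ordered by year ascending.
    static List<String> changes(List<Row> rows) {
        Map<String, Integer> previousRankByArtist = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (Row row : rows) {
            // put returns the rank of the artist's previous row, or null if there is none (the lag).
            Integer lagged = previousRankByArtist.put(row.artist, row.rank);
            // Same role as ifnull: no previous row means the artist is new in the list.
            result.add(lagged == null ? "new" : String.valueOf(lagged - row.rank));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(2019, "Artist A", 3),
                new Row(2019, "Artist B", 1),
                new Row(2020, "Artist A", 1),
                new Row(2020, "Artist C", 2));
        System.out.println(changes(rows)); // prints [new, new, 2, new]
    }
}
```

A positive change means the artist climbed, a negative one that the artist dropped. As in the query, the previous row of an artist is not necessarily the previous calendar year, it is simply the last row in which that artist appeared.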



I hope you enjoyed this a bit. At least I did. It was nice doing some SQL again and digging through the stuff I have been listening to in 2020. In total I listened to about 9847 tracks so far in 2020 with a duration of roughly 27 days.

I leave you with a track that captures my mood in 2020 all too perfectly. Danger Dan with “Nudeln und Klopapier”:

The screenshots of the code have been created with Carbon. I was too lazy to fiddle around with something else that would have fit both the queries and the output. I generated the query format with window chrome and the output without, and then appended both files with ImageMagick like this: `convert carbon.png carbon\(1\).png -append tracks-2020.png`.


05-Dec-20


About the tooling available to create native GraalVM images.

A couple of days ago I sent out this tweet here:

This tweet caused quite some reaction and with the following piece I want to clarify my train of thoughts behind it. First of all, let’s pick the components here and explain what they are.

The complete source code for all examples is on GitHub: michael-simons/native-story. Title image is from Alexander. Cheers, hugs and thank you to Gerrit, Michael and Gunnar for your reviews, proofreading and feedback.

GraalVM

The GraalVM website describes it like this:

“GraalVM is a high-performance runtime that provides significant improvements in application performance and efficiency which is ideal for microservices. It is designed for applications written in Java, JavaScript, LLVM-based languages.”

One benefit of the GraalVM is a new just-in-time compilation mechanism, which makes many scenarios running on GraalVM faster than running on a comparable JDK. However, there is more. Also quoting from the above intro: “For existing Java applications, GraalVM can provide benefits by […] creating ahead-of-time compiled native images.”

SubstrateVM

The SubstrateVM is the part of GraalVM that is responsible for running the native image. The readme states:

(A native image) does not run on the Java VM, but includes necessary components like memory management and thread scheduling from a different virtual machine, called “Substrate VM”. Substrate VM is the name for the runtime components (like the deoptimizer, garbage collector, thread scheduling etc.). The resulting program has faster startup time and lower runtime memory overhead compared to a Java VM.

The GraalVM team has a couple of benchmarks showing the benefits from running microservices as native images.
Those numbers are impressive, no doubt, and they will have a positive effect for many applications.

I wrote my sentiment not as an author of applications, but as an author and contributor of database drivers supporting encrypted connections to servers as well as an object mapping framework that takes arbitrary domain objects (in form of whatever classes people can think of) and creates instances of those dynamically from database queries and vice versa.

This text is not an exhaustive take on the GraalVM and its fantastic tooling. It’s a collection of things I learned while making the Neo4j Java Driver, the Quarkus Neo4j extension, Spring Data Neo4j 6 and some GraalVM polyglot examples native image compatible.

Since my first ever encounter with GraalVM back in 2017 at JCrete, things have become rather easy for application developers. There is the native-image tool, which takes classes or a whole jar containing a main class and produces a native executable, and there is a corresponding Maven plugin that does the same as part of the build.

There is a great getting started guide, Install GraalVM, which you can follow step by step as an application developer. Make sure you install the native-image tool, too.

Given the following trivial program – which can be run as a single source file via java trivial/src/main/java/ac/simons/native_story/trivial/Application.java Michael, producing Hello, Michael:

package ac.simons.native_story.trivial;
 
public class Application {
 
	public static void main(String... args) {
 
		System.out.println("Hello, " + (args.length == 0 ? "User" : args[0]));
	}
}

Compile this first with javac and after that, run native-image like this:

javac trivial/src/main/java/ac/simons/native_story/trivial/Application.java 
native-image -cp trivial/src/main/java ac.simons.native_story.trivial.Application app

It will produce some output like this

Build on Server(pid: 21148, port: 50583)
[ac.simons.native_story.trivial.application:21148]    classlist:      71.34 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]        (cap):   1,663.79 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]        setup:   1,850.67 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]     (clinit):     107.06 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]   (typeflow):   2,620.63 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]    (objects):   3,051.08 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]   (features):      83.23 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]     analysis:   5,962.31 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]     universe:     112.18 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]      (parse):     218.57 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]     (inline):     494.42 ms,  4.55 GB
[ac.simons.native_story.trivial.application:21148]    (compile):     912.43 ms,  4.43 GB
[ac.simons.native_story.trivial.application:21148]      compile:   1,828.57 ms,  4.43 GB
[ac.simons.native_story.trivial.application:21148]        image:     465.08 ms,  4.43 GB
[ac.simons.native_story.trivial.application:21148]        write:     135.90 ms,  4.43 GB
[ac.simons.native_story.trivial.application:21148]      [total]:  10,465.92 ms,  4.43 GB

and eventually you can run a native executable like this ./app Michael. Adding the corresponding Maven plugins to the project makes that part of the build. Pretty neat.

So far, so good, and done? For this application, of course. But things get a bit more elaborate once you have framework-like needs.

A fictive “framework”

Let’s take this simple “hello-world” application and turn it into something artificially complicated. Imagine we are writing a complex application with some framework-like traits. So, the “greeting” must be turned into an interface based service:

public interface Service {
 
	String sayHelloTo(String name);
 
	String getGreetingFromResource();
}

Of course, we need a factory to get instances of that service

public class ServiceFactory {
 
	public Service getService() {
		Class<Service> aClass;
		try {
			aClass = (Class<Service>) Class.forName(ServiceImpl.class.getName());
			return aClass.getConstructor().newInstance();
		} catch (Exception e) {
			throw new RuntimeException("¯\\_(ツ)_/¯", e);
		}
	}
}

The implementation of the service should look something like this

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;
 
public class ServiceImpl implements Service {
 
	private final TimeService timeService = new TimeService();
 
	@Override
	public String sayHelloTo(String name) {
		return "Hello " + name + " from ServiceImpl at " + timeService.getStartupTime();
	}
 
	@Override
	public String getGreetingFromResource() {
		try (BufferedReader reader = new BufferedReader(
			new InputStreamReader(this.getClass().getResourceAsStream("/content/greeting.txt")))) {
 
			return reader.lines()
				.collect(Collectors.joining(System.lineSeparator()));
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}
}

That looks actually rather simple. As an added bonus, it includes a TimeService that returns the start of the application. That service is implemented in a super naive way:

import java.time.Instant;
 
public class TimeService {
 
	private final static Instant STARTED_AT = Instant.now();
 
	public Instant getStartupTime() {
		return STARTED_AT;
	}
}

It’s problematic on its own, but that shall not be the point here. Last but not least, let’s blow up the application itself a bit:

import java.lang.reflect.Method;
 
public class Application {
 
	public static void main(String... a) {
 
		Service service = new ServiceFactory().getService();
		System.out.println(service.sayHelloTo("GraalVM"));
 
		System.out.println(invokeGreetingFromResource(service, "getGreetingFromResource"));
	}
 
	static String invokeGreetingFromResource(Service service, String theName) {
 
		try {
			Method method = Service.class.getMethod(theName);
			return (String) method.invoke(service);
		} catch (Exception e) {
			throw new RuntimeException(e);
		}
	}
}

I tried to make up some examples that need to be addressed due to the limitations of GraalVM’s ahead-of-time compilation described here.

What do we have?

  • A factory producing an instance based on a dynamic class name (non compile time constant), the ServiceFactory
  • A dynamic method call through java.lang.reflect in Application (it could just as well be a field access or whatever)
  • A service that uses some resources (getGreetingFromResource).
  • Another service that uses a static field initialized during class initialization containing a sensitive value dependent on the current time (TimeService)

When I package this application as a jar file, containing a manifest entry pointing to the main class, I can run it like this:

java -jar only-on-jvm/target/only-on-jvm-1.0-SNAPSHOT.jar 
Hello GraalVM from ServiceImpl at 2020-09-15T09:37:37.832141Z
Hello, from a resource.

However, pointing native-image to it, now results in a couple of warnings

native-image -jar only-on-jvm/target/only-on-jvm-1.0-SNAPSHOT.jar 
...
Warning: Reflection method java.lang.Class.forName invoked at ac.simons.native_story.ServiceFactory.getService(ServiceFactory.java:8)
Warning: Reflection method java.lang.Class.getMethod invoked at ac.simons.native_story.Application.invokeGreetingFromResource(Application.java:18)
Warning: Reflection method java.lang.Class.getConstructor invoked at ac.simons.native_story.ServiceFactory.getService(ServiceFactory.java:9)
Warning: Aborting stand-alone image build due to reflection use without configuration.
Warning: Use -H:+ReportExceptionStackTraces to print stacktrace of underlying exception
Build on Server(pid: 26437, port: 61293)
...
Warning: Image 'only-on-jvm-1.0-SNAPSHOT' is a fallback image that requires a JDK for execution (use --no-fallback to suppress fallback image generation and to print more detailed information why a fallback image was necessary).

A fallback image means that the resulting image, while not being much smaller or larger than a non-fallback one, requires a JDK to be present at runtime. If you remove the JDK from your path and try to execute it, it will greet you with:

./only-on-jvm-1.0-SNAPSHOT 
Error: No bin/java and no environment variable JAVA_HOME

What tools are available to address the issues? Let’s first tackle the first two, both dynamic class loading and Java reflection. We have two options:

We can enumerate which classes need to be present in the native image and to which of their methods reflection based access should be available. Or we can substitute classes or methods when run on GraalVM.

Enumerating things present in the native image

The GraalVM analysis intercepts calls like the one to Class.forName and tries to reduce their arguments to a compile time constant. If this succeeds, the class in question is added to the image. The above example is contrived so that the analysis cannot do this. This is where the “reflection config” comes into play. The native-image tool takes the argument -H:ReflectionConfigurationFiles, which points to JSON files containing something like this:

[
  {
    "name" : "ac.simons.native_story.ServiceImpl",
    "allPublicConstructors" : true
  },
  {
    "name" : "ac.simons.native_story.Service",
    "allPublicMethods" : true
  }
]

Here we declare that we want to allow reflective access to all public constructors of ServiceImpl so that we can get an instance of it, and to all public methods of the Service interface.
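To make the difference between a constant and a non-constant class name a bit more tangible, here is a minimal sketch. The class ClassLookupDemo and its methods are made up for illustration and are not part of the example project. It only shows the two shapes of code on a plain JVM. It does not claim to mark the exact boundary of what the GraalVM analysis can fold.

```java
public class ClassLookupDemo {

	// The class name is a string literal, so a static analysis can see which class is meant.
	static Class<?> constantLookup() {
		try {
			return Class.forName("java.util.ArrayList");
		} catch (ClassNotFoundException e) {
			throw new IllegalStateException(e);
		}
	}

	// The class name is assembled from input, so it is only known when the program runs.
	static Class<?> dynamicLookup(String packageName, String simpleName) {
		try {
			return Class.forName(packageName + "." + simpleName);
		} catch (ClassNotFoundException e) {
			throw new IllegalStateException(e);
		}
	}

	public static void main(String... args) {
		System.out.println(constantLookup().getName());
		System.out.println(dynamicLookup("java.util", args.length == 0 ? "ArrayList" : args[0]).getName());
	}
}
```

On the JVM both lookups behave the same. For a native image, the second shape is the one that would need an entry in a reflection config like the one above.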

There are more options as described here.

One way to make native-image use that config is to pass it as -H:ReflectionConfigurationFiles=/path/to/reflectconfig, but I prefer having one native-image.properties in META-INF/native-image/GROUP_ID/ARTIFACT_ID which is picked up by the native-image tool.

That native-image.properties contains so far the following:

Args = -H:ReflectionConfigurationResources=${.}/reflection-config.json

Pointing to the above config.

This will compile the image just nicely. However, it will still fail with a NullPointerException: The greeting.txt resource has not been included in the image.
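The NullPointerException comes from getResourceAsStream returning null when a resource cannot be found, which ServiceImpl then passes straight into an InputStreamReader. A minimal sketch of checking for that case explicitly could look like this. ResourceCheckDemo and its describe method are hypothetical helpers, not part of the example project:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ResourceCheckDemo {

	// Reports whether a classpath resource is available instead of failing later with a NullPointerException.
	static String describe(String path) {
		// getResourceAsStream returns null for a missing resource; try-with-resources skips closing a null resource.
		try (InputStream in = ResourceCheckDemo.class.getResourceAsStream(path)) {
			return in == null ? "missing: " + path : "found: " + path;
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}

	public static void main(String... args) {
		System.out.println(describe("/content/greeting.txt"));
		System.out.println(describe("/does/not/exist.txt"));
	}
}
```

Such a check only makes the failure easier to read. In the native image built so far, the greeting resource is still missing.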

This can be fixed with a resources-config.json like this

{
  "resources": [
    {
      "pattern": ".*greeting.txt$"
    }
  ]
}

The appropriate stanza needs to be added to the image properties, so that we have now:

Args = -H:ReflectionConfigurationResources=${.}/reflection-config.json \
       -H:ResourceConfigurationResources=${.}/resources-config.json

Note: The arguments for specifying configuration in form of some JSON “things” come in two variants, XXXConfigurationResources and XXXConfigurationFiles, which I learned in this issue (which is a great example of fantastic communication from an OSS project). The resources-form is for everything inside your artifact, the files-form is for external files. The wildcard ${.} resolves accordingly. All the available options can be retrieved with something like this: native-image --expert-options | grep Configuration

Now the image runs without errors:

 ./reflection-config-1.0-SNAPSHOT                                                                                          
Hello GraalVM from ServiceImpl at 2020-09-15T15:02:47.572800Z
Hello, from a resource.

But does it run without bugs? Well, not exactly. I wrote a bit more text, time went on and when I ran it again, it printed the same date. Look back at the TimeService. It holds an instance of private final static Instant STARTED_AT = Instant.now();. It must be initialized before the time service is used.

I’m actually unsure why the native image tool considers the TimeService class as “safe” (described here) and chooses to initialize it at build time (which also contradicts Runtime vs Build-Time Initialization stating “Since GraalVM 19.0 all class-initialization code (static initializers and static field initialization) […]”). At first I thought that happens as I “hide” the TimeService’s usage behind my reflection based code, but I can reproduce it without it, too.

At the time of writing, I have asked about this on the GraalVM Slack and we will see how it is answered. Until then, I’m happy to have a somewhat contrived example. The TimeService must of course be initialized at runtime, it is not safe. This is done via the --initialize-at-run-time argument to the native image tool.

So now we have:

Args = -H:ReflectionConfigurationResources=${.}/reflection-config.json \
       -H:ResourceConfigurationResources=${.}/resources-config.json \
       --initialize-at-run-time=ac.simons.native_story.TimeService

And a correctly working, native binary.
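Why the moment of class initialization matters so much here can be shown on a plain JVM. The following is a minimal sketch with made-up names (StaticInitDemo, Clock). It demonstrates that a static final field gets its value exactly once, when its class is initialized, no matter how much time passes afterwards:

```java
import java.time.Instant;

public class StaticInitDemo {

	static class Clock {
		// Assigned exactly once, when the class Clock is initialized.
		static final Instant STARTED_AT = Instant.now();
	}

	// Reads the field, waits a bit and reads it again; both reads see the same value.
	static boolean sameValueAfterWaiting() {
		Instant first = Clock.STARTED_AT; // the first access triggers class initialization on the JVM
		try {
			Thread.sleep(50);
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
		}
		Instant second = Clock.STARTED_AT;
		return first.equals(second);
	}

	public static void main(String... args) {
		System.out.println(sameValueAfterWaiting()); // prints true
	}
}
```

If that one-time initialization happens while the image is built, the value from build time is what ends up in the binary, which matches the stale date I saw above.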

Substitutions

Making the Neo4j driver natively compilable was much more effort. We use Netty underneath for SSL connections. A couple of things need to be enabled on the native image tool to get the groundwork running (like those -H:EnableURLProtocols=http,https --enable-all-security-services -H:+JNI options, which can be added in the same manner as above).

A couple of other things needed active substitutions.

With the “SVM” project the GraalVM provides a way to substitute whole classes or methods during the image build:

<dependency>
	<groupId>org.graalvm.nativeimage</groupId>
	<artifactId>svm</artifactId>
	<version>${native-image-maven-plugin.version}</version>
	<!-- Provided scope as it is only needed for compiling the SVM substitution classes -->
	<scope>provided</scope>
</dependency>

Now we can provide the substitutions like this, hidden away in a package private class such as CustomSubstitutions.java:

import ac.simons.native_story.Service;
import ac.simons.native_story.ServiceImpl;
 
import com.oracle.svm.core.annotate.Substitute;
import com.oracle.svm.core.annotate.TargetClass;
 
@TargetClass(className = "ac.simons.native_story.ServiceFactory")
final class Target_ac_simons_native_story_ServiceFactory {
 
	@Substitute
	private Service getService() {
		return new ServiceImpl();
	}
}
 
@TargetClass(className = "ac.simons.native_story.Application")
final class Target_ac_simons_native_story_Application {
 
	@Substitute
	private static String invokeGreetingFromResource(Service service, String theName) {
 
		return "#" + theName + " on " + service + " should have been called.";
	}
}
 
 
class CustomSubstitutions {
}

The names of the classes don’t matter, the target classes do of course.

With that, -H:ReflectionConfigurationResources=${.}/reflection-config.json can go away (in our case). You can do a lot of stuff in the substitutions. Have a look at what we do in Neo4j Java driver.

The tracing agent

Thanks to Gunnar I learned about GraalVM’s reflection tracing agent. It can discover most of the things described above for you.

Running the only-on-jvm example from the beginning with the agent enabled, it generates the full configuration for us. For this to work, you must of course be running the OpenJDK version of the GraalVM already:

java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02, mixed mode, sharing)
 
java  -agentlib:native-image-agent=config-output-dir=only-on-jvm/target/generated-config -jar only-on-jvm/target/only-on-jvm-1.0-SNAPSHOT.jar
Hello GraalVM from ServiceImpl at 2020-09-16T07:12:27.194185Z
Hello, from a resource.

The result looks like this:

dir only-on-jvm/target/generated-config 
total 32
14417465 0 drwxr-xr-x  6 msimons  staff  192 16 Sep 09:12 .
14396074 0 drwxr-xr-x  8 msimons  staff  256 16 Sep 09:12 ..
14417471 8 -rw-r--r--  1 msimons  staff  278 16 Sep 09:12 jni-config.json
14417468 8 -rw-r--r--  1 msimons  staff    4 16 Sep 09:12 proxy-config.json
14417470 8 -rw-r--r--  1 msimons  staff  226 16 Sep 09:12 reflect-config.json
14417469 8 -rw-r--r--  1 msimons  staff   77 16 Sep 09:12 resource-config.json

Looking into the reflect-config.json we find a less coarse version of what I used above:

[
{
  "name":"ac.simons.native_story.Service",
  "methods":[{"name":"getGreetingFromResource","parameterTypes":[] }]
},
{
  "name":"ac.simons.native_story.ServiceImpl",
  "methods":[{"name":"<init>","parameterTypes":[] }]
}
]

The configuration is in fact complete in my example, as none of the dynamic method calls depend on input. If input varies the method calls, the agent has ways of merging the generated config.
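What “depending on input” means can be sketched as follows. InputDependentCall is a made-up class, not part of the example project. The reflectively called method is chosen by the first program argument, so a single run only exercises one of the possible methods:

```java
import java.lang.reflect.Method;

public class InputDependentCall {

	public static String hello() {
		return "hello";
	}

	public static String goodbye() {
		return "goodbye";
	}

	// Looks up a public static no-arg method by name and invokes it.
	static String call(String methodName) {
		try {
			Method method = InputDependentCall.class.getMethod(methodName);
			return (String) method.invoke(null);
		} catch (ReflectiveOperationException e) {
			throw new IllegalArgumentException("Cannot call " + methodName, e);
		}
	}

	public static void main(String... args) {
		// Without arguments only hello() is ever looked up reflectively.
		System.out.println(call(args.length == 0 ? "hello" : args[0]));
	}
}
```

A run without arguments never touches goodbye, so I would expect a configuration recorded from that run alone to miss it. Such a program needs runs with different inputs before the recorded configuration can be considered complete.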

In any case, the agent is a fantastic tool to get you up and running with a base configuration for your library’s native config.

Quintessence

Without much effort I can make up a framework or program that is not exactly a good fit for a native binary. Of course, the examples here are contrived, but I am pretty sure a couple of things I did here are to be found in many applications still written today.

Also, reflection is used a lot in frameworks like Spring-Core, Hibernate ORM and of course Neo4j-OGM and Spring Data. For DI related frameworks, reflection makes it easy to create injectors and wire dependencies. Object mappers don’t have an idea of what people are gonna throw at them.

Some of the things can be solved very elegantly with compile-time processors and resolve annotations and injections into byte code. This is what Micronaut does for example. Or with prebuilt indexes for domain classes like the Hibernate extensions in Quarkus do.
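To illustrate the general idea of resolving things ahead of time instead of through reflection, here is a hand-written stand-in for what generated code or a prebuilt index might boil down to. It is a sketch under my own assumptions and not how Micronaut or Quarkus actually implement it. All names (ServiceRegistryDemo, Greeter, DefaultGreeter) are made up:

```java
import java.util.Map;
import java.util.function.Supplier;

public class ServiceRegistryDemo {

	interface Greeter {
		String greet(String name);
	}

	static class DefaultGreeter implements Greeter {
		@Override
		public String greet(String name) {
			return "Hello " + name;
		}
	}

	// A fixed index from keys to constructors; every class involved is referenced directly in code.
	private static final Map<String, Supplier<Greeter>> FACTORIES = Map.of("default", DefaultGreeter::new);

	// Creates an instance through the index, without Class.forName or getConstructor.
	static Greeter create(String key) {
		Supplier<Greeter> factory = FACTORIES.get(key);
		if (factory == null) {
			throw new IllegalArgumentException("No greeter registered for " + key);
		}
		return factory.get();
	}

	public static void main(String... args) {
		System.out.println(create("default").greet("GraalVM")); // prints Hello GraalVM
	}
}
```

Since every implementation is reachable through ordinary constructor references, there is nothing left here that a reflection config would have to describe.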

Older frameworks like Spring that also integrate over a lot of other things don’t have that luxury right now.

Either way, the tooling on the framework side is improving a lot. Quarkus has several annotations and config options that generate the appropriate parameters and things as described above and a nice extension mechanism I described here. Spring will provide similar things through the spring-graalvm-native project. For Spring Data Neo4j the hints will probably look similar to this. In the end, those solutions will eventually translate to what I described above.

Also bear in mind that there’s more that needs configuration: I addressed only reflection and resources but not JNI or proxies. There are shims and actuators to make them work as well.

I think that all the tooling around GraalVM native images is great and well documented. However, as you can see in my contrived example, there can be some pitfalls, even with applications that may seem trivial. Just pointing the native-image command against your class or jar file is not enough. Your test scenarios for services running native must be rather strict. If they spot errors, there is a plethora of utilities to help you with edge cases.

If you want to have more information, I really like this talk given at Spring One 2020 by Sébastien Deleuze and Andy Clement called “The Path Towards Spring Boot Native Applications” and I think it has a couple of takeaways that are applicable to other frameworks and applications, too:

In the long run, the work we as library authors put into making these things possible will surely pay off. But the benefit that a native image provides for many scenarios is not a free lunch.

15-Sep-20