Stepping out of my comfort zone: No Rest for the Wicked

“Oh please no, Michael, not sports content here”… Well, sorry, it’s my blog and right now I don’t have a much better place for it. Bear with me, though, there will be some cool database querying later on.

In late 2024 I stumbled upon No Rest for the Wicked 2025. The idea is simple, even though the page describes it in a somewhat complicated way. The challenge is as follows:

Throughout February:

  • Do 24 runs in total
  • Run once each hour of the day
  • Run once on each weekday
  • Each run (or walk / hike) must be at least 5km and at least 45 minutes
  • Take a break of at least 90 minutes between activities

That sounds like “fun”, let’s go:

There’s a race result event that tracks the progress, and we’ll come to that later. At the time of writing I was in the top twenty with one activity to go; I eventually finished eleventh.

The challenge was quite an emotional ride for me, somewhere between “oh my gosh, what did I sign up for?” and “I’ll never manage this”. My favorite hours of the day for running are actually anything between 6 and 10am. Afternoon is ok; evenings I really dislike, especially after long working hours spent sitting. I have a really odd way of putting my feet down while sitting, so in the evening my ankles are always kinda busted. Night running? I had never tried it. As a matter of fact, I try to avoid activities with a high heart rate in the evenings.

So there was a lot going on outside of my comfort zone and I am very content that I did this:

  • I felt good enough to go out running in the middle of the freaking night, several times
  • Running at night was a lot more satisfying than walking; the latter is just too boring alone, and I didn’t want to put on headphones
  • The silence was so nice, and so were the starry nights: a quite unique, deeply satisfying experience
  • Getting up and out is tough beforehand, but doing it is uplifting
  • You are capable of many things you never thought you were
  • I’m happy that I live in a place where I can go out at any time without having to be afraid of anything

Honestly, after two weeks I felt less depressed and down than I usually do in February. In each of the last three years I travelled to the Canaries for a week to get some sun and cycling in, which I couldn’t do this year for personal reasons. This challenge was a great antidote.


The only picture of myself I know of between 3 and 4am

So what does running here have to do with running database queries? Remember that race result event above? I wanted to have a proper overview myself, without manual work, so I created this:


My report of No Rest for the Wicked 2025

Looking at those numbers, would I do things differently? Yeah, I would probably not push back the wee small hours between midnight and 3am to the very end, but do them in the middle, in two nights, instead of spreading them over several days. I wasn’t sure whether I would push through the challenge, and my reasoning was that I didn’t want to fall for sunk costs. Anyhow, how did I create the above report? I went to connect.garmin.com, pulled down my activities as CSV for the past weeks, fired up DuckDB and ran this query:

WITH
  src AS (
    SELECT DISTINCT ON (Datum, Titel)
           *, EXTRACT('hour' FROM Zeit)*60*60 + EXTRACT('minute' FROM Zeit)*60 + EXTRACT('second' FROM Zeit) AS Duration
    FROM read_csv('*.csv', union_by_name=TRUE, filename=FALSE)
    WHERE Titel LIKE 'NRFTW%'
  ),
  activities AS (
    SELECT "Day #"      : dense_rank() OVER (ORDER BY Datum::DATE),
           "Day"        : Datum::DATE,
           "Run #"      : ROW_NUMBER() OVER (),
           "Break"      : age(Datum, COALESCE(date_add(lag(Datum) OVER starts, INTERVAL (lag(Duration) OVER starts) SECOND), Datum))::VARCHAR,
           "Hour of day": HOUR(Datum),
           "Weekday"    : dayname(Datum),
           "Sport"      : CASE Aktivitätstyp
                            WHEN 'Gehen'  THEN 'Walking'
                            WHEN 'Laufen' THEN 'Running'
                            ELSE Aktivitätstyp
                          END,
           "Distance"   : REPLACE(Distanz, ',', '.')::NUMERIC,
           "Duration"   : EXTRACT('hour' FROM Zeit)*60*60 + EXTRACT('minute' FROM Zeit)*60 + EXTRACT('second' FROM Zeit),
           "Progress"   : 100/24,
    FROM src
    WINDOW starts AS (ORDER BY Datum)
  ),
  sums AS (
    SELECT * REPLACE(
               round(SUM(Distance), 2)   AS Distance,
               to_seconds(SUM(Duration)) AS Duration,
               SUM(Progress) OVER dates  AS Progress
           ),
           "p_weekdays" : COUNT(DISTINCT Weekday) OVER (ORDER BY DAY GROUPS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           "Weekdays"   : COUNT(DISTINCT Weekday) OVER dates
    FROM activities
    GROUP BY GROUPING SETS (("Day #", "Run #", DAY, "Hour of day", Progress, Weekday, Break, Sport), ("Day #", DAY), ())
    WINDOW dates AS (ORDER BY DAY)
  )
SELECT * EXCLUDE(p_weekdays) REPLACE (
           Break::INTERVAL AS Break,
           CASE WHEN "Hour of day" IS NULL AND p_weekdays <> 7 AND Weekdays = 7 THEN '✅' END AS Weekdays,
           CASE
             WHEN DAY           IS NULL THEN lpad(printf('%.2f%%', Progress), 7, ' ')
             WHEN "Hour of day" IS NULL THEN lpad(printf('%.2f%%', Progress), 7, ' ') || ' ' || bar(Progress, 0, 100, 20) END AS Progress
         )
FROM sums
ORDER BY DAY, "Hour of day" NULLS LAST;

It’s so funny how much value one can get out of basically three columns alone. I am really working only with the start date, the duration and the distance; everything else is a derived value. Things to note: the progress is per day and not per run (the latter would always be 1/24 of course), the checkmark when I have hit all the weekdays, and the summary of distance and duration per day and overall. The nice formatting of the break time is a sweet bonus (using the age function). The DuckDB team just released 1.2.0 and put out a couple of blog posts that I really dig and that I was able to utilize in my script:

  • Vertical Stacking as the Relational Model Intended: UNION ALL BY NAME Great blog, great feature. I used it to union all the CSV files (they had different columns, because some of them included cycling watts and some didn’t; see FROM read_csv('*.csv', union_by_name=TRUE, filename=FALSE) in the query)
  • Catching up with Windowing Again, great content and I am a bit jealous because I think it’s more complete than the explanation in our book. I used the GROUPS framing to figure out when I hit all the weekdays
  • Announcing DuckDB 1.2.0 New features in 1.2.0. I am mostly using the alternative projection syntax “X: U” compared to “U AS X” in the inner projection. It just reads better with long expressions.

Apart from that, the query demonstrates once more the power of window functions, computing for example the break between activities as well as the day and activity numbers. It also makes extensive use of the asterisk-expression additions EXCLUDE and REPLACE, which avoid juggling all the columns all over again. Also, take note of the GROUP BY with GROUPING SETS, which is the longer form of a GROUP BY ROLLUP and gives you control over the sets. The last bit is the sweet DISTINCT ON: I can just add more CSV files, and it deduplicates not on all columns, but only on date and title.
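If you haven’t used those last two features much, here is a minimal, self-contained sketch on toy data (not the actual Garmin export) showing how DISTINCT ON deduplicates on a subset of columns and how GROUPING SETS spells out explicitly what a ROLLUP would generate:

-- Toy data standing in for the Garmin CSV export
CREATE TABLE runs (ride_date DATE, title VARCHAR, km NUMERIC);
INSERT INTO runs VALUES
  ('2025-02-01', 'NRFTW 1', 5.2),
  ('2025-02-01', 'NRFTW 1', 5.2), -- duplicate row, e.g. from downloading the same CSV twice
  ('2025-02-02', 'NRFTW 2', 6.0);

-- DISTINCT ON keeps one row per (ride_date, title), regardless of the other columns
SELECT DISTINCT ON (ride_date, title) * FROM runs;

-- GROUPING SETS is the explicit form of ROLLUP(ride_date):
-- one row per date plus the grand total produced by the empty set ()
SELECT ride_date, round(SUM(km), 2) AS km
FROM (SELECT DISTINCT ON (ride_date, title) * FROM runs)
GROUP BY GROUPING SETS ((ride_date), ());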

This is what I like running: challenges and algorithms, locally, on my personal data. Those small and “crazy” challenges are so much nicer than the super commercialised side of running you find with the six-star marathons or super halves. And the same applies to reports like the one I created here, for myself and, through FLOSS and actually sharing how things are done, for others too. For me, being able to do this locally without any big vendor analytics is empowering and brings back a lot of joy that is missing today in many settings.

Challenge accepted and put to bed with walking and running roughly 175km in 24 hours total.

| Comments (0) »

14-Feb-25


Let’s deadlock all the things.

Hej there, yes it’s me, I’m still alive and doing things, such as neo4j-jdbc with its automatic SQL to Cypher translation, or building things like this for my own enjoyment. However, with the tons of meaningless posts and stuff around “Hey ma, look how I do AI with $tool”, I felt a bit out of time these days, just enjoying artisanal coding, requirements analysis and architectural work, and didn’t have much incentive to write more prose. And giving the rest out for AI to swallow and regurgitate, well… 🤷

In the last days, however, a rookie mistake I made gave me some food for thought, and I write this post for anyone new to the Java ecosystem or to programming in general: regardless of how many years of experience you have acquired or how good you think you are, you will always make some stupid mistakes such as the following. Hopefully, you’re as lucky as me and find yourself surrounded by colleagues who help you roll out a fix. And yes, the blast radius of something that runs in $cloud is bigger than that of a bug in a client-side library, especially with the slow update cycles in many industries.

Enough talking, let’s go. I was refactoring some code to not use Java’s HttpURLConnection of old anymore. Here’s basically how it looked:

import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
 
class S1 {
  public static void main(String... a) throws Exception {
    var uri = URI.create("http://randomapi.io");
    var body = "Hello";
 
    // Nice, I get an output stream I can just write to, yeah…
    var connection = (HttpURLConnection) uri.toURL().openConnection();
    connection.setDoOutput(true);
    connection.setRequestMethod("POST");
    try (var out = connection.getOutputStream()) {
      // Reality: objectMapper.writeValue(out, Map.of("some", "content"));
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    connection.connect();
  }
}

This works, but the URL connection is not a great API to work with. Also, it does not support HTTP/2 out of the box (or rather, not at all). The recommended replacement is the Java HttpClient. The same behaviour as above, but with the HttpClient, looks like this; it defaults to HTTP/2:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.concurrent.Executors;
 
class S2 {
  public static void main(String... a) throws Exception {
    var uri = URI.create("http://randomapi.io");
    var body = "Hello";
 
    try (var client = HttpClient.newBuilder().executor(Executors.newVirtualThreadPerTaskExecutor()).build()) {
      var response = client
        .send(
          HttpRequest.newBuilder(uri).POST(BodyPublishers.ofString(body)).build(),
          BodyHandlers.ofString())
        .body();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

The request body is prepared by BodyPublishers.ofString(body). There are quite a few other publishers, creating bodies from byte arrays, files, reactive streams and more. What does not exist is a way to just get an output stream from the HTTP client that one can write to directly, like in the above example using the URL connection. So, assuming I am a Jackson user and I want to write some rather large JSON as the request body, I would need to materialise this as a byte array in memory or deal with the shenanigans of ByteBuffer, which I really didn’t feel like. Instead, I thought, hey, there’s the pair of PipedInputStream and PipedOutputStream, let’s use a body publisher from an input stream supplier like the following:

var publisher = HttpRequest.BodyPublishers.ofInputStream(() -> {
  var in = new PipedInputStream();
  try (var out = new PipedOutputStream(in)) {
    outputStreamConsumer.accept(out);
  } catch (IOException e) {
    throw new UncheckedIOException(e);
  }
  return in;
});

Easy, right? While the pair is made exactly for the above purpose, this usage is 100% wrong, and the JavaDoc is quite explicit about it:

Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread. The piped input stream contains a buffer, decoupling read operations from write operations, within limits.

I did actually read this, set it up like above nevertheless, tested it, and it “worked”. I thought, yeah, the HttpClient is asynchronous under the hood anyway, let’s go. Stupid me: I used its blocking API, and of course reading and writing occurred on the same thread. Why did it not show up in tests? Rookie mistake number two: not testing proper extremes, like 0, -1, Integer.MAX_VALUE, a lizard and null. “Within limits” means: there’s a default buffer of 1024 bytes, and any request body bigger than that, coming from wherever, would blow up.

Do this often enough, and you have a procedure that happily drains all the available threads from a system by deadlocking them, congratulations.

What would have prevented that? Taking a step down from mount stupid to begin with. A second pair of eyes. Better testing. Would AI have helped me here? I am unsure. Maybe in generating test input. On API usage: most likely not. It’s all correct, and Java has no way in the language to make me aware of it. Maybe an IDE with a proper base model for semantic analysis could spot it; IntelliJ does not.

How did I solve it in the end? I was not happy using a byte[] array, so I really wanted to keep that pipe, and a working solution looks like this:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
 
class S3 {
  public static void main(String... a) throws Exception {
    var uri = URI.create("http://randomapi.io");
    var body = "Hello";
 
    try (var client = HttpClient.newBuilder().build()) {
      var bodyPublisher = BodyPublishers.ofInputStream(() -> {
        var in = new PipedInputStream();
        var out = new PipedOutputStream();
        try {
          out.connect(in);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
        Thread.ofVirtual().start(() -> {
          try (out) {
            // Here the stream can be passed to Jackson or whatever you have 
            // that's generating output
            out.write(body.getBytes(StandardCharsets.UTF_8));
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
        return in;
      });
      var response = client
        .send(
          HttpRequest.newBuilder(uri).POST(bodyPublisher).build(),
          BodyHandlers.ofString()
        ).body();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

The important pieces here are:

  • Create the input and output streams independently
  • Connect them outside both the reading and writing thread
  • Kick off the writing thread, and since we are on modern Java, just use a virtual thread for it. In the end, they exist exactly for that purpose: to be used when something potentially blocking is happening
  • Just return the input stream to the HTTP Client, it will take care of using and later closing it.
  • Close the output stream inside the writing thread (note the try-with-resources statement that “imports” the stream)

A runnable solution is below. I am using an implicitly declared class with a plain main method here, so that I can have a proper Java script without the boilerplate of old. It brings up an HTTP server for testing as well, which just mirrors its input. It then uses both methods described above to POST to that server, gets the response and asserts it. Use Java 24 to run it with java -ea --enable-preview Complete.java.

Fun was had the last days.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.concurrent.Executors;
 
import com.sun.net.httpserver.HttpServer;
 
void main() throws IOException {
  var port = 4711;
  var server = HttpServer.create(new InetSocketAddress(port), 0);
  try {
    server.createContext("/mirror", exchange -> {
      byte[] response;
      try (var in = new BufferedInputStream(exchange.getRequestBody())) {
        response = in.readAllBytes();
      }
 
      exchange.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.length);
      var outputStream = exchange.getResponseBody();
      outputStream.write(response);
      outputStream.flush();
      outputStream.close();
    });
    server.setExecutor(null);
    server.start();
 
    var secureRandom = new SecureRandom();
    var buffer = new byte[(int) Math.pow(2, 16)];
    secureRandom.nextBytes(buffer);
    var requestBody = Base64.getUrlEncoder().withoutPadding().encodeToString(buffer);
 
    var uri = URI.create("http://localhost:%d/mirror".formatted(port));
 
    var responseBody = useUrlConnection(uri, requestBody);
    assert responseBody.equals(requestBody);
    responseBody = useHttpClient(uri, requestBody);
    assert responseBody.equals(requestBody);
  } finally {
    server.stop(0);
  }
}
 
private static String useUrlConnection(URI uri, String body) throws IOException {
  var urlConnection = (HttpURLConnection) uri.toURL().openConnection();
  urlConnection.setDoOutput(true);
  urlConnection.setRequestMethod("POST");
  try (var out = urlConnection.getOutputStream()) {
    out.write(body.getBytes(StandardCharsets.UTF_8));
  }
  urlConnection.connect();
  try (var in = new BufferedInputStream(urlConnection.getInputStream())) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
  }
}
 
private static String useHttpClient(URI uri, String body) throws IOException {
  try (var client = HttpClient.newBuilder().executor(Executors.newVirtualThreadPerTaskExecutor()).build()) {
    return client.send(HttpRequest.newBuilder(uri).POST(
      BodyPublishers.ofInputStream(() -> {
        var in = new PipedInputStream();
        //noinspection resource
        var out = new PipedOutputStream();
        try {
          out.connect(in);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
        Thread.ofVirtual().start(() -> {
          try (out) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
        return in;
      })).build(), BodyHandlers.ofString()).body();
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new IOException(e.getMessage());
  }
}

| Comments (1) »

05-Feb-25


Some thoughts about user-defined functions (UDFs) in databases

Last week I came across a Twitter thread about using stored procedures, or in other words, user-defined functions in databases.

Every myth about them was already debunked some 13 years ago in this excellent post: Mythbusters: Stored Procedures Edition. I can completely get behind it, but I would also like to add some personal thoughts and share some memories:

In the past I worked for years, in fact over a decade, on big systems in the utilities sector. IHL evolved from an Oracle Forms 6i application into a Java desktop application and, from what I hear, into a web application over the last 24 years. While the clients changed, the database model stayed pretty consistent through the years (a relational model, actually, and maybe you understand why the paper What comes around by Andrew Pavlo and Michael Stonebraker resonates with me, even though I work at Neo4j these days).
Even in the first decade of the 2000s we followed the Pink database paradigm, even though we didn’t call it that. A lot of our API schema was enriched or even defined by various stored procedures, written in PL/SQL and *gosh* Java (yes, the Oracle database could do this even back in 2010; these days, powered by GraalVM, you can even run proper JavaScript in the thing). Why was this useful? We had a lot of complicated computations, such as “if that power pole falls, in what radius will it fall, where will the lines end, does it kill a cow or hit a kindergarten?”, and we could use the result straight in a view without first having to grab all the necessary data, compute, and write back… Furthermore, it allowed us to move quite quickly all the time: instead of rolling out new client versions all the time, we could just roll out improved PL/SQL code, without any downtime.

The thing I find most funny is that one of the loudest arguments against stored procedures is “but you cannot test them…”: of course you can. But, and this is my main point, I would bet money that a fair share of the people who bring up this argument are the same ones who don’t write tests anyway, because $reasons, such as “my manager does not allow it”, “no time”, etc…

Of course, if you have fallen for the “we need to rewrite our app every year, replace all the frameworks each quarter” mindset, then neither my ranting, the myth buster above nor the linked paper will convince you that there might be a different, more stable solution, and I can keep on using this slide more often in the future.

Anyway, this week I realized that the 1.x releases of DuckDB also support UDFs, or macros as they call them, and I was like sh*t, we don’t have that in the book… Maybe a good opportunity to set a reminder for a 2nd edition, if we ever sell that many… Nevertheless, I needed to try this. And what better target than my sports page, which received its 3rd major overhaul since 2014: biking.michael-simons.eu/. I wanted to add my age group to my favorite achievements. In Germany there are some halfway complicated rules to compute it, and I did so in a stored procedure. That diff creates the function and calls it in a view, which defines the API; the only thing I had to touch in the app is the actual view layer, as the client code is just one statement that couldn’t be simpler: con.execute('FROM v_reoccurring_events').fetchall().
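The macro itself is roughly in this spirit; a minimal sketch with a made-up events table and a deliberately simplified age-group rule, not the actual code from the diff linked above:

-- Hypothetical, simplified age-group rule; the real German rules are more involved
CREATE OR REPLACE MACRO age_group(birth_year, event_year) AS
  CASE
    WHEN event_year - birth_year < 30 THEN 'Open'
    ELSE 'M' || CAST((event_year - birth_year) // 5 * 5 AS VARCHAR)
  END;

-- The macro is then used straight inside the view that forms the API towards the application
-- (table name and birth year are placeholders)
CREATE OR REPLACE VIEW v_reoccurring_events AS
SELECT title, event_year, age_group(1979, event_year) AS age_group
FROM events;

With the macro and the view in place, I can now flex a bit like this: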



What about Neo4j? I am so glad you asked: of course we offer a framework to enhance our query language Cypher with user-defined procedures, and I happen to own the example repository, neo4j-procedure-template, which includes a proper test setup (btw, one of the *rare* cases in which I would actually *not* use Testcontainers but our dedicated testing framework, because the overall turnaround is faster: you skip the repackaging).

For Neo4j, enhanceability is key: we do evolve Cypher (and GQL) with great care, but often not as fast as a customer, the necessities of a deal or a support case might require. Neo4j APOC started as a set of curated functions and procedures, and we have promoted it ever since. From Neo4j 5 onwards, parts of it even became something like a standard library, delivered with the product.

Is there something I personally would add? Actually, yes: we only have compiled procedures at the moment, written in any JVM language. This is ok to some extent, but I’d love to see an enhancement here so that I could *just* do a create or replace function in our database as well. If it were any of the seasonal holidays that bring gifts, maybe even in something other than Java. A while back I started an experiment using GraalVM for that, Neo4j polyglot stored procedures, and I would love to revive it at some point.

My little rant does not even come close to the two I read last month (1, 2), but I hope you enjoyed it nevertheless. If you take one thing away: there’s always more than one solution to a problem. Making better use of a database product you most likely pay for anyway is a proper good idea. Be it optimizing your model (there are probably a ton of books on SQL, fewer for Neo4j, hence I can really recommend The definitive Guide to Neo4j by my friends Christophe and Luanne, which has excellent content on modeling), understanding its query language and capabilities better, or enhancing it to your needs: it will be beneficial in the long run.

To quote one of the rants above: Happy Apathetic coding and don’t forget to step outside for a proper run sometime.

Title picture by Jan Antonin Kolar on Unsplash.

| Comments (1) »

06-Jul-24


Run your integration tests against Testcontainers with GraalVM native image

I have started working with a small team on an exciting project at Neo4j. The project is about database connectivity (what else) and we use Testcontainers in our integration tests, asserting the actual network connectivity and eventually the API.

The thing we are creating should of course also work when being compiled as part of an application into a native executable by GraalVM. For a bunch of older projects with a somewhat convoluted test-setup I used to create dedicated, small applications that produce some output. I compiled these apps into a native binary and used a scripted run to eventually assert the output.

For this new project I wanted to take the opportunity to use the Maven plugin for GraalVM Native Image building and its test capabilities directly.
The plugin works great and the maintainers are, like the whole GraalVM team, quite quick at fixing any issue. We already use it on several occasions to produce native binaries as part of a distribution, but so far not for testing.

I personally find the documentation linked above not sufficient to create a proper test setup. Especially the section “Testing support” does not work for me: neither the latest Surefire nor Failsafe plugins bring in the required dependency org.junit.platform:junit-platform-launcher. This artifact contains a test execution listener, org.junit.platform.launcher.listeners.UniqueIdTrackingListener, that tracks each executed test and stores its unique id in a file. The GraalVM plugin will use that file to discover the tests it needs to run in native mode. If the file is not generated, it will yell at you with

[ERROR] Test configuration file wasn’t found. Make sure that test execution wasn’t skipped.

I can’t share the thing I am actually testing right now, so here’s something similar. The code below uses the Neo4j Java Driver and creates a data access object interacting with the Neo4j database:

package demo;
 
import java.util.Map;
import java.util.stream.Collectors;
 
import org.neo4j.driver.Driver;
 
public record Movie(String id, String title) {
 
	public static final class Repository {
 
		private final Driver driver;
 
		public Repository(Driver driver) {
			this.driver = driver;
		}
 
		public Movie createOrUpdate(String title) {
			return this.driver.executableQuery("MERGE (n:Movie {title: $title}) RETURN n")
				.withParameters(Map.of("title", title))
				.execute(Collectors.mapping(r -> {
					var node = r.get("n").asNode();
					return new Movie(node.elementId(), node.get("title").asString());
				}, Collectors.toList()))
				.stream()
				.findFirst()
				.orElseThrow();
		}
 
	}
}

and the integration test looks like this. There are no surprises in it: it is disabled without Docker support, uses a per-class lifecycle so that I can keep a reusable test container around for all tests, and contains a sample test:

package demo;
 
import static org.junit.jupiter.api.Assertions.assertNotNull;
 
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.testcontainers.containers.Neo4jContainer;
import org.testcontainers.junit.jupiter.Testcontainers;
 
@Testcontainers(disabledWithoutDocker = true)
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
class RepositoryIT {
 
	protected final Neo4jContainer<?> neo4j = new Neo4jContainer<>("neo4j:5.13.0")
		.waitingFor(Neo4jContainer.WAIT_FOR_BOLT)
		.withReuse(true);
 
	protected Driver driver;
 
	@BeforeAll
	void startNeo4j() {
		this.neo4j.start();
		this.driver = GraphDatabase.driver(this.neo4j.getBoltUrl(),
				AuthTokens.basic("neo4j", this.neo4j.getAdminPassword()));
	}
 
	@AfterAll
	void closeDriver() {
		if (this.driver == null) {
			return;
		}
		this.driver.close();
	}
 
	@Test
	void repositoryShouldWork() {
 
		var repository = new Movie.Repository(driver);
		var newMovie = repository.createOrUpdate("Event Horizon");
		assertNotNull(newMovie.id());
	}
 
}

Now, let’s walk through the pom.xml. For my dependencies, I usually check whether they have a BOM file and import those into dependency management, so that I can rely on other projects organising their releases and transitive dependencies properly. Here I have the latest JUnit and Testcontainers plus SLF4J. From the latter I’ll only use the simple logger later, so that I can see the Testcontainers logging:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.junit</groupId>
            <artifactId>junit-bom</artifactId>
            <version>5.10.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-bom</artifactId>
            <version>2.0.9</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers-bom</artifactId>
            <version>1.19.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

The relevant test dependencies then look like this. I kinda grouped them together, but please note how I include the JUnit launcher mentioned above explicitly. It is not part of the JUnit core dependencies and, at least for me, was not brought in transitively by either Failsafe or Surefire:

<dependencies>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.platform</groupId>
        <artifactId>junit-platform-launcher</artifactId>
        <scope>test</scope>
    </dependency>
 
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>neo4j</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

“But Michael, didn’t you think about the fact that you usually won’t get the latest Failsafe and Surefire plugins with default Maven?” Of course I did. This is how I configure Failsafe. Take note how I set a (Java) system property for the integration tests. While the UniqueIdTrackingListener is on the class path, it is disabled by default and must be enabled with the property below (yes, I did read the sources for that). The rest is just the usual dance for setting up integration tests:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-failsafe-plugin</artifactId>
    <version>3.2.1</version>
    <configuration>
        <systemPropertyVariables>
            <junit.platform.listeners.uid.tracking.enabled>true</junit.platform.listeners.uid.tracking.enabled>
        </systemPropertyVariables>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Now onto the GraalVM native-maven-plugin. I usually wrap this into a dedicated profile to be activated with a system property, like in the listing below. The documentation says that one must use <extensions>true</extensions> as part of the configuration in order to use the recommended JUnit Platform test listener mode, but that didn’t work for me. I guess it should in theory avoid having to set the above system property.

The next important part in the listing, at least when you want to use Testcontainers, is enabling the GraalVM Reachability Metadata Repository. This repository contains the required configuration shims for quite a number of libraries, including Testcontainers. If you don’t enable it, Testcontainers won’t work in native mode:

<profile>
    <id>native-image</id>
    <activation>
        <property>
            <name>native</name>
        </property>
    </activation>
    <build>
        <plugins>
            <plugin>
                <groupId>org.graalvm.buildtools</groupId>
                <artifactId>native-maven-plugin</artifactId>
                <version>0.9.28</version>
                <extensions>true</extensions>
                <configuration>
                    <metadataRepository>
                        <enabled>true</enabled>
                    </metadataRepository>
                </configuration>
                <executions>
                    <execution>
                        <id>test-native</id>
                        <goals>
                            <goal>test</goal>
                        </goals>
                        <phase>verify</phase>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</profile>

One cannot praise the people from VMware’s Spring and Spring Boot teams enough for bringing their knowledge of so many libraries into that repository.

With that in place, you can run your integration tests on the JVM and as a native image like this:

mvn -Dnative verify

You first see the usual dance of integration tests, then the GraalVM compilation

[1/8] Initializing...                                                                                    (5,9s @ 0,25GB)
 Java version: 17.0.9+11-LTS, vendor version: Oracle GraalVM 17.0.9+11.1
 Graal compiler: optimization level: 2, target machine: armv8-a, PGO: off
 C compiler: cc (apple, arm64, 15.0.0)
 Garbage collector: Serial GC (max heap size: 80% of RAM)
 1 user-specific feature(s)
 - org.graalvm.junit.platform.JUnitPlatformFeature
[junit-platform-native] Running in 'test discovery' mode. Note that this is a fallback mode.
[2/8] Performing analysis...  [*****]                                                                   (21,0s @ 1,48GB)
  13.384 (87,45%) of 15.304 types reachable
  22.808 (62,52%) of 36.483 fields reachable
  73.115 (60,68%) of 120.500 methods reachable
   4.250 types, 1.370 fields, and 3.463 methods registered for reflection
      99 types,   102 fields, and   102 methods registered for JNI access
       5 native libraries: -framework CoreServices, -framework Foundation, dl, pthread, z
[3/8] Building universe...                                                                               (2,6s @ 1,40GB)
[4/8] Parsing methods...      [**]                                                                       (2,6s @ 1,62GB)
[5/8] Inlining methods...     [***]                                                                      (1,4s @ 1,49GB)
[6/8] Compiling methods...    [******]                                                                  (37,1s @ 2,99GB)
[7/8] Layouting methods...    [**]                                                                       (4,0s @ 4,12GB)
[8/8] Creating image...       [**]                                                                       (4,4s @ 1,49GB)
  37,40MB (58,34%) for code area:    40.731 compilation units
  25,78MB (40,21%) for image heap:  338.915 objects and 80 resources
 951,36kB ( 1,45%) for other data
  64,11MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                                Top 10 object types in image heap:
  14,74MB java.base                                            7,80MB byte[] for code metadata
   5,94MB testcontainers-1.19.1.jar                            3,21MB byte[] for java.lang.String
   3,85MB java.xml                                             2,46MB java.lang.String
   3,79MB svm.jar (Native Image)                               2,44MB byte[] for general heap data
   1,12MB neo4j-java-driver-5.13.0.jar                         2,39MB java.lang.Class
   1,08MB netty-buffer-4.1.99.Final.jar                        1,50MB byte[] for embedded resources
 938,35kB docker-java-transport-zerodep-3.3.3.jar            879,17kB byte[] for reflection metadata
 683,00kB netty-transport-4.1.99.Final.jar                   627,38kB com.oracle.svm.core.hub.DynamicHubCompanion
 655,80kB netty-common-4.1.99.Final.jar                      438,19kB java.util.HashMap$Node
 490,98kB jna-5.12.1.jar                                     384,69kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
   3,95MB for 58 more packages                                 3,40MB for 2755 more object types
------------------------------------------------------------------------------------------------------------------------

And shortly after that:

[main] INFO org.testcontainers.DockerClientFactory - Checking the system...
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker server version should be at least 1.6.0
[main] INFO tc.neo4j:5.13.0 - Creating container for image: neo4j:5.13.0
[main] INFO tc.neo4j:5.13.0 - Reusing container with ID: 56cc1b02f9b0ebfcc8670f5cdc54b5b3a85a4720e8d810548986369a557482ca and hash: aa81fad313c4f8e37e5b14246fe863c7dbc26db6
[main] INFO tc.neo4j:5.13.0 - Reusing existing container (56cc1b02f9b0ebfcc8670f5cdc54b5b3a85a4720e8d810548986369a557482ca) and not creating a new one
[main] INFO tc.neo4j:5.13.0 - Container neo4j:5.13.0 started in PT0.280264S
demo.RepositoryIT > repositoryShouldWork() SUCCESSFUL
 
 
Test run finished after 608 ms
[         2 containers found      ]
[         0 containers skipped    ]
[         2 containers started    ]
[         0 containers aborted    ]
[         2 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

What I like here is the fact that I don’t have anything special in my test classes, no weird hierarchies nor any additional annotations. In the project we are working on, we keep all our integration tests in a separate Maven module, as we want to make sure we are testing the packaged jar proper (also because we have integration tests for both the Java class path and the module path in separate Maven modules). This setup now gives us the additional advantage that the packaging of our library is tested under native image, too. It will let you discover issues such as missing resources in the native image as well.

Anyway, the whole project is shared as a gist; it’s only three files.

The image of this post was generated with Dall-E by my friend Michael.

| Comments (1) »

25-Oct-23


Why would a Neo4j person be so fond of an embedded, relational database?

I have been working for Neo4j since 2018. At Neo4j I maintain both Spring Data Neo4j and Neo4j-OGM, object mappers and entity managers for our database product. This is a great job in a great company, with awesome colleagues such as my friends Gerrit and Michael.

Some other projects I created on the job are the Cypher-DSL, a builder for our query language Cypher, which is used extensively inside Spring Data Neo4j, in products of our partner GraphAware and by a whole bunch of other customers. Headed for its 100th star is Neo4j-Migrations, a database migration toolkit. Last but not least, I created the original Neo4j Testcontainers module and was recognized as a Testcontainers Champion for that.

However, in a previous lifetime I got “socialized” in an SQL shop (actually, an Oracle shop), and while I swore more than 20 years ago during my studies that I would never do anything with SQL, there I was. For whatever reason, my head actually works pretty well with the relational model and the questions I can answer with it. I spent about 15 years in that company doing all kinds of things, such as geospatial applications, applications computing energy forecasts based on past measurements, and stuff like that. What all of that had in common was SQL.

Just a couple of months prior to joining Neo4j, I did a talk under the title Live with your SQL-fetish and choose the right tool for the job, in which I presented jOOQ, way back before it became the hot topic. Anyhow, a lot of the dataset from that talk I eventually used in Neo4j-specific talks too, taking my musical habits into a knowledge graph.

What I am trying to say here is: I am deeply involved in the Graph ecosystem, we have a great product and tools, and I’m glad I have a part in those. But I also think that other query languages have their benefits and also that other projects and companies are doing great work, too.

So, DuckDB, what is it? DuckDB is an “in-process SQL OLAP database management system”, and you can go ahead and try it out in your browser, because it can be compiled to WASM.

DuckDB’s query engine is vector based. Every bit of data that flows through it is a collection of vectors on which algorithms can often be applied in parallel; see Execution Format. That’s the groundwork for DuckDB’s fast, analytical queries. From Why DuckDB: “Online analytical processing (OLAP) workloads are characterized by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables.” In its vectorized query execution engine, queries are interpreted and processed in large batches of values in one operation. DuckDB can also query foreign stores, such as Postgres and SQLite, and there are scenarios in which its engine is actually faster while doing so than native Postgres.
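As a rough illustration of querying a foreign store, this is what attaching a Postgres database can look like with a recent DuckDB and its postgres extension; the connection string and table name are made up:

INSTALL postgres;
LOAD postgres;

-- Attach a running Postgres instance read-only and query it straight from DuckDB
ATTACH 'dbname=shop user=postgres host=127.0.0.1' AS pg (TYPE postgres, READ_ONLY);
SELECT count(*) FROM pg.public.orders;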

When going through their SQL documentation, you will be surprised how much you get essentially for free in one small executable. It’s all there: CTEs, window functions, ASOF joins, PIVOT and many things that make SQL friendlier. Back then we had to run a big Oracle installation for similar things.
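ASOF joins are a good example of something I would previously have emulated with window functions and a self-join; here is a small sketch on two toy tables that matches every ride with the most recent weather reading at or before its start:

CREATE TABLE rides   (started_at TIMESTAMP, km NUMERIC);
CREATE TABLE weather (measured_at TIMESTAMP, temperature NUMERIC);

-- For each ride, pick the latest weather row with measured_at <= started_at
SELECT r.started_at, r.km, w.temperature
FROM rides r
ASOF JOIN weather w ON r.started_at >= w.measured_at;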

If you follow me on social media you might have noticed that my private focus has shifted over the last years; I have been doing a lot of sport and training, and fewer side projects that involve any big setups. This is where a small tool that runs without a server installation comes in super handy: I was able to define a neat schema for my photovoltaic systems with a bunch of views and now have a good-enough dashboard.

My biking page has been remodeled to be a static page these days but uses a DuckDB database beneath, see biking.michael-simons.eu.

At the moment, I neither need nor want more. And as sad as it might be, I don’t have a graph problem in either of those applications. I want to aggregate measurements, do analytics, and that’s it. It’s a lot of time series that I’m working with, and graph doesn’t really help me there.

Those are use cases that are unrelated to work.

But I often do have use for DuckDB at work, too. Neo4j can natively ingest CSV files. You need to write some Cypher to massage them into the graph you want, but still: that CSV must be properly formatted.

DuckDB, on the other hand, can read CSV, Parquet, JSON and other formats as if they were tables. These sources can be files or URLs. It does not require you to create a schema and persist the data in its own store first; it can just query things. Querying data without persisting it might be unusual for a database and seems counter-intuitive at first, but it is useful in the right situations. DuckDB does not need to persist the content, but you are free to create views over it. You can happily join a CSV file with a JSON file and literally copy the result to a new CSV file that is well suited for Neo4j.
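A sketch of that workflow, with made-up file and column names:

-- Join a CSV file with a JSON file in place and write a Neo4j-friendly CSV
COPY (
  SELECT p.name, p.born, m.title
  FROM read_csv_auto('people.csv') p
  JOIN read_json_auto('movies.json') m ON m.director = p.name
) TO 'movies_and_directors.csv' (HEADER, DELIMITER ',');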

This can all be done either in the CLI or as part of a pipeline in a script.

DuckDB replaced a bunch of tools in my daily usage, such as xsv and, to some extent, jq. DuckDB’s JSON processing capabilities allow you to query and normalize many complex and denormalized JSON documents, but in some cases it’s not enough… It’s just that you can do so many “interesting” things with JSON 😉

Anyhow, a long read, but maybe an explanation of why you have seen so many toots by me dealing with DuckDB, or even my profile at DuckDB snippets.

I think it’s valuable to know other technologies and to master more than one (query) language. I value both my Cypher knowledge and everything I learned about SQL in the past. Hopefully, I can teach a bit about both, with my current work and any future publications.

And last but not least: Mark Needham, Michael Hunger and I have worked together to bring out “DuckDB in Action”. The Manning Early Access Program (MEAP) started in October 2023 and the book will be released in 2024. We have more than 50% of the content ready and we would love to hear your feedback:

| Comments (2) »

05-Oct-23