Some thoughts about user defined functions (UDF) in databases

Everything has been said, but not by me and some might call this post a rant
July 6, 2024 by Michael

Last week I came across a Twitter thread about using stored procedures or in other words, user defined functions in databases.

Every myth about them has already been debunked like 13 years ago in this excellent post: Mythbusters: Stored Procedures Edition. I can completely get behind it, but also like some personal thoughts and share some memories:

In the past I worked for years, in case over a decade, on big systems in the utilities sector. IHL evolved from an Oracle Forms 6i application into a Java desktop application and from what I hear, into a web application in the last 24 years. While the clients changed, the database model was pretty consistent through the years (A relational model, actually, and maybe you understand why this paper under the title What comes around by Andrew Pavlo and Michael Stonebreaker resonates with me, even though I work at Neo4j these days).
Even in first decade of the 2000s we followed the Pink database paradigm, even though we didn’t call it that way. A lot of our API schema was enriched or even defined by various stored procedures, written in PL/SQL and *gosh* Java (yes, Oracle database could do this even back in 2010, these days powered by GraalVM, you can even run JavaScript proper in the thing). Why was this useful? We could have a lot of complicated computations, such as “if that power rod drops, in what radius will it fall, where will the lines end, does it kill a cow or hit a Kindergarten?”, and use the result straight in a view without having first grab all the necessary data, compute, write back… Furthermore it allowed us to move quite quickly all the time: Instead of rolling out new client versions all the time, we could just role out improved PL/SQL code, without any downtime.

The thing I find most funny is that one of the loudest argument against stored procedures is “but you cannot test them…”: Of course you can. But, my main point: I would bet money on it, that the people who bring this argument are by some share the same who don’t write tests, because $reasons, such as “my manager does not allow”, “no time”, etc…

Of course, when you fell for the “we need to rewrite our app every year, replace all the frameworks each quarter”, than none of my ranting, the myth buster above or the paper linked will convince you otherwise, that are might be a different, more stable solution and I can keep on using this slide more often in the future.

Anyway, in this week I realized that the 1.x release of DuckDB also supports UDFs, or Macros as they call them and I was like sh*t, we don’t have it in the book… Maybe a good opportunity to set a reminder for a 2nd edition, if we ever sell that many… Nevertheless, I needed to try this… And what is a better goal than my sports page, that received its 3rd major overhaul since 2014: biking.michael-simons.eu/. I wanted to add my age group to my favorite achievements. In Germany, there are some halfway complicated rules to compute it, and I did so in a stored procedure. That diff creates the function, calls it in a view (which defines the API, and the only thing I have to touch in the app is the actual view layer, as the actual client code is just one statement that couldn’t be simpler (con.execute('FROM v_reoccurring_events').fetchall()) and I can now flex a bit like this:



What about Neo4j? I am so glad you asked: Of course we offer a framework to enhance our query language Cypher, with User-defined procedures and I happen to own the example repository: neo4j-procedure-template, which includes a proper test-setup (btw, one of the *rare* cases, I would actually *not* use Testcontainers, but our dedicated testing framework, because the overall turnaround will be faster (you skip the repackaging)).

For Neo4j enhance ability is key: We do evolve Cypher (and GQL) with great care, but often not as fast as a customer, the necessities of a deal or a support case might require. Neo4j APOC started as a set of curated functions and procedures and we promoted it ever since. From Neo4j 5 onwards, it even became in parts as much as a standard library, delivered with the project.

Is there something I’d personally would add? Actually, yes: We only have compiled procedures at the moment, written in any JVM language. This is ok to some extend, but I’d love to see an enhancement here that I could *just* do a create or replace function in our database as well. If it would be any of the seasonal holidays that bring gifts, maybe even in something else than Java. A while back I started an experiment using GraalVM for that as Neo4j polyglot stored procedures and I would love to revive that at some point.

My little rant does not even come close to these two I read last month (1, 2), but I hope you enjoyed it nevertheless. If you take one thing away: There’s always more than one solution to a problem. Make better use of a database product, you most likely pay for, is a proper good idea. Being it either optimizing your model (there are probably a ton of books on SQL, less so for Neo4j, hence I really can recommend The definitive Guide to Neo4j, by my friends Christophe and Luanne, it has excellent content on modeling), understanding its query language and capabilities better or enhancing it to your needs, will be beneficial in the long run.

To quote one of the rants above: Happy Apathetic coding and don’t forget to step outside for a proper run sometime.

Title picture by Jan Antonin Kolar on Unsplash.

No comments yet

One Trackback/Pingback
  1. Java Weekly, Issue 550 | Baeldung on July 11, 2024 at 10:35 PM

    […] >> Some thoughts about user defined functions (UDF) in databases [michael-simons.eu] […]

Post a Comment

Your email is never published. We need your name and email address only for verifying a legitimate comment. For more information, a copy of your saved data or a request to delete any data under this address, please send a short notice to michael@simons.ac from the address you used to comment on this entry.
By entering and submitting a comment, wether with or without name or email address, you'll agree that all data you have entered including your IP address will be checked and stored for a limited time by Automattic Inc., 60 29th Street #343, San Francisco, CA 94110-4929, USA. only for the purpose of avoiding spam. You can deny further storage of your data by sending an email to support@wordpress.com, with subject “Deletion of Data stored by Akismet”.
Required fields are marked *