An HTTP api for analytic queries

One use case for advanced SQL queries with jOOQ
November 2, 2016 by Michael

This post has been featured on This Week in Spring – November 8, 2016 and I’m so happy to be referred as “friend of the Spring community” 🙂

So, you have read through the previous parts of this series (1, 2 and 3) and you now have a Spring Boot project with jOOQ enabled, an Oracle Database with a simple schema containing musical data. What can you do with it?

Content

  1. An introductory example
  2. Window functions with jOOQ
  3. Common table expressions, window functions and Java expressions combined
  4. Summary and use cases

In the previous post I had a simple select, getting all artists from my library ordered by name. For queries like that I wouldn’t bother investigating into jOOQ. I have written extensively about my use of Hibernate and summarized my thoughts in this interview. In short: You can come very far with JPA / Hibernate, you can even create portable JPQL queries that support some analytics, as Thorben shows. Thorben also has a nice little tool that helps deciding wether JPA / Hibernate is the right tool for the job, checkout it out: Should you use JPA for your next project?

Imagine the following use cases based on this schema:


doag2016_schema
  • I want a list of top n albums by a list of artists ordered by play count
  • I want a list of artists and how many times I played a track by them cumulative by day
  • Given a specific months, I want a top n list of tracks played in this month ordered by play count and also including the change in result to the previous month

You can try all examples yourself by cloning this repository: github.com/michael-simons/DOAG2016 and running those steps:

./mvnw docker:start
./mvnw clean package
./mvnw spring-boot:run

Please read through the links at the top of this post to grok each step.

An introductory example

I want a list of top n albums by a list of artists ordered by play count

For the younger generation: An album was a coherent collection of tracks that you often listened from the first to the last track. You could buy those on CDs or LPs, some even owned compact cassettes.

The task looks simple but if you look closely, I haven’t bothered normalizing the tracks table into 2nf in regard to the “album” column. Stuff like this exists! The specific reason the schema above looks like this is that I just was to lazy to normalize it all the way. I would have at least adding the artists to albums as well but even then album is non unique in some cases. Anyway.

Writing it down in SQL is an easy task:

SELECT 
  tracks.album, 
  COUNT(*) AS cnt
FROM plays
  JOIN tracks
  ON plays.track_id = tracks.id
WHERE tracks.artist_id IN (54, 86)
GROUP BY 
  tracks.artist_id, 
  tracks.album
ORDER BY 
  cnt DESC, 
  tracks.album ASC

It should be a bigger problem writing that down in JPQL. But then you would have to use a kind of projection, a custom object to be instantiated and mapped through ResultSet mappings or constructor call inside the query itself. Not a nice outlook.

How does this look with jOOQ? Having the imports in place, that is: The generated Tables and there records, it pretty much looks identical:

import static ac.simons.doag2016.db.Tables.ARTISTS;
import static ac.simons.doag2016.db.Tables.PLAYS;
import static ac.simons.doag2016.db.Tables.TRACKS;
import static org.jooq.impl.DSL.count;
 
DSLContext create;
 
create
      .select(TRACKS.ALBUM,
              count())
      .from(PLAYS)
      .join(TRACKS).onKey()
      .where(TRACKS.ARTIST_ID.in(BigDecimal.valueOf(54), BigDecimal.valueOf(86)))
      .groupBy(TRACKS.ARTIST_ID, TRACKS.ALBUM)
      .orderBy(count().desc(), TRACKS.ALBUM.asc())

So far nothing got executed. At this point you also can use #getSQL to just retrieve the generated SQL. #fetch will get you a result of records, any of the overloaded methods allow you to retrieve just the columns you need into a record. Note that no mapping to concrete objects is necessary at this point! You can however use generated POJOs or my idea of providing RecordMapper from a Spring Context.

The result you get from calling #fetch has some nice additional methods that all start with “format” and we are gonna use one of those to get useful representation. As the post is called “An HTTP api for analytic queries” I’m not targeting a real REST api speaking of resources but a representation of analytic queries that is useful for a well defined set of analytic dashboards and such. Looking from that perspective, one may call those queries themselves resources.

jOOQ offers formatJSON to get an actually useful JSON result. We are not gonna use the method that returns a string and pass that as the response value from our Spring Boot controller but instead write the contend directly to the response, avoid a possible overhead from a String to JSON converter which isn’t necessary.

@Controller
@RequestMapping("/api/artists")
public class ArtistsReportController {
    @RequestMapping(path = "/{artistIds}/topNAlbums")
    public void getTopNAlbums(            
            final HttpServletResponse response
    ) throws IOException {
        response.setContentType(MediaType.APPLICATION_JSON_UTF8_VALUE);
        this.create
                .select(TRACKS.ALBUM,
                        count())
                .from(PLAYS)
                .join(TRACKS).onKey()
                .where(TRACKS.ARTIST_ID.in(BigDecimal.valueOf(54), BigDecimal.valueOf(86)))                
                .groupBy(TRACKS.ARTIST_ID, TRACKS.ALBUM)
                .orderBy(count().desc(), TRACKS.ALBUM.asc())                
                .fetch()
                .formatJSON(response.getOutputStream());
    }
}

This Spring @Controller is your first analytical Spring Boot jOOQ SQL api and delivers a result like this:

{
  "fields": [
    {
      "schema": "",
      "table": "TRACKS",
      "name": "ALBUM",
      "type": "VARCHAR"
    },
    {
      "name": "count",
      "type": "INTEGER"
    }
  ],
  "records": [
    [
      "Everything Louder Than Everyone Else",
      151
    ],
    [
      "BBC Live & In-Session",
      68
    ],
    [
      "Clean Your Clock",
      64
    ],
    [
      "Live At Brixton Academy",
      49
    ],
    [
      "No Sleep 'Til Hammersmith",
      47
    ],
    [
      "Bastards",
      36
    ]
  ]
}

Congratulations!

But wouldn’t that be boring, no limits, no possibility to query stuff? Let’s introduce both: A query that allows restriction to certain dates and certain artists ids as well as limiting the query result:

Getting the parameters into the controller is super easy with Spring Boot. You’ll see @PathVariable which matches an array of ids, labelled “artistIds” inside the request mapping. You can call this URL with a comma separated list of values. Then you’ll find several parameters annotated with @RequestParam. Those are query parameters that even get converted in a usable format, as you can see for the date parameters final Optional<LocalDate> from and to:

@RequestMapping(path = "/{artistIds}/topNAlbums")
public void getTopNAlbums(
        @PathVariable final BigDecimal[] artistIds,
        @RequestParam(defaultValue = "10") final int n,
        @RequestParam
        @DateTimeFormat(iso = ISO.DATE)
        final Optional<LocalDate> from,
        @RequestParam
        @DateTimeFormat(iso = ISO.DATE)
        final Optional<LocalDate> to,
        final HttpServletResponse response
) throws IOException {}

You get those into your jOOQ query like this with quite some cool things to discover: The array of Ids can be passed to a “in()” clause as is… Without having to fear SQL injections anywhere. The worst thing that could happen is a type mismatch exception at controller level.

For a correct query on the days, I want to truncate the played on date to the day. Easy to be done with SQL, even easier with jOOQ as a can reference that expression as a variable PLAYED_ON_TRUNCATED_TO_DAY. And last but not least, you’ll see how nicely Java 8 optionals fit into such an expression: If the date is set create a greaterOrEqual expression, otherwise just a simple true condition:

final Field<Date> PLAYED_ON_TRUNCATED_TO_DAY = trunc(PLAYS.PLAYED_ON, DatePart.DAY).cast(Date.class);
this.create
    .select(TRACKS.ALBUM,
            count())
    .from(PLAYS)
    .join(TRACKS).onKey()
    .where(TRACKS.ARTIST_ID.in(artistIds))
    .and(from.map(Date::valueOf)
            .map(PLAYED_ON_TRUNCATED_TO_DAY::greaterOrEqual)
            .orElseGet(DSL::trueCondition)
    )
    .and(to.map(Date::valueOf)
            .map(PLAYED_ON_TRUNCATED_TO_DAY::lessOrEqual)
            .orElseGet(DSL::trueCondition)
    )
    .groupBy(TRACKS.ARTIST_ID, TRACKS.ALBUM)
    .orderBy(count().desc(), TRACKS.ALBUM.asc())
    .limit(n)
    .fetch()
    .formatJSON(response.getOutputStream());

Find the whole thing inside the demo repository in the ArtistReportController.

If you call this api with:

curl -X "GET" "http://127.0.0.1:8080/api/artists/54,86/topNAlbums?n=3&from=2016-01-03&to=2016-02-03"

Who do you know that jOOQ generates the right query? If you provide a bean for your boot app defining the jOOQ config like this:

@Configuration
public class JooqSpringBootConfig {
 
    @Bean
    public org.jooq.Configuration jooqConfig(
            final ConnectionProvider connectionProvider,
            final TransactionProvider transactionProvider,
            final ExecuteListenerProvider executeListenerProvider,            
            @Value("${jooq.renderFormatted:false}") final boolean renderFormatted
    ) {
        final DefaultConfiguration hlp = new DefaultConfiguration();
        return hlp
                .derive(hlp.settings()
                        .withRenderNameStyle(RenderNameStyle.LOWER)
                        .withRenderKeywordStyle(RenderKeywordStyle.UPPER)
                        .withRenderFormatted(renderFormatted)
                )
                .derive(connectionProvider)
                .derive(transactionProvider)
                .derive(executeListenerProvider)
                .derive(SQLDialect.ORACLE);
    }
}

and setting log level and formatting like this in application.properties

logging.level.org.jooq = DEBUG
jooq.renderFormatted = true

jOOQ does a wonderful job of presenting your query with and without bind parameters:

SELECT 
  v0 album, 
  v1 COUNT
FROM (
  SELECT 
    x.v0, 
    x.v1, 
    rownum rn
  FROM (
    SELECT 
      tracks.album v0, 
      COUNT(*) v1
    FROM plays
      JOIN tracks
      ON plays.track_id = tracks.id
    WHERE (
      tracks.artist_id IN (
        ?, ?
      )
      AND CAST(TRUNC(plays.played_on, 'DD') AS DATE) >= ?
      AND CAST(TRUNC(plays.played_on, 'DD') AS DATE) <= ?
    )
    GROUP BY 
      tracks.artist_id, 
      tracks.album
    ORDER BY 
      v1 DESC, 
      v0 ASC
 
  ) x
  WHERE rownum <= (0 + ?)
)
WHERE rn > 0
ORDER BY rn

and with bind parameters:

SELECT 
  v0 album, 
  v1 COUNT
FROM (
  SELECT 
    x.v0, 
    x.v1, 
    rownum rn
  FROM (
    SELECT 
      tracks.album v0, 
      COUNT(*) v1
    FROM plays
      JOIN tracks
      ON plays.track_id = tracks.id
    WHERE (
      tracks.artist_id IN (
        54, 86
      )
      AND CAST(TRUNC(plays.played_on, 'DD') AS DATE) >= DATE '2016-01-03'
      AND CAST(TRUNC(plays.played_on, 'DD') AS DATE) <= DATE '2016-02-03'
    )
    GROUP BY 
      tracks.artist_id, 
      tracks.album
    ORDER BY 
      v1 DESC, 
      v0 ASC
 
  ) x
  WHERE rownum <= (0 + 3)
)
WHERE rn > 0
ORDER BY rn

See the effort Lukas put into making limiting work in Oracle?

Window functions with jOOQ

I want a list of artists and how many times I played a track by them cumulative by month

The second use case. I want a list Something you could achieve with client side programming working on the result set of a query very similar to the above, having group the plays by month and artist and keep a running total. Throw some nice lambdas in and you’re done.

But, there are window functions. From one of the best explanations I have found, Understanding Window Functions:

Moving window

The whole idea behind window functions is to allow you to process several values of the result set at a time: you see through the window some peer rows and are able to compute a single output value from them, much like when using an aggregate function.

I have heard a lot of Motörhead after Lemmy passing away by the end of 2015. I want to know how many times I heard Motörhead on January 1st, how many times on 2nd, 3rd and so on in total, like 5 times, than additional 10 times (makes 15) and another 7 (should show as 22). Result should look like this:

curl -X "GET" "http://127.0.0.1:8080/api/artists/54,86/cumulativePlays?from=2016-01-01&to=2016-01-07"
{
  "fields": [
    {
      "name": "cast",
      "type": "DATE"
    },
    {
      "schema": "",
      "table": "ARTISTS",
      "name": "ARTIST",
      "type": "VARCHAR"
    },
    {
      "name": "cumulativePlays",
      "type": "NUMERIC"
    }
  ],
  "records": [
    [
      "2016-01-01",
      "Motörhead",
      48
    ],
    [
      "2016-01-02",
      "Motörhead",
      50
    ],
    [
      "2016-01-03",
      "Motörhead",
      116
    ],
    [
      "2016-01-04",
      "Motörhead",
      129
    ],
    [
      "2016-01-05",
      "Motörhead",
      151
    ],
    [
      "2016-01-06",
      "Motörhead",
      183
    ],
    [
      "2016-01-07",
      "Motörhead",
      195
    ]
  ]
}

To make it a bit more interesting, I’ve added the additional option to query multiple artists. So, what do we have? I have added some comments to the query:

@RequestMapping(path = "/{artistIds}/cumulativePlays")
public void getCumulativePlays(
        @PathVariable final BigDecimal[] artistIds,
        @RequestParam
        @DateTimeFormat(iso = ISO.DATE)
        final Optional<LocalDate> from,
        @RequestParam
        @DateTimeFormat(iso = ISO.DATE)
        final Optional<LocalDate> to,
        final HttpServletResponse response
) throws IOException {
    response.setContentType(MediaType.APPLICATION_JSON_UTF8_VALUE);
    this.create
            // What to select
            .select(
                    // The play date, truncated to the day as explained above
                    PLAYED_ON_TRUNCATED_TO_DAY,
                    // The artists
                    ARTISTS.ARTIST,
                    // The sum of the play counts in different partitions
                    sum(count())
                        .over( // This is where the definition of the window begins
                               // without further restriction the window moves from
                               // the first result up and including to the current
                            partitionBy(ARTISTS.ARTIST)  // The window has to move through multiple partions
                                                         // here, the artist as I want each cumulated playcount separately 
                                .orderBy(PLAYED_ON_TRUNCATED_TO_DAY) // The window entries are ordered by playdate 
                        ).as("cumulativePlays")
            )
            .from(PLAYS)
            .join(TRACKS).onKey()
            .join(ARTISTS).onKey()
            .where(ARTISTS.ID.in(artistIds))
            .and(from.map(Date::valueOf)
                    .map(PLAYED_ON_TRUNCATED_TO_DAY::greaterOrEqual)
                    .orElseGet(DSL::trueCondition)
            )
            .and(to.map(Date::valueOf)
                    .map(PLAYED_ON_TRUNCATED_TO_DAY::lessOrEqual)
                    .orElseGet(DSL::trueCondition)
            )
            .groupBy(PLAYED_ON_TRUNCATED_TO_DAY, ARTISTS.ARTIST)
            .orderBy(PLAYED_ON_TRUNCATED_TO_DAY, ARTISTS.ARTIST)
            .fetch()
            .formatJSON(response.getOutputStream());
}

Important thing here is that jOOQ not only makes it particularly easy to use those standard SQL methods but it also helps me to get them right. I could hide this beautiful query inside an annotation passing it to the JPA provider of my choice, but than I would not only have no compile time checking, I would have to deal with the result set on my own.

If you query the URL above, this is the query which gets generated and executed by jOOQ as formatted by jOOQ:

SELECT 
  CAST(TRUNC(plays.played_on, 'DD') AS DATE), 
  artists.artist, 
  SUM(COUNT(*)) OVER (PARTITION BY artists.artist ORDER BY CAST(TRUNC(plays.played_on, 'DD') AS DATE) ASC) cumulativeplays
FROM plays
  JOIN tracks
  ON plays.track_id = tracks.id
  JOIN artists
  ON tracks.artist_id = artists.id
WHERE (
  artists.id IN (
    54, 86
  )
  AND CAST(TRUNC(plays.played_on, 'DD') AS DATE) >= DATE '2016-01-01'
  AND CAST(TRUNC(plays.played_on, 'DD') AS DATE) <= DATE '2016-01-07'
)
GROUP BY 
  CAST(TRUNC(plays.played_on, 'DD') AS DATE), 
  artists.artist
ORDER BY 
  CAST(TRUNC(plays.played_on, 'DD') AS DATE) ASC, 
  artists.artist ASC

Believe it or not, even in the time of micro services and the like, there is still a need for adhoc queries against databases. I really like the fact that I can pass this along.

Common table expressions, window functions and Java expressions combined

Given a specific months, I want a top n list of tracks played in this month ordered by play count

Do you know those charts list that present the top 10 or top 40 of musical tracks during the last months? With changes in regard to the previous month (How many positions a track won or lost)? I have such a chart in my daily photo project that looks like this:


2015-05-charts

This is already the UI I’m gonna present in the next article, though. Back then, I have written an atrocious MySQL statement which I nearly sent to #CTMMC some months ago. That beast is huge, slow and complicated.

Using modern SQL constructs like the the with clause aka Common Table Expressions combined with jOOQ makes it easy to generate a query like this:

WITH 
  previous_month(track_id, cnt, POSITION) AS (
    SELECT 
      plays.track_id, 
      COUNT(*) cnt, 
      dense_rank() OVER (ORDER BY COUNT(*) DESC) POSITION
    FROM plays
    WHERE CAST(TRUNC(plays.played_on, 'DD') AS DATE) BETWEEN DATE '2015-01-01' AND DATE '2015-01-31'
    GROUP BY plays.track_id
  ), 
  current_month(track_id, cnt, POSITION) AS (
    SELECT 
      plays.track_id, 
      COUNT(*) cnt, 
      dense_rank() OVER (ORDER BY COUNT(*) DESC) POSITION
    FROM plays
    WHERE CAST(TRUNC(plays.played_on, 'DD') AS DATE) BETWEEN DATE '2014-12-01' AND DATE '2014-12-31'
    GROUP BY plays.track_id
  )
SELECT 
  v0 label, 
  v1 cnt, 
  v2 CHANGE
FROM (
  SELECT 
    x.v0, 
    x.v1, 
    x.v2, 
    rownum rn
  FROM (
    SELECT 
      (artists.artist || ' - ' || tracks.name || ' (' || tracks.album || ')') v0, 
      previous_month.cnt v1, 
      (current_month.POSITION - previous_month.POSITION) v2
    FROM tracks
      JOIN artists
      ON tracks.artist_id = artists.id
      JOIN previous_month
      ON previous_month.track_id = tracks.id
      LEFT OUTER JOIN current_month
      ON current_month.track_id = tracks.id
    ORDER BY 
      v1 DESC, 
      v0 ASC
 
  ) x
  WHERE rownum <= (0 + 10)
)
WHERE rn > 0
ORDER BY rn

Hopefully nobody passes this to an ORM!

What does it do? There are to CTEs, previous and current month, both including a window function over their “position”, where position is defined as the count of plays in the given period of time. The dense_rank() method works much like row num, but “equal” rows (same count in this case) are given the same number, with “dense_rank” having no gaps. Exactly what we need to compute the chart. In the actual query I can join those CTEs like normal tables and I can compute the change very easily.

If I use jOOQ to break this down into chunks, it probably looks more readable to non-SQL developers, see ChartReportController:

@RequestMapping("/{year}/{month}")
public void getCharts(
        @PathVariable final int year,
        @PathVariable final int month,
        @RequestParam(defaultValue = "10") final int n,
        final HttpServletResponse response
) throws IOException {
    response.setContentType(MediaType.APPLICATION_JSON_UTF8_VALUE);
 
    final Field<Date> playedOnTruncatedToDay = trunc(PLAYS.PLAYED_ON, DatePart.DAY).cast(Date.class);
 
    YearMonth hlp = YearMonth.of(year, month);        
    final CommonTableExpression<Record3<BigDecimal, Integer, Integer>> currentMonth = name("previous_month").fields("track_id", "cnt", "position")
            .as(select(PLAYS.TRACK_ID, count().as("cnt"), denseRank().over(orderBy(count().desc())).as("position"))
                    .from(PLAYS)
                    .where(playedOnTruncatedToDay.between(Date.valueOf(hlp.atDay(1)), Date.valueOf(hlp.atEndOfMonth())))
                    .groupBy(PLAYS.TRACK_ID));
 
    hlp = hlp.minusMonths(1);
    final CommonTableExpression<Record3<BigDecimal, Integer, Integer>> previousMonth = name("current_month").fields("track_id", "cnt", "position")
            .as(select(PLAYS.TRACK_ID, count().as("cnt"), denseRank().over(orderBy(count().desc())).as("position"))
                    .from(PLAYS)
                    .where(playedOnTruncatedToDay.between(Date.valueOf(hlp.atDay(1)), Date.valueOf(hlp.atEndOfMonth())))
                    .groupBy(PLAYS.TRACK_ID));        
 
    final Field<String> label = concat(ARTISTS.ARTIST, val(" - "), TRACKS.NAME, val(" ("), TRACKS.ALBUM, val(")")).as("label");        
    this.create
            .with(currentMonth)
            .with(previousMonth)
            .select(label, 
                    currentMonth.field("cnt"), 
                    previousMonth.field("position").minus(currentMonth.field("position")).as("change")
            )
            .from(TRACKS)
            .join(ARTISTS).onKey()
            .join(currentMonth).on(currentMonth.field("track_id", BigDecimal.class).eq(TRACKS.ID))
            .leftOuterJoin(previousMonth).on(previousMonth.field("track_id", BigDecimal.class).eq(TRACKS.ID))                
            .orderBy(currentMonth.field("cnt").desc(), label.asc())
            .limit(n)
            .fetch()
            .formatJSON(response.getOutputStream());
}

Those CTEs can be “saved” onto variables and subsequently be used in a pretty much ordinary query.

Summary and use cases

The above query is not a query I write on an everyday use case and maybe it should be written at all. In real live though, I often haven to deal with complex data models and data silos that need to be queried in many different ways. Those analyses are really useful and helpful to a lot of people.

With the combination of Spring Boot and jOOQ I can provide those analysis in no time. Having those queries available as an HTTP API without depending on some HTTP tool inside a database provides much value. I can map simple and complex queries a like and get great logging for free.

Using Springs awesome MVC support, I can pass URL query parameters directly to the jOOQ builder without risking injection scenarios.

But: I strongly believe that JPA / Hibernate is not evil at all. Quite the contrary. Combine it with Spring Data JPA, implement the repository pattern and you’ll get easy to test modules that provide a lot of cruft for you.

If I would have write access of some kind, aka OLTP, in my scenario here, I’d rather would use JPA for it than filling those inserts with jOOQ myself for example.

Gavin King, creator of Hibernate, said on reddit: “Just because you’re using Hibernate, doesn’t mean you have to use it for everything.” and i fully agree with him. After working now on a jOOQ based project for 13 months, this is how an ideal workflow for a database centric but not database dependent application looks like for me:

  1. Always use automatic database migrations, for example Flyway or Liquibase
  2. Get your domain model as “right” as possible, normalize your database and use JPA / Hibernate in combination with Spring Data JPA.
  3. Use JPQL queries if necessary but don’t hide native queries somewhere in annotations
  4. For complex queries, projections or use cases like merge statements, use jOOQ. Either directly or pass the generated SQL to JPA or if you like it, to Springs JDBCTemplate. You can write your statements as strings, but then the burden of using prepared statements and the like correctly is on you.

But the most important point to make: SQL, JPA and jOOQ are expert tools. You have to know them to get the most out of them. Read about SQL and persistence in common, read the reference docs. I personally recommend the following sites and their authors:

If you enjoyed this post, you may follow me on twitter, leave some feedback or even star the demo. Next week I’m gonna publish the last entry in this series where I add an UI in which those analytic queries really shine.

No comments yet

4 Trackbacks/Pingbacks
  1. […] friend of the Spring community Michael Simons just put together a really interesting look at exposing a JOOQ-based analytics API as a Spring Boot and Spring MVC-based REST API. […]

  2. […] Friend of the Spring community Michael Simons just put together a really interesting look at exposing a JOOQ-based analytics API as a Spring Boot and Spring MVC-based REST API. […]

  3. Java Web Weekly, Issue 150 | Baeldung on November 10, 2016 at 2:45 PM

    […] >> An HTTP API for analytic queries [info.michael-simons.eu] […]

  4. […] Part 4: An HTTP api for analytic queries […]

Post a Comment

Your email is never published nor shared. Required fields are marked *