Everyone and their dog is posting their Spotify Wrapped thing. It’s 2020, i still don’t have Spotify, but despite my increasing age, I still listen to a ton of music.
When I started to work remotely back in 2018, one of the biggest perks for me was – apart from not having to commute – to be able to listen to whatever thing I currently like as loud as I want without headphones. Well, that changed a bit during the course of the COVID-19 pandemic as my wife is now working remotely as well, but alas, it turned out, the volume knob is still working.
So, no Spotify for me. But let’s see what the MariaDB – the database powering the scrobble engine running for dailyfratze.de is up to. How do I fill this data? I have a custom iTunes script written ages ago that calls a REST endpoint with the stuff I’m listening. Pretty basic, actually.
I am working for Neo4j now since 2.5 years and I honestly love the company for manifold reasons. However, it seems that it is considered rude to post SQL in the company Slack and we should prefer to use only Cypher 😉 Well, this post will contain a lot of SQL and use the scheme I had a couple of times in this SQL talk of mine.
Interested in Cypher? Cypher is a language for querying Neo4j, the Graph database by the vendor of the same name,
You can do awesome stuff with Cypher and you’ll find talks by me as well about that topic, but today I’ll keep it to a 1990’s joke: MATCH (n) RETURN n SKIP $no LIMIT /* no */ $ /* no */ limit
😉
General database stats
39497 tracks by 9661 artists and 161141 played tracks by 9 different users. First plays stored April 27 in 2005.
We will make use of the rank function to compute the exact position of things we are interested in.
Top 10 tracks in 2020
A simple approach without rank would be something like this:
SELECT a.artist, t.name, COUNT(*) FROM plays p JOIN tracks t ON t.id = p.track_id JOIN artists a ON a.id = t.artist_id WHERE p.user_id = 1 AND YEAR(p.played_on) = 2020 GROUP BY a.artist, t.name ORDER BY COUNT(*) DESC LIMIT 10 |
but that would fill already several places with tracks that have the same absolute count. This is where the rank()
and dense_rank()
functions come into play.
Both functions assign rows in a row set a rank based on their given order. Both functions can do this over partitions or windows over the whole data. Therefore these analytics functions are often called window functions. Both variants of rank
functions assigns the same rank to rows having the same value. Thus, two tracks that have been played the same number of time will receive the same rank. However, rank
will skip n ranks if there are 1 + n items in a rank wheres the dense_rank
function will not. I want consecutive ranks, that is: All tracks played the most n times will be 1 first place, the next rank second and so forth.
Let’s give it a shot. We see that the a query creating a window function over the dense rank of count gives us 8 tracks in total when I ask for the top 5 places:
And what can I say: German Hip-Hop/Punk-Band Antilopen Gang is on my radar now for 2 years, but in 2020, they have become my meds and therapy. If you would have ever told me, that I would totally fall in love with German Hip-Hop in my early 40ties, I would have said you’re mad, but there we are: “Wünsch Dir nix”, so fitting for 2020:
Also in the top 5, Patientenkollektiv. Such goose bumps:
We will see the Antilopen later on. I was a bit surprised by S&M2, a new album in 2020. A retake of Metallicas symphonic metal approach with the San Francisco Symphony orchestra. That version of The Unforgiven III is not something I would have ever expected by James. An incredible performance:
Last but not least: Ozzy Osborne. This guy has reached Lemmy Kilmister undying level.
Back to SQL. There are no partitions in the above query. The partitions would come in handy if would like to see my top 1 track over the last years in one query. Let’s give it a try:
It’s basically the same query but notice now how I create the rank: dense_rank() OVER (partition by year(played_on) ORDER BY count(*) DESC) AS rank
. The rank is computed now for each year in which I played tracks, separately.
But wait, 2019. What did I drink?
Albums
Let’s be safe and let us aggregate that stuff. Yes, I do still listen to whole albums. It is basically the same query, but group by album, not by single tracks. And I excluded compilations. Apart from that, the query is hardly different:
Antilopen Gang with 3 albums. Holy crap. But yes, they did release two albums in 2020, “Abbruch Abbruch” and “Adrenochrom”. The later a reply to some people in the music circus going lunatic and believing a lot of shit. I haven’t heard one album in the last 10 years so often like this. It is streamable on all major platforms.
Let’s play “Dinge” from Deichkinds “Wer sagt denn das?”:
For me 2020 proved that I don’t need too many things. Some stuff is essential for me: Feeling secure and snug with my family, working in a good company that also makes me feel safe. Ok, bicycles are my personal issue, but that’s a different topic…
Back to Deichkind: Electropunk. Punk is a good keyword, but I only spot “5, 6, 7, 8 Bullenstaat” by Die Ärzte in the above list. 2020 had some more punk. Let’s filter the above list to albums that have been released in 2020:
And we will see Madsen, Ferris MC and Slime. Madsen, a “Deutsch Rock” band released the Punk Rock Album of 2020 (which is 200% more Punk than Die Ärzte these days), and Ferris MC, who played for a decade with Deichkind, joined forces with Swiss und die anderen and dropped an incredible Rap-Punk-Rock piece. Slime are Slime. The subtitle of this blog, “120 Dezibel” are a quote from “Missglückte Asimetrie” and one reason I will have always music in my life:
Ich dreh auf und die Erde steht still bei 120 Dezibel
Alles was ich brauch und will sind 120 Dezibel
Ich kann euch alle nicht mehr hören bei 120 Dezibel
Nichts was mich noch stört bei 120 Dezibel
120 Dezibel
120 Dezibel
Let’s have a look at Madsen. They got some stand-ins for “Alte weiße Männer”:
And one of our favorite songs among the adults and kids, “Quarantäne für immer”
Listen to “Sorry, kein Sorry” by Ferris. After that, give it a go with Slime. That band is as old as me:
And back to the database and the
Artists
What have been my preferred artists in 2020? I expect no surprises here. The query will be very similar to before, only the grouping changes again (it becomes simpler):
We see again Antilopen Gang and on the next two ranks – if I didn’t restrict to top 10 – we would have seen Juse Ju and Fatoni, two more German rapper who are also somewhere near Antilopen Gang. A graph database like Neo4j would show this connection and probably discover the “Anti Alles Aktion” on it’s own. Want to learn how? Have a look at Going from relational databases to databases with relations with Neo4j.
People who know me a bit longer know that I have been to more than one Heavy Metal festival and certainly to more than one Grindcore gig. That much German hip-hop in my playlists? I would have never thought. Ok, there’s still the usual suspects like Motörhead (I love running with Motörhead in my ears), the mighty Black Sabbath and even Body Count had a decent album out this year.
Can my database answer the change in artist or preference as well? Hmm, we would need the current years rank of something and the previous one. I think we can do this.
But in the meantime, enjoy Faith Alone 2020 by Bad Religion:
So be prepared for the with
-with clause or “Common Table Expression”. The keen eye did already see that I used subqueries in my queries above. Why? To filter on the rank (top 5 or top 10 only). I cannot do this in the same select as in which the rank is computed. Therefore I nested the query and made it accessible that way.
The subquery works, but is kinda hard to read and cannot be reused. A relation inside a with clause is somewhat like a named subquery or a view that only exists during that query. Fun fact: CTEs can refer to themselves, thus become recursive.
Anyway, I think they read nice and reminds me a lot of the with clause in Cypher which is used to stick together multiple segments of a query to one pipeline.
But show me the code:
WITH rank_per_year AS ( SELECT YEAR(p.played_on) AS YEAR, a.artist, dense_rank() OVER (partition BY YEAR(played_on) ORDER BY COUNT(*) DESC) AS rank FROM plays p JOIN tracks t ON t.id = p.track_id JOIN artists a ON a.id = t.artist_id WHERE p.user_id = 1 AND YEAR(p.played_on) BETWEEN 2015 AND 2020 AND t.compilation = 'f' GROUP BY YEAR(p.played_on), a.artist ) SELECT YEAR, artist, rank, ifnull( lag(rank) OVER (partition BY artist ORDER BY YEAR ASC) - rank, 'new' ) AS `change` FROM rank_per_year WHERE rank <= 5 ORDER BY YEAR ASC, rank ASC; |
We do compute the rank per year and artist (“group by year and artist”) for years 2015 to 2020, partitioned by the year and give this whole thing a name (“rank_per_year”). This is a new relation that can now be used in a select clause, like we do.
In that select clause, we do find lag
. lag is a window function that can go n rows backward over a partition that is ordered. The partition here is defined by the artist and in that, ordered by year. lag
picks the value of the rank of the previous year. rank is a variable in that case, coming from the CTE named “rank_per_year”, not from the window function of the same name!
From that lagged value we subtract the current and get the change from the previous to this year. As one artist can be under the top 5 artist for the first time in a year, we need to check whether the previous rank is null. That’s what ifnull
is for. Neat, isn’t it? And the result? Here we go (I added some blank lines manually):
I hope you enjoyed this a bit. At least I did. Was nice doing some SQL again and digging through the stuff I have been listening in 2020. In total I listend to about 9847 tracks so far in 2020 with a duration of roughly 27 days.
I leave you with a track that captures my mood in 2020 all to perfect. Danger Dan mit Nudeln und Klopapier:
The screenshots of the code have been created with Carbon. I was too lazy to fiddle around with something else that would have fit both the queries and the output. I generated the query format with window chrome and the output without and appended both files than with ImageMagick like this: convert carbon.png carbon\(1\).png -append tracks-2020.png
.
No comments yet
Post a Comment