Hibernate Search and Spring Boot: Simple yet powerful archiving

This post has been featured in the Hibernate Community Newsletter 18/2016.

Before my summer holidays I mentioned my personal twitter archive on Twitter again….

This time, Vlad from Hibernate reacted to my tweet:

More reactions came from Sanne and Emmanuel and here we go:

Content

  1. Source
  2. Background
  3. Features
  4. Tools used
  5. Application
  6. Database schema
  7. The Tweet entity
  8. Storing new entities
  9. Querying entities
  10. Conclusion
  11. Try it out yourself

Source

The whole project, which has already grown into more than a tech demo, is on GitHub: michael-simons/tweetarchive.

What I skipped is a fancy GUI. So far, it only has a REST interface. But it can be run as a Docker image with local, persistent storage. Check it out, star it, maybe even add stuff to it… Feel free!

Background

I’ve been running my archive for several years now as part of Daily Fratze. Daily Fratze contains a home-grown crawler that checks my user timeline and stores my tweets in a MySQL database. I’m using JPA with Hibernate as my database access tool, so Hibernate Search fits in nicely and is really easy to implement. Hibernate Search is a super easy way to add an Apache Lucene full-text index to your entities.

For large-scale applications, Elasticsearch or something similar may be more fitting, but I’m really content with my “small” (at the end of last year, ~50 MB) search index and its performance. It doesn’t add much (if any) overhead to development or production.

For the demo, I’ve taken my entities but not the parser; for parsing, the demo uses Twitter4J. Twitter4J is apparently not made for parsing static tweets, so there are some ugly constructs for getting a Twitter archive into the app, but that’s not the point here. The entities have been adapted and refreshed according to my current skills. Some things I created years ago should never see the light of day.

Features

  • I want to be able to search my tweets, both with keywords and with full-blown Lucene queries
  • The application should track new tweets
  • The original JSON content should be stored as well

Tools used

In order:

Application

The application is a standard Spring Boot application. It’s 2016; you should find several really good guides out there, and on this blog as well, on how such an application is built.

I also assume that you have an idea what Apache Lucene is about.

Database schema

My migrations live inside src/main/resources/db/migration/, where Flyway automatically finds them. Flyway itself is picked up by Spring Boot if it’s on the classpath.

I have a PostgreSQL cast in place that allows me to store a String Java attribute inside a JSONB column without a bunch of custom converters and without explicitly casting it, but with type checks.
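The cast itself boils down to a one-liner in the migrations (sketched here; the authoritative version is in the repository):

```sql
-- Allow implicit conversion from varchar to jsonb, so Hibernate can write a
-- plain String into the jsonb column. PostgreSQL still validates that the
-- value is well-formed JSON on insert.
CREATE CAST (varchar AS jsonb) WITH INOUT AS IMPLICIT;
```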

The table definition for tweets looks like this:
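A condensed sketch (column names, types and sizes here are assumptions based on the surrounding text; the authoritative definition is in the Flyway migrations):

```sql
CREATE TABLE tweets (
    id          BIGINT PRIMARY KEY,   -- the tweet's original Twitter id
    screen_name VARCHAR(200) NOT NULL,
    content     VARCHAR(1024) NOT NULL,
    created_at  TIMESTAMP NOT NULL,
    raw_data    JSONB NOT NULL        -- the tweet's original JSON source
);
```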

Nothing fancy here except the raw_data column, which contains the tweet’s original source. You can use PostgreSQL’s JSON operators to query it if you like.

The Tweet entity

You’ll find the Tweet entity in src/main/java/ac/simons/tweetarchive/tweets/TweetEntity.java. Basically, it is a standard JPA entity. I use Project Lombok to get rid of boilerplate code, so you’ll find no getters and setters.

For the following, I assume you know JPA, because I’m not going to cover that.

To make Hibernate Search aware of an entity that should be indexed, you have to annotate the entity:
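A minimal sketch (the annotation comes from org.hibernate.search.annotations; the full entity is in the repository):

```java
import javax.persistence.Entity;
import javax.persistence.Table;
import org.hibernate.search.annotations.Indexed;

@Entity
@Table(name = "tweets")
@Indexed // That single annotation makes Hibernate Search maintain a Lucene index for this entity
public class TweetEntity {
    // …
}
```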

That is already all there is!

Next step: add a simple field, for example the screen name; just annotate it with @Field:
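For the screen name, that might look like this (a sketch; the exact annotation values mirror the description below):

```java
import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Store;

// Index the field, store the value with the index, but don't analyze it
@Field(index = Index.YES, store = Store.YES, analyze = Analyze.NO)
private String screenName;
```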

That actually reads: index that field, store the value with the index so that it can be searched without hitting the database, but don’t do further analysis.

If you read through the entity, you’ll find several such fields.

Next: analyzing fields. I want to search for similar words in the content of a tweet. While receiving a tweet, the application resolves URLs and replaces the shortened URLs; see TweetStorageService.

The entity takes this one step further. The content field is annotated with:
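In essence (a sketch; the name LanguageDiscriminator stands for the discriminator class discussed further down and is an assumption here):

```java
import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.AnalyzerDiscriminator;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Store;

// Index and analyze the content, but don't store it with the index;
// the analyzer to use is picked at runtime by the discriminator.
@Field(index = Index.YES, store = Store.NO, analyze = Analyze.YES)
@AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
private String content;
```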

Here the @Field annotation says: index the content, don’t store it, but analyze it. It also says, through @AnalyzerDiscriminator, which analyzer to use.

I have defined my analyzers right on the entity, but they can also be defined elsewhere, for example on a package:
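A sketch of the English analyzer; the German one and the one for undefined languages follow the same pattern (the analyzer name "en" is an assumption, the filter factories come with Lucene):

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "en",
        // Tokenize the input…
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                // …lowercase it…
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                // …and apply English word stemming
                @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                        params = @Parameter(name = "language", value = "English"))
        })
```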

I have three analyzers in place: an English analyzer, which tokenizes the input, lowercases it and then does English-based word stemming; the same for German; and, last but not least, an analyzer that just tokenizes and filters the content.

The analyzer itself can be inferred dynamically with a discriminator, which looks like this:
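A sketch of such a discriminator (the set of supported languages and the analyzer names are assumptions; the Discriminator interface is Hibernate Search’s):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.hibernate.search.analyzer.Discriminator;

public static class LanguageDiscriminator implements Discriminator {

    private static final Set<String> SUPPORTED_LANGUAGES
            = new HashSet<>(Arrays.asList("en", "de"));

    @Override
    public String getAnalyzerDefinitionName(final Object value, final Object entity, final String field) {
        final String language = (String) value;
        if (language == null || !SUPPORTED_LANGUAGES.contains(language)) {
            return "none"; // the default analyzer for undefined languages
        }
        return language;   // matches the name of an @AnalyzerDef, e.g. "en" or "de"
    }
}
```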

Read: if the language of the tweet is available and supported, use the fitting analyzer; otherwise use the default analyzer for undefined languages.

Hibernate Search also allows spatial queries. You can annotate the whole class or an attribute that returns Coordinates:
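On the attribute level, that might look like so (a sketch; the latitude and longitude attributes are assumptions):

```java
import org.hibernate.search.annotations.Spatial;
import org.hibernate.search.spatial.Coordinates;

@Spatial
public Coordinates getLocation() {
    // Expose the entity's plain latitude/longitude attributes as Coordinates
    return new Coordinates() {
        @Override
        public Double getLatitude() {
            return latitude;
        }

        @Override
        public Double getLongitude() {
            return longitude;
        }
    };
}
```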

Nested entities are supported as well. My example: the information regarding a reply. I have InReplyTo as an @Embeddable class and an attribute inReplyTo:
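The mapping boils down to (a sketch of the attribute in TweetEntity):

```java
import javax.persistence.Embedded;
import org.hibernate.search.annotations.IndexedEmbedded;

// Index the embedded class as well; all its fields end up
// in the index with the prefix "reply."
@Embedded
@IndexedEmbedded(prefix = "reply.")
private InReplyTo inReplyTo;
```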

This reads: please index the embedded class, add a prefix “reply.” to all fields and otherwise check for @Field annotations in the embedded class.

So far: Not much!

Storing new entities

If you use Spring Boot together with Hibernate and Spring Data JPA, you have nothing to take care of except configuring the database (and you can even skip that if you use an in-memory database).

This is all the configuration it takes to get Hibernate Search up and running with that setup, provided you add org.springframework.boot:spring-boot-starter-data-jpa, org.postgresql:postgresql and org.hibernate:hibernate-search-orm to the classpath:

spring.datasource.platform = postgresql
spring.datasource.driver-class-name = org.postgresql.Driver
spring.datasource.url = jdbc:postgresql://localhost:5432/tweetArchive
spring.datasource.username = tweetArchive
spring.datasource.password = tweetArchive
 
spring.jpa.hibernate.ddl-auto = validate
 
spring.jpa.properties.hibernate.search.default.directory_provider = filesystem
spring.jpa.properties.hibernate.search.default.indexBase = ${user.dir}/var/index/default

Just go ahead and define a repository for the TweetEntity:
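A sketch of it (the method signatures are assumptions; the actual interface is in the repository):

```java
import org.springframework.data.repository.Repository;

// Deliberately extends the bare Repository marker interface, so only
// the methods that are actually needed get exposed.
interface TweetRepository extends Repository<TweetEntity, Long>, TweetRepositoryExt {

    TweetEntity save(TweetEntity tweet);

    void delete(TweetEntity tweet);
}
```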

This is an interface with no implementation in my application. It inherits from org.springframework.data.repository.Repository, thus already providing means to access entities. I chose the simplest form of repository so that I don’t clutter my application with methods I don’t need. If I had instead inherited from CrudRepository, I wouldn’t have had to define the save or delete methods.

Calling the save or delete method from my tweet storage service already updates my search index.

Querying entities

But take good note that this interface also inherits from TweetRepositoryExt. This is the way recommended by Spring Data JPA to add custom behavior. That interface defines two search methods which I actually have to implement myself. This is done in TweetRepositoryImpl, and I’m going to walk you through the search method:

First, I retrieve a new FullTextEntityManager inside the declarative transaction and instantiate a query builder. The query builder exposes a nice, fluent interface for defining my Lucene query. You’ll see how I add a keyword query on one specific field and, if the user provided a date range, add some range queries to an enclosing boolean condition.

The FullTextEntityManager is then used again to instantiate a JPA query from the full text query and retrieve the result.
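Condensed, the whole search method might look like this (the entity field names, parameter types and date handling are assumptions; the real implementation is in TweetRepositoryImpl):

```java
import java.time.LocalDate;
import java.util.List;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.BooleanJunction;
import org.hibernate.search.query.dsl.QueryBuilder;

public List<TweetEntity> searchByKeyword(final String keyword, final LocalDate from, final LocalDate to) {
    final FullTextEntityManager fullTextEntityManager
            = Search.getFullTextEntityManager(this.entityManager);
    final QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
            .buildQueryBuilder().forEntity(TweetEntity.class).get();

    // A keyword query on one specific field…
    final BooleanJunction<?> junction = queryBuilder.bool()
            .must(queryBuilder.keyword().onField("content").matching(keyword).createQuery());
    // …and, only if the user provided a date range, a range query
    // inside the enclosing boolean condition
    if (from != null && to != null) {
        junction.must(queryBuilder.range().onField("createdAt")
                .from(from).to(to).createQuery());
    }

    // Turn the Lucene query into a JPA query and retrieve the result
    return fullTextEntityManager
            .createFullTextQuery(junction.createQuery(), TweetEntity.class)
            .getResultList();
}
```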

And that’s all there is: I can use (and hide!) the full text queries inside the same repositories I would use elsewhere.

Conclusion

If you are already using Hibernate as your ORM, have embraced Spring Data repositories and need to search some entities, then Hibernate Search may be the right approach for your project. It’s really easy to implement and also easy to use. One downside for a 12-factor app could be the fact that the index is directory-based in the default setting. You can work around that, though, by using JMS or JGroups.

I have been using Hibernate Search for quite a while now, on Daily Fratze and on several internal projects as well, and for my purposes (respectively ours) it has been enough.

Try it out yourself

There’s much more to learn from the demo application. Go to michael-simons/tweetarchive and see for yourself. There’s an extensive README that should guide you through running the application yourself. The easiest way is to use a local, Docker-based instance.

If you like it, follow me on Twitter, I am @rotnroll666, leave a comment or a star.

| Comments (5) »

06-Sep-16


Integration testing with Docker and Maven

This post will use JUnit 4.12, Spring Boot 1.4, the Maven Failsafe plugin in the version managed by Spring Boot, and the latest docker-maven-plugin to provide an integration test environment for Spring Boot applications.

Gerald Venzl asked for it on Twitter, especially in the context of integration tests with databases. In case you don’t know: Gerald and Bruno Borges are responsible for the official Oracle database Docker images, which I am using at my company and also for my upcoming talk at DOAG 2016.

Apart from Gerald asking, I had several reasons to finally get this topic right. First, after upgrading my JUG’s site to Spring Boot 1.4 and with it to Hibernate 5, I ran into issues with the ID generator, which behaved differently than before (a million thanks to Vlad Mihalcea for his great input).

euregjug.eu is developed locally on an H2 database and runs on Pivotal Cloud Foundry in production, where it uses a PostgreSQL database. So my problem wasn’t detected by the existing unit tests and broke the application in a way that wasn’t immediately obvious.

And last but not least: we at ENERKO INFORMATIK are creating mainly database-centric applications, some of them only 2-tier applications with a lot of SQL logic. That logic is fine for us and our customers, but automated integration testing has always given us a hard time. I am convinced that the following setup, which I developed for the Euregio JUG, will give us a lot of improvement once we replace the PostgreSQL Docker image with Oracle ones. So everything you’ll read here can be applied to other databases as well.

Configuring integration tests

There’s actually not much to do in a Spring Boot / Maven based application; just add the plugin like I did:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>

I can safely omit the version number because Spring Boot has this plugin in its managed dependencies. The failsafe plugin automatically recognizes the following patterns in test classes as integration tests:

  • “**/IT*.java” – includes all of its subdirectories and all Java filenames that start with “IT”.
  • “**/*IT.java” – includes all of its subdirectories and all Java filenames that end with “IT”.
  • “**/*ITCase.java” – includes all of its subdirectories and all Java filenames that end with “ITCase”.

The Surefire plugin itself excludes them in the current version so that they aren’t run as unit tests. I didn’t bother to move them into separate folders, but that should be easily doable with the Build Helper plugin.

Configure your containers with docker-maven-plugin

As I said before, the docker-maven-plugin has superb documentation and is really easy to use. Here it is used to start a Docker container based on the official PostgreSQL image before the integration tests run. The integration tests are run by the Failsafe plugin, so it is made sure that the container will be removed afterwards. See the commit for EuregJUG:

<plugin>
    <groupId>io.fabric8</groupId>
    <artifactId>docker-maven-plugin</artifactId>
    <version>0.20.1</version>
    <executions>
        <execution>
            <id>prepare-it-database</id>
            <phase>pre-integration-test</phase>
            <goals>
                <goal>start</goal>
            </goals>
            <configuration>
                <images>
                    <image>
                        <name>postgres:9.5.4</name>
                        <alias>it-database</alias>
                        <run>
                            <ports>
                                <port>it-database.port:5432</port>
                            </ports>
                            <wait>
                                <log>(?s)database system is ready to accept connections.*database system is ready to accept connections</log>
                                <time>20000</time>
                            </wait>
                        </run>
                    </image>
                </images>
            </configuration>
        </execution>
        <execution>
            <id>remove-it-database</id>
            <phase>post-integration-test</phase>
            <goals>
                <goal>stop</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Here it runs an existing image under the alias “it-database”. It waits either until the configured message appears in the Docker logs or until the time expires. (The plugin can also be used to create new images, for example based on the Oracle Docker images, which is something I’ll use in my talk and publish afterwards.) The container is started before the integration tests and stopped afterwards.

Also take note of the important port mapping: it-database.port:5432. Docker maps this port to a random high port on your machine. This random port is grabbed by the Maven plugin, assigned to the new property it-database.port and can be used throughout the pom file. If you were to use the Oracle database image, that would in all probability be 1521.

I use the new property as an environment variable for the integration tests by adding the following to the failsafe configuration above:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <environmentVariables>
      <it-database.port>${it-database.port}</it-database.port>
    </environmentVariables>
  </configuration>
</plugin>

and make use of Spring Boot’s awesome configuration features, as you can see in this commit, where I add a file named application-it.properties containing, among others, the following line:

spring.datasource.url = jdbc:postgresql://localhost:${it-database.port}/postgres

Recognize the property? This way, I don’t have to hardcode the port somewhere, which would lead to problems when several builds run in parallel (for example on a CI machine). Parallel builds wouldn’t be possible if Docker mapped the exposed port to a fixed port on the machine running the build.

That file also configures the data to load through:

spring.datasource.data = classpath:data-it.sql

By default it would load a file called “data-${platform}.sql”, so you are free to write SQL fitting your database. I much prefer this over a DBUnit or similar approach, because it’s much easier for me to just write down the SQL for the specific database than to use a mediocre replacement for database-specific tasks.

Writing the actual tests for a Spring Boot application

So far, that was nothing, wasn’t it? I’m really impressed by how much development has changed over the last years. Yes, Maven is still XML-based, but I didn’t have to do any “fancy” things or use special scripts or whatever to get a sane testing environment up and running.

The actual test I needed looks like this:

import static org.hamcrest.Matchers.is;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.orm.jpa.AutoConfigureTestDatabase;
import static org.springframework.boot.test.autoconfigure.orm.jpa.AutoConfigureTestDatabase.Replace.NONE;
import org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.junit4.SpringRunner;
 
@RunWith(SpringRunner.class)
@DataJpaTest
@AutoConfigureTestDatabase(replace = NONE)
@ActiveProfiles("it")
public class RegistrationRepositoryIT {
 
    @Autowired
    private EventRepository eventRepository;
 
    @Autowired
    private RegistrationRepository registrationRepository;
 
    @Test
    public void idGeneratorsShouldWorkWithPostgreSQLAsExpected() {
        // data-it.sql creates one registration with id "1"
        final EventEntity event = this.eventRepository.findOne(1).get();
        final RegistrationEntity savedRegistration = this.registrationRepository.save(new RegistrationEntity(event, "foo@bar.baz", "idGeneratorsShouldWorkWithPostgreSQLAsExpected", null, true));
        Assert.assertThat(savedRegistration.getId(), is(2));
    }
}

Important here are three things:

  • Run the test with @RunWith(SpringRunner.class) and as a @DataJpaTest. I don’t want to fire up the whole application but only JPA and the corresponding repositories.
  • By default, @DataJpaTest replaces all data sources with an in-memory database if one is on the class path, which is exactly what I don’t want here. So I prevent this with @AutoConfigureTestDatabase(replace = NONE)
  • Activate a profile named “it” with @ActiveProfiles("it") so that Spring Boot takes the application-it.properties mentioned earlier into account.

Summary

I’m really, really happy with the solution, especially regarding the fact that I’m probably the right amount of late to the Docker party. While it was relatively easy to install Docker on Linux, you had to jump through hoops to get it running under OS X and especially on Windows. Lately, Docker natively supports xhyve under OS X and Hyper-V under Windows 10 Pro, so my colleagues don’t have a reason not to use it anymore. The pre-built images work fine, and even the Oracle images mentioned above build without errors on several OS X and Windows machines I tested.

Even if you don’t want to have anything to do with Docker, the docker-maven-plugin is your friend. Nothing needs to be started manually; it just works. The EuregJUG site just builds flawlessly on my Jenkins-based CI.

If this post is of use to you, I’d be happy to hear from you in the comments.

Update: Here’s a snippet to build and run Oracle Database instances from the official Oracle Docker files: pom.xml.

Update 2: I just noticed that this commit on the above-mentioned Oracle database images makes them somewhat less useful for integration testing: now the Dockerfile only installs the database software but doesn’t create the database, and the container takes ages to start up… I don’t have a solution for that right now and I’m staying with my older images, even if they are a bit larger.

Update 3: In some cases a log message appears twice in a container. The docker-maven-plugin supports regular expressions for log waits since 0.20.1, which is reflected in the pom snippet now. Also, regarding update 2: one just has to do it “right”, see this article.

| Comments (17) »

25-Aug-16


Burning Geek Insult Con

Some weeks ago this thread escalated quickly:

and the idea of the Burning Geek Insult Con was born… Bring your favorite IT discussion from the internet into real life…

We could offer tracks like Tabs and Spaces (no explanation needed (ok, well, not exactly: how many spaces for one tab?)), The best editor in the world (for realz) and many more… What about build tools? For sure, not all has been said about Maven and Gradle… And should you use Groovy or Kotlin? I’m quite sure someone has a strong opinion… Anyway, MAKE is the way to go.

If we finish the track Databases (sorry, we don’t offer NoSQL stores) with survivors, we’ll invite special guests to decide which ORM to use, or whether to use an ORM at all.

I’m quite sure I have forgotten stuff… Like IDEs, OS and such. In the end, it doesn’t matter anyway.

I love to try out new stuff, pretty much every day. But regarding applications that should live longer than a month, I try to be consistent with regard to the tools used and how they are used. What has been true for monoliths is still valid for microservices: architecture and the role of the architect are about communication. And communicating is much simpler for me if my team and I can come to agreements on tools.

| Comments (2) »

17-Aug-16


Spring Boot @WebMvcTest and MessageSource

Spring Boot makes it really easy to translate your application. As soon as a messages.properties and accompanying messages_*.properties are on the class path, it configures an org.springframework.context.MessageSource, which can be used in view templates or inside a controller layer through org.springframework.context.support.MessageSourceAccessor, as I have done here.

If you want to test this, either your controller or whether you’re translating your views correctly, and you are using the new Spring Boot 1.4 test improvements, you’ll have to manually import the MessageSourceAutoConfiguration, which isn’t done by @WebMvcTest, like so:

@RunWith(SpringRunner.class)
@WebMvcTest(
        controllers = BikesController.class,
        secure = false        
)
@ImportAutoConfiguration(MessageSourceAutoConfiguration.class)
public class BikesControllerTest {
}

See complete example here.

This took me a while to figure out, so I thought this might be useful to others.

| Comments (0) »

04-Aug-16


DOAG Konferenz und Ausstellung 2016

13 years ago, I visited my first IT conference ever: the DOAG 2003 in Mannheim. This year, I will be speaking at #DOAG2016 myself, about creating database-centric applications with Spring Boot and jOOQ, in German (see the abstract below).

After a successful premiere of speaking in the IT circus earlier this year at Spring I/O, about custom Spring Boot starters (btw, I’m also at W-JAX this year with a refined version of that talk), I’m really happy to be accepted at DOAG.

I’ve been very deeply into all things databases since the beginning of my career. My company runs several applications which have changed their faces many times over the last decade, but the relational model stood the test of time, often only added to, never completely rewritten.

When visiting DOAG in 2003 and later, my company was looking for alternatives to Oracle Forms 6i, whose end of error-correction support was announced for the end of 2004, with extended support ending in 2008.

Back then we were depending on client/server support, and Oracle Forms 9, only available in a 3-tier architecture, wasn’t an option for us.

I saw a lot of different approaches: the whole J2EE stack (yes, it was called that back then), which was a nightmare to me; a lot of different approaches for automagically converting old Forms applications to new ones, to J2EE or to Java clients, which worked equally badly for us; Oracle ADF; Oracle APEX; and so on.

Personally, I spent some time in Ruby on Rails land (and have been back in Java land since 2010); my company opted for Java Swing where desktop clients were needed, and Grails respectively pure Spring / Spring Boot with various view technologies otherwise.

Now, more than a decade later, one powerful and, given that power, relatively easy to understand technology stack for database-centric applications is, for me, Spring Boot using jOOQ to access the database and rendering a nice UI with Oracle JET.

For me, there’s no easier way to start a modern web application than with Spring Boot. The stuff just works, and it has an automatic configuration for jOOQ as well.

For accessing relational databases I’d actually prefer an ORM used together with Spring Data JPA for many tasks, but when it comes to reporting and batch inserts, why not use the power of your database, for which you probably paid a lot of money? Why compute it on the application server or, even worse, on the client side? This is where SQL shines, and jOOQ is a really great way to write type-safe SQL and throw it at your database.

In my demo I’ll use my scrobbled (like on Last.fm) music data from the last 10 years to build a chart reporting engine. The graphs will be rendered inside a simple Oracle JET dashboard.

The demo will facilitate Docker (through the docker-maven-plugin) for creating the development database instance and will be runnable as a fat jar, presenting itself like this:

If you want to hear more about that, meet me in Nürnberg, somewhere between the 15th and 18th of November.

And, as promised, the abstract in German:

Datenbankzentrische Anwendungen mit Spring Boot und jOOQ

In diesem Vortrag wird eine Variante datenbankzentrischer Anwendungen mit einer modernen Architektur vorgestellt, die sowohl in einer klassischen Verteilung als auch “cloud native” genutzt werden kann und dabei eine sehr direkte Interaktion mit Datenbanken erlaubt.

jOOQ ist eine von vielen Möglichkeiten, Datenbankzugriff in Java zu realisieren, aber weder eine Objektrelationale Abbildung (ORM) noch “Plain SQL”, sondern eine typsichere Umsetzung aktueller SQL Standards in Java. jOOQ “schützt” den Entwickler nicht vor SQL Code, sondern unterstützt ihn dabei, typsicher Abfragen in Java zu schreiben.

Spring Boot setzt seit 2 Jahren neue Standards im Bereich der Anwendungsentwicklung mit dem Spring Framework. Waren vor wenigen Jahren noch aufwändige XML Konfigurationen notwendig, ersetzen heute “opinionated defaults” manuelle Konfiguration. Eine vollständige Spring Boot Anwendung passt mittlerweile in einen Tweet.

Der Autor setzt die Kombination beider Technologien erfolgreich zur Migration einer bestehenden, komplexen Oracle Forms Client Server Anwendung mit zahlreichen Tabellen und PL/SQL Stored Procedures hin zu einer modernen Architektur ein. Das Projekt profitiert sehr davon, die Datenbankstrukturen nicht in einen ORM “zu zwängen”.

Nach einer kurzen Einführung dieser Themen wird eine Demo “from scratch” entwickelt, die zuerst die niedrige Einstiegshürde in die Spring basierte Entwicklung mit Java und danach die einfache Verwendung moderner SQL Konstrukte zeigt, ohne dass ein ORM oder stringbasierte SQL Statements im Weg stehen. Der Abschluss der Demo wird eine JSON Api sein, die von einer OracleJET Clientanwendung genutzt wird.

Die Besucher kennen im Anschluss eine schlanke Alternative sowohl zur aufwändigen JPA basierten Entwicklung als auch zu APEX Anwendungen.

| Comments (5) »

28-Jul-16