Running Hibernate Search with Elasticsearch on Pivotal CF

This post has been featured on This Week in Spring – September 20, 2016 and on the Hibernate Community Newsletter 19/2016.

Two weeks ago, I wrote a post on how to use Hibernate Search with Spring Boot. The post got featured in the Hibernate community newsletter as well as on Thorben’s blog Thoughts on Java.

I ended the post by saying that one downside for a fully cloud-based application is that the default index provider is directory-based.

Well.

There’s a solution for that, too: The upcoming Hibernate Search 5.6 comes with an Elasticsearch integration.

I didn’t try this out with my Tweet Archive, but with the site of my JUG, which runs happily on Pivotal CF.

Goal

  • Use local, directory based Lucene index during development
  • Use the Elasticsearch integration when deployed to Pivotal CF (“The cloud”)

Steps taken

First of all, you have to add the dependencies

with hibernate-search.version being 5.6.0.Beta2 at the moment.

The annotations at entity level are exactly the same as in my previous post, but for your convenience, here’s the post entity, which I wanted to make searchable:

Again, I have configured a language discriminator at entity level with @AnalyzerDiscriminator(impl = PostLanguageDiscriminator.class), but we’ll come to that later.

To make Hibernate Search use the Elasticsearch integration, you have to change the index manager. In a Spring Boot application this can be done by setting the following property:

spring.jpa.properties.hibernate.search.default.indexmanager = elasticsearch

And that’s really all it takes to switch from a directory-based, local Lucene index to Elasticsearch. If you have a local instance running, for example in Docker, everything works as before, both the indexing and the querying.

The default host is http://127.0.0.1:9200, but we’re not gonna use that in the cloud. Pivotal IO offers Searchly in its marketplace, which provides Elasticsearch. If you add this service to your application, you’ll get the credentials via a URL. The endpoint can then be configured like this in Spring’s application.properties:

spring.jpa.properties.hibernate.search.default.elasticsearch.host = ${vcap.services.search.credentials.sslUri}

Here I am making use of the fact that environment variables are evaluated in properties. The vcap property is automatically added by the Pivotal infrastructure and contains the mentioned secure URL. And that’s it. I have added a simple search by keyword method to my Post repository, but that I had already covered in my other post:

The actual frontend accessible through http://www.euregjug.eu/archive is nothing special, you can just browse the sources or drop me a line if you have any questions.

More interesting is the language discriminator for the posts. It looks like this:

It returns the name of the post’s language. Elasticsearch offers built-in language-specific analyzers; “english” and “german” are both available.

What if I want to use a local index for testing and Elasticsearch only on deployment? Then I have to define analyzers with those names in the local profile. The right way to do it is a Hibernate Search @Factory like this:

and an application-default.properties containing

spring.jpa.properties.hibernate.search.model_mapping = eu.euregjug.site.config.DefaultSearchMapping

Recap

To use Hibernate Search with your JPA entities, basically follow the steps described here.

If you want to use named analyzers from Elasticsearch that aren’t available for local Lucene, add analyzers with the same name (and maybe a similar functionality as well) through a Hibernate Search @Factory and configure them in application-default.properties. While you’re at it, you may want to point the index path to a directory which is excluded from your repo:

Relevant part of application-default.properties:

spring.jpa.properties.hibernate.search.default.indexBase = ${user.dir}/var/default/index/
spring.jpa.properties.hibernate.search.model_mapping = eu.euregjug.site.config.DefaultSearchMapping

In your prod properties, or in my case in application-cloud.properties, switch from the default index manager to “elasticsearch” and also configure the endpoint:

Relevant part of application-cloud.properties:

spring.jpa.properties.hibernate.search.default.indexmanager = elasticsearch
spring.jpa.properties.hibernate.search.default.elasticsearch.host = ${vcap.services.search.credentials.sslUri}
spring.jpa.properties.hibernate.search.default.elasticsearch.index_schema_management_strategy = MERGE

Happy searching and finding 🙂


20-Sep-16


NetBeans, Maven and Spring Boot… more fun together

At the 1st NetBeans Day Cologne I gave a talk about why I think that the combination of NetBeans, Maven and Spring Boot is more fun together.

Michael Müller joined me and spoke about the upcoming support for Java 9’s JShell in NetBeans. I’m really curious how that will work with my Spring projects; I imagine it will be really useful.

And of course, Geertjan Wielenga from Oracle was there and spoke a little bit about NetBeans’ background and the upcoming features. His second talk was about OracleJET and how NetBeans supports that concept of enterprise JavaScript programming.

So, my talk: I have a fully working demo right here: github.com/michael-simons/NetBeansEveningCologne and if you walk through the slides and code, you’ll even find a coupon for my book.

That said, the demo is centered around a super simple REST application that registers people for the NetBeans day. If you are familiar with Spring Boot and Spring Data, the stuff probably isn’t too new, but you can still learn about the NB-SpringBoot plugin, which does a lot of the stuff in NetBeans that STS or IntelliJ do for Spring Boot.

The second part of my talk is about two great libraries (or rather Maven plugins), Project Lombok and JaCoCo:



Project Lombok wants to remove some of Java’s necessary boilerplate code, that is: You can replace getters and setters with annotations, as well as constructors, equals/hashCode and more. Lombok is a source code annotation processor and I always thought that I would have a hard time using it in a sane way in an IDE, but that’s actually not the case. On the contrary, you get instant IDE support, as I tweeted before. People were surprised how easily Spring Boot can be used inside NetBeans, but the integration of Lombok and JaCoCo was really eye-opening for some.

I’m gonna try something new here and show you what I did as a little screencast. The video references the repository above, it uses the commit fa22a87. It’s the first time I recorded something like this, so sorry if it’s a bit rough:

For anyone who doesn’t want to watch a video, Geertjan took some pictures:

After using NetBeans for 2 years now, following many years of Eclipse, I’m really convinced that it deserves a voice. It’s a great tool and most of the time, it just works. And best of all: It’s free and open source. I think the new title of Geertjan’s slides is even better than the jigsaws:

Ever seen kids playing with Lego? Sometimes the result doesn’t look as polished as the sets, but often it works equally well. NetBeans may not be as polished as other, much more expensive IDEs, but that actually doesn’t matter much to me.

For the evening a big thank you to Faktorzehn for providing a great place and great food and drinks, much appreciated.

If you have a JUG or a company that wants to learn more about that stuff, drop me a line, we can probably arrange something. I can give this talk in German as well as in English and can extend all topics: Spring Boot with NetBeans, Maven or Docker.


10-Sep-16


Hibernate Search and Spring Boot: Simple yet powerful archiving

This post has been featured in the Hibernate Community Newsletter 18/2016.

Before my summer holidays I mentioned my personal twitter archive on Twitter again….

This time, Vlad from Hibernate reacted to my tweet:

More reactions came from Sanne and Emmanuel and here we go:

Content

  1. Source
  2. Background
  3. Features
  4. Tools used
  5. Application
  6. Database schema
  7. The Tweet entity
  8. Storing new entities
  9. Querying entities
  10. Conclusion
  11. Try it out yourself

Source

The whole project, which has already grown into more than a tech demo, is on GitHub: michael-simons/tweetarchive.

What I skipped is a fancy GUI. So far, it only has a REST interface. But it can be run as a Docker image with local, persistent storage. Check it out, star it, maybe even add stuff to it… Feel free!

Background

I’ve been running my archive for several years now, from Daily Fratze. Daily Fratze contains a home-grown crawler that checks my user timeline and stores my tweets in a MySQL database. I’m using JPA with Hibernate as my database access tool, so Hibernate Search fits nicely and is really easy to implement. Hibernate Search is a super easy way to add an Apache Lucene full text index to your entities.

For large-scale applications, Elasticsearch or similar may be more fitting, but I’m really content with my “small” (at the end of last year ~50 MB) search index and its performance. It doesn’t add much (if any) overhead to development or in production.

For the demo, I’ve taken my entities but not the parser. For parsing in the demo I use Twitter4J. Twitter4J is apparently not made for parsing static tweets, so there are some ugly constructs for getting a Twitter archive into the app, but that should not be the point here. The entities have been adapted and refreshed according to my current skills. Some things I created years ago should never see the light of day.

Features

  • I want to be able to search my tweets. With keywords and with full blown Lucene queries
  • The application should track new tweets
  • The original JSON content should be stored as well

Tools used

In order:

Application

The application is a standard Spring Boot application. It’s 2016, so you should find several really good guides out there, and also on this blog, on how such an application is built.

I also assume that you have an idea what Apache Lucene is about.

Database schema

My migrations are inside src/main/resources/db/migration/, where Flyway automatically finds them. Flyway itself is recognized by Spring Boot if it’s on the classpath.

I have this PostgreSQL cast

that allows me to store a Java String attribute inside a JSONB column without a bunch of custom converters and without explicitly casting it, but with type checks.

The table definition for tweets looks like this:

Nothing fancy here except the raw_data column, which contains the tweet’s original source. You can use PostgreSQL’s JSON operators to query it, if you like.

The Tweet entity

You’ll find the Tweet entity here: src/main/java/ac/simons/tweetarchive/tweets/TweetEntity.java. Basically, it is a standard JPA entity. I use Project Lombok to get rid of boilerplate code, so you’ll find no getters and setters.

For the following stuff, I assume you know JPA, because I’m not gonna cover that.

To make Hibernate Search aware of an entity that should be indexed, you have to annotate the entity:

That is already all there is!

Next step: Add a simple field, for example the screen name, just annotate it with @Field:

That actually reads: Index that field, store the value with the index so that it can be searched without hitting the database, but don’t do further analysis.

If you read through the entity, you’ll find several such fields.

Next: Analyzing fields. I want to search for similar words in the content of the tweet. While receiving the tweet, the application resolves URLs and stuff and replaces the short urls, see TweetStorageService.

The entity takes this one step further. The content field is annotated with:

Here the @Field annotation says: Index the content, don’t store it, but analyze it. It also says, through @AnalyzerDiscriminator, with which analyzer.

I have defined my analyzers right with the entity, but they can be defined elsewhere, on a package for example, too:

I have 3 analyzers in place: An English analyzer, which tokenizes the input, lowercases it and then does English word stemming. The same for German and, last but not least, an analyzer that just tokenizes and filters the content.

The analyzer itself can be dynamically inferred with a discriminator, which looks like this:

Read: If the language of the tweet is available and supported, use the fitting analyzer, otherwise use the default analyzer for undefined languages.
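To make that reading concrete, here’s a minimal, library-free sketch of the decision such a discriminator makes. The class name and the analyzer names “english”, “german” and “undefined” are assumptions for illustration, not copied from the project. In the real application this logic sits inside an implementation of Hibernate Search’s Discriminator interface.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch only: mirrors the decision described above without depending on Hibernate Search.
final class AnalyzerChoice {

    // Hypothetical analyzer names; the real ones are whatever the entity defines.
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList("english", "german"));
    static final String FALLBACK = "undefined";

    private AnalyzerChoice() {
    }

    // Returns the analyzer to use for a tweet with the given language.
    static String analyzerFor(String language) {
        if (language == null || language.trim().isEmpty()) {
            // Language not available: use the analyzer for undefined languages.
            return FALLBACK;
        }
        String candidate = language.trim().toLowerCase(Locale.ROOT);
        // Language available: use it only if a matching analyzer exists.
        return SUPPORTED.contains(candidate) ? candidate : FALLBACK;
    }
}
```

With these assumed names, a tweet marked as “German” ends up with the “german” analyzer, while an unsupported or missing language falls back to “undefined”.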

Hibernate Search allows spatial queries. You can annotate the whole class or an attribute, that returns Coordinates:

Also nested entities are supported. My example: The information regarding a reply. I have InReplyTo as an @Embeddable class and an attribute inReplyTo

This reads: Please index the embedded class, add a prefix “reply.” to all fields and otherwise, check for @Field annotations in the embedded class.

So far: Not much!

Storing new entities

If you use Spring Boot together with Hibernate and Spring Data JPA, you have nothing to take care of except configuring the database (and you can even skip this, if you use an in memory database).

This is all the configuration it takes, to get Hibernate Search up and running with that setup, if you add org.springframework.boot:spring-boot-starter-data-jpa, org.postgresql:postgresql and org.hibernate:hibernate-search-orm to the classpath:

spring.datasource.platform = postgresql
spring.datasource.driver-class-name = org.postgresql.Driver
spring.datasource.url = jdbc:postgresql://localhost:5432/tweetArchive
spring.datasource.username = tweetArchive
spring.datasource.password = tweetArchive
 
spring.jpa.hibernate.ddl-auto = validate
 
spring.jpa.properties.hibernate.search.default.directory_provider = filesystem
spring.jpa.properties.hibernate.search.default.indexBase = ${user.dir}/var/index/default

Just go ahead and define a repository for the TweetEntity:

This is an interface with no implementation in my application. It inherits from org.springframework.data.repository.Repository, thus already providing the means to access entities. I chose the simplest form of repository so that I don’t clutter my application with methods I don’t need. Had I inherited from CrudRepository instead, I wouldn’t have had to define the save or delete methods.

Calling the save or delete method from my tweet storage service already updates my search index.

Querying entities

But take good note that this interface also inherits from TweetRepositoryExt. This is the way recommended by Spring Data JPA to add custom behavior. This interface declares two search methods which I actually have to implement myself. This is done in TweetRepositoryImpl and I’m gonna walk you through the search method:

First I retrieve a new FullTextEntityManager inside the declarative transaction and instantiate a query builder. The query builder exposes a nice, fluent interface to define my Lucene query. You’ll see how I add a keyword query on one specific field and also, if the user provided a date range, how I add some range queries to an enclosing boolean condition.

The FullTextEntityManager is then used again to instantiate a JPA query from the full text query and retrieve the result.

And that’s all there is: I can use (and hide!) the full text queries inside the same repositories I would use elsewhere.

Conclusion

If you are already using Hibernate as your ORM, have embraced Spring Data repositories and need to search some entities, then Hibernate Search may be the right approach for your project. It’s really easy to implement and also easy to use. One downside for a 12 factor app could be the fact that the index is directory-based in the default setting. You can work around it, though, by using JMS or JGroups.

I have been using Hibernate Search for quite a while now on Daily Fratze and on several other internal projects as well, and for my (and our) purposes it has been enough.

Try it out yourself

There’s much more to learn in the demo application. Go to michael-simons/tweetarchive and see for yourself. There’s an extensive README, that should guide you through running the application yourself. The easiest way is to use a local Docker based instance.

If you like it, follow me on Twitter, I am @rotnroll666, leave a comment or a star.


06-Sep-16


Integration testing with Docker and Maven

This post will use

to provide an integration test environment for Spring Boot applications, running at least JUnit 4.12, Spring Boot 1.4, the Failsafe plugin in the version managed by Spring Boot and the latest docker-maven-plugin.

Gerald Venzl asked for it on Twitter, especially in the context of integration tests with databases. In case you don’t know, Gerald and Bruno Borges are responsible for the official Oracle database Docker images, which I am using at my company and for my upcoming talk at DOAG 2016.

Apart from Gerald asking, I had several reasons to finally get this topic right. First, after upgrading my JUG’s site to Spring Boot 1.4 and with it to Hibernate 5, I ran into issues with the ID generator, which behaved differently than before (a million thanks to Vlad Mihalcea for his great input).

euregjug.eu is developed locally on an H2 database and runs on Pivotal Cloud Foundry in production, where it uses a PostgreSQL database. So my problem wasn’t detected by the existing unit tests and broke the application in a way that wasn’t immediately obvious.

And last but not least: We at ENERKO INFORMATIK mainly create database-centric applications, some of them plain 2-tier applications with a lot of SQL logic. That logic is fine for us and our customers, but automated integration testing always gives us a hard time. I am convinced that the setup I developed for the Euregio JUG will improve things a lot for us once the PostgreSQL Docker image is replaced with the Oracle ones. So everything you’ll read can be applied to other databases as well.

Configuring integration tests

There’s actually not much to do in a Spring Boot / Maven based application, just add the plugin like I did:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>

I can safely omit the version number because Spring Boot has this plugin in its managed dependencies. The failsafe plugin automatically recognizes the following patterns in test classes as integration tests:

  • “**/IT*.java” – includes all of its subdirectories and all Java filenames that start with “IT”.
  • “**/*IT.java” – includes all of its subdirectories and all Java filenames that end with “IT”.
  • “**/*ITCase.java” – includes all of its subdirectories and all Java filenames that end with “ITCase”.

The surefire plugin itself excludes them in the current version so that they aren’t run as unit tests. I didn’t bother to move them into separate folders, but that should be easily doable with the Build Helper plugin.
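If you want to check quickly whether a class name falls under those three patterns, the rule can be sketched in a few lines of plain Java. This is only an illustration of the naming convention listed above, not the plugin’s actual matching code, and the class and method names are made up for the example.

```java
// Sketch only: re-states the three default include patterns as simple string checks.
final class IntegrationTestNaming {

    private IntegrationTestNaming() {
    }

    static boolean looksLikeIntegrationTest(String path) {
        // Only the file name matters; the "**/" part of the patterns stands for any directory.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        if (!fileName.endsWith(".java")) {
            return false;
        }
        String className = fileName.substring(0, fileName.length() - ".java".length());
        return className.startsWith("IT")     // **/IT*.java
            || className.endsWith("IT")       // **/*IT.java
            || className.endsWith("ITCase");  // **/*ITCase.java
    }
}
```

By this rule a file named RegistrationRepositoryIT.java counts as an integration test, while RegistrationRepositoryTest.java does not.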

Configure your containers with docker-maven-plugin

As I said before, the docker-maven-plugin has superb documentation and is really easy to use. Here it is used to start a Docker container based on the official PostgreSQL image before the integration tests run. The integration tests are run by the failsafe plugin, so it’s guaranteed that the container will be removed afterwards. See the commit for EuregJUG:

<plugin>
    <groupId>io.fabric8</groupId>
    <artifactId>docker-maven-plugin</artifactId>
    <version>0.20.1</version>
    <executions>
        <execution>
            <id>prepare-it-database</id>
            <phase>pre-integration-test</phase>
            <goals>
                <goal>start</goal>
            </goals>
            <configuration>
                <images>
                    <image>
                        <name>postgres:9.5.4</name>
                        <alias>it-database</alias>
                        <run>
                            <ports>
                                <port>it-database.port:5432</port>
                            </ports>
                            <wait>
                                <log>(?s)database system is ready to accept connections.*database system is ready to accept connections</log>
                                <time>20000</time>
                            </wait>
                        </run>
                    </image>
                </images>
            </configuration>
        </execution>
        <execution>
            <id>remove-it-database</id>
            <phase>post-integration-test</phase>
            <goals>
                <goal>stop</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Here it runs an existing image under the alias “it-database”. It waits either until the configured message appears in the Docker logs or the time expires (The plugin can also be used to create new images, for example based on the Oracle Docker images which is something I’ll use in my talk and publish afterwards). The container is started before the integration tests and stopped afterwards.
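The log pattern in the configuration deliberately contains the ready message twice. Here’s a small sketch of what that means for the wait condition, assuming the pattern is searched for in the log output collected so far (how the plugin applies it internally is not shown here). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: shows why the <log> pattern from the pom only matches
// after the ready message has been logged for the second time.
final class LogWaitSketch {

    // Same expression as in the <log> element above; (?s) lets '.' span line breaks.
    static final Pattern READY = Pattern.compile(
        "(?s)database system is ready to accept connections.*database system is ready to accept connections");

    private LogWaitSketch() {
    }

    static boolean isReady(String logSoFar) {
        return READY.matcher(logSoFar).find();
    }
}
```

A log that contains the message once is not considered ready yet. As soon as the same message shows up again, possibly many lines later, the condition is met and the integration tests can start.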

Also take note of the important port mapping: it-database.port:5432. Docker maps this port to a random high port on your machine. This random port will be grabbed by the Maven plugin and assigned to the new property it-database.port and can be used throughout the pom file. If you used the Oracle Database image, that would be 1521 in all probability.

I use the new property as an environment variable for the integration tests by adding the following to the failsafe configuration above:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <environmentVariables>
      <it-database.port>${it-database.port}</it-database.port>
    </environmentVariables>
  </configuration>
</plugin>

and make use of Spring Boot’s awesome configuration features, as you can see in this commit, where I add a file named application-it.properties containing the following line among others:

spring.datasource.url = jdbc:postgresql://localhost:${it-database.port}/postgres

Recognize the property? This way, I don’t have to hardcode the port anywhere, which would lead to problems when several builds run in parallel (for example on a CI machine). Parallel builds wouldn’t be possible if Docker mapped the exposed port to a fixed port on the machine running the build.
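What happens to the placeholder can be pictured with a tiny resolver. This is a simplified sketch and not Spring’s implementation; the class name is made up and the port value in the usage note is just an example of a random high port.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: replaces ${name} placeholders with values from a map,
// similar in spirit to what happens to the datasource URL above.
final class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private PlaceholderSketch() {
    }

    static String resolve(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder " + name);
            }
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }
}
```

If the Maven plugin happened to pick port 32768, the template jdbc:postgresql://localhost:${it-database.port}/postgres would resolve to jdbc:postgresql://localhost:32768/postgres, and a parallel build would simply get a different port.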

Also that file configures the data to load through:

spring.datasource.data = classpath:data-it.sql

By default it would load a file called “data-${platform}.sql”, so you are free to write SQL that fits your database. I much prefer this over DBUnit or a similar approach, because it’s much easier for me to just write down the SQL for the specific database than to use a mediocre replacement for database-specific tasks.

Writing the actual tests for a Spring Boot application

So far, that was nothing, wasn’t it? I’m really impressed by how much development has changed in recent years. Yes, Maven is still XML-based, but I didn’t have to do any “fancy” things or use special scripts or whatever to get a sane testing environment up and running.

The actual test I needed looks like this:

import static org.hamcrest.Matchers.is;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.orm.jpa.AutoConfigureTestDatabase;
import static org.springframework.boot.test.autoconfigure.orm.jpa.AutoConfigureTestDatabase.Replace.NONE;
import org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.junit4.SpringRunner;
 
@RunWith(SpringRunner.class)
@DataJpaTest
@AutoConfigureTestDatabase(replace = NONE)
@ActiveProfiles("it")
public class RegistrationRepositoryIT {
 
    @Autowired
    private EventRepository eventRepository;
 
    @Autowired
    private RegistrationRepository registrationRepository;
 
    @Test
    public void idGeneratorsShouldWorkWithPostgreSQLAsExpected() {
        // data-it.sql creates one registration with id "1"
        final EventEntity event = this.eventRepository.findOne(1).get();
        final RegistrationEntity savedRegistration = this.registrationRepository.save(new RegistrationEntity(event, "foo@bar.baz", "idGeneratorsShouldWorkWithPostgreSQLAsExpected", null, true));
        Assert.assertThat(savedRegistration.getId(), is(2));
    }
}

Important here are three things:

  • Run the test with @RunWith(SpringRunner.class) and as a @DataJpaTest. I don’t want to fire up the whole application but only JPA and the corresponding repositories.
  • By default, @DataJpaTest replaces all data sources with an in-memory database if one is on the class path, which is exactly what I don’t want here. So I prohibit this with @AutoConfigureTestDatabase(replace = NONE)
  • Activate a profile named “it” with @ActiveProfiles("it") so that Spring Boot takes the application-it.properties into account mentioned earlier.

Summary

I’m really, really happy with the solution, especially regarding the fact that I’m probably just the right amount late to the Docker party. While it was relatively easy to install Docker on Linux, you had to jump through hoops to get it running under OS X and especially on Windows. Lately, Docker natively supports the platform hypervisors, HyperKit under OS X and Hyper-V under Windows 10 Pro, so my colleagues don’t have a reason not to use it. The prebuilt images work fine and even the Oracle images mentioned above build without errors on several OS X and Windows machines I tested.

Even if you don’t want to have anything to do with Docker, the docker-maven-plugin is your friend. Nothing needs to be started manually, it just works. The EuregJUG site just builds flawlessly on my Jenkins-based CI.

If this post is of use to you, I’d be happy to hear about it in the comments.

Update: Here’s a snippet to build and run Oracle Database instances from the official Oracle Docker files: pom.xml.

Update 2: I just noticed that this commit on the above-mentioned Oracle Database images makes them somewhat less useful for integration testing: Now the Dockerfile only installs the database software but doesn’t create the database, and the container takes ages to start up… I don’t have a solution for that right now and I’m staying with my older images, even if they are a bit larger.

Update 3: In some cases a log message appears twice in a container. The docker-maven-plugin supports regex for log-waits since 0.20.1, which is now reflected in the pom snippet. Also regarding Update 2: One just has to do it “right”, see this article.


25-Aug-16


Burning Geek Insult Con

Some weeks ago this thread escalated quickly:

and the idea of the Burning Geek Insult Con was born… Bring your favorite IT discussion from the internet into real life…

We could offer tracks like Tabs and Spaces (no explanation needed (ok, well, not exactly, how many spaces for one tab?)), The best editor in the world (for realz) and many more… What about build tools? For sure, not all has been said about Maven and Gradle… And should you use Groovy or Kotlin? I’m quite sure someone has a strong opinion… Anyway, MAKE is the way to go.

If we finish the Databases track (sorry, we don’t offer NoSQL stores) with survivors, we’ll invite a special guest to decide which ORM to use, or whether to use an ORM at all.

I’m quite sure I have forgotten stuff… Like IDEs, OS and such. In the end, it doesn’t matter anyway.

I love to try out new stuff, pretty much every day. But for applications that should live longer than a month, I try to be consistent regarding the tools used and how they are used. What has been true with monoliths is still valid with microservices: Architecture and the role of an architect is about communication. And communicating is much simpler for me if my team and I can come to agreements on tools.


17-Aug-16