Connecting to Elasticsearch on OpenShift

29 June 2015 / 30 January 2017

With OpenShift as the new wave in technology, our Java application, which connects to and uses Elasticsearch, also needs to be deployable on OpenShift. There is a nice cartridge available, provided by rbrower3. It requires a scalable gear and takes the place of the web part of that gear. Hence we cannot deploy it side by side on the same gear with Tomcat or JBoss/WildFly.

As we need WildFly or Tomcat for our application to run, we end up setting up an additional application in our domain. This is where we meet our challenge: how to connect these two.

To connect to Elasticsearch you have these options:

  1. run Elasticsearch inside the same JVM: embedded
  2. run Elasticsearch as a non-data-containing node
  3. connect to Elasticsearch via the transport-client
  4. connect to Elasticsearch via the REST-API

The first three are supported natively by the Elasticsearch distribution. The fourth option needs some extra work or an additional library (Jest).

Running inside the same JVM is useful for (unit) test situations. One can limit access and disable the discovery mechanisms to minimize start-up time.
The best-performing setup is the ‘non-data-containing node’ option. In that mode the node knows the cluster and hence where to send its queries. It even takes part in merging the search results.
The downside of the ‘non-data-containing node’ option is that your client needs to be inside the same network as the other nodes, because it has to discover them and get notified of changes in that network. This is not an option in our OpenShift setup, as AWS does not support multicast between the gears. In this case the transport-client offers a nice way of connecting: we only need to specify one or more nodes to connect to.
Our application was developed using Spring-Data-Elasticsearch. It offers a convenient way of declaring document structures and provides auto-created search options. From this point, creating a CRUD-like application with wide searching capabilities was easy.
We started by trying to connect Spring-Data-Elasticsearch to the remote Elasticsearch (in the rbrower3 cartridge in our OpenShift domain) using the transport-client, which did not work at all. The rbrower3 cartridge exposes the Elasticsearch REST-API only. This is understandable, as that is how haproxy is able to auto-scale the Elasticsearch service. But Spring-Data-Elasticsearch does not provide a way to connect via the REST-API, and none of the other exposed ports offer the transport-client API. We were stuck.

Solution-minded as we are, we dove into the deployed rbrower3 application (using ssh) and stumbled across a wealth of environment variables, all starting with “OPENSHIFT_ELASTICSEARCH_”. One of them (OPENSHIFT_ELASTICSEARCH_CLUSTER) yields a 10.x.y.z IP address and a port. These can be used for transport-client access from a different gear in the same OpenShift domain. Now we are in business. This is how we then create a Client for Spring-Data-Elasticsearch:

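// Builds a transport client (Elasticsearch 1.x API) that talks to one known node.
TransportClient createTransportClient(String clusterName, String esHost, int port) {
    ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder();
    // The cluster name must match the remote cluster, otherwise its nodes are ignored.
    settingsBuilder.put("cluster.name", clusterName);
    LOG.info("Using host {}:{}", esHost, port);
    // More nodes can be registered with further addTransportAddress calls.
    return new TransportClient(settingsBuilder.build())
            .addTransportAddress(new InetSocketTransportAddress(esHost, port));
}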

The cluster name can be obtained via the REST-API, simply by visiting the URL of the OpenShift application that runs the rbrower3 deployment.
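As a minimal sketch of that lookup, the cluster name can be picked out of the JSON that the REST root URL returns. The class and method names are hypothetical, the body used in main is an abbreviated stand-in for a real response, and the regular expression assumes the field is called cluster_name, as in the Elasticsearch 1.x root response. A proper JSON library would be the sturdier choice in real code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch or Spring-Data-Elasticsearch.
final class ClusterNameParser {

    // Matches "cluster_name" : "some-name" anywhere in the response body.
    private static final Pattern CLUSTER_NAME =
            Pattern.compile("\"cluster_name\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the cluster name found in the body, or null if the field is absent. */
    static String parseClusterName(String rootResponseJson) {
        Matcher m = CLUSTER_NAME.matcher(rootResponseJson);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative, abbreviated body; a real response contains more fields.
        String body = "{ \"status\" : 200, \"cluster_name\" : \"my-cluster\" }";
        System.out.println(parseClusterName(body));
    }
}
```

The returned value is what goes into the clusterName argument of createTransportClient.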
When multiple gears run the Elasticsearch cluster, the variable OPENSHIFT_ELASTICSEARCH_CLUSTER contains a comma-separated list of IP:port pairs. These can be added to our client configuration with multiple calls to addTransportAddress. Keep in mind that when the cluster auto-scales, the variable will not always be up to date, and the client application cannot reconfigure itself out of the box, which would have been possible with the ‘non-data-containing node’ option.
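Splitting that variable could look roughly like the sketch below. It uses only the JDK; the class and method names are hypothetical, and it assumes the value has exactly the comma-separated IP:port shape described above.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for turning the variable's value into host/port pairs.
final class ClusterAddresses {

    /** Parses e.g. "10.0.0.1:9300,10.0.0.2:9300" into unresolved socket addresses. */
    static List<InetSocketAddress> parse(String clusterVariable) {
        List<InetSocketAddress> result = new ArrayList<>();
        if (clusterVariable == null) {
            return result; // variable not set: nothing to connect to
        }
        for (String pair : clusterVariable.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // Unresolved: no DNS lookup happens here, the values are only carried along.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints nothing when the variable is not set on this machine.
        for (InetSocketAddress a : parse(System.getenv("OPENSHIFT_ELASTICSEARCH_CLUSTER"))) {
            System.out.println(a.getHostString() + ":" + a.getPort());
        }
    }
}
```

Each entry's getHostString() and getPort() can then be handed to a separate addTransportAddress(new InetSocketTransportAddress(host, port)) call on the client.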

Note 1: the IP:port is accessible to all OpenShift Online domains. Even though this is on an obscure port, make sure you use some kind of security, e.g. Shield.

Note 2: keep in mind that with the free offering of OpenShift Online, the small gears are subject to ‘application idling’ when not enough HTTP traffic reaches the gear. As our Java application does not use the Elasticsearch REST-API, and hence generates no HTTP traffic, OpenShift will idle the Elasticsearch gears and not wake them up again when our Java application needs them.
