eea.docker.searchservices

ElasticSearch + Facetview complete Docker Stack Orchestration

This repo is DEPRECATED: We now deploy via EEA Rancher catalog templates.

1. Components

1.1 eeasearch [repo], [docker] - Node.js frontend to an ElasticSearch cluster

  • This container listens on port 3000 and provides a readonly API endpoint to the elasticsearch cluster.
  • The rendering is done by using jquery.facetview.js
  • The base image has support for automatic sync jobs and for running index management commands

More details on the source repository

1.2 pam [repo] - Node.js frontend to an ElasticSearch cluster

  • This container listens on port 3010 and provides a readonly API endpoint to the elasticsearch cluster.
  • The rendering is done by using jquery.facetview.js
  • The base image has support for running index management commands

More details on the source repository

1.3 aide [repo] - Node.js frontend to an ElasticSearch cluster

  • This container listens on port 3020 and provides a readonly API endpoint to the elasticsearch cluster.
  • The rendering is done by using jquery.facetview.js
  • The base image has support for running index management commands

More details on the source repository

1.4 esmaster [repo], [docker] - Elastic node configured as master

  • This node does nothing besides cluster management, so it is unlikely to be brought down by load.

1.5 esclient [repo], [docker] - Elastic node configured as HTTP client

  • This node is the only one that can accept, parse, scatter and gather HTTP query requests.
  • The actual work is performed by the esworkers.

1.6 esworker [repo], [docker] - Elastic Data storage nodes

  • Two nodes configured for data replication.
  • These nodes hold the data and execute the actual queries received from the esclient.
  • In addition, these are the only nodes that can run the River process. Thus, if the river process brings down the node (e.g. consumes too much memory), the other node will be able to serve the data.

1.7 dataw[1 2] - Data Volume Containers

  • Lightweight containers holding the data stored in the workers.
  • These containers make the data easy to back up and restore, independently of the esworker containers' fate.

1.8 datam - Data Container for Master node

  • This container doesn’t store any indexed data, but it stores information about the worker nodes, required by the master node
  • More information about elasticsearch node roles can be found here
  • More information about elasticsearch node discovery can be found here

2. Deployment tips

2.1 Getting the latest release up and running for the first time

git clone --recursive https://github.com/eea/eea.docker.searchservices
cd eea.docker.searchservices
docker-compose up -d

To list all the commands an elastic app supports, run docker-compose run --rm eeasearch help.

Troubleshooting: Data is not indexed? Sometimes, during indexing and even after it has finished, queries on the new index throw an error. Restarting elasticsearch solves the problem:

# Restarting the elastic workers if the index is not built
docker-compose restart esworker1
docker-compose restart esworker2

Now go to <serverip>:9200/_plugin/head/ to see if the index is being built.

You can also try increasing ES_HEAP_SIZE for the clients in docker-compose.yml.

2.1.1 Auto indexing

All elastic search apps run create_index at startup if their index does not exist or contains no data.

You can disable this feature by adding AUTO_INDEXING=false to the environment section of docker-compose.yml:

...
environment:
        - AUTO_INDEXING=false
...

Afterwards, run the following steps to index the data:

# Wait a while for the elastic cluster to get initialized
# Start indexing data
docker-compose run --rm eeasearch create_index
# Check the logs
docker-compose logs
# If the river is not indexing just perform a couple of reindex commands
docker-compose run --rm eeasearch reindex
# Go to this host:3000 to see that data is being harvested
# And the same for PAM
# Start indexing data
docker-compose run --rm pam create_index
# And the same for AIDE
# Start indexing data
docker-compose run --rm aide create_index
# Check the logs
docker-compose logs
# If the river is not indexing just perform a couple of reindex commands
docker-compose run --rm pam reindex
# Go to this host:3010 to see that data is being harvested for pam
# Go to this host:3020 to see that data is being harvested for aide
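
The "wait a while" step above can be scripted rather than guessed. The following is a minimal sketch and not part of this repo: wait_for_elastic is a hypothetical helper that polls Elasticsearch's _cluster/health endpoint and returns once the status is yellow or green. It assumes curl is installed and that port 9200 of esclient is reachable on the given host (localhost by default).

```shell
# Hypothetical helper (not shipped with this repo): poll cluster health
# until it is yellow or green, or give up after a number of attempts.
# Usage: wait_for_elastic [host] [max_attempts]
wait_for_elastic() {
  host="${1:-localhost}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Extract the value of the "status" field from the health JSON
    status=$(curl -s "http://$host:9200/_cluster/health" |
      sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
    case "$status" in
      green|yellow)
        echo "cluster is $status"
        return 0
        ;;
    esac
    i=$((i + 1))
    sleep "${WAIT_SECONDS:-2}"   # pause between attempts, 2 seconds by default
  done
  echo "cluster not ready after $tries attempts" >&2
  return 1
}

# Example: only start indexing once the cluster answers
#   wait_for_elastic localhost 60 && docker-compose run --rm eeasearch create_index
```

Gating create_index on the health check avoids starting an indexing run against a cluster that is still forming.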

2.2 Persistent data

The data is kept persistent by using two explicit data containers. The data is mounted in /usr/share/elasticsearch/data. To back up, restore or migrate it, follow the steps from the “Backup, restore, or migrate data volumes” section in the Docker documentation.
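
In the spirit of the Docker documentation's approach, a data container can be archived by mounting its volumes into a throwaway container. This is a sketch under assumptions: backup_data_container is a hypothetical helper, the busybox image is an arbitrary choice for running tar, and the container name you pass must be the real one listed by docker ps -a (docker-compose usually prefixes it with the project name).

```shell
# Hypothetical helper: archive the Elasticsearch data directory of a
# data volume container into <dest>/<container>.tar on the host.
# Usage: backup_data_container <container_name> [destination_dir]
backup_data_container() {
  container="$1"
  dest="${2:-$PWD}"   # defaults to the current directory
  docker run --rm --volumes-from "$container" -v "$dest":/backup busybox \
    tar cf "/backup/$container.tar" /usr/share/elasticsearch/data
}

# Example (the container name is an assumption, check docker ps -a):
#   backup_data_container eeadockersearchservices_dataw1_1 /var/backups
```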

2.3 Performing production updates

Change the tags in this repo to match the image version you want to upgrade to. Then, push the changes to this repo. On the host running this compose file, do:

docker-compose stop    # stop the running containers
git pull origin master # get the docker-compose-prod.yml containing the latest tags
# Before the next step, back up the data containers in case the update procedure fails
docker-compose pull    # get the images and their tags
docker images | grep eeacms # check that the new images have been downloaded
docker-compose rm -vf eeasearch aide pam # remove the old containers before starting
docker-compose up -d --no-recreate # start the containers

Possible problems

In some cases the containers cannot be stopped because for some reason they have no names. This happens mostly for the elastic containers. Running

docker ps -a

displays the list of containers, some of which have no names. First, remove these containers with

docker rm --force <container_id>

Second, recreate the containers with

docker-compose up -d --no-recreate

2.4 Running index management scripts from your office :)

If you can reach esclient from your office, you can reindex the data of a given webapp or force a sync with a single command.

Assuming that esclient:9200 is available at http://some-staging:80/elasticsearch/ and you have permission to perform PUT, POST and DELETE requests on that endpoint from your office, you can run this one-liner to reindex the data of a given app.

docker run --rm -e elastic_host=some-staging -e elastic_path=/elasticsearch/ -e elastic_port=80 eeacms/eeasearch reindex

To see a list of all available commands run:

docker run --rm -e elastic_host=some-staging -e elastic_path=/elasticsearch/ -e elastic_port=80 eeacms/eeasearch help

By default elastic_path is / and elastic_port is 9200. So you can omit them if esclient is accessible on port 9200 at path /.
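
The effect of these defaults can be illustrated with shell parameter expansion. This is only a sketch of how the three variables combine into an endpoint URL: elastic_url is a hypothetical function, not the code the image actually runs, and the http scheme is taken from the example above.

```shell
# Hypothetical illustration of the defaults described above:
# elastic_port falls back to 9200 and elastic_path to "/".
elastic_url() {
  if [ -z "${elastic_host:-}" ]; then
    echo "elastic_host must be set" >&2
    return 1
  fi
  echo "http://${elastic_host}:${elastic_port:-9200}${elastic_path:-/}"
}

# Example:
#   elastic_host=some-staging elastic_port=80 elastic_path=/elasticsearch/ elastic_url
```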

2.5 A note about scaling

TL;DR - it won’t work with docker-compose scale because the overhead is in worker nodes which need additional ops to be scaled.

By default, ElasticSearch breaks an index into 5 shards (holding different parts of the data). Each shard will have one replica. If we have 4 workers with this setup, then shards could be distributed as such:

  • Node1: Shard 0 Primary, Shard 1 Replica, Shard 3 Primary
  • Node2: Shard 0 Replica, Shard 1 Primary, Shard 2 Primary
  • Node3: Shard 4 Replica, Shard 3 Replica
  • Node4: Shard 4 Primary, Shard 2 Replica

If Node3 and Node4 are scaled down, both copies of Shard 4 (primary and replica) disappear with them, so the shard is lost and would be hard to recover.

  • Scaling up will not automatically move shards to other nodes in order to better distribute the jobs.

  • Scaling down will not move shards to remaining nodes to keep availability.

  • Running several workers on the same host would increase the number of parallel disk accesses, which can thrash the cache and result in poor performance.

  • Worker nodes perform most of the work. If something runs slowly, there is a high chance that something is taking too long on the workers, not on the client or the master.

Maintaining a more complex ElasticSearch cluster means distributing it over more hosts and performing careful operations when scaling, so that data is not lost. Just don't run docker-compose scale on elastic nodes.
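
Before removing nodes, it is worth checking whether any shard would lose all of its copies. The sketch below reads shard rows in the form "index shard prirep state node" (for example from Elasticsearch's _cat/shards endpoint with the h=index,shard,prirep,state,node column selection) and prints the shards whose started copies all live on the nodes you plan to remove. shards_at_risk is a hypothetical helper, and it assumes node names contain no spaces.

```shell
# Hypothetical helper: list "index shard" pairs that would lose every
# started copy if the nodes given as arguments were removed.
# stdin: rows of "index shard prirep state node"
shards_at_risk() {
  awk -v nodes="$*" '
    BEGIN {
      n = split(nodes, list, " ")
      for (i = 1; i <= n; i++) doomed[list[i]] = 1
    }
    $4 != "STARTED" { next }        # ignore unassigned or relocating copies
    {
      key = $1 " " $2               # index name and shard number
      total[key]++
      if ($5 in doomed) lost[key]++
    }
    END {
      for (key in total)
        if (lost[key] == total[key]) print key
    }
  ' | sort
}

# Example (host and node names are assumptions):
#   curl -s "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,node" |
#     shards_at_risk Node3 Node4
```

Fed with the distribution listed above and the arguments Node3 Node4, the helper reports shard 4, the one whose primary and replica both sit on the removed nodes.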

2.7 Deployment with Rancher

The provided docker-compose-prod.yml in this repo is already configured to run within Rancher PaaS.

Make sure you have the appropriate labels on the docker hosts in your Rancher cluster. See docker-compose-prod.yml and look for labels io.rancher.scheduler.affinity:host_label.

Go to your Rancher Web interface and generate your API key (API & Keys for “…” Environment):

$ export RANCHER_URL=<(Endpoint URL)>
$ export RANCHER_ACCESS_KEY=<(ACCESS KEY)>
$ export RANCHER_SECRET_KEY=<(SECRET KEY)>

$ git clone https://github.com/eea/eea.docker.searchservices.git
$ cd eea.docker.searchservices
$ rancher-compose up

The above will automatically create a stack named eea-docker-searchservices and run it. Now look at the exposed rancher loadbalancer and configure your DNS/proxy to point to it.
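
Since rancher-compose takes the three variables exported above from the environment, a small guard can catch a missing one before deploying. This is a hedged sketch: check_rancher_env is a hypothetical helper, not something shipped with this repo or with Rancher.

```shell
# Hypothetical pre-flight check: fail early if any of the Rancher API
# variables used above is unset or empty.
check_rancher_env() {
  missing=""
  for name in RANCHER_URL RANCHER_ACCESS_KEY RANCHER_SECRET_KEY; do
    eval "value=\${$name:-}"      # read the variable whose name is in $name
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

# Example:
#   check_rancher_env && rancher-compose up
```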

3. Clean Development setup

Perform these steps to be able to easily make changes to any of the EEA-maintained parts of this stack.

3.1 Prerequisites

  • bash :)
  • python (>= 2)
  • git :)
  • maven and a Java environment (for building the EEA RDF River plugin); install maven with sudo apt-get install maven
  • npm (>= 2.8.4) for building and publishing the base node.js webapp module
  • Follow these steps to install the needed versions on a Debian based system [TODO]
  • Docker (>=1.5) and docker-compose (>=1.3.0)
  • Follow these steps to install them [TODO]
  • To run the commands easily, add your user to the docker group and log in again for the change to take effect.

3.2 Clone all the components of the stack

This repository glues together all the components of the stack and also offers a template for a development docker-compose file. Change directory to your home or working folder and clone the project using:

user@host ~/ $ git clone --recursive git@github.com:eea/eea.docker.searchservices.git

3.3 Run everything on your host

Building the elastic containers from sources is rarely needed and takes a lot of time, so there are two options:

  • use the elastic images from dockerhub
  • build the images from sources

3.3.1 With elastic images pulled from the hub

Run docker-compose -f docker-compose-dev.yml up to start all services.

Check http://localhost:9200 or http://localhost:9200/_plugin/head/ to see if elastic is up and running. When it’s up, you can go to http://localhost:3000, http://localhost:3010 and http://localhost:3020 then make yourself a coffee, everything works now.

3.3.2 With elastic images built from the source code using the rdf river plugin from sources

Run docker-compose -f docker-compose-dev-elastic.yml up to start all services.

3.3.3 Indexing

Run docker-compose -f docker-compose-dev.yml run --rm eeasearch create_index to create the index for EEASearch

Run docker-compose -f docker-compose-dev.yml run --rm pam create_index to create the index for PAM

Run docker-compose -f docker-compose-dev.yml run --rm aide create_index to create the index for AIDE

4. Publishing changes and updating Docker Registry images

Assuming you have tested locally and implemented the needed features, depending on the code you changed, perform the following steps to make the changes available in Docker Registry.

You can also use repo specific docker-compose.yml files if the changes affect only a part of the stack.

4.1. eea.searchserver.js

Note: make sure that all the applications using this package work with your new changes before publishing anything.

First, you need to publish the new version of the package.

  • Open package.json and increment the version
  • Commit your changes
  • Create a new tag

This repository will not automatically build the eeacms/eeasearch (and other apps) Docker images.

4.2. eea.elasticsearch.river.rdf

Note: make sure that all the applications using the river work with your new changes before publishing anything.

First, you need to add a new release of the river.

  • Open pom.xml and increment the version
  • Run mvn clean install to make a new build
  • Commit your changes
  • Go to the releases tab
  • Click on draft a new release
  • Fill in the tag version and release name with the version you added in pom.xml. This is needed because the Dockerfile expects this naming scheme
  • Attach eea.elasticsearch.river.rdf/target/releases/eea-rdf-river-plugin-version.zip as a binary release
  • Complete the release

This repository will not automatically build the eeacms/elastic Docker images.

4.3. eea.docker.elastic and eea.docker.eeasearch

Pushing to master will automatically trigger a build with the :latest tag. Make sure that you are building with the correct tags and wait for the builds to complete before performing these steps.

4.4. Information about current container, git version and index

All elastic applications will display in the page footer information about the current index and container, like below:

Application data last refreshed 05 April 2016 12:52 PM. Version info eeacms/pam:v2.7.3 and git tag number v2.8 on 718b1e09d6a0.
  • 05 April 2016 12:52 PM - the date when index was updated/rebuilt
  • eeacms/pam:v2.7.3 - current image version used; this is an optional value that can be specified in the docker compose file like below:

    environment:
            - VERSION_INFO=eeacms/pam:v2.7.3

  • v2.8 - current git tag number (based on git describe --tags)
  • 718b1e09d6a0 - container id (HOSTNAME environment variable)
