Druid Indexing Fails – ISE – Cannot find instance of indexer to talk to!

Recently my druid overlord and middle manager stopped indexing data.

I tried the basic multiple restarts but it did not help.

The logs of the middle manager revealed the following:

2018-10-02 06:01:37 [main] WARN  RemoteTaskActionClient:102 - Exception submitting action for task[index_nom_14days_2018-10-02T06:01:29.633Z]

java.io.IOException: Failed to locate service uri

        at io.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:94) [druid-indexing-service-0.10.0.jar:0.10.0]

        at io.druid.indexing.common.task.IndexTask.isReady(IndexTask.java:150) [druid-indexing-service-0.10.0.jar:0.10.0]

        at io.druid.indexing.worker.executor.ExecutorLifecycle.start(ExecutorLifecycle.java:169) [druid-indexing-service-0.10.0.jar:0.10.0]

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]

        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]

        at io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:364) [java-util-0.10.0.jar:0.10.0]

        at io.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:263) [java-util-0.10.0.jar:0.10.0]

        at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:156) [druid-api-0.10.0.jar:0.10.0]

        at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:102) [druid-services-0.10.0.jar:0.10.0]

        at io.druid.cli.CliPeon.run(CliPeon.java:277) [druid-services-0.10.0.jar:0.10.0]

        at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.0.jar:0.10.0]

Caused by: io.druid.java.util.common.ISE: Cannot find instance of indexer to talk to!

        at io.druid.indexing.common.actions.RemoteTaskActionClient.getServiceInstance(RemoteTaskActionClient.java:168) ~[druid-indexing-service-0.10.0.jar:0.10.0]

        at io.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:89) ~[druid-indexing-service-0.10.0.jar:0.10.0]

The main cause was

Caused by: io.druid.java.util.common.ISE: Cannot find instance of indexer to talk to!

Apparently the middle manager and its peons could not seem to find the overlord service for indexing to start and hence all jobs were failing.

Root Cause:

I logged into zookeeper and found the following to entries for Overlord/Coordinator to be empty even though they are up and running.

Since they were empty, the middle manager could not find an indexer to talk to ..!! as the middle manager looks up zookeeper.

[zk: localhost:2181(CONNECTED) 1] ls /druid/discovery/druid:coordinator
[]
[zk: localhost:2181(CONNECTED) 3] ls /druid/discovery/druid:overlord
[]

Fix:

assuming that Coordinator runs on Host_A & Overlord runs on Host_B and … do the following:

(1) Generate 2 random ids. 1 for Coordinator & 1 for Overlord

Druid code uses the java code :

System.out.println(java.util.UUID.randomUUID().toString());

you can use any random string.

lets assume you generate

45c3697f-414f-48f2-b1bc-b9ab5a0ebbd4‘ for coordinator

&

f1babb39-26c1-42cb-ac4a-0eb21ae5d77d‘ for overlord.

(2) log into zookeeper cli and run the following commands:

[zk: localhost:2181(CONNECTED) 3] create /druid/discovery/druid:coordinator/45c3697f-414f-48f2-b1bc-b9ab5a0ebbd4 {"name":"druid:coordinator","id":"45c3697f-414f-48f2-b1bc-b9ab5a0ebbd4","address":"HOST_A_IP","port":8081,"sslPort":null,"payload":null,"registrationTimeUTC":1538459656828,"serviceType":"DYNAMIC","uriSpec":null}

[zk: localhost:2181(CONNECTED) 3] create /druid/discovery/druid:overlord/f1babb39-26c1-42cb-ac4a-0eb21ae5d77d {"name":"druid:overlord","id":"f1babb39-26c1-42cb-ac4a-0eb21ae5d77d","address":"HOST_B_IP","port":8090,"sslPort":null,"payload":null,"registrationTimeUTC":1538459763281,"serviceType":"DYNAMIC","uriSpec
":null}

You will notice a registrationTimeUtc field – put in some time u see fit.

This should create the necessary zookeeper nodes for middle manager to lookup

now restart coordinator then overlord then middleManager

Submit a indexing task and it should work.

Additional tips:

In $Druid_Home/conf/druid/middleManager/runtime.properties you may have

druid.indexer.runner.javaOpts=-cp conf/druid/_common:conf/druid/middleManager/:conf/druid/middleManager:lib/* -server -Ddruid.selectors.indexing.serviceName=druid/overlord -Xmx7g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dhadoop.mapreduce.job.user.classpath.first=true

and $Druid_Home/conf/druid/middleManager/jvm.config you may have

-Ddruid.selectors.indexing.serviceName=druid/overlord

Hope this helps you.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s