Storm Spout Co-ordinator – Illegal State Exception – Deactivate / Activate Topology

Recently, while working with Transactional Topologies in Storm 0.9.4, we came across this error in the spout:

java.lang.IllegalStateException: Expecting previous txid state to be the previous transaction

We were a little confused as to why this happened, as the code change was minimal.

Root Cause Analysis:

Normally we make the Coordinator's ‘isReady()’ method return true. We changed it to return false because we wanted to implement deactivation/activation of the topology, i.e. a graceful stop, instead of killing it cold.

In addition, our topology was configured with:

conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 10);

i.e. up to 10 transactions could be in flight in parallel.

What happens when ‘isReady()’ returns false?

As per the Coordinator Javadoc, the next transaction ids are skipped, i.e. they will not be used. Refer to TransactionalSpoutCoordinator.java:

if (_activeTx.size() < _maxTransactionActive) {
    BigInteger curr = _currTransaction;
    for (int i = 0; i < _maxTransactionActive; i++) {
        // a txid is used only if its state is cached or isReady() returns true
        if ((_coordinatorState.hasCache(curr) || _coordinator.isReady())
                && !_activeTx.containsKey(curr)) {
            // ... (elided)
            Object state = _coordinatorState.getState(curr, _initializer);
            // ... (elided)
        }
        curr = nextTransactionId(curr);   // txid++ on every iteration, even when isReady() was false
    }
}
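To see the skipping concretely, here is a minimal Java sketch of that loop. It is a hypothetical model, not Storm code: CoordinatorLoopModel and txidsPassedToGetState are made-up names, the hasCache() and _activeTx checks are dropped, and the isReady() answers are supplied per loop iteration (readyAt) purely to simulate the topology being reactivated part-way through the loop.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical, simplified model of the coordinator loop. It only records which
 * txids would reach getState(); hasCache() and _activeTx are left out.
 */
public class CoordinatorLoopModel {

    /**
     * @param currTransaction first txid examined (stands in for _currTransaction)
     * @param maxActive       stands in for TOPOLOGY_MAX_SPOUT_PENDING
     * @param readyAt         readyAt.test(i) is the answer of the i-th isReady() call
     */
    static List<BigInteger> txidsPassedToGetState(BigInteger currTransaction, int maxActive,
                                                  IntPredicate readyAt) {
        List<BigInteger> used = new ArrayList<>();
        BigInteger curr = currTransaction;
        for (int i = 0; i < maxActive; i++) {
            if (readyAt.test(i)) {
                used.add(curr);              // getState(curr, ...) would run here
            }
            curr = curr.add(BigInteger.ONE); // advances even when isReady() said false
        }
        return used;
    }

    public static void main(String[] args) {
        // Assumed answers: isReady() is false twice (deactivated), then true (reactivated).
        List<BigInteger> used = txidsPassedToGetState(BigInteger.valueOf(4), 10, i -> i >= 2);
        System.out.println(used); // [6, 7, 8, 9, 10, 11, 12, 13]
    }
}
```

With these assumed answers, txids 4 and 5 never reach getState(), and the first txid that does is 6.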

What happens when ‘isReady()’ returns true again, i.e. when we reactivate the topology?

The same loop runs again, and this time it reaches:

Object state = _coordinatorState.getState(curr, _initializer);

since we have:

conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 10);
// so _activeTx.size() < _maxTransactionActive holds and the for loop above runs

Now look inside the getState() method:

if (_strictOrder) {
    // ... (elided; prev is the largest earlier txid that already has state)
    if (prev != null && !prev.equals(txid.subtract(BigInteger.ONE))) {
        throw new IllegalStateException("Expecting previous txid state to be the previous transaction");
    }
    // ... (elided)
}

Let's say we deactivate the topology, making isReady() return false.

Assume the previous committed transaction is 3 and, after some ids were skipped, the current one is 6.

From the check above, 6 - 1 = 5, which is not equal to 3. With _strictOrder set, transactions must be strictly consecutive.

Storm expects the next transaction to be 4, since 4 - 1 == 3, but because of the skipping this is violated.
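The check can be reproduced in isolation with a small sketch using the numbers from the example above. StrictOrderModel is hypothetical, not Storm's class: it behaves as if _strictOrder were always true, keeps state in a TreeMap, takes the initial state directly instead of through an initializer, and omits the other checks that getState() performs.

```java
import java.math.BigInteger;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of the strict-order check in getState().
 * A new txid is accepted only if the closest earlier txid that has state
 * is exactly txid - 1.
 */
public class StrictOrderModel {
    private final TreeMap<BigInteger, Object> states = new TreeMap<>();

    Object getState(BigInteger txid, Object initialState) {
        if (!states.containsKey(txid)) {
            SortedMap<BigInteger, Object> earlier = states.headMap(txid); // keys < txid
            BigInteger prev = earlier.isEmpty() ? null : earlier.lastKey();
            if (prev != null && !prev.equals(txid.subtract(BigInteger.ONE))) {
                throw new IllegalStateException(
                        "Expecting previous txid state to be the previous transaction");
            }
            states.put(txid, initialState);
        }
        return states.get(txid);
    }

    public static void main(String[] args) {
        StrictOrderModel model = new StrictOrderModel();
        model.getState(BigInteger.valueOf(3), "state-3"); // previous committed txn
        try {
            model.getState(BigInteger.valueOf(6), "state-6"); // 4 and 5 were skipped
        } catch (IllegalStateException e) {
            System.out.println("txid 6 rejected: " + e.getMessage());
        }
        // 4 - 1 == 3, so this one passes the check
        System.out.println("txid 4 accepted: " + model.getState(BigInteger.valueOf(4), "state-4"));
    }
}
```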

Hence the exception, because we have set:

conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 10);

If this is set to 1, this will not happen: the loop then looks only at _currTransaction itself, so no transaction id can be skipped.
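A combined sketch of the loop and the strict check illustrates why the value of TOPOLOGY_MAX_SPOUT_PENDING matters. This is a model under stated assumptions, not Storm code: MaxPendingModel is hypothetical, txid 3 is taken as the last committed transaction (so the loop starts at 4), commits are not modeled, hasCache() and _activeTx are folded into one "stored" set, and each isReady() answer is supplied per loop iteration.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.TreeSet;
import java.util.function.IntPredicate;

/** Hypothetical model combining the coordinator loop with the strict-order check. */
public class MaxPendingModel {
    // txids that already have state; 3 is assumed to be the last committed txn
    final TreeSet<BigInteger> stored = new TreeSet<>(List.of(BigInteger.valueOf(3)));
    private final BigInteger currTransaction = BigInteger.valueOf(4);
    private final int maxActive; // stands in for TOPOLOGY_MAX_SPOUT_PENDING

    MaxPendingModel(int maxActive) {
        this.maxActive = maxActive;
    }

    /** One simplified sync() call; readyAt.test(i) is the i-th isReady() answer. */
    void sync(IntPredicate readyAt) {
        BigInteger curr = currTransaction;
        for (int i = 0; i < maxActive; i++) {
            if (readyAt.test(i) && !stored.contains(curr)) {
                BigInteger prev = stored.lower(curr); // closest earlier txid with state
                if (prev != null && !prev.equals(curr.subtract(BigInteger.ONE))) {
                    throw new IllegalStateException(
                            "Expecting previous txid state to be the previous transaction");
                }
                stored.add(curr);
            }
            curr = curr.add(BigInteger.ONE); // skipped ids are never revisited in this call
        }
    }

    public static void main(String[] args) {
        // Max pending 10: false, false, then true -> 4 and 5 skipped, 6 fails the check.
        MaxPendingModel ten = new MaxPendingModel(10);
        try {
            ten.sync(i -> i >= 2);
        } catch (IllegalStateException e) {
            System.out.println("max pending 10: " + e.getMessage());
        }

        // Max pending 1: every sync() looks only at txid 4, so nothing is skipped.
        MaxPendingModel one = new MaxPendingModel(1);
        one.sync(i -> false); // deactivated
        one.sync(i -> false); // still deactivated
        one.sync(i -> true);  // reactivated: 4 follows 3, check passes
        System.out.println("max pending 1: stored " + one.stored);
    }
}
```

In this model the only difference between the two runs is how far past _currTransaction the loop is allowed to walk in a single call.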

I guess this is a bug.

Workaround:

Deactivate your topology by making isReady() return false, wait until the amount of data being processed drops to zero, and then kill your topology.

Alternatively, set:

conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

Hope this helps.

A Wild Supposition: Can MySQL Be Kafka?

This is an idea I presented at Percona Live 2015.

Is MySQL an avatar of Apache Kafka?

Can it be Kafka?

Yes, it can.

This talk takes a shot at modeling MySQL as Kafka.

PS: it’s unconventional, hence a WILD supposition :)

Slides:

http://www.slideshare.net/jaihind213/can-mysql-bekafka


MySQL Cluster – Java Connector / Bindings

While working with MySQL Cluster, I was looking for a monitoring framework for the cluster.

I came across a library at https://launchpad.net/ndb-bindings that has Java and other connectors to NDB; the library is a wrapper around the existing C++ NDB API.

This library allowed me to connect to the management node, get the state of the cluster, and get real-time notifications about heartbeat misses and node disconnections.

The library errored out under some conditions; with a small fix, it works with MySQL Cluster 7.3:

https://github.com/jaihind213/mysql-cluster-ndb-bindings

I have listed the steps for compiling it and running a sample program on GitHub.

NDB – Forced node shutdown completed. Caused by error 2341: ‘Internal program error (failed ndbrequire)’

Recently, while working on MySQL Cluster 7.3.5, we came across this error:

Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The management nodes would start up, but the data nodes would shut down during startup: sometimes right after completing phase 1, sometimes at phase 2 and sometimes at phase 5.

The trace file showed the following error:

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: CREATE_TABLE_REF
Error object: NDBCNTR (Line: 2493) 0x00000003
Program: ndbd

How did we resolve it?

First, a few of our data nodes had an incorrect connect-string.

Second, we had pre-assigned NodeGroup values to the data nodes in config.ini, numbering the node groups from 1:

[ndbd]
NodeGroup=1
HostName=10.95.139.92

[ndbd]
NodeGroup=2
HostName=10.95.139.92

Apparently, NDB expects node group numbering to start from 0, so we changed the configuration to:

[ndbd]
NodeGroup=0
HostName=10.95.139.92

[ndbd]
NodeGroup=1
HostName=10.95.139.92

And voilà, our cluster started up.

Thanks to Frank for providing the workaround (here).
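Since the problem came down to the NodeGroup numbering, a quick pre-flight check of config.ini could catch it before starting the data nodes. This is a hypothetical helper, not an NDB tool: NodeGroupCheck is a made-up name, it only reads NodeGroup lines inside [ndbd] sections, and the rule it enforces (the lowest node group id is 0) is an assumption taken from the fix above rather than from the NDB documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for NodeGroup numbering in an NDB config.ini. */
public class NodeGroupCheck {

    /** Returns the NodeGroup values declared in [ndbd] sections, in file order. */
    static List<Integer> nodeGroups(String configIni) {
        List<Integer> groups = new ArrayList<>();
        String section = "";
        for (String raw : configIni.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1).trim().toLowerCase();
            } else if (section.equals("ndbd") && line.contains("=")) {
                String[] kv = line.split("=", 2);
                if (kv[0].trim().equalsIgnoreCase("NodeGroup")) {
                    groups.add(Integer.parseInt(kv[1].trim()));
                }
            }
        }
        return groups;
    }

    /** Assumed rule from our fix: node group numbering should start at 0. */
    static boolean startsAtZero(List<Integer> groups) {
        return groups.isEmpty()
                || groups.stream().mapToInt(Integer::intValue).min().getAsInt() == 0;
    }

    public static void main(String[] args) {
        String before = "[ndbd]\nNodeGroup=1\nHostName=10.95.139.92\n\n"
                + "[ndbd]\nNodeGroup=2\nHostName=10.95.139.92\n";
        String after = "[ndbd]\nNodeGroup=0\nHostName=10.95.139.92\n\n"
                + "[ndbd]\nNodeGroup=1\nHostName=10.95.139.92\n";
        System.out.println(nodeGroups(before) + " starts at 0: " + startsAtZero(nodeGroups(before))); // [1, 2] starts at 0: false
        System.out.println(nodeGroups(after) + " starts at 0: " + startsAtZero(nodeGroups(after)));   // [0, 1] starts at 0: true
    }
}
```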

Mobile Hack Day

It's been a long time since I had a post on M*A*S*H, and it's nice to blog again.

Yesterday, on August 7, we had our Hack Day, and the theme was MOBILE. I have participated in all Flipkart Hack Days, and I must tell you, they are simply awesome.

Last time, the team (Chetna, Anirudh, Abhijat and I) set out to make Chota Minority Report (link here), and we had an awesome time. We won the Laziest but Effective Hack Award.

This time we set out to make something different. Since the theme was MOBILE, we intended to do just that :)

How about making the MOBILE(noun) truly MOBILE (verb) !!!

Chetna and Anirudh worked on the Android app, and Abhijat and I worked on the hardware.

We came up with 2 hacks.

The first is Bhaag Mobile Bhaag – the mobile becomes mobile.

THE HACK MOBILE

As per the theme, we made the mobile truly mobile :)

We made a mobile app (which we can control remotely) that flashes a light.

Once the light hits the sensor, the vehicle (which we programmed) starts moving. Once the light stops, the vehicle stops.

A video of the mobility is below :)

The second is called “App Suicide”.

A simple idea with the motto “Customer First”: the app cares for the customer and is willing to save battery for the customer.

All in all, it was a great Hack Day and we had fun.

Team: Chetna / Anirudh / Abhijat / Vishnu

PS: We work on the data platform, on project Bigfoot at Flipkart.