Druid Performance Tips

Some tips for Druid performance.

We ran some benchmarks in our environment, and these tips noticeably improved our query execution times.

https://www.slideshare.net/jaihind213/druid-beginner-performance-tips

 

 

 

 


No data points found: using Prometheus in Docker?

Hi there,

Recently I started pushing metrics to Prometheus, but whenever I sent a sample query to Prometheus to see the data points, I would get none.

There were no errors in Prometheus, and I was scratching my head.

I was on macOS and using a Docker image of Prometheus (version=1.7.1).

When I went into the Docker container and checked the date and time, voilà!

There was a time mismatch between the host and the Docker container.


The container clock was 7 days behind the host: possibly a Docker bug!

Since I was searching in the current time bucket, and the data points were going into the time bucket from 7 days ago, I was not able to find them.

FIX:

Restarting Docker for Mac fixed the issue.
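
To confirm this kind of drift before (or after) restarting, you can compare the host clock with the container clock. Below is a minimal Python sketch, not something from my original setup: the container name you pass in, the 5-second tolerance, and the helper names are all assumptions for illustration. It relies only on `docker exec <container> date +%s`, which prints the container's clock as Unix epoch seconds.

```python
import subprocess
import time


def clock_skew_seconds(host_epoch: float, container_epoch: float) -> float:
    """Return container time minus host time, in seconds.

    A negative value means the container clock is behind the host.
    """
    return container_epoch - host_epoch


def is_skewed(skew_seconds: float, tolerance_seconds: float = 5.0) -> bool:
    """True when the absolute skew exceeds the tolerance (5s is an arbitrary choice)."""
    return abs(skew_seconds) > tolerance_seconds


def read_container_epoch(container_name: str) -> float:
    """Ask the container for its current time as Unix epoch seconds.

    Requires the docker CLI on the host and a `date` binary in the container.
    """
    result = subprocess.run(
        ["docker", "exec", container_name, "date", "+%s"],
        check=True,
        capture_output=True,
        text=True,
    )
    return float(result.stdout.strip())


def describe_skew(container_name: str) -> str:
    """Build a one-line report comparing host and container clocks."""
    skew = clock_skew_seconds(time.time(), read_container_epoch(container_name))
    status = "MISMATCH" if is_skewed(skew) else "ok"
    return f"{container_name}: skew={skew:.0f}s ({skew / 86400:.2f} days) {status}"
```

Calling `describe_skew("prometheus")` (with whatever your container is actually named) returns a single line you can print. A skew of roughly -604800 seconds would correspond to the 7-day gap described above.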

 

Druid Segment Diskspace Calculator

Recently I have been working with Druid and trying to estimate the disk space needed on the historical nodes. We have to deploy to remote customer locations, so we need to put in machine requests well in advance.

This took me into the world of bitmaps: Concise and Roaring.

Druid uses Concise bitmaps by default and has the option of Roaring too.

References:

So after reading a bit, I decided to come up with a calculator for the segment disk space needed on the Druid historical nodes, assuming Concise is used.

You can find the calculator at github.com/jaihind213/druid-calculator

PS: Initially I wanted to write a blog post explaining the logic behind my calculator, but I was too lazy, so I wrote the code and included the rationale in the code comments. 🙂

Thank you, and I hope you find it useful. Feedback is welcome.

 

 

 

 

My Attempt to Demystify Data Stores

Recently, I gave a presentation at the Singapore PyData Meetup on “Demystifying Datastores”.

The main motivation of the presentation is to help one understand the various facets of a datastore and how these facets can help you decide which one to use.

Here is the link