Well, here's a thing. You are reading the ramblings of someone who has kept away from the realms of FB, Twitter, and blogs in general. The dog has been prolific, but I have kept away for no good reason, except that I didn't see the point.
Finally I get it. I really get it. I came across a product stack that would really, really help. I tried installing it, but found the docs missing in exactly the areas that were really, really necessary. They just didn't result in what I needed: a working product.
So, reader, let me explain the scenario (and for those who just need instructions on how to install the ELK stack, just scroll on down. I won't be offended. I am first and foremost a developer who can't RTFM):
I have a client (nameless for the time being, for business reasons). It is a startup. I have built a largish database setup, as it uses data... lots of it. Strange and magical data. It gets queried in strange ways, and it will only get stranger and bigger. They were using MongoDB, but it had been set up by someone who didn't see the beauty of this product. So it is now set up in a fully extensible way, on a number of machines, as 3 shards with 3 replicas per shard. That's actually on 7 machines, but we can easily expand out to thousands.
The issue? Once it was up and working, we had a problem. MongoDB comes with monitoring software (MMS). It's cool, it works... but it misses the salient points: how the databases are split, what the queries are, the time it takes to execute them... and, importantly, how the heck do I visualise this?
I had been using Splunk for a couple of years. It's good... it gives you data, not necessarily live kickass data, but it does what it says on the tin... and costs a few thousand dollars/sterling/euros (eeek... euros? what a dumb name that is, but I digress) per year to run... per server. Not good.
I trawled the web and looked at various options, even juggling around with some code I wrote for another client in the MySQL world... but nothing there would work in the short term (did I mention it is a startup with the need for speed?).
Fig 1: Queries by Shard
Fig 2: Query throughput by server
Fig 3: Log Enquiry
Briefly then:
1. Apologies to those who follow the Seattle boys... I am a Linux, open source kinda guy. For those who like their fruit cidery, this will probably work with a few tweaks, as Steve followed the right type of OS.
2. Our setup, as I mentioned, involves 7 machines running MongoDB, plus 3 running Elasticsearch, spread over 2 data centres. We also have multiple points of entry to both systems, so we have failover in case of a whole data centre being taken out.
3. Point (2) does not preclude a single-machine setup; it is just relevant for those trying to understand some of the manoeuvres here.
On with the show:
You need to plan a bit first:
1. Which machine(s) are you going to put ES on? Even with a single-node ES unit you should have a separate 'sentinel' (no data) access unit. You will need master/slave shards at some point, so plan for it now. You do all your connections through the sentinel, and it worries about getting the data for you. Make life easy for yourself. (If you are already a Mongo user, it's the same idea as a mongos unit.)
2. Which machine are you going to put Kibana (the graph end) on?
3. You will need Java. ES uses Lucene which uses Java. You will need Logstash, which uses Java. Essentially... you need Java. None of the programming/script building you do is Java, just the tools use it.
Elasticsearch (ES):
1. Install Java. You may have to replace the version number.
sh> sudo apt-get install openjdk-7-jre
2. Repeat on all systems using ES, Logstash, and probably Kibana for good measure.
3. Check http://www.elasticsearch.org/download for the latest version of ES (1.2.1 at the time of writing).
4. sh> wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-[versionnumber].deb
5. sh> sudo dpkg -i elasticsearch-[versionnumber].deb
6. sh> sudo service elasticsearch start
That gets you a node running. If you are building a cluster, you will need to change some bits in the /etc/elasticsearch/elasticsearch.yml file. If you are not, you will still need to change some items. The basic list is below; there are plenty of well-documented sections on this elsewhere, so I won't bore you.
The essential elements are here:
cluster.name: thisCluster
node.name: thisNode
node.data: true    # false if this is the sentinel
node.master: true    # always true, so it can be voted master if the current master dies
index.number_of_shards: 5    # the default
index.number_of_replicas: 2    # or however many masters+slaves you have
# plus the paths for data, configs and logs
network.bind_host: thisIP
network.publish_host: thisIP    # the network address other nodes will use to talk to this one
http.port: 9200    # default
transport.tcp.port: 9300    # default
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false    # forces unicast
7. Be careful. If installing a cluster, it won't work if one node is on a different version of ES.
8. If you need an answer as regards replicas the ES way... can I suggest looking here
9. sh> sudo service elasticsearch restart (to pick up your config file changes)
10. Make sure you have ES set up on each node you want to run it, and also any sentinel machines.
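A quick aside on discovery.zen.minimum_master_nodes. A value of 1 is fine for a single node, but once several nodes are master-eligible, the usual rule of thumb is a majority, (n / 2) + 1, so that a network split cannot end up electing two masters. Here is a tiny Python sketch of that arithmetic. The function name is mine, not anything from ES, and you should count every node that has node.master: true, sentinels included.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum, floor(n / 2) + 1, for n master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Print the suggested setting for a few cluster sizes.
    for n in (1, 2, 3, 5):
        print(n, "master-eligible node(s) ->", minimum_master_nodes(n))
```

So if three nodes in your cluster are master-eligible, the setting would be 2 rather than 1.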
Logstash - The bit that reads the Mongo logs and dumps to ES
1. Install Java... you have installed Java, haven't you?
2. Make sure you have a java tmp folder, that is fully writable/readable
sh>mkdir /home/[user]/javatmp
3. Set the following environment variables (export them, so that child processes see them):
sh> export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64/"
sh> source /etc/environment
sh> export _JAVA_OPTIONS=-Djava.io.tmpdir=/home/[user]/javatmp
4. Set them in the profile options (Debian/Ubuntu; others, roll as per your version):
sh> sudo nano /etc/profile.d/logstash.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export _JAVA_OPTIONS=-Djava.io.tmpdir=/home/[user]/javatmp
sh> sudo chmod 0755 /etc/profile.d/logstash.sh
5. From your home directory:
sh> sudo wget https://download.elasticsearch.org/logstash/logstash/logstash-1.4.1.tar.gz
6. sh> tar -xvf logstash-1.4.1.tar.gz
7. Edit/Create mongoqry.conf - this bit is important as it tells ES/Lucene how to handle your data, which in turn will help with Kibana using it correctly.
input {
  file {
    discover_interval => 10
    add_field => { host => "your_host" }
    add_field => { shard => "your_shard" }
    path => ["/var/log/mongodb/your_mongo_log.log"]
    start_position => "beginning"
    tags => ["your_host", "mongo", "other_tags"]
    # The type helps in Kibana to identify these entries
    type => "your_host_log_type"
  }
}
filter {
  grok { pattern => ["(?m)%{GREEDYDATA} \[conn%{NUMBER:mongoConnection}\] %{WORD:mongoCommand} %{NOTSPACE:mongoDatabase} %{WORD}: \{ %{GREEDYDATA:mongoStatement} \} %{GREEDYDATA} %{NUMBER:mongoElapsedTime:int}ms"] }
  grok { pattern => [" cursorid:%{NUMBER:mongoCursorId}"] }
  grok { pattern => [" ntoreturn:%{NUMBER:mongoNumberToReturn:int}"] }
  grok { pattern => [" ntoskip:%{NUMBER:mongoNumberToSkip:int}"] }
  grok { pattern => [" nscanned:%{NUMBER:mongoNumberScanned:int}"] }
  grok { pattern => [" scanAndOrder:%{NUMBER:mongoScanAndOrder:int}"] }
  grok { pattern => [" idhack:%{NUMBER:mongoIdHack:int}"] }
  grok { pattern => [" nmoved:%{NUMBER:mongoNumberMoved:int}"] }
  grok { pattern => [" nupdated:%{NUMBER:mongoNumberUpdated:int}"] }
  grok { pattern => [" keyUpdates:%{NUMBER:mongoKeyUpdates:int}"] }
  grok { pattern => [" numYields: %{NUMBER:mongoNumYields:int}"] }
  grok { pattern => [" locks\(micros\) r:%{NUMBER:mongoReadLocks:int}"] }
  grok { pattern => [" locks\(micros\) w:%{NUMBER:mongoWriteLocks:int}"] }
  grok { pattern => [" nreturned:%{NUMBER:mongoNumberReturned:int}"] }
  grok { pattern => [" reslen:%{NUMBER:mongoResultLength:int}"] }
}
output {
  elasticsearch {
    index => "logstash"
    host => "your_es_sentinel"
    protocol => "http"
    # bind_port only applies to the node protocol; for http, set port
    port => 9200
    manage_template => true
  }
  stdout { codec => json }
}
This will output to ES as well as to screen. At this point I would like to give thanks to the people at techblog.holidaycheck.com for the filter command, which I unashamedly lifted from their git repo. Thanks guys, saved a ton of work.
8. Repeat on all other Mongo units, changing the inputs to the server of your choice, but keep the outputs.
9. Logstash even has the decency to stay with a rolled file. So if you swap your main log file every hour or so, it will stay with the original filename, acting similarly to 'tail -f', so you miss nothing... good old Logstash.
10. Finally, start her up from the logstash-1.4.1 directory. You may want to do this inside a 'screen' so you can leave it running:
sh> sudo bin/logstash -f mongoqry.conf
or, to run it in the background:
sh> sudo bin/logstash -f mongoqry.conf &
If you get this: (LoadError) Could not load FFI Provider: (NotImplementedError) FFI not available: null
then the javatmp directory is not executable... make sure it is.
It can also show up if the timestamp filter is out of date... maybe.
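If the grok filter in mongoqry.conf looks like line noise, here is roughly what it is doing, as a Python sketch. This is an approximation with plain regular expressions, not how Logstash implements grok, and it covers only four of the counters. The sample log line is made up for illustration, in the general shape of a MongoDB query log entry; check it against your own logs.

```python
import re

# Approximation of the first grok pattern: connection id, command,
# namespace, the statement between "{ " and " }", and the trailing elapsed ms.
MAIN = re.compile(
    r".* \[conn(?P<mongoConnection>\d+)\] "
    r"(?P<mongoCommand>\w+) (?P<mongoDatabase>\S+) \w+: "
    r"\{ (?P<mongoStatement>.*) \} .* (?P<mongoElapsedTime>\d+)ms"
)

# A few of the optional counters, each pulled out independently,
# just as the config uses one small grok per counter.
COUNTERS = {
    "mongoNumberToReturn": re.compile(r" ntoreturn:(\d+)"),
    "mongoNumberScanned": re.compile(r" nscanned:(\d+)"),
    "mongoNumberReturned": re.compile(r" nreturned:(\d+)"),
    "mongoResultLength": re.compile(r" reslen:(\d+)"),
}


def parse_mongo_line(line):
    """Return a dict of extracted fields, or None if the line is not a query entry."""
    match = MAIN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["mongoElapsedTime"] = int(fields["mongoElapsedTime"])
    for name, pattern in COUNTERS.items():
        hit = pattern.search(line)
        if hit:
            fields[name] = int(hit.group(1))
    return fields


# Hypothetical sample line, for illustration only.
SAMPLE = (
    'Thu Jun 12 10:15:30.123 [conn42] query shop.orders query: { status: "open" } '
    "ntoreturn:10 ntoskip:0 nscanned:250 keyUpdates:0 locks(micros) r:1500 "
    "nreturned:10 reslen:2048 12ms"
)

if __name__ == "__main__":
    print(parse_mongo_line(SAMPLE))
```

The numeric fields are converted to integers, mirroring the :int suffix in the grok patterns, which is what lets Kibana do arithmetic on them later.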
Right... still with me? Well done. Finally... yes indeedy... finally you need Kibana and a web server, if you don't already have one on the machine Kibana will be installed on.
Kibana - The Logging output and pretty stuff
1. On your chosen machine for Kibana, install a web server (nginx/Apache... whatever).
2. Ensure your web server can listen on the port you want to connect to (443 for https is advisable).
3. Ensure the server's document root is pointing to where Kibana will be.
4. Install Kibana
sh> wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.0.tar.gz
sh> tar -xvf kibana-3.1.0.tar.gz
5. In the Kibana directory there will be a file called config.js. In this the following will need changing:
elasticsearch: Make sure the port number is :9200. If your sentinel is on the same machine, you will be OK; otherwise you will need to change the destination of ES as well.
kibana_index: A name for your Kibana index. Your Kibana dashboards are also stored in ES... which is kinda useful.
6. Navigate to your host...and you should have the Kibana welcome screen.
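If the welcome screen loads but the panels stay empty, it is worth checking that ES is reachable at the address you put in config.js. Kibana 3 runs in the browser, so it is the browser's machine that has to be able to reach that address. Below is a minimal Python sketch that asks the ES _cluster/health endpoint and decides whether the answer looks usable. The function names and the expected_nodes parameter are my own inventions, and the URL in the usage example is a placeholder.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url, timeout=5):
    """GET <base_url>/_cluster/health and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_usable(health, expected_nodes=1):
    """Green or yellow status, with at least the expected number of nodes joined."""
    status_ok = health.get("status") in ("green", "yellow")
    nodes_ok = health.get("number_of_nodes", 0) >= expected_nodes
    return status_ok and nodes_ok


if __name__ == "__main__":
    # Placeholder address: replace with your own sentinel.
    health = fetch_health("http://your_es_sentinel:9200")
    print(health.get("status"), "usable:", is_usable(health))
```

A yellow status is treated as usable here because it means the primary shards are allocated and only some replicas are missing; red means some data is unavailable.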
Done! From here you can build panels, graphs, tables, get your log files in... all kinds of cool stuff... but you should be up and running at this point..... let me know if you aren't.
A couple of notes:
1. MongoDB produces a lot of logs. Putting a TTL (time to live) of 24 hours on the records allows ES to drop old records automatically. Make this a sane number for your throughput. In other areas I have either expanded that to 2-3 days, or reduced it to 6 hours and added aggregation.
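For the TTL note above, here is a hedged Python sketch of one way to switch it on in the ES 1.x series used here, via the _ttl field in the type mapping. This is my assumption about how to apply it, not necessarily how the setup described in this post did it, and the _ttl field was dropped in later ES versions, so check the docs for yours. The function names are mine, and the index and type names are the placeholders from mongoqry.conf.

```python
import json
from urllib.request import Request, urlopen


def ttl_mapping(doc_type, default_ttl="24h"):
    """Build a mapping body that enables _ttl with a default lifetime."""
    return {doc_type: {"_ttl": {"enabled": True, "default": default_ttl}}}


def put_ttl_mapping(base_url, index, doc_type, default_ttl="24h"):
    """PUT the mapping to <base_url>/<index>/_mapping/<doc_type>."""
    body = json.dumps(ttl_mapping(doc_type, default_ttl)).encode("utf-8")
    url = "{}/{}/_mapping/{}".format(base_url.rstrip("/"), index, doc_type)
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the body; call put_ttl_mapping(...) against your own sentinel.
    print(json.dumps(ttl_mapping("your_host_log_type", "24h"), indent=2))
```

Changing the default to "6h" or "3d" gives the shorter and longer lifetimes mentioned in the note.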
I am sure there are others... please feel free to add at will. Hope this has been useful; enjoy the insights it will give you into your data.