Tuesday, December 9, 2014

Semi-irregular Sysadmin Ninja's Github Digest (Vol. 17)

1. reptyr
Reparent a running program to a new terminal

Quite an old tool made by @nelhage; it seems he is actively developing it again. It really does change the terminal of a running process: "'reptyr PID' will grab the process with id PID and attach it to your current terminal. After attaching, the process will take input from and write output to the new terminal, including ^C and ^Z."
It is also quite interesting to know how it works - check this blog post if you are curious.

2. dockerana
Docker + Graphite + Graphana = Dockerana
It's exactly what it looks like: Graphite + Grafana packed in a Docker container. Quite convenient.

3. seagull
Friendly Web UI to monitor docker daemon
Seagull describes itself as the best friend of Docker: it provides a web UI to monitor the Docker daemon. The demo site was down at first, but the screenshots look nice. It seems that the demo is working now.

4. Algorithms
Data Structures and Algorithms in Python
Not very exciting stuff, but might be useful. Just as it says, it is a collection of data structures and algorithms in Python.

5. pg_shard
PostgreSQL extension to scale out real-time reads and writes http://citusdata.com/docs/pg-shard
Sharding helper extension for PostgreSQL. Nuff said, check docs.

6. peru
Maybe sometimes better than copy-paste.
Ah, nice tool. Another approach to the eternal problem of dependencies in your repos. Like "git submodules" but easier. Works with Mercurial and SVN too, not only with git.

7. awesome-public-datasets
A awesome list of (large-scale) public datasets on the Internet. (On-going collection)
A list of many public (but sometimes not free) datasets on the Internet, for your fun and big data projects.

8. rocket
App Container runtime
CoreOS is creating its own container runtime instead of using Docker. Quite a controversial decision; check their blog for the explanation.

9. instavpn
the most user-friendly L2TP/IPsec VPN server
A very user-friendly, simple but secure VPN. You need Ubuntu and 512 MB of RAM: run curl -sS https://sockeye.cc/instavpn.sh | sudo bash, then browse to http://IP-ADDRESS:8080 or use the CLI to set it up.

10. shapeme
Evolve images using simulated annealing https://github.com/antirez/shapeme
A small toy from @antirez - it takes a PNG and tries to evolve a bunch of triangles to copy it. Just for fun.

Monday, December 8, 2014

Semi-irregular Sysadmin Ninja's Github Digest (Vol. 16)

Ok, I'm still trying to finish with my old drafts and return to normal, weekly issues. Let's go!

1. devopsbookmarks.com
Website of devopsbookmarks.com http://www.devopsbookmarks.com

Cool new website which tries to collect all modern DevOps tools in one place (open-source and commercial too). And what is most exciting - everyone can participate through Github. :)

2. using-ngxlua-in-upyun
2014 Beijing OSC
It's not a standalone project either, just the code repo for this presentation from a Chinese conference. If you're interested in Nginx + Lua / OpenResty, check it out; it's quite a good intro to the subject. Don't be afraid, it's in English.

3. sshrc
bring your .bashrc, .vimrc, etc. with you when you ssh
If you do remote admin tasks on "not your" servers from time to time, you are usually quite frustrated that the working environment there is nothing like your perfectly crafted, precious configs. You can fix that problem with this script, but beware of big Vim plugins - they need to be transferred to your home dir on the remote host during every login.

4. tmux-resurrect
Persists tmux environment across system restarts.
Does exactly what was promised - "saves all the little details from your tmux environment so it can be completely restored after a system restart (or when you feel like it). No configuration is required. You should feel like you never quit tmux."

5. Openstackgeek
StackGeek OpenStack Deploy
"StackGeek provides these scripts and this guide to enable you to get a working installation of OpenStack Icehouse going in about 10 minutes."
Nuff said.

6. weave
The Docker Network
A very interesting project, a missing part of Docker, really. Networking is still the weakest part of Docker IMO, and this project will help you create virtual networks for your containers.

7. ZeroTierOne
Create flat virtual Ethernet networks of almost unlimited size. https://www.zerotier.com/
This project is similar to the previous one, but its main target is "normal" VMs and clouds rather than containers. Looks quite mature and feature-rich.

8. msr-cloud-tools
MSR Cloud Tools
Yet another set of tools from Brendan Gregg. This time you can check whether your cloud "hardware" supports TurboBoost, or read the CPU temperature directly from the CPU's MSRs (Model Specific Registers).

9. pcstat
Page Cache stat: get page cache stats for files on Linux
Yes, this tool can show, for a given file, how many of its memory pages are in the Linux file cache. Nice to know for tuning DBs, e.g. Cassandra (that's what it was written for). It's not exactly a new idea, you can use the fadvise tool from https://code.google.com/p/linux-ftools/ too - but the Go code looks prettier IMO.
Also, TIL about the mincore(2) syscall, on which both tools are based.

10. lsleases
list assigned ip from any device in your network

Simple DHCP sniffer - will list all IP/MACs from devices in your network. Could be useful.

11. inspeqtor
Monitor your application infrastructure!

"Famous" inspeqtor tool - a modern rewrite of Monit in Go with an extended syntax and a commercially available extension (because of which it was first DMCAed by the Monit developers, but they dismissed their claim afterwards).

12. puppet-catalog-diff
Tool to diff Puppet catalogs

"A tool to compare two Puppet catalogs. While upgrading versions of Puppet or refactoring Puppet code you want to ensure that no unexpected changes will be made prior to doing the upgrade."
A very useful tool for upgrading Puppet between versions, indeed.

13. logsend
Logsend is high-performance tool for processing logs

"This like Logstash but more tiny and written by Golang. Supported outputs: influxdb, statsd and MySQL". If you need some tool for log processing, but logstash looks somewhat bloated - check it out.

Sunday, December 7, 2014

A presentation on building a replacement for Graphite with Riemann, InfluxDB and Grafana

Quite controversial presentation IMO:

Yep, Clojure is cool, the JVM has threads, that's nice, InfluxDB rules - but why do we need Riemann then?
We can use cyanite - then we get Clojure, the JVM and Cassandra for storage - or graphite-influxdb - then we get InfluxDB, and who cares about threads and the Python GIL then?
Maybe Logstash / Heka integration is a cool idea, but you can do that with Graphite too...

Graphite scaling and my evaluation of Zipper Graphite stack

Many Dev/Ops teams out there are using Graphite - a nice tool for collecting and graphing various metrics from your software and/or hardware. It is a really nice tool and a good example of good architecture - you can check out the Graphite chapter from the famous AOSA book.
But it's also no big secret that, although Graphite is a great tool, scaling it is really not an easy task. As long as you run it on a single server, everything is fine; you can easily spread it over a couple of servers, but above that…
·      Problem one. If you are using whisper storage (the default and the only production-ready one), then the only clustering option is the normal Graphite cluster mechanism. But then, after adding or removing nodes from the cluster, you need to rebalance it, using e.g. the carbonate tool. That's fine, but for a loaded cluster with hundreds of thousands of metrics and tens of servers it can take not hours but days or weeks, and during rebalancing the Graphite cluster will produce very funny results.
·      Problem two. Current Graphite clustering is based on HTTP requests (remote nodes ask each other using the same graphite-web engine) and the current code is quite non-optimal, especially for aggregating functions across many nodes. Fixing that is in progress: we already have @bmhatfield's patch in the 0.9.x branch (plus a parallelization improvement that is not merged yet), and another approach from @jraby, which was turned into a patched graphite-web by @datacratic.
·      Problem three. Relay / aggregator performance. Graphite relays and aggregators are CPU-bound (contrary to carbon-caches, which are IO-bound), and because of Python's GIL a single process can't use more than one core for CPU-intensive calculations. So you need to create load-balancer-based configurations with many relay processes, which doesn't make things less complex, believe me.
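To make problem one a bit more tangible, here is a minimal sketch of a consistent hash ring. It is a generic illustration, not Graphite's actual implementation: the MD5-based `ring_position`, the replica count and the node and metric names are all my assumptions. It shows that adding one node changes the owner of a share of the metrics, and every such metric means a whisper file that has to be moved during rebalancing.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


class HashRing:
    """Generic consistent hash ring; node names here are hypothetical."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.points = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self.replicas):
            point = (ring_position("%s:%d" % (node, i)), node)
            bisect.insort(self.points, point)

    def get_node(self, metric):
        # The owner is the first point at or after the metric's position,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_left(self.points, (ring_position(metric), ""))
        return self.points[idx % len(self.points)][1]


metrics = ["servers.host%04d.cpu.user" % i for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {m: ring.get_node(m) for m in metrics}

ring.add_node("node-e")
after = {m: ring.get_node(m) for m in metrics}

moved = [m for m in metrics if before[m] != after[m]]
print("%d of %d metrics changed owner" % (len(moved), len(metrics)))
```

With a ring like this only the metrics that now belong to the new node move, but on a cluster with hundreds of thousands of metrics even that share is a lot of files to copy.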

So, if you are facing Graphite scaling problems, you have the following options:

1. Migrate to OpenTSDB.
Unfortunately, it will require quite a big effort if you have had your Graphite installation up and running for a while and have many tools and dashboards around it. Also, scaling HBase is a little bit harder than throwing more servers into the pool...

2. Migrate to new storage engines.
I'll explain this path a little bit. Which alternative options do we have for now?
a)    @pyr's cyanite
Quite mature, has some production instances running. Using Cassandra as storage looks like a natural choice for Graphite data because of its high write-load tolerance.
I evaluated it and found out that for our metrics it would require 4x more space for data storage. For us that was a "no-go"; I also suspected that its scalability was not so great at the time (that was a year ago, now it's much better).
b)    InfluxDB (with graphite-influxdb)
InfluxDB also looks like quite a natural choice for storage - it's a native time series database with built-in sharding and clustering. Dieter Plaetinck wrote graphite-influxdb for Vimeo and runs it there in production.
In my tests InfluxDB looked much better as storage: it took almost exactly the same space as whisper, but only for one month of data. That's because InfluxDB still has no retentions with aggregation (current InfluxDB retentions just purge old data without any aggregation). So, if you need to run queries across a big timespan (e.g. a year), you either store all data with high precision (and waste space) or do some aggregation on your own, which of course hits performance quite badly. Theoretically, you can use continuous queries for aggregation at the InfluxDB level, but support for them is not integrated into graphite-influxdb.
And according to @dieterbe he's running graphite-influxdb on a single node for now, so I also suspect that scaling it across many nodes could be quite an adventurous journey too.
c)    ceres. Looks abandoned for now. I know that @dkulikovskiy made some changes in his own repo, including a new roll-up mechanism, and Yandex is running that at quite a big scale - but anyway, it doesn't look like the right path to go.
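The "retention with aggregation" that I miss in InfluxDB can be sketched like this. It is a simplified illustration in the spirit of whisper roll-ups, not whisper's real code: the `rollup` function, its parameters and the sample data are my assumptions. Fine-grained points are averaged into coarser buckets, and a bucket is kept only if enough of its points are known (similar to whisper's xFilesFactor).

```python
def rollup(points, fine_step, coarse_step, x_files_factor=0.5):
    """Average (timestamp, value) points into buckets of coarse_step seconds.

    A bucket is emitted only if at least x_files_factor of the points
    it should contain are present and not None.
    """
    buckets = {}
    for ts, value in points:
        if value is None:
            continue
        bucket_start = ts - (ts % coarse_step)
        buckets.setdefault(bucket_start, []).append(value)

    expected = coarse_step // fine_step  # points per full bucket
    result = []
    for bucket_start in sorted(buckets):
        values = buckets[bucket_start]
        if len(values) / expected >= x_files_factor:
            result.append((bucket_start, sum(values) / len(values)))
    return result


# One full hour of 1-minute points, then only 10 minutes of the next hour.
first_hour = [(minute * 60, float(minute)) for minute in range(60)]
second_hour = [(3600 + minute * 60, 1.0) for minute in range(10)]

hourly = rollup(first_hour + second_hour, fine_step=60, coarse_step=3600)
print(hourly)  # [(0, 29.5)]
```

One hourly point replaces 60 per-minute points here, which is why a year of data stays small with roll-ups, and why storing everything at full precision wastes space.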

So, we are still sticking with whisper, which means no solution for problem one for us yet. But if you are only starting to run your Graphite, maybe it's a good idea to run some new storage in parallel, especially if you already have Cassandra or InfluxDB in production.
What's next? What about the other problems?

We also ran into relay scaling problems quite fast, and after struggling with that a little bit we adopted @markchadwick's Scala-based graphite-relay - I just made it work correctly with Graphite hashing. We lived with that for about a year, but after a while it started to consume too much CPU too.
At that time my boss @vlazarenko pointed me to this video from Linux.conf.au 2014 - it's only 20 minutes and worth watching. I found out from it that Booking.com uses Graphite, and uses it under quite a big load. In this video, Devdas Bhagat also mentioned that after struggling with relay scalability they developed (and, what is much better, open-sourced) a new and shiny C-based Graphite relay named carbon-c-relay. (Edit: at first I mistakenly named it graphite-c-relay, d'oh... the real name is and always was carbon-c-relay.) We started using it right away, and from then until now it has worked very well.
Its main contributor, Fabian Groffen, also implemented aggregation and regexes over the last couple of months, so for now carbon-c-relay looks like a complete and pretty sane alternative to the Python-based relay/aggregation daemons. I can find only one downside: its output is still line-based and not pickle-based, but that's completely OK for us. Edit: as @grobian mentioned (and I completely agree with that), pickle is insecure and bloated - the line protocol is better for this case.
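For those who have not seen both carbon protocols side by side, here is a small sketch of how the same datapoint is encoded for the line receiver and for the pickle receiver (ports 2003 and 2004 by default). The metric name and the helper function names are made up for illustration; the formats follow the Graphite documentation as far as I know.

```python
import pickle
import struct


def encode_line(metrics):
    """Plaintext protocol: one "path value timestamp" line per datapoint."""
    lines = ["%s %s %d\n" % (path, value, ts) for path, value, ts in metrics]
    return "".join(lines).encode("ascii")


def encode_pickle(metrics):
    """Pickle protocol: a pickled list of (path, (timestamp, value)) tuples
    prefixed with a 4-byte big-endian length header."""
    tuples = [(path, (ts, value)) for path, value, ts in metrics]
    payload = pickle.dumps(tuples, protocol=2)
    return struct.pack("!L", len(payload)) + payload


def decode_pickle(message):
    """Reverse of encode_pickle.

    Unpickling data from the network can execute arbitrary code, which is
    the security problem with this protocol. Only do it with trusted peers.
    """
    (length,) = struct.unpack("!L", message[:4])
    return pickle.loads(message[4:4 + length])


# Hypothetical metric: (path, value, timestamp)
sample = [("servers.web01.load", 0.42, 1417968000)]

print(encode_line(sample))          # b'servers.web01.load 0.42 1417968000\n'
print(len(encode_pickle(sample)))   # size of the framed pickle message
print(decode_pickle(encode_pickle(sample)))
```

The line format is trivial to produce and parse from C, which fits a relay like carbon-c-relay well.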

So, the third problem is solved for now.
But I really wondered how Booking.com copes with the second scaling problem, and after following @grobian on GitHub I found out how. It seems that they rewrote most parts of the Graphite stack in Go! And the results are quite impressive: they're running about 90 backend servers with more than 55TB of whisper files!
And it looks like this project was implemented by only two people - Damian “@dgryski” Gryski and @grobian. I contacted Damian, asked a couple of questions and checked how their solution works. He also said that he and @grobian would write a blog post about their stack, but they still haven't - so I'll try to do so. :)

So, what does a normal Graphite cluster stack look like? I'll take part of Jamie's picture to illustrate:

Did you see those 3 graphite-web servers there? They communicate with each other (or a single frontend talks to all backends), so just imagine what happens if you have 10-20-30... backend servers instead of two - the speed of rendering will be very low then.
Then check out the Booking.com solution (they need some cool name for it; I will call it zipper-stack for brevity). Please also bear with my non-existent painting skills:

See? Graphite-web talks to a single daemon named carbonzipper. It talks to all backends, not over the “pickle-over-HTTP” protocol but over a new “protobuf-over-HTTP” protocol, so on the backends we have a separate daemon that speaks it, named carbonserver. They also have a special carbonapi daemon: it talks to the zipper daemon too, but it has a subset of Graphite functions re-implemented in Go, so it is blazing fast - you can switch all your monitoring metrics (those rendered as text and not PNG) to it.
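The idea of a single daemon that asks all backends at once and glues the answers together can be sketched as follows. This is only my illustration of the fan-out-and-merge pattern, not carbonzipper's code: `fetch` is a hypothetical callable standing in for the HTTP request, and the backend addresses and data are fake.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_series(responses):
    """Merge equally sized value lists: first non-None value wins per slot."""
    merged = []
    for slot in zip(*responses):
        merged.append(next((v for v in slot if v is not None), None))
    return merged


def query_backends(backends, fetch, target):
    """Ask every backend in parallel and merge what comes back.

    fetch(backend, target) is a hypothetical callable returning a list of
    values, or None if that backend does not have the metric.
    """
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        responses = list(pool.map(lambda b: fetch(b, target), backends))
    return merge_series([r for r in responses if r is not None])


# Fake backends instead of real HTTP calls, for illustration only.
fake_data = {
    "http://10.0.0.1:8080": [1.0, None, 3.0, None],
    "http://10.0.0.2:8080": [1.0, 2.0, None, None],
    "http://10.0.0.3:8080": None,  # this backend has no such metric
}

result = query_backends(
    list(fake_data), lambda backend, target: fake_data[backend], "servers.web01.load"
)
print(result)  # [1.0, 2.0, 3.0, None]
```

The frontend then waits for the slowest backend once per query, instead of graphite-web instances asking each other one after another.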
So, looks good - let's deploy it.
As we were doing only an evaluation, we did it quick-and-dirty - just compiled the binaries and ran them on a server - but for production you'll need some packaging and configuration, of course. Also, this is not a manual for Graphite installation: I assume that you already have a working Graphite cluster and just want to check how zipper-stack works.
The initial build is easy - go to your build VM, install and set up Go there and run
root@vagrant-ubuntu-precise-64:~# go build github.com/dgryski/carbonapi
root@vagrant-ubuntu-precise-64:~# go build github.com/dgryski/carbonzipper
root@vagrant-ubuntu-precise-64:~# go build github.com/grobian/carbonserver
root@vagrant-ubuntu-precise-64:~# ls -al ~/go/bin/carbon*
total 53552
drwxr-xr-x 2 root root    4096 Nov 28 16:32 .
drwxr-xr-x 5 root root    4096 Aug  8 13:20 ..
-rwxr-xr-x 1 root root 8796848 Oct 28 16:38 carbonapi
-rwxr-xr-x 1 root root 8526040 Oct 28 16:36 carbonserver
-rwxr-xr-x 1 root root 8515904 Oct 28 16:37 carbonzipper

Copy the carbonserver binary to the Graphite backends, and the carbonzipper and carbonapi binaries to your frontend. (If you do not have a separate frontend, just make one: set up a separate server with graphite-web and install the binaries there.)

Go to the backends first and run the servers there - do not forget to run them under the same user as the carbon-caches (e.g. using screen when testing):
~$ ./carbonserver -p=8080 -stdout=true -v=true -vv=true -w="/opt/graphite/storage/whisper"
2014/12/07 16:13:10 starting carbonserver (development build)
2014/12/07 16:13:10 reading whisper files from: /opt/graphite/storage/whisper
2014/12/07 16:13:10 set GOMAXPROCS=12
2014/12/07 16:13:10 listening on :8080

Next, run carbonzipper on the frontend. Create its config file first:
~$ cat zipper.json
    "Backends": [
I think you got the pattern. :)
Then run it in debug mode:
~$ ./carbonzipper -c="./zipper.json" -p=8080 -stdout -d=3
2014/12/07 15:58:44 starting carbonzipper (development version)
2014/12/07 15:58:44 setting GOMAXPROCS= 1
2014/12/07 15:58:44 querying servers= [http://10.x.y.z1:8080 http://10.x.y.z2:8080 http://10.x.y.z3:8080] uri= /metrics/find/?format=protobuf&query=%2A
2014/12/07 15:58:44 listening on :8080

Now you need to patch graphite-web a little bit: putting your own IP in CLUSTER_SERVERS is not allowed, and that's exactly what we need to do:

--- a/webapp/graphite/storage.py
+++ b/webapp/graphite/storage.py
@@ -31,7 +31,7 @@
   def __init__(self, directories=[], remote_hosts=[]):
     self.directories = directories
     self.remote_hosts = remote_hosts
-    self.remote_stores = [ RemoteStore(host) for host in remote_hosts if not is_local_interface(host) ]
+    self.remote_stores = [ RemoteStore(host) for host in remote_hosts ]

     if not (directories or remote_hosts):
       raise valueError("directories and remote_hosts cannot both be empty")

Then run your frontend with CLUSTER_SERVERS = [''] in local_settings.py.

Everything is prepared; you can test your frontend with normal graphs.
You can check how carbonapi works too, but you need to check which functions were re-implemented in Go in https://github.com/dgryski/carbonapi/blob/master/expr.go first:

~$ ./carbonapi -z="http://localhost:8080" -stdout=true -p=9090 -tz="Europe/Amsterdam,3600"
2014/12/07 16:01:27 starting carbonapi (development build)
2014/12/07 16:01:27 using zipper http://localhost:8080
2014/12/07 16:01:27 using fixed timezone Europe/Amsterdam, offset 3600
2014/12/07 16:01:27 listening on port 9090

That's mostly it.
But I want to mention one important thing. When I tested my graphs on zipper-stack with curl, the rendering speed was quite good. But when I looked at my last-hour graph generated by the zipper-stack instance with my own eyes, it looked like this:

Normal one looks like that:

Huh? Do you know what happened there? I know, unfortunately.
As I mentioned, Graphite is a really good piece of software and uses quite good engineering solutions to make things work under quite a big load in pure Python, not even using C. The best-known part of this trick is carbon-cache. When you put metrics into Graphite, they usually are not flushed to disk instantly but go to RAM first: the carbon-cache daemon keeps them in memory, flushes them to disk periodically, and answers queries from graphite-web, so that on-disk and in-memory results get merged.
As you can see on my diagram, there are no more lines from carbonzipper to the carbon-caches in zipper-stack. That's right - carbonserver just reads whisper files from disk and doesn't ask the carbon-caches! It seems that on Booking.com's instances the disk flushing time for every single metric is below 60 seconds, so for “1-minute retention” whisper files (which both they and we are using) every file is updated and fresh after each minute. But our Graphite installation is different - we are using SAN disks and plenty of RAM instead of SSDs, so for us it took up to 40 minutes to flush a metric to disk. That's why we have that gap in our “1-hour” graphs…
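Here is a tiny sketch of the merge that is missing in this case. It is my simplified illustration, not graphite-web's actual code: `merge_with_cache` and the sample numbers are assumptions. Values read from a whisper file have holes for everything that is not flushed yet, and the points still sitting in carbon-cache memory fill those holes.

```python
def merge_with_cache(start, step, disk_values, cached_points):
    """Overlay in-memory (timestamp, value) points onto values read from disk.

    disk_values[i] belongs to timestamp start + i * step; None means that
    nothing has been written to disk for that slot yet.
    """
    merged = list(disk_values)
    for ts, value in cached_points:
        index = (ts - start) // step
        if 0 <= index < len(merged):
            merged[index] = value
    return merged


start, step = 1417968000, 60

# What a disk-only reader sees: the last three minutes are not flushed yet.
disk_values = [1.0, 2.0, None, None, None]

# What is still waiting in memory.
cached_points = [(start + 120, 3.0), (start + 180, 4.0), (start + 240, 5.0)]

print(disk_values)                                              # the graph with a hole
print(merge_with_cache(start, step, disk_values, cached_points))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With a flush delay of up to 40 minutes, most of a "1-hour" graph lives only in the cached part, so a reader that looks only at the disk shows mostly holes.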
So, for us it doesn't look like a viable solution, alas! Or we need to implement a carbon-cache interface for carbonserver in Go. But at the moment when I tested zipper-stack, we were running an internal version of a hack which shortly afterwards became PR #1010, and it works quite well for now, so maybe I'll return to it later. :)
Edit: @grobian is writing a Go-based write daemon now - https://github.com/grobian/carbonwriter - so maybe it could be combined with carbonserver to make a solution similar to carbon-cache.
But YMMV, of course; just try zipper-stack - it looks very good and promising.

Monday, December 1, 2014

MySQL Replication: What’s New in MySQL 5.7 and Beyond

Very interesting presentation about current and upcoming features of replication in MySQL 5.7.x: