Sunday, January 27, 2019

The current state of Graphite and its ecosystem

I opened an issue on the Graphite project to discuss its future and found that my (obviously opinionated) review of the current state of the ecosystem had grown quite long, so I decided to publish it separately, below.

1. Original Graphite https://graphiteapp.org
Language: Python
Storage: Whisper (Python)
Clustering capabilities: medium
Plus points:
* The current reference implementation.
* Still widely deployed and packaged by many distributions.
* Still works great for small to medium installations.
* Graphite-web is still the most complete implementation of the Graphite render protocol; most third-party storage implementations still use it as their render engine.
Minus points:
* Whisper storage: No compression, 12 bytes per point, very IO intensive
* Python: scales vertically only by spawning more instances, which makes scaling the relay and carbon components quite hard.
* The current clustering protocol of graphite-web is much better than in Graphite 0.9.x but still does not work very well for big and/or volatile clusters.
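
To make the "12 bytes per point" figure concrete, here is a rough sketch of estimating the on-disk size of a single Whisper file from its retention settings. It assumes the standard Whisper layout as I understand it (a 16-byte file header, 12 bytes of metadata per archive, 12 bytes per data point); the helper name and the example retentions are made up for illustration.

```python
# Rough size estimate for one Whisper file, assuming the standard layout:
# 16-byte header + 12 bytes of metadata per archive + 12 bytes per point.
HEADER_BYTES = 16
ARCHIVE_INFO_BYTES = 12
POINT_BYTES = 12  # 4-byte timestamp + 8-byte double value


def whisper_file_size(retentions):
    """retentions: list of (seconds_per_point, retention_seconds) tuples."""
    points = sum(retention // step for step, retention in retentions)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(retentions) + POINT_BYTES * points


# Hypothetical schema: 10s for 1 day, 60s for 30 days, 1h for 1 year.
DAY = 86400
example = [(10, DAY), (60, 30 * DAY), (3600, 365 * DAY)]
size = whisper_file_size(example)
print(f"{size} bytes per metric, ~{size * 100_000 / 1e9:.1f} GB for 100k metrics")
```

Since Whisper preallocates the whole file, this cost is paid per metric as soon as the metric is created, no matter how many points are actually written.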

2. Go-Graphite stack https://github.com/go-graphite/
Go-graphite is an effort to consolidate Golang re-implementations of different Graphite components, which were developed by Booking.com and other companies.
Language: Go / C
Storage: Whisper (Go)
Clustering capabilities: strong
Plus points:
* Go produces a single binary per component, which is easy to deploy and scales vertically
* The new clustering protocol ("carbonserver") works much better in big clusters (Booking.com probably has the biggest Graphite cluster in the world, based on that setup)
Minus points:
* Scattered components and development:
* The project has no Go-implemented relay yet; users have to use third-party relays, e.g. carbon-c-relay or carbon-relay-ng.
* The project has no storage component of its own and uses lomik's go-carbon, which currently has "carbonserver" built in.
* Carbonapi (a graphite-web reimplementation) is not fully compatible with graphite-web and is currently split into two separate forks: a community fork and a Booking.com fork.

3. Clickhouse stack
Clickhouse is an open-source analytical database developed and open-sourced by Yandex. During its internal development it was used as Graphite storage, so it has good built-in implementations of some Graphite parts (such as aggregation). Yandex also open-sourced its internal Java-based implementation of the Graphite-compatible render part, named "Graphouse", but lomik's Go reimplementations of the components, carbon-clickhouse and graphite-clickhouse, are currently much more popular. Please note that this stack contains no rendering components and uses Graphite-web or carbonapi for the actual rendering.
Language: Go
Storage: Clickhouse (C++)
Clustering capabilities: strong
Plus points:
* Very good storage: low IO requirements, good compression (typically 2-4 bytes per point)
* Can be used in small, medium and large installations: the storage is scalable (despite the lack of re-sharding, so extending a cluster is a bit like moving whisper files), and the other components are stateless Go binaries
Minus points:
* Depends on Clickhouse's Graphite support, which is not the main purpose of Clickhouse, so in theory it could be removed or left undeveloped in future versions (but currently it's still there)
* Users need to experiment with different storage schemas
* Extending a big Clickhouse cluster can currently be painful (well, probably less painful than with whisper; I just mean it may not be as smooth as with, e.g., a Cassandra cluster).

"Yuuge" (Trump-voice) projects
We currently have two projects that were originally developed for big and very big Graphite installations: "Metrictank" and "Biggraphite".

4. Metrictank https://github.com/grafana/metrictank
Developed by Grafana Labs to support the Grafana Cloud and WorldPing projects. A multitenant project aimed at big installations. I'm currently implementing an MT cluster at my job, so I'll describe it in a separate article.
Language: Go
Storage: Cassandra (Java) / BigTable (Google-cloud)
Clustering capabilities: strong
Plus points:
* Designed for scalability: all components are scalable, using Kafka as a bus for metric transport and clustering, and a SWIM cluster for cache nodes
* Uses a strong caching layer to offload permanent storage, keeping N hours of data in a RAM cache for compression/deduplication.
* Reimplements some render functions in Go, with proper fallback to Graphite-web
* Designed to run in containers (e.g. in Kubernetes)
* Good compression ratio for storage (also around 2-4 bpp)
Minus points:
* Cache nodes are quite RAM hungry and can go OOM (so they require a big memory overhead), especially during cluster start. Cache storage is quite inefficient compared to permanent storage, at 20-30 bytes per point (which is quite logical: the cache should be fast, not compact)
* Quite a complex system; you need to experiment with different deployment/setup strategies (well, that's probably true for every big and loaded storage)
* Not really useful in small installations (better to pick go-carbon or Clickhouse stack)

5. Biggraphite https://github.com/criteo/biggraphite
Designed by Criteo for its own Graphite installation. Uses Cassandra to extend storage, but reuses the other components of the Python stack.
Language: Python
Storage: Cassandra (Java) / Elasticsearch (Java)
Clustering capabilities: strong
Plus points:
* Scalable solution (you still need to scale the Python carbon instances, though)
Minus points:
* Big storage overhead (16-24 bytes per point)
* Not really useful in small installations (better to pick go-carbon or Clickhouse stack)
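
The per-point figures quoted in this review can be turned into a back-of-the-envelope disk estimate. The sketch below uses only the bytes-per-point numbers mentioned above; the workload (one million metrics, one point per minute, one year, no rollups) is a made-up assumption, and real installations will differ because of rollups, replication and indexes.

```python
# Bytes per point as quoted in the text above: (low, high).
BYTES_PER_POINT = {
    "Whisper": (12, 12),
    "Clickhouse": (2, 4),
    "Metrictank": (2, 4),
    "Biggraphite": (16, 24),
}


def yearly_storage_tb(metrics, points_per_day, bytes_per_point):
    """Raw storage for one year of points, in terabytes (10**12 bytes)."""
    return metrics * points_per_day * 365 * bytes_per_point / 1e12


# Hypothetical workload: 1M metrics, one point per minute.
METRICS = 1_000_000
POINTS_PER_DAY = 1440

for name, (low, high) in BYTES_PER_POINT.items():
    low_tb = yearly_storage_tb(METRICS, POINTS_PER_DAY, low)
    high_tb = yearly_storage_tb(METRICS, POINTS_PER_DAY, high)
    print(f"{name:12s} {low_tb:6.2f} - {high_tb:6.2f} TB/year")
```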

So, as I have mentioned many times before, IMO Graphite is currently not just a single project but a whole ecosystem of projects, developed at different times by different developers for different purposes. Not all of these projects are compatible with all features of the original project, but a user can (and should) pick one implementation or another based on their own use case, requirements, and implementation.

I'm planning to write separately about the Metrictank and Clickhouse stacks soon.

Sunday, January 28, 2018

ASAP smoothing in Graphite

Good news, everyone!
In Graphite 1.1 we have support for custom (user-defined) functions. But how does it work, and what can you do with it? I can show you.

If you follow the state of the monitoring landscape, you probably know about the Monitorama conference. And if you watched the Monitorama PDX 2017 talks, you probably remember the talk by Kexin Rong named "Automating Dashboard Displays with ASAP".
If not, you can watch her talk below:
Monitorama PDX 2017 - Kexin Rong from Monitorama on Vimeo.
and here is a slide from it:

You can watch it, it's 30 minutes long and quite interesting, but in short she describes a new time series smoothing algorithm named "ASAP", which smooths data as much as possible, but not too much.
Right after this talk, Wyndham "Bo" Blanton implemented ASAP as a graphite-api custom function, also using Kexin's original code.
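
As a rough illustration of the idea (this is not Kexin's or Bo's actual code), a brute-force version can be sketched in pure Python: try moving-average windows of increasing size and keep the one with the lowest roughness (standard deviation of the first differences) whose kurtosis is not lower than that of the original series. The roughness/kurtosis formulation is my understanding of the ASAP approach; the function names and the max_window default are assumptions, and the real algorithm prunes the search far more cleverly.

```python
import math


def moving_average(values, window):
    """Simple moving average; the result has len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def _mean(values):
    return sum(values) / len(values)


def roughness(values):
    """Standard deviation of the first differences of the series."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    m = _mean(diffs)
    return math.sqrt(_mean([(d - m) ** 2 for d in diffs]))


def kurtosis(values):
    """Fourth standardized moment; 0.0 for a constant series."""
    m = _mean(values)
    variance = _mean([(v - m) ** 2 for v in values])
    if variance == 0:
        return 0.0
    return _mean([(v - m) ** 4 for v in values]) / variance ** 2


def asap_like_smooth(values, max_window=None):
    """Brute-force search over window sizes (illustrative only)."""
    if max_window is None:
        max_window = max(2, len(values) // 10)
    best, best_roughness = list(values), roughness(values)
    original_kurtosis = kurtosis(values)
    for window in range(2, max_window + 1):
        smoothed = moving_average(values, window)
        if len(smoothed) < 3:
            break
        # Keep the smoothest candidate that still preserves the outliers.
        if (kurtosis(smoothed) >= original_kurtosis
                and roughness(smoothed) < best_roughness):
            best, best_roughness = smoothed, roughness(smoothed)
    return best


# Hypothetical input: a slow sine wave with high-frequency noise and one spike.
series = [math.sin(i / 10) + (0.3 if i % 2 else -0.3) for i in range(200)]
series[120] += 5
smoothed = asap_like_smooth(series)
print(len(series), len(smoothed), roughness(series), roughness(smoothed))
```

The kurtosis constraint is what keeps the result "not too smooth": a window that flattens the spike lowers the kurtosis and is rejected.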

That's how it works in practice (click for a bigger picture):

That was great, but the original ASAP code uses the "numpy" library, which is a great thing but sometimes too heavy to include in your product, and the original Graphite had no support for custom functions (unlike graphite-api).
But nowadays we have it, so I adapted Bo's code a bit; you can check it out at https://github.com/deniszh/graphite_asap

This requires Graphite-web version 1.1.1 or newer with "numpy" installed. To install, just copy the asap.py and functions.py files to the /opt/graphite/webapp/graphite/functions/custom directory and restart Graphite-web. Check the output of "http:///functions?pretty=1": the function "asap()" should be present in the output.
Check Graphite's Function Plugin documentation for details.
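
For reference, a function plugin is just a Python module placed in that custom directory. The sketch below follows the pattern from the Function Plugin documentation as I understand it (a function taking requestContext and seriesList, group and params attributes, and a SeriesFunctions dict); the upper function itself is a toy example and has nothing to do with asap.py. The ImportError fallback exists only so that the snippet can be run outside of a Graphite-web installation.

```python
try:
    from graphite.functions.params import Param, ParamTypes
except ImportError:
    # Stand-ins so the sketch runs without Graphite-web; not the real classes.
    class ParamTypes:
        seriesList = "seriesList"

    class Param:
        def __init__(self, name, paramtype, required=False):
            self.name, self.type, self.required = name, paramtype, required


def upper(requestContext, seriesList):
    """Toy custom function: change series names to UPPERCASE."""
    for series in seriesList:
        series.name = series.name.upper()
    return seriesList


upper.group = 'Custom'
upper.params = [
    Param('seriesList', ParamTypes.seriesList, required=True),
]

# Graphite-web reads this dict to register the functions defined in the module.
SeriesFunctions = {
    'upper': upper,
}
```

Once such a file is in place and Graphite-web is restarted, the function should show up in the functions listing in the same way as "asap()".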

Sunday, March 26, 2017

TSDB on PostgreSQL? But why?

This week I stumbled across two new time-series databases (TSDB), which is good, but both are based on PostgreSQL, which is... confusing, to be honest.

The first is named Tgres. It was created by Grisha Trubetskoy and has already reached v0.10.0 after 18 months in development, which is quite impressive (no joke). It's written in Go, partially Graphite-compatible, and even outperforms "ye olde Graphite" on a single EC2 i2.2xlarge instance (not go-carbon, just normal Python Graphite, which is not quite fair IMO). Like Whisper, it is based on the idea of rollup archives, but it is more efficient as storage: around 8 bytes per point (Whisper uses 12).

I'm totally fine with people who want to develop something, and I'm not going to say that Grisha does not understand what he's doing: he's an experienced developer and Tgres looks very impressive. But to be honest, the whole rationale behind Tgres really puzzles me. You can read it at the link above (you can go straight to the part named "Avoid Solving the Storage Problem", but the whole article is worth reading).
Grisha says: "Someone once said that “anything is possible when you don’t know what you’re talking about”, and nowhere is it more evident than in data storage. File systems and relational databases trace their origin back to the late 1960s and over half a century later I doubt that any field experts would say “the storage problem is solved”. And so it seems almost foolish to suppose that by throwing together a key-value store and a consensus algorithm or some such it is possible to come up with something better? Instead of re-inventing storage, why not focus on how to structure the data in a way that is compatible with a storage implementation that we know works and scales reliably?"
With all due respect, I think that's the wrong direction. Yes, filesystems and databases have been in development since the 1960s, and what result do we have? The storage problem is not solved, indeed, but saying "OK, screw it, let's build something on top of a weak foundation and hope that it'll be fine" is also wrong.
I think that the storage engine is the most important part of any database, and it defines and limits any DB, whether relational, columnar, or time-series. Whisper is a good example. It has its weak points (e.g. no subsecond resolution, IO intensity, 12 bytes per point, local storage only) and its good points (quite good speed, built-in rollups). Most Graphite users know its limitations very well. On the one hand, these limitations restrict Graphite usage; on the other hand, they gave rise to the whole new generation of TSDBs and monitoring solutions that has been flourishing lately.
In the same way, Tgres inherits all the scalability traits of PostgreSQL (and of any relational database), e.g. good vertical scalability but quite weak horizontal scalability. Yes, the author mentions clustering for Tgres, but it's the same approach we already saw in Whisper: the clustering is external, not built into the storage.

Another PostgreSQL-based database, named TimescaleDB, looks a bit better: it is still based on Postgres, although it uses its own storage engine with built-in clustering and sharding. You can check their paper, it's quite interesting. For now it looks like early InfluxDB, but the authors say their approach is better because you can use the full power of real SQL across all your time series.
Let's see. TimescaleDB is quite young, less than 6 months in development; maybe we'll get something useful out of it. They have a good and stable foundation, so let's see how it fits into the TSDB world.

I still strongly believe that in the database world the storage engine is king, and horizontal scalability is a must for any modern data software.

Monday, September 19, 2016

Semi-irregular Sysadmin Ninja's Github Digest (Vol. 21)

Hello, fellow readers!
Issue 21 of "Semi-irregular Sysadmin Ninja's Github Digest" is here. The last issue was very dry, so I will add more of my thoughts and funny pictures. :)
Let's go!

"A reverse HTTP proxy that duplicates requests."
"You may have production servers running, but you need to upgrade to a new system. You want to run A/B test on both old and new systems to confirm the new system can handle the production load, and want to see whether the new system can run in shadow mode continuously without any issue."

"A tiny open source spacecraft project. http://kicksat.io"

WOW. Just W-O-W. Your eyes are not lying, it's an open-source spacecraft. "Our goal is to dramatically lower the cost of spaceflight, making it easy enough and affordable enough for anyone to explore space. We can do this by shrinking the size and mass of the spacecraft, allowing many to be launched together."
I hope the guys succeed and we'll see a KickSat launch soon!

"P2P web powered by torrents and blockchain."
Rejoice, my paranoid brothers and sisters! The new Internet is here! Put your foil hats on!
It's a combination of webtorrent and blockchain to make an unseizable internet!
"When you open index.html in the browser (live demo), here's what happens:
Bitcoin address 1DhDyqB4xgDWjZzfbYGeutqdqBhSF7tGt4 is searched for the latest outgoing transaction containing OP_RETURN script. Inside the script there is a torrent infohash of webpage.html. webpage.html is downloaded from torrent via webtorrent and displayed."

"IronSSH - End-to-end secure file transfer"

"While sftp and scp use ssh to keep files secure while they are being transferred over the network, once those files hit the remote server, they are no longer protected. The ironsftp executable provides additional security. When you put a file on the server using ironsftp, the file is encrypted before it is uploaded, and it stays that way on the server. When you get a file from the server, it is downloaded then decrypted. So the file remains secure until it is at the place you want to use it - on your local machine."

"QuineDB is a quine that is also a key/value store.
If your database can't print its own source code, can you really trust it?"
Very interesting and funny project! It's a simple K/V store written in bash 4, but it's also a quine!
"When you run it, the (possibly modified) source code of quinedb is printed to STDOUT, and the results of the specific command run are printed to STDERR."

"Build javascript chart dashboards without any front-end code. Uses any json endpoint. JSON config only. Ready to go."
A long-time dream is fulfilled! Yes, dashboards w/o any front-end code. https://github.com/christabor/flask_jsondash

"LogTrail is a plugin for Kibana to view, analyze, search and tail log events from multiple hosts in realtime with devops friendly interface inspired by Papertrail."
Like "tail -f", but for ELK!


"A list of companies that allow remote work and use Python."
Yep, that simple, but maybe useful. https://github.com/mariusavram91/python-remote-companies

Games on GitHub
"list of open source games and game-related projects that can be found on GitHub"

"Bringing the power of the command line to chat http://operable.io"
"Cog is an open chatops platform that gives you a secure, collaborative command line right in your chat window. It is designed to be secure, highly available, chat provider agnostic, and to be extensible using your favorite programming language."

"BORG - A terminal based search engine for bash commands"

Search bash-related questions on Stack Overflow without leaving your terminal!

"Easily figure out which git commit caused a given stacktrace https://pypi.python.org/pypi/git-stacktrace"
A slightly naive tool that helps you find out which commit is responsible for a specific stacktrace. Python and Java are supported.

"⚡ Deploy stuff by diff-ing the state you want against the remote server"
Interesting deploy tool. Looks nice, but IMO it's better to use a real configuration management tool in this case, e.g. Salt or Ansible.

And something more for ML fans:

"Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge"
For now, it's a purely academic project: creating regexes using natural language and machine learning.
But beware: regexes are just a start; maybe someday computers will be able to write their own programs.
Why will they need people then? :)

Machine Learning is simple! You can make your own TF image classifier in 5 minutes!

Saturday, September 3, 2016

Semi-irregular Sysadmin Ninja's Github Digest (Vol. 20)

Hello, fellow readers!
I'm back. Now back to the news! :)

Inspired by GNU Parallel, a command-line CPU load balancer written in Rust.
Same as GNU Parallel, but modern and fast.

"rsync for cloud storage" - Google Drive, Amazon Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Cloudfiles, Google Cloud Storage, Yandex Files http://rclone.org

NASA's openmct
A web based mission control framework. https://nasa.github.io/openmct/
You will need it if your project is rocket science ;)

Use your OS X terminal shell to do awesome things. A curated list of shell commands and tools specific to OS X.

Job server in Go
Similar to Celery / Gearman but language agnostic and written in Go.

A curated list of Go patterns and idioms http://tmrts.com/go-patterns
Worth checking for all Go programmers.

Minio is an object storage server compatible with Amazon S3 and licensed under Apache 2.0 License https://minio.io
Defying Amazon's vendor lock-in. Haven't tried it, though.

Mozilla's http-observatory
HTTP Observatory https://observatory.mozilla.org/
"The Mozilla HTTP Observatory is a set of tools to analyze your website and inform you if you are utilizing the many available methods to secure it."

Special bonus for machine learning lovers!

Image super-resolution through deep learning
"From left to right, the first column is the 16x16 input image, the second one is what you would get from a standard bicubic interpolation, the third is the output generated by the neural net, and on the right is the ground truth."
Looks like magic.

PArallel Distributed Deep LEarning http://www.paddlepaddle.org/
"PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible and scalable deep learning platform, which is originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu."

Saturday, August 27, 2016

Semi-irregular Sysadmin Ninja's Github Digest (Vol. 19)

I was slacking for a long time, I know. Sorry for that. I'll push two issues in a row now, this is the second one.

1. blessed-contrib
Build dashboards using ascii/ansi art and javascript

2. dashiell
A websocket-y frontend to osquery and facter. http://dashiell.io

3. https://telekomlabs.github.io/
T-Labs, the official GitHub page of Deutsche Telekom's R&D department.
Home of FirefoxOS and other projects, worth checking out.

4. jetpack
FreeBSD Jail/ZFS based implementation of the Application Container Specification

5. cachet
An open source status page system written in PHP https://cachethq.io

6. nginx-resources
A collection of resources covering Nginx, Nginx + Lua, OpenResty and Tengine http://www.cambus.net

7. socketplane
SocketPlane - Multi-Host Container Networking

8. h2o
H2O - an optimized HTTP server with support for HTTP/1.x and HTTP/2

Semi-irregular Sysadmin Ninja's Github Digest (Vol. 18)

I was slacking for a long time, I know. Sorry for that. I'll push two issues in a row now, this is the first one. I will try to make it more regular and will include other sources too.

1. The Crystal Programming Language 
A new programming language named Crystal. "We love Ruby’s efficiency for writing code. We love C’s efficiency for running code. We want the best of both worlds." Programs look like Ruby but compile to efficient native code, with compile-time error checking like Rust. Worth checking out if you're a PL freak like me. :)

2. chef-koans
An experimental, test-driven way to learn about Chef.
"An experimental, test-driven way to learn about Chef. Takes some inspiration from Ruby Koans and from other things that are awesome and simple." Unfortunately, only lesson number 0 is ready now, but you're welcome to contribute, of course!
Also, if you haven't read Vim koans or Git koans, please try them, it's quite fun.

3. node-bell
Real-time anomalies detection for periodic time series.

4. streem
prototype of stream based programming language
A prototype of a new PL from the author of Ruby, Yukihiro "matz" Matsumoto. It's at a very early stage of development.

5. sfs
Asynchronous Filesystem Replication

6. gitfs
Version controlled file system

7. awesome-courses
List of awesome university courses for learning Computer Science!

8. mochi
Dynamically typed functional programming language

9. consul-do
Do something based on leadership status

10. openbay
The Pirate Bay source code.