Saturday, December 15, 2012


I've decided to move my Russian-language posts to a separate blog - http://deniszh-rus.blogspot.com
Only English posts will be here. Enjoy!

Sunday, September 16, 2012

End of MySQL vs PostgreSQL arguments

Russian text is here.

A good article about MySQL vs PostgreSQL. A summary for lazy readers:
"MySQL is designed with the idea that applications provide logic and the database provides dumb storage of the application's state.  While this has changed a bit with the addition of user-defined functions and stored procedures, the overall design constrains MySQL primarily to this use case.  This is not necessarily a bad thing as, traditionally, software licensing costs and requirements have often required that even advanced database systems like Oracle are used in this way.  MySQL targets the "my app, my database" world and is usually sufficient for this, particularly when lowest common denominators are used to ensure portability.

PostgreSQL, on the other hand, is designed with the idea that the database itself is a modelling tool, and that the applications interact with it over an API defined in SQL.  Object-relational modelling advocates point out that often getting acceptable performance in complex situations requires an ability to put some forms of logic in the database and even tie this to data structures in the database.  In this model, the database itself is a development platform which exposes API's, and multiple applications may read or write data via these API's.  It is thus best seen as an advanced data modelling, storage, and centralization solution rather than as a simple application back-end.

These differences show, I think, that when PostgreSQL people complain that MySQL is not a "real database management system" and MySQL people dispute this that in fact the real difference is in definitions, and in this case the definitions are deceptively far apart.  Understanding those differences is, I think, the key to making an informed choice."

Friday, July 20, 2012

More beauty in the feed!

A couple of links:
https://www.shortcutfoo.com - train your favorite text editor's shortcuts!
https://github.com/jkbr/httpie - a sort of "human-friendly cURL": it can parse JSON and produce nice colored output.

Beautiful thing

Sunday, July 8, 2012

Sad news, comrades

Russian text is here

Sad news, comrades.
In all the holy wars about Perl vs. Everything, a few high-load Perl projects have always been mentioned. One of them was YouPorn, one of the largest porn... ahem... adult entertainment websites in the world.
But not anymore.
You can check the presentation (PDF / PPTX / GoogleDocs) by their technical lead at the ConFoo conference:

  • Written in Perl with a very complex architecture
  • First few months dedicated to learning the site, maintaining it, and planning the re-write.
  • Re-write started in August 2011 and was originally planned for a delivery in mid-November.
  • Actually launched at the end of January.
So, at the end of 2011 the site was completely rewritten in PHP: it's simpler, faster, and it's easier to find staff.
The new architecture is pretty neat and straightforward: HAProxy + Varnish + Nginx/PHP-FPM/Symfony2 + Redis/MySQL + syslog-ng for logging. There is only one strange decision: they use ActiveMQ to manage DB/Redis writes, but as far as I understand from the presentation, it's not a very good solution for them...

Monday, June 25, 2012

What do you know about perlsecret?

All Perl lovers, please check out perlsecret. It's fun and worth reading. I liked the operator called Kite (or... sperm), which looks like "~~<>". :)

Monday, June 18, 2012

Why Privacy Matters Even if You Have 'Nothing to Hide'

Maybe you've heard this opinion about privacy before: "I've got nothing to hide. Only if you're doing something wrong should you worry, and then you don't deserve to keep it private."
Check out this interesting article explaining why this logic is flawed.

Sunday, June 10, 2012

Password cracking, again

I've been interested in IT security, cryptography and password security for a long time; I even gave a presentation about password security at PerlMova 2010 (sorry, slides in Russian only). So I was curious what's new in this area: the LinkedIn password hash leak, and the Last.fm / eHarmony hash leaks shortly after it. In both cases unsalted hashes were used (I'm sorry, but that's totally f*cked up): LinkedIn used SHA-1, and Last.fm used MD5! (No hashes were published in the second case, but 95% of all hashes were cracked within a year after the leak.)
So, after that mayhem, the famous FreeBSD hacker Poul-Henning Kamp asked everyone to stop using his own md5crypt password hashing scheme, developed in 1995, because it's no longer secure: modern GPU-based brute-force programs can crunch over 1 billion MD5 hashes per second (about 1 million md5crypt hashes per second), which is too much for short and/or weak passwords. You can see this impressive presentation yourself, Speeding up GPU-based password cracking, but I'll show a very interesting table from it:
So, eight-character alphanumeric passwords can't be securely hashed with MD5: 2 days for a full brute force on commodity hardware is a disaster from a security point of view.
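The "2 days" figure is easy to sanity-check. The sketch below assumes a 62-character alphabet (a-z, A-Z, 0-9) and the rough rates quoted above (10^9 MD5/sec, 10^6 md5crypt/sec); it computes the worst-case time to try every 8-character password. The function names are mine, not from the presentation:

```shell
#!/bin/sh
# Worst-case brute-force time for all 8-character alphanumeric passwords.
# Rates are the rough GPU figures quoted in the post, not measurements.

# keyspace ALPHABET_SIZE LENGTH -> number of candidate passwords
keyspace() {
  n=1; i=0
  while [ "$i" -lt "$2" ]; do
    n=$(( n * $1 )); i=$(( i + 1 ))
  done
  echo "$n"
}

# bruteforce_days KEYSPACE HASHES_PER_SEC -> whole days to exhaust the space
bruteforce_days() {
  echo $(( $1 / $2 / 86400 ))
}

space=$(keyspace 62 8)                                        # 218340105584896
echo "MD5:      $(bruteforce_days "$space" 1000000000) days"  # 2
echo "md5crypt: $(bruteforce_days "$space" 1000000) days"     # 2527
```

Integer division rounds the MD5 case down; it is really about 2.5 days, while md5crypt at these rates needs roughly 7 years for the same keyspace. That thousand-fold gap comes purely from making each hash slower, which is the idea behind tunable schemes.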
What can we do? Of course, we need to use something designed specifically for password hashing: bcrypt, scrypt or PBKDF2. Most of them are already implemented in many programming languages.
Also check out a very good presentation from the PHDays 2012 conference by another famous security researcher, Alexander Peslyak aka Solar Designer: skip the video to 14:00:00 (sorry, the presentation is in Russian only, but you can check the slides).

Monday, June 4, 2012

Monday, April 9, 2012

A couple of interesting tools

A couple of interesting links:

1. Logstash - http://logstash.net/ - a powerful log processor. It takes logs from many sources, applies various filters and produces output. One of the many outputs can be an ElasticSearch cluster with a fancy web frontend for searching; there are also many plugins (for AMQP, Graylog2, Graphite, etc.). This tool is similar to Splunk; maybe it's not as cool as Splunk, but it's free (Splunk's pricing is not for the faint of heart).
2. Varnish Book - "From the creators of Varnish". :) PDF and EPUB versions are coming soon.

Friday, March 16, 2012

Why is the server so slow? Use "The USE Method"!

I'm excited!
I've just found some fabulous posts on the DTrace blog:

1. The USE Method - http://dtrace.org/blogs/brendan/2012/02/29/the-use-method/
2. The USE Method: Solaris checklist - http://dtrace.org/blogs/brendan/2012/03/01/the-use-method-solaris-performance-checklist/
3. The USE Method: Linux checklist - http://dtrace.org/blogs/brendan/2012/03/07/the-use-method-linux-performance-checklist/
At a glance, it's a method for checking system performance problems: the USE (utilization, saturation, errors) method. A server system has different resources: CPU, memory, storage, network, etc.
For each resource we check:

  • utilization: the average time that the resource was busy servicing work
  • saturation: the degree to which the resource has extra work which it can’t service, often queued
  • errors: the count of error events 

Usually we check for errors first, then utilization and saturation. But this is a really short explanation, so I beg you to read the original posts. :) In the next two posts Brendan provides recommendations, with lists of commands for Solaris and Linux, for applying the USE method.
And the USE method is usable even without DTrace: Linux doesn't have native DTrace, and the authors of the original DTrace criticize the current Linux DTrace ports very harshly. :)
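To make the idea concrete without DTrace, here is a minimal sketch of the U and S checks for one resource, the CPU, on Linux, reading /proc/stat directly. Assumptions (mine, not Brendan's): utilization is busy time over total time between two samples, with idle and iowait counted as not busy; saturation is flagged when the number of runnable tasks (procs_running, roughly what vmstat shows in its "r" column) exceeds the CPU count. The helper names are hypothetical; the checklists above list the proper tools for every resource.

```shell
#!/bin/sh
# USE method sketch for a single resource (CPU) on Linux.

# cpu_util_pct LINE1 LINE2: busy percentage between two "cpu ..." lines
# from /proc/stat. Fields: user nice system idle iowait irq softirq steal ...
cpu_util_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
    NR == 2 {
      for (i = 2; i <= 9; i++) total += $i - prev[i]
      idle = ($5 - prev[5]) + ($6 - prev[6])   # idle + iowait
      if (total > 0) print int(100 * (total - idle) / total); else print 0
    }'
}

# cpu_saturated RUNNABLE NCPU: "yes" if more tasks want a CPU than exist
cpu_saturated() {
  if [ "$1" -gt "$2" ]; then echo yes; else echo no; fi
}

# Usage: one-second sample on the local machine (Linux only).
if [ -r /proc/stat ]; then
  s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
  echo "CPU utilization: $(cpu_util_pct "$s1" "$s2")%"
  running=$(awk '/^procs_running/ { print $2 }' /proc/stat)
  echo "CPU saturated:   $(cpu_saturated "$running" "$(getconf _NPROCESSORS_ONLN)")"
fi
```

The same two-snapshot pattern carries over to other resources, for example diffing /proc/diskstats for disks.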

Sunday, March 4, 2012

Perl is on Heroku now!

Russian link is here.

Recently I found an amazing trick: you can use your favourite web platform on Heroku! This is possible thanks to a special feature called buildpacks. Of course, Perl is supported too: Miyagawa himself wrote a buildpack for PSGI/Starman application deployment - https://github.com/miyagawa/heroku-buildpack-perl :)
So I've decided to move my Plusfeed.pl application to Heroku, because Stackato's trial period is over now. An annoying bug was also fixed: I've changed the message ID to the URL instead of the ETag, so now a message no longer jumps up in the RSS feed after someone adds a comment or a +1. :(
Instructions below:
1. Create a Makefile.PL for your application (see ExtUtils::MakeMaker for its format):
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'plusfeed.pl',
    VERSION   => '0.05',
    AUTHOR    => 'Denis Zhdanov ',
    EXE_FILES => ['app.psgi'],
    # CPAN dependencies (the buildpack installs them on push)
    PREREQ_PM => {
        'Google::Plus'             => '0.004',
        'XML::RSS'                 => '1.49',
        'XML::Atom::SimpleFeed'    => '0.86',
        'Plack::App::Path::Router' => '0',
        'Plack::App::File'         => '0',
        'Plack::Builder'           => '0',
        'Path::Router'             => '0.11',
        'CHI'                      => '0.5',
        'Starman'                  => '0.3',
    },
    test => { TESTS => 't/*.t' },
);

2. Add your app to Heroku using the custom buildpack:
git init
git add .
git commit -m "Initial version"

heroku create --stack cedar --buildpack \
    http://github.com/miyagawa/heroku-buildpack-perl.git
git push heroku master
All the magic is in the buildpack line (of course, change  to the name of your application).
You will get something like:
-----> Heroku receiving push
-----> Fetching custom buildpack... done
-----> Perl/PSGI app detected
-----> Installing dependencies
-----> Installing Starman
       Starman is up to date. (0.3000)
-----> Discovering process types
       Procfile declares types     -> (none)
       Default types for Perl/PSGI -> web
-----> Compiled slug size is 6.8MB
-----> Launching... done, v7
       http://plusfeed-pl.herokuapp.com deployed to Heroku
(The first time, you will get many more lines for the installation of cpanm and all the dependencies, but not on updates.)
That's all, folks! Your app is up and running.
My app lives at http://plusfeed-pl.herokuapp.com now.

Monday, February 6, 2012

Project Voldemort caveats

Russian post is here

We use Project Voldemort in our project. If you have also hit disk space problems like we did:
1. Check the current utilization of the files:

# java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbSpace -h /usr/local/voldemort/data/bdb -u

  File    Size (KB)  % Used
--------  ---------  ------
00000000      61439      78
00000001      61439      75
00000002      61439      73
00000003      61439      74
000013f6      61415       1
000013fd      61392       2
000013fe      61411       3
00001400      61432       2
00001401      61439       1
0000186e      61413     100
0000186f      61376     100
00001870      16875      95
  TOTALS  112583251       7
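If you want to watch this from cron, the TOTALS line is easy to pick out. A minimal sketch, assuming the `DbSpace -u` output format shown above and JE's default cleaner utilization target of 50%; the function name is hypothetical:

```shell
#!/bin/sh
# Check the TOTALS utilization reported by BDB JE's "DbSpace -u".
# Usage (paths as in the post):
#   java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbSpace \
#       -h /usr/local/voldemort/data/bdb -u | bdb_utilization_check 50
bdb_utilization_check() {   # $1 = target percent, defaults to 50
  awk -v target="${1:-50}" '
    $1 == "TOTALS" {
      if ($3 + 0 < target + 0) print "LOW: " $3 "% used, target " target "%"
      else print "OK: " $3 "% used"
    }'
}
```

On the output above it would report `LOW: 7% used, target 50%`.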

2. If TOTALS is much lower than the default target (50%), it seems that your cache is too small. Calculate the cache size for your number of records (using your average key and data sizes, of course):

# java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbCacheSize -records 1000000 -key 100 -data 300 
Inputs: records=1000000 keySize=100 dataSize=300 nodeMax=128 density=80% overhead=10%
    Cache Size      Btree Size  Description
--------------  --------------  -----------
   177,752,177     159,976,960  Minimum, internal nodes only
   208,665,600     187,799,040  Maximum, internal nodes only
   586,641,066     527,976,960  Minimum, internal nodes and leaf nodes
   617,554,488     555,799,040  Maximum, internal nodes and leaf nodes
Btree levels: 3
and adjust bdb.cache.size accordingly (to at least the "Maximum, internal nodes only" value).
After this, the space should be reclaimed (within a couple of hours or days, depending on the DB size).
The official and unofficial BDB JE documentation is very helpful.

Moral of the story: even the underlying technology must be thoroughly investigated. :)

Tuesday, January 24, 2012

I know Kung-Fu now :)

At work I ran into an interesting question: how do you actually find out which application is loading the disk on Linux? For the CPU there's top, but what about the disk?
Debian folks will say iotop, but it needs a fresh kernel, and we have CentOS 5 on 2.6.18. So what do we do?
It turns out there are at least three ways.
1. If your kernel is 2.6.1 or newer (and it most likely is), you can run

% sudo sysctl vm.block_dump=1
after which something like this starts pouring into /var/log/debug:
May 27 10:00:20 kex kernel: tail(11548): dirtied inode 14093100 (ld.so.cache) on sda1
May 27 10:00:20 kex kernel: tail(11548): dirtied inode 15269995 (libm.so.6) on sda1
May 27 10:00:20 kex kernel: tail(11548): dirtied inode 15270399 (libm-2.3.2.so) on sda1
May 27 10:00:20 kex kernel: tail(11548): dirtied inode 18154515 (locale-archive) on sda1
May 27 10:00:21 kex kernel: pdflush(140): WRITE block 76808312 on sda1
It's not much to look at, of course, but with the mighty magic of Perl or shell it can be parsed and summed up:

% grep sda1 /var/log/debug | grep READ | cut -d: -f4 | sort | uniq -c | sort -rn | head
   1056  find(11566)
    247  mysqld(11855)
     43  mysqld(11751)
     26  mysqld(11790)
     21  tar(11717)
     16  mysqladmin(11748)
     13  mysqld(11899)
     13  grep(4560)
     12  mysqld(11863)
      6  mysqld(11760)
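The pipeline above counts only READ events. A slightly longer awk sketch splits the count into READ and WRITE per process for one device. It assumes the "comm(pid): READ|WRITE block N on dev" message format shown above (extra fields after the device name are tolerated) and ignores the "dirtied inode" lines; the function name is made up:

```shell
#!/bin/sh
# Count block_dump READ/WRITE events per process for one device.
# Reads syslog lines on stdin, e.g.
#   May 27 10:00:21 kex kernel: pdflush(140): WRITE block 76808312 on sda1
block_dump_summary() {   # $1 = device name, e.g. sda1
  awk -v dev="$1" '{
    for (i = 2; i + 4 <= NF; i++)
      if (($i == "READ" || $i == "WRITE") && $(i + 1) == "block" &&
          $(i + 3) == "on" && $(i + 4) == dev) {
        proc = $(i - 1)
        sub(/:$/, "", proc)            # "pdflush(140):" -> "pdflush(140)"
        count[$i " " proc]++
      }
  }
  END { for (k in count) print count[k], k }' | sort -k1,1nr -k2
}
```

For example, `block_dump_summary sda1 < /var/log/debug | head`. When you are done, switch the logging off again with `sysctl vm.block_dump=0`.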
So, it's better than nothing, but there's a better way:

2. dstat - http://dag.wieers.com/home-made/dstat/
Not "mega-magic" either, but it works on CentOS, although by default it just shows the NAME of the most disk-hungry process. If there are several of them (java?), you're left guessing which one it actually is. In principle its plugins are written in Python and you could dig into them, but there's an even better way.

3. SystemTap - http://sourceware.org/systemtap/
Now this is an interesting little thing: an analogue of DTrace (which can't be ported to Linux because of licensing restrictions). It's backed by a bunch of companies, including Red Hat and IBM.
It's fully functional in RHEL/CentOS starting from version 5. For CentOS, though, you need to install the kernel-debuginfo (and kernel-devel) RPMs manually (if you have a non-standard kernel, you'll have to build and install them, but all the info is here - http://sourceware.org/systemtap/wiki/SystemTapOnCentOS).
It's supposedly safe for production systems: it doesn't cause kernel panics.
Install it, go to http://sourceware.org/systemtap/wiki/ScriptsTools, download disktop.stp and run it:

[root@localhost src]# stap -v disktop.stp 
Pass 1: parsed user script and 72 library script(s) using 21380virt/12988res/2280shr kb, in 300usr/220sys/712real ms.
Pass 2: analyzed script: 6 probe(s), 33 function(s), 7 embed(s), 21 global(s) using 122028virt/45148res/4200shr kb, in 2430usr/3790sys/16852real ms.
Pass 3: translated to C into "/tmp/stapFxPYtp/stap_5caa87e008b9d53ed275791ebd211ed4_27487.c" using 119868virt/45400res/5432shr kb, in 660usr/290sys/1328real ms.
Pass 4: compiled C into "stap_5caa87e008b9d53ed275791ebd211ed4_27487.ko" in 4630usr/3600sys/11755real ms.
Pass 5: starting run.

Tue Jan 24 15:41:06 2012 , Average:   2Kb/sec, Read:      13Kb, Write:      1Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
     500     3629     3621           pam_timestamp_c     dm-0    R        13824
     501     2759     2757                      java     dm-0    W          690
     501     2545     2542                      java     dm-0    W          552

Tue Jan 24 15:41:11 2012 , Average:   2Kb/sec, Read:      13Kb, Write:      1Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
       0     2395        1               VBoxService     dm-0    R        13824
     501     2545     2542                      java     dm-0    W          690
     501     2759     2757                      java     dm-0    W          671

Tue Jan 24 15:41:16 2012 , Average:   2Kb/sec, Read:      13Kb, Write:      1Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
     500     3629     3621           pam_timestamp_c     dm-0    R        13824
     501     2759     2757                      java     dm-0    W          690
     501     2545     2542                      java     dm-0    W          671

Tue Jan 24 15:41:21 2012 , Average:   5Kb/sec, Read:      27Kb, Write:      1Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
       0     2395        1               VBoxService     dm-0    R        13824
     500     3629     3621           pam_timestamp_c     dm-0    R        13824
     501     2545     2542                      java     dm-0    W          690
     501     2759     2757                      java     dm-0    W          671
The last column is the traffic in bytes. Hooray, time to celebrate!
(Actually, if you dig deeper, the traffic numbers can't be trusted, but the order of magnitude of the load is clear. The reason is explained here - http://dtrace.org/blogs/brendan/2011/10/15/using-systemtap/ )
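Since disktop.stp prints a fresh top list every few seconds, it can help to total the BYTES column per process over a longer run. A sketch assuming the column layout shown above (UID PID PPID CMD DEVICE T BYTES); it keys on CMD plus PID so that several java processes stay separate. Given the caveat above, treat the totals as an order-of-magnitude picture only. The function name is hypothetical:

```shell
#!/bin/sh
# Sum disktop.stp's BYTES column per process and direction (R/W).
# Reads saved disktop output on stdin; header and timestamp lines are skipped.
disktop_totals() {
  awk 'NF == 7 && $1 ~ /^[0-9]+$/ && ($6 == "R" || $6 == "W") {
         bytes[$4 "(" $2 ") " $6] += $7
       }
       END { for (k in bytes) print bytes[k], k }' | sort -k1,1nr -k2
}
```

For example, run `stap disktop.stp > disktop.log`, stop it with Ctrl-C after a while, then `disktop_totals < disktop.log`.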

The tool looks very promising; I'll have to poke at it and study it properly.