Overcoming long Spark job runtime on small datasets

If you are dealing with relatively small datasets (fewer than 1M entries) and have to use Spark for some reason, you can get a significant speedup by lowering the number of partitions.

Setting `spark.default.parallelism` to the number of cores and `spark.sql.shuffle.partitions` to something like 20 (instead of the default 200) gives a significant speedup, because Spark no longer wastes time shuffling data across many tiny partitions and scheduling a large number of tasks.
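One way to apply both settings is to pass them to spark-submit with `--conf`. The snippet below is a minimal sketch, not a definitive setup: the core count of 8 and the `app.jar` name are assumptions for illustration, and it only prints the command so you can review it before running.

```shell
#!/bin/sh
# Illustrative values: set CORES to the number of cores available to the job.
CORES="${CORES:-8}"
SHUFFLE_PARTITIONS="${SHUFFLE_PARTITIONS:-20}"   # Spark's default is 200
APP_JAR="${APP_JAR:-app.jar}"                    # hypothetical application jar

# Print a spark-submit command with both settings passed via --conf.
# Assumes the jar's manifest names the main class; otherwise add --class.
tuned_submit_cmd() {
  echo spark-submit \
    --conf "spark.default.parallelism=${CORES}" \
    --conf "spark.sql.shuffle.partitions=${SHUFFLE_PARTITIONS}" \
    "${APP_JAR}"
}

# Dry run: review the output, then execute it with: eval "$(tuned_submit_cmd)"
tuned_submit_cmd
```

The same two keys can also be set once in `conf/spark-defaults.conf` if every job on the cluster works with small datasets.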

Source.

Another useful link.

Migrating code from Zeppelin to Spark

Once you have a shiny Zeppelin application that runs smoothly and does what it is supposed to do, you start moving your code into a Spark environment to use it in production. If you are a novice in the Hadoop environment (like me), you might run into a few tasks that have to be solved before you can celebrate the project launch.

The work can be broken down into easy chunks:

  1. Launching spark-submit with a test class.
  2. Adding a main class and Spark context initialization.
  3. Building a fat jar (which includes all the libraries).
  4. Launching the job with spark-submit.
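
The last step can be sketched as a dry run that prints the final command. All names here are hypothetical (the main class `com.example.Main`, the jar path, and the `yarn` master), and the fat jar is assumed to have been built already by whatever build tool the project uses.

```shell
#!/bin/sh
# Hypothetical names; replace with your own main class, jar path and master.
MAIN_CLASS="${MAIN_CLASS:-com.example.Main}"
FAT_JAR="${FAT_JAR:-target/app-assembly.jar}"
MASTER="${MASTER:-yarn}"

# Print the spark-submit command for the fat jar built in step 3.
submit_cmd() {
  echo spark-submit --class "${MAIN_CLASS}" --master "${MASTER}" "${FAT_JAR}"
}

# Dry run: review the output, then execute it with: eval "$(submit_cmd)"
submit_cmd
```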


Founders at work — short summary

One of the best books about the history of the internet and how the whole IT industry developed, told through interviews with the founders of tech companies. It is a unique collection of opinions and real-life stories from founders of companies that created a lot of today's commodities.

This is a book for which there can be no good short summary, because every story is unique, though they all have something in common. The book can only be highly recommended, even though most of the companies were built and sold over two decades ago, and there are some key insights that can be written down here.

  1. Doing something new and innovative isn’t fun. It usually takes much more than a 40-hour work week, and there is only one place to find more time: sleep.
  2. Business plans are useless. Life is too complicated, and the only purpose of writing a business plan is to demonstrate that you are committed enough to do some extra work.
  3. People are everything that matters. Team and connections are among the fundamental things that are crucial to performing at your best.
  4. Users never talk about their problems; they suggest features they think can help them.

Replacing a Mac display under the Staingate warranty program in Russia

Staingate is an Apple-acknowledged problem that made MacBook screens (more precisely, their anti-reflective coating) look worn or scratched despite the absence of any obvious physical damage. Laptops released from 2013 to 2015 were affected, and their displays can be replaced free of charge at Apple authorized service centers. Although there are no official Apple stores in Russia, authorized service centers do operate there, and you can walk in and have the display replaced completely free of charge.


Nagios doesn’t start / internal service error / PID not found

The Nagios web interface can fail to connect to the Nagios process or to display any content at all. The problem can have different sources, but the key steps to solving it are:

  1. Checking permissions on the cgi-bin and fcgi-bin folders and on the scripts inside the /nagios path.
  2. Checking the suEXEC log if you use it (`/var/log/apache2/suexec.log`, for example).
  3. Checking the access mode on statusjson.cgi and status.cgi.
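
The permission checks above can be sketched with a small helper that prints the mode and owner of each path. This is only an illustration: the CGI directory `/usr/local/nagios/sbin` is an assumption based on a typical source install, so point `NAGIOS_SBIN` at wherever your CGI scripts actually live. The helper uses GNU `stat -c`; BSD systems use `stat -f` instead.

```shell
#!/bin/sh
# Print octal mode, owner:group and path for every argument (GNU stat).
show_perms() {
  for f in "$@"; do
    if [ -e "$f" ]; then
      stat -c '%a %U:%G %n' "$f"
    else
      echo "missing: $f"
    fi
  done
}

# Assumed CGI location; adjust to your installation.
NAGIOS_SBIN="${NAGIOS_SBIN:-/usr/local/nagios/sbin}"
show_perms "$NAGIOS_SBIN" "$NAGIOS_SBIN/status.cgi" "$NAGIOS_SBIN/statusjson.cgi"

# If suEXEC is in use, its log usually names the exact check that failed.
SUEXEC_LOG="${SUEXEC_LOG:-/var/log/apache2/suexec.log}"
if [ -r "$SUEXEC_LOG" ]; then
  tail -n 20 "$SUEXEC_LOG"
fi
```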

If that doesn’t help and Nagios simply refuses to start, showing only:

Running configuration check...
or
Starting nagios:.

You have to manually run configuration check with the -v flag:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If the check passes but Nagios still refuses to run, writes nothing to {path}/var/nagios.log, and does not create {path}/var/nagios.lock, then the problem might be a lack of free space on the device.
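
A quick way to test the free-space theory is to look at both free blocks and free inodes on the filesystem that holds the Nagios var directory. The sketch below assumes the `/usr/local/nagios` prefix from the command above, and the 10 MB threshold is an arbitrary illustrative value.

```shell
#!/bin/sh
# Succeed if the filesystem holding path $1 has at least $2 KB available.
has_free_space() {
  avail_kb="$(df -Pk "$1" | awk 'NR==2 {print $4}')"
  [ "$avail_kb" -ge "$2" ]
}

# Assumed location, derived from the /usr/local/nagios prefix used above.
NAGIOS_VAR="${NAGIOS_VAR:-/usr/local/nagios/var}"
if [ -d "$NAGIOS_VAR" ]; then
  df -h "$NAGIOS_VAR"   # free space
  df -i "$NAGIOS_VAR"   # free inodes; running out of these also blocks file creation
  has_free_space "$NAGIOS_VAR" 10240 || echo "less than 10 MB free on $NAGIOS_VAR"
fi
```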

Setting up Nagios monitoring on Debian-based servers

Nagios is a great tool for server monitoring, providing a rich environment to set up different notifications and monitoring scenarios.

It supports grouping of hosts and notification services, comes with a GUI out of the box, and has a really great community and lots of documentation online.

Pretty much all of the setup can be done based on Google’s top search results. A nice place to start is the DigitalOcean post about setting up Nagios on Ubuntu 14.04. There is also another nice post explaining how configuration is done.

If you fail to install the package, or your server is not Ubuntu (for example, you run a Debian Stable server), you can follow this post (RU) and build Nagios from source.

Configuring the client can be done by following this post, but note that the check_nrpe command is available only on the server, not on the client.