Lambda Architecture - why should we care?

Have you heard about Lambda Architecture (LA)? It's an interesting concept introduced by Nathan Marz ("the father" of Apache Storm), and it's basically about processing massive quantities of data using two parallel streams:
- batch stream - this one can do pretty much anything with an unlimited quantity of data, but it has latency: stored data has inertia, and crunching it (even in a distributed way) can take a lot of time by itself
- real-time stream - this one doesn't really care about all the data you've ever collected, only about the most recent part of it; it's aimed at minimum…
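To make the two-stream idea a bit more concrete, here's a minimal in-memory sketch in Scala. It assumes a hypothetical page-view counting use case, and `PageView`, `LambdaSketch` and its functions are illustrative names, not part of Storm or any other library. It also assumes the serving side of Marz's design, where a query is answered by merging the precomputed batch view with the real-time view. There's no Hadoop or Storm here, just plain collections standing in for both layers.

```scala
// Minimal, in-memory Lambda Architecture sketch. All names here are
// hypothetical and only illustrate the batch / real-time split.
final case class PageView(url: String, timestamp: Long)

object LambdaSketch {
  type View = Map[String, Long] // url -> number of page views

  // Batch layer: recomputes the view from scratch over the whole
  // (immutable) master dataset. Complete, but slow to refresh.
  def batchView(master: Seq[PageView]): View =
    master.groupBy(_.url).map { case (url, views) => url -> views.size.toLong }

  // Real-time layer: folds a single recent event into a small,
  // incrementally updated view covering only data the batch run hasn't seen.
  def updateRealtime(view: View, event: PageView): View =
    view.updated(event.url, view.getOrElse(event.url, 0L) + 1L)

  // Query side: answers a query by merging both views.
  def query(batch: View, realtime: View, url: String): Long =
    batch.getOrElse(url, 0L) + realtime.getOrElse(url, 0L)

  def main(args: Array[String]): Unit = {
    // Events already absorbed by the last batch run...
    val master = Seq(PageView("/a", 1L), PageView("/a", 2L), PageView("/b", 3L))
    // ...and events that arrived after it.
    val recent = Seq(PageView("/a", 10L), PageView("/c", 11L))

    val batch = batchView(master)
    val realtime = recent.foldLeft(Map.empty[String, Long])(updateRealtime)

    for (url <- Seq("/a", "/b", "/c"))
      println(s"$url -> ${query(batch, realtime, url)}")
  }
}
```

Running it prints `/a -> 3`, `/b -> 1` and `/c -> 1`. In a real deployment the batch layer would periodically recompute its view, and the part of the real-time view covering data that the fresh batch view now includes would be thrown away. That's what keeps the real-time side small and cheap.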

Scalding - functional data processing using Scala & Hadoop

Producing a lot of data doesn't bring value by itself, even if the data actually makes sense. Storing a lot of data doesn't bring value by itself either. The ability to process all this data is the key to bringing some actual value. What is more, the processing has to:
- be efficient
- be ready for continuous adjustments, enhancements, improvements & further development
Honestly, if you're dealing with relatively simple data & straightforward analysis, you don't really need sophisticated tools: we've found that in the case of Hadoop (or, to be more precise, Hadoop over HDFS), a simple abstraction layer like Hive…

Hadoop for Windows IS happening

I wrote a detailed blog post about that some time ago, so I’m not going to repeat myself - if you’re interested in the Apache Hadoop (http://en.wikipedia.org/wiki/Apache_Hadoop) implementation “by Microsoft”, you should check their latest announcements (but don’t get too excited, it’s still in the CTP phase) - http://blogs.msdn.com/b/windowsazure/archive/2012/10/24/getting-started-with-windows-azure-hdinsight-service.aspx Yes, you’ve got it right - HDInsight is dedicated to Windows Azure, BUT as you can find here (http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.…