Have you heard about Lambda Architecture (LA)? It's an interesting concept introduced by Nathan Marz ("the father" of Apache Storm), and it's basically about processing massive quantities of data using two parallel streams:
- batch stream - this one can do pretty much anything with an unlimited quantity of data, but it has its latency, stored data has its inertia & crunching (even distributed)
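To make the two-stream idea a bit more tangible, here is a minimal toy sketch in Python. It is not taken from Marz's work or from any real framework: the class `LambdaSketch` and its methods are hypothetical names made up for illustration, and it assumes the second stream is the low-latency "speed" stream usually described alongside the batch one. The batch side recomputes a view from all stored data, the speed side keeps counts for whatever arrived since the last batch run, and a query merges both.

```python
from collections import Counter


class LambdaSketch:
    """Toy, in-memory illustration of the two-stream idea (hypothetical names)."""

    def __init__(self):
        self.master = []                 # append-only store of every event seen
        self.batch_view = Counter()      # complete but stale: rebuilt by run_batch()
        self.realtime_view = Counter()   # fresh but partial: events since last batch

    def ingest(self, key):
        # Every event goes to both streams.
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Batch stream: recompute the view from all stored data.
        # Simplification: a real batch run takes time, and events arriving
        # during the run would have to stay in the real-time view.
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key):
        # Answer = stale batch result + recent real-time increments.
        return self.batch_view[key] + self.realtime_view[key]


sketch = LambdaSketch()
for event in ["a", "a", "b"]:
    sketch.ingest(event)
sketch.run_batch()      # batch view now covers the first three events
sketch.ingest("a")      # arrives after the batch run, only in the real-time view
print(sketch.query("a"))
```

The point of the sketch is only the split of responsibilities: the batch side may be slow because queries never wait for it, and the real-time side may be simple because it only ever covers a small, recent slice of data.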
Producing a lot of data doesn't bring value by itself, even if the data actually makes sense.
Storing a lot of data doesn't bring value by itself either.
The ability to process all this data is the key to bringing some actual value.
What is more, the processing has to:
- be efficient
- be ready for continuous adjustments, enhancements, improvements & further development
Honestly, if you're
I already wrote a detailed blog post about that some time ago, so I’m not going to repeat myself - if you’re interested in the Apache Hadoop (http://en.wikipedia.org/wiki/Apache_Hadoop) implementation “by Microsoft”, you should check their latest announcements (but don’t get too excited, it’s still in the CTP phase) - http://blogs.msdn.com/b/