Have you heard about Lambda Architecture (LA)? It's an interesting concept introduced by Nathan Marz ("the father" of Apache Storm), and it's basically about processing massive quantities of data using two parallel streams:

  • batch stream - this one can do pretty much anything with an unlimited quantity of data, but it has latency: stored data has its inertia, and crunching it (even when distributed) takes time

Producing a lot of data doesn't bring value by itself, even if the data actually makes sense. Storing a lot of data doesn't bring value by itself either. The ability to process all this data is the key to bringing some actual value.

What is more, the processing has to:

  • be efficient
  • be ready for continuous adjustments, enhancements, improvements & further development

