StreamInsight In a Nutshell - Zuker On Foundations

Zuker On Foundations

The realm of .NET (WPF, WCF and all around)
StreamInsight In a Nutshell

I’ve been reading about StreamInsight lately; there’s a lot of buzz around it nowadays.

StreamInsight is primarily an event-processing engine built for low latency and very high throughput.
It generally targets systems that must handle a massive number of concurrent events, or cases where results have to be produced with short latency.

Let’s look at the high-level architecture -

http://media.techtarget.com/rms/misc/4streaminsight.jpg

To tell the truth, the architecture resembles BizTalk, and it is common for such processing engines.
These are the main terms you need to be familiar with -

  1. Event – An event can be any .NET class that represents an occurrence to be processed by the processing engine. Events come in different shapes (point, interval, or edge) to fit the scenario being modeled.
  2. Event Sources – Any source that feeds events into the processing engine. Because you build the input adapters yourself, you can integrate with whatever source you need.
    For example, a source might publish events through a WCF service, in push or polling style, or through feeds, a database, etc.
  3. Input Adapters – You implement input adapters as needed. The role of an input adapter is to get events from a source and push them to the processing engine.
  4. Streams – The input adapters create event streams that you can query to extract the information you need.
  5. Complex Event Processing (CEP) Engine – The processing engine receives the events from the input adapters and processes them as streams, evaluating your queries against them.
  6. Query – You define queries over the input streams to retrieve the information you are interested in.
  7. Output Adapters – Once you have defined the input streams and the query, you need to do something with the results. The processing engine delivers the query results to your output adapters, where you can handle them however you like.
  8. Event Targets – The components to which you want to send the results, for example a database, SharePoint, applications, or logging.
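
To make the flow between these terms concrete, here is a minimal sketch in plain Python. It is not StreamInsight code and uses none of its API: `CarEvent`, `input_adapter`, `black_cars`, and `run` are hypothetical names invented for this illustration, and the "engine" is just a loop that wires the pieces together.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class CarEvent:
    """Hypothetical event payload: one car passing a sensor."""
    timestamp: datetime
    color: str


def input_adapter(source: Iterable[dict]) -> Iterator[CarEvent]:
    """Input adapter: turns raw source records into a stream of typed events."""
    for record in source:
        yield CarEvent(datetime.fromisoformat(record["ts"]), record["color"])


def black_cars(stream: Iterable[CarEvent]) -> Iterator[CarEvent]:
    """Query: a filter defined over the event stream."""
    return (event for event in stream if event.color == "black")


def run(source: Iterable[dict],
        query: Callable[[Iterable[CarEvent]], Iterator[CarEvent]],
        output_adapter: Callable[[CarEvent], None]) -> None:
    """Stand-in for the engine: source -> input adapter -> query -> output adapter."""
    for result in query(input_adapter(source)):
        output_adapter(result)


# Event source: a list here; in practice a service, a feed, a database, etc.
raw_source = [
    {"ts": "2010-08-23T08:00:00", "color": "black"},
    {"ts": "2010-08-23T08:00:05", "color": "red"},
    {"ts": "2010-08-23T08:00:09", "color": "black"},
]

# Event target: an in-memory list standing in for a database table.
target: List[CarEvent] = []
run(raw_source, black_cars, target.append)
print(len(target))  # 2
```

The point of the sketch is the separation: the source, the adapter, the query, and the target can each be swapped without touching the others.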

To get a notion of the order of things, let’s look at the following scenario -

You have a system that monitors a very busy highway and publishes an event every time a car passes a certain point.
Let’s assume a new requirement comes in - write an analytic processing system on top of that.
For example, let’s assume we need to extract the daily average number of black cars that drive along the road.

This is a classic use case for StreamInsight.
Our first step would be to create the input adapter, which receives an event whenever a car passes by and pushes it to the processing engine.
Then, we define the query that interests us. You do that using LINQ. In this example, I would count the black cars that passed each day and average those daily counts.
Afterwards, I connect everything to my output adapter, which persists the processed results into my database.
Finally, I can view the results in any application, since they are already persisted.
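
The query logic of this scenario can be sketched as follows. This is plain Python rather than a StreamInsight LINQ query, and it processes a finished list instead of a live stream; the function names and the sample data are made up for the illustration. It groups black-car events into one-day windows, counts each window, and averages the counts.

```python
from collections import Counter
from datetime import datetime


def daily_black_car_counts(events):
    """Group black-car events into one-day windows and count each window."""
    counts = Counter()
    for timestamp, color in events:
        if color == "black":
            counts[timestamp.date()] += 1
    return dict(counts)


def average_per_day(daily_counts):
    """Average the per-day counts (days with no black cars are not represented)."""
    if not daily_counts:
        return 0.0
    return sum(daily_counts.values()) / len(daily_counts)


# Made-up sample events: (timestamp, color)
events = [
    (datetime(2010, 8, 21, 9, 0), "black"),
    (datetime(2010, 8, 21, 9, 5), "white"),
    (datetime(2010, 8, 21, 17, 30), "black"),
    (datetime(2010, 8, 22, 8, 15), "black"),
    (datetime(2010, 8, 22, 8, 20), "red"),
    (datetime(2010, 8, 23, 7, 45), "black"),
    (datetime(2010, 8, 23, 12, 0), "black"),
    (datetime(2010, 8, 23, 18, 10), "black"),
]

daily = daily_black_car_counts(events)
print(daily)                   # per-day counts: 2, 1 and 3
print(average_per_day(daily))  # 2.0
```

In a real deployment the engine would compute the per-day counts incrementally as events arrive, and the output adapter would persist each day's result.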

General Pros -

  1. Great Performance and Scale-Up – This product is written in native C++ and does great work regarding performance and resource allocation. Furthermore, it knows how to scale up: it uses more resources when they are available and takes advantage of multi-core environments.
  2. Good architecture – An architecture built from input/output adapters and a query model enables a good, clean separation of concerns.
  3. Great product for dealing with massive numbers of concurrent events or short latency requirements.

General Cons -

  1. Customization and External Code Support – The query model generates an XML configuration that the processing engine executes in its unmanaged code environment, so your query logic runs inside the engine rather than in your own code. This enables very good performance on the one hand, but once you need to customize it or call external infrastructure, the task becomes more difficult, especially if you don’t want to degrade performance.
  2. Scale-Out – Currently, the processing engine is designed to work in-memory in a single process. This means it doesn’t scale out by design. You could perhaps write your own solution for that, but it doesn’t come out of the box.

There’s much more to learn in depth, such as event shapes, windowing, liveliness, CTI (current time increment), configuration, deployment, testing, debugging, etc.
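
As a small taste of two of these topics, windowing and CTI, here is a rough Python sketch. It is a simplified model and not StreamInsight's API: the stream is a list of `(kind, time)` tuples, `WINDOW` and `tumbling_counts` are made-up names, and a CTI is treated simply as a marker promising that no later event will carry an earlier timestamp, which is what lets a window be closed and its result emitted.

```python
from collections import defaultdict

WINDOW = 10  # window length in arbitrary time units (an assumption for the demo)


def tumbling_counts(stream):
    """Yield (window_start, count) once a CTI shows the window is complete."""
    open_windows = defaultdict(int)
    for kind, time in stream:
        if kind == "event":
            # Assign the event to the fixed-size window that contains it.
            open_windows[time - time % WINDOW] += 1
        elif kind == "cti":
            # Every window ending at or before the CTI can no longer change.
            closed = sorted(w for w in open_windows if w + WINDOW <= time)
            for start in closed:
                yield start, open_windows.pop(start)


stream = [
    ("event", 1), ("event", 12), ("event", 7),  # 7 arrives out of order
    ("cti", 10),                                # closes window [0, 10)
    ("event", 15),
    ("cti", 20),                                # closes window [10, 20)
]

results = list(tumbling_counts(stream))
print(results)  # [(0, 2), (10, 2)]
```

Without the CTI markers the sketch would never emit anything, because it could not know whether a late event for an open window is still on its way.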

Published Monday, August 23, 2010 3:16 PM by Amir Zuker
