
I'm on a mission from God object

Yaniv Rodenski


My SDP 13 sessions demo code

Sela Developer Practice for 2013 is finishing its breakout sessions stage, and I had three fun sessions:

  • Things Every Developer Must Know about HTTP
  • Big-Time: Introducing HDInsight service (Hadoop on Azure)
  • Get your Node.js Under Control with TypeScript

I had a great time, and would like to thank all the attendees for coming to my sessions. You can find my demo code in my SkyDrive.

Next, on Thursday, I have a full-day workshop on Node.js, with lots of good stuff planned, like Express and hosting Node (on premises and in Windows Azure). I also want to thank two of my colleagues who helped me develop the workshop:

  • Eyal Ben-Ivri – who helped me develop the hosting module (and is, among other things, a Linux ninja).
  • Danny Vernovsky – who is going to join me in the workshop and talk about socket.io.

So hope to see you there,

Yaniv

SDP 2013 Bigger Longer & Uncut

After speaking at a few international conferences outside of Israel in the past 6 months, I have the pleasure of taking part in another conference which is near and dear to me: Sela’s own SDP. This year our conference is bigger and stronger than ever, with an international lineup including Jesse Liberty, Shawn Wildermuth, Udi Dahan and Mike Martin, who will also be speaking at my user group, the Windows Azure community, that month.

This SDP I have the pleasure of giving four breakout sessions:

  • Things Every Developer Should Know About HTTP
  • Hadoop on Azure
  • The Gentle Art of Building Web APIs
  • Get your Node.js under control with TypeScript

I am very proud to be a part of such a conference, and I believe it is about time Israel took its place in the conference world as well. If you feel the same, join us at the coolest conference in Israel E V E R.


Some more Node.js

Besides the breakout sessions, I am going to give one full-day tutorial on Node.js. The tutorial is based on a course I am currently creating (soon to be publicly available).

So if you want to get up to speed with Node.js I would love to see you there.

Yaniv

BUILD 2012 - Day 2

BUILD is starting to pick up, and today was packed with Windows Azure action. The keynote had a couple of cool announcements and a couple of extremely cool hidden gems. The keynote was led by Satya Nadella. The first thing that caught my eye was a slide showing strategic Windows Azure-based operations. One logo belonged to my good friends at PlayMyTone, and it's a good opportunity to congratulate Ohad and the gang for the great job they have been doing.

BUILD keynote day 2

Satya Nadella (Photography by Bnaya Eshet)

The next couple of cool things were kind of hidden in Scott Hanselman's demo:

  • EF 6 Alpha bits
  • New Windows Azure storage explorer in Visual Studio, which now supports CRUD capabilities and queues
  • Help pages for ASP.NET Web APIs

And last but not least, there was David Campbell talking about HDInsight, the new name for Hadoop on Azure, with a couple of brand new cool features including:

  • LINQ provider for Hive
  • A brand new C# SDK on CodePlex that includes a MapReduce API similar to the new Java API (AKA the context API)

Well, now I really have to start going over the new bits, so you can expect interesting stuff soon.

BUILD 2012 - Day 1

This year's BUILD opening gave me a great opportunity to put aside the need to deal with our travel arrangements, which were affected by Hurricane Sandy. Steve Ballmer gave what is, in my mind, his best keynote ever. I am not a client guy and the new Windows 8 user experience will not affect most of my day-to-day use of the command line, but in the time since last year's BUILD Windows 8 has really matured, and it feels nice to meet Windows at both ends of this journey.

BUILD 2012 keynote

 Steve Ballmer (Photography by Bnaya Eshet)

Getting a new Surface tablet and a Nokia Lumia 920 helped keep me excited throughout the day. Another announcement that got me all excited was that the Microsoft music streaming service is going to be free on all Windows 8 devices.

Other than that, the day was really more client-side oriented and left me filled with expectations for day two, when the cloud sessions really start.

Upcoming events: Øredev, SDP and the Windows Azure/Web Developers Communities Holidays special

As always, September is a very special time for me. This year, in addition to my b-day and the Jewish high holidays, September marks the beginning of a very intensive period, packed with both public events and very special projects. So without further ado, here are the events that have been keeping me busy lately:

Windows Azure Community Israel

The first thing in my very busy calendar is the Windows Azure/Web Developers Communities Holidays special, which is going to take place at the Microsoft R&D center in Herzliya. The meeting is taking place thanks to the guys at the Windows Azure Accelerator program, who helped us greatly and provided us with an alternative venue while the Microsoft Raanana office is closed for the holidays. As you know, we are devoted to helping the developer community in Israel (with a passion for Windows Azure) and feel that the Windows Azure Accelerator program is a great platform to introduce to our groups.

We are also very excited about the session for this meeting: Ken Egozi (whom you might remember from the ALT.NET group) is visiting Israel and is going to talk about the project he has recently been working on, Windows Azure Mobile Services.

Øredev

After this excitement (and a short vacation) I am going to give a couple of sessions myself. The first one is going to be in Malmö, Sweden, where I am going to speak at Øredev about two of my favorite topics, Hadoop and Windows Azure, all wrapped up in one session.

SDP

The next couple of sessions are going to be at Sela's own SDP, where I am going to give two fully booked one-day tutorials alongside Ido Flatow, titled HTTP Fundamentals using ASP.NET Web API. I am very excited about these sessions as well.

I hope for another few sessions to pop up, but for now I guess I should go back to work.

So Shana Tova everyone and see you soon.

Troubleshooting Windows Azure diagnostics using the Diagnostic Infrastructure Logs

One of my favorite gems of Windows Azure is the use of the diagnostics.wadcfg file for setting up Windows Azure diagnostics. This is a great alternative to configuring the Windows Azure diagnostics infrastructure in code.

Last week I had an issue working on a customer deployment where I just could not get trace logs using the diagnostics.wadcfg configuration. I was getting the Windows Event Logs, which were also configured, just fine, but for some reason the trace logs would just not come through.

Setting the diagnostics.wadcfg (just like setting the configuration in code) basically generates another configuration file (per role per deployment) in a container called wad-control-container:

the wad-control-container

Opening the configuration file, I discovered that while the diagnostics infrastructure loaded my Windows Event Logs configuration correctly, the configuration for the trace logs was not loaded at all:

image

Another thing you can monitor using the Windows Azure diagnostics infrastructure is the Windows Azure diagnostics infrastructure. To set up the Windows Azure diagnostics infrastructure logs, I just added the DiagnosticInfrastructureLogs section to the diagnostics.wadcfg file:

<?xml version="1.0" encoding="utf-8" ?>
<DiagnosticMonitorConfiguration xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"
                                configurationChangePollInterval="PT1M"
                                overallQuotaInMB="4096">
  <Logs bufferQuotaInMB="1024"
        scheduledTransferLogLevelFilter="Verbose"
        scheduledTransferPeriod="PT1M"/>
  <DiagnosticInfrastructureLogs bufferQuotaInMB="1024" 
                                scheduledTransferPeriod="PT1M" 
                                scheduledTransferLogLevelFilter="Verbose"/>
  <WindowsEventLog bufferQuotaInMB="1024" 
                   scheduledTransferPeriod="PT1M" 
                   scheduledTransferLogLevelFilter="Information">
    <DataSource name="System!*"/>
  </WindowsEventLog>
</DiagnosticMonitorConfiguration>

Now I could start my solution and the Windows Azure diagnostics infrastructure created a table called WADDiagnosticInfrastructureLogsTable:

image

Looking around, I found the following exception:

Message string  Polling for configuration changes:System.InvalidOperationException: There is an error in XML document (8, 8). ---> System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
   at System.Xml.XmlConvert.ToInt32(String s)
   at Microsoft.WindowsAzure.Diagnostics.XmlSerializationReader1.Read7_BasicLogsBufferConfiguration(Boolean isNullable, Boolean checkType)
   at Microsoft.WindowsAzure.Diagnostics.XmlSerializationReader1.Read11_DiagnosticMonitorConfiguration(Boolean isNullable, Boolean checkType)
   at Microsoft.WindowsAzure.Diagnostics.XmlSerializationReader1.Read14_ConfigRequest(Boolean isNullable, Boolean checkType)
   at Microsoft.WindowsAzure.Diagnostics.XmlSerializationReader1.Read27_ConfigRequest()
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   --- End of inner exception stack trace ---
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
   at Microsoft.WindowsAzure.Diagnostics.ControlChannel.<>c__DisplayClass10.<ConfigMonitoringPoll>b__f()
   at Microsoft.WindowsAzure.Diagnostics.ControlChannel.TryExpectError(HttpStatusCode status, Action act)
   at Microsoft.WindowsAzure.Diagnostics.ControlChannel.ConfigMonitoringPoll(Object sender, ElapsedEventArgs args); TraceSource 'Microsoft.WindowsAzure.Diagnostics' event

Apparently there was something wrong with my configuration, specifically with one of the integer values (probably the bufferQuotaInMB). I actually spent half a day trying to figure out what was wrong (I was just curious) but could not find the cause, so I just rewrote the XML element.

Conclusion

There are actually two things I am taking with me from this experience. The first is that even if the Windows Azure diagnostics infrastructure cannot parse parts of the diagnostics.wadcfg file, it will use the parts it can. But most of all I learned how informative and helpful the Windows Azure diagnostics infrastructure logs can be.

Windows Azure Community Israel - Here We Go...

A while ago I posted about my busy June, speaking at two major conferences: NDC Oslo (I’m actually posting this from Oslo right now) and QCon New-York. Well, June just got even more special for me. On the 25th we are kicking off the Israeli Windows Azure Community, which is being managed by Gal Kogman and yours truly. The community is going to meet every month on the Monday of the fourth week, at ILDC Herzliya.

This community is special for me. I’ve been working for a while now to create a home for Windows Azure developers in Israel (both online in our MSDN forum and now physically) and it is nice to see that vision come alive, with a lot of help from the guys at Microsoft Israel.

For our grand opening we have Noam King over as well as lots of cool new things lined up, but we just can’t tell you about them quite yet. So for now just register here for the event and keep following us.

See you there

Yaniv

Hadoop on Azure - Creating and Running a simple Java MapReduce

Apache Hadoop has a variety of APIs for developing MapReduce applications: you can use the streaming API to create MapReduce applications with almost any programming language, Hadoop Pipes adds native support for C++ applications, and Hadoop on Azure provides its IsotopeJS library for creating JavaScript MapReduce jobs. You can also use a variety of higher-level abstractions and libraries such as Pig and Hive. With that said, it is also useful to know how to develop MapReduce applications using Hadoop’s most natural and primal API, the Java API. This API allows you to develop richer, more powerful MapReduce apps, and there is a staggering number of samples for it around the Internet.

If you come from a .NET background (like I do), creating your first MapReduce job might be a little bit frustrating the first time around. But once you get past some of the initial hurdles you will be writing Java MapReduce code in your sleep.

Setting up your IDE

After playing around with quite a few IDEs I have found JetBrains IntelliJ IDEA to be the best choice. I am currently using the community edition that can be downloaded free of charge from here.

Now we need to create a new project. The easiest way to create a project is to use Maven. Maven is basically a build automation tool, but one of its coolest features is the ability to create a new project based on a template (called an archetype). I used this post to set up Maven on my machine.

Next we need to get an archetype that matches the Hadoop version used for the Microsoft Hadoop Distribution (0.20.2). Matthias Friedrich created a Maven repository with such an archetype, which generates a sample word count project. In order to use this Maven repository, you need to run the following command from your working folder:

mvn archetype:generate -DarchetypeCatalog=http://dev.mafr.de/repos/maven2/ -DgroupId=com.hadoop.example -DartifactId=MyFirstMapReduce

Note that MyFirstMapReduce is the name of my package.

Next you will be prompted to select the archetype from the list of archetypes in the repository:

Maven mvn archetype for Hadoop on Azure

Select 1 and press Enter.

You will be asked to change the version property. Press Enter to keep the default value of 1.0-SNAPSHOT and when prompted again select Y to confirm the details of your project.

Maven mvn archetype for Hadoop on Azure

A Look Around Our Project

Open IntelliJ IDEA, select File->Open Project, choose the folder created by Maven and click OK:

IntelliJ project for Hadoop on Azure

The newly created project should load. This project is created with the classic MapReduce sample, word count, including all its dependencies as well as some unit tests.

IntelliJ project for Hadoop on Azure

Open the TokenizingMapper class and view its implementation:

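import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused for every emitted pair; each occurrence of a word counts once.
    private static final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {

        // Split the input line on whitespace and emit (word, 1) for each token.
        StringTokenizer tok = new StringTokenizer(value.toString());
        while (tok.hasMoreTokens()) {
            Text word = new Text(tok.nextToken());
            context.write(word, one);
        }
    }
}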

This common sample reads the input file line by line and generates a key-value output with every input word as its key and the number 1 as its value. The sample uses the IntSumReducer class, which sums the values of the inputs based on their key. Let’s take a look at the WordCount class, which implements the Tool interface. The Tool interface is used to handle the command-line arguments and configure the MapReduce job.

public class WordCount extends Configured implements Tool {
    
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        String[] remainingArgs 
                = new GenericOptionsParser(getConf(), args).getRemainingArgs();
        
        if (remainingArgs.length < 2) {
            System.err.println("Usage: WordCount <in> <out>");
            ToolRunner.printGenericCommandUsage(System.err);
            return 1;
        }
        
        Job job = new Job(getConf(), "WordCount");
        job.setJarByClass(getClass());
        
        job.setMapperClass(TokenizingMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        
        FileInputFormat.addInputPath(job, new Path(remainingArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(remainingArgs[1]));
        
        boolean success = job.waitForCompletion(true);
        
        return success ? 0 : 1;
    }
}
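
To make the data flow concrete, here is a minimal plain-Java sketch of what the mapper and reducer described above compute together. It deliberately uses no Hadoop classes, so it runs anywhere; the class name WordCountSketch and the sample sentences are made up for illustration, and the real job spreads these two phases across the cluster rather than running them in one process.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation of the word count data flow; not the Hadoop API.
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(line);
        while (tok.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tok.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle and reduce phase: group the pairs by key and sum the values per key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map phase over every line, then reduces all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(Arrays.asList("to be or not to be", "to do")));
    }
}
```

Because summing is associative, the same reduce step can also run as a combiner on each mapper's local output, which is why the job above registers IntSumReducer for both roles.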

Now we can build the package. To do this we will use Maven again by running the following command from the project’s directory:

mvn package

After Maven finishes building your package you should see the following output:

Maven mvn build for Hadoop on Azure

Deploying and Running the Job

Before we start, we need some text to analyze. You can use any text file for that; I am going to use the 2010 CIA World Factbook, which I downloaded from Project Gutenberg.

To deploy the job you will first need to set up a Hadoop on Azure cluster. This is fairly easy: all you need to do is give your cluster a DNS name, select the size of the cluster you would like to set up and create a cluster login using the cluster request form:

Creating a cluster using Hadoop on Azure 

Once the cluster is up, click on the Remote Desktop icon, placed in the “Your Cluster” group, and log in using the credentials you set during the cluster request.

rdp using Hadoop on Azure

Copy your input file to the local file system on the remote server and open the Hadoop command shell using the desktop shortcut. Copy the file to HDFS using the following command:

hadoop fs -copyFromLocal c:\ciabook.txt input/ciabook.txt

Let’s list the files in our HDFS using the following command:

hadoop fs -lsr

HDFS

We’re now ready to run the word count application. Let’s go back to the portal and create a new job. Click on the Create Job icon, placed in the “Your Tasks” group:

Creating a job using Hadoop on Azure

In the Create Job page, give the job a friendly name and select the JAR file. The JAR file can be found under target in the project’s directory, and it should be named [package name]-1.0-SNAPSHOT-job.jar, where [package name] is the name of your package.

Next you need to add two parameters by clicking the Add Parameter button (once for each parameter). Set the value of the first parameter (Parameter 0) to the input directory (the one we just created) and set the second parameter (Parameter 1) to output (this will be the output directory created by the MapReduce job). Click the Execute Job button.

Creating a job using Hadoop on Azure

Your job should start executing. For the CIA Factbook (approx. 12 MB in size) the job executes in less than 2 minutes on average. Once the job has completed you should see that its status is “completed sucessfully” (the typo is, embarrassingly, from the Hadoop on Azure portal).

Creating a job using Hadoop on Azure

Open the Interactive Console icon placed in the “Your Cluster” group and run the #lsr command to list the files in our HDFS:

lsr using Hadoop on Azure interactive console

And use the #tail command to see the last few lines of the output file:

tail using Hadoop on Azure interactive console

Summary

Coming from a .NET background, writing Hadoop Java MapReduce code can be a little bit intimidating. But just like in the .NET world, the right tool can make all the difference.


Slides and Demos from my REST via ASP.NET Web API @ WDCIL

I had a very good time today talking at the Israeli Web Developers Community meeting held at Microsoft Raanana. I want to thank everyone for participating in a great discussion about REST, Hypermedia and ASP.NET Web API.

I would like to take this chance to share my demo code (available here) and slide deck:

Thank you all for a fun evening.

Yaniv

Upcoming Gigs: REST, Web API and Hadoop

It seems that every time I have a talk about Web APIs something big happens: last time Microsoft moved it from WCF to ASP.NET, and this week Microsoft announced the release of the ASP.NET web stack under an open source license.


This announcement comes at the beginning of a couple of very exciting months for me, filled with talks about a few things that were the focus of my professional life for the past year:

  • REST, Web API

On Monday, April 2, I will be talking @ the Israeli web developers community (WDCIL) about REST via ASP.NET web API. In this session we will discuss REST\Hypermedia services and how ASP.NET Web API can help us develop them.

You can register here for the event, hosted at the Microsoft office in Ra'anana.

  • Apache Hadoop-based services for Windows Azure

For the last 6 months I have been a part of a wonderful team at Sela that created training content focusing on Apache Hadoop-based services for Windows Azure. In June, I will have a couple of exciting sessions sharing the experiences we had developing and using Hadoop on Azure:

June 6-8 – Norwegian Developers Conference (NDC). Oslo, Norway.

At the beginning of June, I’ll be traveling to Oslo with my friend and colleague Ido Flatow, where we will both be speaking @ the Norwegian Developers Conference (NDC). If you are there, I must recommend Ido’s Fiddler session (I have seen it myself a couple of times).

NDC Hadoop on Azure

June 18-23 – QCon New-York.

Towards the end of June, I will be traveling to New-York to speak at the first ever QCon New-York. I am honored to be joining a real all-star panel of speakers and can’t wait to be a part of this exciting conference.

QCon New-York Hadoop on Azure

If you are planning on attending QCon New-York, you are also invited to use my discount code, RODE100, and get a $100 discount on all prices.

There are some more talks coming up soon, but for now I have a lot of work ahead of me. I hope to see you at one of my talks, at a conference/developers community near you.

Yaniv

ASP.NET Web API: Fun with verbs

For those of you who do not follow me on Twitter (shame on you), last Thursday’s announcement about the reincarnation of the WCF Web API as the ASP.NET Web API could not have come at a worse time, as I was scheduled to deliver a 3-hour session at Microsoft, for an audience of WCF developers, entitled REST via the WCF Web API. Since yours truly is not one of those speakers who is willing to talk about last week’s technology, I spent most of the weekend rewriting my demos and rebuilding the session (the demos can be found here).

Now that the storm has passed, I have had some time to go back and revisit my newfound love. Before we begin I have one confession to make: the last time I actually wrote ASP code, this is what it was called, ASP, and .NET was starting its first beta. I guess what I am going through right now is going to be a common pain for WCF developers who developed HTTP services in the past few years.

So for my session I built a fantasy football site called my footy. My first and most basic task was to create my services controller (formerly known as a service) using the ASP.NET MVC 4 template for Web API. The basic template creates a project built more or less like most MVC projects (as far as a WCF guy like me can tell), with minor differences. My first stop was to create the players controller, which looked more or less like this:

public class PlayersController : ApiController
{

    // GET /players/neymar
    public Player Get(string name)
    {
        // method implementation
    }

    // POST /players
    public HttpResponseMessage<Player> Post(Player player)
    {
        // method implementation
    }

    // DELETE /players/neymar
    public void Delete(string name)
    {
        // method implementation
    }
}

As you can see, I have different methods (that’s C# methods) for different HTTP verbs. The verbs are mapped to the methods by convention (apparently this is called convention over configuration). This convention will map the verb to any method whose name starts with the verb, so I can have even more meaningful method names. As you can see, my Get and Delete methods accept a name parameter, which in the old days (last week) we used to map using a Uri template in our WebGet or WebInvoke attributes. Now, in the days of ASP.NET Web API, this is mapped as part of the routing registration in the Global.asax like this:

routes.MapHttpRoute(
    name: "DefaultApi",
    routeTemplate: "{controller}/{name}",
    defaults: new { name = RouteParameter.Optional }
);

This tells ASP.NET to use the above template when accessing the application: the first segment is the name of the controller and the second is a parameter called name. Since I am using IIS to run my application, I needed to allow Delete, which is blocked by default by WebDAV (thanks to Sebastian Pederiva for his IIS support), so I added the following section to my default web site’s Web.Config:

<system.webServer>
  <modules>
    <remove name="WebDAVModule" />
  </modules>
  <handlers>
    <remove name="WebDAV" />
  </handlers>
</system.webServer>

Works like a charm. The next functionality I had for my players service was to add a transfer functionality, so my users could trade players. I have added the Transfer method that looks like this:

public HttpResponseMessage<Player> Transfer(Player player, string teamName)
{
    // method implementation
}

So now I had to tweak my routing to support this, so I’ve added a second HTTP route mapping like this:

routes.MapHttpRoute(
    name: "TransferApi",
    routeTemplate: "{controller}/{action}/{teamName}",
    defaults: new { name = RouteParameter.Optional }
);

Now I could use HTTP Post to access my method using the Uri template “players/transfer/{teamName}”. But wait, that is not all: I could now also access my method using HTTP Get. This is apparently the usual behavior for ASP.NET MVC, and we can solve it by adding the HttpPost attribute:

[HttpPost]
public HttpResponseMessage<Player> Transfer(Player player, string teamName)
{
    // method implementation
}

That’s better, but we are not nearly done. One of the side effects of my second route mapping is that my Get, Post and Delete methods are now also mapped as actions (besides being mapped to HTTP verbs), so I can get the representation of a player using both http://localhost/players/neymar and http://localhost/players/get/neymar as an address. While I can live with two addresses exposing the same resource, the issue is a bit trickier with my Delete method, which is now accessible either by using the verb Delete with the http://localhost/players/neymar Uri or by using the verb Get with the http://localhost/players/delete/neymar Uri. Using Get to delete resources is just bad HTTP, if for no other reason than the fact that HTTP assumes Get is both safe and idempotent (you can read more about safe and idempotent verbs here), so I added the HttpDelete attribute to make sure my service is consumed the way I planned it to be consumed:

[HttpDelete]
public void Delete(string name)
{
    // method implementation
}

That’s better.
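
The dispatch rules discussed in this post can be modelled in a few lines. The following is a simplified Java sketch of the behavior as I described it, not the framework's actual implementation: the AllowedVerb annotation is a hypothetical stand-in for attributes such as HttpPost and HttpDelete, and the controller, its method names and the two lookup helpers are invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Locale;

final class VerbDispatchSketch {

    // Hypothetical stand-in for attributes such as [HttpPost] and [HttpDelete].
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AllowedVerb {
        String value();
    }

    // Hypothetical controller; the method bodies only return a label.
    static class PlayersController {
        public String getPlayer(String name) { return "get " + name; }

        @AllowedVerb("DELETE")
        public String delete(String name) { return "delete " + name; }

        @AllowedVerb("POST")
        public String transfer(String teamName) { return "transfer to " + teamName; }
    }

    // Convention route ({controller}/{name}): pick a method whose name starts with the verb.
    static Method byConvention(Class<?> controller, String verb) {
        String prefix = verb.toLowerCase(Locale.ROOT);
        for (Method m : controller.getDeclaredMethods()) {
            if (m.getName().toLowerCase(Locale.ROOT).startsWith(prefix) && verbAllowed(m, verb)) {
                return m;
            }
        }
        return null;
    }

    // Action route ({controller}/{action}/...): pick the method by name, then check the verb.
    static Method byAction(Class<?> controller, String action, String verb) {
        for (Method m : controller.getDeclaredMethods()) {
            if (m.getName().equalsIgnoreCase(action) && verbAllowed(m, verb)) {
                return m;
            }
        }
        return null;
    }

    // In this model an unannotated method accepts any verb; an annotated one accepts only its own.
    static boolean verbAllowed(Method m, String verb) {
        AllowedVerb allowed = m.getAnnotation(AllowedVerb.class);
        return allowed == null || allowed.value().equalsIgnoreCase(verb);
    }

    public static void main(String[] args) {
        Class<?> c = PlayersController.class;
        System.out.println(byConvention(c, "GET").getName());          // getPlayer
        System.out.println(byAction(c, "delete", "GET"));              // null: GET can no longer delete
        System.out.println(byAction(c, "transfer", "POST").getName()); // transfer
    }
}
```

The point of the sketch is the last helper: once a method carries an explicit verb restriction, reaching it by action name with the wrong verb no longer resolves to anything.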

I guess if you are coming from an ASP.NET MVC background most of what I showed here might seem trivial, but for a WCF developer there are a lot of things to notice here. With that said, I do hope ASP.NET Web API will encourage developers to write more HTTP services, no matter what their background is.


Improve WCF services testability with simple Dependency Injection

Dependency injection is a great technique to reduce coupling between components and improve testability. There are a few techniques we can use to inject dependencies: you can use a framework like MEF or Spring to automate dependency injection, but I personally favor manually injected dependencies. Call me old-fashioned, but I like creating objects via simple constructor calls (most of the time).
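
As a rough illustration of manual constructor injection and why it helps testability, here is a small Java sketch. Every name in it (PlayersDal, PlayersService, InMemoryPlayersDal, the sample player and team) is hypothetical and chosen only for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class ManualInjectionSketch {

    // Hypothetical data-access abstraction the service depends on.
    interface PlayersDal {
        String findTeam(String playerName);
    }

    // The service receives its dependency through the constructor instead of creating it.
    static final class PlayersService {
        private final PlayersDal dal;

        PlayersService(PlayersDal dal) {
            this.dal = dal;
        }

        String describe(String playerName) {
            String team = dal.findTeam(playerName);
            return team == null
                    ? playerName + " is a free agent"
                    : playerName + " plays for " + team;
        }
    }

    // In-memory fake for tests; no database or network is needed.
    static final class InMemoryPlayersDal implements PlayersDal {
        private final Map<String, String> teams = new HashMap<>();

        InMemoryPlayersDal with(String player, String team) {
            teams.put(player, team);
            return this;
        }

        @Override
        public String findTeam(String playerName) {
            return teams.get(playerName);
        }
    }

    public static void main(String[] args) {
        // The "injection" is nothing more than a constructor call.
        PlayersService service =
                new PlayersService(new InMemoryPlayersDal().with("neymar", "Santos"));
        System.out.println(service.describe("neymar"));  // neymar plays for Santos
        System.out.println(service.describe("nobody"));  // nobody is a free agent
    }
}
```

A test can hand the service any PlayersDal it likes, which is exactly what becomes awkward once a runtime, rather than your own code, is the one calling the constructor.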

This is really straightforward most of the time, but when dealing with WCF services there is a slight complexity to take into consideration. In most scenarios WCF is in charge of instantiating the service class (the only exception is the single instance context mode, where we can supply ServiceHost with a ready-made instance of our service class).

Lately I have come across a really cool (and simple) option in WCF Web API. The WCF Web API supplies an HttpConfiguration API that exposes a CreateInstance delegate we can use to manually create a new instance of our service class:

HttpConfiguration config = new HttpConfiguration();
config.CreateInstance = (type, context, message) =>
{
    IPlayersDal dal = new PlayersDal();
    return new PlayersCURD(dal);
};

var factory = new HttpServiceHostFactory() { Configuration = config };

While this API is cool, it can only be used for HTTP-based services (using the WCF Web API). I really felt like using something like that in a SOAP-based project I am currently working on, so I figured, what the heck, I can create a similar solution (source code can be found here) for any WCF service host out there.

The first stop was creating an IExtension<ServiceHostBase> that can transport the delegate down the WCF pipeline:

class InstanceInitializerExtension : IExtension<ServiceHostBase>
{
    public Func<object> InstanceInitializer;

    public void Attach(ServiceHostBase owner)
    {
    }

    public void Detach(ServiceHostBase owner)
    {
    }
}

Next we need to create an instance provider. This is the runtime component that WCF actually uses to create a new instance of the service class:

public class ManualInstanceProvider : IInstanceProvider
{
    public object GetInstance(InstanceContext instanceContext, Message message)
    {
        var extension = instanceContext.Host.Extensions.Find<InstanceInitializerExtension>();
        return extension.InstanceInitializer();
    }

    public object GetInstance(System.ServiceModel.InstanceContext instanceContext)
    {
        return GetInstance(instanceContext, null);
    }

    public void ReleaseInstance(System.ServiceModel.InstanceContext instanceContext, object instance)
    {
    }
}

To hook up the ManualInstanceProvider we need to create a service behavior and implement the ApplyDispatchBehavior method like this:

public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
{
    foreach (ChannelDispatcherBase cdb in serviceHostBase.ChannelDispatchers)
    {
        ChannelDispatcher cd = cdb as ChannelDispatcher;
        if (cd != null)
        {
            foreach (EndpointDispatcher ed in cd.Endpoints)
            {
                ed.DispatchRuntime.InstanceProvider = new ManualInstanceProvider();
            }
        }
    }
}

The last thing we need to do is create an extension method for ServiceHostBase that will allow setting the delegate as the factory function of our host:

public static void ConfigureInstanceFactory(this ServiceHostBase host, Func<object> instanceInitializer)
{
    // adding a behavior that hooks up the ManualInstanceProvider
    host.Description.Behaviors.Add(new InstanceCreationBehavior());

    // passing the instance initializer down the rabbit hole
    host.Extensions.Add(new InstanceInitializerExtension
                            { InstanceInitializer = instanceInitializer });
}

Now we are ready to create our service host:

var host = new ServiceHost(typeof(PlayersService));
host.ConfigureInstanceFactory(() => new PlayersService(new PlayersDal()));

host.Open();
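
To see what the factory delegate buys us without spinning up a host, here is a minimal WCF-free sketch. IPlayersDal, InMemoryPlayersDal, PlayersServiceSketch and FactoryDemo are hypothetical stand-ins invented for illustration (the real PlayersService and PlayersDal may look quite different); the only thing taken from the code above is the Func<object> shape passed to ConfigureInstanceFactory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data-access abstraction; the real PlayersDal may differ.
public interface IPlayersDal
{
    IList<string> GetPlayerNames();
}

// Hypothetical in-memory DAL, handy for tests.
public class InMemoryPlayersDal : IPlayersDal
{
    public IList<string> GetPlayerNames()
    {
        return new List<string> { "Jake", "Elwood" };
    }
}

// Stand-in for the service class. It has no default constructor,
// which is why WCF cannot create it without a custom instance provider.
public class PlayersServiceSketch
{
    private readonly IPlayersDal _dal;

    public PlayersServiceSketch(IPlayersDal dal)
    {
        if (dal == null) throw new ArgumentNullException("dal");
        _dal = dal;
    }

    public int CountPlayers()
    {
        return _dal.GetPlayerNames().Count;
    }
}

public static class FactoryDemo
{
    // Mirrors what the instance provider does: invoke the delegate per request.
    public static object CreateInstance(Func<object> instanceInitializer)
    {
        return instanceInitializer();
    }

    public static void Main()
    {
        Func<object> factory = () => new PlayersServiceSketch(new InMemoryPlayersDal());

        // Each call yields a fresh, fully wired instance.
        var first = (PlayersServiceSketch)CreateInstance(factory);
        var second = (PlayersServiceSketch)CreateInstance(factory);

        Console.WriteLine(first.CountPlayers());            // 2
        Console.WriteLine(ReferenceEquals(first, second));  // False
    }
}
```

Because the delegate is just a Func<object>, it could equally resolve the service from an IoC container instead of calling new directly.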

Summary

Using dependency injection can be a simple task some (if not most) of the time. Using this extension we can utilize whichever technique we choose.

Yaniv


MSDN Israel Windows Azure and SQL Azure forum

I am very happy to announce the opening of a new Windows Azure and SQL Azure forum in Hebrew. The new forum is managed by Shay Friedman and myself.

If you are new to Windows Azure, I would like to use this opportunity to invite you to experience cloud development with our support and guidance. If you are already experienced with Windows Azure development I would like to assure you that you can find in our forum help from highly experienced professionals.

With your participation I am sure we can create an awesome and vibrant community.

Hope to catch you there soon

Yaniv

Understanding Windows Azure Queue Storage Throughput

The Asynchronous Queuing Pattern describes a classic way to improve service throughput in distributed applications. Over the years I have seen quite a few implementations of this pattern, from the use of MSMQ to ReactiveQueue, each with its own strengths and weaknesses. Windows Azure queue storage is designed for passing messages between applications in a persisted, scalable and controlled manner. With the above attributes, queue storage is a natural choice for enabling the Asynchronous Queuing Pattern, as described in detail in this MSDN magazine article.

A recent implementation I ran across at a client challenged the performance of the Azure queue storage, especially when dealing with a large queue. Their initial implementation was too slow due to a design issue we identified easily, but now they were stuck with a queue containing millions of records and they could not retrieve the messages fast enough. I decided to measure the duration of the different queue operations they were using.

The code I used to measure the performance is very simple and can be found here so you can reproduce the tests for yourself. Keep these considerations in mind:

  • We are using a public storage infrastructure that is prone to preemption by other applications. 
  • The Windows Azure storage infrastructure and API implementations are subject to change.

The following totals reflect 1000 iterations (minus the first 2 to remove the additional cost of the JIT compiler and other potential initialization overhead) of a standard consumer/producer use of Windows Azure queue storage:

Operation               Total ticks     % Execution
new CloudQueueMessage   3013727         0.000513214
AddMessage              444328027497    75.66556694
GetMessage              79718072883     13.57536056
DeleteMessage           63164926400     10.75648996
AsString                12151612        0.002069324

The first thing we notice is that we can easily improve the message retrieval code. In the above code we used the GetMessage method to retrieve the messages one by one. However, the Windows Azure Queue API also allows the retrieval of up to 32 messages at a time using the GetMessages method. As you can see in the results from the following run, message retrieval was over 6 times faster.

Note: since I omitted the first two iterations of GetMessages, I also omitted the first 64 iterations of every other queue operation, so at the end of the day we are looking at 936 messages rather than 998, but still the improvement is clearly noticeable.

Operation               Total ticks     % Execution
new CloudQueueMessage   2907419         0.000599733
AddMessage              428481361062    88.3858044
GetMessages(32)         12041770036     2.483938924
DeleteMessage           44255399085     9.128866277
AsString                3833020         0.000790663
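
As a rough back-of-the-envelope check on those numbers, the sketch below works out how many service round trips each approach needs and recomputes the speedup from the tick totals in the two tables above. BatchMath and RoundTrips are hypothetical names made up for this illustration; they are not part of the measured sample code, and the batch limit of 32 is taken from the text.

```csharp
using System;
using System.Globalization;

public static class BatchMath
{
    // Round trips needed to drain `messages` when each call returns up to `batchSize`.
    public static int RoundTrips(int messages, int batchSize)
    {
        return (messages + batchSize - 1) / batchSize; // integer ceiling
    }

    public static void Main()
    {
        const int messages = 1000;
        const int maxBatch = 32; // GetMessages upper limit, per the text

        Console.WriteLine(RoundTrips(messages, 1));        // 1000
        Console.WriteLine(RoundTrips(messages, maxBatch)); // 32

        // Two warm-up GetMessages calls are discarded, 32 messages each.
        int measuredBatched = messages - 2 * maxBatch;
        Console.WriteLine(measuredBatched);                // 936

        // Tick totals copied from the two tables.
        double singleTicks = 79718072883d;   // GetMessage, 998 measured messages
        double batchedTicks = 12041770036d;  // GetMessages(32), 936 measured messages

        // Raw ratio of the totals.
        double raw = singleTicks / batchedTicks;
        Console.WriteLine(raw.ToString("F2", CultureInfo.InvariantCulture));        // 6.62

        // Ratio normalized per message, since the two runs measured different counts.
        double perMessage = (singleTicks / 998) / (batchedTicks / measuredBatched);
        Console.WriteLine(perMessage.ToString("F2", CultureInfo.InvariantCulture)); // 6.21
    }
}
```

Either way you slice it, the batched run comes out more than six times faster per retrieved message, even though the number of round trips dropped by a much larger factor.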

The next stop on our quest for throughput improvement is the deletion of messages from the queue after we retrieve them. The consumer has to perform this operation in order to clear the message from the queue and ensure reliability. The call to DeleteMessage can also be easily improved. If you take a closer look at the code, you can see that we are using the DeleteMessage method, which is a synchronous call to the Azure Queue service. However there is no real need to wait for this call, so we can use its async implementation by calling BeginDeleteMessage. The results of this run (again for 1000 iterations minus 64) are shown here:

 

Operation               Total ticks     % Execution
new CloudQueueMessage   4904719         0.001183763
AddMessage              401853371789    96.98804476
GetMessages(32)         12041770036     2.906303177
BeginDeleteMessage      429024316       0.103545802
AsString                3822202         0.000922495

In our sample code, we do not handle exceptions for BeginDeleteMessage (as well as for DeleteMessage) but we can easily do so by passing a callback function to BeginDeleteMessage, which calls the EndDeleteMessage method inside a try/catch block.
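
The callback approach described above could look roughly like the following sketch. To keep it runnable without a storage account, FakeQueue is a hypothetical stand-in that only imitates the Begin/End shape of the queue client (it is not the storage client API), and it pretends that deleting the message "bad" fails. DeleteDemo, DeleteWithoutWaiting and the CountdownEvent bookkeeping are likewise assumptions made for the demo.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the queue client; only the Begin/End shape matters.
public class FakeQueue
{
    public IAsyncResult BeginDeleteMessage(string messageId, AsyncCallback callback, object state)
    {
        // Simulated service call: deleting the message "bad" fails.
        var task = new Task(_ =>
        {
            if (messageId == "bad")
                throw new InvalidOperationException("delete failed: " + messageId);
        }, state);

        // Task implements IAsyncResult, so it can be handed to the callback.
        if (callback != null)
            task.ContinueWith(t => callback(t));

        task.Start();
        return task;
    }

    public void EndDeleteMessage(IAsyncResult asyncResult)
    {
        // Rethrows whatever the simulated call threw.
        ((Task)asyncResult).GetAwaiter().GetResult();
    }
}

public static class DeleteDemo
{
    public static int Failures;

    // Fire the delete and return immediately; errors surface in the callback.
    public static void DeleteWithoutWaiting(FakeQueue queue, string messageId, CountdownEvent pending)
    {
        queue.BeginDeleteMessage(messageId, asyncResult =>
        {
            try
            {
                queue.EndDeleteMessage(asyncResult);
            }
            catch (Exception ex)
            {
                Interlocked.Increment(ref Failures);
                Console.WriteLine("Could not delete message: " + ex.Message);
            }
            finally
            {
                pending.Signal(); // demo only: lets Main wait for all callbacks
            }
        }, null);
    }

    public static void Main()
    {
        var queue = new FakeQueue();
        var ids = new[] { "a", "bad", "c" };

        using (var pending = new CountdownEvent(ids.Length))
        {
            foreach (var id in ids)
                DeleteWithoutWaiting(queue, id, pending);

            pending.Wait();
        }

        Console.WriteLine("Failed deletes: " + Failures); // Failed deletes: 1
    }
}
```

In a real consumer you would probably log the failure or let the message reappear after its visibility timeout rather than just counting it.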

Until this point, we have dramatically improved the consumer code for our queue, which I must admit was the easy part. The producer part is going to be a bit more problematic. Windows Azure Queue Storage exposes an APM based API for adding messages to the queue (using the BeginAddMessage/EndAddMessage methods). If you are adding to the queue from a client application you can use this API to release the calling thread and let the network card perform the majority of the heavy lifting.

If you are adding to the queue from a WCF service this will not be enough; you should consider using an asynchronous service contract. More information about implementing asynchronous services (and asynchronous calls in WCF in general) can be found in this blog post by Wenlong Dong.

Summary

Windows Azure Queue Storage was created with the SOA Asynchronous Queuing Pattern in mind. Using its async APIs (based on WCF's awesome async capabilities) and calling the GetMessages batch method we were able to improve its throughput and reduce the need for more compute instances.


LINQ to HPC (Formerly known as DryadLINQ) Tutorial: Part 2–Data Partitioning (DSC)

A new beta has been released since I wrote part 1 of this tutorial. While very little was changed in the product, we have a new name. Another thing that held me back personally from publishing this part was the fact that LINQ to HPC is not a part of Windows HPC R2 SP2. So without further ado I am proud to present the second part of my tutorial about LINQ to HPC.

In part 1 of this tutorial we discussed the fundamentals of DSC: how to manually write data to DSC files and how to use the FromEnumerable<T> extension method (from the HpcLinqExtras project) to implicitly save object data to a temporary file set (in order to use it inline in a subsequent query). We also saw a caveat in this method, namely that because FromEnumerable<T> saves the data to a single file in the temporary file set, the subsequent query cannot be parallelized. This is due to the fact that LINQ to HPC runs any query logic locally on the DSC node containing the data to which it refers.

The task at hand is quite straightforward: we would like to partition our data into logical pieces that can be distributed across the cluster. Before we start discussing how we can physically partition data in LINQ to HPC, I would like to consider the logic we will use for dividing the data into groups. In order to do so we will take a look at vertices, which are the basic tasks that execute the query on the cluster. I will describe vertices in detail in a later part of this tutorial, but for now there are a few facts I would like you to consider:

  • A vertex can only use data from a single DSC file, located on the node it is executing on. This is, of course, in order to preserve data locality. The main implication of this little fun fact is that we should make sure that pieces of data that are dependent on each other reside contiguously in the DSC file set. A good example of this is the use of GroupBy in a query. Let's create a Student class defined as follows:

    [Serializable]
    public class Student
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Nationality { get; set; }
        public double AvgGrade { get; set; }
    }


    Now let’s say we are grouping our students by nationality, so our data should be ordered so that students of the same nationality sit next to each other:

    [Figure: student records ordered by nationality within the DSC file set]
     
    Dryad can execute local queries in each vertex and then union all the groups. If the same data needs to be reordered by the query (let’s say items were ordered by Id in the query), the first thing LINQ to HPC would need to do is to reorganize the data into intermediate files, and only then execute the necessary logic.
    Note: grouping operators are a bit more complex when it comes to LINQ to HPC and will be discussed in a later part of this tutorial.
  • A vertex will process all the data in the DSC file it is accessing. This means that if we would like to break down the processing of local queries into smaller pieces we need to break the data into smaller files. This is possible since DSC file sets support creating more files than the number of nodes.

We can control the order in which our objects are written to file when using custom HPC serialization (as I have shown in part 1 of the tutorial). However this can become tedious, especially if we need to use the same data in different queries that can benefit from different partitioning and ordering.

Repartitioning Operators

Repartitioning operators are LINQ to HPC operators that result in intermediate DSC files partitioned in a way that is not dependent on the partitioning of the input files. There are two Repartitioning operators in LINQ to HPC: Hash and Range Partitioning.

Hash Partitioning

Hash partitioning provides a mechanism for partitioning data that is not sorted. Returning to our students sample, nationality is a prime candidate for hash partitioning. To use hash partitioning you call the HashPartition operator, which provides an overload that accepts the number of partitions to create. You can then use the ToDsc operator to create a new DSC file set and call SubmitAndWait to commit the operation (I reviewed these steps in part 1 of this tutorial):

// getting the list of students
List<Student> students = GetStudentsList();

// saving the students hash partitioned to the file set with 5 partitions
context.FromEnumerable<Student>(students)
       .HashPartition(std => std.Nationality, 5)
       .ToDsc<Student>("StudentsFileSet")
       .SubmitAndWait(context);

Hash partitioning selects the partition for a specific entity by performing a mod operation between the hash code of the key returned by the key selector and the number of partitions. The following code mimics this partition selection:

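var students = GetStudentsList();

foreach (var student in students)
{
    // the partition is derived from the hash code of the key (the nationality),
    // not from the hash code of the whole student object
    int partitionNum = student.Nationality.GetHashCode() % 5;

    var str = "the student {0} with nationality {1} will be written into partition no: {2}";
    Console.WriteLine(str,
                      student.Name,
                      student.Nationality,
                      partitionNum);
}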

This method is disappointingly crude. If you run this code (supplied with my samples) you will see that although we have instructed the HashPartition operator to create 5 partitions, the mod operation yields only four different values. This is of course due to the nature of the values in our key selector (none of their hash codes divides evenly by 5). This result is somewhat arbitrary, and the data could have been distributed in many ways (even and uneven) depending on the result of the key's GetHashCode. To overcome this pitfall, HashPartition has another overload that accepts an IEqualityComparer that can be used to override the GetHashCode implementation of the key.
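
To make the comparer idea more concrete, here is a minimal sketch of a custom IEqualityComparer<string> together with a local imitation of the partition selection. OrdinalHashComparer, HashPartitionSketch and PartitionFor are hypothetical names; the sketch only mimics the mod logic on the client and does not call LINQ to HPC. Clearing the sign bit before the modulo is my own assumption to keep the index non-negative (in C# the % operator keeps the sign of the dividend), not a statement about what HashPartition does internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer with a simple, deterministic hash over the characters.
public class OrdinalHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string value)
    {
        if (value == null) return 0;

        unchecked
        {
            int hash = 17;
            foreach (char c in value)
                hash = hash * 31 + c;
            return hash;
        }
    }
}

public static class HashPartitionSketch
{
    // Imitates partition selection: hash of the key modulo the partition count.
    public static int PartitionFor<TKey>(TKey key, int partitions, IEqualityComparer<TKey> comparer)
    {
        // Assumption: clear the sign bit so the result is always in [0, partitions).
        return (comparer.GetHashCode(key) & int.MaxValue) % partitions;
    }

    public static void Main()
    {
        var comparer = new OrdinalHashComparer();
        var nationalities = new[] { "Israel", "Belgium", "Norway", "Japan", "Brazil", "Israel" };
        var counts = new int[5];

        foreach (var nationality in nationalities)
            counts[PartitionFor(nationality, counts.Length, comparer)]++;

        // Shows how evenly (or unevenly) this particular hash spreads the keys.
        for (int i = 0; i < counts.Length; i++)
            Console.WriteLine("partition {0}: {1} student(s)", i, counts[i]);
    }
}
```

Running a check like this over a sample of your real keys is a cheap way to see whether a given hash leaves partitions empty before you commit the data to a file set.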

Range Partitioning
Range partitioning allows the ordered partitioning of sorted keys. Returning once more to our students sample, the average grade can be used as such a key. This is useful if our query uses this key selector ordering in its logic. Range partitioning works by assigning a range of keys to every file: any object whose key belongs in that range will be placed in the matching DSC file. With this method the files can end up uneven in size, but we can ensure that objects within a specific range will reside in the same file. Range separators are used to define ranges: these are values that mark the border points between one range and another. Let’s say we now would like to partition our students into files by grade. We will use two range separators to split the data into three files:

[Figure: students partitioned by average grade into three files, with range separators at 3 and 6]

In this case our range separators are 3 and 6. One thing that is very easy to overlook is the fact that if a student’s grade equals the value of a range separator, it can belong, range-wise, to either of the two files on both sides of the separator. Range separators can be assigned in two ways:
  • Statically assigned by user:
    In some cases we would like to explicitly force the range structure. This is useful when we know the structure of our data and queries and believe we can benefit from it. Let’s say we know our queries mostly filter students with grades of 6 and above; we can reflect this knowledge in our file structure even though it results in an uneven distribution.
    We can pass an array of range separators like this: 

    // getting the list of students
    List<Student> students = GetStudentsList();

    // saving the students range partitioned to the file set
    context.FromEnumerable<Student>(students)
           .RangePartition(std => std.AvgGrade, new[] { 3d, 6d })
           .ToDsc<Student>("StudentsFileSet")
           .SubmitAndWait(context);


    All we need to provide here is a key selector delegate, to select the value on which we partition, and the rangeKeys parameter, which holds the array of range separators of the same type as the return type of the key selector.
  • Dynamically sampled: 
    Another, perhaps simpler approach is to use a different overload that allows LINQ to HPC to generate partition separators for us. When we allow RangePartition to select the range separators for us, it will try to create DSC files of approximately equal size, but on the other hand we do lose much of the control we had creating the range separators ourselves. There are a few overloads of RangePartition; the simplest looks like this:

    // getting the list of students
    List<Student> students = GetStudentsList();

    // saving the students range partitioned to the file set with 5 partitions
    context.FromEnumerable<Student>(students)
           .RangePartition(std => std.AvgGrade, 5)
           .ToDsc<Student>("StudentsFileSet")
           .SubmitAndWait(context);


    Other than losing control with dynamic range partitioning there are a few key points you should bear in mind:
    • Currently dynamic sampling will take place for every 1,000 records - not really useful for small datasets.
    • Dynamic range partitioning uses range separators even if you did not set them yourself. If the key selector returns non-proportional ranges, the files will have to differ in size.
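
To get a feel for how range separators carve up the keys, here is a small local sketch that maps a grade to a file index for a given set of separators. RangePartitionSketch and FileIndexFor are hypothetical names, nothing here calls LINQ to HPC, and sending a key that equals a separator to the lower file is purely an assumption made for this illustration (as noted above, such a key could belong to either side).

```csharp
using System;

public static class RangePartitionSketch
{
    // Returns the index of the file a key falls into, given ascending separators.
    // Assumption: a key equal to a separator goes to the lower file.
    public static int FileIndexFor(double key, double[] separators)
    {
        int index = 0;
        while (index < separators.Length && key > separators[index])
            index++;
        return index;
    }

    public static void Main()
    {
        var separators = new[] { 3d, 6d }; // two separators, three files
        var grades = new[] { 1.5, 3.0, 4.2, 6.0, 9.8 };

        foreach (var grade in grades)
            Console.WriteLine("grade {0} -> file {1}", grade, FileIndexFor(grade, separators));
    }
}
```

With separators 3 and 6, the sketch puts grades 1.5 and 3.0 in file 0, 4.2 and 6.0 in file 1, and 9.8 in file 2, which also shows how easily the files end up uneven when the grades cluster in one range.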
Summary

Data partitioning allows us to implicitly distribute our data over the cluster, thus adding more control over how (and where) our queries will execute. Now that we have all our data just where we want it, we can start creating distributed kick-ass queries. But this calls for a completely different post.

Source code for all the samples can be found here.
