Hello guys,
Today I participated in a one-day seminar about Data Mining and BI for enterprises.
The day was part of a European tour by the one and only Rafal Lukawiecki. It appears he is delivering this seminar almost every day now, each time in a different country. I was really impressed by that: the man doesn't rest, and he keeps his energy high throughout the show, carrying the crowd through the hard parts and, just as important, doing it with a smile and a wicked sense of humor. I was not surprised to learn that he usually receives high scores at Tech-ed events, and I hope to see him once again at Tech-ed. (In fact, his is the only lecture I remember from Tech-ed 02. I remember him struggling with some typo when he tried to present .NET for the first time :))
So, from the title of this seminar you can guess that it's not a sexy subject, and it includes many theoretical aspects which are, again, sexy mainly for statisticians and Markov chain lovers. Of course I'm generalizing, and although I'm far from having a Ph.D. in probability, I enjoyed it altogether and didn't lose a minute wandering off with the pre-lunch and post-lunch symptoms you are all familiar with.
So Rafal didn't have an easy time. It's not a WPF UI presentation where you can put up some spinning buttons and the whole crowd goes "wooooo...". Basically it's a niche, so chances are that either you are not familiar with the subject and want to know what all the noise is about (if you are a tech guy), or you are a businessman drawn by promises of profit predictions, sales forecasts and more honey of that kind, OR you are a neural network expert who analyzes data from dawn till dusk. There's not much middle ground here, and it's hard to please both sides.
But it is a tech show, so most of the crowd is tech oriented. Still, we had 3 different specializations: the IT guy, the SQL professional and the software developer. Rafal had lots of different people to please. I belong to the minority, a software developer with little or no knowledge of the subject, and I was pretty pleased. If Ohad, the neural network expert from the crowd, is somehow reading this, please say whether you were pleased as well. My guess, and I even surprise myself here, is YES. Why is that? Stay tuned for the detailed description of today's topics.
The day was structured wisely: the first half was mainly slides and the theoretical background necessary for understanding what the hell we were going to do with this subject.
Microsoft's approach to this subject, in one line, is "Delivering Data Mining for the Masses", in other words, for people like me and you. "We" are no longer required to understand thoroughly what is going on behind the scenes. And that is a good approach: doing a "little" data mining today is not something a small shop or business needs to spend lots of resources on.
Furthermore, using Excel as a visualization tool opens the analyzed data to many more consumers. Basically, an Excel user can consume and use every Data Mining (DM) model we come up with, and that is nice from a deployment point of view.
For me the most interesting aspect of the first half was understanding which scenarios we can implement with the current tools today. Theoretically speaking, you can do everything, but I wanted to know what I can do the way a common person would approach it, which means without the ability to develop my own algorithms. I presume most people will go along with that: no need to be an expert, use what you are given off the shelf and make the most of it.
So we ran over the data mining process, for example CRISP-DM, and got familiar with the concepts and terminology. After that Rafal gave a little tour of a T-SQL-like language called DMX; I admit I didn't really understand its overall strength.
So how do we cook some DM and churn some data, you say? Well, you probably want to use a BI project in Visual Studio to do so. A nice DSL implementation enables us to model our DM process: we specify a data source, declare the data we want to use and specify how the algorithm will relate to it, and on top of that pick an algorithm out of the several that come with the product.
It is that easy.
Analyzing the results is not that easy, and some misleading UI concepts don't help either, but I guess we need to pay a little on the learning curve. It's a small price to pay for the abilities we get.
So the second half of the day ran over various interesting scenarios such as profit prediction (what is the connection between the number of children and buying stuff?), finding anomalies, forecasting sales, understanding what products people buy and what is bought together, why customers are leaving your business, and many, many more.
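To make the "what is bought together" scenario a bit more concrete, here is a minimal Python sketch of the idea. It is not what the Microsoft tools do internally and it is not from the seminar; the baskets are invented for illustration and `pair_counts` is a hypothetical helper that simply counts how often two products show up in the same basket.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "diapers"},
    {"bread", "butter"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that a pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
# Show the pairs that appear together most often.
for pair, n in counts.most_common(3):
    print(pair, n)
```

Real association algorithms add measures such as support and confidence on top of these raw counts, but the counting step is the heart of it.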
For every scenario Rafal showed us how to implement an algorithm and how to compare which algorithms are best for it, plus little tricks, so I'm sure even people familiar with the tool learned new stuff.
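As a rough illustration of what "comparing which algorithm is best" can mean, here is a small Python sketch that scores two toy models on rows they were not trained on. Everything in it is an assumption for the sake of the example: the customer rows are made up, and `fit_majority`, `children_rule` and `accuracy` are hypothetical helpers, not anything shown in the seminar or taken from the product.

```python
# Hypothetical customer rows: (number_of_children, bought_product).
train = [(0, False), (1, True), (2, True), (3, True), (0, False)]
test = [(0, False), (2, True), (1, True), (0, False), (3, False)]

def fit_majority(rows):
    """Baseline model: always predict the most common label in the training rows."""
    positives = sum(1 for _, label in rows if label)
    answer = positives * 2 >= len(rows)
    return lambda children: answer

def children_rule(children):
    """Hand-picked rule (not learned): predict a purchase when there is at least one child."""
    return children >= 1

def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the real label."""
    hits = sum(1 for children, label in rows if model(children) == label)
    return hits / len(rows)

scores = {
    "majority": accuracy(fit_majority(train), test),
    "children_rule": accuracy(children_rule, test),
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

The point is only the shape of the comparison: hold some data back, score every candidate on it, and pick the winner for that scenario.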
So Ohad the neural network expert learned here how to use the same algorithms from his field of expertise on a real-life practical example, with a wizard UI that anyone can use to go with it.
Furthermore, IMHO the lecture's main strength is that it gets you thinking and inspires you to take innovative approaches to your own old data that you wouldn't normally think of.
Well, I like talks that inspire me and motivate me to do something new and creative.
I left with lots of ideas and I'm not the only one; my friend Dima, who accompanied me, left with that feeling as well.
I just can't end without ranting about something. Well, I lied, this is not live blogging. You caught me, but you can't blame me: there just wasn't a WiFi Internet connection in the hall.
When I talked with a Microsoft representative, I got the first excuse: "We don't want people to work instead of hearing the lecture". Fair enough, but next to me a woman sat with a cellular phone connected to the Internet for emails, and nothing stopped her from answering her phone during the lecture. Well, emails are less noisy by half, and lack of Internet will not stop those who still wish to work.
The second excuse was more formidable: "It is expensive". Actually that's fair, but Microsoft doesn't need to provide bandwidth for that many people; they need just enough for a few bloggers like myself to take part in our favorite sport, live blogging.
I think from this day on, at every major event such as this, Microsoft needs to consider this aspect. It is for their benefit as well.
No need for a "Support the Blogger" campaign, right?
Ariel