Scala Performance Tips on Android

October 7, 2014

Our team has been developing an NLP library in Scala, used by Ginger Page. Now, mostly, things have been really great. Scala lets us move very fast and build the app’s NLP features in a way that Java never could have. But, as with all things in life, nothing good ever comes free. We’ve noticed that our app is suffering from many GC interruptions, hurting performance. I went to investigate, and this is my story.

Benchmark on a mobile device

Well, if you’re like us and developing a Scala library to be used with an existing app, it...

Meet object-csv, a Strongly Typed CSV Helper for Scala

May 16, 2014

Yeah, I know this is a .NET blog, but recently my team at Ginger Software ventured into some Android coding. Now, we didn’t want to use Java, and preferred something with more functional capabilities. Scala was the natural choice. One thing we use a lot in our C# projects is CSV files. They are much easier to programmatically read/write than Excel files, and our analysts can still work with them as if they were Excel files. Sadly, Scala was missing a library to read/write CSV files to/from objects, which was something we sorely missed. Therefore, I set out to...

The Case of The Async Log4Net Appender

October 16, 2013

It was a bad day. I was coding away happily when we started getting alerts that one of our production farms was down. Our service seemed to be stuck on each one of our servers. Requests were coming in, but no responses appeared. Restarting the service helped, only for it to get stuck again after a while. The logs were unhelpful, and we had to pull out windbg to figure out what was going on. When your day starts with windbg, you know it won’t be a good one. The command ~*e!clrstack (view the stack trace of all managed threads) showed...

Charniak Parser on Windows

March 2, 2013

A while ago we wanted to choose a new natural language parser for our product. One of the strongest candidates in that area is the Charniak-Johnson parser. We ended up not choosing to use this parser for various reasons, but in the process of evaluating it we produced a nice side benefit: we compiled it (well, the Charniak part of it) and ran it natively on Windows. The source code for the converted project can be found on GitHub, and the binaries can be found here. Now, before I tell you a bit about the conversion...

Adapting Lucene scoring for an n-gram index

October 24, 2012

At Ginger we use a large index of n-grams, where each entry is basically a sequence of words and its frequency in our corpus. We wanted to make this index searchable, so naturally, we defaulted to using Lucene, which is the most popular open source IR library. This is how we started adding documents to the index:

    Document document = new Document();
    document.add(new Field("ngram", ngram, Field.Store.YES, Field.Index.ANALYZED));
    NumericField frequencyField = new NumericField("frequency", Field.Store.YES, true);
    ...

Console.ReadKey .NET 4.5 changes may deadlock your system

September 12, 2012

I hit a weird issue today. We have a service that we run both as a Windows service and from the console. A specific use case seemed to cause our system to hang, but only when running from the console. Also, I was sure this didn’t happen before I upgraded my machine to .NET 4.5. The service initialization code looks something like this:

    serviceHost.Open();

    while (Console.ReadKey().Key != ConsoleKey.Q)
    {
        Console.WriteLine("Press Q to exit");
    ...

NuGet Annoyance

May 29, 2012

I usually love using NuGet, but there are things about it that can sometimes make me scream with annoyance. My main issue is with the following scenario: I want to add a new dependency to a project. This dependency already exists in the solution. Here, one of two things may happen: The version that exists in the solution is the newest that came out. In which case, no problemo, NuGet will see that I already have that package installed and use it. A...

ThreadAbortException is Special

February 7, 2012

*Update (27/8): It is critical to set the timer below to AutoReset = false; otherwise, the exception might be raised again, even after we have handled it.

I guess you could say that every exception is special and important in its own way, but ThreadAbortException is really special. In what way, you ask? Well, let me tell you a story. I wanted to implement a hard timeout for our system. That is, if a query to our service takes too long, force kill it, no matter what it is doing at the moment. You may claim that this...