Azure Table Storage – limitations and solutions/workarounds(Part 2)

19 בSeptember 2012

one comment

This is part two of 2 parts posts discussing the limitations in Azure table storage. It’s highly recommended to read first Part 1. In this part I will discuss the querying limitations. Using table storage as your persistence layer – you need to be able to query the storage efficiently. The .NET layer above Table Storage (CloudTableClient) exposes the CreateQuery method that returns DataServiceQuery that implements IQueryable<T>

public DataServiceQuery<T> CreateQuery<T>(string entitySetName);

meaning you can write whatever LINQ query you want and it will compile – but only a very small subset is supported as described here – Query Operators (Table Service Support). The only supported query operators (it’s easier than writing what is not supported) are: From,Where,Take,First,FirstOrDefault and select.

Limitation #1 – query returns a large number of rows (more than 1000)

Querying table storage using the API above will return a maximum of 1000 entities – this is a good practice and in this case the table storage RESTful service will return a response that contains a continues token to retrieve the next data – to be precise if the query takes more than 5 second or we cross to other partition the query will return. But what if we need to get more then 1000 entities? In this case we have two options:

Using Paging

As I mentioned before if not all data returns a continues token is returned as part of the response – query using this token will return the next result set part of the complete result – this can be implemented using an object called CloudTableQuery<T> – read the details in this excellent post by Jim O`Neil

 Query As TableServiceQuery 

The simpler case is when we don’t really need paging we just need all the results (and we know that it won’t be a lot more than 1000) – the same CloudTableQuery<T> implements IQueryable<T> but query using it will do the paging under the hood returning all the data needed – more then 1000 rows. simply instead of writing:

context.Blogs.ToList();

Modify it to:

context.Blogs.AsTableServiceQuery().ToList();

and it will query the service as much as needed (using the continues token) – but beware this can be very slow if you have a lot of data so use it wisely.

Limitation #2 – order by

As I mentioned in the beginning of the post most of the LINQ operators are not supported – like order by. So what can we do ? Well if we want the data to retrieve by one specific order – for all queries we can use the fact that the data is stored and retrieved in lexicographic order meaning  query the data in a partition will return by their lexicographic order in the partition and the partitions will be returned by the partition key lexicographic order. Let’s look at real life example (that can come in handy) – we want our entities to be ordered from the most recent creation date to the least recent. here is how it can be done:

public sealed class StoredEntity : TableServiceEntity
{      
     public StoredEntity (DateTime creationDate)
     {            
        RowKey = string.Format("{0:D19}",DateTime.MaxValue.Ticks -
                 creationDate.ToUniversalTime().Ticks);
        PartitionKey = string.Format("{0:D19}",DateTime.MaxValue.Ticks -
                 creationDate.Date.ToUniversalTime().Ticks);     
     } 
}

Explanation of the above: format D19 meaning decimal number with 19 digits (turns 1 to 0000…..0000000001) which is in inverse proportion to the max date.We set the partition key to be a 19 digits string by the date (without time) and the row key inside is 19 digits string by the data include the time. we divided the data to partition keys by dates and inside the partition the data is ordered by the full date and time.

Limitation #3 – query like operator

String operators can be very useful especially when we need to implement a free text search functionality on our data. In this case we don’t have a real solution beside using additional storage that has free text capabilities like lucene  and query it. The only string operator that is supported is CompareTo. So we cannot turn compartTo to like but we can implement “starts with” functionality. Lets say we want to ask if the property “Col” starts with the string “yes” we find the following lexicographic string which is “yet” (T comes after S) and query if:

c.Col.CompareTo("yes") >= 0 && c.Col.CompareTo("yet") < 0

and you have <==>

c.Col.StartsWith("yes")

It’s not like but just maybe it can match your needs.

Summery

Azure table storage is not bad (it’s actually amazing Thumbs up) giving unlimited scalable storage of strong typed entities. I’m just saying that using Azure table storage as your only persistence layer can be problematic if your requirements cannot be answered because of the limitations above and in the previous post. So check carefully your needs not only for the first phase but for the full implementation and future planing because changing from one storage infrastructure to the other can be very painful – I don’t believe that abstraction will work in this case because sooner or later you start using specific table storage features and change it may be too expensive.

Cheers,Offir

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

one comment

  1. JanivZ20 בSeptember 2012 ב 11:04

    Excellent ! Thank you

    Reply