UI Virtualization vs. Data Virtualization (Part 2)

October 1, 2009

Introduction

In part 1 I talked about the UI Virtualization concept in WPF and explained how it differs from the Data Virtualization concept. In this post I would like to share with you a solution for implementing Data Virtualization in WPF.

Recall from the previous post that implementing Data Virtualization raises several problems, and that WPF has no out-of-the-box solution for it.

First problem: How do we fake the scrollbars, so that the user believes all the items are there and can scroll through them?

Second problem: How do we filter, sort, and group data that doesn’t exist?

Third problem: How do we search data that doesn’t exist (like this search)?

So let’s say that we want to fetch 10,000 items and display each item in a Data Grid, so that the user can scroll and focus on a specific item. Let’s also say that each item weighs 1 MB (an image, a large amount of text, etc.), so fetching and holding all items at once (roughly 10 GB) is not an option.

General Solution

Fetch the 10,000 display-only items asynchronously, in pages (just the shallow data needed for the first display). For example, each page holds 100 items and contains only the data necessary for display in the Data Grid. Such data should end up no larger than, let’s say, 1 KB per item (and even that is a lot). So 10,000 items end up taking about 10,000 KB (roughly 10 MB).
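
As a minimal sketch of this step, the shallow item below carries only display fields plus an empty slot for the heavy payload, and a simulated server hands out pages of shallow items. All names here (`ShallowEntry`, `FakeServer.FetchPageAsync`, `ShallowLoader.LoadAllAsync`) are hypothetical and not taken from the sample solution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shallow item: only the fields the grid needs to display a row.
// The heavy payload (image, large text) stays null until the item is extended.
public class ShallowEntry
{
    public int Id { get; set; }
    public string Title { get; set; }
    public byte[] FullPayload { get; set; } // null while the item is only a "proxy"
}

// Hypothetical stand-in for the remote data source.
public static class FakeServer
{
    public const int TotalCount = 10000;

    // Returns one page of shallow items; a real implementation would call the server.
    public static Task<List<ShallowEntry>> FetchPageAsync(int pageIndex, int pageSize)
    {
        var page = Enumerable.Range(pageIndex * pageSize, pageSize)
            .Where(id => id < TotalCount)
            .Select(id => new ShallowEntry { Id = id, Title = "Item " + id })
            .ToList();
        return Task.FromResult(page);
    }
}

public static class ShallowLoader
{
    // Fetches all shallow items page by page and hands each page to a callback,
    // e.g. one that appends the items to the bound collection.
    public static async Task LoadAllAsync(int pageSize, Action<IList<ShallowEntry>> onPage)
    {
        int pageCount = (FakeServer.TotalCount + pageSize - 1) / pageSize;
        for (int i = 0; i < pageCount; i++)
        {
            var page = await FakeServer.FetchPageAsync(i, pageSize);
            onPage(page);
        }
    }
}
```

In a real WPF application the callback would append each page to the bound `ObservableCollection` on the UI thread, so the grid can start showing rows before the last page arrives.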

This results in a Data Grid that is bound to 10,000 data items. With UI Virtualization this works very fast, since the Data Grid creates UI containers only for the visible items.

Now the user can scroll as much as he or she wants, and the data is always available. Of course, we can also filter and sort, because the shallow data for every item is already in memory.
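
Because the shallow data for every item is in memory, filtering, sorting and searching never need the full items. The hypothetical sketch below shows this with plain LINQ over a `ShallowRow` type invented for the example; in WPF the same predicate would go into the collection view’s `Filter` property, and sorting into its `SortDescriptions`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shallow item: enough data to filter, sort and search on.
public class ShallowRow
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class ShallowQueries
{
    // Keeps only even IDs, sorted by title, without touching any full item.
    public static List<ShallowRow> EvenIdsByTitle(IEnumerable<ShallowRow> rows) =>
        rows.Where(r => r.Id % 2 == 0)
            .OrderBy(r => r.Title, StringComparer.Ordinal)
            .ToList();

    // Case-insensitive substring search over the shallow titles; returns matching IDs.
    public static List<int> Search(IEnumerable<ShallowRow> rows, string text) =>
        rows.Where(r => r.Title.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(r => r.Id)
            .ToList();
}
```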

The problem is that we only have the “proxy” and not the real item (we haven’t fetched it yet). We should fetch the full item only when the user requests it, where “requested by the user” means any item that is currently visible in the Data Grid or currently selected.

The question now is: “How can we know which items are visible, or become visible, when UI Virtualization is used?”

We have at least two options to solve this riddle.

  1. Create a custom VirtualizingStackPanel and replace the original ItemsControl.ItemsPanel (of a DataGrid, ListView, etc.) with it.
  2. Create a custom ICollectionView and use a CollectionViewSource to select it.

Well, if you ask me, the second option is much easier to implement and less intrusive than the first, since we don’t have to replace anything in the original items control.

Data Virtualization Collection View

The code snippet below demonstrates the usage of such a custom collection view that supports data virtualization.

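using System.Collections.Generic;
using System.Collections.ObjectModel;

// Entry is the shallow item type defined in the sample solution.
public class HugeCollection : ObservableCollection<Entry>,
                              IDataVirtualizationItemSponsor
{
    #region IDataVirtualizationItemSponsor Members

    public void ExtendItems(IEnumerable<object> items)
    {
        // Fetch the full items here (typically asynchronously).
    }

    public void DeflateItem(object item)
    {
        // Tear down/drop the full item data here.
    }

    #endregion
}

using System.Windows;

// Window1.xaml.cs (code-behind)
public partial class Window1 : Window
{
    public Window1()
    {
        InitializeComponent();
        // The collection becomes the Source of the CollectionViewSource via {Binding}.
        DataContext = new HugeCollection();
    }
}

<!-- Assumes xmlns:tk maps to the WPF Toolkit (http://schemas.microsoft.com/wpf/2008/toolkit)
     and xmlns:data maps to the CLR namespace containing DataVirtualizationCollectionView. -->
<Window.Resources>
    <CollectionViewSource x:Key="DataGridViewSource"
       CollectionViewType="{x:Type data:DataVirtualizationCollectionView}"
       Source="{Binding}" />
</Window.Resources>
<tk:DataGrid ItemsSource="{Binding Source={StaticResource DataGridViewSource}}"
   VirtualizingStackPanel.IsVirtualizing="True">
</tk:DataGrid>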

As you can see, instead of using the default collection view, we use a DataVirtualizationCollectionView, which is bound to the source collection (HugeCollection). Then we bind this view to the desired ItemsControl (a DataGrid in our case).

Note that the source collection must implement the IDataVirtualizationItemSponsor custom interface. This interface has two methods:

  • ExtendItems – in this method we should extend the provided items with the rest of their data. This is where we usually make an asynchronous call to the server to fetch the full items and merge them with the shallow data.
  • DeflateItem – in this method we should drop the data we retrieved in the ExtendItems call.
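
The post doesn’t show the interface declaration itself. Judging only from the members HugeCollection implements, it presumably looks like the following; `RecordingSponsor` is a hypothetical test double (not part of the sample solution) that records calls instead of contacting a server.

```csharp
using System.Collections.Generic;

// Presumed shape of the custom interface, inferred from HugeCollection's members.
public interface IDataVirtualizationItemSponsor
{
    // Called with a batch of shallow items that should be filled with full data.
    void ExtendItems(IEnumerable<object> items);

    // Called when an item is no longer needed and its full data can be dropped.
    void DeflateItem(object item);
}

// Hypothetical test double: records calls instead of fetching anything.
public class RecordingSponsor : IDataVirtualizationItemSponsor
{
    public List<List<object>> ExtendedBatches { get; } = new List<List<object>>();
    public List<object> Deflated { get; } = new List<object>();

    public void ExtendItems(IEnumerable<object> items) =>
        ExtendedBatches.Add(new List<object>(items));

    public void DeflateItem(object item) => Deflated.Add(item);
}
```

A double like this makes it easy to check how many batches a view requests, and for which items, without a back end.
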
DataVirtualizationCollectionView Implementation

I’ve chosen to implement the DataVirtualizationCollectionView by deriving it directly from ListCollectionView, instead of implementing the ICollectionView interface from scratch. The trick is to override the GetItemAt method. It seems that the VirtualizingStackPanel calls this method for each item in view, as well as for the selected item, which is exactly what I was looking for. This is where we can place the logic for extending and deflating the data.

Another riddle: How can we know when an item is no longer needed because it was scrolled out of view?

Well, the answer is quite simple: use a cyclic cache! Each time an item is requested, add it to a bounded cache; when the cache is full, drop (and deflate) the oldest items. If an item is requested again, elevate it so it won’t be dropped from the cache.
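
A bounded cache that elevates re-requested items is essentially a least-recently-used (LRU) cache. The sample solution uses the Enterprise Library Caching Application Block for this; the sketch below is a hypothetical stand-alone alternative (`LruItemCache` and its members are invented for illustration) built on a `LinkedList` and a `Dictionary`. It invokes an eviction callback, which would be a natural place to call the sponsor’s DeflateItem.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded LRU cache: the most recently requested item sits at the front;
// when capacity is exceeded, the item at the back (least recently used) is evicted.
public class LruItemCache
{
    private readonly int _capacity;
    private readonly Action<object> _onEvicted;
    private readonly LinkedList<object> _order = new LinkedList<object>();
    private readonly Dictionary<object, LinkedListNode<object>> _nodes =
        new Dictionary<object, LinkedListNode<object>>();

    public LruItemCache(int capacity, Action<object> onEvicted)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _onEvicted = onEvicted;
    }

    public int Count => _nodes.Count;

    public bool Contains(object item) => _nodes.ContainsKey(item);

    // Records a request for an item. Returns true if the item was not cached yet,
    // i.e. it still needs to be extended.
    public bool Touch(object item)
    {
        if (_nodes.TryGetValue(item, out var node))
        {
            // Re-requested: elevate it to the front so it is evicted last.
            _order.Remove(node);
            _order.AddFirst(node);
            return false;
        }

        _nodes[item] = _order.AddFirst(item);
        if (_nodes.Count > _capacity)
        {
            var oldest = _order.Last;
            _order.RemoveLast();
            _nodes.Remove(oldest.Value);
            _onEvicted(oldest.Value); // e.g. sponsor.DeflateItem(oldest.Value)
        }
        return true;
    }
}
```

Keying the dictionary by the item itself, rather than by a string built from its hash code, means two distinct items that happen to share a hash code still get separate entries.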

Another riddle: How can we extend items in batches, instead of one by one, so that server access is efficient?

The answer to this riddle is a bit tricky: on the first call to GetItemAt, start a deferred operation by setting a flag and calling Dispatcher.BeginInvoke. Since the VirtualizingStackPanel calls GetItemAt for every item in view synchronously, during a single layout pass (one dispatcher operation), our deferred operation cannot run until that pass has finished. By then all the items requested in that pass have been collected, and we can extend them in one batch. The code snippet below demonstrates this technique.

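// Requires System, System.Collections.Generic, System.Windows.Threading and
// Microsoft.Practices.EnterpriseLibrary.Caching.
// Fields used below (declared elsewhere in the class):
//   _isDeferred     - true while a batch is scheduled
//   _deferredItems  - HashSet<object> of items requested in the current pass
//   _cache          - the Enterprise Library cache manager
//   _sponsor        - the source collection (IDataVirtualizationItemSponsor)

public override object GetItemAt(int index)
{
    if (!_isDeferred)
    {
        // First request in this layout pass: start a new batch and schedule
        // LoadDeferredItems to run after the current dispatcher operation.
        _deferredItems.Clear();
        Dispatcher.BeginInvoke(
           DispatcherPriority.Normal,
           (Action)LoadDeferredItems);
        _isDeferred = true;
    }

    var item = base.GetItemAt(index);
    if (!_deferredItems.Contains(item))
    {
        _deferredItems.Add(item);
    }
    return item;
}

private void LoadDeferredItems()
{
    try
    {
        var uniqueSet = new HashSet<object>();
        foreach (object item in _deferredItems)
        {
            // Note: keying by hash code means two distinct items with equal
            // hash codes would share one cache entry.
            var key = item.GetHashCode().ToString();
            if (!_cache.Contains(key))
            {
                // Not cached yet, so it still needs its full data.
                uniqueSet.Add(item);
            }

            // Add (or re-add) the item. This view is passed as the refresh
            // action, so the cache calls it back when the entry is removed.
            _cache.Add(
               key,
               item,
               CacheItemPriority.Normal,
               this);
        }

        _sponsor.ExtendItems(uniqueSet);
    }
    finally
    {
        // Reset even if ExtendItems throws, so later passes can start new batches.
        _isDeferred = false;
    }
}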

In the GetItemAt method above, we check whether a deferred operation is already active. If it is, we simply add the item to a HashSet<object> field called _deferredItems. If not, we start a deferred operation by setting a flag and using Dispatcher.BeginInvoke (with normal priority) to schedule the LoadDeferredItems method, and then add the first item to the same hash set.

As discussed, the LoadDeferredItems method runs only after the VirtualizingStackPanel has requested all the items in view. In LoadDeferredItems we add the new items to the cache, re-add the existing ones (so they stay fresh), and call the ExtendItems method with the distinct items that were not already in the cache.

Note that this solution uses the Enterprise Library 4.1 Caching Application Block for caching.
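
Because the snippet depends on the WPF Dispatcher and on Enterprise Library, the batching logic can be hard to see in isolation. The following console-friendly simulation is a hypothetical sketch: `FakeDispatcher` stands in for Dispatcher.BeginInvoke by queueing callbacks until the simulated layout pass ends, and a plain `HashSet<object>` stands in for the cache (it never evicts, to keep the example short). Requesting every visible index in one pass produces a single ExtendItems batch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the WPF dispatcher: BeginInvoke only queues the callback.
// RunPending executes queued callbacks; the caller invokes it after the simulated
// layout pass, mimicking the dispatcher returning to its queue.
public class FakeDispatcher
{
    private readonly Queue<Action> _queue = new Queue<Action>();
    public void BeginInvoke(Action action) => _queue.Enqueue(action);
    public void RunPending() { while (_queue.Count > 0) _queue.Dequeue()(); }
}

// Simulates the deferred-batching part of the overridden GetItemAt.
public class BatchingView
{
    private readonly IList<object> _source;
    private readonly FakeDispatcher _dispatcher;
    private readonly Action<List<object>> _extendItems; // e.g. sponsor.ExtendItems
    private readonly HashSet<object> _deferredItems = new HashSet<object>();
    private readonly HashSet<object> _cached = new HashSet<object>(); // stand-in cache
    private bool _isDeferred;

    public BatchingView(IList<object> source, FakeDispatcher dispatcher,
                        Action<List<object>> extendItems)
    {
        _source = source;
        _dispatcher = dispatcher;
        _extendItems = extendItems;
    }

    public object GetItemAt(int index)
    {
        if (!_isDeferred)
        {
            // First request of this pass: schedule one batch for later.
            _deferredItems.Clear();
            _dispatcher.BeginInvoke(LoadDeferredItems);
            _isDeferred = true;
        }
        var item = _source[index];
        _deferredItems.Add(item); // HashSet ignores duplicates
        return item;
    }

    private void LoadDeferredItems()
    {
        try
        {
            // Only items that were not cached yet need to be extended.
            var newItems = new List<object>();
            foreach (var item in _deferredItems)
            {
                if (_cached.Add(item)) newItems.Add(item);
            }
            if (newItems.Count > 0) _extendItems(newItems);
        }
        finally
        {
            _isDeferred = false;
        }
    }
}
```

For example, requesting indexes 0, 1, 2 and then 1 again (the selected item) before calling RunPending yields one batch of three items; a later pass over indexes 1 to 3 yields a batch containing only the newly visible item.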

Here are screen shots of the test application:

[Screenshots of the test application]

The test application illustrates a WPF Toolkit DataGrid with UI and Data Virtualization active, using a maximum of 100 items in the cache. The view below the DataGrid shows all extended items as green boxes.

[Screenshot of the filtered WPF ListView]

The screenshot above demonstrates a WPF ListView with UI and Data Virtualization active, filtered to display only the items with even IDs.

You can download the full solution from here.

Note that you have to install both the WPF Toolkit (June 2009) and Enterprise Library 4.1 to run this solution.

Please feel free to leave comments.


15 comments

  1. Bastien Chételat, October 5, 2009 at 04:31

    Hi Tom!

    Thanks for this amazing post ;-). Really really helpful.

    ++

  2. Tomer Shamam, October 5, 2009 at 23:25

    You’re welcome! I’m glad that I could help 🙂

    Tomer

  3. Tony, October 31, 2009 at 19:30

    Really a Great post!! But I think the scrolling performance is not good enough. Is there any solution to address this issue?

    Tony

  4. Tomer Shamam, October 31, 2009 at 21:35

    Hi Tony,

    If you’re talking about my demo app, I would advise you to remove the red-green rectangles view.

    Thanks!

  5. Tony, November 1, 2009 at 03:48

    I removed the red-green rectangles view. The scrolling performance is a little bit better, but compared with virtualization in the WinForms DataGridView, the performance is still not good. Is it a problem with the VirtualizingStackPanel in the .NET Framework?

    Tony

  6. Tomer Shamam, November 1, 2009 at 20:44

    IMHO this is simply the performance of the WPF Toolkit DataGrid. I hope it will be better in WPF 4.0.

    Try removing the visual lines, using deferred scrolling, etc.

    Tomer

  7. Ntx, November 24, 2009 at 10:55

    Great!
    Could item grouping be applied to the datagrid?

  8. Smruthi.., November 26, 2009 at 08:42

    Great article.

    One question: do you load all 1000 items into memory during initialization? When I debug the code, the DataVirtualizationCollectionView constructor actually has 1000 items loaded.

    What am I missing?

  9. Tomer Shamam, November 26, 2009 at 14:49

    I wonder what UI element you are using. Does it support UI Virtualization via the VirtualizingStackPanel, and is it enabled by default?

    For example, the Xceed DataGrid doesn’t support WPF UI Virtualization; it has its own implementation.

  10. Yossi Naar, December 3, 2009 at 15:41

    I am working on this problem at the moment for an app I am developing.
    First of all, I like your deferred action solution; very interesting.

    Perhaps I can suggest some improvements (IMO) to your methods.
    Instead of employing a cyclic cache that requires, well, the cache mechanism and some management, how about implementing a weak reference mechanism?
    It would look something like this:

    using System;

    public class EntityHolder<T> where T : class
    {
        private readonly WeakReference dataReference = new WeakReference(null);
        // Hypothetical loader delegate: the "method to get the data from the source".
        private readonly Func<T> loadData;

        public EntityHolder(Func<T> loadData)
        {
            this.loadData = loadData;
        }

        public T MyHugeData
        {
            get
            {
                T resultData = dataReference.Target as T;
                if (resultData == null)
                {
                    resultData = loadData();
                    dataReference.Target = resultData; // update the reference
                }
                return resultData;
            }
        }
    }

    (I coded it in the comment, so I’m not sure it compiles :)
    With this method you don’t need the cache, and you don’t have to manage anything.
    Once the object goes out of scope it will simply be garbage collected:
    you get an unmanaged cache that is the size of whatever the GC decides is right, and any object that is not held by the view is available for GC.
    You also don’t have to manage anything or extend any collection types.

  11. Tomer Shamam, December 3, 2009 at 20:04

    Hi Yossi,

    WeakReference is indeed another kind of cache strategy. It is good for several cases but not for all. You can use this mechanism for sure, but you don’t really have the power to control whether the object is relevant or not. In many cases you may decide to remove or keep objects in cache based on several strategies not necessarily based on amount of managed memory you’ve got.

    Thanks for sharing your suggestion.

  12. Yossi Naar, December 4, 2009 at 00:34

    Hi Tomer,

    The cache mechanism is a separate requirement from the data virtualization one.
    By using a WeakReference we essentially separate the independent requirements from the framework.
    In my particular project, I use NHibernate + ActiveRecord for data access, so I do not require an additional caching mechanism; any such mechanism would simply interfere with the other cache and require more management.

    I actually implemented a mechanism using your deferred loading method and the weak reference mechanism and tested it today; it works pretty well.
    I tested it with real data, on an SQLite DB with a data table of more than 100k records: loading the records is almost unnoticeable, and the memory footprint remains very small.

    I will be publishing an article about it soon.

  13. Tomer Shamam, December 4, 2009 at 09:09

    Hi Yossi,

    I agree that the cache mechanism is separate from the data virtualization, and I strongly suggest creating an abstraction with an interface similar to ICacheManager (we may call it ICacheAdapter), and then having one or more cache adapters: one may adapt the Enterprise Library, another a 3rd-party cache, and we may use a weak-reference-based cache as the default.

  14. Yossi Naar, December 4, 2009 at 11:26

    Hi Tomer,

    In the case of NHibernate, the caching mechanism works directly with the entities.
    NHibernate uses a second-level cache to store entities and will return your entity from the cache when you perform an HQL “select” statement.

    In this environment, the use of caching is completely transparent to the other layers.

    I published an article about it, using your technique for grouping DB calls.
    You can see it here:
    http://subjectively.blogspot.com/2009/12/weakreference-for-lazy-loading-and.html

  15. Ann Onym, January 29, 2010 at 15:57

    You may have a look at http://bea.stollnitz.com/blog/?p=344 for another take on data virtualization.