Explaining timeouts on Windows AppFabric Cache
Many of my customers have complained about performance degradation, timeout errors and other exceptions when using Windows AppFabric Cache.
When we dug into the logs we found three common Microsoft.ApplicationServer.Caching.DataCacheException errors:
- ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.
- ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.
- ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated.
To learn about the server's condition I ran the Get-CacheClusterHealth Windows PowerShell command, as described in the server Health Monitoring document.
To check the client side I ran the following command:
netstat -nat | find "22233" | wc -l
(The wc utility can be found here.)
This tells us how many connections the client is trying to establish. A large number (more than 50) indicates client network contention: the client is trying to open too many connections, and something is blocking it from establishing them.
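The same count can also be taken from inside the client process. Here is a minimal sketch, assuming the cache port is 22233 as in the netstat command; the class and method names are made up for illustration. Note that it matches only connections whose remote port is 22233, while find matches any netstat line that contains the text, so the two numbers may differ slightly.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;

// Hypothetical helper, not part of AppFabric.
public static class CachePortConnections
{
    // Port taken from the netstat command; adjust if your cluster uses another one.
    public const int CachePort = 22233;

    // Counts the endpoints whose port equals the given port.
    // Kept separate so it can be checked without touching the network.
    public static int CountByRemotePort(IEnumerable<IPEndPoint> remoteEndPoints, int port)
    {
        return remoteEndPoints.Count(ep => ep.Port == port);
    }

    // Takes a snapshot of this machine's TCP connections (in any state)
    // and counts those going to the given remote port.
    public static int CountCurrent(int port)
    {
        TcpConnectionInformation[] connections =
            IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();
        return CountByRemotePort(connections.Select(c => c.RemoteEndPoint), port);
    }
}
```

Calling CachePortConnections.CountCurrent(CachePortConnections.CachePort) periodically and logging the result should show whether the number climbs past the threshold mentioned above.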
We can also look at the WCF performance counters and check the number of connections.
To fix client network contention we need to adjust a few throttling settings:
AppFabric client config:
<dataCacheClient requestTimeout="15000" channelOpenTimeout="3000" maxConnectionsToServer="100"…
When the cache is used over an HTTP channel, for example in Azure Cache, ServicePointManager must be configured as well. In each client, make sure the following is called at startup:
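// ServicePointManager lives in System.Net; run these once at application start.
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
// SetTcpKeepAlive takes three arguments; the two intervals are ignored when keep-alive is disabled.
ServicePointManager.SetTcpKeepAlive(false, 0, 0);
ServicePointManager.DefaultConnectionLimit = 1000;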
With these settings the client should no longer be the bottleneck, which removes the contention and the timeouts it caused.
Enjoy
Manu