More than a year has passed, many blog post ideas are now a distant memory, but I finally made myself sit down and write about something I’ve been working on lately (well, like 3 months ago, but who’s counting anyway?).
I work at Riverbed on APM (application performance monitoring) software. We concentrate on UX, meaning we don’t just measure CPU time, memory consumption, or other resources; instead, we let our customers define their line-of-business activities (such as entering a bank account, getting insurance data, or even sending an e-mail from a Web interface), and we provide metrics for each activity (how much time it took, how much network traffic it generated, and so on). All this data goes into a dashboard where the IT department can get alerts about unusual performance degradation, and so on and so forth. But this is just the background. What I want to write about today is how we measure our own performance.
Aside from measuring performance (correctly), our most important task is to do no harm, or cause “negative value”, as we call it. The way we see it, if our product causes crashes or noticeable performance degradation in our clients’ software, that is worse than not having any measurements at all. And so we finally arrive at our main discussion today – who’s gonna profile the profiler? Well, we’re not exactly a profiler, but we monitor UX, so to measure our impact on the customer’s applications we need to measure UX with our product enabled and without it. But we’re the ones who know how to measure. So it seems we’re in a bit of a pickle here…
Part of our product is a Chrome extension which allows our customers to monitor “events” in their Web applications. These events cover both user interactions with the Website, such as clicks and keystrokes (don’t worry, we’re not a key logger, and in any case, the end users are very well aware of what’s going on), and visual changes in the Website (banners appearing, pages loading, etc.). This is in fact the feature I’ve been working on for at least 6 months now. We already had a Chrome extension before, but it mostly collected telemetry in the background – information about navigations and Web requests. Of course, as a rule of thumb, it’s always important to make sure your performance is good, but in the case of background Chrome extension activity it’s not critical. Even if something goes wrong in the background events, it won’t affect the Website’s UX. Well, normally it won’t. However, as we started developing the user interaction and DOM monitoring feature, it was clear to us that we absolutely must make sure we don’t impose any noticeable UX degradation on the monitored Websites.
What does “noticeable UX degradation” even mean? Oh man, we’ve spent HOURS discussing what and how we’re going to measure. I must admit, to this day, I’m not 100% sure our definition is good. Somehow, you need to define a measurable characteristic which coincides with a user’s feeling as to whether the site works well or not. I even asked my husband, who has way more Twitter followers than me, to ask about this:
So, how do folks writing browser extensions measure their perf impact on regular browsing? Asking for a friend @dinagozil 👩🏻💻 Please RT 🤓
— Sasha Goldshtein (@goldshtn) May 12, 2017
The result was very disappointing. The man has ~2500 followers and all I got was 5 likes, 4 retweets and 0 answers!
Anyway, after all the endless discussions with my boss, his boss, the QA gal, the QA guy, their boss, and other Chrome extension developers, we decided to divide the problem into two main categories:
- Load time
- “Real” time (i.e. the time when the user actually interacts with the Website)
Load time is easy to measure: we can use the PerformanceTiming API to get data about load times. What’s complicated is that we need to characterize the types of pages we measure. For example, does the size of the page affect the extension’s overhead? Does the element type? We know for sure that each frame loads the content script, so at the very least we need to measure different frame counts.
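For the curious, here’s roughly what reading load-time numbers looks like. In a page you would pass in `performance.timing` (the PerformanceTiming object mentioned above); the helper and the sample snapshot below are just an illustration, not our actual measurement code.

```javascript
// Sketch: deriving load-time metrics from a PerformanceTiming-style object.
// In a real page you would call loadMetrics(performance.timing).
function loadMetrics(t) {
  return {
    // Time from navigation start until the DOM was fully parsed
    domContentLoaded: t.domContentLoadedEventEnd - t.navigationStart,
    // Time until the load event finished (images, subframes, etc.)
    pageLoad: t.loadEventEnd - t.navigationStart,
  };
}

// Hypothetical timing snapshot (PerformanceTiming values are epoch milliseconds):
const sample = {
  navigationStart: 1000,
  domContentLoadedEventEnd: 1450,
  loadEventEnd: 1900,
};

console.log(loadMetrics(sample)); // { domContentLoaded: 450, pageLoad: 900 }
```

To compare runs with and without the extension, you would collect these metrics over many reloads of the same page in each configuration and compare the distributions, not single samples.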
While load time is at least a well-defined metric, “real” time overhead is an absolute mystery. We could measure the time our callbacks run, but does that represent the overall experience the user gets? Who knows? And say we increase the click time by 100%: if that 100% amounts to 10ms, then it’s probably not a problem, right? Does a 50ms delay in a mouse click feel the same as a 50ms delay in keyboard strokes?
And then there are a few general issues, which have to do with the test setup:
- Getting consistent results. If we use a real Website, it can take a different amount of time to load each time we measure, solely due to network conditions or even different dynamic content.
- Configuration effect. Our extension receives a configuration which specifies what it is that the IT guys want to monitor – which clicks, which keystrokes, which elements appearing or disappearing. These are defined by element type, by attributes, or even by CSS selectors. The size of the configuration and the complexity of the conditions will (obviously) affect the results. What’s a typical configuration size and complexity? The feature is still under development; we don’t have real user data…
- Automation. If we want reliable results, and we want to be able to repeat the benchmarks with each version, we must automate the process!
- Different browsers. I didn’t mention this until now, but our code actually runs in IE as well (we have our ways to get it there 😇). Ideally, our measurement technique would work for both browsers. [Luckily, we are only required to support IE11. Currently.]
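To make the configuration-effect point above concrete, here is a toy sketch of rule matching. The config shape (`tag`/`attrs`) is entirely invented for illustration; in a real page, CSS-selector rules would go through `Element.matches()` instead of plain objects.

```javascript
// Sketch: does a monitoring rule match a given element?
// Both the rule shape and the element shape here are hypothetical.
function matchesRule(rule, el) {
  // Element-type condition
  if (rule.tag && rule.tag !== el.tag) return false;
  // Attribute conditions: every required attribute must match exactly
  for (const [name, value] of Object.entries(rule.attrs || {})) {
    if (el.attrs[name] !== value) return false;
  }
  return true;
}

const rule = { tag: 'button', attrs: { 'data-action': 'submit' } };
const el = { tag: 'button', attrs: { 'data-action': 'submit', id: 'pay' } };
console.log(matchesRule(rule, el)); // true
```

Every monitored event has to be checked against every applicable rule, which is exactly why configuration size and condition complexity show up directly in the overhead.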
In summary, a big headache. All of this and more in the next parts… Stay tuned! In the meantime, I’d love to hear if you’ve encountered similar issues and how you decided to proceed.