The DZone Guide to Performance: Optimization and Monitoring
Volume III
In the 2017 DZone Guide to Performance: Optimization and Monitoring, we will be sharing with you the exciting work being done out there, and we have asked experts in the field to bring you the latest tools and techniques to help you break through those performance barriers and unleash the potential of your application. In this issue, we will explore the best practices for load testing in the cloud, look at several profiling tools to help tune JavaScript applications, and cover strategies for tuning and monitoring cloud-based systems, monitoring services, infrastructure, and APIs. We interviewed 12 executives and received over 470 responses on this year's survey. The interviewees and respondents alike shared their insights into the tools they prefer, the biggest areas for improvement, and what skills developers need to optimize performance and monitoring. They also provided information on common application performance issues, which are illustrated in our infographic, What Ails Your Applications, on page 24.

DZone is the knowledge-sharing company, and we are excited to bring you the latest volume of our Performance and Monitoring Guide. We hope you love it. Give it a read and let us know what you think.

BY JESSE DAVIS
CHIEF OPERATING OFFICER, DZONE, INC.

SPECIAL THANKS to our topic experts, Zone Leaders, trusted DZone Most Valuable Bloggers, and dedicated users for all their help and feedback in making this guide a great success.
Executive Summary
BY MATT WERNER, CONTENT AND COMMUNITY MANAGER, DZONE

…Eini and his team optimized the RavenDB database; check page 16.

PARALLEL PROGRAMMING HASN'T CAUGHT ON YET
DATA: The number of DZone users who design programs for parallel execution increased 1% over last year's survey, to 44%. Of the parallel execution design techniques, load balancing was the most used, at 68%. Multithreading is the most popular parallel programming model, at 72%.
Key Research Findings
BY G. RYAN SPAIN, PRODUCTION COORDINATOR, DZONE

471 respondents completed our 2017 Performance and Monitoring Survey. The demographics of the survey respondents include:

- 24% of respondents work at organizations with at least 10,000 employees; 18% work at organizations between 1,000 and 10,000; and 19% work at organizations between 100 and 1,000.
- 37% of respondents work at organizations in Europe, and 29% work at organizations in the US.
- Respondents had 15 years of experience as IT professionals on average; 29% had 20 years or more of experience.

STARTING OFF
54% of this year's survey respondents said they worry about application performance only after they have built application functionality, a response similar to the results of DZone's 2016 Performance and Monitoring survey. However, the frequency with which respondents claimed to experience certain application performance issues was positively impacted by building performance into the application first. For example, the most frequent area for performance issues in this year's survey was application code, with 35% of respondents saying they have frequent issues with this part of their technology stack. On average, respondents who said they build performance in from the beginning of their application were 30% likely to find frequent performance issues in their application code, as opposed to 38% of respondents who worry about performance after functionality. Likewise, those who said they generally considered application performance from the beginning were able to solve performance issues 35 hours faster, on average, than those who did not (187 hours compared to 222). Of course, focusing too much on performance from the outset of a project can lead to unnecessarily lengthy design and development times, but having an idea of how performance will fit into an application from the start can save headaches later on in the SDLC.
[Chart: What tools does your team commonly use to find root cause for application performance problems?]
KEEPING AN EYE OUT
The majority of respondents (64%) said they use between 1 and 4 performance monitoring tools. The most popular monitoring tools were Nagios, used by 33% of respondents' organizations, and LogStash, used by 27%. Both LogStash and Amazon's CloudWatch saw significant growth from last year's results, with LogStash growing 5% and CloudWatch growing 6% to 21%, making it this year's third most popular performance monitoring tool. Increased usage of monitoring tools decreased the estimated frequency of discovering performance issues through user support emails/social media or through dumb luck; respondents whose organizations use 3 or 4 monitoring tools were 7% less likely to find out about performance problems from users than those who used none (17% vs. 24%), and were 12% less likely to accidentally stumble upon performance issues through dumb luck (11% vs. 23%). The most popular types of monitoring were real user monitoring (34%) and business transaction monitoring (26%).

WHAT'S THE PROBLEM?
Much like last year, finding the root cause of an issue was found to be the most time-consuming part of fixing performance-related problems. 52% of respondents ranked this as the most time-consuming, followed by 25% of respondents who said collecting and interpreting various metrics took the most time. Respondents said they use a number of different tools in order to search for the root cause of performance issues. The most popular of these methods included application logs (89%), database logs (69%), profilers (61%), and debuggers (60%). Individually, none of these tools had an impact on how time-consuming respondents found root cause discovery; however, respondents using more of these tools together were increasingly less likely to find root cause discovery time-consuming, until peaking at 6 tools (because of a sample size of less than 1%, responses showing 0 tools used were not considered in this analysis).

SPLITTING THE LOAD
The usage of parallel execution in application design has not taken off much since last year. 44% of respondents this year said they regularly design programs for parallel execution, only 1% higher than last year. The tools and methods for parallel design haven't changed much either; like last year, the ExecutorService framework in Java is the most frequently used framework/API among respondents, with 50% of those who design for parallel execution regularly using this framework often. Also, load balancing is again the most popular parallel algorithm design technique, with 68% of parallel execution designers using it often. And multithreading is at the top of the list for parallel programming models, with 72% of this subset of respondents using multithreading often. The choice to design for parallel execution in an application can be affected by multiple factors. For instance, the type of application being designed may increase the need for parallel execution; respondents who said they build embedded services or high-risk software (i.e. software in which failure could lead to significant financial loss or loss of life) were much more likely to regularly design for parallel execution, with over half of these respondents (54% each) answering this question positively.

[Chart: Which of the following performance tests and/or monitoring types does your organization use? Options included website speed tests, real user monitoring, synthetic monitoring, stress tests, load tests, log management/analysis, smoke tests, and bottom-up monitoring.]

[Chart: 54% of respondents build application functionality first, then worry about performance; 46% build performance into the application from the start.]
PERFORMANCE
you arent sure how frequently you need to test, make sure
you invest in tools that support flexible, consumable licensing
Performance Testing models until you establish your necessary cadence. This will
keep your team from overinvesting in performance and save
in an Agile World
your constrained budget for more pressing issues.
LoadRunner, Performance Center and StormRunner Load- a comprehensive product suite for
testing any application, scenario, and protocol. Because performance matters.
APIs are taking the world by storm. As you strive to deliver world-class web, mobile, and SaaS applications, make sure that the APIs that power them are running smoothly.

…is critical to ensuring that the web and mobile applications that depend on them are meeting the increasingly high expectations of users. Some highlights of that survey include the following:

The top two measures of success identified by API producers are:
- Performance
- Uptime/availability

The top two barriers to solving API issues are:
- Determining root cause
- Isolating the API as the cause of the issue

Looking at the results from these three questions, it is clear that performance and uptime/reliability are paramount to both producers and consumers. Adding to the clear need for performant and reliable APIs, ease of issue diagnosis also emerged as a key trend. Knowing exactly which API endpoint is failing within a chain of successive calls is a clear operational need.

Another key takeaway of the survey is that both API consumers and producers are in alignment on the key measures of success for API delivery. Since it appears that all involved agree that the need for deep API monitoring is obvious, what are the key attributes of a successful API monitoring approach?

API monitors should be able to be easily created by reusing your existing test cases as the foundation for your monitors. In addition to freeing up development and operational resources from monitor creation duties, leveraging existing test cases provides the added benefit of allowing monitoring to become a key part of each version of the application as it progresses through the development lifecycle. With increased use of agile development, continuous integration, and delivery methodologies, pre-production monitoring makes more sense than ever.

In addition to the above-mentioned API-specific attributes, a monitoring solution will not be successful without robust scheduling, playback, and alerting capabilities. You will only reap the benefits of proactive monitoring if you are getting accurate alerts and results in real time.

Another important consideration when monitoring your APIs is location. Whether around the globe or behind your firewall, it is important to monitor where your users or API consumers are. Increasingly, more customers are concerned about users spread across both a mix of physical geographic locations and internal points of presence inside your infrastructure. These internal locations can be physical data centers, private cloud deployments, various office locations, or even vendor and customer locations.
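One way to picture the "reuse your test cases" advice: the same assertion can serve as both a CI test and the body of a scheduled monitor. Below is a minimal TypeScript sketch; the endpoint, payload shape, interval, and sendAlert hook are hypothetical stand-ins for whatever your monitoring platform provides.

// A test-style check against a (hypothetical) endpoint. In CI it fails
// the build; wrapped below, the identical code becomes a monitor.
async function checkOrdersEndpoint(): Promise<void> {
  const res = await fetch("https://api.example.com/v1/orders?limit=1");
  if (res.status !== 200) throw new Error(`unexpected status ${res.status}`);
  const body = await res.json();
  if (!Array.isArray(body.orders)) throw new Error("malformed payload");
}

// Stand-in alerting hook; wire this to your paging/alerting system.
function sendAlert(message: string): void {
  console.error(message);
}

// Scheduling plus alerting: the same check, run on an interval,
// alerting on failure instead of failing a test run.
function monitor(check: () => Promise<void>, intervalMs = 60_000): void {
  setInterval(async () => {
    try {
      await check();
    } catch (err) {
      sendAlert(`API monitor failed: ${err}`);
    }
  }, intervalMs);
}

monitor(checkOrdersEndpoint);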
Unlearn You Must Do: 5 Tips for Devs to Relearn Monitoring of Code in the Cloud Era

Yoda understood that change is constant. Dev teams of modern cloud applications realize the same. Yesterday's static approaches to knowing code and services behavior must be unlearned. …and SREs know of anomalies earlier, without exhausting alert noise. Three, they embrace a culture of sharing dashboards, thus accelerating collaborative learning for everyone. Four, they use metrics and analytics that go beyond reactive methods to proactive methods, allowing them to validate code optimizations and fix things well before customers are impacted. Five, they plan for growth and choose a monitoring platform that will scale, knowing demand for code metrics and analytics will only expand across teams.

Wavefront gives you continuous metrics, analytics, and alerts at enterprise scale for applications, services, cloud, containers, and infrastructure. EVERYTHING.
Real World JavaScript Benchmarking

QUICK VIEW
…the future of JavaScript: beyond just peak scripting performance.
…to the browser to avoid the transpiler overhead.
This drove JavaScript peak performance to incredible heights in the last two years, but at the same time, we neglected other aspects of performance, like page load time, and we noticed that it became ever more difficult for developers to stay on the fine line of great performance. In addition to that, despite all of these resources dedicated to performance, the user experience on the web seemed to get worse over time, especially page load time on low-end devices.

This was a strong indicator that our benchmarks were no longer a reasonable proxy for the modern web, but rather turned into a caricature of reality. Looking at Google's Octane benchmark, we see that it spends over 70% of the overall execution time running JavaScript code.

On real websites, by contrast, the time spent executing JavaScript is roughly 20% on average, but more than 40% of the time is spent in just parsing, IC (inline cache) Miss, and V8 C++ (the latter of which represent the subsystems necessary to support the actual JavaScript execution, and the slow paths for certain operations that are not optimized in V8). Optimizing for Octane might not provide a lot of benefit for the web. In fact, parsing and compiling large chunks of JavaScript is one of the main problems for the startup of many web pages nowadays, and Octane is a really bad proxy for that.

There's another benchmark suite, named Speedometer, that was created by Apple in 2014, which shows a profile that is closer to what actual web pages look like. The benchmark consists of the popular TodoMVC application implemented in various web frameworks (i.e. React, Ember, and AngularJS).
Speedometer is much closer to what real web pages look like, yet it's still not perfect: it doesn't take parse time into account for the score, and it creates 100 todos within a few milliseconds, which is not how a user usually interacts with a web page. V8's strategy for measuring performance improvements and identifying bottlenecks thus changed from using mostly traditional JavaScript benchmark methods toward using browser benchmarks like Speedometer, and also tracking the real-world performance of web pages.

What's interesting to developers in light of these findings is that the traditional way of deciding whether to use a certain language feature, by putting it into some kind of benchmark and running it locally or via some system like jsperf.com, might not be ideal for measuring real-world performance. When following this route, it's possible for the developer to fall into the microbenchmark trap and observe mostly the raw JavaScript execution speedup, without seeing the real overhead accumulated by the other subsystems of the JavaScript engine (i.e. parsing, inline caches, slow paths triggered by other parts of the application, etc.) that negatively affect a web page's performance. At Chrome, we have been making a lot of the tooling that supported our findings available to developers via the Chrome Developer Tools, as well as via chrome://tracing, which offers a view into the less obvious places. V8 has a step-by-step guide on how to use this. For most use cases, though, I'd recommend sticking to the Developer Tools, because they offer a more familiar interface and don't expose an overwhelming amount of the Chrome/V8 internals. But for advanced developers, chrome://tracing might be the swiss army knife that they were looking for.
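To make the trap concrete, here is the kind of jsperf-style harness the article warns about, as an illustrative TypeScript sketch (the two snippets being compared are arbitrary). It times only hot, repeated execution of already-parsed, already-optimized code, so parse and compile time, inline-cache misses caused by other parts of the app, and engine slow paths never show up in the score.

// A jsperf-style microbenchmark: run one snippet in a hot loop.
function micro(name: string, fn: () => void, iterations = 1_000_000): void {
  fn(); // warm-up, letting the engine optimize before measuring
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsed = performance.now() - start;
  console.log(`${name}: ${Math.round((iterations / elapsed) * 1000)} ops/sec`);
}

// Whatever wins here says little about page load: the score ignores
// how long the script took to parse and compile in the first place.
micro("spread", () => { const copy = [...[1, 2, 3]]; });
micro("slice", () => { const copy = [1, 2, 3].slice(); });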
Looking at the web today, we've discovered that it is important to significantly reduce the amount of JavaScript that is shipped to the browser, as we live in a world where more and more users consume the web via mobile devices that are a lot less powerful than a desktop computer and might not even have 3G connectivity.

One key observation is that most web developers already use ECMAScript 2015 or later for their daily coding, but for backwards compatibility compile all their programs to traditional ECMAScript 5 with so-called transpilers, like Babel. This can have an unexpected impact on the performance of your application, because often the transpilers are not tuned to generate high-performance code; thus, the final code that is shipped might be less efficient than the original code. But there's also the increase in code size due to transpilers: the generated code is usually 200-1000% the size of the original code, which means the JavaScript engine has up to 10 times the work in parsing, compiling, and executing your code.
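As a rough illustration of that inflation, compare a one-line piece of ES2015 with the kind of ES5 a transpiler typically emits for it. This is a hand-written approximation, not actual Babel output, and the blow-up here is mild; class, async/await, or generator syntax expands far more.

// ES2015 source (what the developer writes):
const double = (xs: number[]) => xs.map(x => x * 2);

// Roughly the ES5 shape a transpiler ships instead: more tokens for the
// engine to parse, compile, and execute, for exactly the same behavior.
var doubleES5 = function (xs: number[]) {
  return xs.map(function (x) {
    return x * 2;
  });
};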
Web Monitoring

…Basic web monitoring won't provide any insights into how this complex network performs and affects your end-user experience. Not having this visibility will prevent you from delivering the amazing customer experiences that drive…

WRITTEN BY DENNIS CALLAGHAN, DIRECTOR OF INDUSTRY INNOVATION, CATCHPOINT

Catchpoint: Work smarter. Act faster. Deliver Better.
Writing 600 Times Faster

QUICK VIEW
01 By changing our data access patterns to match what the hardware is capable of, and optimizing for that, we were able to use the same overall architecture but get a performance boost of 600 times for RavenDB.
02 We utilized transaction merging and compression of the journal output to perform a large number of operations in a single transaction, but only write a fraction of the size of the data that we manipulate to the journal.
In my day job, I'm building RavenDB, a NoSQL document database. It is a very cool product, but not the subject of today's article. Instead, I'd like to talk about how we made RavenDB very fast, the techniques and processes we used to get there, as well as the supporting infrastructure.

During the design process for the next release of RavenDB, we set ourselves a pretty crazy goal. We wanted to get a tenfold performance improvement across the board. It's one thing to say that, but a lot harder to do. We've had to work quite hard to get there, re-working pretty much every layer in the system to make it better.

Here, we'll talk about how we improved the write speed of RavenDB (and more particularly, Voron, which is the low-level storage engine we wrote) by a tremendous margin, while maintaining the same transaction safety properties. This was neither easy nor fast, but it was a very rewarding journey.

I chose to write about the write speed improvements because it was a challenging task that doesn't respond very well to standard optimization techniques. You can't just throw more cores at the problem, and while you can buy faster hardware, at some point you'll reach the limit.

Before we get to the optimizations we implemented, let me explain what the problem is.

RavenDB is an ACID-compliant database, meaning that once a transaction has been completed, it should remain in the database even in the face of a failure. In other words, if the database reported a successful transaction, pulling the plug from the machine immediately afterward would not impact the data. Once we restart, all the data that we committed is still there and still valid. At the same time, a transaction that was midway through (not committed to the database) is not going to be there at all.

That is a pretty standard requirement for databases, and most people don't give it a lot of thought, but, as it turns out, this is incredibly hard to do in a performant manner. The underlying reason is the hardware. We cannot buffer changes in memory, but instead must physically write those changes to disk. This means we have to flush all of our changes after every transaction, and that is slow.

In the graph at the top of the next page, you can see the results of running the following code on a variety of disks and using various sizes of buffers.
const int count = 1000;
var buffer = new byte[4096]; // payload written per transaction; the tests varied this size
Stopwatch sp;

using (var jrnl = File.Create("0001.journal"))
{
    sp = Stopwatch.StartNew();
    for (int i = 0; i < count; i++)
    {
        jrnl.Write(buffer, 0, buffer.Length);
        jrnl.Flush(flushToDisk: true); // fsync: force the write to stable storage
    }
    sp.Stop();
}
Those two optimizations (transaction merging and compressing the journal output) have managed to dramatically improve the performance of our system, but there was still a lot of additional work to be done. At that point, we pulled a profiler and started going over the code, finding hotspots and improving them one by one.

The really nice thing about such work is that it is cumulative. That is, you improve 2% here and 0.5% there, then suddenly it is like releasing the flood gates and you have a 5% increase in performance. A small increase in…

…that the disk is actually idle for a non-trivial portion of the time, a crime in high-performance circles. Instead, we made the transaction commit an asynchronous process. Whenever a transaction is completed, it will start a task to compress its data and write the data to disk. At the same time, we can already start the next transaction, which gives us much more concurrency. In fact, as long as we are waiting for the previous transaction to complete writing to the disk, we can continue accepting more work into the new transaction.
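The pipelining described above can be sketched in a few lines. This is an illustrative TypeScript outline of the idea, not RavenDB's actual C# implementation; compress and writeAndSync are stand-ins for the real compression and durable-write (write plus fsync) primitives.

// Stand-ins for the real compression and durable-write primitives.
async function compress(data: Uint8Array): Promise<Uint8Array> { return data; }
async function writeAndSync(data: Uint8Array): Promise<void> { /* write + fsync */ }

class JournalPipeline {
  private tail: Promise<void> = Promise.resolve();

  // Queue the committed transaction's journal write and return at once,
  // so the next transaction can start (and keep accepting work) while
  // the previous one is still being compressed and flushed to disk.
  commit(txData: Uint8Array): Promise<void> {
    this.tail = this.tail.then(() => compress(txData)).then(writeAndSync);
    return this.tail;
  }
}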
Metric or Log Analytics Checklist

To meet critical SLAs and maintain reliability, modern digital enterprises running applications in the cloud must measure the performance of their revenue-generating essential services, distributed applications, and infrastructures. For developers, DevOps, and TechOps engineers, it can be confusing to know when to use metrics or log monitoring to isolate code performance anomalies and to proactively monitor and baseline their scaled-out, dynamic, and distributed applications.

Metrics describe numeric measurements in time. The metric format includes the measured metric name, the metric data value, the timestamp, the metric source, and an optional tag. Metrics convey small information bits, much lighter than logs. Logs, unlike metrics, contain textual information about an event that occurred. Logs are meant to convey detailed information about the application, user, or system activity. The primary purpose of logs is troubleshooting a specific issue after the fact, e.g., a code error, exception, security issue, or other. This checklist will help you select the right approach for your environment.

BY STELA UDOVICIC, SR. DIRECTOR, PRODUCT MARKETING, WAVEFRONT
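To make the two shapes concrete, here is an illustrative TypeScript sketch; the field names follow the description above, not any particular vendor's wire format.

// A metric: one small numeric measurement at a point in time.
interface Metric {
  name: string;                  // what was measured
  value: number;                 // the measurement itself
  timestamp: number;             // when it was taken (epoch millis)
  source: string;                // which host or service emitted it
  tags?: Record<string, string>; // optional dimensions
}

const sample: Metric = {
  name: "checkout.latency.ms",
  value: 182,
  timestamp: Date.now(),
  source: "web-03",
  tags: { region: "us-east-1" },
};

// A log line: free-form text about a single event; heavier than a metric,
// but rich in the detail needed for after-the-fact troubleshooting.
const logLine =
  "2017-04-02T10:31:07Z web-03 ERROR checkout: payment gateway timeout after 30s (order=18412)";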
Use metric analytics if you:
- Need to continuously measure and get split-second insights from your cloud application code performance, business KPIs, and infrastructure metrics at high scale. The almost instant insights are essential for digital businesses generating revenue from customer-facing applications.
- Use messaging pipelines for your application monitoring data, including Kafka or others.
- Work for an organization that has many developers that need to collaborate and share metrics analysis and dashboards (such as self-service analytics for engineering teams).
- Need to apply complex processing on your code performance measurements or business KPI data, such as using aggregates, histograms (distributions), and other mathematical transformations.

Use metric and log analytics if you:
- Need to process both continuous metric data events and logs. Metrics analytics helps you get the first pane of glass across the entire application stack. Then use log monitoring to deep-dive into a specific issue to investigate the root cause after an issue happened.

Use log analytics if you:
- Need to analyze only unstructured, text-based data from your applications and infrastructure.
- Can afford application performance data under-sampling and coarser monitoring.
- Don't need to develop and don't need to run highly distributed applications that require high scalability.
- Are developing monolithic applications that typically do not require frequent code updates requiring continuous monitoring.
- Are not concerned with slower processing of your application performance data, such as in batch-like processing.
…Scale Outages?

OUTAGES EXPOSE CRITICAL VULNERABILITIES
It's a time for reflection in the tech community, after huge numbers of popular and critical applications were rocked by the recent AWS S3 and Dyn DNS outages.

What can we learn from them? It's true that widely impactful outages target specific vulnerabilities, like the lack of redundancy and overdependence on AWS S3 and Dyn, but these outages also expose and publicize those same vulnerabilities.

Learn from each major outage (even if you weren't affected) and adjust your network architecture and monitoring strategies accordingly. Dealing with outages then becomes a process of incremental fortification. Your networks will be strengthened by each outage, and history won't repeat itself.

…a monitoring solution with:
- Visibility across the network and application stacks, including web, network, routing, and device layers, so you can correlate data and understand root cause
- The ability to share data with providers, team members, and affected users

With the right solutions in place, you'll have a bird's-eye view of critical applications and the networks that deliver them. You'll be able to rapidly deduce the root cause of issues, keep your providers accountable with actionable data, and be equipped with the knowledge to reinforce your environment against future events.

WRITTEN BY YOUNG XU, PRODUCT MARKETING ANALYST, THOUSANDEYES
Understanding and Monitoring Cloud Applications

QUICK VIEW
01 Multiple regions can be used to comprise a single offering, multiple services are combined to provide another cloud product, or both. There are complex dependencies built into almost every cloud service on the market today.
02 …the dependencies and foundational structures of these services.
It's been a nerve-wracking few months for teams managing cloud applications. In October 2016, a DDoS impaired Dyn's DNS services for hours, rendering unavailable myriad sites and services across the Internet. And in an unrelated, but similarly impactful event, the outage of AWS S3 at the end of February 2017 caused widespread and unpredictable collateral damage. With more applications leveraging more services hosted in just a few infrastructure environments, how can we make sense of application dependencies? How can we adopt a monitoring strategy that clearly accounts for the risks of improbable but hugely catastrophic service disruptions?

We'll dig into how you can identify and manage cloud dependencies by:
- Understanding underlying cloud architectures and failure scenarios
- Getting a handle on the API connections in your app and in customer interactions
- Developing a comprehensive monitoring strategy based on these requirements

MAKING SENSE OF IAAS ARCHITECTURES
Public cloud environments are a popular and powerful way to gain access to advanced services that would be costly to build or maintain on your own. But these services, from firewalls to DDoS mitigation to globe-spanning databases to data streaming platforms, are themselves composed of many other services. Like a digital matryoshka doll, it can be hard to know just how many layers and dependencies are bound up inside. In the case of the AWS S3 outage, many operations teams were surprised at how many different AWS offerings failed. They had not appreciated, and AWS had not communicated, just how interdependent various services were.

In your own data center, the failure of your entire file storage system would have a dramatic impact. In the cloud, it is the same story. The oldest services are building blocks from which other services are built (as represented in the AWS logo), and are foundational and critical to almost all other services. Basic compute (AWS EC2), storage (AWS S3), and networking (underpinning it all) are critical services that you should be monitoring and evaluating for failure scenarios. The same goes for Microsoft Azure (VMs, Blob Storage) and Google Cloud (Compute Engine, Cloud Storage). If you use cloud services that depend on these foundational elements, make sure they are part of your monitoring strategy.

Developing for the cloud also requires an understanding of failure isolation. AWS is built around the concept of regions, with previous outages typically corresponding to a single region. Unfortunately, many developers don't invest (sometimes wisely, sometimes naively) in cross-region failover strategies. So when US-East-1, the first and largest of the AWS regions, has an issue, the impact is unmistakable. Some services, like Google Spanner, have different isolation mechanisms that need to be evaluated.
When it comes to architecture planning, performance monitoring, and optimization, you'll want to monitor each potential failure domain. So if you are using cloud services in 4 different regions, make sure that you are collecting…

Your applications depend on the specialized functionality of third-party applications, typically accessed via APIs. Don't think you rely on APIs for critical capabilities? Think again. APIs are very common in modern applications, hiding in plain view a complex set of dependencies. Some of these external services are important for just small portions of functionality. But many impact customer experience and revenue generation in fundamental ways.

What kind of APIs should you be monitoring? The specific APIs will be unique to your application, but some examples include:
- User authentication is accomplished with single sign-on APIs and services to detect fraud or abuse.
- Pricing and merchandising require the complex integration of many back-end applications to show an accurate price to a customer.

…
- Track trends over time to understand services that fail under your application load.

2. Actively monitor API servers and infrastructure
- Your cloud provider typically has canary servers or endpoints (here is the list for AWS) they can point you to.

Taken together, these two approaches will give you an understanding of baseline performance and specific issues as they occur. As a bonus, tying both of these methods together with a correlation engine such as Splunk can be an effective way to make sense of seemingly disparate events that are actually all related.
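A minimal sketch of such active monitoring, in TypeScript; the two URLs are placeholders (your provider's published canary endpoints and your own health checks would go here), and record is a stand-in for whatever metrics store or correlation engine you feed.

// Endpoints to probe; substitute real canary and health-check URLs.
const endpoints = [
  "https://canary.example-cloud.com/ping", // hypothetical provider canary
  "https://api.example.com/health",        // your own service's health check
];

// Stand-in sink; in practice, feed a metrics store or correlation engine.
function record(url: string, status: number, latencyMs: number): void {
  console.log(`${url} status=${status} latency=${latencyMs}ms`);
}

async function probe(url: string): Promise<void> {
  const start = Date.now();
  try {
    const res = await fetch(url);
    record(url, res.status, Date.now() - start);
  } catch {
    record(url, 0, Date.now() - start); // unreachable: record status 0
  }
}

// Probe every 30 seconds, ideally from every location your users are in.
setInterval(() => endpoints.forEach(probe), 30_000);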
FOUR STEPS TO TACKLING DEPENDENCIES
Cloud-based applications, and the business models that they support, rely on an increasingly diverse set of underlying services, tied together through APIs. The availability and efficacy of APIs and infrastructure services has, therefore, become a key element in monitoring and optimizing cloud applications.
WHAT AILS YOUR APPLICATIONS

APPLICATION CODE (THE FLU): 36% of respondents encountered frequent performance issues with their code, while 47% had some issues. Those using their language's built-in tooling and thread dump analyzers were 4% less likely to find such issues to be challenging than those who did not, while developers using debuggers were 5% less likely.

DATABASE (STREP THROAT): Frequent database issues plagued 24% of survey respondents. Those who build performance into their applications throughout the SDLC are 5% more likely to have no database issues than those who build application functionality first and worry about performance later.

WORKLOAD (BROKEN BONES): 16% of respondents encountered frequent workload issues, with 12% finding such issues to be challenging. Developers using APM tools were 5% more likely to solve workload issues easily compared to those who do not.

MEMORY (MIGRAINES): 15% of respondents were having problems with application memory. Those who build performance into their applications from the start see an 8% decrease in frequent memory issues compared to those who do not.

NETWORK (COMMON COLD): Frequent network issues affected 14% of survey respondents, with 10% finding such problems to be challenging. Those who build performance into their applications from the start of development see a 5% decrease in frequent network issues compared to those who worry about performance later.
Performance Testing

QUICK VIEW
…the cost of large-scale tests.
03 In Agile development, performance testing should be interwoven throughout the SDLC, not an independent step.
…validation in a test lab using a record-playback load testing tool is no longer enough.

CLOUD
Cloud practically eliminated the lack of appropriate hardware as a reason for not doing load testing, while also significantly decreasing the cost of large-scale tests. Cloud and cloud services significantly increased the number of options to configure the system under test and load generators. There are some advantages and disadvantages to each option, and depending on the specific goals and the systems to test, one deployment model may be preferred over another.

For example, to see the effect of a performance improvement (performance optimization), using an isolated lab environment may be a better option for detecting even small variations introduced by a change. For load testing the whole production environment end-to-end, to make sure the system will handle the load without any major issue, testing from the cloud or a service may be more appropriate. To create a production-like test environment without going bankrupt, moving everything to the cloud for periodic performance testing may be your best solution.

When conducting comprehensive performance testing, you'll probably need to combine several approaches. For example, you might use lab testing for performance optimization to get reproducible results, and distributed, realistic outside testing to check real-life issues you can't simulate in the lab.

…a major shift left, allowing us to start testing early. Theoretically, it should be rather straightforward: every iteration you have a working system and know exactly where you stand with the system's performance. From the agile development side, the problem is that, unfortunately, it doesn't always work this way in practice. So, such notions as hardening iterations and technical debt get introduced. From the performance testing side, the problem is that if we need to test the product each iteration or build, the volume of work skyrockets.

Recommended remedies usually involve automation and making performance everyone's job. Automation here means not only using tools (in performance testing, we almost always use tools), but automating the whole process, including setting up the environment, running tests, and reporting/analyzing results. Historically, performance test automation was almost non-existent, as it's much more difficult than functional testing automation, for example. Setups are more complicated, results are complex (not just pass/fail) and not easily comparable, and changing interfaces are a major challenge, especially when recording is used to create scripts.

While automation will take a significant role in the future, it only addresses one side of the challenge. Another side of the agile challenge is usually left unmentioned. The blessing of agile development, early testing, requires another mindset and another set of skills and tools. Performance testing of new systems is agile and exploratory in itself. Automation, together with further involvement of development, offloads performance engineers from routine tasks. But testing early, the biggest benefit being that it identifies problems early when the cost of fixing them is low, does require research and analysis; it is not a routine activity and can't be easily formalized.
CONTINUOUS INTEGRATION
Performance testing shouldn't just be an independent step of the software development life cycle where testers get the system shortly before release. In agile development/DevOps environments, it should be interwoven with the whole development process. There are no easy answers here to fit every situation. While agile development/DevOps is becoming more and more mainstream, their integration with performance testing is just making its first steps.

What makes agile projects really different is the need to run a large number of tests repeatedly, resulting in the need for tools to support performance testing automation. The situation started to change recently as agile support became the main theme in load testing tools. Several tools recently announced integration with Continuous Integration servers (such as Jenkins and Hudson). While initial integration may be minimal, it is definitely an important step toward real automation support.

It doesn't look like we'll have standard solutions here, as agile and DevOps approaches differ significantly, and proper integration of performance testing can't be done without considering such factors as development and deployment processes, system, workload, and the ability to automate the gathering and analysis of results.

NEW ARCHITECTURES
Cloud seriously impacts system architectures, having a lot of performance-related consequences.

First, we have a shift to centrally managed systems. Software as a Service (SaaS) are basically centrally managed systems with multiple tenants/instances.

Second, to get the full advantage of cloud, such cloud-specific features as auto-scaling should be implemented. Auto-scaling is often presented as a panacea for performance problems, but, even if it is properly implemented, it just assigns a price tag to performance. It will allocate resources automatically, but you need to pay for them. Any performance improvement results in immediate savings.

Another major trend involves using multiple third-party components and services, which may not be easy to properly incorporate into testing. The answer to this challenge is service virtualization, which allows one to simulate real services during testing without actual access.

Cloud and virtualization triggered the appearance of dynamic, auto-scaling architectures, which significantly impact collecting and analyzing feedback. With dynamic architectures, we have a great challenge ahead of us: to discover configuration automatically, collect all necessary information, and then properly map the collected information and results to a changing configuration in a way that highlights existing and potential issues, and potentially, to make automatic adjustments to avoid them. This would require very sophisticated algorithms and sophisticated Application Performance Management systems.

NEW TECHNOLOGIES
New technologies may require other ways to generate load. Quite often, the whole area of load testing is reduced to pre-production testing using protocol-level recording/playback. Sometimes, it even leads to conclusions like "performance testing hitting the wall" just because load generation may be a challenge.

While protocol-level recording/playback was (and still is) the mainstream approach to testing applications, it is definitely just one type of load testing using only one type of load generation; such equivalency is a serious conceptual mistake, dwarfing load testing and undermining performance engineering in general.

Protocol-level recording/playback is the mainstream approach to load testing: recording communication between two tiers of the system and playing back the automatically created script (usually, of course, after proper correlation and parameterization). As long as no client-side activities are involved, it allows the simulation of a large number of users. But such a tool can only be used if it supports the specific protocol used for communication between two tiers of the system. If it doesn't, or if it is too complicated, other approaches can be used.

UI-level recording/playback has been available for a long time, but it is much more viable now. New UI-level tools for browsers, such as Selenium, have extended the possibilities of the UI-level approach, allowing the running of multiple browsers per machine (limiting scalability only to the resources available to run browsers). Moreover, UI-less browsers, such as HtmlUnit or PhantomJS, require significantly fewer resources than real browsers.

Programming is another option when recording can't be used at all, or when it can, but with great difficulty. In such cases, API calls from the script may be an option. Often, this is the only option for component performance testing. Other variations of this approach are web services scripting or the use of unit testing scripts for load testing. And, of course, there is a need to sequence and parameterize your API calls to represent a meaningful workload. The script is created in whatever way is appropriate, and then either a test harness is created or a load testing tool is used to execute scripts, coordinate their executions, and report and analyze results.
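A toy version of that programming option, sketched in TypeScript (the URL and load shape are invented): sequence and parameterize API calls, run a number of concurrent virtual users, and collect timings for later analysis.

// One scripted "virtual user": a parameterized sequence of API calls.
async function virtualUser(userId: number, timings: number[]): Promise<void> {
  for (let i = 0; i < 10; i++) {
    const start = Date.now();
    await fetch(`https://api.example.com/search?q=item-${userId}-${i}`);
    timings.push(Date.now() - start);
  }
}

// Drive 50 users concurrently. A real harness would also handle ramp-up,
// think time between calls, and percentile reporting, not just a mean.
async function runLoadTest(users = 50): Promise<void> {
  const timings: number[] = [];
  await Promise.all(
    Array.from({ length: users }, (_, id) => virtualUser(id, timings))
  );
  const mean = timings.reduce((a, b) => a + b, 0) / timings.length;
  console.log(`${timings.length} calls, mean latency ${mean.toFixed(1)} ms`);
}

runLoadTest();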
SUMMARY
Performance testing should reinvent itself to become a flexible, context- and business-driven discipline. It is not that we just need to find a new recipe; now, we need to be able to adjust on the fly to every specific situation in order to remain relevant.

ALEX PODELKO has specialized in performance since 1997, working as a performance engineer and architect for several companies. Currently he is a Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His collection of performance-related links and documents (including his recent papers and presentations) can be found at alexanderpodelko.com. He blogs at alexanderpodelko.com/blog and can be found on Twitter as @apodelko. Alex currently serves as a director for the Computer Measurement Group (CMG, cmg.org), an organization of performance and capacity planning professionals.
SPONSORED OPINION
A study by research firm IHS Markit released last year reported that information and communication technology (ICT) downtime is costing North American organizations $700 billion per year. That's a monumental amount of lost revenue and lost productivity, not to mention the negative impacts on company branding. And it's completely unnecessary. With web performance monitoring, one can ensure that end users are interacting with a website or web application as expected.

Web performance monitoring is a critical means to avert the negative consequences of unpredictable downtime and ensure proper functioning of the entire IT infrastructure. A contributor to the success of any business, web monitoring is still often viewed as an optional "nice to have" feature. This way of thinking misses the point. The same rationale for having an elegant, mobile-friendly website must also drive the decision to establish a clear, business-focused web monitoring strategy.

A website is the digital front for a product or service; nowadays it's not an online catalogue anymore but a mandatory business-generating engine.

Lots of time and money have been spent in building your website. Wouldn't you want to take the last step to ensure it is protected and functioning properly?

It's always preferable to be notified of any unexpected downtime events in real time, rather than discovering it once it's too late. Better yet, shouldn't you see how real users are interacting with your website, and gain control over each step they are taking by making sure mission-critical flows work flawlessly?

These are just some of the immediate benefits of website performance monitoring. Think of it as an insurance policy on your digital landscape, which gives you peace of mind, empowers your IT team, and keeps your customers coming back.
Monitis

CATEGORY: Cloud-based Web Performance and End-User Monitoring solution

COMPANY: Monitis is a TeamViewer company founded in 2006. Today over 200,000 users in 150 countries rely on its world-class monitoring system to monitor the functionality of 300,000+ websites, servers, and applications. Monitis offers all the innovative solutions necessary to ensure that your mission-critical web applications are running continually and efficiently.

PRODUCTS

WEBSITE PERFORMANCE MONITORING: The Monitis Website Monitoring suite provides users with all the information they need about the availability and performance of critical web applications from a single dashboard, and identifies and isolates any problem associated with end-user experience before it harms the business. The right blend of Uptime, Real User Monitoring, and Synthetic Transaction Monitoring gives a complete picture of web applications' performance. With a static selection of test node locations all over the world, rather than round-robin location checks, a customer has a hand on the pulse of the most critical markets for the business.
- Real User Monitoring
- Uptime Monitoring
- Transaction Monitoring
- Full-Page Load

APPLICATION PERFORMANCE MONITORING: Monitis offers full SDKs for all popular languages, including Java, Perl, Python, PHP, Ruby, and C#.

OPEN API: To fulfill specific monitoring needs, Monitis also provides an open API for extending and customizing the platform.

INTEGRATIONS: To equip clients with the most innovative capabilities in web monitoring, Monitis has introduced a line-up of SaaS integrations to ensure that no performance issue ever goes unnoticed. Monitis supports integrations with the following industry-leading services: VictorOps, Slack, Zapier, JIRA, HipChat, OpsGenie, CloudWatch, WHMCS, PagerDuty, and more.
6 Common API Mistakes

QUICK VIEW
01 Documentation is crucial to the success of your API. Make sure that it stays up to date and that developers can quickly find the information they need.
…visibility into your APIs. They can help you quickly debug a wide array of issues, from missing headers to invalid certificates.
Have you ever used an API that returned an HTML error page instead of the JSON you expected?

On the other side of the table, we have developers interacting with these APIs. And we, as developers, sometimes make mistakes. We can make false assumptions about how an endpoint should work, not read the docs closely enough, or just not have enough coffee that morning to parse an error message.

Our testing and monitoring tools can help you uncover issues that would otherwise stay hidden by a lack of integration tests or real-world use case scenarios. Working with thousands of developers to resolve their API problems has given us unique insight into issues they often see when integrating and interacting with APIs.

If you're asking yourself if your API should support HTTPS, then the answer is yes. The process for getting certificates used to be a hassle, but with solutions like Let's Encrypt and Cloudflare, there's no excuse not to support HTTPS. If you're unsure why you should do it, or don't think you should because you're not transmitting any sensitive data, I highly recommend reading "Why HTTPS for Everything?" from CIO.gov.

APIs may also stop supporting HTTP, so it's important to stay up-to-date with any changes. Good API providers will let users know beforehand via email and any social media channels they have. Another step you can take is to use a tool like Hitch, which lets you follow certain APIs and be notified if anything changes.

2. UNEXPECTED ERROR CODES
A good API error message will allow developers to quickly find why, and how, they can fix a failed call. A bad API error message will cause an increase in blood pressure, along with a high number of support tickets and wasted time.

I ran into this issue a couple of weeks ago while trying to retrieve an API's access token. The code grant flow would return an error message saying that my request was invalid, but it wouldn't give me any more details. After an hour banging my head against the wall, I realized I hadn't paid attention to the docs and forgot to include an Authorization header with a base64-encoded string of my application's client_id and client_secret.
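For reference, the missing header looks like this. A hedged TypeScript sketch with a placeholder URL and credentials: many OAuth 2 token endpoints authenticate the client with HTTP Basic auth, i.e. base64("client_id:client_secret") in the Authorization header.

const clientId = "YOUR_CLIENT_ID";         // placeholder
const clientSecret = "YOUR_CLIENT_SECRET"; // placeholder

async function fetchAccessToken(code: string): Promise<Response> {
  // Node's Buffer handles the base64 encoding; in a browser, use btoa().
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return fetch("https://api.example.com/oauth/token", {
    method: "POST",
    headers: {
      Authorization: `Basic ${basic}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: `grant_type=authorization_code&code=${encodeURIComponent(code)}`,
  });
}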
Good usage of HTTP status codes and clear error messages may not be sexy, but it can be the difference between a developer evangelizing your API and an angry tweet.

Steve Marx had this to say in "How many HTTP status codes should your API use?": "...developers will have an easier time learning and understanding an API if it follows the same conventions as other APIs they're familiar with." As an API provider, you don't have to implement 70+ different status codes. Another great piece of advice from Steve:

"Following this pragmatic approach, APIs should probably use at least 3 status codes (e.g. 200, 400, 500) and should augment with status codes that have specific, actionable meaning across multiple APIs. Beyond that, keep your particular developer audience in mind and try to meet their expectations."

As API consumers, we need to be careful and not assume that an API 200 status code means the request made a successful call and returned the information we want. Some APIs, like Facebook's Graph API, always return a 200 status code, with the error being included in the response data. So, when testing and monitoring APIs, always be careful and don't automatically assume that a 200 means everything is OK.

Another great resource about response handling is Mike Stowe's blog post on API Best Practices: Response Handling.
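In practice, that means checking the response body as well as the status code. A TypeScript sketch; the error-envelope shape mimics Graph-API-style responses and will differ from one API to the next.

async function callApi(url: string): Promise<unknown> {
  const res = await fetch(url);

  // Transport-level failure: a non-2xx status.
  if (!res.ok) throw new Error(`HTTP ${res.status}`);

  // Application-level failure: some APIs return 200 with the real
  // error tucked inside the JSON payload.
  const body = await res.json();
  if (body && typeof body === "object" && "error" in body) {
    throw new Error(`API error: ${JSON.stringify((body as { error: unknown }).error)}`);
  }
  return body;
}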
3. USING THE WRONG HTTP METHOD
This is an easy one, but surprisingly common. A lot of times, this can be blamed on poor documentation. Maybe the endpoints do not explicitly say what methods are supported between GET/POST/PUT etc., or they have the wrong verb.

Tools can also play tricks on you if you're not careful. For example, let's say you want to make a GET request with a request body (not a great practice, but it happens). If you make a curl request using the -d option and don't use the -XGET flag, it will automatically default to POST and include the Content-Type: application/x-www-form-urlencoded header.

This post by Daniel Stenberg (author and maintainer of curl) on the unnecessary use of curl -X also illustrates another way you might run into this issue, when dealing with redirects:

"One of the most obvious problems is that if you also tell curl to follow HTTP redirects (using -L or --location), the -X option will also be used on the redirected-to requests, which may not at all be what the server asks for and the user expected."

4. SENDING INVALID AUTHORIZATION CREDENTIALS
APIs that implement OAuth 2, such as PayPal, usually…

6. APIS RETURNING INVALID CONTENT TYPES WHEN THERE IS AN ERROR
Diving Deeper into Performance

TWITTER: @Souders, @tyler_treat, @bbinto, @jaffathecake, @appperfeng, @ayende

ZONES
Performance Zone (dzone.com/performance): Scalability and optimization are constant concerns for the Developer and Operations manager. The Performance Zone focuses on all things performance, covering everything from database optimization to garbage collection to tweaks to keep your code as efficient as possible.
Web Dev Zone: devoted to all things web development, including front-end UX, back-end optimization, JavaScript frameworks, and web design.

REFCARDZ
…Getting Java apps to run is one thing. But getting them to run fast is another. Performance is a tricky beast in any object-oriented environment, but the complexity of the JVM adds a whole new level of performance-tweaking trickiness and opportunity. This Refcard covers JVM internals, class loading (updated to reflect the new Metaspace in Java 8), garbage collection, troubleshooting, monitoring, concurrency, and more.
Getting Started with Real User Monitoring (dzone.com/refcardz/getting-started-with-real-user-monitoring): Teaches you how to use new web standards, like W3C's Beacon API, to see how your site is performing for actual users, letting you better understand how to improve overall user experience.

BOOKS
High Performance JavaScript: Build Faster Web Application Interfaces, by Nicholas C. Zakas

PERFORMANCE TOOLS
PageSpeed Insights: developers.google.com/speed/pagespeed/insights
KeyCDN: tools.keycdn.com/speed
Page Scoring: pagescoring.com/website-speed-test
SPONSORED OPINION

3 Ways to Maximize…

…a message for multiple audiences, so engineers get detailed specifics while stakeholders get the business language they understand, all from the same notification. Focus your engineering…

xMatters is a toolchain communication platform that relays data between systems while engaging the right people to resolve incidents.
Executive Insights on Performance Optimization and Monitoring

QUICK VIEW
01 …pipeline to ensure an optimal user experience with video, applications, and web pages.
02 …tools providing visibility across networks, architectures, and devices, no one has developed a single, holistic solution.
To gather insights on the state of performance optimization and monitoring today, we spoke to 12 executives from 11 companies that provide performance optimization and monitoring solutions for their clients. Here's who we spoke to:

JOSH GRAY, Chief Architect, Cedexis
JEFF BISHOP, General Manager, ConnectWise Control
BRYAN JENKS, CEO and Co-Founder, DropLit.io
DORU PARASCHIV, Co-Founder, IRON Sheep TECH
YOAV LANDMAN, Co-Founder and CTO, JFrog
JIM FREY, V.P. Strategic Alliances, Kentik
ERIC SIGLER, Head of DevOps, PagerDuty
NICK KEPHART, Senior Director Product Marketing, ThousandEyes
KUNAL AGARWAL, CEO, Unravel Data
LEN ROSENTHAL, CMO, Virtual Instruments
ALEX RYSENKO, Lead Software Engineer, Waverly Software
EUGENE ABRAMCHUK, Sr. Performance Engineer, Waverly Software
DataDog. However, these were just three of more than
30 mentioned, with a trend towards more granular and
Here are the key findings from the subjects we covered:
specialized offerings, and respondents mentioning just a
01 The keys to performance optimization and monitoring are the design of the infrastructure and real user monitoring (RUM) to ensure an optimal end-user experience (UX), whether it's videos, web pages, or applications. The proliferation of new services, requirements, and devices in diverse geographic locations has made visibility into the entire network critical. You need to be able to see where all of your data is residing to understand how performance is, or is not, being optimized.

02 There's a greater need for visibility, and there's a proliferation of tools coming online to provide that visibility. However, no one has developed a single solution to provide a complete view across a diverse collection of infrastructures and application architectures. Response times and page-load times have continued to decrease with the adoption of virtualization and microservices. We're evolving from performance monitoring to performance intelligence with the addition of easy-to-understand, contextually relevant, algorithmically driven performance analytics. However, it's important to identify and focus on key business metrics, or else you run the risk of being overwhelmed with data.

03 The most frequently mentioned performance and monitoring tools are AppDynamics, New Relic, and Datadog. However, these were just three of more than 30 mentioned, with a trend toward more granular and specialized offerings, and respondents mentioning just a few solutions that came to mind besides their own.

04 Real-world problems that are being solved with performance optimization and monitoring are time to market, optimization of UX, and reduction in time to resolve issues through greater collaboration among teams. While more tools are coming online, some providers are enabling disparate tools to provide an integrated view to the client, which results in greater visibility into the entire pipeline and faster time to problem resolution. This visibility is also enabling clients to ensure service level agreements (SLAs) are being met by third-party providers. A minimal example of such a check is sketched below.
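Here is a minimal synthetic SLA probe in Java. The endpoint URL and the 500 ms budget are illustrative assumptions rather than values taken from the interviews, and the probe assumes the endpoint returns a 2xx response.

import java.net.HttpURLConnection;
import java.net.URL;

public class SlaProbe {
    private static final long SLA_MILLIS = 500; // hypothetical latency budget from the SLA

    public static void main(String[] args) throws Exception {
        URL url = new URL("https://api.example.com/health"); // placeholder third-party endpoint
        long start = System.nanoTime();

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(2_000);
        conn.setReadTimeout(2_000);
        int status = conn.getResponseCode();  // issues the request
        conn.getInputStream().readAllBytes(); // drain the body so timing covers the full transfer
        conn.disconnect();

        long elapsed = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("status=%d latency=%dms sla=%s%n",
                status, elapsed, elapsed <= SLA_MILLIS ? "met" : "BREACHED");
    }
}

Running a probe like this on a schedule, and alerting on breaches, is the simplest form of the third-party SLA verification the interviewees describe.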
05 Nonetheless, the most common issues continue to be the need to improve visibility, ease of use, performance, and knowledge of the impact that code has on the UX. Incomplete visibility throughout the pipeline prevents organizations from accurately finding the source of latency in the network, the application, or the endpoint. There continues to be a lack of knowledgeable professionals who know distributed computing and parallel processing. As such, the technical complexity of these tools must be reduced for companies to get the most value from them. Vendors should also improve ease of use through analytics, so IT operations do less data interpretation and can focus more on remediation. Understanding the product, load, load tests, and performance graphs is critical. Several developers do not understand the performance impact of their code, and pre-optimizing code can lead to less readable code with more complex bugs. Ensure that you talk to end users in order to understand what they are experiencing and what's important to them. Do not assume you know what they want.

06 The biggest opportunities for improvement are the automatic reaction to, and correction of, issues, and more elegant, thoughtful design and testing that results in an optimal UX. In the future, performance and monitoring tools will automatically react to issues and know the difference between mitigating and fixing problems. They'll be able to do this by collecting more data and identifying a dynamic system to determine what the problem may be before it affects the customer. Data will be more manageable with automated analysis. Application design will feature higher-level programming, better tools, and graceful degradation. Just as data is used to solve problems, it can also be used to change the way performance testing is done and measured. All monitoring products will monitor across the hybrid data center, including on-premise and public cloud-deployed applications.

07 The biggest concerns about performance and monitoring today are the lack of collaboration, identification of KPIs and how to measure them, and expertise. Companies are not moving quickly enough to share and integrate different viewpoints. Smaller teams can implement more iterative solutions more quickly, which allows them to learn faster and observe how small optimization differences can have massive hardware implications. It's important to identify and agree upon KPIs for each business unit, and how they will be measured. Premature optimization is a common pitfall in software development. It's common to see software being developed without concern for consistency or use cases, which dramatically affects the quality and speed of the software.

08 The skills needed by developers to optimize application performance and monitoring are: 1) understanding the fundamentals; 2) understanding the concept of benchmarking and improving; and 3) staying creative. Have an authoritative understanding of the underlying IT infrastructure and the expertise to keep it running in the face of constant change, independent of vendors or location. Understand the architecture of the system, how services talk to each other, how the database is accessed, and how messages are read by concurrent consumers. Keep a broad perspective, an open mind, and an understanding of the needs and wants of the end user. Don't assume the model you have in your mind is correct; know you're going to get it wrong. Get used to designing in a way that makes it easy to make a few small changes, rather than having to rebuild the entire application. Set a reliable benchmark for the performance goals that are relevant to your business application and work to improve on those goals as you get more information, as in the sketch below.
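A bare-bones version of "set a benchmark and track it" might look like the following Java harness. The 200-microsecond target and the workload are placeholders; a real project should prefer a proper harness such as JMH, which handles JIT warmup and dead-code elimination far more carefully.

import java.util.Arrays;
import java.util.function.Supplier;

public class GoalBenchmark {

    public static void main(String[] args) {
        long targetMicros = 200; // hypothetical goal agreed upon with the business

        double medianMicros = measure(GoalBenchmark::workload);
        System.out.printf("median=%.1fus target=%dus %s%n",
                medianMicros, targetMicros,
                medianMicros <= targetMicros ? "goal met" : "REGRESSION");
    }

    static double measure(Supplier<String> task) {
        // Warm up so the JIT compiles the hot path before timings are recorded.
        for (int i = 0; i < 10_000; i++) task.get();

        long[] samples = new long[1_000];
        for (int i = 0; i < samples.length; i++) {
            long t0 = System.nanoTime();
            task.get();
            samples[i] = System.nanoTime() - t0;
        }
        Arrays.sort(samples);
        return samples[samples.length / 2] / 1_000.0; // median, in microseconds
    }

    static String workload() {
        // Placeholder for the operation whose performance matters to the business.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append(i);
        return sb.toString();
    }
}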
09 An additional consideration made by a few of our participants is the question of where performance monitoring begins and ends versus testing and validation. Once a problem is identified and remediation proposed, there is a need to test and validate that the change has completely fixed the problem. What effect will advancements in technologies such as AI, bots, BI, data analytics, Elasticsearch, natural language search, and new open-source frameworks with standardized APIs have on performance and monitoring?

Let us know if you agree with their perspectives or have answers to the questions they raised. We'd love to get your feedback.
TOM SMITH is a Research Analyst at DZone who excels at gathering insights from analytics, both quantitative and qualitative, to drive business results. His passion is sharing information of value to help people succeed. In his spare time, you can find him either eating at Chipotle or working out at the gym.
Monitoring

In an IT organization, teams of people typically are responsible for specific IT systems: for their development, maintenance, health, and user satisfaction. In order to follow the direction of the organization, teams have goals within their own scope, and they use monitoring, among other things, to achieve them. The problem is that monitoring keeps people busy. They have to continuously watch the data in order to understand... This approach might work fine in a small environment with a few dependencies between systems, but not at scale.

By using these three generalizations as your operational model to monitor applications, you can evolve monitoring from you doing all the work to managing, leaving the heavy lifting to a modern platform like Instana.

WRITTEN BY PAVLO BARON
CHIEF TECHNOLOGY OFFICER, INSTANA

Built for agile organizations, Instana monitors and correlates data from every aspect of the application stack. As IT teams integrate and deploy new code, Instana automatically discovers and continuously aligns with any change. Organizations benefit from real-time impact analysis, improved quality of service, and optimized workflows that keep applications healthy.
...of UX vs. Application Performance

QUICK VIEW

...solution can provide.

...entirely new unit of measuring the success of your applications.

BY OMED HABIB
DIRECTOR OF PRODUCT MARKETING, APPDYNAMICS
Can a better UX simultaneously deliver a worse user experience? It sounds like a paradox, but it may be more common than you think. It describes a category of UX design practices that have little to do with improving the actual experience and everything to do with suggesting that the experience is a good one.

The difference can be subtle, but true user experience improvement begins with the precision application performance optimization that only...

...that the user's perceptions about the app's performance need to be managed. One real-world example involves a test of loading-screen animations for the mobile version of Facebook. The test demonstrated that users reported an improved perception of the app's loading speed when developers changed the design of the loading animation.

As iOS developer Rusty Mitchell reported, a Facebook test indicated that when users were presented with a custom loading animation in the Facebook iOS app, they blamed the app for the delay; but when users were shown the iOS system spinner, they were more likely to blame the system itself.
...where very few people understand how they work, but at the same time we want these invisible, public-facing interfaces. The result is a growing gulf between the mental model (how the person thinks the thing works) and the system model (how it actually works). A bigger gulf means bigger tensions, and more and more situations where deception is used to resolve these gaps.

BEYOND BENEVOLENT DECEPTION
One step beyond benevolent deception is volitional theater, which refers to functions and displays that don't correspond to the underlying processes. Outside the world of software-defined businesses, you can see these techniques at work in the fact that elevator "Close Doors" buttons don't do anything at all. They just give impatient riders something to do until the doors close on their own.

Pretty shocked? Yep, you better believe it.

The New York Times cited several other examples of volitional theater, such as the "Press to Walk" button on street corners. Remember all those times you pressed the crosswalk pedestrian button to cross the street? You were most likely pushing a button designed to calm your nerves that didn't have any actual impact on the timing of the street lights. Designers refer to non-operational controls as "placebo buttons." They shorten the perceived wait time by distracting people with the imitation of control.

THREE TYPES OF VOLITIONAL THEATER
Adar's research identified three major trends in the way that volitional theater is being used in modern UX application design:

...pattern emergence, reification (fill-in-the-gaps), image multi-stability, and object invariance. These often cover over issues related to slow-performing software.

MENTAL MODEL DECEPTIONS: DESIGNED TO GIVE USERS A SPECIFIC MENTAL MODEL ABOUT HOW THE SYSTEM OPERATES.
Explainer videos may be the worst offenders in this category, since they apply dramatic and distracting metaphors in an attempt to engage distracted prospects. Many times, the more outlandish the model, the more memorable it is. These mental models help sales, but support teams then have to explain how things really work to frustrated customers. The category also covers popular skeuomorphs, like the sound of non-existent static on Skype phone calls.

BETTER PRACTICES IN UX
Despite these trends, there are many developers who remain strongly opposed to benevolent deception and volitional theater. Some of their insights and arguments are presented on the Dark Patterns website.

These developers don't feel that it's ethical to trick the user or remove their freedom to know what's going on with the software. Instead of masking issues, they feel that slowdowns or clunky design can and should be eliminated. What developers need to achieve that is a comprehensive monitoring solution that pinpoints the causes of latency. Then engineering teams can go in and make the code and infrastructure optimizations necessary for software to actually perform better. They consider volitional theater to be sloppy design. One way to start correcting those design flaws is by evaluating the software and corresponding infrastructure dependencies.
The Business Transaction perspective prioritizes the goals of the end user. What are your customers really trying to achieve? In the past, developers argued over whether application monitoring or network monitoring was more important. Here's how an AppDynamics engineer reframed the problem:

"For me, users experience Business Transactions; they don't experience applications, infrastructure, or networks. When a user complains, they normally say something like, 'I can't log in,' or, 'My checkout timed out.' I can honestly say I've never heard them say, 'The CPU utilization on your machine is too high,' or, 'I don't think you have enough memory allocated.' Now think about that from a monitoring perspective. Do most organizations today monitor business transactions, or do they monitor application infrastructure and networks? The truth is the latter, normally with several toolsets. So the question 'Monitor the application or the network?' is really the wrong question. Unless you monitor business transactions, you're never going to understand what your end users actually experience."

Starting from the business transactions will help DevOps teams view your system as a function of business processes vs. individual requests firing off everywhere. This is how you can start solving the problems that matter most to customers. It's always better to diagnose and solve performance issues instead of merely covering them up or distracting the user with UX techniques. A sketch of the idea follows.
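To make the contrast concrete, here is a toy sketch of transaction-centric measurement in plain Java. It illustrates the idea only; it is not AppDynamics' instrumentation, and the transaction names and timings are invented.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class TransactionRecorder {

    // Latency is keyed by the user-facing transaction name, not by host or network segment.
    private static final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> callCounts = new ConcurrentHashMap<>();

    static <T> T record(String transaction, Supplier<T> work) {
        long t0 = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.computeIfAbsent(transaction, k -> new LongAdder())
                      .add(System.nanoTime() - t0);
            callCounts.computeIfAbsent(transaction, k -> new LongAdder()).increment();
        }
    }

    static void report() {
        callCounts.forEach((tx, n) -> System.out.printf(
                "%-10s avg=%.1fms over %d calls%n",
                tx, totalNanos.get(tx).sum() / (double) n.sum() / 1_000_000, n.sum()));
    }

    public static void main(String[] args) {
        record("login", () -> simulateWork(20));    // what users complain about: "I can't log in"
        record("checkout", () -> simulateWork(45)); // "My checkout timed out"
        report();
    }

    private static Object simulateWork(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return null;
    }
}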
ISOLATING CAUSES
This approach is much closer to approximating the true UX of an average user. Starting from the business transaction, you can use APM solutions to drill down from end-user clients to application code-level details. That's how you isolate the root cause of the performance problems that matter most to specific users.

Of course, isolating the problem doesn't matter unless you resolve it. We've discussed this with application teams who have isolated problems related to runtime exceptions for Java-based applications in production, but they tended to gloss over those that didn't break the application.

That's a mistake we addressed in a series about Top Application Performance Challenges. Bhaskar Sunkar, AppDynamics co-founder and CTO, concluded that, "Runtime Exceptions happen. When they occur frequently, they do appreciably slow down your application. The slowness becomes contagious to all transactions being served by the application. Don't mute them. Don't ignore them. Don't dismiss them. Don't convince yourself they are harmless. If you want a simple way to improve your application's performance, start by fixing up these regularly occurring exceptions. Who knows, you just might help everyone's code run a little faster."
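The cost Sunkar describes is easy to demonstrate. The following sketch, our own illustration rather than anything from the AppDynamics series, compares parsing that silently swallows NumberFormatExceptions against parsing that validates first; absolute numbers will vary by JVM and hardware, but the exception-heavy path is consistently slower because each throw fills in a stack trace.

public class ExceptionCost {
    public static void main(String[] args) {
        String[] inputs = new String[1_000_000];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = (i % 10 == 0) ? "oops" : "42"; // ~10% of inputs are malformed
        }

        long t0 = System.nanoTime();
        long viaException = 0;
        for (String s : inputs) {
            try {
                viaException += Integer.parseInt(s);
            } catch (NumberFormatException ignored) {
                // Swallowed "harmless" exception: no crash, but the throw is not free.
            }
        }
        long exceptionMillis = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        long viaCheck = 0;
        for (String s : inputs) {
            if (!s.isEmpty() && s.chars().allMatch(Character::isDigit)) {
                viaCheck += Integer.parseInt(s); // validate first, never throw
            }
        }
        long checkMillis = (System.nanoTime() - t0) / 1_000_000;

        System.out.printf("exception path: %dms, validation path: %dms (sums: %d / %d)%n",
                exceptionMillis, checkMillis, viaException, viaCheck);
    }
}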
This approach is becoming even more critical as smaller devices attempt to crunch higher volumes of data. A good example is how applications built for the Apple Watch are expected to provide the same level of performance as those built for a tablet or smartphone. Users don't lower their expectations to compensate for processing power. In the end, users care about the benefits of the application, not the limitations of the device.

THE AGE OF EXPERIENCE
Gartner reported that 89 percent of companies expected to compete based on customer experience by 2017. However, customers want more from their software than just great UX. Beautiful design and clever tricks can't distract these time-sensitive users from being very aware of application performance issues.

As data volumes accelerate and devices shrink, it will grow harder to maintain optimal performance and continuous improvement schedules. Application teams need speed and precision tools to pinpoint areas for improvement. At the same time, more businesses are undergoing their own digital transformations and discovering the importance of performance management for the first time.

Developments in user hand-movement recognition and motion-control mapping have been accelerating along multiple fronts, such as VR best practices by Leap Motion and Google's Project Soli, which uses micro-radar to precisely translate user intent from the most minute finger gestures. These advancements likely represent what's coming next in terms of UX, but they will demand IT infrastructures with access to a great deal more data-processing power.

DRILLING DOWN TO MAXIMUM IMPACT
Excellence in UX for the next generation of applications has to start by troubleshooting business transaction performance from the user's point of view. From there, you'll be able to drill down to the code level and intelligently capture the root cause that impacts the user.

OMED HABIB is a Director of Product Marketing at AppDynamics. He originally joined AppDynamics as a Principal Product Manager to lead the development of their world-class PHP, Node.js, and Python APM agents. An engineer at heart, Omed fell in love with web-scale architecture while directing technology throughout his career. He spends his time exploring new ways to help some of the largest software deployments in the world meet their performance needs.
Solutions Directory

This directory of monitoring, hosting, and optimization services provides comprehensive, factual comparisons of data gathered from third-party sources and the tool creators' organizations. Solutions in the directory are selected based on several impartial criteria, including solution maturity, technical innovativeness, relevance, and data availability.
COMPANY | PRODUCT | CATEGORY | FREE TRIAL | HOSTING | WEBSITE
Akamai | Ion | CDN, Network & Mobile Monitoring & Optimization, FEO | Free tier available | SaaS | akamai.com/us/en/solutions/products/web-performance/web-performance-optimization.jsp
Apica | Apica Systems | APM, Infrastructure Monitoring | Limited by usage | SaaS | apicasystems.com
Appnomic Systems | AppsOne | ITOA | Upon request | On-premise or SaaS | appnomic.com/products/appsone
Aternity | Aternity | APM, ITOA, Real User Monitoring | Upon request | On-premise | aternity.com
Bugsnag | Bugsnag | Application Monitoring | 14 days | On-premise or SaaS | bugsnag.com
CA | CA Unified Infrastructure Management | Infrastructure Monitoring | Available by request | On-premise | ca.com/us/products/ca-unified-infrastructure-management.html
Catchpoint | Catchpoint Suite | Synthetic, RUM, UEM | 14 days | On-premise or SaaS | catchpoint.com/products
Cedexis | Impact | Infrastructure Monitoring, FEO, ITOA | Available by request | SaaS | cedexis.com/products/impact
Circonus | Circonus | Infrastructure Monitoring, ITOA | Free tier available | SaaS | circonus.com
Datadog | Datadog | Performance Metrics Integration and Analysis | 14 days | SaaS | datadoghq.com
Dotcom-Monitor | Dotcom-Monitor | APM, Infrastructure Monitoring, FEO | 30 days | SaaS | dotcom-monitor.com
Dynatrace | Dynatrace Synthetic | Synthetic Monitoring, Managed Load Testing | Available by request | SaaS | dynatrace.com/capabilities/synthetic-monitoring
Dynatrace | Dynatrace UEM | Real User Monitoring (Web and Mobile) | 30 days | On-premise | dynatrace.com/platform/offerings/user-experience-management
Evolven | Evolven | ITOA | Available by request | On-premise | evolven.com
ExtraHop Networks | ExtraHop Networks | ITOA | Free tier available | SaaS | extrahop.com
f5 | Big-IP Platform | APM, Network Monitoring | 30 days | On-premise or SaaS | f5.com/products/big-ip
Fortinet | FortiSIEM | ITOA, Network Monitoring | 30 days | SaaS | fortinet.com/products/management/fortisiem.html
HPE | HPE APM | APM, ITOA, Real User Monitoring | 30 days | On-premise | hp.com/us/en/software-solutions/application-performance-management
HPE | LoadRunner | Load Testing | Free tier available | On-premise or SaaS | hp.com/us/en/software-solutions/loadrunner-load-testing/try-now.html
HPE | StormRunner | Load Testing | 30 days | SaaS | hp.com/us/en/software-solutions/stormrunner-load-agile-cloud-testing
IBM | IBM API Connect | API Management Platform | Free tier available | On-premise or SaaS | ibm.com/software/products/en/api-connect
InfoVista | 5view Applications | APM, Network Monitoring, Real User Monitoring | Available by request | On-premise | infovista.com/products/Application-Performance-Monitoring-and-Management
INETCO | INETCO Insight | APM, Middleware Monitoring | Available by request | On-premise | inetco.com/products/inetco-insight
jClarity | Censum | JVM Garbage Collection Optimization | 7 days | On-premise or SaaS | jclarity.com/censum
jClarity | Illuminate | JVM Performance Diagnosis and Optimization | 14 days | On-premise or SaaS | jclarity.com/illuminate
JENNIFERSOFT | Jennifer | APM | 14 days | On-premise | jennifersoft.com/en/product/product-summary
Librato | Librato | Performance Metrics Integration and Analysis | 30 days | SaaS | librato.com
LiveAction | LiveNX | Network Monitoring and Diagnostics | 14 days | SaaS | liveaction.com/solutions/liveaction-network-performance-management
Logentries | Logentries | Log Management and Analytics | Free tier available | SaaS | logentries.com
Microsoft | System Center 2016 | APM | 180 days | On-premise | microsoft.com/en-us/cloud-platform/system-center
Monitis | Monitis | Network and IT Systems Monitoring | 15 days | On-premise or SaaS | monitis.com
NetScout | nGeniusONE | APM, Network Monitoring, ITOA | Available by request | On-premise | netscout.com/product/service-provider/ngeniusone-platform
Neustar | Neustar Website Monitoring | FEO | 30 days | SaaS | neustar.biz/security/web-performance-management/monitoring
OpsGenie | OpsGenie | Alert Software | Available by request | On-premise | opsgenie.com
Power Admin | PA Server Monitor | Infrastructure Monitoring, Network Monitoring | 30 days | On-premise | poweradmin.com/products/server-monitoring
Progress | Telerik Analytics | End-User Monitoring and Analytics | Free tier available | On-premise | docs.telerik.com/platform/analytics
Quest | Foglight | APM, Database Monitoring, RUM, ITOA | Available by request | On-premise | quest.com/foglight
Rackspace | Rackspace Monitoring | Cloud Monitoring | Free tier available | SaaS | rackspace.com/cloud/monitoring
Sauce Labs | Sauce Labs | FEO, Automated Web and Mobile Testing | 14 days | SaaS | saucelabs.com
SevOne | SevOne | Infrastructure Monitoring, Network Monitoring | Available by request | SaaS | sevone.com
SOASTA | SOASTA Platform | Real User Monitoring, Load Testing | Up to 100 users | SaaS | soasta.com/videos/soasta-platform-overview
SpeedCurve | SpeedCurve | FEO, ITOA | Available by request | SaaS | speedcurve.com
Spiceworks | Spiceworks | Network Monitoring, ITOA | Free tier available | On-premise | spiceworks.com
Sysdig | Sysdig Cloud | Application Monitoring | 14 days | On-premise or SaaS | sysdig.com
TeamQuest | TeamQuest | ITOA | Available by request | On-premise | teamquest.com
Tingyun | Tingyun App | APM, FEO, Real User Monitoring | Available by request | SaaS | tingyun.com/tingyun_app.html
Unravel Data | Unravel | Application Monitoring | 30 days | On-premise or SaaS | unraveldata.com
VictorOps | VictorOps | Alert Software | Available by request | On-premise | victorops.com
Virtual Instruments | VirtualWisdom | Metrics Monitoring and Analytics | Available by request | SaaS | virtualinstruments.com
Wavefront | Wavefront | Metrics Monitoring and Analytics | Available by request | SaaS | wavefront.com
xMatters | xMatters IT Alerts | | Available by request | On-premise or SaaS | xmatters.com/maxAPM