Monitoring and management of performance and availability of software applications
In the fields of information technology and systems management, application performance management (APM) is the monitoring and management of the performance and availability of software applications. APM strives to detect and diagnose complex application performance problems to maintain an expected level of service. APM is "the translation of IT metrics into business meaning ([i.e.] value)."[1]
Measuring application performance
Two sets of performance metrics are closely monitored. The first set of performance metrics defines the performance experienced by end-users of the application. One example of performance is averamge response times under peak load. The components of the set include load and response times:
The load is the volume of transactions processed by the application, e.g., transactions per second, requests per second, pages per second. Without being loaded by computer-based demands (e.g. searches, calculations, transmissions), most applications are fast enough, which is why programmers may not catch performance problems during development.
The response times are the times required for an application to respond to a user's actions at such a load.[2]
The second set of performance metrics measures the computational resources used by the application for the load, indicating whether there is adequate capacity to support the load, as well as possible locations of a performance bottleneck. Measurement of these quantities establishes an empirical performance baseline for the application. The baseline can then be used to detect changes in performance. Changes in performance can be correlated with external events and subsequently used to predict future changes in application performance.[3]
The use of APM is common for Web applications, which lends itself best to the more detailed monitoring techniques.[4] In addition to measuring response time for a user, response times for components of a Web application can also be monitored to help pinpoint causes of delay. There also exist HTTP appliances that can decode transaction-specific response times at the Web server layer of the application.
In their APM Conceptual Framework, Gartner Research describes five dimensions of APM:[5][6][7][8]
A new dimension, Application discovery, tracing, and diagnostics (ADTD), combines three formerly separate dimensions (Application topology [runtime architecture] discovery and visualization, User-defined transaction profiling, and Application component deep-dive), since all three are primarily focused on problem remediation and are interlinked;
Application analytics (AA).
Current issues
Since the first half of 2013, APM has entered into a period of intense competition in technology and strategy with a multiplicity of vendors and viewpoints.[10] This has caused an upheaval in the marketplace with vendors from unrelated backgrounds (including network monitoring,[11] systems management, application instrumentation, and web performance monitoring) adopting messaging around APM[which?]. As a result, the term APM has become diluted and has evolved into a concept for managing application performance across many diverse computing platforms, rather than a single market.[clarification needed][12] With so many vendors to choose from, selecting one can be a challenge. It is important to evaluate each carefully to ensure its capabilities meet your needs.[13]
Two challenges for implementing APM are (1) it can be difficult to instrument an application to monitor application performance, especially among components of an application, and (2) applications can be virtualized, which increases the variability of the measurements.[14][15] To alleviate the first problem application service management (ASM) provides an application-centric approach, where business service performance visibility is a key objective. The second aspect present in distributed, virtual and cloud-based applications poses a unique challenge for application performance monitoring because most of the key system components are no longer hosted on a single machine. Each function is now likely to have been designed as an Internet service that runs on multiple virtualized systems. The applications themselves are very likely to be moving from one system to another to meet service-level objectives and deal with momentary outages.[16]
The APM conceptual framework
Applications themselves are becoming increasingly difficult to manage as they move toward highly distributed, multi-tier, multi-element constructs that in many cases rely on application development frameworks such as .NET or Java.[17] The APM Conceptual Framework was designed to help prioritize an approach on what to focus on first for quick implementation and overall understanding of the five-dimensional APM model. The framework slide outlines three areas of focus for each dimension and describes their potential benefits. These areas are referenced as "Primary" below, with the lower priority dimensions referenced as "Secondary. "[18]
End user experience (primary)
Measuring the transit of traffic from user request to data and back again is part of capturing the end-user experience (EUE).[19] The outcome of this measuring is referred to as Real-time Application monitoring (aka Top-Down monitoring), which has two components, passive and active. Passive monitoring is usually an agentless appliance implemented using network port mirroring. A key feature to consider is the ability to support multi-component analytics (e.g., database, client/browser). Active monitoring, on the other hand, consists of synthetic probes and web robots predefined to report system availability and business transactions. Active monitoring is a good complement to passive monitoring; together, these two components help provide visibility into application health during off-peak hours when transaction volume is low.
User experience management (UEM) is a subcategory that emerged from the EUE dimension to monitor the behavioral context of the user. UEM, as practiced today, goes beyond availability to capture latencies and inconsistencies as human beings interact with applications and other services.[20] UEM is usually agent-based and may include JavaScript injection to monitor the end-user device. UEM is considered another facet of Real-time Application monitoring.
Runtime application architecture (secondary)
Application Discovery and Dependency Mapping (ADDM) offerings exist to automate the process of mapping transactions and applications to underlying infrastructure components.[21] When preparing to implement a runtime application architecture, it is necessary to ensure that up/down monitoring is in place for all nodes and servers within the environment (aka, bottom-up monitoring). This helps lay the foundation for event correlation and provides the basis for a general understanding of how network topologies interact with application architectures.
Focus on user-defined transactions or the URL page definitions that have some meaning to the business community. For example, if there are 200 to 300 unique page definitions for a given application, group them into 8–12 high-level categories. This allows for meaningful SLA reports, and provides trending information on application performance from a business perspective: start with broad categories and refine them over time. For a deeper understanding, see Business transaction management.
Deep dive component monitoring (DDCM) requires an agent installation and is generally targeted at middleware, focusing on web, application, and messaging servers. It should provide a real-time view of the J2EE and .NET stacks, tying them back to the user-defined business transactions. A robust monitor shows a clear path from code execution (e.g., spring and struts) to the URL rendered, and finally to the user request. Since DDCM is closely related to the second dimension in the APM model, most products in this field also provide application discovery dependency mapping (ADDM) as part of their offering.
It is important to arrive at a common set of metrics to collect and report on for each application, then standardize on a common view on how to present the application performance data. Collecting raw data from the other tool sets across the APM model provides flexibility in application reporting. This allows for answering a wide variety of performance questions as they arise, despite the different platforms each application may be running on. Too much information is overwhelming. That is why it is important to keep reports simple or they won't be used.[22]