From Compuware APM Blog from 15 May 2013
With our new service platform and the convergence of dynaTrace PurePath Technology with the Gomez Performance Network, we are proud to offer an APMaaS solution that sets a higher bar for complete user experience management, with end-to-end monitoring technologies that include real-user, synthetic, third-party service monitoring, and business impact analysis.
Before we get started, let’s have a look at the Compuware APMaaS architecture. In order to collect real user performance data all you need is to install a so called Agent on the Web and/or Application Server. The data gets sent in an optimized and secure way to the APMaaS Platform. Performance data is then analyzed through the APMaaS Web Portal with drilldown capabilities into the dynaTrace Client.
4 Steps to setup APMaaS for our Blog powered by WordPress on PHP
From a high-level perspective, joining Compuware APMaaS and setting up your environment consists of four basic steps:
- Sign up with Compuware for the Free Trial
- Install the Compuware Agent on your Server
- Restart your application
- Analyze Data through the APMaaS Dashboards
In this article, we assume that you’ve successfully signed up, and will walk you through the actual setup steps to show how easy it is to get started.
After signing up with Compuware, the first sign of your new Compuware APMaaS environment will be an email notifying you that a new environment instance has been created:
While you can immediately take a peek into your brand new APMaaS account at this point, there’s not much to see: Before we can collect any data for you, you will have to finish the setup in your application by downloading and installing the agents.
After installation is complete and the Web Server is restarted the agents will start sending data to the APMaaS Platform – and with dynaTrace 5.5, this also includes the PHP agent which gives insight into what’s really going on in the PHP application!
Now we are ready to go!
For Ops & Business: Availability, Conversions, User Satisfaction
Through the APMaaS Web Portal, we start with some high level web dashboards that are also very useful for our Operations and Business colleagues. These show Availability, Conversion Rates as well as User Satisfaction and Error Rates. To show the integrated capabilities of the complete Compuware APM platform, Availability is measured using Synthetic Monitors that constantly check our blog while all of the other values are taken from real end user monitoring.
For App Owners: Application and End User Performance Analysis
Through the dynaTrace client we get a richer view to the real end user data. The PHP agent we installed is a full equivalent to the dynaTrace Java and .NET agents, and features like the application overview together with our self-learning automatic baselining will just work the same way regardless of the server-side technology:
Before drilling down into the performance analytics, let’s have a quick look at the key user experience metrics such as where our blog users actually come from, the browsers they use, and whether their geographical location impacts user experience:
If you are responsible for User Experience and interested in some of our best practices I recommend checking our other UEM-related blog posts – for instance: What to do if A/B testing fails to improve conversions?
Going a bit deeper – What impacts End User Experience?
dynaTrace automatically detects important URLs as so-called “Business Transactions.” In our case we have different blog categories that visitors can click on. The following screenshot shows us that we automatically get dynamic baselines calculated for these identified business transaction:
Here we see that our overall response time for requests by category slowed down on May 12. Let’s investigate what happened here, and move to the transaction flow which visualizes PHP transactions from the browser to the database and maps infrastructure health data onto every tier that participated in these transactions:
Since we are always striving to improve our users’ experience, the first troubling thing on this screen is that we see errors happening in browsers – maybe someone forgot to upload an image when posting a new blog entry? Let’s drill down to the Errors dashlet to see what’s happening here:
PHP Performance Deep Dive
We will analyze the performance problems on the PHP Server Side in a follow up blog. We will show you what the steps are to identify problematic PHP code. In our case it actually turned out to be a problematic plugin that helps us identify bad requests (requests from bots, …)
Conclusion and Next Steps
Compuware unveils Outage Analyzer, a new generation performance analytics solution that raises the intelligence of SaaS APMPosted: 10/10/2012
Tracks cloud and third-party web service outages with instant notification of cause and impact
Compuware, the technology performance company today announced a new generation performance analytics solution that raises the intelligence of software-as-a-service (SaaS) application performance management (APM).
Outage Analyzer provides real-time visualizations and alerts of outages in third-party web services that are mission critical to web, mobile and cloud applications around the globe. Compuware is providing this new service free of charge. Check out Outage Analyzer here.
Utilizing cutting-edge big data technologies and a proprietary anomaly detection engine, Outage Analyzer correlates more than eight billion data points per day. This data is collected from the Compuware Gomez Performance Monitoring Network of more than 150,000 test locations and delivers information on specific outages including the scope, duration and probable cause of the event — all visualized in real-time.
“Compuware’s new Outage Analyzer service is a primary example of the emerging industry trend toward applying big data analytics technologies to help understand and resolve application performance and availability issues in near real-time,” said Tim Grieser, Program VP, Enterprise System Management Software at IDC. “Outage Analyzer’s ability to analyze and visualize large masses of data, with automated anomaly detection, can help IT and business users better understand the sources and causes of outages in third-party web services.”
Cloud and third-party web services allow organizations to rapidly deliver a rich user experience, but also expose web and mobile sites to degraded performance—or even a total outage—should any of those components fail. Research shows that the typical website has more than ten separate hosts contributing to a single transaction, many of which come from third-party cloud services such as social media, ecommerce platforms, web analytics, ad servers and content delivery networks.
Outage Analyzer addresses this complexity with the following capabilities:
- Incident Visualization: Issues with third-party services are automatically visualized on Outage Analyzer’s global map view. This view displays information on the current status, impact—based on severity and geography—and duration, along with the certainty and probable cause of the outage. Outage Analyzer also provides a timeline view that shows the spread and escalation of the outage. The timeline has a playback feature to replay the outage and review its impact over time.
- Incident Filtering and Searching: With Outage Analyzer, users can automatically view the most recent outages, filtered by severity of impact, or search for outages in specific IPs, IP ranges or service domains. This allows users to find the outages in services that are potentially impacting their own applications.
- Alerting: Users can sign-up to automatically receive alerts—RSS and Twitter feeds—and can specify the exact types of incidents to be alerted on such as popularity of third-party web service provider, certainty of an outage and by the geographical region impacted. Alerts contain links to the global map view and details of the outage. This provides an early-warning system to potential problems.
- Performance Analytics Big Data Platform: Utilizing cutting-edge big data technologies in the cloud, including Flume and Hadoop, Outage Analyzer collects live data from the entire Gomez customer base and Gomez Benchmark tests, processing more than eight billion data points per day. The processing from raw data to visualization and alerting on an outage all happens within minutes, making the outage data timely and actionable.
- Anomaly Detection Algorithms: At the heart of Outage Analyzer’s big data platform is a proprietary anomaly detection engine that automatically identifies availability issues with third-party web services that are impacting performance of the web across the globe. Outage Analyzer then correlates the outage data, identifies the source of the problem, calculates the impact and lists the probable causes — all in real-time.
“Since Outage Analyzer has been up and running, we’ve seen an average of about 200 third-party web service outages a day,” said Steve Tack, Vice President of Product Management for Compuware’s APM business unit. “Outage Analyzer is just the beginning. Our big data platform, propriety correlation and anomaly detection algorithms, and intuitive visualizations of issues with cloud and third-party web services are key building-blocks to delivering a new generation of answer-centric APM.”
Outage Analyzer harnesses the collective intelligence of the Compuware Gomez Network, the largest and most active APM SaaS platform in the world. Now eight billion measurements a day across the global Internet can be harnessed by any organization serious about delivering exceptional web application performance. Determining whether an application performance issue is the fault of an organization’s code, or the fault of a third-party service has never been easier.
Compuware APM® is the industry’s leading solution for optimizing the performance of web, non-web, mobile, streaming and cloud applications. Driven by end-user experience, Compuware APM provides the market’s only unified APM coverage across the entire application delivery chain—from the edge of the internet through the cloud to the datacenter. Compuware APM helps customers deliver proactive problem resolution for greater customer satisfaction, accelerate time-to-market for new application functionality and reduce application management costs through smarter analytics and advanced APM automation.
With more than 4,000 APM customers worldwide, Compuware is recognized as a leader in the “Magic Quadrant for Application Performance Monitoring” report.
To read more about Compuware’s leadership in the APM market, click here.
Recently, at the Internet Retailer Conference and Exhibition annual event at McCormick Place West in Chicago had record number of attendees with more than 8,600 in attendance over the four-day event including 564 companies exhibiting e-commerce technologies and services.
This year’s event was focused on “Connecting with the 21st Century Consumer.” A description from the event brochure stated, ‘It was not long ago that having a decently performing retail web site was cool. No more. Today there a millions of e-commerce sites and the competition between them is fierce.
So fierce, in fact, that e-retailers can no longer succeed simply by keeping up with the pack. Growth comes by outperforming your competition and the surest way of doing that is by understanding who are the frequent web shoppers, what they demand from online stores, and how best to reach and serve them.’
To help attendees understand their site’s performance, we ran the “Gomez Challenge” where attendees provided their website URL to have the site’s performance measured in real-time and compared to other participants taking part in the challenge during the event.
The Gomez Challenge is a set of tests that provide event participants – whether performance focused or just beginning to learn about it – valuable insight into how both market leaders and smaller companies sites are performing and context for discussions between IT and business site stakeholders on how to balance user experience with site speed.
Over the four-day event, we ran home page tests of participant’s web site performance from multiple geographic locations looking at webpage response time, number of connections, hosts, objects, and page size to provide insight into how each site is performing.
Using a series of waterfall charts and other diagnostics tools built into the Gomez Challenge, the test also provided participants with immediate suggestions for optimizing performance.
The Gomez Challenge results are presented on a scoreboard that lists each participant along with their results across the following page load thresholds:
- Green = less than 2 seconds, good customer experiences
- Yellow = between 2.1 and 5 seconds, considered to be customer impacting
- Red = more than 5 seconds, critical issues and very customer impacting
The winner of the Gomez Challenge had the the fastest average response time during the event across multiple geographies. This year’s challenge winner was Belk.com, the nation’s largest privately owned mainline department store company with 303 Belk stores located in 16 Southern states – congratulations!
Check out your own website with a free test with the Gomez Website Performance Test. You can also find out how your website performs across browsers, compared to your competitors and on mobile applications here.
New Relic and SOASTA offer comprehensive solution for web application testing and performance managementPosted: 31/05/2012
According to recent SOASTA press release…
SOASTA, the leader in cloud and mobile testing, and New Relic, the SaaS-based cloud application performance management provider, today announced their partnership and the integration of New Relic’s application performance management solution on SOASTA’s CloudTest platform.
The integrated solution gives developers complete visibility into the performance of their web applications throughout the entire application development lifecycle. Now developers can immediately begin building performance tests to assure the highest test coverage and tune their apps based on deep performance diagnostics to ensure the highest level of application performance and availability, all at no charge.
Hopefully get an update shortly…
From APM Thought Leadership at App Dynamics…
I have yet to meet anyone in Dev or Ops who likes alerts. I’ve also yet to meet anyone who was fast enough to acknowledge an alert, so they could prevent an application from slowing down or crashing. In the real world alerts just don’t work, nobody has the time or patience anymore, alerts are truly evil and no-one trusts them. The most efficient alert today is an angry end user phone call, because Dev and Ops physically hear and feel the pain of someone suffering.
Why? There is little or no intelligence in how a monitoring solution determines what is normal or abnormal for application performance. Today, monitoring solutions are only as good as the users that configure them, which is bad news because humans make mistakes, configuration takes time, and time is something many of us have little of.
Its therefore no surprise to learn that behavioral learning and analytics are becoming key requirements for modern application performance monitoring (APM) solutions. In fact, Will Capelli from Gartner recently published a report on IT Operational Analytics and pattern based strategies in the data center. The report covered the role of Complex Event Processing (CEP), behavior learning engines (BLEs) and analytics as a means for monitoring solutions to deliver better intelligence and quality information to Dev and Ops. Rather than just collect, store and report data, monitoring solutions must now learn and make sense of the data they collect, thus enabling them to become smarter and deliver better intelligence back to their users.
Change is constant for applications and infrastructure thanks to agile cycles, therefore monitoring solutions must also change so they can adapt and stay relevant. For example, if the performance of a business transaction in an application is 2.5 secs one week, and that drops to 200ms the week after because of a development fix. 200ms should become the new performance baseline for that same transaction, otherwise the monitoring solution won’t learn or alert of any performance regression. If the end user experience of a business transaction goes from 2.5 secs to 200ms, then end user expectations change instantly, and users become used to an instant response. Monitoring solutions have to keep up with user expectations, otherwise IT will become blind to the one thing that impacts customer loyalty and experience the most.
So what does behavioral learning and analytics actually do, and how does it help someone in IT? Let’s look at some key Dev and Ops use cases that benefit from such technology.
#1 Problem Identification – Do I have a problem?
Alerts are only as good as the thresholds which trigger them. A key benefit of behavioral learning technology is the ability to automate the process of discovering and applying relevant performance thresholds to an application, its business transactions and infrastructure, all without human intervention. It does this by automatically learning the normal response time of an application, its business transactions and infrastructure, at different hours of the day, week and month, ensuring these references create an accurate and dynamic baseline of what normal application performance is over-time.
A performance baseline which is dynamic over-time is significantly more accurate than a baseline which is static. For example, having a static baseline threshold which assumes application performance is OK if all response times are less than 2 seconds is naive and simplistic. All user requests and business transactions are unique, they have distinct flows across the application infrastructure, which vary, depending on what data is requested, processed and packaged up as a response.
Take for example, a credit card payment business transaction – would these requests normally take less than 2 seconds for a typical web store application? not really, they can vary between 2 and 10 seconds. Why? There is often a delay whilst an application calls a remote 3rd party service to validate credit card details before it can be authorized and confirmed. In comparison, a product search business transaction is relatively simple and localized to an application, meaning it often returns sub-second response times 24/7 (e.g. like Google). Applying a 2 second static threshold to multiple business transactions like “credit card payment” and “search” will trigger alert storming (false and redundant alerts). To avoid this without behavioral learning, users must manually define individual performance thresholds for every business transaction in an application. This is bad, because as I said earlier, nobody in IT has the time to do this, so most users resort to applying thresholds which are static and global across an application. Don’t believe me? ask your Ops people whether they get enough alerts today, chances are they’ll smile or snarl.
The screenshot below shows the average response time of a production application over-time, with spikes representing peak load during weekend evening hours. You can see on weekdays normal performance is around 100ms, yet under peak load its normal to experience application performance of up to several seconds. Applying a static threshold in this scenario, of 1 or 2 seconds would basically cause alert storming at the weekend even though its normal to see such performance spikes. This application could therefore benefit from behavioral learning technology so the correct performance baseline is applied for the correct hour and day.
Another key limitation with alerts and traditional monitoring solutions is that they lack business context. They’re typically tied to infrastructure health rather the health of the business, making it impossible for anyone to understand the business impact of an alert or problem. It can be the difference between “Server CPU Utilization is above 90%” and “22% of Credit Card Payments are stalling”. You can probably guess the latter alert is more important to troubleshoot than pulling up a terminal console, logging onto a server and typing prstat to view processes and CPU usage. Behavioral learning combined with business context allows a monitoring solution to alert on the performance and activity of the business, rather than say, the performance and activity of its infrastructure. This ensures Dev and Ops have the correct context to understand and be aligned with the business services.
Analytics can also play a critical role in how monitoring data is presented to the user to help them troubleshoot. If a business transaction is slow or has breached its threshold, the user needs to understand the severity of the problem. For example, were a few or lot of user transactions impacted? how many returned errors or actually stalled and timed out? Everything is relative, Dev or Ops doesn’t have the time to investigate every user transaction breach, its therefore important to prioritize with business impact before jumping in to troubleshoot.
If we look at the below screenshot of AppDynamics Pro, you can see how behavioral learning and analytics can help a user identify a problem in production. We can see the checkout business transaction has breached its performance baseline (which was learnt automatically), we can also see the severity of the breach which shows no errors, 10 slow requests, 13 very slow and no stalls. 23 out of the 74 user requests (calls) were impacted meaning this is a critical problem for Dev and Ops to troubleshoot.
#2 Problem Isolation – Where is my problem?
Once a user has identified abnormal application performance, the next step for them is to isolate where that latency is spent in the application infrastructure. A key problem today is that most monitoring solutions collect and report data, but they don’t process or visualize it in a way that automates problem isolation for a user. Data exists, but its down to the individual users to drill down and piece together data, so they can find what they’re looking for. This is made difficult by the fact that performance data can be fragmented across multiple silo’s and monitoring toolsets, making it impossible for Dev or Ops to get a consistent end to end view of application performance and business activity. To solve this data fragmentation problem, many monitoring solutions use time-based correlation or Complex Event Processing (CEP) engines to piece together data/events from the multiple sources, so they can look for patterns or key trends which may help a user isolate where a problem or latency exists in an application.
For example, if a user credit card payment business transaction took 9 seconds to execute, where was that 9 seconds spent in the application infrastructure exactly? If you look at performance data from an OS, app server, database or network perspective you’ll end up with four different views of performance, none of which relate to that individual credit card payment business transaction which took 9 seconds. Using time-based correlation won’t’ help either, knowing the database was running at 90% cpu whilst the credit card payment transaction executed is about as helpful as a poke in the eye. Time-based correlation is effectively a guess, given the complexity and distribution of applications today, the last thing you want to be doing is guessing where a problem might be in your application infrastructure. Infrastructure metrics tell you how an application is consuming system resource, they don’t have the granularity to tell you where an individual user business transaction is slow in the infrastructure.
Behavioral learning can be used together to learn and track how business transactions flow across distributed application infrastructure. If a monitoring solution is able to learn the journey of a business transaction, then they can monitor the real flow execution of them across and inside distributed application infrastructure. By visualizing the entire journey and latency of a business transaction, at each hop in the infrastructure, monitoring solutions can make it simple for Dev and Ops to isolate problems in seconds. If you want to travel from San Francisco to LA by car, the easiest way to understand that journey, is to visualize it on Google Maps in seconds. In comparison, the easiest way for Dev or Ops to isolate a slow user business transaction, is to do the same thing and visualize its journey across the application infrastructure. For example, take the below screenshot which shows the distributed transaction flow of a “Checkout” business transaction which took 10 seconds across its application infrastructure. You can see that 99.8% of its response time is spent making a JDBC call to the Oracle database. Isolating problems this way is much faster and efficient than tailing log files or asking sys, network or DBA administrators whether their silos are performing correctly.
You can also apply dynamic base-lining and analytics to the performance and flow execution of a business transaction. This means a monitoring solution can effectively highlight to the user which application infrastructure tier is responsible for a performance breach and baseline deviation. Take for example, the below screenshot which visualizes the flow of a business transaction in a production environment, and highlights the breach for the application tier “Security Server” which has deviated from its normal performance baseline of 959ms.
Behavioral learning and analytics can therefore be a key enabler to automating problem isolation in large, complex, distributed applications.
#3 Problem Resolution – How do I fix my problem?
Once Dev or Ops has isolated where the problem is in the application infrastructure, the next step is to then identify the root cause. Many monitoring solutions today can collect diagnostic data which relate to the activity of components within an application tier such as a JVM, CLR or database. For example, a java profiler might show you thread activity, a database tool might show you top N SQL Statements. What these tools lack is the ability to tie diagnostic data to the execution of real user business transactions which are slow or breaching associated performance thresholds. When Ops picks up the phone to an angry user, users don’t complain about CPU utilization, thread synchronization or garbage collection. Users complain about specific business transactions they are trying to complete like login, search or purchase.
As I outlined above in the Problem Isolation section, monitoring solutions can leverage behavioral learning technology to monitor the flow execution of business transactions across distributed application infrastructure. This capability can also be extended inside an application tier, so monitoring solutions can learn, and monitor, the relevant code execution of a slow or breaching business transaction.
For example, here is a screenshot which shows the complete code execution (diagnostic data) of a distributed Checkout business transaction which took 10 seconds. We can see in the top dialogue the code execution from the initial struts action all the way through to the remote Web Service call which took 10 seconds. From this point we can drill inside the offending web service to its related application tier and see its code execution, before finally pinpointing the root cause of the problem which is a slow SQL statement as shown.
Without behavioral learning and analytics, monitoring solutions lack intelligence on what diagnostic data to collect. Some solutions try to collect everything, whilst others limit what data they collect so that their agent overhead doesn’t become intrusive in production environments. The one thing you need when trying to identify root cause is complete visibility, otherwise you begin to make assumptions or guess what might be causing things to run slow. If you only have 10% visibility into the application code in production, then you’ve only got a 10% probability of finding the actual root cause of an issue. This is why users of most legacy application monitoring solutions struggle to find root cause – because they have to balance application code visibility with monitoring agent overhead.
Monitoring today isn’t about collecting everything, its about collecting what is relevant to business impact, so any business impact can be resolved as quickly as possible. You can have all the diagnostic data in the world, but if that data isn’t provided in the right context for the right problem to the right user, it becomes as about as useful as a chocolate teapot.
With applications becoming every increasingly complex, agile, virtual and distributed. Dev and Ops no longer have the time to monitor and analyze everything. Behavioral learning and analytics must help Dev and Ops monitor whats relevant in an application, so they can focus on managing real business impact instead of infrastructure noise. Monitoring solutions must become smarter so Dev and Ops can automate problem identification, isolation and resolution. The more monitoring solutions rely on human intervention to configure and analyze, the more monitoring solutions will continue fail.
If you want to experience how behavioral learning and analytics can automate the way you manage application performance, take a trial of AppDynamics Pro and see for yourself.
UK broadband speeds drop by an average of 35% from their off-peak highs when most people are online in the evening, according to a report.
The research, conducted by the comparison site Uswitch, was based on two million broadband speed tests.
The peak surfing times between 7pm and 9pm were the slowest to be online, the report said.
There were also huge regional variations between evening and early morning surfing times.
The report suggested the best time to be online was between 2am and 3am.
Users in Evesham, Worcestershire, fared worst, according to the survey, with a massive 69% drop-off between off-peak morning and evening surfing.
Those living in Weston-super-Mare did little better with speeds falling from an off-peak average of 9.5Mbps (megabits per second) to 3.4Mbps in the evening – a 64% drop.
The difference was often most noticeable in rural areas where even peak speeds were relatively slow. In Wadebridge, in Cornwall, speeds nearly halved from 4.1Mbps at off-peak times to 2.1Mbps at peak times.
“It really is surprising just how much broadband speeds fluctuate at different times of the day, with drop-offs of almost 70% in some areas of the UK,” said Uswitch’s technology expert Ernest Doku.
“Not many internet users enjoy the maximum headline broadband speeds offered by providers, and certainly not during the working week,” he added.
New rulesBroadband speed is becoming more important as bandwidth-hungry services such as on-demand TV become more popular.
Telecoms regulator Ofcom recently revealed that British households download an average of 17 gigabytes of data every month over their home broadband connections.
That monthly data diet is equivalent to streaming 11 movies or 12 hours of BBC programmes via iPlayer.
Critics say consumers are being misled by internet service providers who continue to advertise their maximum broadband speeds, even though many users do not get them.
New rules from the Committee of Advertising Practice (CAP) say that from April next year providers will no longer be able to advertise maximum speeds for net packages unless 10% of customers receive them.
Almost half of broadband users are now on packages with advertised speeds above 10Mbps but the average broadband speed is 6.8Mbps according to Ofcom.
See what the experts are saying about the APM market. Read Gartner’s comprehensive 2011 “Magic Quadrant for Application Performance Monitoring (APM)” report. It evaluates 29 vendors on completeness of vision and ability to execute.
When I saw this report, I was expecting it to be downloadable.
So here is my downloadable pdf version, which might be a little more manageable.