Load Testing Best Practices
Load testing is a branch of software testing that measures how many concurrent transactions a system can process per second. Its primary purpose is to find the breaking point of the system’s weakest link, whether that is a database, a storage system, or another component. This reveals the system’s capacity limits and exposes performance bottlenecks and other issues that arise as load increases. Load testing also helps operations teams determine the configuration requirements of a fully scaled platform and predict the associated infrastructure costs.
The terms load testing and performance testing are sometimes used interchangeably, but they have different goals. Load testing finds the system's breaking point in terms of transaction capacity, while performance testing ensures an optimal user experience, typically measured in milliseconds of response time. The two practices are related because performance depends on load conditions and on the system's maximum capacity.
When implemented effectively, load testing provides helpful data to validate system performance, determine scalability limits, identify performance bottlenecks, mitigate performance-related risks, and increase confidence in system reliability. To help your team realize these and other benefits of load testing, this article will present ten load testing best practices for current and future software projects.
Summary of load testing best practices
- Simulate a variety of realistic and expected usage patterns
- Test API endpoints and system components strategically
- Run tests in a production-like environment
- Include lightweight load tests in your test plan
- Find the right testing frequency
- Integrate with the CI/CD pipeline
- Do not forget other types of load tests
- Choose the right tooling for your application
- Use a hosted service
- Track relevant performance metrics
Explanations of load testing best practices
The following sections will expand upon the best practices summarized above and provide practical tips to help your organization gain maximum benefits from load testing.
Simulate a variety of realistic and expected usage patterns
Effective load testing is predicated on a thorough understanding of how users interact with your application. Start by defining user journeys and identifying critical flows, such as login, search, purchase, and other common scenarios users may encounter. Next, identify the specific user actions that make up each journey (such as searching for a particular term, adding an item to a cart, or entering payment details) and script those actions using an appropriate load testing tool. To add variability that more realistically simulates actual user behavior, parameterize user inputs such as usernames, passwords, and search terms.
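As a rough illustration of parameterization, the sketch below builds purchase journeys from a small pool of credentials and search terms. It is tool-agnostic and uses only the Python standard library; the `Step` type, the action names, and the sample data are all hypothetical and would be replaced by whatever your load testing tool and test fixtures provide.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical test data; a real suite would typically load this from a file.
USERS = [("alice", "pw-alice"), ("bob", "pw-bob"), ("carol", "pw-carol")]
SEARCH_TERMS = ["laptop", "headphones", "desk lamp", "usb-c cable"]


@dataclass(frozen=True)
class Step:
    """One scripted user action and the inputs it is parameterized with."""
    action: str
    params: dict


def build_purchase_journey(username: str, password: str, term: str) -> list[Step]:
    """Return the ordered actions for one simulated user's purchase journey."""
    return [
        Step("login", {"username": username, "password": password}),
        Step("search", {"q": term}),
        Step("add_to_cart", {"item": f"first result for {term!r}"}),
        Step("checkout", {"payment": "test-card"}),
    ]


def journeys(count: int, seed: int = 0) -> list[list[Step]]:
    """Build `count` journeys, cycling credentials and sampling search terms."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    creds = itertools.cycle(USERS)
    return [
        build_purchase_journey(*next(creds), rng.choice(SEARCH_TERMS))
        for _ in range(count)
    ]
```

Seeding the random generator is a deliberate choice here: it keeps the input variety while making a failing run repeatable.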
Once you have accomplished the steps above, there are a few other best practices to keep in mind as you refine test cases. First, ensure you are testing more than ideal or complete user journeys. Real users may drop off without completing their actions within a system, and it is therefore essential to test scenarios such as users who fail to set up their accounts completely or add items to a shopping cart without ever checking out. If possible, monitor actual user behavior within your application (using tools such as Fullstory, for example, or simply Google Analytics) and use the data to determine which scripts to write and which test cases to prioritize. Once you have enough data to observe trends, look for moments in which system performance degraded and identify user actions that led to that behavior. Doing so will help pinpoint the components or subsystems that could be contributing to performance issues.
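One way to model incomplete journeys is to attach a continuation probability to each step, so that some simulated users drop off early. The sketch below assumes a four-step journey and made-up probabilities; in practice the numbers would come from your own analytics data.

```python
import random

# Hypothetical probabilities that a user continues past each step.
CONTINUE_PROBABILITY = {
    "login": 0.95,
    "search": 0.80,
    "add_to_cart": 0.50,
    "checkout": 1.00,
}
JOURNEY = ["login", "search", "add_to_cart", "checkout"]


def simulate_journey(rng: random.Random) -> list[str]:
    """Return the steps one simulated user performs before dropping off."""
    performed = []
    for step in JOURNEY:
        performed.append(step)
        if rng.random() >= CONTINUE_PROBABILITY[step]:
            break  # the user abandons the journey after this step
    return performed


def completion_rate(users: int, seed: int = 0) -> float:
    """Fraction of simulated users who reach the final checkout step."""
    rng = random.Random(seed)
    completed = sum(simulate_journey(rng)[-1] == "checkout" for _ in range(users))
    return completed / users
```

With these example probabilities, roughly 38% of simulated users (0.95 × 0.80 × 0.50) reach checkout, so most of the generated load exercises abandoned carts and partial sessions rather than complete purchases.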
Test API endpoints and system components strategically
A comprehensive load testing strategy should combine system-wide (or end-to-end) load tests with more granular tests on individual components or API endpoints. Granular load tests provide insights into the behavior of individual components or subsystems, which can be helpful in identifying areas for performance optimization and assisting with scalability down the line. For example, targeted load testing may identify that an application’s authentication system supports 1,000 concurrent users while the payment processing system supports only 700 users. If the development team wishes to scale the application up to support 900 users, this information would be of greater value than information gained from system-wide tests showing that the application cannot support 900 users without an unacceptable level of performance degradation.
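The reasoning in the example above can be written down in a few lines. This sketch uses the two capacities from the text and assumes, as a simplification, that every user passes through every component, so the end-to-end capacity is bounded by the weakest one. The function names are illustrative only.

```python
# Per-component capacities (concurrent users) from the example above.
CAPACITY = {"authentication": 1000, "payment_processing": 700}


def system_capacity(capacity: dict[str, int]) -> int:
    """Upper bound on end-to-end capacity: the weakest component's limit."""
    return min(capacity.values())


def components_below_target(capacity: dict[str, int], target_users: int) -> dict[str, int]:
    """Map each component that cannot support the target to its shortfall."""
    return {
        name: target_users - limit
        for name, limit in capacity.items()
        if limit < target_users
    }
```

Calling `components_below_target(CAPACITY, 900)` returns `{"payment_processing": 200}`, which tells the team where to direct scaling work. An end-to-end test alone would only report that 900 users is too many.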
Finally, it is worth mentioning that load testing every system component or API endpoint individually is usually not realistic (or advisable) given the time and resources needed to write test scripts and generate the necessary load. Instead, identify the most time-consuming, highest throughput, and most mission-critical components on which to focus granular load testing efforts.
Run tests in a production-like environment
For the most accurate results and the greatest confidence in system performance, load testing should be performed either in production or in an isolated environment identical to production. However, this can be impractical due to time, cost, and other constraints. In general, there are three primary options for load testing environments:
- The production environment. The most realistic way to assess the performance of system components is to test those actual components directly. However, production load testing can be costly and involve several risks, such as negatively impacting system performance for real users, creating side effects (such as unnecessary records in the production database), and skewing system logs or user analytics.
- A production replica. While testing an exact and isolated replica of production mitigates many of the risks of load testing the production environment directly, its primary drawback is the cost of deploying one or more additional full-scale application instances and generating enough load to simulate a representative number of users.
- A small-scale environment similar to production. This is the most flexible and common approach because it allows organizations to load test more quickly and at a lower cost. Running tests on a small-scale environment can involve deploying an identical but scaled-down version of the production application, or it can mean mocking certain services and dependencies to mimic production as closely as is feasible given time and resource constraints. Containerized applications built on a microservices architecture are easier to spin up and tear down for the duration of a test; environments created this way are known as ephemeral environments.
The appropriate testing environment for your application will vary depending on testing goals, testing frequency, and the resources available. Many organizations may choose to utilize small-scale environments for lightweight, more frequent tests while reserving full-scale environments for more comprehensive and larger-scale testing. The following two sections will provide more details on combining different-sized tests and test environments in practice.
Include lightweight load tests in your test plan
As indicated above, running full-scale load tests frequently is not always possible or practical due to time, cost, and system availability. A practical and effective solution to this issue is to run a smaller suite of lightweight tests frequently (for example, after each build) and run full-scale tests for more significant events, such as new releases or in anticipation of high-traffic periods.
There are several ways to decrease the time and resources needed to run load tests. Some common approaches include:
- Running tests with a smaller number of virtual users (VUs). Reducing the number of VUs reduces the computing resources required to generate VUs and run test scripts.
- Running tests on a scaled-down version of the test target. This practice cuts the costs associated with maintaining a larger-scale environment. It also eliminates the risk of disrupting the production environment during testing.
- Utilizing fewer test scripts. Besides requiring fewer computing resources, using a smaller number of test scripts allows you to focus on the essential scenarios and user workflows that significantly impact the system. This also makes analyzing test results more manageable.
- Running tests for a shorter period of time. In addition to the cost and resource benefits described above, this practice allows for more rapid feedback and iteration cycles.
In short, these practices allow organizations to load test in ways that cut costs and align better with Agile software development. Utilizing them in combination will allow your organization to meet testing goals while mitigating many of the drawbacks associated with other load testing strategies.
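The approaches listed above can be combined by deriving a lightweight profile from the full-scale one. The following is a minimal sketch: `LoadTestProfile`, its fields, the script names, and the default fractions are assumptions made for illustration, not settings from any particular tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoadTestProfile:
    virtual_users: int
    duration_seconds: int
    scripts: tuple[str, ...]


# Hypothetical full-scale profile used before major releases.
FULL_SCALE = LoadTestProfile(
    virtual_users=3000,
    duration_seconds=3600,
    scripts=("login", "search", "purchase", "account_setup"),
)


def lightweight(
    profile: LoadTestProfile,
    vu_fraction: float = 0.1,
    duration_fraction: float = 0.1,
    keep_scripts: tuple[str, ...] = ("login", "purchase"),
) -> LoadTestProfile:
    """Derive a cheaper profile: fewer VUs, shorter run, essential scripts only."""
    return replace(
        profile,
        virtual_users=max(1, round(profile.virtual_users * vu_fraction)),
        duration_seconds=max(1, round(profile.duration_seconds * duration_fraction)),
        scripts=tuple(s for s in profile.scripts if s in keep_scripts),
    )
```

Deriving the lightweight profile from the full-scale one, rather than maintaining two separate definitions, keeps the two from drifting apart as scripts are added or renamed.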
Find the right testing frequency
Creating an effective load testing schedule involves careful planning and consideration of various factors, including the size and availability of testing environments, the components to be tested, the number and scale of test scenarios, and the organization’s testing objectives. While no single approach will work in every case, following some best practices will help you determine an appropriate schedule for your application.
First, we recommend utilizing full-scale and small-scale end-to-end load tests with different frequencies. Full-scale tests should simulate anticipated load scenarios against an environment similar or identical to production, while small-scale tests should perform similar tests on a scaled-down version of the test target. Because of their lower cost, small-scale tests can run more frequently or even be integrated into your CI/CD pipeline, and help determine the impact of new software releases on performance. On the other hand, full-scale tests may only be possible to run within specific maintenance windows or in anticipation of a high-traffic period.
For more targeted (or “unit”) tests of subsystems or individual components, scheduling can vary widely depending on the test’s purpose. Some common reasons to run unit load tests include:
- pinpointing bottlenecks (which may have been identified via end-to-end testing)
- examining the behavior of individual components or services, such as image processing, authentication, search, databases, or message queues
- comparing different technology options for a new component or service
Unit load tests are faster to run and cost less than end-to-end tests, which means unit load tests can be scheduled more flexibly. For example, unit load tests could be automated to run with each new release or performed on-demand when making changes to a particular part of the code base.
Integrate with the CI/CD pipeline
Automating load testing within a CI/CD pipeline shifts performance concerns left and quantifies the performance implications of every code or infrastructure change. This helps development teams refactor inefficient code or evaluate performance tradeoffs early before changes reach production. In addition, load testing in automated CI/CD processes typically requires more thoughtful planning and means that load tests will be run with greater frequency. This allows development teams to track a narrower and more relevant subset of test data over time to identify patterns, trends, and potential performance bottlenecks that may need to be addressed as areas of concern or optimization down the line.
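A common way to wire load test results into a pipeline is a gate that compares the current run against a stored baseline and fails the build on a regression. The sketch below assumes metrics where lower is better and a 10% tolerance; the metric names and the threshold are illustrative, not prescribed values.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.10,
) -> list[str]:
    """Describe each metric that is worse than its baseline by more than `tolerance`.

    Assumes lower values are better for every metric (latency, error rate).
    """
    regressions = []
    for name, base in baseline.items():
        value = current.get(name)
        if value is None:
            regressions.append(f"{name}: missing from current results")
        elif value > base * (1 + tolerance):
            regressions.append(
                f"{name}: {value:g} exceeds baseline {base:g} by more than {tolerance:.0%}"
            )
    return regressions


def gate(baseline: dict[str, float], current: dict[str, float]) -> int:
    """Print any regressions and return a process exit code (0 = pass, 1 = fail)."""
    problems = find_regressions(baseline, current)
    for problem in problems:
        print(f"PERF REGRESSION {problem}")
    return 1 if problems else 0
```

In a pipeline step, the return value of `gate(...)` would be passed to `sys.exit` so that a regression fails the build. Treating a missing metric as a failure is a conservative choice that prevents a broken test run from passing silently.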
Do not forget other types of load tests
There are several types of load tests, each accomplishing different testing goals and focusing on different aspects of system behavior. Some common load test types and their uses are listed below.
- Anticipated load tests measure system performance under expected load patterns and real-world conditions.
- Stress tests measure system performance under extreme or beyond-normal load conditions, often in anticipation of a high-traffic event (such as a sale) or simply to observe how the system responds and recovers when pushed beyond anticipated limits.
- Soak tests (or endurance tests) measure system performance over an extended period of sustained load. Their primary purpose is to determine whether the system can maintain stable performance, functionality, and resource utilization over an extended duration, such as hours, days, or even weeks.
While it is not realistic to perform all types of load tests frequently (soak tests, for example, may only be performed quarterly or biannually due to cost and time constraints), including each type of load test in your test plan provides a more complete picture of system performance, reliability, and scalability.
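The three test types differ mainly in the shape of the load over time. The sketch below expresses each as a list of `(duration_seconds, target_virtual_users)` stages. That representation, along with the default ramp times, multiplier, and durations, is an assumption chosen for illustration.

```python
Stage = tuple[int, int]  # (duration in seconds, target number of virtual users)


def anticipated_profile(expected_vus: int, ramp_s: int = 300, hold_s: int = 1800) -> list[Stage]:
    """Ramp up to the expected load, hold it, then ramp down."""
    return [(ramp_s, expected_vus), (hold_s, expected_vus), (ramp_s, 0)]


def stress_profile(
    expected_vus: int, multiplier: float = 2.0, steps: int = 4, step_s: int = 300
) -> list[Stage]:
    """Climb in equal steps to a peak beyond the expected load, then release."""
    peak = expected_vus * multiplier
    stages = [(step_s, round(peak * i / steps)) for i in range(1, steps + 1)]
    stages.append((step_s, 0))  # ramp down so recovery can be observed
    return stages


def soak_profile(expected_vus: int, hours: int = 8, ramp_s: int = 300) -> list[Stage]:
    """Hold the expected load for many hours to surface slow degradation."""
    return [(ramp_s, expected_vus), (hours * 3600, expected_vus), (ramp_s, 0)]


def total_duration(stages: list[Stage]) -> int:
    """Total run time of a profile in seconds."""
    return sum(duration for duration, _ in stages)
```

For an expected load of 1,000 VUs, `stress_profile(1000)` climbs through 500, 1,000, 1,500, and 2,000 VUs before ramping down, while `soak_profile(1000)` holds 1,000 VUs for eight hours.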
Choose the right tooling for your application
There are many open-source and hosted commercial options for load testing tools. While many offer similar features, such as the ability to generate virtual users and configure different types of HTTP requests, there are a number of features that greatly simplify the process of setting up, running, scaling, and evaluating the results of load tests. Some desirable features include:
- support for tests written in a common or familiar scripting language
- easy scalability (see the next section on hosted load testing tools)
- the ability to run tests and adjust test parameters with minimal effort
In addition to the features above, it is worth considering which metrics the tool allows you to track and whether it supports the creation of custom metrics that can be tailored to your application and testing goals. For more information on how to choose a load testing tool, check out our guide to must-have features for load testing tools.
Use a hosted service
While it is not uncommon for organizations to create custom load testing solutions using open-source tools and in-house or cloud-based computing resources, hosted load testing tools provide several key advantages over other solutions.
First, hosted tools remove the need for your developers to set up and maintain a cluster of machines. This allows for easy scalability without the effort of purchasing and managing load generators. In addition, many hosted tools provide a more intuitive interface and finer-grained control over test results than their open-source counterparts, which can save considerable time and effort when adjusting test parameters and interpreting results. Finally, hosted solutions like Multiple allow teams to manage test scripts and results in the cloud, which facilitates easier collaboration and ensures all team members have access to up-to-date test data.
Track relevant performance metrics
Tracking performance metrics such as response time, throughput, error rates, and resource utilization provides measurable insight into system performance and ensures that the system is operating within acceptable tolerances under different load conditions. However, analyzing test results and drawing relevant conclusions from load testing data can quickly become overwhelming due to the complexity and volume of data generated.
To avoid common pitfalls, ensure that your team has a set of clearly defined performance goals for the system that state acceptable response times given a specific user action and load on the application. For example, a performance goal for an e-commerce application might be: “the home page should load in less than 1 second while 3,000 concurrent users are performing search operations and browsing product pages.” Once performance goals are set, identify specific metrics (for our e-commerce example above, response time and throughput) that will determine whether the system can perform under the given load within acceptable tolerances.
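A performance goal like the one above becomes testable once it is tied to a specific statistic. The sketch below checks the e-commerce goal using the 95th percentile of response times. The choice of the 95th percentile and the nearest-rank method are assumptions; the goal in the text does not say which statistic to use.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of `samples` (pct between 1 and 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_goal(
    response_times_ms: list[float],
    concurrent_users: int,
    *,
    max_ms: float = 1000.0,
    required_users: int = 3000,
    pct: int = 95,
) -> bool:
    """True if the run exercised the required load and stayed under the limit."""
    if concurrent_users < required_users:
        return False  # the goal's load condition was never reached
    return percentile(response_times_ms, pct) < max_ms
```

A percentile is used instead of the mean because a small number of very slow page loads can hide behind a healthy average. Returning `False` when the run never reached 3,000 concurrent users avoids declaring success for a test that did not exercise the stated load.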
Finally, to allow your team more precise control over which metrics to track, we recommend choosing a load testing tool like Multiple that allows you to create custom metrics, such as database size, correctness of API response, Kafka queue length, and more.
Conclusion
Load testing is a complex and iterative process. A system’s capacity will likely be affected by each change to an environment’s configuration or update to middleware components, and test cases may need to be adjusted accordingly. To ensure that your test suite remains relevant and continues to accomplish testing goals, continue to revisit and revise your load testing strategy with each significant release. Although there is no single approach that will work for every application, we hope that the best practices presented in this article will help your team leverage the many benefits of load testing in current and future software projects.