
Dynamically Right-Sizing Your Cloud Infrastructure

The benefits of cloud service providers such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform cannot be overstated. With the click of a button (or execution of a script), a user can create anything from a simple website to hundreds of servers, each with hundreds of cores of CPU power. This allows development resources to be created and scaled on demand, giving development teams access to bleeding-edge technologies without having to wait for a dedicated infrastructure team to design, request, and build the supporting infrastructure. However, as we will see, with great power comes great fiscal responsibility.

A common objection to cloud migrations, as well as a motivation behind rolling back cloud systems to traditional on-premises infrastructure, is the persistent cost involved. In an on-prem setup, machines are right-sized up front to account for bursts of resource demand: they are sized to handle some level of peak usage and sit idle when not needed. A high initial cost is paid, but after that investment, costs drop significantly as the servers largely run in maintenance mode. In this case, right-sized infrastructure is tied directly to absolute peak system demand, regardless of how often that capacity is actually required.

In the case of a cloud subscription, the up-front cost is a mere fraction of the up-front investment in physical infrastructure, but the costs persist for the life of the systems – many machines are billed with a formula along the lines of base machine cost * hours of operation. This recurring spend is arguably the most common cost-related objection to cloud providers – physical systems are a one-time expense (until the systems need to be upgraded). That’s not to say significant savings in cloud systems are impossible: cloud services today provide on-demand scaling of resources, allowing you to ensure your systems are not only right-sized for peak demand but also scaled down when demand is lower. There are many tactics for managing cloud costs effectively; this article dives into two ways to alter one or both of the multipliers above to reduce compute costs. Both require insight into how – and perhaps more importantly, when – your systems are being used.
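
To make the billing arithmetic concrete, here is a minimal sketch comparing an always-on high tier against a schedule-based approach. The hourly rates are placeholder assumptions, not any provider's actual pricing; the point is only that shrinking either multiplier – machine cost or hours at that cost – shrinks the bill.

```python
# Illustrative comparison with placeholder hourly rates -- not real provider pricing.
PREMIUM_RATE = 1.50    # assumed $/hour for a high-performance tier
STANDARD_RATE = 0.15   # assumed $/hour for a baseline tier
HOURS_PER_MONTH = 730  # average hours in a month

# Option 1: run the high tier around the clock (sized for the nightly spike).
always_on_cost = PREMIUM_RATE * HOURS_PER_MONTH

# Option 2: run the high tier ~2 hours per day around the spike, baseline otherwise.
premium_hours = 2 * 30
standard_hours = HOURS_PER_MONTH - premium_hours
scheduled_cost = PREMIUM_RATE * premium_hours + STANDARD_RATE * standard_hours

print(f"Always-on high tier: ${always_on_cost:,.2f}/month")  # ~$1,095
print(f"Scheduled scaling:   ${scheduled_cost:,.2f}/month")  # ~$190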

Example 1: Scaling Vertically
You are using Platform-as-a-Service (PaaS) managed services for SQL databases. One such database is used internally for batch processing operations. Investigation into the activity of the resource shows a heartbeat-like graph: extreme write-heavy activity for about one hour every day at 4 AM, corresponding to a scheduled file import, while usage never climbs above 10% the rest of the time. In the on-prem world, the server would be built to handle the spikes and sit idle the rest of the day; taking the same approach with managed resources, however, would incur substantial unnecessary costs for the 23 hours per day during which 90% of the available capacity goes unused.

In this case, we have a very predictable pattern of system demand with respect to time of day and can take proactive action to supply resources as needed. The actual implementation can vary based on cloud provider and personal preference; examples include Automation accounts or Functions in Azure, Lambda functions in AWS, or an orchestrator machine running scheduled tasks in any environment.

The basic logical flow would read as follows:
At 3:50 AM (10 minutes before the daily spike), scale the database server up to the premium tier.

At 5:10 AM (10 minutes after the spike normally ends), verify usage has dropped back to normal levels; if so, scale down to the standard tier.

If usage does not ramp up or drop down within the expected window, notify the relevant teams.
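
This flow can be expressed as a pair of small scheduled jobs. The sketch below is a provider-agnostic Python illustration: scale_database, get_write_utilization, and notify_team are hypothetical helpers standing in for whatever your provider's SDK, CLI, or monitoring API actually offers, and the tier names are assumptions.

```python
# Provider-agnostic sketch of the scheduled scale-up/scale-down flow.
# scale_database, get_write_utilization, and notify_team are hypothetical
# helpers standing in for your provider's SDK, CLI, or monitoring API.
from cloud_helpers import scale_database, get_write_utilization, notify_team  # assumed module

PREMIUM_TIER = "P1"        # assumed tier names; substitute your provider's SKUs
STANDARD_TIER = "S0"
NORMAL_UTILIZATION = 0.10  # ~10% baseline observed outside the import window

def scale_up_for_import():
    """Scheduled for 3:50 AM, ten minutes before the daily file import."""
    scale_database(tier=PREMIUM_TIER)

def scale_down_after_import():
    """Scheduled for 5:10 AM, ten minutes after the import normally ends."""
    if get_write_utilization() <= NORMAL_UTILIZATION:
        scale_database(tier=STANDARD_TIER)
    else:
        # The import is still running (or something unexpected is);
        # stay on the premium tier and let a human investigate.
        notify_team("Database still busy at 5:10 AM; skipped scale-down.")
```

In Azure this might live in an Automation runbook, in AWS a scheduled Lambda function, or anywhere else as a cron-driven script on an orchestrator machine.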

Example 2: Scaling Horizontally
You are operating a set of load-balanced Infrastructure-as-a-Service (IaaS) virtual machines hosting a consumer-facing web application. During off-peak hours, two machines are enough to handle incoming traffic while allowing for failover if necessary. During the workday, traffic gradually ramps up, requiring one to three additional machines, varying by day and time of day. This usage pattern has much more variance than the previous example, with some days seeing higher load than others, but the load never arrives as a sudden spike.

In this scenario, we can be more reactive with our resizing and use threshold-based metrics to tell us when to scale up or down. Again, there are a multitude of ways to accomplish this – for example, a cloud monitor that triggers a scale-out event when the machines reach 60% CPU load, and a second monitor that sends the ‘scale in’ signal when load has dropped below 40% for more than five minutes.

The logical flow here would read as follows:
If total CPU usage exceeds 60%, add another machine to the pool.

If the VM pool contains more than two machines and total usage drops below 40% for more than five minutes, remove one machine from the pool. Report any anomalies to relevant teams.

As you can see, this approach makes no assumptions about time of day: it may expand the pool for three hours beginning at 7 AM on Monday but only need the extra capacity for two hours starting at 1:30 PM on Wednesday.
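
The sketch below shows one way to express this threshold logic, again with hypothetical helpers (get_pool_cpu, add_machine, remove_machine, pool_size, notify_team) in place of a real SDK. In practice this logic usually lives in the provider's built-in autoscale rules or monitoring alerts rather than a hand-rolled loop; the loop form is only meant to make the decision points explicit.

```python
import time

# Threshold-based sketch of the scale-out / scale-in flow.
# All imported helpers are assumptions, not a real cloud API.
from cloud_helpers import get_pool_cpu, add_machine, remove_machine, pool_size, notify_team  # assumed module

MIN_MACHINES = 2           # off-peak baseline, with room for failover
MAX_MACHINES = 5           # baseline plus up to three extra machines
SCALE_OUT_THRESHOLD = 0.60
SCALE_IN_THRESHOLD = 0.40
SCALE_IN_WINDOW = 5 * 60   # seconds load must stay low before scaling in

low_load_since = None
while True:
    cpu = get_pool_cpu()  # average CPU across the load-balanced pool

    if cpu > SCALE_OUT_THRESHOLD:
        if pool_size() < MAX_MACHINES:
            add_machine()
        else:
            # Anomaly: already at maximum size but still overloaded.
            notify_team("Pool at maximum size but CPU still above 60%.")
        low_load_since = None
    elif cpu < SCALE_IN_THRESHOLD and pool_size() > MIN_MACHINES:
        low_load_since = low_load_since or time.time()
        if time.time() - low_load_since >= SCALE_IN_WINDOW:
            remove_machine()
            low_load_since = None
    else:
        low_load_since = None

    time.sleep(60)  # evaluate once per minute
```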

Complex Scaling Strategies in the Real World
In all likelihood, you will see benefit from mixing and matching these (and other) cost-management strategies in your cloud environment. Oftentimes a single piece of the solution architecture will be a candidate for multiple scaling strategies; for example, in an internal build system, one may scale up an individual agent to give more power to resource-intensive jobs while also scaling out to create more build agents and allow builds to run in parallel. This is the principle behind many build orchestration tools, which can spawn ephemeral build agents in response to demand. Similarly, a tax preparation service may need to create significantly more instances of its web server farm around tax day to handle higher levels of internet traffic, while also scaling vertically to increase the processing power of any given database server.
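
As a rough illustration of how the two strategies can coexist on a single system, the sketch below shows a build-farm rebalancer that scales out when jobs are queuing behind busy agents and scales up when a single job outgrows the current agent size. Every helper name here is a placeholder assumption, not a real orchestrator API.

```python
# Hypothetical sketch combining horizontal and vertical scaling for a build farm.
# All imported helpers are assumptions standing in for an orchestrator's API.
from cloud_helpers import (queued_jobs, idle_agents, spawn_ephemeral_agent,
                           resize_agent, largest_queued_job_memory, current_agent_memory)  # assumed module

def rebalance_build_farm():
    # Horizontal: jobs are waiting and no agent is free -> add an ephemeral agent.
    if queued_jobs() > 0 and idle_agents() == 0:
        spawn_ephemeral_agent()

    # Vertical: a queued job needs more memory than any agent offers -> resize one.
    if largest_queued_job_memory() > current_agent_memory():
        resize_agent(memory=largest_queued_job_memory())
```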

Conclusion
Like development itself, a cloud migration is forever a work in progress. Simply lifting and shifting your systems from physical hardware to managed solutions provides immense benefits in ease of use and reliability; however, best practices for things like right-sizing your systems have changed, and there are significant benefits to continually evaluating your system architecture and looking for areas of improvement. By routinely monitoring how and when your systems are most heavily used, you can implement solutions that get the most out of your infrastructure as well as your budget.

 

About the Author:
Chris Gutmanis is an engineering consultant based out of the Milwaukee Development Center. He’s worked as a software and systems engineer and has been focusing lately on DevOps and cloud computing. Chris has a wide range of experience covering everything from startups to large financial services and health care companies. He lives in Milwaukee with his wife, two dogs and three cats and enjoys Brazilian jiu-jitsu, heavy metal music, making guacamole and trips to the dog park.
