What’s CPU Steal Time?

One of the most important features of the cloud is that multiple tenants share the same resources. This sharing is what enables the cloud operator to provide scalability and support “economies of scale” for its business: the operator can only optimize the utilization of its resources if they are shared between the cloud consumers.

What is Steal Time? The basic metric for how a server utilizes its CPU is the idle capacity – the amount of CPU that is free.

CPU utilization is made up of time allocated to the following:

  • User – the running application
  • System – the operating system
  • Interrupt – hardware interrupts
  • Wait – waiting for I/O jobs to end
  • Steal – cycles the virtual machine was ready to use but that the hypervisor gave to other work
  • Idle – no work is being done

Steal time (ST), also referred to as “Stolen CPU”, exists in virtualized computing environments. It is the time during which the virtual machine had work ready to run but its virtual CPU had to wait, because the hypervisor was allocating the physical CPU cycles to other “external tasks” that are probably caused by one of your noisy neighbors.
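On a Linux guest, the breakdown above can be observed from the cumulative counters on the aggregate "cpu" line of /proc/stat, where steal is the eighth value. The following is a minimal sketch, not a monitoring tool: the function names are illustrative assumptions, and it simply takes two samples and reports each category as a percentage of the elapsed ticks.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat on Linux.
# The trailing guest counters are skipped on purpose: the kernel already
# includes guest time in "user" and "nice", so adding them would double count.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return a dict of cumulative tick counters from a 'cpu ...' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels may omit trailing fields; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Convert two cumulative samples into percentages per category."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()


def sample_steal(interval=1.0):
    """Sample /proc/stat twice and return the steal percentage in between."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return cpu_percentages(before, after)["steal"]
```

Calling sample_steal(5.0) on a Linux instance returns the share of the last five seconds that was stolen. The same value is what tools such as top show in the "st" column, so a persistently high figure is a hint that the instance is competing for the physical CPU.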

Researching this subject on the AWS cloud forums, I found that when CPU utilization spikes for some period of time (configured by the IaaS vendor), the system automatically throttles the CPU back to a low load. This makes sense, as the cloud must protect itself from overload and the threat of a crash. Amazon is transparent on that matter:

“Micro instances provide a small amount of consistent CPU resources and allow you to burst CPU capacity up to 2 ECUs when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically but very little CPU at other times for background processes, daemons” Read more

Amazon doesn’t detail its Xen configuration; however, it explains:

“The instance is designed to operate with its CPU usage at essentially only two levels: the normal low background level, and then at brief spiked levels much higher than the background level.” Read more

According to what I learned, monitoring CPU with a standard monitoring tool can mislead the cloud user. For example, Linux instances will not report the proper values for CPU usage due to the virtualization layer on the underlying infrastructure. For accurate CPU usage values on EC2 instances, the cloud user should rely only on the CloudWatch metrics and use tools such as Newvem that can handle the TMI (too much information) the virtual layer generates.

Batch and Real-time Workloads

Another important aspect regarding CPU utilization is the workload model. You should differentiate between two workload models – Batch workload and Real-time workload.

>> Batch workload - tolerates shortages better and can wait for capacity to become available. The batch model describes a task that generates steady utilization, or an aggregated amount of CPU usage, so a shortage during a period of heavy utilization can be made up later on.

>> Real-time workload - the balance is never compensated: work delayed by a shortage cannot be made up later, and overloads will be restrained by the cloud operator, as in the AWS Micro instance example above. Moreover, cloud operators tend to favor the batch workload model in order to control loads on their physical layer.

On the Amazon developers forums you can find the following:

“For example, when the occasion comes where I might need to do a “yum update” the system becomes unresponsive within one minute. I would have expected it to do this at three or five minutes, as it has always done, but today this throttling happens at about thirty seconds to one minute.” Check the thread

To use your micro (or any size) instances properly, you need to be able to control the behavior of your online resources. You could also try tuning your web server configuration settings, for example by limiting the number of clients. Stay up to date, and plan on utilizing other AWS services to support your performance and scalability needs. Plan on moving some of the load to other cloud resources so that you maintain reasonable CPU consumption on your EC2 instances.

About the Author

Ofir Nachmani is Chief Evangelist and Community Leader at Newvem. On his previous adventure, he led ClickSoftware’s Cloud adoption initiative. He also held several positions at Zarathustra SaaS development including ContractorOffice.com product manager and company CEO. In 2009, ClickSoftware acquired the AST group and Zarathustra as part of it. Check out his personal cloud computing blog at http://www.iamondemand.com
