Series 1: Concurrency experiment in 1 vCPU Azure Function Host
There are a lot of misconceptions about how servers handle concurrent requests. Some believe that requests are processed in parallel, and others disagree. Actually, both answers are correct! Let's find out how it actually works in this blog.
How concurrent requests are handled depends entirely on whether your server runs on a single-core or a multi-core processor. That doesn't mean a single-core processor can't handle multiple requests.
Each core can run more than one thread, and these threads do not necessarily execute simultaneously. However, they are managed in a way that gives the illusion of simultaneous execution. This is referred to as concurrency.
A multi-core processor, on the other hand, can achieve actual simultaneous execution, commonly referred to as parallelism.
Imagine 100 concurrent requests hit your web server at the same time. On a single-core processor, the application could start by processing the first request, and while that request waits on an I/O task (such as waiting for data from the network, querying a database, or reading a file), the processor context-switches to a different task that handles the second request, and so on. That is how it achieves concurrency (the illusion of simultaneous execution): through asynchronous operations and by switching between requests quickly.
On a multi-core processor, by contrast, actual parallelism (simultaneous execution of different requests) can be achieved. Each core can handle a separate request simultaneously, leading to faster overall processing. Hence multi-core processors will generally have better throughput than single-core processors.
- Single-core processor
Limited parallelism: as we have only one physical processing unit, tasks won't execute simultaneously, though we can still make progress through asynchronous operations and context switching.
- Multi-core processor
True parallelism: as we have multiple physical processing units, we can execute tasks on different cores simultaneously as well as concurrently.
- Concurrency involves the execution of multiple tasks that appear to run simultaneously, while parallelism is the actual simultaneous execution of multiple tasks across multiple processors or cores.
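The single-core behaviour described above can be illustrated with a small sketch. It is not the code used in this experiment; it is a Python stand-in that runs 100 simulated requests on one thread, where `handle_request`, `serve` and the 0.1 s delay are names and values assumed purely for illustration. Each request "waits on I/O" with `asyncio.sleep`, and the event loop switches to another request during that wait.

```python
import asyncio
import time


async def handle_request(request_id: int, io_delay: float = 0.1) -> int:
    # Simulated I/O wait (network, database, file). While this request
    # is waiting, the event loop is free to switch to another request.
    await asyncio.sleep(io_delay)
    return request_id


async def serve(n_requests: int = 100, io_delay: float = 0.1):
    # Start all requests on a single thread and wait for every one to finish.
    started = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(i, io_delay) for i in range(n_requests))
    )
    elapsed = time.perf_counter() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve())
    print(f"{len(results)} requests finished in {elapsed:.2f}s on one thread")
```

Handled strictly one after another, the 100 waits would add up to about 10 seconds. Because the waits overlap, the total stays close to a single wait, even though nothing runs in parallel.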
In Azure, I couldn't find a direct mapping between vCPUs and the number of cores. Each CPU can have one or more cores. Our personal laptops typically have multiple cores (you can see this in Task Manager -> Performance), but that is not necessarily the case in the cloud. In the Azure pricing calculator, some virtual machine series are specified in terms of cores and others in terms of vCPUs. The A and B series virtual machines clearly state how many cores they have, while the other series are specified in vCPUs.
Typically, a core is a physical unit of a CPU, and a general assumption is that 1 vCPU = 1 physical CPU core. However, this might not be the case in all scenarios.
General guidelines that give an idea of the vCPU:core ratio are available here: https://learn.microsoft.com/en-us/azure/virtual-machines/acu
I have chosen Azure Functions to deploy my code and test concurrency, with the assumption that 1 vCPU is equivalent to 1 core. Azure Functions can scale out to multiple instances based on the events that the scale controller monitors. However, to test concurrency and understand it better, I have limited the maximum scale-out to 1 instance.
Each instance of the Functions host in the Consumption plan is limited, typically to 1.5 GB of memory and one CPU. An instance of the host is the entire function app, meaning all functions within a function app share resource within an instance and scale at the same time. Function apps that share the same Consumption plan scale independently. In the Premium plan, the plan size determines the available memory and CPU for all apps in that plan on that instance. ref: https://learn.microsoft.com/en-us/azure/azure-functions/event-driven-scaling?tabs=azure-cli
The Microsoft article above says that an Azure Functions host typically has one CPU. However, it is not clear whether that means one core or multiple cores.
By the end of this experiment, let's find out whether 1 vCPU has multiple cores or a single core.
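One quick, complementary check is to ask the runtime how many processors it can see. The sketch below is not part of the experiment's code and uses Python rather than the function app's own runtime; `describe_cpus` is a hypothetical helper name. It reports logical processors, which can include hyper-threads and are therefore not necessarily physical cores.

```python
import os


def describe_cpus() -> dict:
    # Logical processors visible to the operating system (may be None on
    # unusual platforms, so fall back to 1).
    logical = os.cpu_count() or 1

    # On platforms that support it (for example Linux), the process may be
    # restricted to a subset of those processors.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical

    return {"logical_processors": logical, "usable_by_process": usable}


if __name__ == "__main__":
    print(describe_cpus())
```

Logging a value like this from inside the host gives a first hint, but it only shows what the platform exposes, so it is worth confirming against observed behaviour.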
- We will test concurrency in Azure Functions by sending 100 concurrent requests using the Apache JMeter testing tool.
- All those requests reach the Azure Functions host at the same time; normally, the scale controller would decide whether to scale out to multiple instances based on the number of events it monitors.
- For testing purposes, however, we have limited the Azure Functions host to a maximum of one instance.
- So all requests reach a single Azure Functions host (1 vCPU), which has to handle them.
- The server achieves concurrency by switching between requests quickly with the help of multiple threads.
Now that we have understood the concepts, let's see how concurrent requests are handled in a 1 vCPU Azure Function. To visualize the concurrency traces in Zipkin/Jaeger/Application Insights, I made a few tweaks to the application code.
Implement OpenTelemetry tracing for the Azure Function HTTP trigger (code is shared below)
We can inject middleware in Azure Function (In-Process) using this package
Deploy the application
Execute the endpoint and grab the parentSpanId and operationId.
I then used these values in the code: traces are created under the parent span ID captured in the previous step. This ensures that we can visualize the traces of all 100 concurrent requests under a single roof.
Code looks like below
- I have created a delegate to simulate the network calls, file reads, or database reads that could happen in any request.
- Delegate code looks like below
- To set up Apache JMeter on Windows, please refer to my blog here :
- Add the path in the HTTP Request sampler as below
- From the visualization above, it is not clear whether the requests were processed concurrently or in parallel.
- Now let's find out with some KQL queries in the Log Analytics workspace for our operation ID.
- As we limited the scale-out to a maximum of one instance, the scale controller doesn't spin up any new instance for the 100 concurrent requests, and all requests were served by Cloud Role Instance: 10-30-14-87.
- The KQL query returns 202 records in total: 100 spans for (/api/concurrency) + 100 spans for (event publish) + 2 spans for the initial request.
- Hmm! Surprisingly, I could see 2 records being processed at exactly the same moment. This should only be possible if there are multiple cores processing the requests simultaneously.
- Hence our assumption about the vCPU:core ratio is wrong, at least in the context of Azure Functions.
- Dynamic concurrency on Azure Blob, Azure Queue, and Service Bus triggers
- Next, I will experiment with concurrency on a single-core processor. I will probably choose a virtual machine with a single core and test my application there.
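As a rough companion to the KQL check in the results above, the same question ("how many spans were in flight at one moment?") can be asked of exported span timestamps. The sketch below is an assumption-laden illustration and not the query used in this post: `max_overlap` is a hypothetical helper, and spans are assumed to be available as (start, end) pairs in any consistent time unit.

```python
from typing import Iterable, Tuple


def max_overlap(spans: Iterable[Tuple[float, float]]) -> int:
    # Turn each span into a +1 event at its start and a -1 event at its end.
    events = []
    for start, end in spans:
        events.append((start, 1))
        events.append((end, -1))

    # Sort by time; at equal timestamps, ends (-1) sort before starts (+1),
    # so spans that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    sample = [(0.0, 5.0), (3.0, 8.0), (4.0, 6.0), (8.0, 9.0)]
    print(max_overlap(sample))
```

A peak above 1 shows that spans were open at the same time. On its own that indicates overlap; whether the overlapping work actually ran on separate cores still has to be judged from the timestamps, as in the results above.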