04 Feb 2026
In the project I am currently working on, we inherited a codebase written on an older version of the .NET Framework. The ASP.NET part of it still used synchronous controller methods. That hadn't changed in a long time because, why bother if it works? Over time, newer methods were written as async Task methods with async code, so we ended up with both, plus some wiring wherever sync code called into async code. Then a performance problem came up. The issue was weird, and we didn't know whether that wiring was at fault or not. So we went in and converted all the old methods into async methods without changing the logic or anything.
To our surprise, the same code, on the same .NET Framework, on the same machine, started responding about 29% faster. After some minor follow-up fixes, the improvement grew to roughly 44%.
| Iteration | From (ms) | To (ms) | Absolute Improvement (ms) | Improvement (%) |
| --- | --- | --- | --- | --- |
| Sync → Async controllers | 276 | 197 | 79 | 28.6% |
| Sync → Async controllers + code optimizations | 276 | 155 | 121 | 43.8% |
These results caught me off guard, so I wanted to find out what was happening behind the scenes, how async-await actually works, and why it led to such a big performance boost.
What is asynchronous programming?
First, let’s look at a few definitions of asynchronous programming:
[!NOTE] JavaScript
Asynchronous programming is a technique that enables your program to start a potentially long-running task and still be responsive to other events while that task runs, rather than having to wait until that task has finished. Once that task has finished, your program is presented with the result.[^1](https://developer.mozilla.org/en-US/docs/Learn_web_development/Extensions/Async_JS/Introducing)
[!NOTE] Rust
Asynchronous programming is an abstraction that lets us express our code in terms of potential pausing points and eventual results that take care of the details of coordination for us. [^2](https://doc.rust-lang.org/book/ch17-00-async-await.html)
[!NOTE] .NET
Async methods are intended to be non-blocking operations. An await expression in an async method doesn’t block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method.[^3](https://learn.microsoft.com/en-us/dotnet/csharp/asynchronous-programming/task-asynchronous-programming-model#threads)
What we want to achieve with async code is to avoid blocking other operations while a longer-running task is being processed, which makes for a much more pleasant user experience. Think of a button that plays a song in a fully synchronous app: once pressed, nothing else can happen; you cannot pause it, stop it, change the volume, or manage the playlist until the song finishes. Async is what lets the application stay responsive instead of freezing like that.
Similar solutions include events and callbacks. Events let operations run on demand when something is triggered; callbacks are a way to pass an operation into another operation, with the expectation that the second will call the first when needed. Both have their uses, but without oversight they become hard to follow and lead to event spaghetti, callback hell, or other popular code disasters.
The async-await combo became popular because of this. It enables writing code that looks like normal synchronous code while adding asynchronous capabilities in a simple way.
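To make the contrast concrete, here is a minimal sketch, in Python rather than C# since the idea is the same, comparing a callback-style operation with an async-await version. The function names and the simulated delay are invented for illustration.

import asyncio

# Callback style: the next step is passed in explicitly; chaining several
# of these is what leads to "callback hell".
def download_with_callback(url, on_done):
    data = f"contents of {url}"  # pretend this involved a slow network call
    on_done(data)

# async-await style: the same flow reads top to bottom like synchronous code,
# while the runtime suspends and resumes the method for us.
async def download(url):
    await asyncio.sleep(0.1)  # stands in for a real I/O wait
    return f"contents of {url}"

async def main():
    download_with_callback("https://example.com", lambda data: print("callback:", data))
    data = await download("https://example.com")
    print("await:", data)

asyncio.run(main())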
It’s important to clear up two related concepts that people often mix up: concurrency and parallelism. Knowing the difference helps you get the most out of async programming.
Concurrency ≠ Parallelism: understanding the difference
Both seem to do multiple things at once, and they do; the difference is in how they do it.
To better understand the difference, it helps to remember that these notions are about the CPU. The operating system works with other components that also do work, and when we talk about concurrency we are mainly asking what the CPU can do while those components respond. The work that waits on them is usually grouped as I/O-bound tasks: waiting for the disk to read a file's contents, waiting for a network response, or waiting for results from a database. Even with a single-core CPU, we can make better use of it by handling the next task while we wait for an I/O (Input/Output) response.
Picture one chef making a pasta dish. First, a pot filled with water goes on the stove, some salt is added, and the heat is turned on. While waiting for the water to boil, the chef moves on to the next task, like grating some cheese. When the water reaches a boil, the chef returns and puts the pasta in, then does another task while it cooks, and so on.
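Here is a small sketch of that single-chef kitchen using Python's asyncio; the task names and timings are invented purely to show one worker interleaving its waits.

import asyncio

async def boil_water():
    print("put the pot on, waiting for the water to boil...")
    await asyncio.sleep(2)  # I/O-like wait: the chef is free in the meantime
    print("water is boiling")

async def grate_cheese():
    print("grating cheese while the water heats up")
    await asyncio.sleep(1)
    print("cheese grated")

async def single_chef():
    # One chef (one thread) interleaves both tasks instead of
    # standing idle in front of the pot.
    await asyncio.gather(boil_water(), grate_cheese())

asyncio.run(single_chef())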
For parallelism, we need multicore processors, which have been mainstream since around 2005, so anything you run on a modern PC will almost certainly be running on a multicore CPU. Usually, a thread pool manages the queued tasks and the threads available to perform the work.
Extending the previous example, this would mean multiple chefs making the pasta dish: Chef-1 takes care of the pasta, while Chef-2 makes the sauce.
If you observe closely, you can see that in this example, the chefs work in parallel but not concurrently. Each chef handles the entire task, start to finish, by themselves, making each task “synchronous”.
Mixing the two concepts, concurrency and parallelism, would mean Chef-1 putting the water on to boil while, in parallel, Chef-2 cuts onions for the sauce. If Chef-1 finishes first, they can continue by chopping tomatoes for the sauce. When Chef-2 finishes, they can pick up the pasta task, or finish it entirely if Chef-1 is still busy.
This is essentially what async-await does behind the scenes. It splits the work so that the parts blocked on I/O can be resumed later. If the context allows, another thread may pick up the continuation, but that is not guaranteed, which is why concurrency does not always imply parallelism. All the other threads may be busy with other things, in which case the same thread resumes the work when the I/O operation finishes.
In summary, embracing asynchronous programming with async-await in .NET is more than a modern trend. It’s a practical way to achieve real-world performance improvements, often with minimal code changes. By clarifying the concepts of concurrency and parallelism and understanding their impact, we can write applications that are not only faster but also more responsive and maintainable. Revisiting and updating legacy codebases can yield surprising benefits and remind us that sometimes, questioning “what just works” leads to breakthroughs that benefit both developers and users alike.
12 Dec 2025
The documentation on the LangChain site, and also on the Microsoft site, seems to be outdated since the introduction of the new Azure Foundry interface, so the URLs for setting up a LangChain model are a bit different. If you've found that the LangChain documentation or older Microsoft documentation leads to errors, the core issue is most likely an outdated endpoint URL structure.
🔗 Identifying the Correct Azure Endpoint
The key change is in the required format for the model endpoint.
| Status | URL Type | Old/New Format | Example Structure |
| --- | --- | --- | --- |
| ❌ Outdated | LangChain Docs | Old | https://{your-resource-name}.services.ai.azure.com/openai/v1 or https://{your-resource-name}.services.ai.azure.com/models |
| ❌ Incorrect | Found in Azure Foundry (New) API call | Incorrect for LangChain | https://{your-resource-name}.cognitiveservices.azure.com/openai/deployments/{your-deployment-name}/chat/completions?api-version=2024-05-01-preview |
| ✅ Correct | Required for LangChain | New | https://{your-resource-name}.cognitiveservices.azure.com/openai/deployments/{your-deployment-name}/ |
The correct endpoint for use with the langchain-azure-ai package must end right after the deployment name, with no /chat/completions suffix or api-version query string.
💻 LangChain Python Usage Example
The following code demonstrates how to correctly set up the necessary environment variables and initialize an agent using the AzureAIChatCompletionsModel class with the new endpoint format.
Prerequisites
You will need the langchain and langchain-azure-ai libraries installed.
pip install langchain langchain-azure-ai
Python Code
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel
from langchain.agents import create_agent
import os

os.environ["AZURE_AI_CREDENTIAL"] = (
    "THE KEY THAT AZURE FOUNDRY AI GIVES"
)
os.environ["AZURE_AI_ENDPOINT"] = (
    "https://<<PROJECT NAME>>.cognitiveservices.azure.com/openai/deployments/<<DEPLOYMENT NAME>>/"
)
os.environ["DEPLOYMENT_NAME"] = (
    "<<DEPLOYMENT NAME YOU GIVE TO THE MODEL>>"
)

# A simple tool the agent can call
def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# The endpoint must end right after the deployment name
llm = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=os.environ["AZURE_AI_CREDENTIAL"],
    model=os.environ["DEPLOYMENT_NAME"],
)

agent = create_agent(
    model=llm,
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

# Run the agent
result = agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)
print(result)
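The result is the full agent state. If you only want the model's final reply, something like the following usually works (assuming the LangGraph-style state that create_agent returns, where the last message is the assistant's answer):

# The returned state holds the whole message list; the last entry is
# typically the assistant's final reply.
print(result["messages"][-1].content)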
07 Oct 2025
Sometimes requests are denied for many reasons (like 429 Too Many Requests), and it is often wise to just retry. The retry can be set at multiple levels: in code (with Polly), at the service level, or in the API gateway with a simple policy. In this post I will focus on setting up the retry in the Azure API Management (APIM) gateway.
The following policy retries backend calls up to three times if they fail with certain status codes, waiting 10 seconds between attempts.
<backend>
    <retry condition="@(context.Response != null && new List<int>() { 403, 404, 500 }.Contains(context.Response.StatusCode))" count="3" interval="10">
        <forward-request buffer-request-body="true" />
    </retry>
</backend>
Breakdown:
- The retry block sets the condition on which it retries, as well as how many times and at what interval.
- The forward-request element is the important part, with buffer-request-body set to true: this ensures that when the request is retried, the same body is sent again. (This was a head-scratcher until we figured out that it is needed.)
Adding logging to the retry
To better understand when and why retries happen, you can log each attempt using trace and custom variables. Here’s a more complete example that tracks retry counts and logs the retried operation.
<policies>
    <inbound>
        <base />
        <set-variable name="someVariableFromRequest" value="@(context.Variables.GetValueOrDefault<JObject>("requestBody")?["someVariableFromRequest"]?.ToString())" />
        <set-variable name="retryCount" value="0" />
        ...
    </inbound>
    <backend>
        <retry condition="@(context.Response != null && new List<int>() { 403, 404, 500 }.Contains(context.Response.StatusCode))" count="3" interval="10">
            <choose>
                <when condition="@(context.Response != null && new [] { 403, 404, 500 }.Contains(context.Response.StatusCode))">
                    <set-variable name="retryCount" value="@((Convert.ToInt32(context.Variables.GetValueOrDefault<string>("retryCount", "0")) + 1).ToString())" />
                    <trace source="RetryPolicy" severity="information">@( "Retrying {Operation Name} request with parameter " + context.Variables.GetValueOrDefault<string>("someVariableFromRequest") + ". Attempt " + context.Variables.GetValueOrDefault<string>("retryCount") )</trace>
                </when>
            </choose>
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    ...
</policies>
This approach makes it easier to see retry attempts in APIM’s trace logs, including which operation retried and what parameter values were used.
23 Sep 2025
When building an agent with Azure AI Foundry, you'll often need to look back at the conversation so far. Whether you're debugging, showing it for reference, or implementing agent "memory", fetching the thread's message history is essential.
Install dependencies
You’ll need these packages:
- azure-ai-projects
- azure-ai-agents
- azure-identity
Install them with pip:
pip install azure-ai-projects azure-ai-agents azure-identity
Initialize the client
from azure.ai.projects.aio import AIProjectClient
from azure.identity.aio import DefaultAzureCredential

client = AIProjectClient(
    endpoint="Endpoint of your Azure AI Foundry project",
    credential=DefaultAzureCredential()
)
Get thread ID
To fetch history, you need a thread ID. You can either persist it when creating threads in code, or find it in the Azure AI Foundry portal.
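If you create the thread yourself, keeping the ID is just a matter of reading it off the created object. A minimal sketch, assuming the threads.create() operation exposed through client.agents (run it inside an async function, and adjust to your SDK version):

# Create a thread and keep its ID for later history lookups
thread = await client.agents.threads.create()
thread_id = thread.id
print(f"Created thread {thread_id}")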
List all messages in a thread
The simplest way is to iterate over all messages with the async list method:
agents_client = client.agents
msgs = agents_client.messages.list(thread_id=thread_id)

async for msg in msgs:
    ...
Limiting Results (API Calls)
The limit parameter is confusing. It does not cap the total number of messages returned—it only controls how many items are retrieved per API call. For example, limit=3 still fetches the entire history, just in smaller batches.
To truly process a limited number of messages, use paging and break early:
messages = agents_client.messages.list(thread_id=thread_id, limit=3)

# enumerate() does not work on async iterators, so count pages manually
page_number = 0
async for page in messages.by_page():
    print(f"Items on page {page_number}")
    async for message in page:
        print(message.id)
    page_number += 1
    # break after first page if only X items are needed
Running this will produce something like:
Items on page 0
msg_1
msg_2
msg_3
Items on page 1
msg_4
msg_5
msg_6
Items on page 2
msg_7
If you only want the first N messages, you can exit the loop after processing the desired count.
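For example, a rough sketch that collects just the first N messages and then stops (N and the variable names are illustrative; breaking out of the loop stops any further page fetches):

# Collect only the first N messages, then stop paging
N = 5
collected = []
msgs = agents_client.messages.list(thread_id=thread_id, limit=3)
async for msg in msgs:
    collected.append(msg)
    if len(collected) >= N:
        break  # no further pages are requested after this
print([m.id for m in collected])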