Setup LangChain with Azure Foundry (new) model

The documentation on both the LangChain site and the Microsoft site seems to be outdated since the introduction of the Azure Foundry (new) interface, so the URLs for setting up a LangChain model are a bit different. If you have found that the LangChain docs or older Microsoft docs lead to errors, the core issue is likely an outdated endpoint URL structure.

🔗 Identifying the Correct Azure Endpoint

The key change is in the required format for the model endpoint.

  • ❌ Outdated (old format, from the LangChain docs): https://{your-resource-name}.services.ai.azure.com/openai/v1 or https://{your-resource-name}.services.ai.azure.com/models
  • ❌ Incorrect for LangChain (the format shown in the Azure Foundry (new) API call sample): https://{your-resource-name}.cognitiveservices.azure.com/openai/deployments/{your-deployment-name}/chat/completions?api-version=2024-05-01-preview
  • ✅ Correct (new format, required for LangChain): https://{your-resource-name}.cognitiveservices.azure.com/openai/deployments/{your-deployment-name}/

The correct endpoint for use with the langchain-azure-ai package must end right after the deployment name (including the trailing slash).

💻 LangChain Python Usage Example

The following code demonstrates how to correctly set up the necessary environment variables and initialize an agent using the AzureAIChatCompletionsModel class with the new endpoint format.

Prerequisites

You will need the langchain and langchain-azure-ai libraries installed.

pip install langchain langchain-azure-ai

Python Code

from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel
from langchain.agents import create_agent
import os

# Replace the placeholders with your own values
os.environ["AZURE_AI_CREDENTIAL"] = "<<THE API KEY AZURE AI FOUNDRY GIVES YOU>>"
os.environ["AZURE_AI_ENDPOINT"] = (
    "https://<<PROJECT NAME>>.cognitiveservices.azure.com/openai/deployments/<<DEPLOYMENT NAME>>/"
)
os.environ["DEPLOYMENT_NAME"] = "<<DEPLOYMENT NAME YOU GAVE THE MODEL>>"


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"


llm = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=os.environ["AZURE_AI_CREDENTIAL"],
    model=os.environ["DEPLOYMENT_NAME"],
)

agent = create_agent(
    model=llm,
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

# Run the agent
result = agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)
print(result)


How to Set a Retry Policy in Azure API Management (APIM)

Sometimes requests are denied for many reasons (like 429 Too Many Requests) and it is wise to just retry. The retry can be set at multiple levels: in code (with Polly), at the service level, or in the API gateway with a simple policy. In this post I will focus on setting the retry in the Azure API gateway.

The following policy retries backend calls up to three times if they fail with certain status codes, waiting 10 seconds between attempts.

<backend>
  <retry condition="@(context.Response != null && new List<int>() { 403, 404, 500 }.Contains(context.Response.StatusCode))" count="3" interval="10">
    <forward-request buffer-request-body="true" />
  </retry>
</backend>

Breakdown:

  • The retry block sets the conditions under which it retries, as well as how many times (count) and the interval between attempts.
  • The forward-request element is the important part, with buffer-request-body set to true: this ensures that when the request is retried, the same body is sent again. (This was a head-scratcher until we figured out that it is needed.)
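As mentioned above, the same retry behavior can also live in application code instead of the gateway. The following is a minimal Python sketch of the policy's semantics (retry on certain status codes, fixed interval, limited attempts); call_with_retry and the dict-shaped response are illustrative names, not a real client API:

```python
import time

RETRYABLE = {403, 404, 500}  # mirror the status codes from the policy

def call_with_retry(send, count=3, interval=10, sleep=time.sleep):
    """Call send() once, then retry up to `count` more times while the
    response has a retryable status code, waiting `interval` seconds
    between attempts (the same shape as the APIM retry policy)."""
    response = send()
    for _ in range(count):
        if response["status"] not in RETRYABLE:
            break
        sleep(interval)
        # Re-send the full request, body included -- this is what
        # buffer-request-body="true" guarantees on the gateway side.
        response = send()
    return response

# Demo with a stub backend that fails twice, then succeeds:
calls = []
def flaky():
    calls.append(1)
    return {"status": 500} if len(calls) < 3 else {"status": 200}

waits = []
result = call_with_retry(flaky, count=3, interval=10, sleep=waits.append)
print(result["status"], len(calls), waits)  # 200 3 [10, 10]
```

The injectable sleep parameter keeps the sketch testable without actually waiting between attempts.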

Adding logging to the retry

To better understand when and why retries happen, you can log each attempt using trace and custom variables. Here’s a more complete example that tracks retry counts and logs the retried operation.

<policies>
    <inbound>
        <base />

        <set-variable name="someVariableFromRequest" value="@(context.Variables.GetValueOrDefault<JObject>("requestBody")?["someVariableFromRequest"]?.ToString())" />

        <set-variable name="retryCount" value="0" />
        ...
    </inbound>
    <backend>
        <retry condition="@(context.Response != null && new List<int>() { 403, 404, 500 }.Contains(context.Response.StatusCode))" count="3" interval="10">
            <choose>
                <when condition="@(context.Response != null && new [] { 403, 404, 500 }.Contains(context.Response.StatusCode))">
                    <set-variable name="retryCount" value="@((Convert.ToInt32(context.Variables.GetValueOrDefault<string>("retryCount", "0")) + 1).ToString())" />
                    <trace source="RetryPolicy" severity="information">@( "Retrying {Operation Name} request with parameter " + context.Variables.GetValueOrDefault<string>("someVariableFromRequest") + ". Attempt " + context.Variables.GetValueOrDefault<string>("retryCount") )</trace>
                </when>
            </choose>
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    ...
</policies>

This approach makes it easier to see retry attempts in APIM’s trace logs, including which operation retried and what parameter values were used.


Getting Thread Message History in Azure AI Foundry with Python

When building an agent with Azure AI Foundry, you’ll often need to look back at the conversation so far. Whether you’re debugging, showing the conversation for reference, or implementing agent “memory”, fetching the thread message history is essential.

Install dependencies

You’ll need these packages:

  • azure-ai-projects
  • azure-ai-agents
  • azure-identity

Install them with pip:

pip install azure-ai-projects azure-ai-agents azure-identity

Initialize the client

from azure.ai.projects.aio import AIProjectClient
from azure.identity.aio import DefaultAzureCredential

client = AIProjectClient(
    endpoint="<endpoint of your Azure AI Foundry project>",
    credential=DefaultAzureCredential(),
)

Get thread ID

To fetch history, you need a thread ID. You can either persist it when creating threads in code or find it in the Azure AI Foundry portal.

List all messages in a thread

The simplest way is to iterate over all messages with the async list method:

agents_client = client.agents
msgs = agents_client.messages.list(thread_id=thread_id)
async for msg in msgs:
    ...

Limiting Results (API Calls)

The limit parameter is confusing. It does not cap the total number of messages returned—it only controls how many items are retrieved per API call. For example, limit=3 still fetches the entire history, just in smaller batches.

To truly process a limited number of messages, use paging and break early:

messages = agents_client.messages.list(thread_id=thread_id, limit=3)

for i, page in enumerate(messages.by_page()):
    print(f"Items on page {i}")
    for message in page:
        print(message.id)
    # break after first page if only X items are needed

This will produce output like this:

Items on page 0
msg_1
msg_2
msg_3
Items on page 1
msg_4
msg_5
msg_6
Items on page 2
msg_7

If you only want the first N messages, you can exit the loop after processing the desired count.
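The "collect the first N and stop" pattern can be wrapped in a small helper. The following is a minimal, self-contained sketch that works against any async iterable of messages (such as what messages.list(...) returns); first_n and fake_messages are made-up names for illustration, with a stand-in async generator instead of a real thread:

```python
import asyncio

async def first_n(messages, n):
    """Collect at most `n` items from an async iterator, then stop.

    Breaking out early means later pages are never requested, which
    avoids fetching the entire history just to use a few messages."""
    collected = []
    async for msg in messages:
        collected.append(msg)
        if len(collected) >= n:
            break
    return collected

# Stand-in for agents_client.messages.list(thread_id=...):
async def fake_messages():
    for i in range(1, 8):
        yield f"msg_{i}"

result = asyncio.run(first_n(fake_messages(), 3))
print(result)  # ['msg_1', 'msg_2', 'msg_3']
```

With a real client you would pass the iterator from messages.list(thread_id=thread_id, limit=...) instead of the fake generator.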


Electric vs Gas Car Cost Calculator

[Interactive calculator: a diagonal comparison table of fuel consumption (L/100km) versus electric consumption (kWh/100km, columns 10–40), which updates based on the electricity and fuel prices you enter.]

Entity Framework Query Optimization

Entity Framework (EF) is a powerful Object-Relational Mapping (ORM) framework for .NET applications. It simplifies data manipulation, but without proper query optimization it can lead to suboptimal performance. This post walks through an example of EF query optimization in C#, plus general best practices, to enhance application efficiency and response times.

Making queries better

In the first example, the query uses Any(), but EF interprets it more complexly than necessary and generates suboptimal SQL for our case.

After inspecting the SQL that EF will execute against the database, we can experiment with different ways of writing the query. Rewriting this example with Contains() generated much simpler SQL for our case, and hence better performance.

Takeaway: Take a look at the SQL that EF generates from the queries you use in order to find performance gains.
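The post does not show the original query, but the Any() vs Contains() rewrite can be illustrated with a hypothetical EF Core query; db, Orders, and CustomerId are made-up names, and the exact SQL EF emits may differ by version:

```csharp
var ids = new List<int> { 1, 2, 3 };

// Any() over an in-memory list: EF may translate this into a more
// convoluted SQL shape than the filter actually needs.
var ordersAny = db.Orders
    .Where(o => ids.Any(id => id == o.CustomerId))
    .ToList();

// Contains() expresses the same filter, and EF typically translates
// it into a simple IN clause, e.g.:
//   SELECT ... FROM Orders WHERE CustomerId IN (1, 2, 3)
var ordersContains = db.Orders
    .Where(o => ids.Contains(o.CustomerId))
    .ToList();
```

You can verify what either version actually produces by logging the generated SQL (for example via ToQueryString() in EF Core) before and after the rewrite.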

Best Practices for Query Optimization in Entity Framework

  1. Use Projections: Avoid retrieving entire entities if only specific fields are needed. Use .Select() to retrieve only the necessary data.
  2. Filter at the Database Level: Whenever possible, apply filters directly in the query rather than retrieving data and filtering in memory.
  3. Be Mindful of Lazy Loading: Lazy loading can cause performance issues due to multiple round trips to the database. Consider eager loading (using .Include()) when you know related data will be needed.
  4. Use AsNoTracking for Read-Only Data: If the data you’re retrieving won’t be updated in the current context, using .AsNoTracking() can improve performance because EF doesn’t need to track changes.
  5. Benchmark and Profile: Always measure performance changes after optimizations. Use profiling tools to identify slow queries and bottlenecks.