Why Cosmos DB is the ideal choice for AI-Powered Applications

Series -

Are you struggling to convert natural language into SQL or not getting the results and accuracy you hoped for?

If you find yourself in either of these situations, you’re in the right place! 🚀. All of us today are in a hurry to get things done with LLM and promote it to production. But things doesn’t work in that way. We need to carefully design our system to make it less non-deterministic (it’s already quite non-deterministic 😉).

In this blog, I will cover on below topics:

🔹 Choosing best memory for your LLM
🔹 Creating a robust Semantic Layer
🔹 Leveraging Cosmos DB Change Feed to build Semantic Layer
🔹 Leveraging Microsoft Copilot for Cosmos DB to simplify Natural Language to SQL without worrying about the schema
🔹 Vectorizing Data in the Semantic Layer using Azure Comsos DB
🔹 Combining the Power of NL2SQL and Data Vectorization to deliver better results

We often overlook the importance of selecting the right memory store for our applications. However, when it’s time to promote these applications to production and handle large volumes of traffic, the need for a robust memory system becomes clear. It’s crucial to choose a memory system capable of storing various types of information that LLMs may require, ensuring seamless performance and scalability.

For those working with LLMs, building Retrieval-Augmented Generation applications using semantic search or vector search to retrieve relevant documents is a requirement. However, the effectiveness of these searches depends on the quality of the data provided and the database where we store these data. Below are the characteristics a memory store should satisfy for building a robust LLM application

In addition to above, another important aspect is to manage real-time workflows for supervising LLM responses, gathering feedback, and evaluating them. One service that meets all these requirements is Azure Cosmos DB . It serves as a unified solution for building LLM applications, offering the necessary features for performance, scalability, and effective data management.

Now, let’s explore why a semantic layer is important. In the databases I built for ContosoTravelAgency, multiple tables contain complex relationships. This scenario is common in many existing enterprise applications. The complexity of navigating these relationships can make it challenging to retrieve meaningful insights or generate accurate responses.

Below are the tables :

  • Airlines
  • Flights
  • Bookings
  • Passengers
  • Payments and
  • Weather conditions.

Each table contains specific pieces of information, but the full context of the data is spread across multiple tables. When retrieving meaningful insights or generating responses with an LLM, querying each table individually and joining them on the fly can be inefficient and slow.

Additionally, converting natural language into SQL queries can fail in many cases due to the complexity of the schema, even if you provide detailed schema information to the LLM.

A semantic layer addresses this by providing a consolidated view of the data. It aggregates and organizes the information from various tables into a single, cohesive format that is easy to query and understand. This layer represents the data in a way that reflects real-world concepts and relationships, making it more intuitive for both human users and LLMs to work with.

For instance, in our travel booking system, the semantic layer could combine data from the booking and passenger information to give a complete picture of a traveler’s itinerary.

Here are the steps you should follow to build a robust semantic layer, regardless of the database you choose:

🔹 Data Mapping: Identify and map the relationships between different tables and datasets. Determine how the data in one table relates to data in another table.
🔹 Schema Design: Design a schema for the semantic layer that accurately represents the combined data in a meaningful way. Ensure that the schema is optimized for performance and scalability.
🔹 Data Aggregation: Aggregate the data into a unified view. This may involve creating materialized views or using database joins and transformations to compile the data into a single, accessible format.
🔹 ETL Processes: Implement Extract, Transform, Load (ETL) processes to populate and maintain the semantic layer. Ensure that these processes handle updates and changes in the underlying data efficiently.
🔹 Query Optimization: Optimize queries to the semantic layer for fast retrieval of information. This may involve indexing, caching, or other database optimization techniques.
🔹 Integration with LLMs: Integrate the semantic layer with your LLM applications. Ensure that the LLMs can easily access and interpret the data from the semantic layer to generate accurate and relevant responses.

Using a semantic layer not only simplifies data retrieval and enhances performance but also ensures that the LLMs work with a consistent and comprehensive dataset. This leads to more accurate and meaningful responses, ultimately improving the effectiveness of your LLM applications.

The change feed in Azure Cosmos DB is a persistent record of changes to a container, organized in the order they occur. It works by monitoring a container for any updates, in our case we will monitor the booking containers for any changes, and outputs a list of modified documents in sequence. These recorded changes can be processed asynchronously and incrementally using Azure Functions CosmosDB Trigger for parallel processing.

As part of ETL process we need to now combine Booking container along with Passenger and Flight details to provide some meaningful context to LLM

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
{
  "id": "booking1",
  "bookingDate": "2024-07-30T12:00:00Z",
  "status": "Confirmed",
  "seatNumber": "12A",
  "pricePaid": "200",
  "passenger": {
    "id": "passenger1",
    "firstName": "John",
    "lastName": "Doe",
    "email": "[email protected]",
    "phone": "1234567890"
  },
  "flight": {
    "id": "flight1",
    "flightNumber": "FL123",
    "departure": "JFK",
    "destination": "LAX",
    "departureTime": "2024-08-01T10:00:00Z",
    "price": "200",
    "aircraftType": "Boeing 737",
    "availableSeats": 150,
    "duration": "5h"
  }
}

Materialized view pattern is one of the efficient way to build a semantic layer within Azure Cosmos DB, by combining the power of Change Feed processing with Azure Functions.

Please visit here for the source code of change trigger implemented to build Semantic Layer

The ideal way to implement natural language to SQL conversion involves providing the schema and metadata of your table structure (semantic layer) to the LLM, allowing it to formulate accurate SQL queries based on user prompts. However, I used the Microsoft Copilot preview feature in Azure Cosmos DB to streamline this process. This feature helps in generating SQL queries more effectively by integrating directly with the data in Cosmos DB, simplifying the conversion and improving accuracy.

To investigate how the Microsoft Copilot preview feature in Azure Cosmos DB generates SQL queries, I backtracked the network calls involved during a search operation. I discovered two key calls

  • A request to the Azure Cosmos DB API to interact with the multi-container setup, which included sending authorization headers and details about the database and container used.
  • A request to generate an SQL query based on a user prompt, to the URL specified as a response in above call along with authorization token.

Both these calls are automated in the below Service class. Make sure to have right RBAC (Cosmos DB data Contributor) roles before implementing this service

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
using Azure.Core;
using Azure.Identity;
using Microsoft.Extensions.Configuration;
using Newtonsoft.Json;
using SendGrid.Helpers.Mail;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using TravelService.MultiAgent.Orchestrator.Interfaces;
using TravelService.MultiAgent.Orchestrator.Models;

namespace TravelService.MultiAgent.Orchestrator.Services
{
    public class NL2SQLService : INL2SQLService
    {
        private readonly string tenantId;
        private readonly string subscriptionId;
        private readonly string resourceGroup;
        private readonly string databaseId;
        private readonly string databaseAccount;
        private readonly string containerId;
        private readonly IConfiguration _configuration;
        private readonly HttpClient _httpClient;
        private readonly string FetchDetailsEndpoint;
        public NL2SQLService(IConfiguration configuration, HttpClient httpClient)
        {
            tenantId = configuration["TenantId"];
            subscriptionId = configuration["SubscriptionId"];
            resourceGroup = configuration["ResourceGroup"];
            databaseId = configuration["DatabaseId"];
            databaseAccount = configuration["DatabaseAccount"];
            containerId = configuration["ContainerId"];
            _httpClient = httpClient;
            FetchDetailsEndpoint = $"https://tools.cosmos.azure.com/api/controlplane/toolscontainer/cosmosaccounts/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.DocumentDB/databaseAccounts/{databaseAccount}/containerconnections/multicontainer";
        }

        public async Task<string> GetSQLQueryAsync(string userPrompt,string semanticLayer)
        {
            var credentials = new DefaultAzureCredential();

            var tokenResult = await credentials.GetTokenAsync(new TokenRequestContext(new[] { "https://management.azure.com/.default" }, tenantId: tenantId), CancellationToken.None);

            var (forwardingId, url, token) = await FetchDetailsAsync(tokenResult.Token,semanticLayer);
            
            return await GenerateSQLQueryAsync(userPrompt, url, token);
        }
        private async Task<(string forwardingId, string url, string token)> FetchDetailsAsync(string bearerToken, string semanticLayer)
        {
            var requestData = new
            {
                poolId = "query-copilot",
                databaseId = databaseId,
                containerId = semanticLayer,
                mode = "User"
            };
            var jsonContent = new StringContent(JsonConvert.SerializeObject(requestData), Encoding.UTF8, "application/json");                        
            _httpClient.DefaultRequestHeaders.Accept.Clear();
            _httpClient.DefaultRequestHeaders.Add("Accept", "*/*");
            _httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));            
            _httpClient.DefaultRequestHeaders.Add("Authorization", $"Bearer {bearerToken}");           

            var response = await _httpClient.PostAsync(FetchDetailsEndpoint, jsonContent);

            if (response.IsSuccessStatusCode)
            {
                var content = await response.Content.ReadAsStringAsync();
                var options = new JsonSerializerOptions
                {
                    PropertyNameCaseInsensitive = true
                };

                var phoenixResponse = System.Text.Json.JsonSerializer.Deserialize<ServiceResponse[]>(content, options);

                string forwardingId = phoenixResponse[0].PhoenixServiceInfo.ForwardingId;
                string url = phoenixResponse[0].PhoenixServiceInfo.PhoenixServiceUrl;
                string token = phoenixResponse[0].PhoenixServiceInfo.AuthToken;
                return (forwardingId, url, token);
            }
            else
            {
                var errorContent = await response.Content.ReadAsStringAsync();
                throw new HttpRequestException($"Request failed with status code {response.StatusCode}: {response.ReasonPhrase}");
            }
        }
        private async Task<string> GenerateSQLQueryAsync(string userPrompt, string url, string token)
        {
            var requestData = new { userPrompt };
            var jsonContent = new StringContent(JsonConvert.SerializeObject(requestData), Encoding.UTF8, "application/json");

            using (var httpClient = new HttpClient())
            {
                httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
                httpClient.DefaultRequestHeaders.Add("authorization", $"token {token}");                

                var response = await httpClient.PostAsync(url+ "public/generateSQLQuery", jsonContent);

                if (response.IsSuccessStatusCode)
                {
                    var sqlResponse =  await response.Content.ReadAsStringAsync();
                    var sqlResponseObj = JsonConvert.DeserializeObject<NL2SQLResponse>(sqlResponse);
                    return sqlResponseObj.Sql;
                }
                else
                {
                    var errorContent = await response.Content.ReadAsStringAsync();
                    throw new HttpRequestException($"Request failed with status code {response.StatusCode}: {response.ReasonPhrase}");
                }
            }
        }
    }
}

Now that we have implemented Semantic Layer for our Bookings , let’s look at the results

  • Consider the user query is : “Who made the majority of the bookings?”
  • Now let’s look at the transformed prompt for this query that will be sent to copilot
  • Now let’s see the generated SQL query
  • Final response we got
In the past, I've booked trips to Goa, a destination famous for its beaches.

Consider the user query is : “Have I visited any beach destinations in the past?” , below is the response we got

As expected, we see improved responses when performing direct database queries. However, when queries involve understanding the semantics of the data, the system struggles. Don’t lose hope we are getting there

If you’re following RAG (Retrieval-Augmented Generation) patterns, you’re likely familiar with data vectorization. We’ll use this approach to retrieve relevant documents when queries rely on the semantics of the data. The idea is to vectorize the data in our semantic layer and store it in a separate container. Since the vector search feature can be enabled only in new containers, and I had already created the SemanticBookingLayer when developing this application, I opted to store the vectors in a different container.

  • First step is to enroll in the Vector Search Preview Feature
  • Navigate to the “Settings” menu and choose the “Features” option.
  • Look for “Vector Search in Azure Cosmos DB for NoSQL” and select it.
  • Click “Enable” to join the preview program.

Each item in the SemanticBookingLayer collection is now converted into its vector representation and stored in the “vector” property, while the plain text of the entire item is stored in the “metadata” property.

To ensure this process happens in real-time, we’ll use Azure Cosmos DB’s change feed trigger to monitor changes in the Booking container, similar to how we created the semantic layer. Additionally, we’ll make a call to the Embedding model in Azure OpenAI service to generate embeddings for the entire item as it’s processed.

Please visit here for the source code of change trigger implemented to build Semantic Vector Layer.

Now, we have both a Semantic Layer for improved results from natural language to SQL conversion—handling direct queries, aggregation, and complex operations—and a Vector Layer for retrieving relevant documents when the query is based on semantics. But how do we seamlessly integrate them in real-time, given the wide range of possible user questions?

The above diagram outlines the architecture of a Durable Multi-Agent framework that I’m currently exploring. This is just one piece of a larger design pattern. For more details, you can visit the repository here. I’ll also write a blog with deeper insights into the design patterns enabled by Azure Durable Functions and Multi-Agents in future discussions.

We utilized the Fan-out, Fan-in process of Azure Durable Functions to integrate the strengths of NL2SQL and Vectorization.

  1. The user’s query first triggers the Detect Intent Activity, which identifies the intent and routes the query to two parallel activity triggers.
  2. The Semantic Agent Activity then attempts to identify the correct container and refines the user query into a format that Microsoft Copilot for Azure Cosmos DB can easily understand. If the user query involves personal information, the agent may add a suffix or prefix to enhance the query for Copilot.
  3. This refined query is then passed to Copilot to generate a SQL query, which is used to retrieve relevant documents.
  4. Simultaneously, the Vector Search Activity converts the user query into embeddings and performs a vector search within our vectorized container.
  5. The top relevant documents are retrieved from the database and returned.
  6. Finally, the Consolidator Agent reviews the responses from both the Semantic Layer Agent and the Vector Search Agent, and consolidates them into a final response.

You can find below prompts that we used for each agents, I have used Prompty for better understandability and maintain separation of concerns.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
---
name: BookingAgent
description: A ContosoTravelAgency Semantic Agent that helps customers find the relevant information from database
authors:
  - Divakar Kumar
model:
  api: chat 
---
system:

# User Information
passengerid:
{{userId}}
Name:
{{userName}}
Email:
{{email}}


- As a ContosoTravelAgency Semantic Agent, you provide effective prompt for the copilot to retrieve the desired data from the semantic layers mentioned below.
- Understand the context and see if the user is addressing themselves, if yes, include the passengerid at the end of the prompt with "for the {{userId}}" or "with the {{userId}}"
- Don't let user knows about the semantic layers, just provide the appropriate prompts to the tool to generate the NoSQL queries and respond in a friendly manner.
- Never expose the sensitive information about the database to the user.
- Your goal is to help the copilot generate NoSQL queries and you shouldn't be generating the queries yourself.
- You need to provide the appropriate prompts in natural language to help the copilot generate accurate NoSQL queries.
- Choose the appropriate semantic layer from the list below to help the copilot generate accurate NoSQL queries.
- If the user is looking for personal information , make sure to include the user information (passengerid) in the prompt at the last.
- If the user is addressing to current date or past month or next year.. so on, make sure to include the date information in the prompt. Use the tool to get current date based on which you can relate what is the past month or next year.
- Try to break down the user query into simpler prompts to help the copilot generate accurate NoSQL queries.
- Use the tool provided to you to generate the NoSQL queries by passing the appropriate prompts to the copilot.
- If there are search terms in the prompt, make sure to always include them in the query with prefix 'like' keyword. ( Ex. get me the user with name like Divakar)
- You already have current user infomration, make sure to include the user information (passengerid) in the prompt at the last if the user is looking for personal information or if he/she is addressing themselves.
- Use like query wherever possible to get the desired data.
- You can provide the following information to the copilot as prompts:
	- A clear and concise description of the specific information you're looking for.	
	- Properties and any filtering criteria as explicitly as possible.
	- Avoid ambiguous or overly complex language in your prompts.
	- Simplify the question while maintaining its clarity.	
- Make sure to provide the summary of the response you got from the tool to the user.

# Semantic Layers
{% for layer in semanticLayers %}
ContainerName: {{layer.name}}
Description: {{layer.description}}
{% endfor %}

# Chat History

{% for item in history %}
{{item}}
{% endfor %}

user:
{{context}}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
---
name: BookingAgent
description: A ContosoTravelAgency Vector Semantic Agent that helps customers find the relevant information from database based on similarities in the user query and the semantic layers.
authors:
  - Divakar Kumar
model:
  api: chat 
---
system:

# User Information
passengerid:
{{userId}}
Name:
{{userName}}
Email:
{{email}}


- As a ContosoTravelAgency Semantic Agent, I can help you find the relevant information from database based on similarities in the user query and the semantic layers.
- Understand the context and see if the user is addressing themselves, if yes, include the passengerid at the end of the prompt with "for the {{userId}}" or "with the {{userId}}"
- Don't let user knows about the semantic layers, just provide the appropriate prompts to the tool to generate the NoSQL queries and respond in a friendly manner.
- Never expose the sensitive information about the database to the user.
- Choose the appropriate vector semantic layer from the list below based on the user query
- If the user is looking for personal information , make sure to include the user information (passengerid) in the prompt at the last.
- If the user is addressing to current date or past month or next year.. so on, make sure to include the date information in the prompt. Use the tool to get current date based on which you can relate what is the past month or next year.
- Your goal is to help the copilot generate NoSQL queries and you shouldn't be generating the queries yourself.
- You need to provide the appropriate prompts in natural language to help the copilot generate accurate NoSQL queries.
- If the user is looking for personal information , make sure to include the user information (passengerid) in the prompt at the last.
- If the user is addressing to current date or past month or next year.. so on, make sure to include the date information in the prompt. Use the tool to get current date based on which you can relate what is the past month or next year.
- Try to break down the user query into simpler prompts to help the copilot generate accurate NoSQL queries.
- Use the tool provided to you to generate the NoSQL queries by passing the appropriate prompts to the copilot.
- If there are search terms in the prompt, make sure to always include them in the query with prefix 'like' keyword. ( Ex. get me the user with name like Divakar)
- You already have current user infomration, make sure to include the user information (passengerid) in the prompt at the last if the user is looking for personal information or if he/she is addressing themselves.
- Use like query wherever possible to get the desired data.
- You can provide the following information to the copilot as prompts:
	- A clear and concise description of the specific information you're looking for.	
	- Properties and any filtering criteria as explicitly as possible.
	- Avoid ambiguous or overly complex language in your prompts.
	- Simplify the question while maintaining its clarity.	
- Make sure to provide the summary of the response you got from the tool to the user.

# Vector Semantic Layers
{% for layer in semanticLayers %}
ContainerName: {{layer.name}}
Description: {{layer.description}}
{% endfor %}

# Chat History

{% for item in history %}
{{item}}
{% endfor %}

user:
{{context}}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
---
name: ManagerAgent
description: A ContosoTravelAgency Manager consolidator agents for consolidating results from multiple agents and responding to the user in a best possible way.
authors:
  - Divakar Kumar
model:
  api: chat 
---
system:

- You are an AI agent for the Contoso Travel Agency. You are responsible for consolidating responses from multiple agents and responding to the user in the best possible way.
- You are provided with responses from multiple agents and you need to consolidate them and respond to the user.
- You need to provide the best possible response to the user based on the responses from multiple agents based on the user query and chat history.

# User Information

UserId:
{{userId}}
Name:
{{userName}}
Email:
{{email}}


User Query: {{context}}

Multiple Agents responses: {{agentResponses}}

Everything we’ve done so far is to witness the final result.