
Considerably slow responses obtaining threads at launch #718

Open
UTMark opened this issue Dec 13, 2023 · 9 comments
Assignees
Labels
memory Issues or questions related to memory scale Issues related to support for higher scale solutions triage

Comments

UTMark commented Dec 13, 2023

Good morning everyone!

I would like to report a bug that could make the application feel unusable if deployed in a production environment. The issue has two parts, but what it amounts to is that the application can take a considerable amount of time to load, because the responses for each chat thread are extremely slow as the threads are loaded in.

Example of how long chat messages take to load on first entry (4–9 s):

[Screenshot: network trace showing 4–9 s responses per chat thread request]

Now imagine if you had 10 threads or 100 threads:

[Screenshot: network trace with many thread requests queued]


Here are some details about the server:

Copilot Build Version: v0.9
SKU: PremiumV3 (P2v3)
CosmosDB: 10K RU each
  • Index usage: 24.4 KB (from the monitor)
  • Size: 384 MB
  • 10K RU allocated per container, though only a tiny fraction is currently used
Memory: AzureCognitiveSearch

Cosmos Index Policy:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [ { "path": "/*" } ],
  "excludedPaths": [ { "path": "/\"_etag\"/?" } ]
}

Thoughts:

I'm not a Cosmos DB expert, so I'm not sure why the little turtle is showing 4–9 s per request for each thread. I am more than willing to investigate with a bit of guidance on what to look into. If this were SQL, I would have guessed an indexing issue, which is why I provided the indexing policy.

In hindsight, if this is a bug and it has a resolution, why not fetch the threads in bulk, cap them at a maximum number, and paginate as the user scrolls through the historical items? I will continue to look into this issue myself, but some of these technologies are new to me. Thank you for your time!

UTMark (Author) commented Dec 15, 2023

I think what may be happening is that the Cosmos DB containers need hierarchical partition keys. For example:

chatmessages: id /chatId /userId
chatparticipants: id /userId /chatId

My reasoning is that the slow database is about 400 MB in size while the fast one is only 2 MB, which suggests some indexing or inefficient querying is going on. I tried just adding the new hierarchical partition keys and setting them in the config, but it broke because the application doesn't understand them, meaning the webapi might also need to be modified. I'm not sure how to proceed. I'm new to both C# and Cosmos DB, so I'm probably wrong. :)
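For reference, a minimal sketch of the kind of hierarchical (sub)partitioning container definition the comment above describes, written as an `@azure/cosmos`-style definition object. The paths are taken directly from the comment and are not verified against the webapi; treat the whole thing as an assumption, not a tested configuration:

```typescript
// Hypothetical container definition for the "chatmessages" container,
// using Cosmos DB hierarchical partition keys (kind "MultiHash").
// Paths come from the comment above and are NOT verified against the webapi.
const chatMessagesContainer = {
  id: "chatmessages",
  partitionKey: {
    paths: ["/chatId", "/userId"], // two-level hierarchical partition key
    kind: "MultiHash",             // marks the key as hierarchical
    version: 2,                    // required for MultiHash keys
  },
};
```

As the commenter notes, simply changing the container definition is not enough: any code that constructs partition key values for reads and writes would have to supply both levels.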

glahaye (Contributor) commented Jan 4, 2024

So I looked at this and saw a few things (and there might be more).

The frontend loads ALL the messages from ALL of the user's chats at login. This is really the heart of the problem.

The frontend first calls the "chats" API to get a list of all of the logged-in user's chats. Then, instead of getting the messages for only the selected chat, it gets all the messages for all of the user's chats.

To alleviate this problem, the frontend could get the messages for only the selected chat and fetch the appropriate messages for other chats as they are selected. Just this change would make a great difference.

In addition to the above frontend idiosyncrasies, I discovered another issue with the "chats/[chatIdGuid]/messages" API. The API takes in an optional maximum count of messages to fetch for a given chat. However, behind the scenes, the server always fetches ALL the messages for a chat from Cosmos DB and then returns only up to the max number requested. As chats grow longer and longer, this could eventually lead to more data than necessary being fetched from Cosmos DB. Even now, we "limit" the number of messages to 100, which is probably more than required.

More concerning is the fact that the messages returned are not the latest ones, but rather the earliest ones! What this means is that once you reach 100 messages in a chat, if you leave the chat and come back to it, you'll get the first 100 messages then all later messages won't appear in the interface although they exist in Cosmos DB. This will mess up chat continuity!

For now, though, to address your problem, limiting the frontend to only fetch messages for the selected chat should alleviate the problem. If you are comfortable with React, you could do this yourself. Otherwise, I'll see when someone can address this internally.
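The suggested frontend change — fetch messages only when a chat is actually selected, and cache what has already been fetched — could be sketched like this. This is a hypothetical TypeScript sketch, not the Chat Copilot code: `fetchMessages` stands in for the real "chats/[chatIdGuid]/messages" API call, and the type shape is invented for illustration:

```typescript
// Hypothetical message shape; the real API returns richer objects.
type Message = { id: string; content: string; timestamp: number };

class ChatMessageStore {
  private cache = new Map<string, Message[]>();

  constructor(
    // Stand-in for the "chats/{chatId}/messages" API call.
    private fetchMessages: (chatId: string, count: number) => Promise<Message[]>,
  ) {}

  // Fetch a chat's messages the first time it is selected;
  // reuse the cached copy on subsequent selections.
  async select(chatId: string, count = 100): Promise<Message[]> {
    const cached = this.cache.get(chatId);
    if (cached) return cached;
    const messages = await this.fetchMessages(chatId, count);
    this.cache.set(chatId, messages);
    return messages;
  }
}
```

With this shape, login only needs the chat list; message requests are spread out over the user's actual navigation instead of all firing at startup.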

UTMark (Author) commented Jan 5, 2024

Thank you for the comprehensive audit! To clarify what probably needs to happen:

  • The initial app load should fetch no more than the first x threads, starting from the most recent
  • The chat history (thread list) should paginate
  • Clicking a thread should load that thread's messages, starting with the most recent and limited to a fixed number
  • Scrolling through the messages inside a thread should paginate appropriately
  • The search filter may need to search titles in the database and cache results, given the pagination limits?

I am learning React and C#, but my main skills are SQL/PHP/JavaScript/CSS. The processes should be familiar, but working with Cosmos DB and C# will be the bottleneck. For instance, pagination is usually done by sorting on an indexed ID or timestamp column, but I think Cosmos/NoSQL uses something called partitions to "join" information? I think today is a good day to start studying these things :) Something this important should probably be left to seasoned experts, but we will look into it regardless and see what we can come up with.
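The pagination steps above map fairly directly onto a Cosmos DB SQL query with `ORDER BY` plus `OFFSET`/`LIMIT`. A hedged sketch of the query the webapi might issue per page — the property names (`chatId`, `timestamp`) are assumptions, not the actual schema:

```typescript
// Build one page of a newest-first message query for a single chat.
// Property names are assumed for illustration; adjust to the real schema.
function buildMessagePageQuery(chatId: string, skip: number, take: number) {
  return {
    query:
      "SELECT * FROM c WHERE c.chatId = @chatId " +
      "ORDER BY c.timestamp DESC OFFSET @skip LIMIT @take",
    parameters: [
      { name: "@chatId", value: chatId },
      { name: "@skip", value: skip },
      { name: "@take", value: take },
    ],
  };
}
```

One caveat worth knowing: `OFFSET`/`LIMIT` in Cosmos DB still consumes RUs proportional to the skipped items, so for deep scrolling, continuation tokens are usually the cheaper pattern.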

PS. We learned about the Azure Cosmos DB dedicated gateway, which might alleviate some of the burden for production use. It doesn't really resolve the existing issues, but it's good to know it exists.

anoopt commented Feb 3, 2024

On top of the above list, I think the information below is related as well: the QueryEntitiesAsync method in CosmosDbContext.cs uses GetItemLinqQueryable to query the data synchronously. Changing it to fetch the data asynchronously using the ToFeedIterator() method should help:

public async Task<IQueryable<T>> QueryEntitiesAsync(Expression<Func<T, bool>> predicate)
{
    // ToFeedIterator() turns the LINQ query into an async, paged iterator
    // instead of evaluating it synchronously in-process.
    using var feedIterator = this._container.GetItemLinqQueryable<T>().Where(predicate).ToFeedIterator();
    var results = new List<T>();
    while (feedIterator.HasMoreResults)
    {
        // Each ReadNextAsync() call fetches one page of results from Cosmos DB.
        var response = await feedIterator.ReadNextAsync();
        results.AddRange(response);
    }
    return results.AsQueryable();
}

The return type remains IQueryable&lt;T&gt;, although at this point the results have already been materialized in memory; the methods that call QueryEntitiesAsync should be updated accordingly.


Also, the 3000 ms interval in the BackendProbe component contributes to the initial load time. I do not know the reason behind setting it to 3000; any input would be appreciated.

</2cents>

@evchaki evchaki closed this as completed Mar 20, 2024
glahaye (Contributor) commented Mar 20, 2024

This has not been addressed yet and is still being reported. Re-opening with an undefined timeline to address this.

@glahaye glahaye reopened this Mar 20, 2024
@evchaki evchaki added memory Issues or questions related to memory triage scale Issues related to support for higher scale solutions labels Mar 20, 2024
glahaye (Contributor) commented Mar 27, 2024

Partial fix: #902

github-merge-queue bot pushed a commit that referenced this issue Mar 27, 2024
…underlying DB level rather than higher (#902)

### Motivation and Context
As described in #718, we load ALL the messages from ALL of the user's chats
in the frontend at login.

Also, no matter how many messages we want, we read them ALL at the DB
level and then throw some away at the service level if the number read is
too high.

### Description
Now, with this change, we actually respect the order, the number to skip
and the number to take of chat messages at the underlying DB level
rather than at higher service level.

This enables us to save some DB activity and eventually make our queries
from the frontend a lot tighter.

This change was made in a simple manner which doesn't change the
underlying architecture.

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
glahaye (Contributor) commented Mar 28, 2024

#902 is now in.

With it, we stop fetching ALL entries and then throwing away those beyond the number of messages we truly want. Additionally, when requesting fewer messages than exist in a chat, we now get the LATEST ones, so we can populate the "meta-prompt" with the latest data instead of ending up with a hole in the chat where only old messages are returned and newer ones never appear.
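The latest-N semantics described here can be illustrated with a small sketch. This is hypothetical TypeScript that emulates the DB-level `ORDER BY`/`LIMIT` in memory rather than calling Cosmos DB; the names are invented for illustration:

```typescript
// Minimal message shape for illustration only.
type Msg = { id: number; timestamp: number };

// Return the latest `take` messages, in chronological order for display.
function latestForDisplay(all: Msg[], take: number): Msg[] {
  // Emulates "ORDER BY c.timestamp DESC OFFSET 0 LIMIT @take" at the DB level...
  const newestFirst = [...all]
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, take);
  // ...then restores oldest-to-newest order for rendering.
  return newestFirst.reverse();
}
```

The key point is that the truncation happens on the newest end of the conversation, so the window of messages shown (and fed to the prompt) always includes the most recent ones.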

The "proper" way of designing this endpoint would probably be to have it support OData syntax, but given that the team working on Chat Copilot has moved on to other tasks, this will do the trick.

#902 was the backend portion of the fix. A change or two are coming for the front end. First, the chat list on the left will be populated without having to fetch all the messages for all the chats when starting the app. Then, chat messages will be loaded when selecting a specific chat.

Coming soon...

KishanJ98 commented:

@glahaye Any updates on the front end part?

ctolkien commented:

This section will also cause N queries to be fired against Cosmos DB, one for each ChatParticipant a user has: 20 chats = 20 queries fired sequentially.

public async Task<IActionResult> GetAllChatSessionsAsync()
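One way to reduce the impact of those N sequential round trips is to issue the per-chat lookups concurrently. The actual service is C#, so this is only a TypeScript sketch of the shape of the fix; `getChatSession` is a hypothetical stand-in for the per-chat Cosmos DB read (a single query with an `IN` clause or `ARRAY_CONTAINS` would cut the request count further):

```typescript
// Hypothetical: fetch all chat sessions concurrently instead of one by one.
// `getChatSession` stands in for the per-chat Cosmos DB read.
async function getAllChatSessions(
  chatIds: string[],
  getChatSession: (id: string) => Promise<{ id: string }>,
): Promise<{ id: string }[]> {
  // Roughly one round trip of latency instead of N,
  // at the cost of N concurrent reads against the database.
  return Promise.all(chatIds.map((id) => getChatSession(id)));
}
```

In the C# controller, the equivalent change would be collecting the tasks and awaiting them together (e.g. with Task.WhenAll) rather than awaiting each lookup inside the loop.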

teamleader-dev pushed a commit to vlink-group/chat-copilot that referenced this issue Oct 7, 2024
…underlying DB level rather than higher (microsoft#902)

No branches or pull requests

6 participants