Considerably slow responses obtaining threads at launch #718
Comments
I think what may be happening is that the Cosmos DB database may need hierarchical partition keys for the containers, for example chatmessages: id, /chatId, /userId. My reasoning is that the slow database is about 400 MB in size while the fast one is only 2 MB, which suggests some indexing or odd querying is going on. I tried adding the new hierarchical partitions and setting them in the config, but it broke because the app doesn't understand them, meaning the webapi might also need to be modified. I'm not sure how to proceed. I'm new to both C# and Cosmos, so I'm probably wrong. :)
So I looked at this and saw a few things (and there might be more).

The frontend loads ALL the messages from ALL of the user's chats at login. This is really the heart of the problem. The frontend first calls the "chats" API to get the list of chats for the logged-in user. Then, instead of getting the messages for only the selected chat, it gets all the messages for all of the user's chats. To alleviate this, the frontend could fetch the messages for only the selected chat and fetch the messages for other chats as they are selected. Just this change would make a great difference.

In addition to the above frontend idiosyncrasies, I discovered another issue with the "chats/[chatIdGuid]/messages" API. The API takes an optional maximum count of messages to fetch for a given chat. However, behind the scenes, the server always fetches ALL the messages for a chat from Cosmos DB and then returns only up to the maximum requested. As chats grow longer and longer, this could eventually lead to more data than necessary being fetched from Cosmos DB. Even now, we "limit" the number of messages to 100, which is probably more than required. More concerning is the fact that the messages returned are not the latest ones, but rather the earliest ones! This means that once a chat reaches 100 messages, if you leave it and come back, you'll get the first 100 messages and later messages won't appear in the interface even though they exist in Cosmos DB. This will mess up chat continuity!

For now, though, to address your problem, limiting the frontend to fetch messages only for the selected chat should alleviate it. If you are comfortable with React, you could do this yourself. Otherwise, I'll see when someone can address this internally.
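If it helps anyone looking at the webapi side, here is a minimal sketch of what "get at most N messages for one chat" can look like when the filtering and limiting are pushed down to Cosmos DB instead of reading everything and trimming afterwards. This is illustrative only, not the actual Chat Copilot code; the ChatMessage shape, the chatId/timestamp property names, and the assumption that /chatId is the partition key are all placeholders.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Illustrative message shape; the real type in the repo will differ.
public class ChatMessage
{
    public string id { get; set; }
    public string chatId { get; set; }
    public string content { get; set; }
    public long timestamp { get; set; }
}

public static class ChatMessageQueries
{
    // Fetch at most `count` messages for ONE chat, letting Cosmos DB do the
    // filtering and limiting instead of reading everything and trimming later.
    public static async Task<List<ChatMessage>> GetMessagesForChatAsync(
        Container container, string chatId, int count)
    {
        QueryDefinition query = new QueryDefinition(
                "SELECT TOP @count * FROM c WHERE c.chatId = @chatId ORDER BY c.timestamp DESC")
            .WithParameter("@count", count)
            .WithParameter("@chatId", chatId);

        var messages = new List<ChatMessage>();
        using FeedIterator<ChatMessage> iterator = container.GetItemQueryIterator<ChatMessage>(
            query,
            requestOptions: new QueryRequestOptions
            {
                // Assuming /chatId is the partition key, scoping the query to a
                // single partition avoids a cross-partition fan-out.
                PartitionKey = new PartitionKey(chatId),
            });

        while (iterator.HasMoreResults)
        {
            messages.AddRange(await iterator.ReadNextAsync());
        }
        return messages;
    }
}
```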
Thank you for the comprehensive audit! To clarify what probably needs to happen:
I am learning React and C#, but I'm mainly skilled in SQL/PHP/JavaScript/CSS. The processes should be familiar, but working with Cosmos and C# will be the bottleneck. For instance, pagination is usually done by sorting on an ID or timestamp that is indexed in the database, but I think Cosmos/NoSQL uses something called partitions to "join" information? I think today is a good day to start studying these things. :) However, it's evident that something this important should be left to seasoned experts; we will look into it regardless and see what we can come up with.

PS. We learned about something called the Azure Cosmos DB dedicated gateway, which might alleviate some of the burden for production use. It doesn't really resolve the existing issues, but it's good to know it exists.
On top of the above list, I think the information below is related as well: the QueryEntitiesAsync method in CosmosDbContext.cs uses
The return type is
Also, the interval of
</2cents>
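For anyone digging into this area, here is a rough sketch of the difference between a query helper that filters in memory and one that pushes the filter down to Cosmos DB. The class below is illustrative only, not the repository's actual CosmosDbContext; it just shows why a Func<T, bool> predicate forces a full read of the container, while an Expression<Func<T, bool>> (or explicit SQL) lets the database do the work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

public class CosmosQuerySketch<T>
{
    private readonly Container _container;

    public CosmosQuerySketch(Container container) => this._container = container;

    // Anti-pattern: a Func<T, bool> cannot be translated to Cosmos SQL, so every
    // item in the container is streamed to the client and filtered in memory.
    public Task<IEnumerable<T>> QueryAllThenFilterAsync(Func<T, bool> predicate)
    {
        return Task.Run<IEnumerable<T>>(() =>
            this._container.GetItemLinqQueryable<T>(allowSynchronousQueryExecution: true)
                .Where(predicate)
                .AsEnumerable());
    }

    // Server-side alternative: an Expression<Func<T, bool>> is translated by the
    // LINQ-to-Cosmos provider, so filtering and limiting happen in the database.
    public async Task<List<T>> QueryFilteredAsync(Expression<Func<T, bool>> predicate, int take)
    {
        var results = new List<T>();
        using FeedIterator<T> iterator = this._container.GetItemLinqQueryable<T>()
            .Where(predicate)
            .Take(take)
            .ToFeedIterator();

        while (iterator.HasMoreResults)
        {
            results.AddRange(await iterator.ReadNextAsync());
        }
        return results;
    }
}
```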
This has not been addressed yet and is still being reported. Re-opening with an undefined timeline to address this.
Partial fix: #902
…underlying DB level rather than higher (#902)

### Motivation and Context
As described in #718, we load ALL the messages from ALL of the chats of the user in the frontend at login. Also, no matter how many messages we want, we read them ALL at the DB level and then throw some away at the service level if the number read is too high.

### Description
With this change, we now respect the order, the number to skip, and the number to take of chat messages at the underlying DB level rather than at the higher service level. This saves DB activity and eventually lets us make our queries from the frontend a lot tighter. The change was made in a simple manner that doesn't alter the underlying architecture.

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
#902 is now in. With it, we stop fetching ALL entries and then throwing away everything over the number of messages we truly want. Additionally, we now get the LATEST messages when requesting fewer than exist in a chat, so we can populate the "meta-prompt" with the latest data instead of suddenly having a hole in our chat where only old messages are returned and newer ones are not.

The "proper" way of designing this endpoint would probably be to have it support OData syntax, but given that the team working on Chat Copilot has moved on to other tasks, this will do the trick.

#902 was the backend portion of the fix. A change or two are coming for the frontend. First, the chat list on the left will be populated without having to fetch all the messages for all the chats when starting the app. Then, chat messages will be loaded when selecting a specific chat. Coming soon...
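For readers following along, the rough shape of such a query (a sketch under assumed property names, not necessarily the exact query used in #902) is:

```csharp
using Microsoft.Azure.Cosmos;

public static class PagedMessageQuery
{
    // Sketch of a skip/take query evaluated by Cosmos DB itself. Property names
    // (chatId, timestamp) are illustrative. ORDER BY ... DESC returns the latest
    // messages first; OFFSET/LIMIT implement skip/take on the server side.
    public static QueryDefinition Build(string chatId, int skip, int take) =>
        new QueryDefinition(
                "SELECT * FROM c WHERE c.chatId = @chatId " +
                "ORDER BY c.timestamp DESC OFFSET @skip LIMIT @take")
            .WithParameter("@chatId", chatId)
            .WithParameter("@skip", skip)
            .WithParameter("@take", take);
}
```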
@glahaye Any updates on the front end part?
This section will also cause N queries to be fired against Cosmos, one for each ChatParticipant the user has: 20 chats = 20 queries fired sequentially.
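A hedged sketch of how those lookups could be batched into a single query follows; the ChatSession shape and property names are assumptions for illustration, and this is not the current Chat Copilot code.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Illustrative session shape; the real type in the repo will differ.
public class ChatSession
{
    public string id { get; set; }
    public string title { get; set; }
}

public static class ChatSessionQueries
{
    // Instead of issuing one query per chat id (N round trips), pass the whole
    // list as a single array parameter and let Cosmos DB filter in one query.
    // Note: without a partition key this runs cross-partition, so it trades
    // round trips for fan-out; measure which is cheaper for your data.
    public static async Task<List<ChatSession>> GetSessionsByIdsAsync(
        Container container, IReadOnlyList<string> chatIds)
    {
        QueryDefinition query = new QueryDefinition(
                "SELECT * FROM c WHERE ARRAY_CONTAINS(@chatIds, c.id)")
            .WithParameter("@chatIds", chatIds);

        var sessions = new List<ChatSession>();
        using FeedIterator<ChatSession> iterator =
            container.GetItemQueryIterator<ChatSession>(query);

        while (iterator.HasMoreResults)
        {
            sessions.AddRange(await iterator.ReadNextAsync());
        }
        return sessions;
    }
}
```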
Good morning everyone!
I would like to report a bug that could make the application feel unusable if it were deployed in a production environment. The issue has two parts, but overall it amounts to a considerable wait for the chat application to load, due to extremely slow responses for each chat thread as it is loaded in.
Example of how long chat messages take to load on first entry (4-9 s):
Now imagine if you had 10 threads or 100 threads:
Here are some details about the server:
Copilot Build Version: v0.9
SKU: PremiumV3 (P2v3)
CosmosDB: 10K RU each
Memory: AzureCognitiveSearch
Cosmos Index Policy:
{ "indexingMode": "consistent", "automatic": true, "includedPaths": [ { "path": "/*" } ], "excludedPaths": [ { "path": "/\"_etag\"/?" } ] }
Thoughts:
I'm not a CosmosDB expert, so I'm not sure why the little turtle is showing between 4 s and 9 s per request for each thread. I am more than willing to investigate with a bit of guidance on what to look into. If this were SQL, I would have imagined it was an indexing issue, which is why I provided the indexing policy above.
But in hindsight, I was wondering: if this is a bug and has a resolution, why not obtain the threads in bulk, limit them to a certain maximum number, and paginate as the user scrolls down the historical items? I will continue to look into this issue myself, but some of these technologies are new to me. Thank you for your time!
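For what it's worth, Cosmos DB supports this kind of scroll-style pagination natively through continuation tokens. Below is a minimal sketch of fetching one page at a time and handing the continuation token back to the caller; it reuses the illustrative ChatMessage shape from the sketch further up, and none of it is the actual Chat Copilot code.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// One page of chat messages plus the token needed to request the next page,
// e.g. when the user scrolls further back in the history.
public sealed class MessagePage
{
    public List<ChatMessage> Items { get; } = new List<ChatMessage>();
    public string ContinuationToken { get; set; }
}

public static class MessagePaging
{
    public static async Task<MessagePage> GetPageAsync(
        Container container, string chatId, int pageSize, string continuationToken)
    {
        QueryDefinition query = new QueryDefinition(
                "SELECT * FROM c WHERE c.chatId = @chatId ORDER BY c.timestamp DESC")
            .WithParameter("@chatId", chatId);

        using FeedIterator<ChatMessage> iterator = container.GetItemQueryIterator<ChatMessage>(
            query,
            continuationToken,                                    // null for the first page
            new QueryRequestOptions { MaxItemCount = pageSize }); // max items per round trip

        var page = new MessagePage();
        if (iterator.HasMoreResults)
        {
            FeedResponse<ChatMessage> response = await iterator.ReadNextAsync();
            page.Items.AddRange(response);
            // Null when there are no further results; otherwise hand it back so the
            // next scroll event can resume exactly where this page ended.
            page.ContinuationToken = response.ContinuationToken;
        }
        return page;
    }
}
```

Each page then costs only the request units for the items it actually returns, and the frontend can ask for the next page when the user scrolls.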