-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: .NET: initial adr on list keys method #9932
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
--- | ||
# These are optional elements. Feel free to remove any of them. | ||
status: proposed | ||
contact: eavanvalkenburg | ||
date: 2024-12-10 | ||
deciders: sergeymenshykh, markwallace, rbarreto, dmytrostruk, westey-m, eavanvalkenburg, bentho | ||
consulted: | ||
informed: | ||
--- | ||
|
||
# Adding list keys method to VectorStoreRecordCollection interface | ||
|
||
The new [vector store design](./0050-updated-vector-store-design.md) has been released, but a few users noticed that a method to list all existing keys is missing. This decision proposes a new method to list all keys in the vector store. | ||
|
||
## Context and problem statement | ||
|
||
Since we support CRUD operations in our data stores we should also provide a way to keep a collection in sync between our code and the backend database. This is currently not easy since it would required either querying every key to see if it exists or maintaining a separate list of keys in the code. Only when you know which keys are in the database and not in your code can you decide to delete them. | ||
|
||
## Decision drivers | ||
1. We want to provide a way to list all keys in the vector store | ||
2. We want to provide this in a non-breaking way | ||
3. | ||
|
||
## Considered Options | ||
|
||
- Option #1: add a new method to the VectorStoreRecordCollection interface | ||
- Option #2: add a new method to a new separate interface | ||
- Option #3: alter the behavior of the Get/GetBatch methods to allow getting all records (key=None or similar) | ||
|
||
## Option 1: add a new method to the VectorStoreRecordCollection interface | ||
|
||
We can add a new method to the VectorStoreRecordCollection interface to list all keys in the vector store. This method will return a list of keys. | ||
|
||
```csharp | ||
public interface IVectorStoreRecordCollection<TKey, TRecord> : IVectorizedSearch<TRecord> | ||
{ | ||
Task<TRecord?> GetAsync(TKey key, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
IAsyncEnumerable<TRecord> GetBatchAsync(IEnumerable<TKey> keys, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
... | ||
Task<IEnumerable<TKey>> ListKeysAsync(CancellationToken cancellationToken = default); | ||
} | ||
``` | ||
Pros: | ||
- This is a simple and straightforward way to add the functionality | ||
Cons: | ||
- It is a breaking change (at least for csharp, python can do a default implementation that returns None) | ||
|
||
## Option 2: add a new method to a new separate interface | ||
|
||
We can add a separate interface to list all keys in the vector store. This method will return a list of keys. | ||
|
||
```csharp | ||
public interface IVectorStoreRecordCollectionListKeys<TKey, TRecord> : IVectorStoreRecordCollection<TKey, TRecord> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should it inherit |
||
{ | ||
Task<IEnumerable<TKey>> ListKeysAsync(CancellationToken cancellationToken = default); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As described in previous comment, I think there is no need in separate method dedicated to keys, since we can return an entire record, which could be more useful. On the other hand, we may return some extra data, which won't be used by consumer. So, we can implement it in a way that user will be able to decide which properties to return back by providing the list of them in |
||
} | ||
``` | ||
Pros: | ||
- This is a non-breaking change, each vector store record collection must choose to implement this interface. | ||
|
||
Cons: | ||
- This is a more complex way to add the functionality | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we decide to avoid this approach, I think we should add more details here why it's complex, so it will be easier to understand in the future why we decided to go with another option. |
||
|
||
## Option 3: alter the behavior of the Get/GetBatch methods to allow getting all records (key=None or similar) | ||
|
||
This options alters the behavior of the Get methods to allow getting all records, this can be done by passing a special key value (e.g. None) to the Get method. | ||
|
||
```csharp | ||
|
||
public interface IVectorStoreRecordCollection<TKey, TRecord> : IVectorizedSearch<TRecord> | ||
{ | ||
Task<TRecord?> GetAsync(TKey? key, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
IAsyncEnumerable<TRecord> GetBatchAsync(IEnumerable<TKey>? keys, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
} | ||
|
||
``` | ||
Pros: | ||
- This allows the other options to be used in concert with getting all records | ||
- This is a non-breaking change | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It depends. For example, if someone already implemented our interface for their own connector and then we make the parameter nullable, it will raise warning |
||
Cons: | ||
- This returns the full records instead of just the keys, which can be useful in some cases, but can be overhead in others. | ||
- It is not immediately clear that passing a special key value will return all records, a dedicated method would be more clear. | ||
|
||
## Options 3b: alter the behavior of the Get/GetBatch methods to allow getting all keys | ||
|
||
This options alters the behavior of the Get methods to allow getting all keys, this can be done by passing a special key value (e.g. None) to the Get method, combined with a change in the GetRecordOptions, to specify Keys only. | ||
|
||
```csharp | ||
public interface IVectorStoreRecordCollection<TKey, TRecord> : IVectorizedSearch<TRecord> | ||
{ | ||
Task<TRecord?> GetAsync(TKey? key, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
IAsyncEnumerable<TRecord> GetBatchAsync(IEnumerable<TKey>? keys, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
} | ||
public class GetRecordOptions | ||
{ | ||
public bool IncludeVectors { get; init; } = false; | ||
public bool KeysOnly { get; init; } = false; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As described in one of the previous comments, accepting a collection of field/column names to return should be more extensible approach, since it will be possible to specify just a key property name as well as key together with other properties that are required in specific case. |
||
} | ||
|
||
``` | ||
Pros: | ||
- This allows the other options to be used in concert with getting all keys | ||
- This is a non-breaking change | ||
- This allows the user to choose if they want the full records or just the keys | ||
Cons: | ||
- This is a more complex way to add the functionality | ||
- It is not immediately clear that passing a special key value will return all records, a dedicated method would be more clear. | ||
|
||
## Naming of the method | ||
|
||
In the examples above the term ListKeys is used, but other terms could be used as well, such as GetKeys, Keys, etc. The term ListKeys is chosen because it is clear that this method will return a list of keys. | ||
|
||
## Decision | ||
Option 1: add a new method to the VectorStoreRecordCollection interface | ||
|
||
This is the cleanest approach, and for Python, the breaking change can be mitigated by providing a default implementation in the interface that returns an empty list. | ||
Comment on lines
+114
to
+116
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before describing |
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep it as an option in this ADR, but just a comment why we shouldn't probably go with this approach - I'm not sure if dedicated method to get only keys is extensible, because another requirement may be to get keys together with specific column/property (e.g.
FirstName
).As an alternative, I would probably add something similar to
GetBatchAsync
method, but withoutkeys
parameter, whereGetRecordOptions
parameter will containFilter
property (similarly toVectorSearchOptions.Filter
). Then, if thisFilter
is null, all records should be returned (but we should think about pagination here), otherwise it will be possible to filter by key or any other property.Another point is that we should come up with some mechanism how to extend vector store functionality. Since it's interface at the moment, we don't want to add more methods to it, because it's going to be a breaking change.
Current interface contains methods to support main scenarios for vector stores, so maybe we can think about adding another interface for such cases that are present in this ADR or consider abstract class. Even if we decide to add it to current interface with breaking change, we should be careful about the size of this interface, to align with Interface Segregation Principle.