Poor node deduplication - how to have more control? #495

Open
jbartot opened this issue Dec 19, 2024 · 2 comments

Comments

jbartot commented Dec 19, 2024

I'm seeing problems with node deduplication. When ingesting transcripts of informal conversations, I get duplicate nodes for what is clearly the same person, e.g., "John Doe" and "John T. Doe". Is there a way to have more control over this, or even a post-training capability to collapse these nodes into a single one?
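
To make the "collapse these nodes" idea concrete, here is a minimal sketch on a plain node list and edge list. It does not use LightRAG's storage or API; `canonical_key` and `merge_duplicate_nodes` are hypothetical names, and the heuristic (lowercase, drop single-letter initials) is only an assumption that happens to cover the "John Doe" / "John T. Doe" case.

```python
import re
from collections import defaultdict


def canonical_key(name):
    """Hypothetical heuristic: lowercase the name and drop one-letter initials."""
    tokens = re.findall(r"[A-Za-z]+", name)
    return " ".join(t.lower() for t in tokens if len(t) > 1)


def merge_duplicate_nodes(nodes, edges):
    """Collapse nodes that share a canonical key and rewrite edges to the survivor."""
    groups = defaultdict(list)
    for node in nodes:
        groups[canonical_key(node)].append(node)

    # Map every node to one representative per group (the longest surface form).
    representative = {}
    for members in groups.values():
        survivor = max(members, key=len)
        for member in members:
            representative[member] = survivor

    merged_nodes = sorted(set(representative.values()))
    # Rewrite edge endpoints; drop self-loops created by the merge and exact duplicates.
    merged_edges = sorted(
        {
            (representative[a], representative[b])
            for a, b in edges
            if representative[a] != representative[b]
        }
    )
    return merged_nodes, merged_edges


nodes = ["John Doe", "John T. Doe", "Acme"]
edges = [("John Doe", "Acme"), ("John T. Doe", "Acme"), ("John Doe", "John T. Doe")]
merged_nodes, merged_edges = merge_duplicate_nodes(nodes, edges)
```

A heuristic this simple has obvious limits: it would also merge two different people such as "John A. Doe" and "John B. Doe", and it would not merge "J. Doe" with "John Doe", so some manual review or a stricter key would still be needed.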

rabner commented Dec 19, 2024

I posted the same question on Discord yesterday. I'm testing LightRAG with scientific papers, which very often use abbreviated names. The same goes for entities that are rephrased.

jbartot commented Dec 19, 2024

@rabner - I guess I could try to preprocess the raw text and normalize the names before training, but being able to have some control over the deduping seems like a basic capability that is still missing.
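
For reference, the preprocessing workaround could look roughly like the sketch below: rewrite known name variants to one canonical form in the raw text before it is ingested. The `ALIASES` table and `normalize_names` are hypothetical, the alias list would have to be maintained by hand (or produced by some other tool), and nothing here touches LightRAG itself.

```python
import re

# Hypothetical alias table: variant surface form -> canonical name.
ALIASES = {
    "John T. Doe": "John Doe",
    "J. Doe": "John Doe",
}


def normalize_names(text, aliases):
    """Replace every known variant in `text` with its canonical name."""
    if not aliases:
        return text
    # Try longer variants first so a short alias never pre-empts a longer one.
    variants = sorted(aliases, key=len, reverse=True)
    # The lookarounds keep a variant from matching inside a longer word.
    pattern = re.compile(
        "|".join(r"(?<!\w)" + re.escape(v) + r"(?!\w)" for v in variants)
    )
    return pattern.sub(lambda m: aliases[m.group(0)], text)


raw = "John T. Doe met J. Doe's brother."
clean = normalize_names(raw, ALIASES)
```

This only catches variants that are already listed, so rephrased or previously unseen forms would still end up as separate nodes.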
