Poor node deduplication - how to have more control? #495

jbartot · 2024-12-19T16:55:32Z

I'm seeing issues with node deduplication. When ingesting transcripts of informal conversations, I get duplicate nodes for what is clearly the same person, e.g., "John Doe" and "John T. Doe". Is there a way to have more control over this, or even a post-training capability to collapse such nodes into a single one?

rabner · 2024-12-19T19:24:36Z

I posted the same question on Discord yesterday. I'm testing LightRAG with scientific papers, which very often use abbreviated names. The same goes for entities that are rephrased.

jbartot · 2024-12-19T23:21:43Z

@rabner - I guess I could try to preprocess the raw text and normalize the names before training, but being able to have some control over the deduping seems like a basic capability that is still missing.
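
For what it's worth, here is a minimal sketch of that preprocessing idea, assuming a hand-maintained alias table. Everything in it (the `ALIASES` table, the `normalize_names` helper, the example names) is hypothetical and not part of LightRAG; it only rewrites the raw text before ingestion.

```python
import re

# Hypothetical alias table: variant spelling -> canonical name.
# In practice this would be maintained by hand or built by a separate step.
ALIASES = {
    "John T. Doe": "John Doe",
    "J. Doe": "John Doe",
}


def normalize_names(text: str, aliases: dict[str, str] = ALIASES) -> str:
    """Replace each known name variant in `text` with its canonical form."""
    if not aliases:
        return text
    # Try longer variants first so an overlapping shorter variant cannot
    # match part of a longer one.
    variants = sorted(aliases, key=len, reverse=True)
    # The lookarounds keep a variant from matching inside a larger word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(v) for v in variants) + r")(?!\w)"
    )
    # Single pass: replaced text is not rescanned.
    return pattern.sub(lambda m: aliases[m.group(0)], text)


if __name__ == "__main__":
    print(normalize_names("John T. Doe spoke first, then J. Doe left."))
```

This would run on each transcript before it is handed to the ingestion step. It only covers variants that are already in the table, so it does not replace proper control over deduplication inside the graph itself.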
