Update _bertopic.py to fix question/ github issue #1696 #1721

Open
wants to merge 1 commit into master

Conversation

jonaslandsgesell commented Jan 3, 2024

As discussed in #1696, this PR updates the docstring to reflect that topic_model.transform(docs)[0][i] can differ from topic_model.transform(docs[i])[0][0].
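To make the discrepancy concrete, here is a minimal sketch of how one might check for it. The helper name find_batch_vs_single_mismatches is hypothetical and not part of BERTopic. It only assumes that topic_model.transform accepts either a single document or a list of documents and returns a (topics, probabilities) pair, as the two indexing expressions above imply.

```python
from typing import Any, List, Sequence


def find_batch_vs_single_mismatches(topic_model: Any, docs: Sequence[str]) -> List[int]:
    """Return the indices i where the batch prediction for docs[i] differs
    from the prediction obtained by transforming docs[i] on its own.

    Hypothetical helper for illustration; not part of BERTopic.
    """
    # Assumption: transform() returns (topics, probabilities), so [0] is the
    # sequence of predicted topics for the whole batch.
    batch_topics = topic_model.transform(list(docs))[0]

    mismatches = []
    for i, doc in enumerate(docs):
        # Predict the same document in isolation and take its single topic.
        single_topic = topic_model.transform(doc)[0][0]
        if batch_topics[i] != single_topic:
            mismatches.append(i)
    return mismatches
```

With a fitted model, calling find_batch_vs_single_mismatches(topic_model, docs) would list the documents whose assigned topic depends on whether they were passed alone or as part of a batch. An empty list means both call styles agreed on that input.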

@MaartenGr
Owner

Thanks for this PR! Could you rephrase the following a bit:

(especially when using the HDBSCAN algorithm)

This makes it seem that this behavior occurs across many different algorithms, when in reality it is HDBSCAN-specific.

@jonaslandsgesell
Author

jonaslandsgesell commented Feb 8, 2024

Sure! Do you have a suggestion for a specific wording?

I am currently lacking the imagination for other ways to express that HDBSCAN is responsible here, given that a pipeline could also run without HDBSCAN (using another component that may or may not behave similarly).

@MaartenGr
Owner

You could do something like this: "A single document or a list of documents to predict the topic(s) for. NOTE: When using HDBSCAN, the prediction might differ depending on whether a single document or a list of documents is passed since it leverages the data points of other documents."
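For reference, the suggested wording could sit in a docstring roughly as sketched below. The class TopicModelSketch and its simplified transform signature are hypothetical placeholders used only to show the layout. They are not the actual contents of _bertopic.py.

```python
class TopicModelSketch:
    """Hypothetical stand-in used only to show the docstring wording in context."""

    def transform(self, documents):
        """Predict topics for new documents (simplified placeholder signature).

        Arguments:
            documents: A single document or a list of documents to predict the
                       topic(s) for. NOTE: When using HDBSCAN, the prediction
                       might differ depending on whether a single document or a
                       list of documents is passed since it leverages the data
                       points of other documents.
        """
        # Illustrative stub: the sketch carries the wording, not any logic.
        raise NotImplementedError("Illustrative stub; no prediction logic here.")
```

Keeping the note inside the documents argument description puts the caveat where users look when deciding how to pass their input.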

I think it's best to stay close to the original documentation and inner workings of HDBSCAN. Off the top of my head, I believe this and this resource are relevant.

Also, a small tip. ChatGPT works wonders for helping with these kinds of issues ;)
