BERTopic Loading Issue #1764
Comments
Could you also share your imports and clean up the formatting a bit? As it is, the code is difficult to read because it is not wrapped in ```python tags. Also, which version of BERTopic are you using?
Hi Maarten! Sorry, I was in a hurry, which is why I typed so fast. BERTopic version: 0.16.0. My whole code with the imports is below.
I believe this might be related to cuML. Have you searched the open and closed issues for your error message? I remember one or more issues about this stating that simply upgrading your cuML version might fix things.
Hi Maarten! Thank you for your input. The problem can be solved by adding probabilities = probabilities[1] at the beginning of this part of the script. But it's hard to patch the source code every time. We would greatly appreciate it if you could change the source code.
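The workaround described above amounts to unwrapping a tuple before it reaches code that expects an array with a `.shape` attribute. Below is a minimal sketch of that idea only. The helper name `unwrap_probabilities` is hypothetical, the sample values are stand-ins rather than real model output, and `index=1` simply mirrors the `probabilities[1]` reported in this thread; which tuple element actually holds the array has not been verified here.

```python
def unwrap_probabilities(probabilities, index=1):
    """Return the element at `index` if `probabilities` is a tuple, else return it unchanged.

    `index=1` mirrors the `probabilities[1]` workaround reported in this thread;
    which element actually holds the array is an assumption, not verified here.
    """
    if isinstance(probabilities, tuple):
        return probabilities[index]
    return probabilities


# Stand-in values for illustration only (not real model output).
wrapped = ("other output", [[0.1, 0.9], [0.7, 0.3]])
plain = [[0.1, 0.9], [0.7, 0.3]]

assert unwrap_probabilities(wrapped) == plain
assert unwrap_probabilities(plain) == plain
```

Because the check is conditional, a value that is already a plain array passes through untouched, which is less likely to break setups where no tuple is returned than an unconditional `probabilities[1]`.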
Thanks for the suggestion, but as I said, I believe this is fixed with a newer version of cuML. What you suggest might actually break most other applications since generally this
I'm still facing the same issue with cuML 23.10.0 and 23.08.0, and BERTopic 0.16.0.
It seems this is related to #1317, which indeed mentions that even though the newest versions of cuML should fix the issue, users are still experiencing it. It might be worthwhile to also open an issue on the cuML repo. Other than that, I can imagine two fixes: either try the fix in #1324, although I'm not sure whether it fixes all of the issues, or use the following snippet before running inference:
from bertopic.cluster import BaseCluster
topic_model.hdbscan_model = BaseCluster()
This way, inference is done purely with embeddings, without any dimensionality reduction or clustering algorithms. It should also speed up inference quite a bit.
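To show where the second suggestion would sit in a load-then-predict script like the one in this issue, here is a rough sketch. The function name `transform_without_clustering` and its parameters are hypothetical; the BERTopic calls are the ones already quoted in this thread (`BERTopic.load`, `BaseCluster`, `transform`). It has not been run against the reporter's model or cuML setup, so treat it as an outline rather than a confirmed fix.

```python
def transform_without_clustering(model_path, docs):
    """Load a saved BERTopic model and predict topics with the cluster model swapped out."""
    # Imports are kept inside the function so the sketch can be defined
    # even where bertopic is not installed.
    from bertopic import BERTopic
    from bertopic.cluster import BaseCluster

    topic_model = BERTopic.load(model_path)

    # Replace the saved clustering model before inference, as suggested in this thread.
    topic_model.hdbscan_model = BaseCluster()

    # Returns (topics, probabilities), unpacked the same way as in the original script.
    return topic_model.transform(docs)
```

With the names from the original script, usage would look like `topics, probabilities = transform_without_clustering(model_path, docs_to_predict)`.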
Hi, I used dimensionality reduction techniques and saved the model.
I can load it, but it doesn't predict topics for a new dataset:
umap_model = UMAP(n_neighbors=15,
                  n_components=5,
                  min_dist=0.0,
                  metric='cosine')
model_save_path = os.path.join(repo_path, location, f'{location}_model')
topic_model.save(model_save_path)
print(f"Model saved to {model_save_path}")
model_path = "/content/drive/MyDrive/istanbul-crm-topic-modeling/istanbul/istanbul/istanbul_model"
model = BERTopic.load(model_path)
data_to_predict_path = "/content/drive/MyDrive/istanbul-crm-topic-modeling/istanbul/stratified_sample_20K.json"
df_to_predict = pd.read_json(data_to_predict_path, orient="records", lines=True)
docs_to_predict = df_to_predict['Başvuru Açıklaması'].tolist() # Replace with your actual column name
topics, probabilities = model.transform(docs_to_predict)
AttributeError: 'tuple' object has no attribute 'shape'