Importing FileTypeRouter imports all converters #8649

Open
1 task done
tstadel opened this issue Dec 17, 2024 · 0 comments

tstadel commented Dec 17, 2024

Describe the bug
Importing FileTypeRouter also imports all converters. This makes the import heavier than necessary and increases the risk of further issues (e.g. cyclic dependencies, longer load time, import deadlocks in multithreaded environments). For example, importing AzureOCRDocumentConverter loads additional external dependencies.

Line causing this:

from haystack.components.converters.utils import get_bytestream_from_source, normalize_metadata
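The mechanism behind this can be sketched with a small throwaway package. Importing a submodule always executes the `__init__.py` of its parent package first, so if that `__init__.py` eagerly imports every converter (the issue implies this for `haystack.components.converters` but does not show it), importing only `utils` still loads them all. The package name `demo_pkg` and everything inside it are hypothetical stand-ins, not Haystack code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout that mimics the situation described above:
#   demo_pkg/converters/__init__.py  eagerly imports every converter
#   demo_pkg/converters/heavy.py     stands in for a converter with costly dependencies
#   demo_pkg/converters/utils.py     holds small helper functions
root = Path(tempfile.mkdtemp())
converters = root / "demo_pkg" / "converters"
converters.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(converters / "__init__.py").write_text(
    "from demo_pkg.converters.heavy import HeavyConverter\n"
)
(converters / "heavy.py").write_text("class HeavyConverter:\n    pass\n")
(converters / "utils.py").write_text("def normalize(value):\n    return value\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Importing only the helper module still runs demo_pkg/converters/__init__.py
# first, which pulls in the heavy converter as a side effect.
importlib.import_module("demo_pkg.converters.utils")
heavy_loaded = "demo_pkg.converters.heavy" in sys.modules
print(heavy_loaded)  # True
```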

Error message

Expected behavior
Importing FileTypeRouter does not load all converters, i.e. it has no dependency on converters.
For example, the two functions in question could be moved to the haystack.utils module.
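A minimal sketch of the effect of such a move, again using a hypothetical throwaway package (`demo_fix_pkg`) rather than Haystack itself: once the helper lives outside the converters package, a module that only needs the helper no longer triggers the converters' `__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package, not Haystack itself:
#   demo_fix_pkg/utils.py                shared helper, outside the converters package
#   demo_fix_pkg/converters/__init__.py  eagerly imports the heavy converter
#   demo_fix_pkg/converters/heavy.py     stands in for a converter with costly dependencies
#   demo_fix_pkg/router.py               imports the helper from demo_fix_pkg.utils
root = Path(tempfile.mkdtemp())
pkg = root / "demo_fix_pkg"
(pkg / "converters").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "utils.py").write_text("def normalize(value):\n    return value\n")
(pkg / "converters" / "__init__.py").write_text(
    "from demo_fix_pkg.converters.heavy import HeavyConverter\n"
)
(pkg / "converters" / "heavy.py").write_text("class HeavyConverter:\n    pass\n")
(pkg / "router.py").write_text("from demo_fix_pkg.utils import normalize\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The router depends only on demo_fix_pkg.utils, so nothing under
# demo_fix_pkg.converters is imported.
importlib.import_module("demo_fix_pkg.router")
converters_loaded = any(
    name.startswith("demo_fix_pkg.converters") for name in sys.modules
)
print(converters_loaded)  # False
```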

Additional context

To Reproduce
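One possible way to observe the behavior is to run the import in a fresh interpreter and list which modules ended up in `sys.modules`. The helper name `modules_loaded_by` is made up for this sketch, and the import path `haystack.components.routers` in the commented usage is an assumption, since the issue does not state where FileTypeRouter is imported from.

```python
import subprocess
import sys


def modules_loaded_by(import_stmt, prefix):
    """Run import_stmt in a fresh interpreter and return the names of all
    loaded modules that start with prefix."""
    code = (
        "import sys\n"
        + import_stmt + "\n"
        + "for name in sorted(sys.modules):\n"
        + "    if name.startswith(" + repr(prefix) + "):\n"
        + "        print(name)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# Sanity check with the standard library: importing json also loads its submodules.
print(modules_loaded_by("import json", "json"))

# With Haystack installed, the check for this issue could look like this
# (import path assumed, not taken from the issue):
# modules_loaded_by(
#     "from haystack.components.routers import FileTypeRouter",
#     "haystack.components.converters",
# )
```

If the returned list contains converter modules that FileTypeRouter does not need, the import pulls in more than necessary.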

FAQ Check

System:

  • OS:
  • GPU/CPU:
  • Haystack version (commit or version number):
  • DocumentStore:
  • Reader:
  • Retriever:
@tstadel changed the title from "Avoid loading all converters on loading FileTypeRouter" to "Avoid importing all converters when importing FileTypeRouter" on Dec 17, 2024
@tstadel changed the title from "Avoid importing all converters when importing FileTypeRouter" to "Importing FileTypeRouter imports all converters" on Dec 17, 2024
@julian-risch added the "P2" label (Medium priority, add to the next sprint if no P1 available) on Dec 17, 2024
@julian-risch julian-risch self-assigned this Dec 17, 2024