Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async file in multipart file upload #3339

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

tsimoshka
Copy link

Summary

Trying to implement support for async files (any AsyncIterable[bytes | str]) multipart file upload. Partially fixes #1620

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly. (not sure if update is needed)

httpx/_types.py Outdated
@@ -93,7 +93,7 @@

RequestData = Mapping[str, Any]

FileContent = Union[IO[bytes], bytes, str]
FileContent = Union[IO[bytes], bytes, str, AsyncIterable[bytes], AsyncIterable[str]]
Copy link
Member

@tomchristie tomchristie Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an async equivalent to typing.IO?
Note that only support byte streams, not text streams.

(Ie. files must be opened in binary mode, or else we raise an exception.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an async equivalent to typing.IO?

I don't think so. Since all the asyncio files I know are implemented in 3rd party libraries, there is no need for typing.IO asynchronous alternative in stdlib, I guess.

Note that only support byte streams, not text streams.

But it kinda works in that case. I'm using yield to_bytes(achunk) in arender_data to convert str chunk to bytes. I've updated the test to work with the file in text mode also. Not sure if this is correct aproach =/.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I think the failure case is this?...

  • Read file size, used for Content-Length header.
  • Open a file on a system using a non utf-8 encoding.
  • Write to bytes using utf-8.
  • Oops, uploaded data stream didn't match the Content-Length.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated the code and added a runtime checks for bytes in arender_data and updated tests accordingly.

@tomchristie
Copy link
Member

Oh, thanks @tsimoshka 😊

@tsimoshka
Copy link
Author

BTW, It looks like content=... is already working with {trio,anyio}.open_file, so the #1620 can be closed if this is merged.

Comment on lines +192 to +227
if not isinstance(self.file, AsyncIterable):
for chunk in self.render_data():
yield chunk
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want block loop for non-AsynIterable types?

Suggested change
if not isinstance(self.file, AsyncIterable):
for chunk in self.render_data():
yield chunk
return
if not isinstance(self.file, AsyncIterable):
raise TypeError("Invalid type for file. Only AsyncIterable is supported.")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but won't it break IO[bytes] usage with AsyncClient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async generator objects should always be typed as AsyncGenerators so they can be used with contextlib.aclosing

Comment on lines 198 to 202
if not isinstance(achunk, bytes):
raise TypeError(
"Multipart file uploads must be opened in binary mode,"
" not text mode."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be outside of for loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to check the type of file in FileField.__init__. But I think that it's impossible to check the type of the chunks in AsyncIterable at runtime without iterating. Besides, strictly speaking, nothing stops me in python from doing something like:

import asyncio

async def aiter():
    yield b"bytes"
    yield "some str"

async def main():
    result = []
    async for chunk in aiter():
        result.append(chunk)
    print(result)


asyncio.run(main())

I know that this is a very strange example and it's unlikely that someone will do something like that. So I can check only the first chunk in a loop to ensure it is of the correct type. Something like:

checked = False
async for achunk in self.file:
    if not checked and not isinstance(achunk, bytes):
        raise TypeError(
            "Multipart file uploads must be opened in binary mode,"
            " not text mode."
        )
    checked = True
    yield achunk

But I'm not sure whether it'll significantly impact the performance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this work?

    file_aiter = self.file.__aiter__()
    try:
        achunk = await aiter.__anext__()
    except StopAsyncIteration:
        return

    if not isinstance(achunk, bytes):
        raise TypeError(
            "Multipart file uploads must be opened in binary mode,"
            " not text mode."
        )

    yield achunk

    async for achunk in file_aiter:
        yield achunk

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would work, but it wouldn't catch the mixed string/bytes iterators I mentioned here. However, I've modified the code as you suggested, since I agree that handling mixed iterators is an edge/unlikely case in this situation..

@noam-better
Copy link

Hello! This looks great, any chance it can be merged soon?

@tsimoshka tsimoshka requested a review from marcodlk November 20, 2024 12:39
Copy link
Member

@graingert graingert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to explicitly aclose async generators. contextlib.aclosing was added in 3.10 for this purpose, you'll need to copy in the code to support 3.9 or conditionally use contextlib2.aclosing

Comment on lines +192 to +227
if not isinstance(self.file, AsyncIterable):
for chunk in self.render_data():
yield chunk
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async generator objects should always be typed as AsyncGenerators so they can be used with contextlib.aclosing

def render(self) -> typing.Iterator[bytes]:
yield self.render_headers()
yield from self.render_data()

async def arender(self) -> typing.AsyncIterator[bytes]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use AsyncGenerator type

def render(self) -> typing.Iterator[bytes]:
yield self.render_headers()
yield from self.render_data()

async def arender(self) -> typing.AsyncIterator[bytes]:
yield self.render_headers()
async for chunk in self.arender_data():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use async with contextlib.aclosing

for field in self.fields:
yield b"--%s\r\n" % self.boundary
if isinstance(field, FileField):
async for chunk in field.arender():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async with contextlib.aclosing

httpx/_multipart.py Outdated Show resolved Hide resolved
@tsimoshka
Copy link
Author

You need to explicitly aclose async generators. contextlib.aclosing was added in 3.10 for this purpose, you'll need to copy in the code to support 3.9 or conditionally use contextlib2.aclosing

Hey! Thanks for reviewing this.

I won't do this for the following reasons:

  • contextlib.closing is not used for synchronous files, and files should be closed outside httpx, if I'm not mistaken. Using contextlib.aclosing for async files would make the API different for sync/async operations, which in my opinion might be confusing for users. I think that using closing/aclosing might be an interesting feature, but if it should be implemented, it needs to be done separately and consistently for both sync/async operations.

  • It wouldn't allow simple chunking with an iterator when you wrap anyio/trio/anything else into a simple iterator with bufferization and control the chunk sizes yourself. You could implement a generator in that case, but it would be more verbose, I think.

@graingert
Copy link
Member

graingert commented Nov 25, 2024

You need to explicitly aclose async generators. contextlib.aclosing was added in 3.10 for this purpose, you'll need to copy in the code to support 3.9 or conditionally use contextlib2.aclosing

Hey! Thanks for reviewing this.

I won't do this for the following reasons:

* `contextlib.closing` is not used for synchronous files, and files should be closed outside httpx, if I'm not mistaken. Using `contextlib.aclosing` for async files would make the API different for sync/async operations, which in my opinion might be confusing for users. I think that using `closing/aclosing` might be an interesting feature, but if it should be implemented, it needs to be done separately and consistently for both sync/async operations.

* It wouldn't allow simple chunking with an iterator when you wrap anyio/trio/anything else into a simple iterator with bufferization and control the chunk sizes yourself. You could implement a generator in that case, but it would be more verbose, I think.

I'm not talking about using aclosing on the files, just the async generators. The AsyncIterator from the file doesn't need to be closed

@tsimoshka tsimoshka force-pushed the async_files_in_multipart branch from 7595f86 to 7984e22 Compare December 1, 2024 15:30
@tsimoshka tsimoshka force-pushed the async_files_in_multipart branch from 7984e22 to a6ea8b4 Compare December 1, 2024 15:40
@tsimoshka
Copy link
Author

You need to explicitly aclose async generators. contextlib.aclosing was added in 3.10 for this purpose, you'll need to copy in the code to support 3.9 or conditionally use contextlib2.aclosing

Hey! Thanks for reviewing this.
I won't do this for the following reasons:

* `contextlib.closing` is not used for synchronous files, and files should be closed outside httpx, if I'm not mistaken. Using `contextlib.aclosing` for async files would make the API different for sync/async operations, which in my opinion might be confusing for users. I think that using `closing/aclosing` might be an interesting feature, but if it should be implemented, it needs to be done separately and consistently for both sync/async operations.

* It wouldn't allow simple chunking with an iterator when you wrap anyio/trio/anything else into a simple iterator with bufferization and control the chunk sizes yourself. You could implement a generator in that case, but it would be more verbose, I think.

I'm not talking about using aclosing on the files, just the async generators. The AsyncIterator from the file doesn't need to be closed

Sorry for the delay. I've added aclosing as requested. To support Python versions prior to 3.10, I've restored the previously removed httpx/_compat.py file, though I'm not certain this is the best approach.

@tsimoshka tsimoshka requested a review from graingert December 1, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support async file types in files = {} and content = ...
6 participants