Websocket Support for Streaming Input and Output #320

Open
ChenghaoMou opened this issue Oct 3, 2024 · 5 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@ChenghaoMou

🚀 Feature

Support websocket endpoints to allow two-way real-time data communication.

Motivation

Currently, requests are processed on the assumption that each one arrives complete and is stateless. However, the input data isn't always available up front for use cases like speech to text, text to speech, and audio/speech understanding, especially in time-sensitive situations. With the recent release of the Realtime API from OpenAI and a new family of voice AI models (ultravox, mini-omni, llama-omni, moshi), support for streaming input and output could benefit the community in many ways and unlock even more creative uses of AI models.

Pitch

Support streaming input and output with websocket or any other methods to allow real-time AI applications.

Alternatives

A typical FastAPI websocket implementation is very template-like:

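import logging

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from starlette.websockets import WebSocketState

app = FastAPI()
logger = logging.getLogger(__name__)


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            message = await websocket.receive_bytes()
            # process data, e.g. results = model(parse(message))
            # placeholder result so the template runs without a model
            results = {"num_bytes": len(message)}
            await websocket.send_json(results)
    except WebSocketDisconnect:
        logger.error("WebSocket disconnected")
    except Exception as e:
        logger.error(f"Error: {e}")
        if websocket.client_state != WebSocketState.DISCONNECTED:
            await websocket.close(code=1001)
    finally:
        pass  # clean up per-connection state here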

However, a long-lived loop per connection like this might make batching across requests impossible, or at least complicated.
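To make the batching concern concrete, here is a minimal sketch of one way per-connection messages could be funneled into a shared queue and grouped into micro-batches. It uses only asyncio; `batch_worker`, `submit`, `predict_batch`, and the `window` parameter are hypothetical names for illustration, not part of LitServe or FastAPI.

```python
import asyncio


async def batch_worker(queue, predict_batch, max_batch_size=4, window=0.01):
    """Group queued (item, future) pairs into batches and resolve each future."""
    while True:
        batch = [await queue.get()]      # block until at least one item arrives
        await asyncio.sleep(window)      # give other connections time to enqueue
        while len(batch) < max_batch_size and not queue.empty():
            batch.append(queue.get_nowait())
        outputs = predict_batch([item for item, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            if not fut.cancelled():
                fut.set_result(out)


async def submit(queue, item):
    """What a WebSocket handler could call for each received message."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((item, fut))
    return await fut


async def demo():
    queue = asyncio.Queue()
    batch_sizes = []

    def predict_batch(items):  # hypothetical stand-in for a batched model call
        batch_sizes.append(len(items))
        return [x * 2 for x in items]

    worker = asyncio.create_task(batch_worker(queue, predict_batch))
    # Four concurrent "connections" each send one message.
    results = await asyncio.gather(*(submit(queue, i) for i in range(4)))
    worker.cancel()
    return results, batch_sizes
```

Each handler would await `submit(queue, parsed_message)` inside its receive loop and send the returned value back on its own socket. The trade-off is the added `window` latency per batch, which may or may not be acceptable for real-time audio.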

I am new to this repo, so if there is a workaround by hacking the server/spec/api to allow websocket, I am more than happy to contribute. If this is a duplicate or irrelevant, sorry for the trouble.

Thanks a million for open sourcing this awesome project. ❤️

Additional context

@ChenghaoMou ChenghaoMou added the enhancement New feature or request label Oct 3, 2024
@aniketmaurya
Collaborator

Hi @ChenghaoMou, thank you for looking into and trying LitServe 💜 !! We support streaming, which can be enabled by passing the stream=True argument to the LitServer class. We have the streaming documentation here.

Please let me know if it helps.

@ChenghaoMou
Author

Thanks for the prompt response! @aniketmaurya

If I am reading the documentation right, the current streaming is only for the output, not the input. It feels more like server-sent events (one input and multiple outputs) than websocket (streaming both input and output). The difference could be, for example, in speech to text:

  1. Existing streaming: upload an entire audio file (input non-streaming) to get transcription "word by word" (output streaming);
  2. This request: streaming audio from a live speech (input streaming) to get transcription "word by word" (output streaming);
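The two cases above can be sketched as function signatures. This is a toy illustration only: text bytes stand in for audio, splitting on spaces stands in for a speech model, and `transcribe_file`, `transcribe_live`, and `microphone` are hypothetical names, not LitServe APIs.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def transcribe_file(audio: bytes) -> Iterator[str]:
    """Case 1: the whole input is available up front; only the output streams."""
    for word in audio.decode().split():
        yield word


async def transcribe_live(chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Case 2: input arrives chunk by chunk; words are emitted once complete."""
    pending = ""
    async for chunk in chunks:
        pending += chunk.decode()
        # Everything before the last space is complete; the tail may still grow.
        *complete, pending = pending.split(" ")
        for word in complete:
            if word:
                yield word
    if pending:
        yield pending  # flush the final word when the input stream ends


async def microphone() -> AsyncIterator[bytes]:
    """Hypothetical live source whose chunks do not align with word boundaries."""
    for chunk in (b"hel", b"lo wor", b"ld again"):
        await asyncio.sleep(0)  # yield control, as a real source would while waiting
        yield chunk


async def collect_live() -> List[str]:
    return [word async for word in transcribe_live(microphone())]
```

In case 2 the server has to keep per-connection state (`pending` here) between chunks, which is exactly what a stateless request/response cycle does not give you.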

I hope this makes sense.

@aniketmaurya
Collaborator

Yes @ChenghaoMou, the current streaming uses server-sent events. Let's keep this open and we can evaluate this feature based on requests from the community.

@aniketmaurya aniketmaurya added the question Further information is requested label Oct 4, 2024
@cyberluke

cyberluke commented Oct 14, 2024

Come on, websocket is a basic feature. It was the first thing I needed after signing up for a PRO account. Currently they open ports only if there is a GET endpoint that returns 200. Using FastAPI, you just need to enable both a GET endpoint and a WSS endpoint.

As a community member, my reply is that this is a crucial and basic element of software development for web and mobile apps.

Studio AI has some Whisper examples, but they do it wrong. They just wrap Whisper with Streamlit, so it looks like a REST API or it just processes the whole audio file. It made me a little bit sad as a perfectionist and a software developer with 20 years of experience :-D

@dreamerwhite

I tried something to make a request behave like streaming in and streaming out.

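  import litgpt
  import litserve as ls


  class Llama3API(ls.LitAPI):

      def setup(self, device):
          self.llm = litgpt.LLM.load("checkpoints/meta-llama/Meta-Llama-3-8B-Instruct")

      def decode_request(self, request):
          # Simulate a streamed input by yielding ten prompts one at a time
          # (the "prompt" request key is an assumption about the payload)
          for i in range(10):
              yield f"{request['prompt']} ({i})"

      def predict(self, prompts):
          # Stream generated tokens for each incoming prompt in turn
          for prompt in prompts:
              yield from self.llm.generate(prompt, max_new_tokens=200, stream=True)

      def encode_response(self, output):
          for out in output:
              yield {"output": out}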

But I haven't tested the performance yet.


4 participants