live_api_starter.py under gemini-2 directory, keeps interrupting itself without getting interrupted #356

Open
mingqxu7 opened this issue Dec 16, 2024 · 8 comments

Comments

@mingqxu7

Description of the bug:

(gemini) mingqxu: gemini-2 % python live_api_starter.py
message > 2024-12-16 11:27:30.635 python[95931:17402150] WARNING: AVCaptureDeviceTypeExternal is deprecated for Continuity Cameras. Please use AVCaptureDeviceTypeContinuityCamera and add NSCameraUseContinuityCameraDeviceType to your Info.plist.
Turn complete
Turn complete
Turn complete
Turn complete
Turn complete
Turn complete
Turn complete
Turn complete
Turn complete

Its reply breaks off partway through while it is conversing with me, as if it had been interrupted.

Actual vs expected behavior:

Expected: it completes its utterance without stopping.

Any other information you'd like to share?

I am running it on an M1 Mac, using Miniforge Python. The Python version is 3.12.

@Giom-V
Collaborator

Giom-V commented Dec 17, 2024

We just updated the script, could you try again and tell me if that still happens?

@timmy59100

Same issue for me with the latest version of the script.

@mingqxu7
Author

Still having the same issue with the latest version of the script.

@Giom-V
Collaborator

Giom-V commented Dec 18, 2024

Are you using headphones or speakers? One issue that we've realized is that most browsers have built-in echo cancellation, which is why it works with a speaker on the AI Studio website. But when you run the script on your own, you don't have that by default. Depending on your OS, you should check the best way to enable it (for example, https://docs.pipewire.org/page_module_echo_cancel.html for Linux).
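
For anyone who cannot set up OS-level echo cancellation, one possible workaround is a half-duplex gate: stop forwarding microphone chunks while the model's audio is playing, plus a short tail. This is only a sketch and not real echo cancellation. The `MicGate` class and its method names are hypothetical and do not come from live_api_starter.py, and the 24 kHz, 16-bit mono playback format is an assumption that should be adjusted to whatever your script actually plays.

```python
import time


class MicGate:
    """Hypothetical helper: decide whether a mic chunk should be sent.

    Mic audio is held back while assistant audio is (estimated to be)
    playing through the speakers, plus a short tail for room echo.
    """

    def __init__(self, hold_seconds=0.3, clock=time.monotonic):
        self.hold_seconds = hold_seconds        # extra silence after playback
        self._clock = clock                     # injectable for testing
        self._playback_end = float("-inf")      # nothing has played yet

    def note_playback(self, num_bytes, sample_rate=24000, sample_width=2):
        """Call when num_bytes of mono PCM are queued for playback.

        sample_rate and sample_width are assumptions (24 kHz, 16-bit).
        """
        duration = num_bytes / (sample_rate * sample_width)
        # Chunks queue back to back, so extend from the later of "now"
        # and the end of the audio that is already queued.
        self._playback_end = max(self._clock(), self._playback_end) + duration

    def should_send(self):
        """True when the speakers should be quiet again."""
        return self._clock() >= self._playback_end + self.hold_seconds
```

In a send loop you would call `gate.note_playback(len(chunk))` wherever received audio is written to the output stream, and skip any mic chunk for which `gate.should_send()` is false. The trade-off is that you can no longer interrupt the model by speaking over it, so headphones remain the simpler fix.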

@mingqxu7
Author

Yes, putting on headphones works.

@sl-knowledge

sl-knowledge commented Dec 20, 2024

I adapted the example code live_api_starter.py so that it runs in the Chrome/Edge browser on an iMac with its external mic and speakers. However, it produces an echo effect, while the AI Studio website running on the same machine and browser is fine. So I think the browser's noise cancelling function is probably not what matters here.

As for the audio stream, most of the time the AI replied with only a few sentences and then stopped to ask the user questions, which is quite annoying. That makes it impractical for daily use. Even for testing purposes, it made me lose interest in testing it further.

@Giom-V
Collaborator

Giom-V commented Dec 21, 2024

But have you updated the code to use the built-in echo cancellation from the browser?
Gemini tells me to try this:

navigator.mediaDevices.getUserMedia({
    audio: {
        echoCancellation: true // Explicitly request echo cancellation
    },
    video: true
})

@sl-knowledge

sl-knowledge commented Dec 22, 2024

Yes. My code already sets echoCancellation to true. Echo is still an issue, so I now need to use a Bluetooth speaker placed away from the iMac to communicate with the AI. BTW, Gemini can now reply with more words before asking questions, and I find it pleasant to talk to now.

const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true,
        noiseSuppression: true,
        autoGainControl: true
    }
});
