Generic LLM Performance Monitoring within the streamlit GUI #203
Closed
shailensobhee
started this conversation in General
Replies: 1 comment
-
Hey Shailen, thanks for the detailed question and explanation. I think it's fair to assume that most LLMs won't provide TTFT and TPOT in their usage metrics, so we'll have to implement this on our end. I've provided a sample implementation for the ollama/chat LLM here: https://github.com/phidatahq/phidata/pull/207/files

What do you think? I tested it with

If you think this is helpful, I can get it into the next release. We can and should start tracking these metrics.
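Roughly, the idea is to timestamp the streamed chunks as they arrive. A simplified sketch (not the actual code in the PR above; the generator and field names here are just for illustration):

```python
import time
from typing import Iterator

def stream_with_timings(token_stream: Iterator[str]) -> dict:
    """Consume a token stream and record TTFT and TPOT.

    `token_stream` is any iterator that yields response chunks/tokens;
    swap in the LLM client's streaming call here.
    """
    start = time.perf_counter()
    time_to_first_token = None
    tokens = 0

    for _token in token_stream:
        now = time.perf_counter()
        if time_to_first_token is None:
            # TTFT: request start -> first token received
            time_to_first_token = now - start
        tokens += 1

    total = time.perf_counter() - start
    # TPOT: average spacing of the tokens after the first one
    time_per_output_token = (
        (total - time_to_first_token) / (tokens - 1) if tokens > 1 else 0.0
    )
    return {
        "ttft_s": time_to_first_token,
        "tpot_s": time_per_output_token,
        "total_s": total,
        "output_tokens": tokens,
    }
```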
-
Hi,
I am trying to extend my GUI with some detailed performance monitoring as I run my local LLM.
I was able to dig into the code and found a helper file (`timer.py`) that can be used to capture timing information. I am testing Llama3 (so that would be `phi/llm/ollama/chat.py`). I see that `response_timer` would give me the overall time to generate the response. However, I am looking for finer-grained timing information such as Time to First Token (TTFT) and Time per Output Token (TPOT). With these two metrics, I can make an apples-to-apples runtime performance comparison with other LLMs.
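For context, this is the arithmetic I am after (made-up numbers, just to show how TPOT falls out of TTFT and the overall response time):

```python
# Made-up numbers: 10.0 s total response time, first token after 0.5 s,
# 200 output tokens in total.
total_s, ttft_s, n_tokens = 10.0, 0.5, 200
tpot_s = (total_s - ttft_s) / (n_tokens - 1)   # ~0.048 s/token, i.e. ~21 tokens/s
```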
I am still digging into your code to find out where I could start instrumenting it to get TTFT/TPOT. Maybe it is somewhere in `phi/assistant/assistant.py`, but I cannot find a clear spot to do that. If I run ollama from the CLI, I can pass `--verbose` (e.g. `ollama run llama3 --verbose`) to get some metrics (a rough sketch of pulling the same numbers over the API is below). Any insights on how to get these data within phidata would be helpful! If you have any ideas, please kindly share! Thanks!
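For what it's worth, the counters that `--verbose` prints also come back in the JSON when Ollama is called over its HTTP API, so a rough TPOT can be derived from them without instrumenting the stream. A sketch, assuming a local Ollama server on the default port (field names as documented for recent Ollama versions; worth double-checking against your install):

```python
import requests  # assumes a local Ollama server on http://localhost:11434

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
).json()

# Durations are reported in nanoseconds (per the Ollama API docs).
eval_count = resp["eval_count"]                      # number of generated tokens
eval_duration_s = resp["eval_duration"] / 1e9        # time spent generating them
prompt_eval_s = resp["prompt_eval_duration"] / 1e9   # prompt processing (prefill)

tpot_s = eval_duration_s / eval_count                # rough Time per Output Token
print(f"prefill ~{prompt_eval_s:.3f} s, TPOT ~{tpot_s * 1000:.1f} ms/token")
```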
Shailen