Goal: Fully Analyze Entire Codebase #1178
-
Update: After 40-50 iterations this failed. It tried to read package-lock.json and hit the 8191-token limit. This is currently the major impediment I'm facing, assuming Auto GPT figures out how to read files without throwing errors in the first place (or successfully moves on when it does). Not sure of a solve, although I've obviously tried to get it to recognize this limit and NOT DO THAT. I will try to specifically avoid files of that length through a direct goal this time. The error:
openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 307063 tokens (307063 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
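If I were patching this in code rather than prompting around it, the guard would look something like the sketch below, using OpenAI's tiktoken tokenizer. The model name and the 8191 figure are assumptions lifted straight from the error above, not a confirmed fix:

```python
import tiktoken

MAX_PROMPT_TOKENS = 8191  # limit reported in the error above (assumed)

def safe_read(path: str, model: str = "gpt-3.5-turbo") -> str:
    """Return a file's contents only if they fit under the token limit."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    if n_tokens > MAX_PROMPT_TOKENS:
        # Hand the agent a short notice instead of the oversized payload.
        return f"SKIPPED {path}: {n_tokens} tokens exceeds {MAX_PROMPT_TOKENS}"
    return text
```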
-
Another update: I deleted package-lock.json locally and also changed the fifth instruction to something like "ignore all files with more than X characters". I have also run this prompt many times now, and there is definitely an element of luck to it. This is the most successful version I've run, but it isn't consistent. I think quitting out early if it doesn't seem to "get it" is key. When Auto GPT happens to grasp something well at the beginning, it tends to keep delivering good results.
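For what it's worth, the manual cleanup I did amounts to roughly the filter below. A sketch only; the excluded names and the value of X are my own assumptions:

```python
import os

MAX_CHARS = 4000  # the "X" from the instruction above; tune to taste

def list_readable_files(root: str):
    """Yield files small enough for Auto GPT to read without blowing the context."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip directories that are all noise for code review.
        dirnames[:] = [d for d in dirnames if d not in ("node_modules", ".git")]
        for name in filenames:
            if name == "package-lock.json":
                continue  # the file that kept killing my runs
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) <= MAX_CHARS:
                yield path
```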
-
Update: I got my first code improvement. Auto GPT saved improvedCreateBaseTables.js next to createBaseTables.js. I don't know if it works; in my experience GPT3.5 can produce non-functional results, but this is still a good sign. I may rework the prompt to read the project documentation and then focus on one specific file, gaining all relevant understanding of it and writing a better version of just that file. That might be more realistic, although I still want to pursue full-project analysis and think GPT4 may improve it significantly. Original:
New:
-
Thought: it is trying to run a lot of shell commands. I'm loath to enable shell commands and let it run on auto, which is somewhat necessary for the pace of my exploration at the moment, but I wonder if it may help. I have two instances running, so I might try step-by-step with one of them.
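For anyone who does want to try the shell route: in the Auto-GPT build I'm running, local command execution is gated behind an environment flag, something like the following (check your own .env.template; the exact name may differ between versions):

```
# .env -- enable shell command execution (off by default, for good reason)
EXECUTE_LOCAL_COMMANDS=True
```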
-
Thought: it REALLY REALLY wants to write tests, which always leads it down a weird path of investigating its own Python code and, in my experience, never produces tests for your code that work (or that use anything but Python). I think it is important to steer it away from tests, but I worry that will also be a distracting directive.
-
Any updates 👀 ?
-
Another random update here: I haven't gotten any improved results from it yet, but looking forward to when GPT4 is available to me and I can jump back into this in earnest, I think the data pre-seeding feature is going to be incredibly helpful for this goal. I've toyed around with it a bit, and honestly it doesn't appear to have much of an effect so far, but I imagine it will eventually make a big difference. I think it will be possible to write prompts that GPT4 will understand, relating the pre-seeded data to the codebase it's working on. GPT3.5 doesn't seem to grasp that the directory we're looking at is the same one it already knows.
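To make "pre-seeding" concrete: conceptually it is just chunking the repo into the agent's vector memory before the run starts. A rough sketch of the idea follows; this is not Auto-GPT's actual ingestion code, and the memory.add interface is an assumption about the memory backend:

```python
from pathlib import Path

CHUNK_SIZE = 4000  # characters per memory entry (assumed)

def preseed_codebase(memory, root: str = "project-folder") -> None:
    """Split every source file into chunks and store them in the agent's memory.

    `memory` is assumed to expose an add(text) method; adjust to whatever
    store you actually use.
    """
    for path in Path(root).rglob("*"):
        if not path.is_file() or "node_modules" in path.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for i in range(0, len(text), CHUNK_SIZE):
            # Prefix each chunk with its path so the model can relate
            # retrieved memories back to the file it is currently reading.
            memory.add(f"File: {path}\n{text[i:i + CHUNK_SIZE]}")
```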
-
Just to update anybody who was following this: I've now been playing with GPT4, FINALLY :D. Instantly, GPT4 made this task easier. There are more similar errors than I expected, but in a sense I also expected that, and I'm not disappointed by the major boost in possibility GPT4 has unlocked for this goal. Within 6 hours I had a similar prompt working quite well. Mostly it went similarly to GPT3.5-turbo, but with these distinctions:
Side note on other functions and prompt impact: the same is true of most of Auto-GPT's functions. I believe that due to its greater reasoning power and greater capacity for alignment, the model is simply able to do most of what I'm asking without using its special features. This includes browsing the internet, which GPT3.5 does constantly with similar prompts, but which GPT4 considers unnecessary for most code-related tasks (not all). This may actually be a weakness of my current prompt. I have pivoted to asking GPT4 to focus on its existing knowledge for two reasons: 1. I'm desperately hoping for it to do a better job of using the pre-seeded data I'm feeding it (the entire codebase), and 2. I find that this helps Auto-GPT stay on task and remember more about what it has already done in a given run (unconfirmed, anecdotal feeling). I would very much like to figure out how to accomplish those goals WITHOUT suggesting less utilization of the advanced functionality, and I am particularly interested in how to get it to use agents effectively, though since I am running GPT4-only, I'm not sure agents add benefit without a different model to hand off to.
I could go on and on, but I've already covered a lot of the nuances by rambling through the above 4 points in a not-entirely-focused manner (haha). To really summarize the effect of all of this: I've gone from spending hours and getting a few file changes to spending 30 minutes and having 10 files changed.

I'll leave you with my current prompt and a thorough explanation of it. At worst (1 in 5 times), this prompt devolves into a JSON error reading a file early on and never hits its stride. At best, it reads the README.md file, picks a random file from the codebase, makes code improvements that are more specifically applicable than simply running improve_code (if only a bit; I'm sure it's just injecting its general knowledge into a very similar process, like it would in Chat GPT, but I think that distinction is important), writes those to the file, reads the file again and makes comments for future/other iterations of itself, writes those, and then moves on to start the process over as instructed.

I've arrived at this prompt by learning more about what Auto-GPT is good at doing, and minimizing its efforts to do things it either isn't good at or is likely to become distracted/confused by. I'll summarize these points below:

Auto GPT is Good At
Auto GPT is Less Good At
I could go on and on; believe it or not, this is the tip of the iceberg in terms of observations, but I feel this adequately and thoroughly explains the primary reasons that the below prompt works well.

The main goal of the prompt is to instantiate a CodeMonkey: an Auto GPT process that considers itself part of an army of CodeMonkeys who are working on the codebase and helping each other. This is meant to simplify a concept I've seen others try: having the AI roleplay as an orchestrator of various agents that work on a codebase. That approach is genius and plays to the AI's strength for roleplaying, but it is also, IMO, overly ambitious (I have tried it A LOT and never achieved anything like good results; maybe others have). My approach is to take away the orchestration and the reliance on agents, and instead to instill a mindset that results in something like collaboration without confusing the AI or introducing entirely new challenges.

CodeMonkeys know that the codebase is being developed iteratively, they know that other CodeMonkeys exist, and they update files with the specific strategy of creating something that is both an improvement and a good roadmap for another CodeMonkey to improve further. Although it is yet to be seen, I'm hopeful that at some point this will resemble the AI having a greater general awareness of the codebase. Maybe the AI can't remember everything, but if one CodeMonkey updates the LoginScreen and then updates the HomeScreen, it may still have a mild awareness of the Login page, and it may leave comments for future CodeMonkeys that instill that awareness, even though that future CodeMonkey may never have opened the LoginScreen.

The point is, I've boiled things down to a strict process that does what GPT is good at, tweaked so that it gives itself hints for the future at the same time. In my experience this is drastically more effective than attempting to orchestrate codebase-wide understanding, or having various roles operate in the same run to emulate a team. It should be noted that you cannot expect perfect results from this. My strategy is to run N iterations of CodeMonkeys and then make a pull request to my own GitHub repository, doing the one thing many AI enthusiasts hate to do / the one thing that prevents them from making something of GPT's value: getting my own hands dirty.

Finally, here is the prompt, without edits or generalizing, so that you may see exactly how I'm using it. Sorry if that makes it harder to copy/paste, but I'm not posting this for people who want a quick copy/paste prompt; I'm posting it for people genuinely interested in getting the best results by understanding what worked for me as specifically as possible:

Name: CodeMonkey
Goals:
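To make the intended behavior concrete, here is the shape of a single CodeMonkey iteration as I picture it, written as plain Python. The improve and add_comments callables are hypothetical stand-ins for LLM calls, not Auto-GPT's real command set:

```python
import random
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8", errors="ignore")

def write_file(path: str, text: str) -> None:
    Path(path).write_text(text, encoding="utf-8")

def code_monkey_iteration(repo_files: list[str], improve, add_comments) -> None:
    """One pass of the read -> improve -> annotate cycle described above."""
    readme = read_file("README.md")     # ground the run in project context
    target = random.choice(repo_files)  # one file per iteration, not the whole repo
    write_file(target, improve(read_file(target), context=readme))
    # Re-read the result and leave breadcrumbs for the next CodeMonkey.
    write_file(target, add_comments(read_file(target)))
```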
-
good job
-
Hi, I am curious what the current status of this is? I have seen solutions like gitgab that seem to do similar things.
-
Hi everyone, I'm thinking I must not be the only one trying to get Auto GPT to iteratively analyze the files of a codebase and provide feedback or updates to it. This task has been exceedingly difficult, despite the raw power of Auto GPT. Part of the reason is that I do not have GPT4 API access yet (if any OpenAI employees see this and want to do me a favor, I've been on the waiting list for a month!).
My intention here is to share my success and to ask others to comment with useful observations about their own experiences and their attempts at using or altering this prompt, so that those of us who share this goal can combine our experience and perception to chase it. If there are relevant links to other discussions or webpages, please share them.
So anyway, I have finally managed to come up with a prompt that has the potential to work. It is by no means perfect, and I don't know what about it cracked the code, but when I run it, it reads files in a way that produces fewer errors, it stays on task, it keeps making attempts even after encountering issues, and when it loses track of the directory or project scope, it almost always returns to searching all project files.
I don't know yet if it will actually write comments or feedback in any form, but I'm 30 iterations in without errors and it is actually reading all of the files. I'll update this later.
The Prompt
I've only tried this a few times and I don't know how consistent it will be, but it has the potential to work, at least for reading and analyzing a local codebase stored in auto_gpt_workspace (I cloned mine in with git). Please note that "project-folder" should be accessible to Auto GPT in its CWD if you put it in auto_gpt_workspace. It may have trouble with filepaths, but it can correct itself if you're lucky, or just get them right if you're extra lucky.
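For reference, the setup is nothing fancier than the following (paths assume a default Auto-GPT checkout; substitute your own repository URL):

```bash
cd Auto-GPT/auto_gpt_workspace
git clone https://github.com/your-name/your-project.git project-folder
```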
AI Name: CodeMonkey
CodeMonkey is: an AI designed to review a codebase located in the project-folder directory and provide usable feedback by writing detailed comments in the existing codebase to apply improvements and finish features.
Goal 1: Read files in a manner that safeguards against potential errors
Goal 2: Never give up. When encountering something that looks like an error message, try again with a different strategy.
Goal 3: When writing to existing files, use commands that don't require writing the entire file again.
Goal 4: Keep all tasks in line with the original scope so the conversation stays focused.
Goal 5: Chunk all files for analysis into chunks of 4000 characters.
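For anyone wondering what Goal 5 amounts to mechanically, here is a minimal sketch; the 4000-character figure comes straight from the goal, while the overlap is my own addition so statements cut at a boundary still appear whole in one chunk:

```python
def chunk_text(text: str, size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into ~4000-character chunks with a small overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```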