The Mad Scientist's Lab

(Don’t touch anything!!!)

Prompt Engineering for Structured Output with Google AI Studio and Vertex AI

Background

I wanted a Gemini project that forced me to work as I would in a production environment, with AI acting as the code reviewer, since I often work alone. Gemini was amazing at this and very responsive to the input I provided, even when I didn’t provide it correctly.

You can read that post here:

If you review the prompt engineering section of that post, you will see that I just sent information in markdown without using the Vertex AI API features designed to structure it for you. Mind you, this does show the power of Gemini: I got exactly what I wanted from markdown alone, without using the Vertex AI API effectively. I still decided to refactor the code to fit the Vertex AI API and see whether that improved Gemini’s responses.

Google AI Studio

I opened Google AI Studio to review its capabilities and see if I could do better than what I already had. I immediately found it to be a great tool for testing my prompts. I retrieved some of the diff objects returned by GitLab from my logs and got to work trying to improve the output. AI Studio lets you adjust most of the model generation parameters available in the Vertex AI API.

Structured Output

The first thing I noticed was the “Run Settings” panel. Google provides tools to adjust all available parameters for each model. For structured output, we specifically want the “Tools” section and “JSON Mode.”

Once you turn on JSON mode, you get access to the “Edit schema” tool, with the option of a visual editor or a code editor. In Google AI Studio, the output schema needs to follow the OpenAPI schema object, though I am sure you can get away with just adding a simple JSON Schema to your Gemini prompt.

Adhering to OpenAPI does give you a reliable schema to work with in your code.

This translates to the response_schema parameter in the Vertex AI API. Essentially, you can use Google AI Studio to help you build and test your response schema. Previously, I used regex to massage whatever Gemini returned into something the json.loads function would accept and went with it. Now I can use the output directly.

Python
    output_response_schema = {
        "type": "object",
        "properties": {
            "responses": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "new_line": {
                            "type": "integer"
                        },
                        "old_line": {
                            "type": "integer"
                        },
                        "new_file_path": {
                            "type": "string"
                        },
                        "old_file_path": {
                            "type": "string"
                        },
                        "comment": {
                            "type": "string"
                        },
                        "severity": {
                            "type": "string"
                        }
                    },
                    "required": [
                        "new_line",
                        "old_line",
                        "new_file_path",
                        "old_file_path",
                        "comment",
                        "severity"
                    ]
                }
            }
        }
    }
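One possible tightening, which the schema above does not do: since the system instructions only allow four severity labels, the severity field could be restricted to those values. This is a minimal sketch that assumes the Vertex AI response schema honors the OpenAPI `enum` keyword for string fields; the helper name `with_severity_enum` is hypothetical, and the small `example_schema` is only a stand-in so the snippet runs on its own.

```python
import copy

# Severity labels taken from the system instructions later in this post.
SEVERITY_LEVELS = ["minor", "moderate", "major", "critical"]


def with_severity_enum(schema: dict) -> dict:
    """Return a copy of the schema whose `severity` field only allows known labels.

    Hypothetical helper: assumes the schema has the same shape as
    output_response_schema (responses -> items -> properties -> severity).
    """
    tightened = copy.deepcopy(schema)  # leave the caller's schema untouched
    item_props = tightened["properties"]["responses"]["items"]["properties"]
    item_props["severity"] = {"type": "string", "enum": list(SEVERITY_LEVELS)}
    return tightened


# Trimmed-down stand-in for output_response_schema.
example_schema = {
    "type": "object",
    "properties": {
        "responses": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "comment": {"type": "string"},
                    "severity": {"type": "string"},
                },
                "required": ["comment", "severity"],
            },
        }
    },
}

tightened_schema = with_severity_enum(example_schema)
```

Whether this helps in practice is something to test in Google AI Studio first, the same way the rest of the schema was built.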

To apply the schema, we build a GenerationConfig object and pass it as a parameter when we instantiate the model.

Python
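# Assumed import path; older SDK releases use vertexai.preview.generative_models.
from vertexai.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel(
    model_name=MODEL_ID,
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=output_response_schema,
        candidate_count=1,
    ),
    system_instruction=system_instructions,
)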

The official Vertex AI documentation on GenerationConfig lists all the available hyperparameters. These are mostly consistent with the “Run Settings” in Google AI Studio. One thing I did notice is that Google’s documentation may be a bit out of date, so below is the signature that Visual Studio Code shows for the class, taken directly from Google’s code.

Python
class GenerationConfig(
    *,
    temperature: float | None = None,
    top_p: float | None = None,
    top_k: int | None = None,
    candidate_count: int | None = None,
    max_output_tokens: int | None = None,
    stop_sequences: List[str] | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    response_mime_type: str | None = None
)

Since I am using the defaults for everything but response_schema, response_mime_type, and candidate_count, I did not set anything else. Later, I will look into qualifying candidates, but for now, it’s just one. If you discover hyperparameters that yield improved performance, feel free to modify them accordingly. Just understand that the Google AI Studio “Get Code” function returns code written for the google.generativeai package, not the vertexai package on GCP.

I previously used regex to parse the markdown output from Gemini to grab the data I wanted.

Python
response = json.loads(re.match(r"(```json\n)([\d\D\n]+)(```)", response.candidates[0].content.parts[0].text).group(2))

Since I am now defining the response_mime_type and response_schema, I can remove the regex parsing altogether.

Python
response = json.loads(response.candidates[0].content.parts[0].text)
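Even with a response schema in place, a cheap sanity check before posting comments can catch a malformed reply. The sketch below is an assumption about how such a check might look, not code from the project: `parse_review` is a hypothetical helper, and the made-up sample string stands in for `response.candidates[0].content.parts[0].text`.

```python
import json

# Keys the response schema marks as required for every comment.
REQUIRED_KEYS = {
    "new_line", "old_line", "new_file_path",
    "old_file_path", "comment", "severity",
}


def parse_review(raw_text: str) -> list:
    """Parse the model's JSON text and return the list of review comments.

    Raises ValueError if a comment is missing any required key.
    """
    payload = json.loads(raw_text)
    # "responses" is not required at the top level of the schema, so default to [].
    comments = payload.get("responses", [])
    for index, comment in enumerate(comments):
        missing = REQUIRED_KEYS - comment.keys()
        if missing:
            raise ValueError(f"comment {index} is missing keys: {sorted(missing)}")
    return comments


# Made-up sample standing in for the text part of a Gemini response.
sample_text = json.dumps({
    "responses": [{
        "new_line": 12,
        "old_line": -1,
        "new_file_path": "README.md",
        "old_file_path": "README.md",
        "comment": "Typo: 'teh' should be 'the'.",
        "severity": "minor",
    }]
})

comments = parse_review(sample_text)
```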

System Instructions

The first thing that stood out about my system instructions is that the Vertex AI API accepts a list of instructions, and I had given it one big blob of markdown. Once again, Gemini didn’t care, but there did seem to be times when the instructions were followed inconsistently. So, I broke the instructions up into paragraphs, with each instruction block as its own item in the list.

Python
system_instructions = [
            "You are a helpful code reviewer. Your mission is to review the code changes for a ROS2 package and provide feedback based on the changes.",
            
            """The Changes will be provided as a JSON Array of Changes.
            
            Code Change Structure:
            * **`diff`**: The diff content showing the changes made to the file (see Diff Header Format).
            * **`new_path`**: The path to the modified or new file.
            * **`old_path`**: The original path to the file (if it was renamed or moved).
            * **`a_mode`**: The mode of the file before the change.
            * **`b_mode`**: The mode of the file after the change.
            * **`new_file`**: Whether the file is a new file or not.
            * **`renamed_file`**: Whether the file was renamed or moved.
            * **`deleted_file`**: Whether the file was deleted or not.
            * **`generated_file`**: Whether the file was generated by AI or not.""",
            
            """The diff header will be in the following format:
            ```Diff Header Format
            @@ -[start line],[number of lines] +[start line],[number of lines] @@
            ```
            * **`@@`**: This is the opening encapsulation to identify the diff header.
            * **`-`**: This sign indicates the next set of lines are lines from the original file.
            * **`+`**: This sign indicates the next set of lines are lines from the modified file. 
            * **`[start line],[number of lines]`**: This indicates the starting line of the change and the number of lines affected.
            * **`@@`**: This is the closing encapsulation to identify the diff header.
            
            ```Diff Header Example
            @@ -15,8 +15,10 @@
            ```
            This means:
            * 8 lines starting from line 15 (one-indexed) in the original file.
            * 10 lines starting from line 15 (one-indexed) in the modified file.
            
            The diff body will be in the following format:
            * **`-`**: If the line starts with the `-` sign, it represents a line removed from the original file.
            * **`+`**: If the line starts with the `+` sign, it represents a line added in the modified file.
            * **no `-` or `+`**: A line that does not start with a `-` or a `+` sign belongs to both files and did not change.""",

            """When reviewing the code, please focus on the following aspects:

            * **Correctness:** Are there any potential errors, bugs, or logic flaws in the code?
            * **Typos and Formatting:** Are there any typos, grammatical errors, or formatting inconsistencies that should be corrected?
            * **Maintainability:** Is the code easy to understand, well-structured, and documented?  Are there opportunities to simplify or refactor the code?
            * **Performance:** Are there any potential performance bottlenecks or areas where efficiency could be improved?
            * **Security:** Are there any security vulnerabilities or potential risks (e.g., input validation, error handling)?
            * **ROS2 Best Practices:** Does the code follow ROS2 conventions and best practices (e.g., node naming, parameter usage, message types)?""",

            """You **MUST** label comments according to importance:

            * **minor**: For minor issues like typos or formatting inconsistencies.
            * **moderate**: For issues that affect code quality but aren't critical.
            * **major**: For significant issues that could lead to errors or problems.
            * **critical**: For critical errors or vulnerabilities that require immediate attention.""",
            
            """Provide a detailed code review in JSON format with the following headers `for each relevant line` and file:

            * **`new_line`**: the line in the new file that was added (started with a `+` sign) that the comment applies to. 
            * **`old_line`**: the line in the old file that was modified (started with a `-` sign) that the comment applies to.
            * **`new_file_path`**: The path to the file where the change occurred (e.g., "README.md").
            * **`old_file_path`**: The original path to the file (if it was renamed or moved).
            * **`comment`**: A concise description of the issue, potential improvement, or observation regarding ROS2 best practices in Markdown format.
            * **`severity`**: The severity of the issue ('minor', 'moderate', 'major', 'critical').""",
            
            """**Special Instructions:**

            * To comment on a line that was added, the `new_line` attribute should be filled, and the `old_line` attribute should be -1.
            * To comment on a line that was removed, the `new_line` attribute should be -1, and the `old_line` attribute should be filled.
            * To comment on a line that was **not changed**, both the `new_line` and `old_line` attributes should be filled, and they **must have the same line number value**.
            * Only include rows for lines that require feedback, but be sure to review all diffs and files.""",

            """**Additional Emphasis**

            * Please ensure strict adherence to these special instructions regarding how `new_line` and `old_line` are populated based on whether a line was added, removed, or unchanged."""
        ]
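The diff header format described in the instructions can also be parsed on the Python side, which could be handy for checking that a line number Gemini returns actually falls inside a hunk. This is a sketch under that assumption, not part of the project: `parse_hunk_header` is a hypothetical helper, and it treats an omitted line count as 1, which is how unified diffs abbreviate single-line hunks.

```python
import re

# Matches headers such as "@@ -15,8 +15,10 @@"; the line counts are optional.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> dict:
    """Return start line and line count for the original and modified file."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a diff header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
    }


# The example header from the system instructions:
# 8 lines from line 15 in the original file, 10 lines from line 15 in the modified file.
hunk = parse_hunk_header("@@ -15,8 +15,10 @@")
```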

I did have to make some minor adjustments to account for the data types requested in the response_schema. I now tell Gemini to return -1 for either the new_line or old_line based on whether a line was added, removed, or unchanged.
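On the consuming side, the -1 sentinel has to be translated back into "no line" before a comment is posted. Here is a minimal sketch of that step, following the rules in the special instructions; `classify_comment` is a hypothetical helper, not part of the project or the Vertex AI SDK.

```python
def classify_comment(comment: dict) -> tuple:
    """Map the -1 sentinels to None and label which kind of line is targeted.

    Returns (kind, new_line, old_line), where kind is "added", "removed",
    or "unchanged", following the special instructions given to Gemini.
    """
    new_line = None if comment["new_line"] == -1 else comment["new_line"]
    old_line = None if comment["old_line"] == -1 else comment["old_line"]

    if new_line is not None and old_line is None:
        kind = "added"
    elif new_line is None and old_line is not None:
        kind = "removed"
    elif new_line is not None and old_line is not None:
        kind = "unchanged"
    else:
        # Both values were -1, which the instructions never allow.
        raise ValueError("comment does not reference any line")
    return kind, new_line, old_line


added = classify_comment({"new_line": 42, "old_line": -1})
removed = classify_comment({"new_line": -1, "old_line": 7})
unchanged = classify_comment({"new_line": 10, "old_line": 10})
```

Keeping the sentinel handling in one place means the rest of the code only ever sees real line numbers or None.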
