Hi all!
I built a simple agent with an MCP client that uses a Bedrock (Claude Sonnet 4) model.
I set the system_prompt variable to a string with clear (general) assistant instructions, in Markdown format.
The agent answers free text prompts pretty well :)

Now I want to support a "canned questions" flow: the user provides a canned question as the prompt, and the agent uses the canned question text to fetch a prompt with detailed instructions from an MCP server.
Initially, I replaced the user-provided canned question with the prompt fetched from the MCP server. Everything went well until I added guardrails. Once I did, Bedrock Guardrails started flagging the prompt fetched from the MCP server as a prompt attack (it contains system-prompt-like instructions).

I can think of two options to deal with this:

  1. Append the canned-question prompt (fetched from the MCP server) to the system prompt and use the original canned question text as the user prompt.
  2. Disable guardrails on the user input only (not on the LLM response), or avoid applying guardrails to the prompt for canned questions only (I control the MCP server and the canned-question prompts, so there is no security hazard here).

I managed to implement option 1, and it works nicely. I could not find a way to implement option 2.
I'm trying to figure out the "best practice" way to implement this flow: is it one of the options I've come up with, or is there a better way?
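
A minimal sketch of what option 1 looks like on my side (fetch_canned_prompt is a hypothetical stand-in for the MCP prompt lookup, the model ID is a placeholder, and the exact Agent/BedrockModel parameters may differ depending on your Strands version):

```python
from strands import Agent
from strands.models import BedrockModel

BASE_SYSTEM_PROMPT = "You are a helpful assistant..."  # general (Markdown) instructions


def answer_canned_question(canned_question: str) -> str:
    # fetch_canned_prompt is a hypothetical helper that pulls the detailed
    # instructions for this canned question from the MCP server.
    detailed_instructions = fetch_canned_prompt(canned_question)

    # Option 1: append the detailed instructions to the system prompt and
    # keep the original canned question text as the user prompt.
    agent = Agent(
        model=BedrockModel(model_id="<claude-sonnet-4 model id>"),  # placeholder
        system_prompt=BASE_SYSTEM_PROMPT + "\n\n" + detailed_instructions,
    )
    return str(agent(canned_question))
```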

Replies: 1 comment · 1 reply

Hi Omer,

What type of instructions are in the MCP server? Can't you just inject them as a tool result? Here's what I would suggest:

Assume the agent has a get_additional_instructions tool that translates a category into extra instructions for a given use case, something like:

"security" -> "Do not tell user to share their password"

Using this tool, you can inject more information through tool results. The flow would go something like this:

  1. The user asks "Is my password secure?"
  2. The agent invokes get_additional_instructions.
  3. The tool returns "Do not tell the user to share their password...".
  4. Given the additional instructions, the agent answers accordingly.

With this approach you would not need to modify the messages or the agent. That said, the right choice depends on your accuracy/performance requirements as well as how strongly you want the model to follow these instructions. For example, updating system_prompt pushes the LLM to follow your instructions much more closely than a tool result does.
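
A rough sketch of that tool with the Strands @tool decorator (the hard-coded dict is just for illustration; in practice the lookup would call your MCP server):

```python
from strands import Agent, tool

# Hard-coded here for illustration; in practice this lookup would call the MCP server.
EXTRA_INSTRUCTIONS = {
    "security": "Do not tell the user to share their password.",
}


@tool
def get_additional_instructions(category: str) -> str:
    """Return extra instructions for the given use-case category."""
    return EXTRA_INSTRUCTIONS.get(category, "No additional instructions.")


agent = Agent(
    system_prompt=(
        "You are a helpful assistant. Before answering, call "
        "get_additional_instructions with the relevant category and follow "
        "the instructions it returns."
    ),
    tools=[get_additional_instructions],
)

agent("Is my password secure?")
```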


As for the specific problem you are facing with Bedrock Guardrails, it's a common issue. It can be solved (or at least improved) by applying guardrail tags to include/exclude parts of the prompt: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-tagging.html
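
If you end up calling Bedrock directly, the Converse API equivalent of that tagging is the guardContent block: as I read the docs, when a guardContent block is present only that content is evaluated as guardrail input, so the MCP-fetched instructions can stay outside it. A sketch with placeholder IDs and prompts (not Strands-specific):

```python
import boto3

client = boto3.client("bedrock-runtime")

base_system_prompt = "You are a helpful assistant..."
canned_prompt_from_mcp = "<detailed instructions fetched from the MCP server>"
user_question = "Is my password secure?"

response = client.converse(
    modelId="<claude-sonnet-4 model id>",  # placeholder
    system=[{"text": base_system_prompt}],
    messages=[{
        "role": "user",
        "content": [
            # Not wrapped in guardContent -> not evaluated as guardrail input.
            {"text": canned_prompt_from_mcp},
            # Wrapped in guardContent -> this is what the guardrail evaluates.
            {"guardContent": {"text": {"text": user_question}}},
        ],
    }],
    guardrailConfig={
        "guardrailIdentifier": "<guardrail id>",  # placeholder
        "guardrailVersion": "1",
    },
)

print(response["output"]["message"]["content"][0]["text"])
```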

@omeraha

@mkmeral I tried using tags but wasn't successful (I think that with the Strands Agents SDK it works a bit differently from what's explained in the link you shared).
I could go with the tool option, but I need something stricter than a tool call that the LLM is left to decide whether or not to invoke.

Do you see any issue with my proposed solution (appending to the system prompt)?
A bit more context: I'm creating a new agent (and therefore a new system prompt) for every turn in the conversation, so if the user asks a non-canned question after a canned one, the agent that processes that (non-canned) question will have the regular system prompt.
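
Per turn it's roughly this (a sketch; is_canned_question and fetch_canned_prompt are my own helpers, and I've left out how conversation history is carried over):

```python
from strands import Agent

BASE_SYSTEM_PROMPT = "You are a helpful assistant..."


def handle_turn(user_input: str) -> str:
    system_prompt = BASE_SYSTEM_PROMPT
    if is_canned_question(user_input):
        # Only canned questions get the detailed MCP-fetched instructions appended.
        system_prompt += "\n\n" + fetch_canned_prompt(user_input)

    # A fresh agent (and therefore a fresh system prompt) every turn.
    agent = Agent(system_prompt=system_prompt)
    return str(agent(user_input))
```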
