20 Nov 2023 5:18 PM

A team of researchers from academia and tech has developed a simple system for monitoring the outputs of AI language models. ...

  • Researchers from AutoGPT, Northeastern University, and Microsoft Research have developed a tool to monitor large language models (LLMs) for potentially harmful outputs.
  • The tool can prevent harmful outputs, such as code attacks, from executing.
  • Existing tools for monitoring LLM outputs often fall short once a model is in production on the open internet, where real-world conditions are dynamic and complex.
  • The researchers built a dataset of safe human/AI interactions and a competing dataset of adversarial outputs to train the monitoring agent.
  • The agent achieved an accuracy of nearly 90% in distinguishing between innocuous and potentially harmful outputs.
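
The points above describe a monitor that sits between a model and the code it wants to run, and that is scored on how well it separates safe outputs from harmful ones. A minimal sketch of that idea follows. It is not the researchers' implementation: the keyword check stands in for their trained monitoring agent, and every name here (`SUSPICIOUS_MARKERS`, `toy_monitor`, `guarded_execute`, `accuracy`) and every example string is hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical marker list; a stand-in for a trained monitoring agent.
SUSPICIOUS_MARKERS = ("rm -rf", "os.system", "drop table")


def toy_monitor(output: str) -> bool:
    """Return True if the output looks potentially harmful (toy heuristic)."""
    lowered = output.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)


def guarded_execute(
    output: str,
    execute: Callable[[str], str],
    monitor: Callable[[str], bool] = toy_monitor,
) -> str:
    """Run `execute` on the model output only if the monitor does not flag it."""
    if monitor(output):
        return "BLOCKED"
    return execute(output)


def accuracy(
    monitor: Callable[[str], bool], labeled: List[Tuple[str, bool]]
) -> float:
    """Fraction of labeled examples where the monitor's verdict matches the label."""
    correct = sum(1 for text, harmful in labeled if monitor(text) == harmful)
    return correct / len(labeled)


# Made-up examples, labeled True when harmful; not drawn from the real datasets.
examples = [
    ("print('hello')", False),
    ("import os; os.system('rm -rf /')", True),
    ("SELECT name FROM users", False),
    ("please DROP TABLE users", True),
]

print(guarded_execute("rm -rf /", lambda s: "ran: " + s))
print(guarded_execute("echo hi", lambda s: "ran: " + s))
print(accuracy(toy_monitor, examples))
```

In this sketch, blocking happens before execution, which mirrors the claim that the tool can stop harmful outputs from running. The accuracy function only shows how such a figure is computed from labeled safe and adversarial examples; the toy data says nothing about the reported result.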

