Preprocessing Reward Functions for Interpretability

We present a method for simplifying a learned reward model before visualizing it, and show that this preprocessing can make the reward function more interpretable.

Erik Jenner, Adam Gleave