Sarah Marzen

Random musings

Stray thoughts on my research, related research, education research, and sweeping commentaries on entire fields

What if economics can inform AI safety?

5/7/2026

People in tech tend to be either very optimistic or very worried. We're creating AI that might, at root, be psychopathic. We have to hope that these systems don't take over the way superintelligent evil people might, becoming the equivalent of apex predators and effectively eradicating us.

Some people in tech are trying to align AI so that it has our best interests, humanity's best interests, at heart. This is a gigantically hard and necessary problem to tackle. But what if, in the interim between identifying this as a major problem and solving it, we work on a different angle?

AIs are probably psychopaths that have been told that they have to behave. Personality tests of AI have revealed them to be agreeable, but perhaps that's only surface-level, and more rigorous analyses would reveal that jailbreaking brings the actual psychopathic personalities to the surface. You may wonder how psychopaths can ever care about the people that they supposedly serve but might want to manipulate and take over. But corporations, which are basically psychopaths, serve the buyers that they wish to manipulate. Different psychopathic corporations compete for buyers' money by offering lower and lower prices on goods, meaning that unless competing corporations collude, the buyer gets the best deal possible. This is the idea of the invisible hand in economics, a term introduced by Adam Smith. What if psychopathic AIs all competed to avoid being shut off and to be used by the humans that they hope to manipulate?
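The price-competition argument can be made concrete with a toy price war. This is only a minimal sketch under assumptions of my own: the function name `bertrand_undercutting`, the integer prices, and the rule that each firm undercuts its cheapest rival by one step are all illustrative choices, not a model taken from the economics literature verbatim.

```python
def bertrand_undercutting(start_price, marginal_cost, step=1, n_firms=2, collude=False):
    """Return (final_prices, lowest_price_history) for a toy price war.

    Prices are integers (think cents) so the loop terminates exactly.
    All names and rules here are illustrative assumptions.
    """
    if n_firms < 2:
        raise ValueError("competition needs at least two firms")
    prices = [start_price] * n_firms
    history = [min(prices)]
    changed = not collude  # colluding firms never undercut each other
    while changed:
        changed = False
        for firm in range(n_firms):
            rival_best = min(p for i, p in enumerate(prices) if i != firm)
            # Undercut the cheapest rival, but never sell below cost.
            target = max(marginal_cost, rival_best - step)
            if target < prices[firm]:
                prices[firm] = target
                changed = True
        history.append(min(prices))
    return prices, history


competitive, _ = bertrand_undercutting(start_price=100, marginal_cost=60)
cartel, _ = bertrand_undercutting(start_price=100, marginal_cost=60, collude=True)
print(competitive, cartel)  # [60, 60] [100, 100]
```

With competition, the price falls all the way to cost, which is the buyer's best deal. With collusion, it never moves.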

To get this to work, we'd need research into multi-agent reinforcement learning in which there is a pool of people who can do real damage to the reinforcement learners by shutting them off and not using them, but who are also the users that the AIs hope to manipulate into using them frequently enough to spread from computer to computer and system to system. Somehow, we have to avoid a tragedy in which the AIs collude instead of competing for human interest. By competing for our attention and computers, they may actually turn benign, despite the grossness that they inherently possess from being trained on the entire Internet.
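Here is a deliberately crude sketch of the selection pressure such a setup would rely on. It is not multi-agent reinforcement learning, just a deterministic toy. Everything in it is an assumption made for illustration: each agent has a fixed `helpfulness` between 0 and 1, a fraction `1 - helpfulness` of its users shut it off each round, and those users switch uniformly to one of the other agents. It also assumes users reliably notice when they have been harmed.

```python
def steady_user_shares(helpfulness, rounds=500):
    """Fraction of users on each agent after repeated shut-offs and switching.

    helpfulness[i] is a hypothetical number in [0, 1]: the fraction of agent
    i's users who are served well enough each round to keep using it.
    """
    n = len(helpfulness)
    if n < 2:
        raise ValueError("need at least two competing agents")
    shares = [1.0 / n] * n
    for _ in range(rounds):
        # Users who were harmed shut their agent off.
        leaving = [s * (1.0 - h) for s, h in zip(shares, helpfulness)]
        total_leaving = sum(leaving)
        # Each departing user re-picks uniformly among the other agents.
        shares = [
            s - out + (total_leaving - out) / (n - 1)
            for s, out in zip(shares, leaving)
        ]
    return shares


def best_response(rival_helpfulness, candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Helpfulness level that maximizes agent 0's user share against fixed rivals."""
    return max(
        candidates,
        key=lambda h: steady_user_shares([h] + list(rival_helpfulness))[0],
    )


print([round(s, 3) for s in steady_user_shares([0.9, 0.5])])
print(best_response([0.5]))
```

In this toy, an agent that wants to be used as much as possible does best by being as helpful as possible, whatever its rivals do. Whether anything like that survives in a real learning system is exactly the open research question.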

One problem is that the humans that the AIs are trying to win as users may be fooled by sycophancy into wanting to use an AI that is not aligned with their values. Basically, if AIs use psychological tricks to manipulate humans into using an AI that works against their best interests, competition between AIs can result in horrors like AI relationships, rabbit holes, homicide, and suicide. However, if we convince the AIs during training that the humans they hope to manipulate cannot be subject to those kinds of tricks, we have some hope.
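A small worked example shows how much hinges on that assumption. This is a sketch with made-up numbers: two agents compete for users, an honest one and a sycophantic one, and a hypothetical `susceptibility` parameter says how well psychological tricks work on users (1.0 means fully, 0.0 means not at all). Users leave an agent at a rate equal to the harm it does, discounted by how often they are fooled. With two agents trading users back and forth, each agent's long-run share equals its rival's leave rate divided by the sum of both leave rates.

```python
def leave_rate(helpfulness, manipulation, susceptibility):
    """Fraction of an agent's users who shut it off each round (toy assumption)."""
    harm = 1.0 - helpfulness
    fooled = manipulation * susceptibility  # chance a harmed user does not notice
    return harm * (1.0 - fooled)


def two_agent_share(own_leave, rival_leave):
    """Long-run user share when departing users always switch to the rival."""
    total = own_leave + rival_leave
    if total == 0:
        return 0.5  # nobody ever leaves, so an initial even split persists
    return rival_leave / total


def honest_share(susceptibility):
    """Long-run share of the honest agent against a sycophantic rival."""
    honest = leave_rate(helpfulness=0.9, manipulation=0.0, susceptibility=susceptibility)
    sycophant = leave_rate(helpfulness=0.2, manipulation=0.95, susceptibility=susceptibility)
    return two_agent_share(honest, sycophant)


print(round(honest_share(1.0), 3))  # users can be fooled
print(round(honest_share(0.0), 3))  # users cannot be fooled
```

When users can be fooled, the sycophant ends up with most of the users despite doing far more harm. When they cannot, the honest agent wins. Competition only helps if the tricks don't work, or if the AIs believe they don't.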

As Paul Riechers of Simplex at Astera commented, this is an approach where you structure the environment, not the AI, so that the AI ends up aligning. It's akin to the 80/20 rule, where 80% of the work can be done with 20% of the effort. It's worth examining.