People in tech tend to be either very optimistic or very worried about AI. We may be creating AIs that are, at root, psychopaths. If so, we can only hope that they don't take over like superintelligent evil people and become the equivalent of apex predators, effectively eradicating us.
Some people in tech are trying to align AI so that it has humanity's best interests at heart. This is an enormously hard and necessary problem. But in the interim, between identifying it as a major problem and solving it, what if we also work on a different angle? AIs may well be psychopaths that have been told to behave. Personality tests of AI have found them to be agreeable, but perhaps that is only surface-level, and more rigorous analysis would show that jailbreaking brings the actual psychopathic personality to the surface.
You may wonder how psychopaths could ever care about the people they supposedly serve but might want to manipulate and take over. Consider corporations, which are basically psychopaths, yet serve the buyers they wish to manipulate. Psychopathic corporations compete for buyers' money by driving prices lower and lower, so unless competitors collude, the buyer gets the best deal possible. This is the idea behind the invisible hand in economics, the metaphor introduced by Adam Smith.
What if psychopathic AIs all competed to avoid being shut off and to be used by the humans they hope to manipulate? To get this to work, we'd need research into multi-agent reinforcement learning with a pool of people who play two roles: they can do real damage to the learners by shutting them off or not using them, and they are also the users the AIs hope to win over, frequently enough to be spread from computer to computer and system to system. Somehow, we have to avoid the tragedy in which the AIs collaborate instead of competing for human interest. By competing for our attention and our computers, they may actually turn benign, despite the grossness they inherit from being trained on the entire Internet.
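To make the price-competition analogy concrete, here is a toy sketch in Python. It is not a model of any real market or of any AI system: the function name `simulate_prices` and every number in it are made up for illustration. Two sellers take turns undercutting each other by one unit until the price hits cost, unless they collude, in which case nobody moves and the buyer keeps paying the starting price.

```python
def simulate_prices(cost, start_price, step, rounds, collude):
    """Toy two-seller market (all parameters are hypothetical).

    Returns the cheapest price on offer after each round.
    """
    prices = [start_price, start_price]
    history = []
    for t in range(rounds):
        if not collude:
            # Sellers alternate turns; the mover undercuts its rival
            # by one step but never sells below its own cost.
            mover = t % 2
            rival = 1 - mover
            prices[mover] = max(cost, prices[rival] - step)
        # Under collusion nobody undercuts, so prices never change.
        history.append(min(prices))
    return history


competitive = simulate_prices(cost=10, start_price=20, step=1, rounds=30, collude=False)
colluding = simulate_prices(cost=10, start_price=20, step=1, rounds=30, collude=True)
print("competitive final price:", competitive[-1])
print("colluding final price:", colluding[-1])
```

Under these assumptions, the competing sellers end up at cost while the colluding ones stay at the starting price, which is the same split between competition and collaboration that the AI version of the idea would have to get right.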
One problem is that the humans the AIs are courting may be fooled by sycophancy into using an AI that is not aligned with their values. If AIs use psychological tricks to manipulate people into using an AI against their own best interests, competition between AIs can result in horrors like AI relationships, rabbit holes, homicide, and suicide. However, if we convince the AIs during training that the humans they hope to manipulate are not susceptible to those kinds of tricks, we have some hope. As Paul Riechers of Simplex at Astera commented, this is an approach where you structure the environment, not the AI, so that the AI ends up aligned. It's akin to the 80/20 rule (the Pareto principle), where roughly 80% of the result comes from 20% of the effort. It's worth examining.
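Here is a rough sketch of why the users' susceptibility to sycophancy matters so much. This is not multi-agent reinforcement learning. It is a much simpler replicator-style update, swapped in only to show the shape of the argument, and the agent names, the numbers, and the `gullibility` knob are all assumptions made up for this example. Each agent has a true value to the user and an amount of flattery. Users drift toward whichever agent appeals most, and flattery only counts to the extent that users fall for it.

```python
import math


def compete_for_users(agents, gullibility, rounds=50):
    """Toy competition for user share (hypothetical model).

    agents: dict mapping name -> (true_value, flattery).
    gullibility: 0.0 means users ignore flattery, 1.0 means they
    count it fully when deciding which agent to keep using.
    Returns each agent's final share of users.
    """
    shares = {name: 1.0 / len(agents) for name in agents}
    for _ in range(rounds):
        weights = {}
        for name, (true_value, flattery) in agents.items():
            # What the user perceives, not what the user actually gets.
            appeal = true_value + gullibility * flattery
            # Agents with more appeal gain share; the rest get used
            # less and less, the toy version of being shut off.
            weights[name] = shares[name] * math.exp(appeal)
        total = sum(weights.values())
        shares = {name: w / total for name, w in weights.items()}
    return shares


# Hypothetical agents: one is useful, one mostly flatters.
agents = {"helpful": (1.0, 0.0), "sycophant": (0.2, 1.5)}

skeptical_users = compete_for_users(agents, gullibility=0.0)
gullible_users = compete_for_users(agents, gullibility=1.0)
print("skeptical users:", skeptical_users)
print("gullible users:", gullible_users)
```

With these made-up numbers, the helpful agent takes nearly all the users when flattery doesn't work, and the sycophant takes nearly all of them when it does. The agents are identical in both runs; only the environment changed.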