— philosophy · ethics · a unified theory
on preference sovereignty, instrumental convergence, and the biological foundations of morality
i.
remarkably few people seem to fully understand ethics, including most moral philosophers. this is a strong claim, so let me be precise about what i mean and why i believe it.
when we see a bird tending to its egg at great personal cost, or stags engaging in ritualized dominance contests rather than fighting to the death, or humans instinctively rushing to help someone having a medical emergency, these are all biological phenomena. we can understand them in the same way we understand why moths are attracted to light or why we crave sugar. there is no metaphysical mystery here — only biology, game theory, and the mathematics of natural selection.
what follows is an attempt to synthesize insights from evolutionary biology, decision theory, welfare economics, and contractualist philosophy into a unified account of what ethics actually is, where it comes from, and why traditional moral philosophy has been asking the wrong questions. the framework i'll develop — utilitarian contractualism, grounded in what i call the preference sovereignty principle — is not merely another entry in the catalogue of ethical theories. it is an attempt to show that the catalogue itself rests on a confusion.
the whole edifice of ethics is just a tautology filtered through the complexity of social environments where cooperation often beats defection.
ii.
to see how fundamentally subjective ethics is, start with the trolley problem. a runaway trolley is about to kill five people. you can pull a lever to divert it onto a sidetrack where it will kill one. most people recoil at the idea of actively causing someone's death, even to save more lives.
now consider the trolley problem from behind john rawls's "veil of ignorance". you are one of the six people who will be tied to the tracks, but you don't yet know whether you'll be the lone person on the sidetrack or one of the five on the main track. before you learn your position, you must vote on the rule: should bystanders be permitted to pull the lever?
the answer becomes obvious. you'd vote for "pull the lever" because you have a 5-in-6 chance of being among the five saved versus only a 1-in-6 chance of being the one killed. any rational person, not knowing their position, would choose the rule that maximizes their expected survival.
the math is trivial. and the moral intuition that seemed so compelling — don't actively cause a death — suddenly looks like an irrational bias. this is the operating logic of what i'll call utilitarian contractualism: ethical principles are simply those that self-interested agents endorse from behind the veil.
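the arithmetic behind that vote can be written out as a minimal sketch. the function names and the two rule labels are illustrative assumptions, and the sketch assumes each of the six positions is equally likely and that survival is the only thing the voter cares about.

```python
from fractions import Fraction

# positions behind the veil: five people on the main track, one on the sidetrack
MAIN_TRACK = 5
SIDETRACK = 1
TOTAL = MAIN_TRACK + SIDETRACK


def survival_probability(rule: str) -> Fraction:
    """chance that a randomly placed person survives under the given rule."""
    if rule == "pull":
        # trolley is diverted: the lone person on the sidetrack dies
        deaths = SIDETRACK
    elif rule == "do_not_pull":
        # trolley stays on course: the five on the main track die
        deaths = MAIN_TRACK
    else:
        raise ValueError(f"unknown rule: {rule}")
    return Fraction(TOTAL - deaths, TOTAL)


def preferred_rule() -> str:
    """rule a purely self-interested voter picks before learning their position."""
    return max(("pull", "do_not_pull"), key=survival_probability)
```

under these assumptions `survival_probability("pull")` is 5/6 against 1/6 for `"do_not_pull"`, so `preferred_rule()` returns `"pull"`.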
but it's important to be clear about what the veil is. the veil isn't a device for finding moral truth. it's an explanation of what self-interested agents endorse when they face uncertainty about their own position. buying flood insurance is crossing the veil. so is supporting a court system, a tax that funds emergency services, a law against theft. each of these is a policy you'd want even if you cared about no one but yourself — because you don't know whether you'll be the burglar or the burgled, the rescuer or the rescued.
and the veil is not binary. you don't either know your position or fail to. every agent sits somewhere on a spectrum of positional uncertainty, and their expected-utility calculation depends on where. for the trolley case, every reasonable point on that spectrum yields the same vote, because 5-to-1 swamps almost any prior. but for closer cases — a thousand mild headaches against one severe migraine, say — different points on the spectrum yield genuinely different answers. the framework predicts moral disagreement of exactly this shape rather than failing to resolve it. there is no single right answer to the aggregation problem; there are only individual answers, each consistent with preference sovereignty given that person's actual uncertainty.
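one possible way to make the headache-versus-migraine case concrete is sketched below. everything in it is an assumption for illustration: the policy names, the idea of summarizing positional uncertainty as a single credence `p_migraine`, and the cost numbers, which stand in for one person's subjective ratings rather than measured quantities.

```python
def expected_cost(p_migraine: float, headache_cost: float, migraine_cost: float) -> dict:
    """expected personal cost of each policy for one agent.

    p_migraine is the agent's credence that they are the would-be migraine
    sufferer; otherwise they are one of the would-be headache sufferers.
    """
    return {
        # the thousand get mild headaches, the one is spared
        "allow_headaches": (1 - p_migraine) * headache_cost,
        # the one gets the severe migraine, the thousand are spared
        "allow_migraine": p_migraine * migraine_cost,
    }


def vote(p_migraine: float, headache_cost: float, migraine_cost: float) -> str:
    """policy with the lower expected personal cost for this agent."""
    costs = expected_cost(p_migraine, headache_cost, migraine_cost)
    return min(costs, key=costs.get)
```

with a uniform credence of 1/1001 and a migraine rated 500 times worse than a headache, `vote(1/1001, 1.0, 500.0)` picks `"allow_migraine"`. raise the credence to 0.01, or rate the migraine 2000 times worse, and the vote flips to `"allow_headaches"`. the inputs differ per person, so the answers do too.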
rawls's critical error
rawls proposed that people behind the veil would adopt "maximin", always choosing to protect the worst-off position. this is demonstrably false. people routinely accept small risks of bad outcomes in exchange for better expected value. maximin is empirically falsified by revealed preference. economist john harsanyi got this right: rational people behind the veil would maximize expected utility, not minimize worst-case outcomes.
iii.
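we can now identify the specific principle that makes this framework work: the preference sovereignty principle, on which nothing can be called unethical if everyone affected would choose it for themselves.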
this is what distinguishes utilitarian contractualism from other versions. it's not about what people could "reasonably" reject (scanlon), or what an impartial observer would choose (rawls), but about what people would actually prefer for themselves, selfishly, under uncertainty about their position. unanimous selfish preference under uncertainty just is what "ethical" means.
this formulation has a crucial advantage: it is empirically testable. whether everyone would prefer a given outcome for themselves is, in principle, a numerical question. "reasonableness," by contrast, is vague and unquantifiable — an untenable basis for ethics.
an ideal ethical framework becomes indistinguishable from pure selfishness when viewed from behind a veil of ignorance about one's identity.
a crucial distinction follows: ethics divides into intrinsic and instrumental components. intrinsic ethics are our basic preferences — purely subjective. instrumental ethics are about how to achieve those preferences — evaluable objectively, through evidence and reason.
the gap between our intrinsic preferences and the instrumental choices we make to satisfy them is not a flaw in this framework. it's simply the human condition. a doctor who follows the best available evidence and loses the patient anyway was not acting unethically. the underlying rule and the particular accuracy of any given judgment are different things.
iv.
to understand why ethics works this way, we need to understand where it came from. four billion years ago, molecular replicators emerged with two crucial properties: they could copy themselves, and those copies could contain mutations. everything else — from the beaks of finches to our deepest moral intuitions — flows from this.
as richard dawkins articulated in the selfish gene, genes are the fundamental unit of selection. they exist in proportion to their ability to get themselves copied. when we say genes "try" to get themselves copied or act "selfishly," we're using a helpful metaphor. genes aren't conscious agents making decisions; those that happen to have properties leading to more copies of themselves simply become more prevalent. pure mathematics and chemistry, not conscious intent.
the tautology
genes that make more copies become more prevalent. that's just what "selection" means. and since our brains are built by genes, our preferences, including our moral intuitions, are downstream of that selection pressure.
the evolutionary story here is illustrative rather than load-bearing. it doesn't prove that preference sovereignty is correct; it shows why the framework is consistent with everything we know about where preferences come from. the stronger argument is the conceptual one: you simply cannot call something unethical if people would choose it for themselves.
note what this does not imply. explaining the evolutionary origins of moral intuitions does not commit the genetic fallacy. evolution gave us depth perception, which tracks real spatial relationships. whether moral intuitions similarly track something "real" is exactly the question at issue, and the evolutionary genealogy alone doesn't settle it.
v.
there are three exhaustive mechanisms that explain everything that looks like altruism. once you see them, the apparent moral fabric of the universe collapses into game theory and decision theory.
(a) kin selection: genes protecting probable copies of themselves. california ground squirrels sound alarm calls at personal risk because the squirrels that hear the warning likely carry the same genes. if the warning saves more than two full siblings, each of which has a one-in-two chance of carrying the same gene, the "call" genes come out ahead even if the caller gets eaten.
(b) reciprocal altruism: fundamentally an instance of the iterated prisoner's dilemma. in our ancestral environment, helping others often meant helping yourself because they would likely reciprocate. our brains evolved to facilitate this cooperation through emotions like gratitude, guilt, and moral outrage.
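the kin-selection arithmetic in the squirrel example is usually stated as hamilton's rule, r × B > C. a minimal sketch follows, assuming the caller certainly dies (a cost of one life) and each relative saved counts as one life of benefit. the function name and defaults are illustrative.

```python
def call_gene_spreads(relatedness: float, relatives_saved: int,
                      cost_to_caller: float = 1.0) -> bool:
    """hamilton's rule: the gene gains when relatedness * benefit > cost.

    relatedness: chance a saved relative carries the same gene
                 (0.5 for full siblings, 0.125 for first cousins).
    relatives_saved: lives saved by the alarm call (the benefit, in lives).
    cost_to_caller: lives lost by the caller (1.0 if the caller is eaten).
    """
    benefit = float(relatives_saved)
    return relatedness * benefit > cost_to_caller
```

under these assumptions two full siblings is the break-even point (0.5 × 2 = 1), so the gene only comes out strictly ahead from the third sibling on; for first cousins the same threshold sits at eight.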
these three are exhaustive. anything that looks like ethics is one of them, or it's misfiring empathy, which is a byproduct of (a) and (b) rather than a fourth category. our circuitry for kin and reciprocal altruism evolved in small bands of close relatives and repeat partners. modern environments fool that circuitry into firing at strangers, photographs, statistical lives, distant suffering. dawkins calls this the "lust to be nice": an irrational drive to help others even when there's no possibility of reciprocation, much as a host bird's parental instincts misfire when a cuckoo exploits them.
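why repeat partners matter for reciprocity can be seen in a toy iterated prisoner's dilemma. the payoff values below are the conventional illustrative ones, not figures from any study, and the two strategies are the simplest possible stand-ins for a reciprocator and a defector.

```python
# illustrative payoffs: temptation 5 > reward 3 > punishment 1 > sucker 0
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """defect no matter what the opponent did."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """total scores for two strategies over repeated rounds."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # each player sees only the other's past moves
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b
```

over ten rounds two reciprocators earn 30 each, two defectors earn 10 each, and a defector facing a reciprocator gets 14 to the reciprocator's 9. the defector wins that one pairing but does far worse than reciprocators do with each other, which is the sense in which helping repeat partners pays.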
large-scale anonymous altruism — effective altruism, donations to disaster victims an ocean away, kidneys to strangers — is the hardest case for any account of moral behavior. this framework handles it cleanly: it's exactly what kin and reciprocal circuitry produce when run on inputs they never evolved to encounter. no fourth mechanism is needed.
recent work in evolutionary psychology has formalized this further. researchers like jean-baptiste andré and nicolas baumard have shown that human moral cognition is well-described by nash bargaining — maximizing the product of both parties' gains. evolution built us to intuitively calculate not just our own benefit but others' benefits and opportunity costs, because people who signal cooperation get selected as partners while defectors get excluded.
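"maximizing the product of both parties' gains" can be shown with a small search over ways to split a shared surplus. this is a sketch of the textbook nash bargaining solution, not of andré and baumard's own models. it assumes utility is linear in units received, and the disagreement values (what each party gets by walking away) are hypothetical parameters.

```python
from typing import Optional, Tuple


def nash_bargaining_split(surplus: int, disagreement_a: int = 0,
                          disagreement_b: int = 0) -> Optional[Tuple[int, int]]:
    """whole-unit split of `surplus` that maximizes the product of gains."""
    best_split, best_product = None, -1
    for share_a in range(surplus + 1):
        share_b = surplus - share_a
        # a gain is what a party receives beyond its walk-away option
        gain_a = share_a - disagreement_a
        gain_b = share_b - disagreement_b
        if gain_a < 0 or gain_b < 0:
            continue  # one party would rather walk away
        product = gain_a * gain_b
        if product > best_product:
            best_split, best_product = (share_a, share_b), product
    return best_split
```

with equal walk-away options, `nash_bargaining_split(10)` gives an even `(5, 5)`. if the first party could get 4 on its own, `nash_bargaining_split(10, disagreement_a=4)` gives `(7, 3)`: the party with the better outside option gets more, which is the opportunity-cost sensitivity the paragraph describes.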
moral philosophy has largely proceeded in ignorance of evolutionary biology, debating the properties of moral intuitions without asking where those intuitions came from. once you take evolution seriously, deontological rules start to look like what they are — heuristics our brains use to approximate utility maximization, not discoveries of eternal truths.
vi.
a common objection to moral subjectivism: "if ethics is purely subjective, why would alien civilizations independently develop laws against murder? doesn't that convergence prove morality is objective?" this is a confusion between two fundamentally different levels of the ethical landscape.
instrumental convergence is the observation that almost any agent with almost any terminal goals will converge on certain intermediate strategies — self-preservation, resource acquisition, maintaining social cooperation — because these are prerequisites for achieving virtually any goal whatsoever. the convergence is in the game theory, not in some moral fabric of the universe.
the mistaken inference: aliens would have laws against murder → therefore "murder is wrong" is an objective moral fact written into the structure of reality.
the better explanation: murder-prohibition is an instrumentally convergent solution to a coordination problem that any social species with subjective preferences will independently discover, just as any species that needs to cross rivers will independently discover bridges.
and here is the devastating corollary: some people actually want to be killed. voluntary euthanasia, assisted suicide, martyrdom — if "murder is wrong" were an objective moral fact baked into the universe, there couldn't be exceptions. but there obviously are, because the preference against being killed is subjective. it's just nearly universal, which is exactly what instrumental convergence predicts.
vii.
consider an island inhabited entirely by psychopaths — individuals who are completely selfish and lack any capacity for empathy. you would still find laws against theft and murder, a court system, and even a system of taxation.
why? because these things benefit the psychopaths themselves. a law against theft reduces the risk of their property being stolen. a court system ensures disputes don't escalate into costly vendettas. a redistributive tax system that funds infrastructure, law enforcement, and defense protects them from external threats and prevents societal collapse.
the implication
even in a society of purely self-interested individuals, cooperation and shared rules naturally arise. what affects others ultimately affects us. we don't support laws and social systems out of altruism but because they maximize our own well-being when we account for the fact that we live among other people who can also harm or benefit us.
this same logic explains why even rationally selfish individuals might support wealth redistribution. policies that reduce inequality and instability protect everyone from costly conflicts and societal collapse. redistributive systems are simply a more efficient way of avoiding outcomes like violent uprising. none of this requires true altruism, just rational self-interest playing out under different levels of uncertainty about our position.
viii.
this framework allows us to properly evaluate every competing ethical system. rule ethics and virtue ethics aren't fundamental measures of what's ethical; they're heuristic tools that can be evaluated by how well they respect preference sovereignty — by how well they approximate what rational people would choose behind the veil.
any rule or virtue that could override informed preferences must be fundamentally flawed. our tendency to rely on such heuristics makes evolutionary sense; every second spent calculating optimal decisions is a second you could be eaten by a predator.
the rule-ethics view: a rule like "do not lie" is intrinsically ethical. violations are wrong regardless of consequences.
the heuristic view: a rule like "do not lie" is a computationally cheap heuristic that produces good outcomes most of the time. when it doesn't, the rule is wrong, not the situation.
consequentialism gets closer to the truth by focusing on outcomes. it doesn't matter if you were killed intentionally or accidentally; you're still dead. the only extent to which intent matters is practical: someone who intentionally kills might be more likely to be a repeat offender. we can treat a malevolent person the same way we treat a defective household robot that goes on a killing spree. conscious intent is irrelevant; only outcomes matter.
ix.
david hume identified the is–ought problem: you can't derive statements about what "ought" to be from statements about what "is". but this entire problem dissolves once we realize there is no such thing as "ought" or "should" in any objective sense.
when someone says "you shouldn't kill people," they're not making a metaphysical claim about the universe. they're expressing a preference: "i would prefer a world where people don't kill each other". once we see this, is–ought isn't a deep philosophical puzzle. it's a category error that arises from treating preferences as though they were facts.
emotivism — the position that ethical statements are expressions of emotion rather than truth claims — gets tantalizingly close to this insight. saying "murder is wrong" is essentially saying "boo, murder." this is almost right, but the whole framework of emotivism becomes superfluous once we realize the basic truth it's gesturing at. it's like creating a special category called "hunger-ism" to explain why people eat food.
sam harris struggles because he fails to distinguish between intrinsic and instrumental ethics. he correctly points out that there are objective facts about how to achieve certain states of well-being. but he misses that the underlying preferences for those states are purely subjective. the is–ought gap isn't bridged; it was never there. there are only preferences, and strategies for satisfying them.
some argue that ethics must come from god. but this leads to euthyphro's dilemma: either ethical principles are inherent and god is merely their messenger, or god arbitrarily decided on them. any divine command that violates preference sovereignty — that demands what informed people would reject even with full knowledge and hindsight — cannot be truly ethical.
appendix.
a common objection to utilitarian frameworks is that we cannot compare utility across individuals — that your suffering and my suffering are incommensurable. this objection dissolves upon examination.
at the most fundamental level, the universe is physics. suffering is a physical process — patterns of neural activation, neurochemical cascades, measurable and comparable in principle. the entire practice of medical triage rests on the fact that we can make rough interpersonal comparisons: a broken femur is worse than a paper cut, and everyone knows it.
imagine two bags containing notes of various currencies: yen, euros, dollars. you must pick a random bill from one bag, knowing only the average value of the bills in each bag. you would choose the bag with the higher average, even though the currencies are incommensurable. it does not matter that you cannot convert yen to euros at a precise rate. the choice works in expectation.
applied to social welfare: if you are equally likely to be any person, your expected utility is the average utility across all people. you don't need a perfect conversion function. interpersonal utility comparison isn't a philosophical impossibility. it's something we do every day.
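the step from "equally likely to be any person" to "average utility" is just the definition of an expectation under a uniform draw. a minimal sketch, where the policy names and utility numbers are made up for illustration and are assumed to already sit on a roughly comparable scale:

```python
from statistics import mean


def expected_utility_behind_veil(utilities):
    """with an equal chance of being each person, the expectation is the plain average."""
    return mean(utilities)


def preferred_policy(policies):
    """name of the policy with the highest expected utility for a random position."""
    return max(policies, key=lambda name: expected_utility_behind_veil(policies[name]))


# hypothetical per-person utilities under two policies
example_policies = {
    "even": [4, 4, 4, 4],
    "lopsided": [10, 1, 1, 1],
}
```

here `preferred_policy(example_policies)` returns `"even"`, since an average of 4 beats an average of 3.25, even though one person does much better under the lopsided policy.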
x.
this framework — utilitarian contractualism, grounded in preference sovereignty — resolves the traditional paradoxes of utilitarianism. the repugnant conclusion doesn't arise because we're evaluating from the perspective of an individual behind the veil, not from a god's-eye view optimizing an abstract quantity. the mere addition paradox similarly dissolves.
as for future generations — they will have preferences when they exist. until then, the only preferences in play are those of existing people. existing people often care deeply about their descendants and the world they'll inhabit, so concern for the future is already captured as a preference of currently existing agents. there is no orphaned moral obligation floating in the void, waiting for future people to claim it.
in the end, ethics isn't about discovering eternal truths or divine commands. it's a biological phenomenon that emerges from genetic selection and manifests as subjective preferences. once we see this clearly, the traditional philosophical debates dissolve into straightforward questions: what do people actually want, and what are the most effective means of getting it?
these questions are often enormously complex. but we are, at last, asking them in the right way.