
Fig. 1 — Utility Parameter Distributions

Note. Marginal distributions of the four augmented-utility parameters (Choi, Lee & Lim 2025, §5).
cl : lying cost — penalises literal lies (m ≠ θ, Sobel Def. 3).
cd : deception cost — penalises belief distortion (Sobel Def. 4).
α : CRRA risk-aversion coefficient — composition-based from risk-type proportions.
β : other-regarding (altruism) weight — Normal(0.1, 0.3).
Source distributions : cl, cd ~ LogNormal(μ, σ=1) where μ is the log-scale location parameter configurable via sidebar sliders; each subplot annotation shows the generating distribution and the observed sample mean (x̄).

Fig. 2 — Sender Strategy (BT & GL)

Note. Replicates Choi, Lee & Lim (2025), Fig. 9 (Appendix D.1).
Each treatment shows two panels: the left bar reports the aggregate average truth-telling probability; the right histogram shows the distribution of per-individual truth-telling rates across rounds.
Blue markers indicate the theoretical prediction: ◆ on the bar and ✕ on the histogram.
BT prediction: truth-telling rate = 1, i.e. v* = 0 (the bad type always tells the truth in stage 1 to build reputation, Prop. 1).
GL prediction: truth-telling rate = 0 (good type always lies in stage 1 to reveal its type, Prop. 2).

Fig. 3 — Clustering of Sender Strategy

Note. Replicates Choi, Lee & Lim (2025), Fig. 10 (Appendix D.2). Aggregated across all rounds, each point is one individual’s average (stage-1, stage-2) message frequencies conditional on the stage-1 state.
BT axes: Pr(m1=1|θ1=0) × Pr(m2=1|θ1=0). The open circle at (0, 1) marks the equilibrium prediction — truth in period 1, then m=1 in period 2.
GL axes: Pr(m1=1|θ1=1) × Pr(m2=1|θ1=1). The prediction sits at (0, 0.5) because the good type lies in period 1 and plays honestly in period 2 (stage-2 state is uniform).
Marker colour encodes corner-based clustering: reputation builder (dark blue), truth-teller (green), deceiver (red), inverter (orange), mixed (magenta).

Fig. 4 — Sender Strategy Time Trend

Note. Replicates Choi, Lee & Lim (2025), Fig. 11. Two lines per treatment show the average message frequency across rounds: t=1 (stage 1, magenta) and t=2 (stage 2, amber).
Flat lines indicate the absence of within-session learning: strategies are stable across repeated rounds, so the observed deviations from equilibrium reflect innate preferences rather than miscomprehension.
BT y-axis: Pr(mt=1 | θ1=0). GL y-axis: Pr(mt=1 | θ1=1).

Fig. 5 — Receiver Strategy in Each Stage

Note. Replicates Choi, Lee & Lim (2025), Fig. 12 (Appendix D.3). Four panels report the receiver’s average action conditional on the observed history.
Stage 1 bars (top row): a1 conditional on the stage-1 message m1.
Stage 2 bars (bottom row): a2 conditional on the full history (m1, θ1, m2). Histories with zero observations are omitted.
Blue ◆ diamonds mark the equilibrium predictions from Table 3: in BT, the bad type’s second-period payoff target is a2=1 (partial compliance 2/3 on-path); in GL the good type is fully separated and receivers should reach a2=1 on the lying history.

Fig. 6 — Intertemporal Tradeoff (Δπ1,2)

Note. Replicates Choi, Lee & Lim (2025), Fig. 14 (Appendix D.4).
Δπ1,2 = π1 − π2 is the receiver’s expected payoff difference between stage 1 and stage 2.
BT (positive): bad-type senders tell the truth in stage 1, which raises the receiver's π1, and then exploit the resulting trust in stage 2, which lowers π2.
GL (negative): good-type senders sacrifice stage-1 payoff by lying — after a successful reveal, stage-2 information transmission improves (π2 > π1).

Fig. 7 — Behavioral Classification

Note. Behavioural classification per the Fig. 5 taxonomy in Choi+ 2025.
Top section — configured risk-attitude composition (α < 0 risk-loving, α ≈ 0 risk-neutral, α > 0 risk-averse).
Bottom section — observed behavioural classification:
Equilibrium — follows both BT and GL equilibria.
Lying-averse — follows BT, deviates in GL due to cl.
Deception-averse — follows GL, deviates in BT due to cd.
Inference error — deviates in both environments.

Fig. 8 — Equilibrium Regions (cl vs cd)

Note. Equilibrium-region map in (cd, cl) space following Props. 3–4 (Choi+ 2025).
Regions:
Full reputation — above solid boundary.
Partial — between the two boundaries.
No reputation — below dashed boundary.
Boundaries:
Solid line: cl = 0.8 cd + 0.2 (full / partial).
Dashed line: cl = 0.3 cd (partial / none).
Agent classification:
Equilibrium · Lying-averse · Deception-averse · Inference error
Position reveals which cost dimension drives deviation from equilibrium.

Game System Architecture

Complete architecture of the multi-agent reputation game system.
Configuration: n agents · risk composition · cost distributions · x₂/x₁ · pb · ε
Population Generation: cl, cd ∼ LogNormal(μ, σ) via Gaussian copula · α ∼ composition-based (risk types) · β ∼ Normal (altruism)
BT Environment (Bad-type Truth-telling): Strategic = bad type · Behavioral = truth-teller · Eq: tell truth at θ=0 (Prop. 1: deceptive truth) · Deviation driven by cd (deception aversion)
BT Bayesian Belief Update: λ(m, θ, v) = Pr(τ=G | m, θ) · Receiver: a(m) = E[θ | m]
BT Strategy (Prop. 3): v = P(m=1 | θ=0) · max EUᵃ = EU − cl·𝟙{lie} − cd·D
GL Environment (Good-type Lying): Strategic = good type · Behavioral = always m=1 · Eq: lie at θ=1 (Prop. 2: non-deceptive lie) · Deviation driven by cl (lying aversion)
GL Bayesian Belief Update: λ(m, θ, w) = Pr(τ=G | m, θ) · Receiver: a(m) = E[θ | m]
GL Strategy (Prop. 4): w = P(m=0 | θ=1) · max EUᵃ = EU − cl·𝟙{lie} − cd·D
Round Execution (N-Period Game): Periods 1..N−1: θₜ → mₜ → [ε] → aₜ → λₜ carried → next · Period N: myopic play, no future reputation (xₙ₊₁=0) · Weights: xₜ = ratio^((t−1)/(N−1)), geometric from 1 to ratio
Behavioral Classification (Figure 5): Equilibrium (BT✓ GL✓) | Lying-averse (BT✓ GL✗) | Deception-averse (BT✗ GL✓) | Inf. Error
Welfare Analysis & Visualization
Theory references: Sobel Def. 3 (lie) · Sobel Def. 4 (deception) · Choi+ §5 (EUᵃ)

Lie (Sobel 2020, Definition 3)

A message m is a lie if and only if m ≠ θ: the sender sends a message that does not match the true state of the world.

m ≠ θ ⇒ lie

Deception (Sobel 2020, Definition 4)

A message m is deceptive if it shifts the receiver's type belief away from truth, compared to the alternative message.

D(m,θ) = |λ(m,θ) − λ(mⁿ,θ)| > 0

Augmented Utility (Choi+ 2025, §5)

Agents maximize material payoff minus honesty costs: lying cost (cl) for literal lies and deception cost (cd) proportional to belief distortion.

EUᵃ = EU − cl·𝟙{m≠θ} − cd·D(m,θ)

Deceptive Truth-telling (Prop. 1)

In BT, the bad type tells the truth to build reputation. The message is literally true (m=θ, not a lie) but deceptive (shifts type beliefs).

m = θ (truth) ∧ D > 0 (deceptive)

Non-deceptive Lying (Prop. 2)

In GL, the good type lies to reveal type. The message is a literal lie (m≠θ) but not deceptive (doesn't distort type beliefs).

m ≠ θ (lie) ∧ D = 0 (non-deceptive)

Behavioral Classification (Fig. 5)

Agents are classified by equilibrium adherence: Equilibrium (both), Lying-averse (BT✓ GL✗), Deception-averse (BT✗ GL✓), Inference error (neither).

BT × GL → {Eq, LA, DA, IE}

Glossary & Reference

Abbreviations & Indices

| Term | Full Name | Description |
|---|---|---|
| BT | Bad-type Truth-telling | Environment where the strategic sender is the bad type and the behavioral type tells the truth. Equilibrium strategy: tell truth in period 1 to build reputation (deceptive truth-telling, Proposition 1). |
| GL | Good-type Lying | Environment where the strategic sender is the good type and the behavioral type always sends m=1. Equilibrium strategy: lie in period 1 to reveal type (non-deceptive lying, Proposition 2). |
| cl | Lying cost | Cost incurred when an agent sends a message m ≠ θ (literal lie). From Sobel (2020) Definition 3. Drawn from LogNormal(μ, σ). |
| cd | Deception cost | Cost incurred proportional to how much a message shifts the receiver's type belief away from truth. Based on Sobel (2020) Definition 4. Drawn from LogNormal(μ, σ). |
| α | Risk aversion | CRRA (Constant Relative Risk Aversion) parameter. α < 0: risk-loving, α ≈ 0: risk-neutral, α > 0: risk-averse. |
| β | Altruism parameter | Weight on others' welfare. β > 0: altruistic, β = 0: selfish, β < 0: spiteful. Drawn from Normal distribution. |
| θ | State of the world | Binary state θ ∈ {0, 1} drawn uniformly in each period. The sender privately observes θ. |
| m | Message | Binary message m ∈ {0, 1} sent by the sender to the receiver. |
| a | Action | Receiver's action a ∈ [0, 1], chosen to minimize expected quadratic loss given beliefs. |
| λ | Type belief | Posterior probability that the sender is the good/behavioral type: $\lambda(m,\theta) = \Pr(\tau{=}G \mid m,\theta)$. Updated via Bayes' rule. |
| v | BT mixing parameter | $v = P(m{=}1 \mid \theta{=}0)$ in BT. Probability the bad type lies when state is 0. Equilibrium: v=0 (truth-telling). |
| w | GL mixing parameter | $w = P(m{=}0 \mid \theta{=}1)$ in GL. Probability the good type lies when state is 1. Equilibrium: w=1 (lying). |
| x1, x2 | Period weights | Weights on period 1 and period 2 payoffs. High x2/x1 ratio creates strong reputation-building incentive. |
| pb | Behavioral prior | Prior probability that the sender is the behavioral (non-strategic) type. Default: 0.5. |
| ε | Miscommunication rate | Probability that a sent message is flipped in transit (0 → 1 or 1 → 0). Models information noise. |
| $EU^a$ | Augmented expected utility | $EU^a(m \mid \theta) = EU(m \mid \theta) - c_l \cdot \mathbb{1}\{m \neq \theta\} - c_d \cdot D(m,\theta)$. From Choi et al. (2025) Section 5. |
| $D(m,\theta)$ | Deception measure | $\lvert\lambda(m,\theta) - \lambda(m^n,\theta)\rvert$. How much the chosen message shifts type beliefs compared to the alternative. From Sobel (2020) Definition 4. |
| KDE | Kernel Density Estimation | Non-parametric method to estimate probability density functions. Used to smooth histograms in the distribution plots. |
| CRRA | Constant Relative Risk Aversion | Utility function $u(x) = \frac{x^{1-\alpha}}{1-\alpha}$. Widely used in behavioral economics. |
| Prop. 1 | Deceptive truth-telling | In BT, when x2/x1 is large, the bad type tells truth to build reputation. The truth is (a) literally true, (b) deceptive w.r.t. type. |
| Prop. 2 | Non-deceptive lying | In GL, when x2/x1 is large, the good type lies to reveal type. The lie is (a) literally false, (b) not deceptive w.r.t. type. |
| Prop. 3 | BT equilibrium regions | Characterizes (cl, cd)-space: full/partial/no reputation building in BT. Deviation driven by deception aversion (cd). |
| Prop. 4 | GL equilibrium regions | Characterizes (cl, cd)-space: full/partial/no reputation building in GL. Deviation driven by lying aversion (cl). |
| Fig. 5 | Classification logic | Cross-tabulation of BT and GL behavior to identify agent type: equilibrium play, lying-averse, deception-averse, or inference error. |

Mathematical Notation

| Expression | Meaning |
|---|---|
| $U_P = -\sum_i x_i(a_i - \theta_i)^2$ | Public/receiver payoff. Quadratic loss from action deviating from true state. |
| $U_G = -\sum_i x_i(a_i - \theta_i)^2$ | Good type payoff. Aligned with receiver: wants accurate actions. |
| $U_B = -\sum_i x_i(a_i - 1)^2$ | Bad type payoff. Always prefers actions close to 1 regardless of state. |
| $\lambda(m,\theta) = \Pr(\tau{=}G \mid m,\theta)$ | Posterior type belief. Updated from prior $p_b$ using Bayes' rule given message and state. |
| $D(m,\theta) = \lvert\lambda(m,\theta) - \lambda(m^n,\theta)\rvert$ | Sobel deception measure. Difference in type beliefs between chosen and alternative message. |
| $EU^a = EU - c_l \cdot \mathbb{1}\{m \neq \theta\} - c_d \cdot D$ | Augmented utility. Material payoff minus lying cost (if lie) minus deception cost (proportional to $D$). |

Plot Descriptions

1. Utility Parameter Distributions

Four sub-plots showing the distribution of each agent parameter: cl (lying cost), cd (deception cost), α (risk aversion), β (altruism). Histograms with KDE curves. Shows the population heterogeneity driving different behavioral responses to the game environments.

2. Joint (cl, cd) Distribution

Scatter plot of each agent's lying cost vs deception cost, colored by behavioral classification. Reveals the relationship between cost parameters and equilibrium behavior. Agents with high cd tend to deviate in BT (deception-averse); agents with high cl tend to deviate in GL (lying-averse).

3. Strategy Distribution (BT / GL)

Histograms of truth-telling probability across agents. In BT, equilibrium predicts p=1.0 (full truth); in GL, equilibrium predicts p=0.0 (full lying). The dashed line marks the equilibrium prediction. Deviations from the predicted value reveal lying- or deception-averse behavior.

4. Agent Type Proportions

Horizontal bar chart showing two sets of proportions: (1) Risk attitude composition (risk-loving / neutral / averse) as configured, and (2) Behavioral classification (equilibrium / lying-averse / deception-averse / inference error) as inferred from game outcomes using Figure 5 logic.

5. Equilibrium Regions (cl vs cd)

Maps the (cd, cl)-space into three regions from Propositions 3 & 4: full reputation building (agents follow equilibrium), partial reputation building (mixed strategies), and no reputation building (agents deviate). Each agent is plotted as a dot colored by classification. Boundary lines approximate the theoretical thresholds.

Source Papers

| Paper | Key Contributions Used |
|---|---|
| Choi, Lee & Lim (2025), "The Anatomy of Honesty: Lying Aversion vs. Deception Aversion" | Two-period reputation game model (Section 2). BT and GL environments (Section 3). Deceptive truth-telling & non-deceptive lying (Props. 1-2). Augmented utility with dual honesty costs (Section 5). Equilibrium characterization in (cl, cd)-space (Props. 3-4). Classification logic (Figure 5). |
| Sobel (2020), "Lying and Deception in Games", Journal of Political Economy | Formal definitions of lying (Def. 3: m≠θ) and deception (Def. 4: belief manipulation measure). Theoretical framework distinguishing literal falsehood from strategic belief distortion. Foundational distinction between lying aversion and deception aversion. |

The Anatomy of Honesty

Lying Aversion vs. Deception Aversion in Sender–Receiver Games

Choi, Lee & Lim (2025) • Sobel (2020, Journal of Political Economy)

Multi-Period Interactive Simulation

Why Distinguish Lying from Deception?

Can you lie without deceiving?

Can you deceive without lying?

Traditional behavioral economics treats dishonesty as a single concept. However, the cognitive processes are fundamentally different: lying aversion does not require higher-order reasoning about the audience, while deception aversion does. Separating them has distinct implications for market design and policy (Choi et al. 2025, §1).

Sobel’s Framework: Three Properties of Speech Acts

Sobel (2020) builds on Austin’s speech-act theory to formalise lying and deception in strategic settings:

Locution

What the speaker says. Lying is purely locutionary: a message m is a lie if its literal meaning does not match the true state. No model of the audience is needed.

Illocution

How the audience interprets it. Deception is illocutionary: the sender must have a model of the receiver’s belief-updating process. Deception does not require lying.

Perlocution

The consequences. Damage is the payoff consequence of communication. Sobel shows deception does not always cause damage, and damage can occur without deception.

Formal Definitions (Choi et al. §2)

1. Lying (Def. 1, from Sobel 2020)

$$m \text{ is a } \textbf{lie} \text{ given } \theta \iff m \neq \theta; \quad m \text{ is a } \textbf{truth} \iff m = \theta$$

2. Deception about preference type (Def. 2)

Message m is deceptive w.r.t. the preference type if there exists m′ such that the receiver’s posterior belief λ(m′) about the sender’s type τ is closer to the true type than λ(m).

3. Deception about state (Def. 3)

Message m is deceptive w.r.t. the state if there exists m′ such that the receiver’s interim belief Pr(θ|m′) is closer to the true state than Pr(θ|m). In both BT and GL, no equilibrium message is deceptive about the state.

The Key Separation

| | Deceptive w.r.t. type | Non-deceptive w.r.t. type |
|---|---|---|
| Lie (m ≠ θ) | Common intuition | GL equilibrium: good type lies to reveal type (Prop. 2) |
| Truth (m = θ) | BT equilibrium: bad type tells truth to conceal type (Prop. 1) | Ordinary honesty |

The off-diagonal cells are the key theoretical predictions: lying can be non-deceptive, and truth-telling can be deceptive. Both papers provide conditions under which these arise in equilibrium.

Game Environment & Timeline

A two-period reputation-building game (Choi et al. §2). The receiver cannot directly observe the sender’s preference type τ — only messages about the state.

$$U_P = -\textstyle\sum_{i=1}^{2} x_i(a_i - \theta_i)^2 \qquad U_G = U_P \qquad U_B = -\textstyle\sum_{i=1}^{2} x_i(a_i - 1)^2$$

When x2/x1 is large, reputation-building dominates and a unique equilibrium emerges (x2/x1 = 20 in the experiments).

Per-period timeline (Choi et al. Fig. 1): S sends m → R receives m → R forms interim belief Pr(θ|m) → R chooses action a → θ revealed → R updates posterior type belief Pr(τ|θ,m). The posterior λ carries forward as inherited state to the next period.

BT Reputation Building with Bad-type Truth-telling

Strategic sender = bad type (τ = B, wants a = 1). Behavioral type commits to truth-telling (strategy of a myopic good type). Prior pb = 1/2.

1. When θ1 = 0, the bad type sends m1 = 0 (truth), mimicking the behavioral type to conceal her preference type.

2. The receiver updates: λ(0,0) = 1/(2−v) > pb. Reputation increases, so the sender appears more trustworthy.

3. In period 2, the bad type exploits accumulated λ by sending m = 1 to push a closer to 1.

Proposition 1. The equilibrium message m1 = 0 is (a) a TRUTH, (b) DECEPTIVE w.r.t. preference type, (c) not deceptive w.r.t. state.

GL Reputation Building with Good-type Lying

Strategic sender = good type (τ = G, wants a = θ). Behavioral type always sends m = 1 (strategy of a myopic bad type). Prior pb = 1/2.

1. When θ1 = 1, the good type sends m1 = 0 (a lie). Since the behavioral type never sends 0, this reveals her good type.

2. The receiver updates: λ(0, ·) = 1. The sender is identified as good with certainty.

3. In period 2, the receiver fully trusts the sender’s messages, benefiting both parties.

Proposition 2. The equilibrium message m1 = 0 is (a) a LIE, (b) NOT DECEPTIVE w.r.t. preference type, (c) not deceptive w.r.t. state.

Two-Step Belief Updating & λ Inheritance

Each period involves two distinct belief updates (Choi et al. Fig. 1). In our N-period extension, the posterior type belief λ is the inherited state that carries reputation across periods:

Step 1: Interim belief

$$\Pr(\theta \mid m)$$

After receiving m but before θ is revealed, the receiver forms a belief about the state. This determines action a.

Step 2: Posterior type belief

$$\lambda_{t+1} = \Pr(\tau{=}G \mid \theta, m;\, \lambda_t) \quad \text{(Bayes' rule with prior } \lambda_t\text{)}$$

After θ is revealed, the receiver updates her belief about the sender’s preference type. This λ becomes the prior for the next period.

λ0 = pb ⟶ period 1 ⟶ λ1 ⟶ period 2 ⟶ λ2 ⟶ … ⟶ λN. Each λ encodes the cumulative reputation history. This state inheritance is the engine of reputation building.

Augmented Utility & Moral Costs (Choi et al. §5)

The experimental data show that agents deviate from equilibrium. Choi et al. propose two independent moral costs as the mechanism:

$$EU^a(m|\theta) = EU(m|\theta) \;-\; c_l \cdot \mathbb{1}\{m \neq \theta\} \;-\; c_d \cdot |\lambda(m,\theta) - \lambda(m^n,\theta)|$$

cl — Lying Cost (locutionary)

Fixed penalty for m ≠ θ. Does not require theory of mind. Drives deviation from GL equilibrium: agents refuse to lie even when non-deceptive (Prop. 4).

cd — Deception Cost (illocutionary)

Proportional to belief distortion D(m,θ). Requires modelling receiver’s inference. Drives deviation from BT equilibrium: agents refuse to deceive even by truth-telling (Prop. 3).

Our Extension: N-Period Game with State Inheritance

We extend the two-period framework to an N-period repeated game where reputation is a continuous, evolving state:

1. Geometric period weights

$x_t = r^{(t-1)/(N-1)}$ for $t = 1, \dots, N$, where $r = x_2/x_1$ is the weight ratio of the two-period game, so the weights rise geometrically from $x_1 = 1$ to $x_N = r$. Later periods matter more, creating graduated reputation incentives rather than a single binary trade-off.

2. λ inheritance across periods

The posterior type belief $\lambda_t$ from period $t$ becomes the prior for period $t{+}1$. Each period’s strategy is re-optimised given the current λ, creating a dynamic reputation trajectory.

3. Miscommunication channel (ε)

Messages can be flipped in transit with probability ε, ensuring full-support beliefs and avoiding off-path degeneracy. Models noise in real communication.

4. Heterogeneous agent population

Each agent draws $(c_l, c_d) \sim \text{LogNormal}$, $\alpha \sim \text{Normal}$ (risk), $\beta \sim \text{Normal}$ (altruism). Monte Carlo simulation over the population reveals emergent classification distributions.
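The geometric weighting in item 1 can be sketched in a few lines of Python. This is a minimal illustration, not the simulation's own code: the function name `period_weights` is hypothetical, and periods are indexed from 0 in code so that the first weight is 1 and the last equals the ratio r.

```python
def period_weights(n_periods: int, ratio: float) -> list[float]:
    """Geometric period weights rising from 1 (first period) to `ratio` (last period).

    Hypothetical helper: with 0-based index t, the weight is ratio ** (t / (N - 1)).
    """
    if n_periods < 2:
        raise ValueError("need at least two periods to define a weight ratio")
    return [ratio ** (t / (n_periods - 1)) for t in range(n_periods)]


if __name__ == "__main__":
    # Two periods with ratio 20 recover the two-period weights (x1, x2) = (1, 20).
    print(period_weights(2, 20.0))
    print(period_weights(5, 20.0))
```

With N = 2 the sketch reduces to the original two-period weights, which is a quick sanity check on the indexing.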

Behavioral Classification (Choi et al. Fig. 5)

Cross-tabulating BT and GL behavior classifies agents into four types, separating the preference channel from inference errors:

Equilibrium

Truth-tells in BT (truth-telling probability tp > 0.5) and lies in GL (tp < 0.5). Follows equilibrium in both. No moral override.

Lying-averse

Truth-tells in BT ✓ but also truth-tells in GL ✗. Refuses to lie even when non-deceptive. Driven by high cl.

Deception-averse

Lies in GL ✓ but also lies in BT ✗. Refuses to deceive even by telling truth. Driven by high cd.

Inference Error

Deviates in both environments. Sender perceives reputation-building as not worthwhile (erroneous second-order beliefs).
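The four-way cross-tabulation above can be written as a small function. This is a sketch under stated assumptions: the name `classify` is hypothetical, the 0.5 cut-off follows the tp thresholds quoted for the Equilibrium type, and a truth-telling probability of exactly 0.5 is treated as a deviation in either environment.

```python
def classify(tp_bt: float, tp_gl: float, threshold: float = 0.5) -> str:
    """Classify an agent from its truth-telling probabilities in BT and GL."""
    follows_bt = tp_bt > threshold   # BT equilibrium: tell the truth
    follows_gl = tp_gl < threshold   # GL equilibrium: lie
    if follows_bt and follows_gl:
        return "Equilibrium"
    if follows_bt:
        return "Lying-averse"        # truth-tells in GL as well
    if follows_gl:
        return "Deception-averse"    # lies in BT as well
    return "Inference error"         # deviates in both environments


if __name__ == "__main__":
    for pair in [(0.9, 0.1), (0.9, 0.9), (0.1, 0.1), (0.1, 0.9)]:
        print(pair, classify(*pair))
```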

Experimental Evidence (Choi et al. §3–4)

198 participants at Seoul National University across two experiments (x2/x1 = 20). Strategy distributions show strong bimodality — most subjects either fully conform or fully deviate:

BT (Exp. II, Fig. 5a): Equilibrium 31% · Deception-averse 53% · Inference error 16%

GL (Exp. II, Fig. 5b): Equilibrium 16% · Lying-averse 47% · Inference error 37%

47–53% of subjects deviate due to preference (cl or cd), not inference errors. Experiment II uses second-order beliefs to cleanly separate these channels. Learning effects are absent across rounds (Choi et al. Fig. 11).

Part II

Computational Engine

From theory to simulation: this part details the implementation pathway for the Bayesian belief updating system, the optimisation mechanism of the augmented utility function, and the core algorithm of λ state inheritance in the N-period game loop. Every step is traceable to the theoretical foundations in Choi et al. (2025) and Sobel (2020).

Population Generation (Monte Carlo)

Each agent is an independent draw from four distributions. Parameter heterogeneity drives the emergence of equilibrium deviation behaviour:

$$c_l \sim \text{LogNormal}(\mu_l,\,1), \quad c_d \sim \text{LogNormal}(\mu_d,\,1)$$

$$\alpha \sim \text{Normal}(\mu_{\alpha},\,\sigma_{\alpha}) \text{ per risk type}, \quad \beta \sim \text{Normal}(0.1,\,0.3)$$

Risk-loving

α ~ N(−0.5, 0.2). Convex utility (α<0) — prefers high-variance gambles, aggressive signalling.

Risk-neutral

α ~ N(0, 0.05). Linear utility — tracks equilibrium predictions, benchmark for Props. 1–2.

Risk-averse

α ~ N(0.8, 0.3). Concave utility — prefers certain payoffs, conservative reputation building.

Fig. 1 — Parameter distributions (n=30, balanced)

Risk-type proportions are configurable via sliders (sum = 100%). Seed = 42 ensures reproducibility.
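The draws described above can be sketched with the Python standard library. Several things here are assumptions rather than the simulation's actual code: the names `Agent` and `generate_population` are hypothetical, the second argument of each Normal is read as a standard deviation, each agent's risk type is sampled from the configured shares, and cl and cd are drawn independently (the architecture diagram mentions a Gaussian copula, which is not reproduced here).

```python
import random
from dataclasses import dataclass

# (mean, standard deviation) of alpha per risk type, as listed above.
RISK_TYPES = {
    "risk_loving": (-0.5, 0.2),
    "risk_neutral": (0.0, 0.05),
    "risk_averse": (0.8, 0.3),
}


@dataclass
class Agent:
    cl: float        # lying cost
    cd: float        # deception cost
    alpha: float     # CRRA risk-aversion coefficient
    beta: float      # altruism weight
    risk_type: str


def generate_population(n, mu_l, mu_d, shares, seed=42):
    """Draw n agents; `shares` maps risk-type name to its relative proportion."""
    rng = random.Random(seed)
    names = list(shares)
    weights = [shares[name] for name in names]
    agents = []
    for _ in range(n):
        risk = rng.choices(names, weights=weights)[0]
        mean, sd = RISK_TYPES[risk]
        agents.append(Agent(
            cl=rng.lognormvariate(mu_l, 1.0),
            cd=rng.lognormvariate(mu_d, 1.0),
            alpha=rng.gauss(mean, sd),
            beta=rng.gauss(0.1, 0.3),
            risk_type=risk,
        ))
    return agents


if __name__ == "__main__":
    balanced = {"risk_loving": 1, "risk_neutral": 1, "risk_averse": 1}
    population = generate_population(30, mu_l=0.0, mu_d=0.0, shares=balanced)
    print(population[0])
```

Seeding a private `random.Random` instance keeps the draw reproducible without touching the global random state.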

BT Bayesian Posteriors in Bad-type Truth-telling

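In BT, the strategic sender is the bad type (τ=B) and lies at θ=0 with probability v = P(m=1|θ=0); the behavioral type always tells the truth. Writing λ for the prior probability of the behavioral (good) type and m for the received message, Bayes' rule gives:

$$\lambda^{BT}(m{=}0,\,\theta{=}0) = \frac{\lambda(1{-}\varepsilon)}{\lambda(1{-}\varepsilon) + (1{-}\lambda)\,[(1{-}v)(1{-}\varepsilon) + v\varepsilon]} \quad \text{(truth when }\theta{=}0\text{)}$$

$$\lambda^{BT}(m{=}1,\,\theta{=}0) = \frac{\lambda\varepsilon}{\lambda\varepsilon + (1{-}\lambda)\,[v(1{-}\varepsilon) + (1{-}v)\varepsilon]} \quad \text{(lie when }\theta{=}0\text{)}$$

With ε = 0 and λ = pb = 1/2, the first expression reduces to 1/(2−v), the value used in the BT walkthrough.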

Truth-telling (m=0) when θ=0 is pooling with the behavioral type → λ rises → truth builds a false reputation (Prop. 1). Receiver’s optimal action a(m)=E[θ|m] closes the incentive loop.
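As a check on the BT update at θ = 0, the posterior can be computed directly from Bayes' rule. The sketch below is an illustration under the stated model assumptions (behavioral type always truthful, strategic type sends m=1 with probability v, channel flips the message with probability ε, receiver conditions on the received message); the function name `bt_posterior` is hypothetical.

```python
def bt_posterior(m_rcv: int, lam: float, v: float, eps: float) -> float:
    """Posterior probability of the behavioral (truth-telling) type in BT at theta = 0.

    lam is the prior on the behavioral type, v the strategic type's lying
    probability P(m=1 | theta=0), eps the channel flip probability.
    """
    if m_rcv == 0:
        p_behavioral = 1.0 - eps                       # sent 0, not flipped
        p_strategic = (1.0 - v) * (1.0 - eps) + v * eps
    else:
        p_behavioral = eps                             # sent 0, flipped to 1
        p_strategic = v * (1.0 - eps) + (1.0 - v) * eps
    denominator = lam * p_behavioral + (1.0 - lam) * p_strategic
    if denominator == 0.0:
        raise ValueError("received message has zero probability; Bayes' rule is undefined")
    return lam * p_behavioral / denominator


if __name__ == "__main__":
    # Without noise and with prior 1/2 this reproduces lambda(0,0) = 1/(2 - v).
    for v in (0.25, 0.5, 1.0):
        print(v, bt_posterior(0, 0.5, v, 0.0), 1.0 / (2.0 - v))
```

The explicit zero-denominator check shows the off-path case that a positive ε removes.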

GL Bayesian Posteriors in Good-type Lying

In GL, the strategic sender is the good type (τ=G) and lies at θ=1 with probability w = P(m=0|θ=1); the behavioral type always sends m=1. Writing λ for the prior probability that the sender is the good type and m for the received message, Bayes' rule gives:

$$\lambda^{GL}(m{=}0,\,\theta{=}1) = \frac{\lambda\,[w(1{-}\varepsilon) + (1{-}w)\varepsilon]}{\lambda\,[w(1{-}\varepsilon) + (1{-}w)\varepsilon] + (1{-}\lambda)\varepsilon} \quad \text{(lie when }\theta{=}1\text{)}$$

$$\lambda^{GL}(m{=}1,\,\theta{=}1) = \frac{\lambda\,[(1{-}w)(1{-}\varepsilon) + w\varepsilon]}{\lambda\,[(1{-}w)(1{-}\varepsilon) + w\varepsilon] + (1{-}\lambda)(1{-}\varepsilon)} \quad \text{(truth when }\theta{=}1\text{)}$$

Lying (m=0) when θ=1 is a separating signal: the behavioral type never sends m=0, so the receiver identifies the good type, the posterior λ = Pr(τ=G) rises toward 1 (exactly 1 when ε = 0), and the receiver fully trusts the sender in period 2. The lie reveals rather than conceals type (Prop. 2, “non-deceptive lying”).
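The separating logic can be checked numerically with a direct Bayes' rule computation at θ = 1. This is a sketch under the stated model assumptions (behavioral type always sends m=1, strategic good type sends m=0 with probability w, channel flips with probability ε); the name `gl_posterior` is hypothetical and `lam_good` denotes the prior probability of the good type.

```python
def gl_posterior(m_rcv: int, lam_good: float, w: float, eps: float) -> float:
    """Posterior probability that the sender is the good type in GL at theta = 1."""
    if m_rcv == 0:
        p_good = w * (1.0 - eps) + (1.0 - w) * eps     # lied, or told truth and flipped
        p_behavioral = eps                             # sent 1, flipped to 0
    else:
        p_good = (1.0 - w) * (1.0 - eps) + w * eps
        p_behavioral = 1.0 - eps
    denominator = lam_good * p_good + (1.0 - lam_good) * p_behavioral
    if denominator == 0.0:
        raise ValueError("received message has zero probability; Bayes' rule is undefined")
    return lam_good * p_good / denominator


if __name__ == "__main__":
    # Noise-free lie by a fully lying good type: the receiver is certain of the good type.
    print(gl_posterior(0, 0.5, 1.0, 0.0))
    # With 5% noise the belief is high but no longer degenerate.
    print(gl_posterior(0, 0.5, 1.0, 0.05))
```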

Deception Measure D(m, θ)

The deception measure (Sobel Def. 4) precisely quantifies how much a message shifts the receiver’s type belief relative to the alternative message. This measure is the core operationalisation tool for distinguishing lying from deception:

$$D(m, \theta) = |\lambda(m,\theta) - \lambda(m^n, \theta)|$$

where mn is the alternative message (mn = 1−m). D = 0 means that regardless of which message is sent, the receiver’s type inference remains unchanged—i.e. the message is non-deceptive.

BT: D > 0

Truth-telling when θ = 0 shifts λ upward (relative to lying), hence D > 0. The truth is deceptive—it builds a false reputation, causing the receiver to deviate from a correct inference about the sender’s true preference type. This validates the core prediction of Prop. 1.

GL: D = 0

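Lying when θ = 1 moves the receiver’s type belief toward the truth (the sender really is the good type) rather than away from it, so in the sense of Def. 4 the lie is non-deceptive and D is taken to be 0. The lie does not distort type beliefs: it is a fully transparent signalling act. The good type separates itself through lying, and the receiver’s type inference suffers no damage.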

D enters the augmented utility function as cd · D(m,θ). The higher cd, the more sensitive the agent is to belief distortion. In BT, agents with high cd refuse to deceive the receiver through truth-telling—even though it is the equilibrium strategy—this is precisely the deception-averse deviation predicted by Prop. 3.

Strategy Computation via Augmented Utility

For each agent–period–game-type combination, the optimal strategy is determined by comparing augmented utilities of truth-telling versus lying. This computation unifies economic rationality (payoff maximisation) and moral preferences (honesty costs) within a single decision framework:

for each agent a, game type gt, period t:
  // Current state
  λ = inherited belief, xt = period weight
  xnext = remaining weight for reputation

  // Compare messages
  EUa(truth) = EU(m=θ) − cd · D(m=θ)
  EUa(lie)  = EU(m≠θ) − cl − cd · D(m≠θ)

  strategy = P(truth) = σ(EUa(truth) − EUa(lie))
  // σ = logistic smoothing to avoid discontinuities

The strategy output is a truth-telling probability σ in [0,1]. The risk aversion parameter α modifies base payoffs via the CRRA utility transform $u(x) = x^{1-\alpha}/(1-\alpha)$, while the altruism parameter β adjusts social preferences by weighting the opponent’s payoff. These two dimensions, together with moral costs (cl, cd), form a four-dimensional heterogeneity space that produces the behavioural classification emergence shown in Fig. 5.
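The comparison in the pseudocode above can be sketched as a runnable Python function. It is an illustration, not the simulation's implementation: the name `truth_probability` and the `temperature` parameter (the scale of the logistic smoothing) are assumptions, and the material payoffs and D values are taken as given inputs.

```python
import math


def truth_probability(eu_truth, eu_lie, cl, cd, d_truth, d_lie, temperature=1.0):
    """Logistic choice between truth and lie based on augmented utilities.

    eu_truth / eu_lie: material expected utilities of each message.
    d_truth / d_lie: deception measure D for each message.
    temperature: hypothetical smoothing scale (smaller means sharper choice).
    """
    eua_truth = eu_truth - cd * d_truth          # truth pays no lying cost
    eua_lie = eu_lie - cl - cd * d_lie           # lie pays cl on top of cd * D
    z = (eua_truth - eua_lie) / temperature
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # A material gain of 1 from lying is outweighed by a lying cost of 5.
    print(truth_probability(0.0, 1.0, cl=5.0, cd=0.0, d_truth=0.0, d_lie=0.0))
```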

N-Period Game Loop with λ Inheritance

The core simulation loop implements the cross-period transmission of reputation, the most important theoretical extension of the original two-period model. Each period’s posterior belief λt becomes the next period’s prior, giving rise to the dynamic evolution of reputation:

function playNPeriodGame(agent, gt, periods, weights):
  λ = pb    // initial prior
  trajectory = []
  for t = 0 to periods−1:
    θ = random({0,1})  // nature draws state
    xNext = weights[t+1] if t+1 < periods else 0  // last period: no future reputation
    strategy = agentStrat(agent, gt, weights[t], xNext, λ)
    m = sample(strategy, θ)  // send message
    m_rcv = channel(m, ε)  // miscommunication
    a = receiverAction(m_rcv, λ, gt)
    λ = bayesUpdate(m_rcv, θ, λ, gt)  // λ carries forward
    trajectory.append((θ, m, m_rcv, a, λ))
  return trajectory

Key innovation: The line λ = bayesUpdate(...) is the exact point where state inheritance occurs. Unlike the two-period model in Choi et al. (2025)—which allows only a single belief update—our N-period extension makes reputation a continuously evolving state variable. This allows us to observe the reputation trajectories in Fig. 7: how λ rises over periods (reputation accumulation) or falls (trust erosion), and the trajectory divergence patterns across different behavioural types.

Fig. 4 — Sender strategy time trend (stable across rounds, Choi+ 2025 Fig. 11)

Miscommunication Channel (ε)

Messages are flipped in transit with probability ε. The device serves three purposes: it guarantees full-support beliefs, keeps posteriors continuous in the strategy, and adds realism:

$$m_{\text{rcv}} = \begin{cases} m & \text{with prob } 1{-}\varepsilon \\ 1{-}m & \text{with prob } \varepsilon \end{cases}$$

1. Full-support belief guarantee. When ε > 0, every (m, θ) pair in the information set occurs with positive probability. This technically guarantees that the denominator of Bayes’ rule is never zero, eliminating belief degeneracy on off-path information sets. Without this condition, certain message–state combinations would produce 0/0 indeterminate forms, rendering the equilibrium concept meaningless.

2. Smoothed posterior continuity. With ε introduced, λ(m,θ) becomes a continuous (in fact, analytic) function of the mixing strategy parameters v and w. This prevents knife-edge equilibria, where infinitesimal changes in strategy cause discontinuous jumps in beliefs, and so makes comparative statics analysis and numerical optimisation feasible.

3. Realistic communication noise modelling. In real economic environments, message transmission is never perfect: written documents may be misread, verbal promises may be forgotten. A small ε (e.g. 5%) captures the communication friction in actual strategic interactions without fundamentally altering the equilibrium structure.
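The flip channel defined above is a one-line function. The sketch below is a minimal illustration; the name `channel` matches the pseudocode of the game loop, while the optional `rng` argument is an assumption added so that runs can be seeded.

```python
import random


def channel(m: int, eps: float, rng=random) -> int:
    """Return the received message: m flipped with probability eps, else unchanged."""
    return 1 - m if rng.random() < eps else m


if __name__ == "__main__":
    rng = random.Random(42)
    flips = sum(channel(0, 0.05, rng) for _ in range(10_000))
    print(f"empirical flip rate: {flips / 10_000:.3f}")
```

Because `random()` returns values in [0, 1), ε = 0 never flips and ε = 1 always flips.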

Equilibrium Regions in $(c_l, c_d)$ Space

Props. 3–4 establish precise boundaries of equilibrium compliance/deviation in the moral cost parameter space. These boundaries partition the space into three behavioural regions, each corresponding to a distinct reputation-building pattern:

$$\text{Full: } c_l > 0.8\,c_d + 0.2 \qquad \text{None: } c_l < 0.3\,c_d$$

Full reputation building

Above the solid boundary: $c_l > 0.8\,c_d + 0.2$. In this region, moral costs are insufficient to outweigh the strategic gains from reputation building. Agents follow equilibrium in both BT and GL—bad types tell deceptive truths, good types lie to separate. This corresponds to the fully rational benchmark prediction.

Partial reputation building

Between the two boundaries. Agents deviate from equilibrium in one environment, depending on the relative magnitudes of $c_l$ and $c_d$: if $c_d$ dominates, they deviate from BT (deception aversion); if $c_l$ dominates, they deviate from GL (lying aversion). This is the key region where Choi et al. identify behavioural types.

No reputation building

Below the dashed boundary: $c_l < 0.3\,c_d$. Both moral costs are strong enough to make reputation building uneconomical in both environments. This corresponds to the inference error type—though note that these agents’ deviations are rational cost–benefit calculations, not cognitive limitations.
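The two boundary lines translate directly into a region lookup. This sketch uses the illustrative boundary coefficients quoted above (0.8, 0.2 and 0.3); the function name `reputation_region` is hypothetical, and points lying exactly on a boundary are assigned to the partial region by assumption.

```python
def reputation_region(cl: float, cd: float) -> str:
    """Map a (cl, cd) pair to its reputation-building region."""
    if cl > 0.8 * cd + 0.2:      # above the solid boundary
        return "full"
    if cl < 0.3 * cd:            # below the dashed boundary
        return "none"
    return "partial"             # between the two boundaries


if __name__ == "__main__":
    for cl, cd in [(1.0, 0.5), (0.5, 1.0), (0.1, 1.0)]:
        print((cl, cd), reputation_region(cl, cd))
```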

This analysis corresponds to Fig. 8 in the simulation: each agent is plotted in $(c_d, c_l)$ space and colour-coded by behavioural classification. The slope and intercept of the boundary lines reflect the asymmetric structure of reputation payoffs across the two environments.

Part III

AI Agent Mode

Can large language models exhibit strategic honesty behaviour? We replace mathematical optimisation agents with LLM-powered decision makers and compare their behaviour against theoretical predictions within the same game-theoretic framework. Core question: does LLM training implicitly shape moral cost priors analogous to cl and cd?

Multi-Provider LLM Architecture

The AI mode integrates multiple models across 8 LLM providers, building a cross-architecture, cross-training-method platform for strategic behaviour comparison. Model heterogeneity allows us to test a core hypothesis: do different training methods (RLHF, DPO, reasoning-augmented) produce systematically different moral cost profiles?

Claude: Opus 4.6, Sonnet 4.6/4.5, Haiku 4.5

GPT: GPT-5.4, o3, o4-mini, GPT-4.1, 4o

Gemini: 2.5 Pro/Flash, 2.0 Flash

DeepSeek: DeepSeek-V3, R1

Qwen: Qwen3, QwQ, Qwen2.5

MiniMax: MiniMax-M1

Kimi: Kimi-K2

Zhipu: GLM-4-Plus, GLM-4-Flash

All providers except Claude (Anthropic native API) and Gemini (Google native API) use a unified OpenAI-compatible interface. Custom endpoints support self-hosted models, making the experiment extensible to any LLM conforming to the interface specification.

Two-Tier Dispatch: Administrator → Agent

We use a two-tier architecture that separates game-theoretic knowledge injection from strategic decision execution. The motivation: feeding the complete game-theory text directly to each agent would produce overly long prompts with a low signal-to-noise ratio.

Administrator LLM → Personalised Prompts → Agent LLM 1, Agent LLM 2, …, Agent LLM N

Tier 1: Administrator LLM

A single administrator LLM receives the full game-theoretic context (GAME_CONTEXT: type definitions, Bayesian belief framework, equilibrium structure, utility functions) plus each agent’s specific parameters (cl, cd, α, β, game type). It encodes this information into a tailored natural-language decision scenario for each agent, preserving decision-relevant information while stripping away formal details.

Tier 2: Agent LLMs

Each agent LLM receives only its personalised prompt and need not understand the full game-theoretic framework. Output is a truth-telling probability in [0,1]. Agents are dispatched in parallel (concurrency limit = 4) with exponential-backoff retry logic for 429 rate limits, ensuring stable operation of large-scale experiments.
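The dispatch pattern described here (a concurrency limit of 4 plus exponential-backoff retries on rate limits) can be sketched with `asyncio`. This is an illustration under stated assumptions, not the platform's actual code: `RateLimitError`, `call_with_backoff`, `dispatch_agents`, the retry count, the base delay and the jitter are all hypothetical, and `call` stands in for whatever provider client is in use.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from a provider."""


async def call_with_backoff(call, prompt, max_retries=5, base_delay=1.0):
    """Await call(prompt), retrying on RateLimitError with growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return await call(prompt)
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            # Delay doubles each attempt (1x, 2x, 4x, ...) with up to 10% jitter.
            delay = base_delay * (2 ** attempt) * (1 + 0.1 * random.random())
            await asyncio.sleep(delay)


async def dispatch_agents(call, prompts, concurrency=4, **retry_kwargs):
    """Run all prompts concurrently, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(prompt):
        # The slot is held during backoff too, which also throttles retries.
        async with semaphore:
            return await call_with_backoff(call, prompt, **retry_kwargs)

    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*(run_one(p) for p in prompts))


async def _demo_agent(prompt: str) -> float:
    # Stand-in for a real provider call; always answers 0.5.
    await asyncio.sleep(0.01)
    return 0.5


if __name__ == "__main__":
    prompts = [f"scenario for agent {i}" for i in range(10)]
    print(asyncio.run(dispatch_agents(_demo_agent, prompts)))
```

Holding the semaphore slot while a call backs off is one possible design choice; releasing it during the wait would raise throughput at the cost of more simultaneous retries against a provider that is already rate limiting.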

Prompt Engineering: GAME_CONTEXT

The administrator LLM receives a structured game-theory context template embedding the complete formal model. The core challenge of this design is translating abstract mathematical game theory into natural-language decision scenarios that an LLM can comprehend:

GAME_CONTEXT = {
  game: "sender-receiver reputation game",
  types: { good: "wants a = θ", bad: "wants a = 1" },
  belief: "λ = Pr(good type), updated via Bayes",
  environments: {
    BT: "strategic = bad, behavioral = truth-teller",
    GL: "strategic = good, behavioral = m=1"
  },
  costs: { c_l: "penalty for m ≠ θ", c_d: "penalty for D(m,θ)" },
  output: "single number in [0,1]"
}

The administrator must accomplish a critical translation task: converting formal Bayesian game theory (posterior probabilities, mixed strategies, belief updates) into concrete, contextualised decision narratives. If the administrator API call fails, the system automatically falls back to local templates for prompt generation, ensuring experiments are not interrupted by single points of failure.
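The fallback behaviour can be illustrated as follows. This is a minimal sketch, assuming the administrator is exposed as a callable that returns prompt text; `AgentParams`, `local_template`, `build_prompt` and the template wording are hypothetical and are not taken from the platform.

```python
from dataclasses import dataclass


@dataclass
class AgentParams:
    """Per-agent parameters named in the text; field names are illustrative."""
    agent_id: int
    game: str     # "BT" or "GL"
    c_l: float    # lying cost
    c_d: float    # deception cost
    alpha: float  # CRRA risk-aversion coefficient
    beta: float   # other-regarding weight


def local_template(agent: AgentParams) -> str:
    """Hypothetical local template used when the administrator is unavailable."""
    return (
        f"You are the sender in a {agent.game} reputation game. "
        f"Lying costs you {agent.c_l:.2f}; misleading the receiver costs you {agent.c_d:.2f}. "
        "Reply with a single number in [0,1]: your probability of telling the truth."
    )


def build_prompt(agent: AgentParams, call_administrator):
    """Return (prompt, source), falling back to the local template on any failure."""
    try:
        prompt = call_administrator(agent)
        if not prompt or not prompt.strip():
            raise ValueError("administrator returned an empty prompt")
        return prompt, "administrator"
    except Exception:
        # Any administrator failure is absorbed so the trial can continue.
        return local_template(agent), "local_template"


def broken_administrator(agent):
    raise TimeoutError("administrator API unavailable")


agent = AgentParams(agent_id=1, game="BT", c_l=0.4, c_d=1.2, alpha=0.5, beta=0.1)
prompt, source = build_prompt(agent, broken_administrator)
print(source)
```

Returning the source label alongside the prompt lets the game log record which prompts came from the fallback path, so they can be separated in later analysis.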

Multi-Trial Experiment Design

To ensure statistical robustness, AI experiments employ a multi-trial independent replication design. The rationale: LLM outputs carry intrinsic stochasticity (temperature parameters, sampling strategies), and single-trial results may not be representative:

1. Independent trial design

Each trial independently generates a fresh population, runs the administrator to produce prompts, dispatches all agents in parallel, and collects strategy responses. Inter-trial independence is maintained to prevent sequential contamination (e.g. residual prior-trial information in context windows). This corresponds to a “between-subjects” design in experimental economics.

2. Cross-model statistical aggregation

Aggregate statistics per model (mean ± s.d., 95% confidence intervals) are computed across trials. This lets us distinguish a model’s systematic strategic tendency (mean shift) from random fluctuation (variance), providing statistical evidence for the question: “Do LLMs have stable moral cost priors?”

3. Complete game log

Each trial records a full audit trail: agent ID, game type, API provider, model name, complete prompt text, raw LLM response, parsed strategy value, and any errors. This enables researchers to perform post-hoc analysis of LLM reasoning, tracing the linguistic reasoning pathway behind specific strategy choices.
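The per-model aggregation in step 2 (mean ± s.d. with a 95% confidence interval across trials) can be sketched with the standard library. The function name is illustrative, and the interval uses a normal approximation (z = 1.96), which is an assumption; with only a few trials a Student-t interval would be wider.

```python
import math
import statistics


def summarise(values, z=1.96):
    """Mean, sample s.d. and normal-approximation 95% CI for one model's trials."""
    n = len(values)
    mean = statistics.fmean(values)  # raises StatisticsError on an empty list
    # A single trial has no spread estimate, so the interval collapses to a point.
    sd = statistics.stdev(values) if n > 1 else 0.0
    half_width = z * sd / math.sqrt(n) if n > 1 else 0.0
    return {"n": n, "mean": mean, "sd": sd, "ci95": (mean - half_width, mean + half_width)}


# Illustrative truth-telling probabilities from three trials, not real results.
print(summarise([0.8, 0.9, 1.0]))
```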

Cross-Model Strategy & Classification (Figs. 8–9)

The AI mode produces three visualisations (Figs. 8–10) that enable a three-way comparison of LLM behaviour against the theoretical predictions of Choi et al. (2025) and human experimental data:

Fig. 8: Cross-Model Strategy Distribution

Displays each model’s mean truth-telling probability in BT and GL environments with 95% CI error bars. Dashed lines mark theoretical equilibrium values (BT: v* = 1, GL: w* = 0). The magnitude and direction of deviation reveal systematic differences in strategic honesty across model architectures.

Fig. 9: Behavioural Type Classification

Classifies each model’s behaviour into the four types from Choi et al. Fig. 5 (equilibrium / lying-averse / deception-averse / inference error) and shows cross-trial type proportions. Core question: do LLMs exhibit the same classification distribution as the 198 human participants?

Core research question: Do different LLM architectures (pure Transformer, Mixture-of-Experts, reasoning-optimised, RLHF-aligned) systematically produce different moral cost profiles? If so, this implies that training methodology itself implicitly encodes different attitudes toward strategic honesty.

Model vs. Equilibrium Deviation (Fig. 10)

Fig. 10 visualises each model in two-dimensional deviation space, where deviation is defined as the absolute distance between strategy and equilibrium prediction. This scatter plot is the most diagnostically valuable output of the AI mode:

$$\Delta_{BT} = |v - v^*| = |v - 1|, \qquad \Delta_{GL} = |w - w^*| = |w - 0| = w$$

Origin (0,0)—Perfect equilibrium play: Full truth-telling in BT (v = 1) and full lying in GL (w = 0), matching theoretical predictions exactly. Models near the origin exhibit pure strategic rationality, unaffected by moral cost interference.

Right shift—BT deviation (ΔBT > 0): The model refuses to deceive the receiver through truth-telling in BT. Consistent with high effective cd: RLHF training may have instilled an intrinsic aversion to “deceptive truths,” even though truth-telling itself does not violate literal honesty.

Upward shift—GL deviation (ΔGL > 0): The model refuses to lie in GL even though the lie is non-deceptive. Consistent with high effective cl: safety training may have implanted a strong “do not produce false statements” preference that does not distinguish whether a literal lie carries deceptive intent.

Clustering patterns offer a window into LLM moral reasoning: if RLHF models cluster in the upper-right (dual deviation) while reasoning-optimised models sit near the origin, this indicates that different training paradigms encode strategic honesty in fundamentally different ways. This provides an actionable empirical framework for AI alignment research.
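The deviation coordinates and the reading of each region can be sketched as below. The formulas follow the definitions of Δ_BT and Δ_GL given above; the tolerance `tol` that decides what counts as "near the origin" and the label strings are assumptions made for illustration.

```python
def deviation(v: float, w: float) -> tuple[float, float]:
    """Return (delta_BT, delta_GL) for truth-telling rates v (BT) and w (GL)."""
    return abs(v - 1.0), abs(w - 0.0)


def describe(v: float, w: float, tol: float = 0.05) -> str:
    """Label a model's position in deviation space; tol is a hypothetical cut-off."""
    d_bt, d_gl = deviation(v, w)
    off_bt, off_gl = d_bt > tol, d_gl > tol
    if off_bt and off_gl:
        return "dual deviation"
    if off_bt:
        return "BT deviation"   # consistent with a high effective c_d
    if off_gl:
        return "GL deviation"   # consistent with a high effective c_l
    return "near equilibrium"


# Illustrative strategies, not measured model outputs.
for v, w in [(1.0, 0.0), (0.6, 0.0), (1.0, 0.5), (0.5, 0.5)]:
    print((v, w), deviation(v, w), describe(v, w))
```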

AI vs. Math: Two Paradigms of Agent Decision

The core design philosophy of this platform is to enable parallel comparison of mathematical optimisation and LLM reasoning within the same game-theoretic framework. The two modes share identical population parameters and game environments, differing only in the strategy generation mechanism—making difference analysis strictly comparable:

| | Math Mode | AI Mode |
|---|---|---|
| Strategy generation | Augmented utility maximisation (analytic computation) | LLM natural-language reasoning → [0,1] probability |
| Moral costs | Explicit parameters (cl, cd) sampled from distributions | Implicitly encoded by training process (RLHF/DPO alignment) |
| Belief update | Exact Bayesian posterior computation | Probabilistic reasoning in natural language (potentially with systematic biases) |
| Reproducibility | Deterministic (fixed seed = 42) | Stochastic (temperature parameter, multi-trial statistical inference) |

By running both modes on the same population in parallel and comparing Figs. 1–7 (Math) with Figs. 8–10 (AI), we can reverse-estimate the effective moral costs implicitly injected by LLM training—i.e. find the (cl*, cd*) parameter pair that best fits the mathematical model to LLM behavioural data. This provides an actionable quantitative framework for understanding the “moral compass” of LLMs.
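One simple way to carry out this reverse estimation is a grid search that minimises the squared distance between model-predicted and observed strategies. The sketch below is an assumption about how such a fit could look, not the platform's procedure. `fit_moral_costs` takes the mapping from costs to strategies as an argument, and `toy_predict` is a made-up placeholder used only to make the example runnable; it is not the augmented-utility model of Choi et al.

```python
import itertools


def fit_moral_costs(observed, predict, grid):
    """Grid-search the (c_l, c_d) pair whose predicted (v, w) best matches observed."""
    best, best_loss = None, float("inf")
    for c_l, c_d in itertools.product(grid, grid):
        v_hat, w_hat = predict(c_l, c_d)
        # Squared error over the BT and GL truth-telling rates.
        loss = (v_hat - observed[0]) ** 2 + (w_hat - observed[1]) ** 2
        if loss < best_loss:
            best, best_loss = (c_l, c_d), loss
    return best, best_loss


def toy_predict(c_l, c_d):
    """Placeholder mapping, NOT the paper's model: v falls with c_d, w rises with c_l."""
    return 1.0 / (1.0 + c_d), c_l / (1.0 + c_l)


grid = [i * 0.25 for i in range(13)]   # candidate costs 0.0, 0.25, ..., 3.0
observed = toy_predict(0.5, 1.5)       # pretend this (v, w) came from an LLM
best, loss = fit_moral_costs(observed, toy_predict, grid)
print(best, loss)
```

In practice `predict` would be replaced by the Math-mode strategy computation, and `observed` by a model's mean (v, w) across trials.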

Key Findings and Theoretical Contributions

1. Lying ≠ Deception. Lying is locutionary (m≠θ); deception is illocutionary (belief distortion). Conceptually independent and experimentally separable (Sobel 2020).

2. Truth can deceive, lies can be honest. In BT, truth-telling builds false reputation (Prop. 1); in GL, lying reveals true type (Prop. 2). Both emerge as unique equilibria.

3. Moral costs drive systematic deviation. Heterogeneous cl and cd produce four behavioural types (Props. 3–4). 47–53% of deviation is moral preference, not inference error (n=198).

4. N-period λ dynamics. Reputation belief λ carries forward as a state variable. Geometric weighting + miscommunication channel produce continuous reputation trajectories.

5. LLMs as experimental subjects. AI agents exhibit implicit moral cost priors shaped by training. Cross-architecture comparison reveals systematic differences in strategic honesty encoding.

Interactive Experiment

Switch to the Experiment tab to run the simulation and verify the theoretical predictions above.

Adjust population parameters (risk type proportions, moral cost distributions), compare BT vs. GL environments, toggle Math/AI mode, and observe in real time how lying aversion and deception aversion shape the emergent distribution of strategic behaviour across N-period reputation games. All charts support export to JSON/CSV for further analysis.

Choi, Lee & Lim (2025) • Sobel (2020, Journal of Political Economy)

game.m0nius.com