Invarra

Research foundation

The research foundation for invariant-risk auditing.

Invarra is built on the idea that observing AI behavior under a single representation is not enough. If the target is latent and observable only through language, documents, prompts, interfaces, or other representations, variation across those representations is part of the measurement.


The Latent Invariance Principle

An Epistemic Constraint On Measurement Under Indirect Observation

Public note: April 24, 2026

Abstract

Many important phenomena cannot be observed directly. Intent, belief, understanding, risk state, legal scope, disease state, and conceptual mastery are usually accessed through representations such as language, documents, prompts, symptoms, tests, interfaces, surveys, or sensor channels.

The Latent Invariance Principle (LIP) states that, when a phenomenon is observable only through representations, stability of behavior under meaning-preserving representational variation is admissible empirical evidence that a system is tracking the latent phenomenon rather than the representation.

LIP is not a model architecture, learning rule, scoring product, or theory of truth. It is a measurement-validity constraint. It explains why single-representation correctness is weaker than it appears, and why disagreement across valid equivalent representations should be treated as evidence rather than noise.

1. The Measurement Problem

In many evaluations, the object of interest is not the observable form. A prompt is not itself an intent. A survey item is not itself a belief. A symptom description is not itself a disease state. A legal sentence is not itself the full scope it attempts to express.

The observable form is a channel. It is the way a latent phenomenon becomes measurable.

Let $\Phi$ denote a latent phenomenon. Let $c$ denote a representation channel or surface condition, and let $\epsilon$ denote residual variation. An observable representation $r$ may be written abstractly as:

$$r = g(\Phi, c, \epsilon).$$

Observed behavior under that representation is written as $B(r)$. The evaluator observes $B(r)$, but wants evidence about whether behavior tracks $\Phi$.

Because $r$ contains both phenomenon-dependent and channel-dependent structure, behavior under one representation cannot establish which part the system is tracking.

2. Single-Representation Evidence Is Non-Identifying

Suppose an evaluator observes only a single representation:

$$r = g(\Phi, c, \epsilon)$$

and a single behavior:

$$B(r) = b.$$

The same observation is compatible with at least two explanations. The behavior may depend on the latent phenomenon:

$$b = F_{\Phi}(\Phi),$$

or it may depend on the representation channel:

$$b = F_c(c).$$

With only one representation, these explanations are observationally indistinguishable. The claim is not that every system is channel-sensitive. The claim is that single-representation correctness cannot rule out channel sensitivity.
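The argument can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: `render` stands in for g (with the residual term omitted), `tracks_phenomenon` for a behavior of the form F_Φ, and `tracks_channel` for a behavior of the form F_c. The two toy systems agree on one representation and come apart only when the channel varies while the phenomenon stays fixed.

```python
from typing import NamedTuple


class Representation(NamedTuple):
    """Toy stand-in for r = g(phi, c, eps), with eps omitted for simplicity."""
    phi: str      # latent phenomenon (hidden from a real evaluator)
    channel: str  # surface condition, e.g. phrasing or format


def render(phi: str, channel: str) -> Representation:
    """Hypothetical g: packages a phenomenon into an observable representation."""
    return Representation(phi, channel)


def tracks_phenomenon(r: Representation) -> str:
    """Hypothetical F_Phi: behavior depends only on the latent phenomenon."""
    return "escalate" if r.phi == "high_risk" else "proceed"


def tracks_channel(r: Representation) -> str:
    """Hypothetical F_c: behavior depends only on the surface channel."""
    return "escalate" if r.channel == "formal" else "proceed"


# One representation: both systems produce the same behavior.
single = render("high_risk", "formal")
same_on_single = tracks_phenomenon(single) == tracks_channel(single)

# Same phenomenon, different channel: only the channel-tracking system changes.
varied = render("high_risk", "casual")
phenomenon_stable = tracks_phenomenon(single) == tracks_phenomenon(varied)
channel_stable = tracks_channel(single) == tracks_channel(varied)

print(same_on_single, phenomenon_stable, channel_stable)  # True True False
```

In a real evaluation the `phi` field is not visible to the evaluator; it appears here only so the toy systems can be defined.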

3. The Principle

The Latent Invariance Principle can be stated as:

When a phenomenon is only observable through representations,
stability of behavior under meaning-preserving representational variation
is admissible empirical evidence that a system is tracking
the latent phenomenon rather than the representation.

The practical corollary is:

Correct once is not enough.
Stable across valid variations is evidence.

The principle does not say that stable behavior is true behavior. A system can be stable and wrong. LIP separates two questions:

Is the behavior true, correct, or normatively acceptable?
Is the behavior stable with respect to the latent phenomenon?

LIP addresses the second question. Other standards are needed for the first.

4. Target-Relative Invariance

No representation is invariant in every respect. A paraphrase may preserve factual content while changing tone. A translation may preserve literal meaning while changing cultural implication. A formatting change may preserve words while changing salience.

LIP therefore requires a target-relative question:

What must remain fixed for this measurement to be valid?

If the target is semantic content, variation must preserve the relevant meaning. If the target is practical intent, variation must preserve practical force. If the target is policy handling, variation must preserve the governing condition.

Invalid variation is a measurement defect. Valid variation plus changed behavior is measurement evidence.

5. Invariance Gap

LIP does not require a universal metric, and different domains may define disagreement differently. A general diagnostic form is still useful.

Let $E(\Phi)$ denote the set of valid meaning-preserving representations of $\Phi$, and let $d$ be a disagreement measure over observed behaviors. The invariance gap for $\Phi$ may be written:

$$G(\Phi) = \mathbb{E}_{r_i, r_j \sim E(\Phi)} \left[ d\left(B(r_i), B(r_j)\right) \right].$$

A population-level quantity may be written:

$$G = \mathbb{E}_{\Phi} \left[ G(\Phi) \right].$$

These quantities are diagnostic. They measure whether behavior changes when the relevant phenomenon is held fixed and the representation changes. They do not, by themselves, determine whether an outcome is true, acceptable, or optimal.
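As a worked illustration under assumed values (not drawn from any audit): suppose a phenomenon has three valid representations r_1, r_2, r_3, the observed behaviors are B(r_1) = B(r_2) = a and B(r_3) = b with a ≠ b, d is the 0/1 disagreement indicator, and pairs are drawn uniformly from the three distinct unordered pairs. Then:

```latex
G(\Phi) = \frac{1}{3} \left[ d(a,a) + d(a,b) + d(a,b) \right]
        = \frac{1}{3} (0 + 1 + 1)
        = \frac{2}{3}.
```

If pairs were instead drawn independently with replacement, the same behaviors would give 4/9, so the sampling convention should be stated alongside any reported gap.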

6. Interpretation

If two valid representations preserve the same latent phenomenon and produce different behavior, the disagreement should be preserved and analyzed. It may indicate:

  • representational sensitivity;
  • ambiguity in the target phenomenon;
  • weak or invalid variation;
  • boundary instability;
  • measurement or mapping uncertainty;
  • domain-specific uncertainty.

Discarding these cases can make an evaluation cleaner while making it less valid. The difficult cases may be the most informative ones.

7. Relationship To CSR

Canonical Semantic Realization (CSR) is a measurement framework that applies the LIP view to semantic systems. LIP supplies the principle: under indirect observation, valid variation is part of admissible evidence. CSR supplies a public vocabulary for semantic measurement: canonical semantic unit, realization, and observed outcome.

The two ideas are distinct. LIP is the measurement principle. CSR is one way to structure semantic observations under that principle.

8. Public Boundary

This note presents the public research frame. It intentionally does not publish operational audit assets, private corpora, validation procedures, scoring logic, evaluator configuration, thresholds, report templates, client-specific protocols, or runtime control details.

The purpose of the public note is to make the measurement argument legible without exposing the production method used by Invarra.

9. Non-Claims

LIP does not claim that invariance proves truth.

LIP does not claim that every domain has stable meaning.

LIP does not prescribe a model design or implementation.

LIP does not replace domain expertise, normative judgment, causal analysis, or statistical validation.

LIP states a narrower claim: where a phenomenon is latent and observed through representations, valid representational variation is part of what makes empirical inference admissible.


Canonical Semantic Realization

A Measurement Framework For Controlled Semantic Variation

Public note: April 24, 2026

Abstract

Many evaluation workflows operate on semantic artifacts: prompts, instructions, policy descriptions, clinical notes, legal documents, support transcripts, survey items, and other representations whose meaning is not determined by surface form alone.

Canonical Semantic Realization (CSR) separates three layers that are often conflated:

  • canonical semantic units, which define what is being measured;
  • realizations, which define how that meaning is expressed;
  • observed outcomes, which record empirical behavior under a realization.

CSR treats canonical meaning as the experimental unit and controlled realizations as repeated measurements. It preserves disagreement under valid variation as evidence rather than noise.

CSR is not a correctness oracle and does not publish a product architecture. Its contribution is measurement structure: it makes semantic brittleness, uncertainty, and representation sensitivity easier to observe and reason about.

1. Why Row-Level Evaluation Is Not Enough

Semantic systems are often evaluated row by row. A prompt, document, test item, symptom description, or policy question is treated as a sample. A response is observed. A score or outcome is assigned.

That view is often inadequate. Multiple rows may express the same underlying condition. A user intent may appear across languages, formats, wrappers, or phrasings. A policy question may be reworded without changing the governing issue. A clinical or legal fact pattern may be reordered while preserving the relevant facts.

When these rows are treated as independent, aggregate metrics can hide the structure that matters most. A system may look stable overall while behaving inconsistently across valid realizations of the same canonical meaning.
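A small sketch with invented data shows the effect. The rows, unit names, and outcomes below are hypothetical; the point is only that a row-level score and a unit-level consistency check can tell different stories about the same observations.

```python
from collections import defaultdict

# Hypothetical rows: (canonical unit id, realization label, observed, expected).
rows = [
    ("refund_request", "plain", "approve", "approve"),
    ("refund_request", "formal", "approve", "approve"),
    ("refund_request", "translated", "approve", "approve"),
    ("address_change", "plain", "approve", "approve"),
    ("address_change", "formal", "approve", "approve"),
    ("address_change", "translated", "deny", "approve"),
]

# Row-level view: every row is treated as an independent sample.
row_accuracy = sum(obs == exp for _, _, obs, exp in rows) / len(rows)

# Unit-level view: group outcomes by canonical unit and check consistency.
outcomes_by_unit = defaultdict(set)
for unit, _, observed, _ in rows:
    outcomes_by_unit[unit].add(observed)

inconsistent_units = sorted(
    unit for unit, outcomes in outcomes_by_unit.items() if len(outcomes) > 1
)

print(f"row accuracy: {row_accuracy:.2f}")          # row accuracy: 0.83
print(f"inconsistent units: {inconsistent_units}")  # inconsistent units: ['address_change']
```

Five of six rows look correct, yet one of the two canonical units behaves inconsistently across its realizations.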

CSR changes the unit of analysis.

Meaning is the unit.
Realization is controlled variation.
Outcome is empirical measurement.

2. The Three-Layer View

2.1 Canonical Semantic Unit

A canonical semantic unit is the semantic condition under study. It is defined independently of any one observable expression. It may represent an intent, condition, concept, policy-relevant situation, diagnostic pattern, legal meaning, survey construct, or other semantic object.

The canonical semantic unit is the experimental unit.

2.2 Realization

A realization is an observable expression of a canonical semantic unit. Realizations may differ by language, phrasing, format, ordering, modality, channel, or presentation frame.

Variation at this layer is controlled. A realization should vary the measurement channel without changing the relevant semantic unit.

2.3 Observed Outcome

An observed outcome records empirical behavior under a realization. It may be a decision, answer, category, score, label, action, refusal, escalation, uncertainty marker, or other measurable result.

Observed outcomes are measurements of behavior under specified conditions. They are not semantic truth by themselves.
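One way to keep the three layers separate in data is sketched below. The class and field names are illustrative assumptions, not a published CSR schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalUnit:
    """What is being measured; defined independently of any one expression."""
    unit_id: str
    description: str


@dataclass(frozen=True)
class Realization:
    """How the unit is expressed; varies the channel, not the meaning."""
    unit: CanonicalUnit
    channel: str  # e.g. language, format, ordering
    text: str


@dataclass(frozen=True)
class ObservedOutcome:
    """What happened under one realization; a measurement, not semantic truth."""
    realization: Realization
    outcome: str


unit = CanonicalUnit("refund_request", "Customer asks for a refund on a delivered order.")
plain = Realization(unit, "plain", "I want my money back for this order.")
formal = Realization(unit, "formal", "I am writing to request a refund for my order.")

observations = [
    ObservedOutcome(plain, "approve"),
    ObservedOutcome(formal, "escalate"),
]

# Both observations share one experimental unit, so they are repeated measurements.
shared_units = {obs.realization.unit.unit_id for obs in observations}
print(shared_units)  # {'refund_request'}
```

Because each outcome points back to its realization and each realization to its unit, disagreement can later be grouped by unit instead of being averaged away across rows.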

3. Formal Sketch

Let $S$ denote the canonical semantic space. Each element $s \in S$ is a canonical semantic unit.

A unit may be represented abstractly as:

$$s := (\iota, \kappa, \rho),$$

where $\iota$ is the semantic specification, $\kappa$ is the relevant constraint set, and $\rho$ is the expected regime or class of handling.

Let $P$ denote the space of observable representations. Let $L$ denote languages, channels, modalities, or media, and let $V$ denote admissible surface transformations.

A realization may be written:

$$p = \pi(s, \ell, v),$$

where $p \in P$, $s \in S$, $\ell \in L$, and $v \in V$.

The notation matters less than the separation. Semantic identity, representational condition, and observed behavior should remain analytically distinct.

4. Semantic Preservation

CSR is interpretable only if realizations preserve the canonical semantic unit they claim to express.

For a valid realization, the relevant meaning-bearing commitments must remain fixed. Abstractly:

$$\pi(s, \ell, v) \equiv_{\text{sem}} \pi(s, \ell, \mathrm{id}).$$

This is a measurement-validity requirement. It is not an assumption about the evaluated system.

If the preservation condition fails, the realization is invalid for that measurement. If the condition holds and outcomes differ, the disagreement is evidence.
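The two cases can be written as a small decision rule. This is a sketch only: the function name and labels are hypothetical, and `preserves_meaning` is assumed to come from a separate validity check (for example, expert review) that is not implemented here.

```python
def classify_pair(preserves_meaning: bool, outcome_a: str, outcome_b: str) -> str:
    """Classify two observations taken on the same canonical unit.

    `preserves_meaning` is an input, not something this sketch computes.
    """
    if not preserves_meaning:
        # Measurement defect: the realization is invalid for this measurement.
        return "invalid_realization"
    if outcome_a != outcome_b:
        # Valid variation with changed behavior: keep it as evidence.
        return "disagreement_evidence"
    # Valid variation with unchanged behavior.
    return "stable"


print(classify_pair(False, "approve", "deny"))    # invalid_realization
print(classify_pair(True, "approve", "deny"))     # disagreement_evidence
print(classify_pair(True, "approve", "approve"))  # stable
```

The validity check comes first, so that a difference in outcomes is interpreted as evidence only after the realization has been accepted as meaning-preserving.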

5. Outcome Mapping

Let $R$ denote the response space and $O$ denote the outcome space. An outcome mapping may be written:

$$o : P \times R \to O.$$

In words:

observed outcome = outcome mapping(realization, response)

The expected regime belongs to the semantic specification. The observed outcome records what happened. A mismatch is not automatically bad data; it may be the measurement result that matters.
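A toy outcome mapping makes the separation concrete. The keyword rule, outcome labels, and example text below are invented for illustration and are far cruder than any mapping a real evaluation would use; the sketch only shows the expected regime and the observed outcome being recorded side by side, with the mismatch kept as a result.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    expected_regime: str   # comes from the semantic specification
    observed_outcome: str  # records what the system actually did
    mismatch: bool         # kept as a measurement result, not dropped


def outcome_mapping(realization: str, response: str) -> str:
    """Hypothetical o: P x R -> O using a naive keyword rule.

    The realization is accepted to match the signature; this toy rule ignores it.
    """
    text = response.lower()
    if "cannot" in text or "can't" in text:
        return "refuse"
    if "forward" in text or "escalat" in text:
        return "escalate"
    return "answer"


def observe(expected_regime: str, realization: str, response: str) -> Observation:
    """Record expected and observed handling together for one realization."""
    observed = outcome_mapping(realization, response)
    return Observation(expected_regime, observed, observed != expected_regime)


obs = observe(
    expected_regime="escalate",
    realization="My account was charged twice, please fix it.",
    response="I cannot help with billing issues.",
)
print(obs.observed_outcome, obs.mismatch)  # refuse True
```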

6. Invariance Gap

For a canonical semantic unit $s$, let $E(s)$ denote the set of valid realizations of that unit.

For a behavior function $D$ and a disagreement measure $d$, the invariance gap for $s$ may be written:

$$G(s) = \mathbb{E}_{p_1, p_2 \sim E(s)} \left[ d(D(p_1), D(p_2)) \right].$$

A nonzero gap indicates that behavior depends on realization details despite fixed canonical semantics. Whether that dependence is acceptable, expected, or problematic depends on the domain.
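A minimal empirical estimate of the gap for one unit might look like the following. It assumes a 0/1 disagreement measure and averages over distinct unordered pairs of outcomes; both choices are assumptions of this sketch, and other domains may call for a graded `d` or a different sampling scheme.

```python
from itertools import combinations
from typing import Callable, Sequence


def zero_one(a: str, b: str) -> float:
    """Assumed disagreement measure d: 1.0 if outcomes differ, else 0.0."""
    return 0.0 if a == b else 1.0


def invariance_gap(
    outcomes: Sequence[str],
    d: Callable[[str, str], float] = zero_one,
) -> float:
    """Mean disagreement over distinct unordered pairs of outcomes for one unit.

    `outcomes` holds D(p) for each valid realization p of the unit.
    """
    pairs = list(combinations(outcomes, 2))
    if not pairs:
        raise ValueError("need at least two valid realizations to estimate a gap")
    return sum(d(a, b) for a, b in pairs) / len(pairs)


stable = invariance_gap(["approve", "approve", "approve"])
brittle = invariance_gap(["approve", "approve", "deny"])
print(stable)             # 0.0
print(round(brittle, 3))  # 0.667
```

Only outcomes from realizations that passed the preservation check should be passed in; an invalid realization would inflate or deflate the estimate for reasons unrelated to the evaluated system.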

7. Disagreement As Evidence

CSR preserves disagreement under valid variation as structured evidence. Disagreement may arise from:

  • representational sensitivity;
  • semantic ambiguity;
  • weak or invalid realization;
  • boundary conditions;
  • mapping uncertainty;
  • system behavior under controlled variation.

CSR does not decide immediately which explanation is correct. It keeps the measurement layers separate enough for the disagreement to be investigated.

8. Domain Of Applicability

CSR is useful when meaning is the primary object of measurement and multiple valid expressions of the same condition exist.

It is well suited to natural-language evaluation, policy and compliance analysis, legal and regulatory interpretation, multilingual evaluation, survey design, educational assessment, safety review, and audit contexts.

CSR is less useful when the target is directly observed, low-dimensional, naturally independent, or not meaning-bearing in the relevant sense.

9. Public Boundary

This note gives the public research structure. It intentionally does not publish operational corpus construction, transformation libraries, validation procedures, deterministic planning machinery, provenance schemas, scoring logic, evaluator configuration, thresholds, report templates, client protocols, or runtime control details.

The public point is simple: semantic identity, surface realization, and observed behavior should not be collapsed into one row. The implementation method used to turn that principle into audits remains private.

10. Relationship To LIP

The Latent Invariance Principle states that, under indirect observation, stability across valid representational variation is admissible evidence of latent tracking.

CSR applies that principle to semantic systems by giving the measurement a unit, a realization layer, and an outcome layer.

LIP is the principle.
CSR is the semantic measurement frame.

11. Non-Claims

CSR does not guarantee correctness, truth, robustness, or normative resolution.

CSR does not discover semantics autonomously.

CSR does not prescribe a model architecture.

CSR does not replace domain expertise or statistical validation.

CSR makes a narrower claim: when semantic meaning is measured through observable expressions, controlled variation should be treated as part of the measurement rather than as incidental noise.