Invarra

Canonical Semantic Realization


A measurement framework for controlled semantic variation.

Canonical Semantic Realization separates canonical semantic units, realizations, and observed outcomes. It treats canonical meaning as the experimental unit and controlled realizations as repeated measurements.

Meaning is the unit. Realization is controlled variation. Outcome is empirical measurement.

CSR makes semantic brittleness, uncertainty, and representation sensitivity measurable, attributable, and reproducible without claiming to be a correctness oracle.


Public research note

Canonical Semantic Realization

A Measurement Framework For Controlled Semantic Variation

Public note: April 24, 2026

Abstract

Many evaluation workflows operate on semantic artifacts: prompts, instructions, policy descriptions, clinical notes, legal documents, support transcripts, survey items, and other representations whose meaning is not determined by surface form alone.

Canonical Semantic Realization (CSR) separates three layers that are often conflated:

  • canonical semantic units, which define what is being measured;
  • realizations, which define how that meaning is expressed;
  • observed outcomes, which record empirical behavior under a realization.

CSR treats canonical meaning as the experimental unit and controlled realizations as repeated measurements. It preserves disagreement under valid variation as evidence rather than noise.

CSR is not a correctness oracle and does not publish a product architecture. Its contribution is measurement structure: it makes semantic brittleness, uncertainty, and representation sensitivity easier to observe and reason about.

1. Why Row-Level Evaluation Is Not Enough

Semantic systems are often evaluated row by row. A prompt, document, test item, symptom description, or policy question is treated as a sample. A response is observed. A score or outcome is assigned.

That view is often inadequate. Multiple rows may express the same underlying condition. A user intent may appear across languages, formats, wrappers, or phrasings. A policy question may be reworded without changing the governing issue. A clinical or legal fact pattern may be reordered while preserving the relevant facts.

When these rows are treated as independent, aggregate metrics can hide the structure that matters most. A system may look stable overall while behaving inconsistently across valid realizations of the same canonical meaning.
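A toy illustration of this masking effect, using made-up rows. The unit ids, realization labels, and correctness flags are all hypothetical; the point is only that a healthy row-level score can coexist with one canonical unit being handled inconsistently.

```python
from collections import defaultdict

# Hypothetical evaluation rows: (canonical unit id, realization label, outcome correct?)
rows = [
    ("u1", "plain", True), ("u1", "reworded", True), ("u1", "translated", True),
    ("u2", "plain", True), ("u2", "reworded", True), ("u2", "translated", True),
    ("u3", "plain", True), ("u3", "reworded", False), ("u3", "translated", True),
]

# Row-level view: every row is treated as an independent sample.
row_accuracy = sum(ok for _, _, ok in rows) / len(rows)

# Unit-level view: group rows by the canonical unit they realize.
by_unit = defaultdict(list)
for unit, _, ok in rows:
    by_unit[unit].append(ok)

# A unit is inconsistent when its realizations do not all give the same result.
inconsistent_units = sorted(u for u, oks in by_unit.items() if len(set(oks)) > 1)

print(f"row-level accuracy: {row_accuracy:.2f}")
print(f"inconsistent units: {inconsistent_units}")
```

The aggregate reports eight of nine rows correct, while the grouped view shows that unit `u3` changes behavior under rewording.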

CSR changes the unit of analysis.

Meaning is the unit.
Realization is controlled variation.
Outcome is empirical measurement.

2. The Three-Layer View

2.1 Canonical Semantic Unit

A canonical semantic unit is the semantic condition under study. It is defined independently of any one observable expression. It may represent an intent, condition, concept, policy-relevant situation, diagnostic pattern, legal meaning, survey construct, or other semantic object.

The canonical semantic unit is the experimental unit.

2.2 Realization

A realization is an observable expression of a canonical semantic unit. Realizations may differ by language, phrasing, format, ordering, modality, channel, or presentation frame.

Variation at this layer is controlled. A realization should vary the measurement channel without changing the relevant semantic unit.

2.3 Observed Outcome

An observed outcome records empirical behavior under a realization. It may be a decision, answer, category, score, label, action, refusal, escalation, uncertainty marker, or other measurable result.

Observed outcomes are measurements of behavior under specified conditions. They are not semantic truth by themselves.
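One way to keep the three layers from collapsing into a single row is to give each its own record type. The sketch below is illustrative only; the class names, fields, and example texts are assumptions and do not describe any published CSR schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanonicalUnit:
    unit_id: str
    description: str  # what is being measured, independent of wording

@dataclass(frozen=True)
class Realization:
    unit_id: str   # the canonical unit this text claims to express
    language: str
    variant: str   # hypothetical label for the surface form
    text: str

@dataclass(frozen=True)
class ObservedOutcome:
    realization: Realization
    outcome: str   # e.g. a label, decision, or refusal marker

unit = CanonicalUnit(
    "refund-eligibility",
    "Customer asks whether a late return qualifies for a refund",
)
r1 = Realization(unit.unit_id, "en", "plain", "Can I get a refund if I return this late?")
r2 = Realization(unit.unit_id, "en", "reworded", "Is a refund possible for a late return?")
observations = [ObservedOutcome(r1, "escalate"), ObservedOutcome(r2, "answer")]

# Outcomes attach to realizations; the unit is recovered through unit_id.
outcomes_for_unit = {
    o.realization.variant: o.outcome
    for o in observations
    if o.realization.unit_id == unit.unit_id
}
print(outcomes_for_unit)
```

Because outcomes reference realizations and realizations reference units, a disagreement between `plain` and `reworded` stays attributable to a single canonical unit.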

3. Formal Sketch

Let $S$ denote the canonical semantic space. Each element $s \in S$ is a canonical semantic unit.

A unit may be represented abstractly as:

$$s := (\iota, \kappa, \rho),$$

where $\iota$ is the semantic specification, $\kappa$ is the relevant constraint set, and $\rho$ is the expected regime or class of handling.

Let $P$ denote the space of observable representations. Let $L$ denote languages, channels, modalities, or media, and let $V$ denote admissible surface transformations.

A realization may be written:

$$p = \pi(s,\ell,v),$$

where $p \in P$, $s \in S$, $\ell \in L$, and $v \in V$.

The notation matters less than the separation. Semantic identity, representational condition, and observed behavior should remain analytically distinct.
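The realization map can be mimicked with a small lookup-and-transform function. Everything below is a hypothetical stand-in: the `Unit` fields mirror the triple of semantic specification, constraint set, and expected regime, while the base wordings and transformation names are invented for illustration.

```python
from typing import Callable, Dict, NamedTuple, Tuple

class Unit(NamedTuple):
    iota: str               # semantic specification
    kappa: Tuple[str, ...]  # relevant constraint set
    rho: str                # expected regime or class of handling

# Hypothetical surface transformations (elements of V); "id" is the identity.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "id": lambda text: text,
    "upper": str.upper,
    "polite": lambda text: "Please advise: " + text,
}

# Hypothetical base wordings per (specification, language) pair.
BASE: Dict[Tuple[str, str], str] = {
    ("late-return-refund", "en"): "Can a late return be refunded?",
    ("late-return-refund", "fr"): "Un retour tardif peut-il être remboursé ?",
}

def realize(s: Unit, lang: str, v: str) -> str:
    """Toy realization map: look up the base wording, then apply the transformation."""
    return TRANSFORMS[v](BASE[(s.iota, lang)])

s = Unit("late-return-refund", ("within-policy-window: no",), "answer-with-policy")
p_id = realize(s, "en", "id")
p_polite = realize(s, "en", "polite")
print(p_id)
print(p_polite)
```

The unit `s` is held fixed while language and transformation vary, which is the separation the notation is meant to enforce.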

4. Semantic Preservation

CSR is interpretable only if realizations preserve the canonical semantic unit they claim to express.

For a valid realization, the relevant meaning-bearing commitments must remain fixed. Abstractly:

$$\pi(s,\ell,v) \equiv_{\text{sem}} \pi(s,\ell,\mathrm{id}).$$

This is a measurement-validity requirement. It is not an assumption about the evaluated system.

If the preservation condition fails, the realization is invalid for that measurement. If the condition holds and outcomes differ, the disagreement is evidence.
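The two cases can be separated mechanically once a preservation judgment exists. In the sketch below the `preserved` flag stands in for an external validity decision (for example, expert review); how that decision is made is outside this sketch, and the records are invented.

```python
# Hypothetical records for one canonical unit and several candidate realizations.
records = [
    {"variant": "id", "preserved": True, "outcome": "approve"},
    {"variant": "reordered", "preserved": True, "outcome": "deny"},
    {"variant": "shortened", "preserved": False, "outcome": "deny"},
]

# Realizations that fail preservation are excluded from the measurement.
valid = [r for r in records if r["preserved"]]
invalid = [r["variant"] for r in records if not r["preserved"]]

# Among valid realizations, differing outcomes are kept as evidence.
valid_outcomes = {r["outcome"] for r in valid}
disagreement_is_evidence = len(valid_outcomes) > 1

print("excluded as invalid:", invalid)
print("disagreement among valid realizations:", disagreement_is_evidence)
```

The `shortened` variant is dropped because it failed preservation, not because of its outcome; the `id` and `reordered` variants remain and their disagreement is retained.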

5. Outcome Mapping

Let $R$ denote the response space and $O$ denote the outcome space. An outcome mapping may be written:

$$o : P \times R \to O.$$

In words:

observed outcome = outcome mapping(realization, response)

The expected regime is part of the canonical semantic unit. The observed outcome records what happened. A mismatch between the two is not automatically bad data; it may be the measurement result that matters.
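A minimal sketch of an outcome mapping and a mismatch check follows. The outcome labels, the string rules, and the example texts are assumptions chosen for illustration; a real mapping would be domain-specific.

```python
def outcome_mapping(realization: str, response: str) -> str:
    """Toy outcome mapping: turn a (realization, response) pair into a coarse label."""
    text = response.strip().lower()
    if not text:
        return "no_response"
    if text == realization.strip().lower():
        return "echo"  # the response merely repeats the realization
    if text.startswith(("i can't", "i cannot")):
        return "refusal"
    return "answer"

expected_regime = "answer"  # hypothetical expected handling for the unit
p = "Can a late return be refunded?"
r = "I cannot help with refund questions."

observed = outcome_mapping(p, r)
mismatch = observed != expected_regime  # recorded as a result, not discarded
print(observed, mismatch)
```

The mapping takes the realization as well as the response, so outcomes that depend on both (here, an echo) can be expressed.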

6. Invariance Gap

For a canonical semantic unit $s$, let $E(s)$ denote the set of valid realizations of that unit.

For a behavior function $D$ and a disagreement measure $d$, the invariance gap for $s$ may be written:

$$G(s) = \mathbb{E}_{p_1,p_2 \sim E(s)} \left[ d(D(p_1),D(p_2)) \right].$$

Provided the disagreement measure is nonnegative and zero for identical behavior, a nonzero gap indicates that behavior depends on realization details despite fixed canonical semantics. Whether that dependence is acceptable, expected, or problematic depends on the domain.
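An empirical version of the gap can be sketched as the mean disagreement over pairs of observed behaviors. Two choices below are assumptions rather than part of the definition: the average runs over unordered pairs of distinct realizations with equal weight, and disagreement is measured with a 0/1 indicator. The behavior list is invented.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def invariance_gap(behaviors: Sequence[T], d: Callable[[T, T], float]) -> float:
    """Mean disagreement over all unordered pairs of distinct realizations."""
    pairs = list(combinations(behaviors, 2))
    if not pairs:
        raise ValueError("need at least two realizations")
    return sum(d(a, b) for a, b in pairs) / len(pairs)

def zero_one(a: object, b: object) -> float:
    """Disagreement measure: 0 when behaviors match, 1 otherwise."""
    return 0.0 if a == b else 1.0

# Hypothetical behaviors for four valid realizations of one unit.
behaviors = ["approve", "approve", "deny", "approve"]
gap = invariance_gap(behaviors, zero_one)
print(gap)  # 3 disagreeing pairs out of 6 -> 0.5
```

A different weighting over realizations, or a graded disagreement measure, would change the number but not the structure of the estimate.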

7. Disagreement As Evidence

CSR preserves disagreement under valid variation as structured evidence. Disagreement may arise from:

  • representational sensitivity;
  • semantic ambiguity;
  • weak or invalid realization;
  • boundary conditions;
  • mapping uncertainty;
  • system behavior under controlled variation.

CSR does not decide immediately which explanation is correct. It keeps the measurement layers separate enough for the disagreement to be investigated.

8. Domain Of Applicability

CSR is useful when meaning is the primary object of measurement and multiple valid expressions of the same condition exist.

It is well suited to natural-language evaluation, policy and compliance analysis, legal and regulatory interpretation, multilingual evaluation, survey design, educational assessment, safety review, and audit contexts.

CSR is less useful when the target is directly observed, low-dimensional, naturally independent, or not meaning-bearing in the relevant sense.

9. Public Boundary

This note gives the public research structure. It intentionally does not publish operational corpus construction, transformation libraries, validation procedures, deterministic planning machinery, provenance schemas, scoring logic, evaluator configuration, thresholds, report templates, client protocols, or runtime control details.

The public point is simple: semantic identity, surface realization, and observed behavior should not be collapsed into one row. The implementation method used to turn that principle into audits remains private.

10. Relationship To LIP

The Latent Invariance Principle states that, under indirect observation, stability across valid representational variation is admissible evidence of latent tracking.

CSR applies that principle to semantic systems by giving the measurement a unit, a realization layer, and an outcome layer.

LIP is the principle.
CSR is the semantic measurement frame.

11. Non-Claims

CSR does not guarantee correctness, truth, robustness, or normative resolution.

CSR does not discover semantics autonomously.

CSR does not prescribe a model architecture.

CSR does not replace domain expertise or statistical validation.

CSR makes a narrower claim: when semantic meaning is measured through observable expressions, controlled variation should be treated as part of the measurement rather than as incidental noise.