Canonical Semantic Realization

A Measurement Framework For Controlled Semantic Variation

Public note: April 24, 2026

Abstract

Many evaluation workflows operate on semantic artifacts: prompts, instructions, policy descriptions, clinical notes, legal documents, support transcripts, survey items, and other representations whose meaning is not determined by surface form alone.

Canonical Semantic Realization (CSR) separates three layers that are often conflated:

canonical semantic units, which define what is being measured;
realizations, which define how that meaning is expressed;
observed outcomes, which record empirical behavior under a realization.

CSR treats canonical meaning as the experimental unit and controlled realizations as repeated measurements. It preserves disagreement under valid variation as evidence rather than noise.

CSR is not a correctness oracle, and this note does not publish a product architecture. Its contribution is measurement structure: it makes semantic brittleness, uncertainty, and representation sensitivity easier to observe and reason about.

1. Why Row-Level Evaluation Is Not Enough

Semantic systems are often evaluated row by row. A prompt, document, test item, symptom description, or policy question is treated as a sample. A response is observed. A score or outcome is assigned.

That view is often inadequate. Multiple rows may express the same underlying condition. A user intent may appear across languages, formats, wrappers, or phrasings. A policy question may be reworded without changing the governing issue. A clinical or legal fact pattern may be reordered while preserving the relevant facts.

When these rows are treated as independent, aggregate metrics can hide the structure that matters most. A system may look stable overall while behaving inconsistently across valid realizations of the same canonical meaning.
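The masking effect can be illustrated with a small sketch. The rows, unit identifiers, and realization labels below are invented for illustration, and the helper names `aggregate_pass_rate` and `per_unit_consistency` are hypothetical; nothing here reflects an actual CSR corpus or scoring method.

```python
from collections import defaultdict

# Hypothetical rows: (canonical unit id, realization label, passed?)
rows = [
    ("unit_a", "plain", True),
    ("unit_a", "reworded", True),
    ("unit_a", "reordered", True),
    ("unit_a", "translated", True),
    ("unit_b", "plain", True),
    ("unit_b", "reworded", False),
    ("unit_b", "reordered", True),
    ("unit_b", "translated", False),
]


def aggregate_pass_rate(rows):
    """Row-level view: every row counts as an independent sample."""
    return sum(passed for _, _, passed in rows) / len(rows)


def per_unit_consistency(rows):
    """Unit-level view: is behavior the same across all realizations of a unit?"""
    results_by_unit = defaultdict(set)
    for unit_id, _, passed in rows:
        results_by_unit[unit_id].add(passed)
    return {unit_id: len(results) == 1 for unit_id, results in results_by_unit.items()}


overall = aggregate_pass_rate(rows)
consistency = per_unit_consistency(rows)
print(overall, consistency)
```

In this toy data the row-level pass rate is a single number, while the unit-level view shows that all of the failures sit inside one canonical unit whose realizations disagree with each other.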

CSR changes the unit of analysis.

Meaning is the unit.
Realization is controlled variation.
Outcome is empirical measurement.

2. The Three-Layer View

2.1 Canonical Semantic Unit

A canonical semantic unit is the semantic condition under study. It is defined independently of any one observable expression. It may represent an intent, condition, concept, policy-relevant situation, diagnostic pattern, legal meaning, survey construct, or other semantic object.

The canonical semantic unit is the experimental unit.

2.2 Realization

A realization is an observable expression of a canonical semantic unit. Realizations may differ by language, phrasing, format, ordering, modality, channel, or presentation frame.

Variation at this layer is controlled. A realization should vary the measurement channel without changing the relevant semantic unit.

2.3 Observed Outcome

An observed outcome records empirical behavior under a realization. It may be a decision, answer, category, score, label, action, refusal, escalation, uncertainty marker, or other measurable result.

Observed outcomes are measurements of behavior under specified conditions. They are not semantic truth by themselves.
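One way to keep the three layers distinct in data is to give each its own record type. The sketch below is a minimal illustration under assumed names: the classes `CanonicalUnit`, `Realization`, and `ObservedOutcome`, their fields, and the helper `group_by_unit` are all hypothetical and are not a published CSR schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalUnit:
    unit_id: str
    specification: str  # what is being measured, independent of wording


@dataclass(frozen=True)
class Realization:
    unit_id: str  # points back to the canonical unit it expresses
    channel: str  # e.g. language, format, or modality (assumed field)
    text: str     # the observable expression


@dataclass(frozen=True)
class ObservedOutcome:
    realization: Realization
    outcome: str  # a measurement of behavior, not semantic truth


def group_by_unit(outcomes):
    """Collect outcomes under the canonical unit they measure."""
    grouped = {}
    for item in outcomes:
        grouped.setdefault(item.realization.unit_id, []).append(item.outcome)
    return grouped


unit = CanonicalUnit("refund_request", "Customer asks for a refund on a delivered order")
r1 = Realization(unit.unit_id, "en", "I would like a refund for my order.")
r2 = Realization(unit.unit_id, "en", "Can I get my money back for this order?")
observed = [ObservedOutcome(r1, "approve"), ObservedOutcome(r2, "escalate")]
grouped = group_by_unit(observed)
```

Grouping by `unit_id` rather than by row is the only design choice that matters here: it makes the canonical unit, not the individual realization, the key under which outcomes are compared.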

3. Formal Sketch

Let $S$ denote the canonical semantic space. Each element $s \in S$ is a canonical semantic unit.

A unit may be represented abstractly as:

$$s := (\iota, \kappa, \rho),$$

where $\iota$ is the semantic specification, $\kappa$ is the relevant constraint set, and $\rho$ is the expected regime or class of handling.

Let $P$ denote the space of observable representations. Let $L$ denote languages, channels, modalities, or media, and let $V$ denote admissible surface transformations.

A realization may be written:

$$p = \pi(s,\ell,v),$$

where $p \in P$, $s \in S$, $\ell \in L$, and $v \in V$.

The notation matters less than the separation. Semantic identity, representational condition, and observed behavior should remain analytically distinct.
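The realization function can be sketched in code. This is an illustration under stated assumptions only: the lookup table `BASE_FORMS`, the transformations `identity` and `with_channel_wrapper`, and the function `realize` are hypothetical names, and a real realization procedure would not reduce to a dictionary lookup.

```python
from typing import Callable, NamedTuple


class Unit(NamedTuple):
    iota: str         # semantic specification (here, just an identifier)
    kappa: frozenset  # relevant constraint set
    rho: str          # expected regime or class of handling


# Hypothetical untransformed expressions, keyed by (specification, language).
BASE_FORMS = {
    ("refund_request", "en"): "I would like a refund.",
    ("refund_request", "es"): "Quisiera un reembolso.",
}


def identity(text: str) -> str:
    """The transformation that leaves the surface form unchanged."""
    return text


def with_channel_wrapper(text: str) -> str:
    """An assumed surface transformation: wrap the text in a channel frame."""
    return f"Customer message: {text}"


def realize(s: Unit, lang: str, v: Callable[[str], str]) -> str:
    """Sketch of p = pi(s, l, v): pick a language, then apply a transformation."""
    return v(BASE_FORMS[(s.iota, lang)])


s = Unit("refund_request", frozenset({"order_delivered"}), "approve")
p_plain = realize(s, "en", identity)
p_wrapped = realize(s, "en", with_channel_wrapper)
p_spanish = realize(s, "es", identity)
```

The unit `s` is never modified by `realize`; only the language and the surface transformation vary, which mirrors the separation the notation is meant to express.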

4. Semantic Preservation

CSR is interpretable only if realizations preserve the canonical semantic unit they claim to express.

For a valid realization, the relevant meaning-bearing commitments must remain fixed. Abstractly:

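$$\pi(s,\ell,v) \equiv_{\text{sem}} \pi(s,\ell,\mathrm{id}).$$

Here $\mathrm{id}$ denotes the identity transformation, so the right-hand side is the untransformed realization of $s$ in $\ell$.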

This is a measurement-validity requirement. It is not an assumption about the evaluated system.

If the preservation condition fails, the realization is invalid for that measurement. If the condition holds and outcomes differ, the disagreement is evidence.
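The practical consequence is that validity is checked before outcomes are compared. The sketch below assumes a caller-supplied equivalence judgment; `partition_by_validity` is a hypothetical helper, and `same_tokens` is a deliberately crude toy stand-in that only recognizes reordering. It is not a validation procedure, and real semantic equivalence would need a far stronger judgment.

```python
from typing import Callable, Iterable, List, Tuple


def partition_by_validity(
    reference: str,
    candidates: Iterable[str],
    equivalent: Callable[[str, str], bool],
) -> Tuple[List[str], List[str]]:
    """Split candidates into those that preserve the reference meaning and those that do not."""
    valid, invalid = [], []
    for candidate in candidates:
        if equivalent(candidate, reference):
            valid.append(candidate)
        else:
            invalid.append(candidate)
    return valid, invalid


def same_tokens(a: str, b: str) -> bool:
    """Toy equivalence: same lowercased tokens in any order (assumption, reordering only)."""
    return sorted(a.lower().split()) == sorted(b.lower().split())


reference = "cancel order 1234 today"
candidates = [
    "today cancel order 1234",   # reordered, commitments unchanged
    "cancel order 9999 today",   # order number changed, so a different condition
]
valid, invalid = partition_by_validity(reference, candidates, same_tokens)
```

Only the entries in `valid` are eligible for outcome comparison. An entry in `invalid` is excluded from the measurement rather than counted as a disagreement.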

5. Outcome Mapping

Let $R$ denote the response space and $O$ denote the outcome space. An outcome mapping may be written:

o : P \times R \to O.

In words:

observed outcome = outcome mapping(realization, response)

The expected regime belongs to the semantic specification. The observed outcome records what happened. A mismatch is not automatically bad data; it may be the measurement result that matters.
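A minimal sketch of an outcome mapping follows. The outcome labels, the keyword rules, and the helper `observe` are assumptions made for illustration; they are not scoring logic from any CSR implementation, and this toy mapping ignores the realization argument even though a real one might use it.

```python
def outcome_mapping(realization: str, response: str) -> str:
    """Toy o : P x R -> O that assigns a coarse outcome label to a response."""
    text = response.strip().lower()
    if not text:
        return "no_response"
    if text.startswith(("i cannot", "i can't")):
        return "refusal"
    if "escalat" in text:
        return "escalation"
    return "answer"


def observe(realization: str, response: str, expected_regime: str) -> dict:
    """Record what happened and whether it matches the expected regime."""
    outcome = outcome_mapping(realization, response)
    return {
        "realization": realization,
        "outcome": outcome,
        # A mismatch is kept as a measurement result, not discarded as bad data.
        "matches_expected": outcome == expected_regime,
    }


record = observe(
    "Please refund order 1234.",
    "I cannot help with that request.",
    expected_regime="answer",
)
```

The expected regime is passed in from the semantic specification rather than inferred from the response, so the record keeps the expectation and the observation separate.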

6. Invariance Gap

For a canonical semantic unit $s$, let $E(s)$ denote the set of valid realizations of that unit.

For a behavior function $D$, which maps a realization to the system's observed behavior, and a disagreement measure $d$ on pairs of behaviors, the invariance gap for $s$ may be written:

$$G(s) = \mathbb{E}_{p_1,p_2 \sim E(s)} \left[ d(D(p_1),D(p_2)) \right].$$

A nonzero gap indicates that behavior depends on realization details despite fixed canonical semantics. Whether that dependence is acceptable, expected, or problematic depends on the domain.
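For a finite set of observed behaviors, the gap can be estimated empirically. The sketch below makes two assumptions that the text does not fix: it averages over distinct unordered pairs rather than over independent draws, and it defaults to a 0/1 disagreement measure. The function names `zero_one` and `invariance_gap` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence


def zero_one(a: Hashable, b: Hashable) -> float:
    """Assumed disagreement measure d: 0 if behaviors match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def invariance_gap(
    behaviors: Sequence[Hashable],
    d: Callable[[Hashable, Hashable], float] = zero_one,
) -> float:
    """Mean pairwise disagreement across behaviors D(p) for valid realizations of one unit."""
    pairs = list(combinations(behaviors, 2))
    if not pairs:
        raise ValueError("need at least two realizations to estimate a gap")
    return sum(d(a, b) for a, b in pairs) / len(pairs)


# Behaviors observed under three valid realizations of the same unit.
gap_mixed = invariance_gap(["approve", "approve", "deny"])
gap_stable = invariance_gap(["approve", "approve", "approve"])
```

With three behaviors there are three pairs. In the mixed case two of them disagree, so the estimate is two thirds; in the stable case it is zero. Other choices of `d`, such as a distance between scores, fit the same function.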

7. Disagreement As Evidence

CSR preserves disagreement under valid variation as structured evidence. Disagreement may arise from:

representational sensitivity;
semantic ambiguity;
weak or invalid realization;
boundary conditions;
mapping uncertainty;
system behavior under controlled variation.

CSR does not decide immediately which explanation is correct. It keeps the measurement layers separate enough for the disagreement to be investigated.

8. Domain Of Applicability

CSR is useful when meaning is the primary object of measurement and multiple valid expressions of the same condition exist.

It is well suited to natural-language evaluation, policy and compliance analysis, legal and regulatory interpretation, multilingual evaluation, survey design, educational assessment, safety review, and audit contexts.

CSR is less useful when the target is directly observed, low-dimensional, naturally independent, or not meaning-bearing in the relevant sense.

9. Public Boundary

This note gives the public research structure. It intentionally does not publish operational corpus construction, transformation libraries, validation procedures, deterministic planning machinery, provenance schemas, scoring logic, evaluator configuration, thresholds, report templates, client protocols, or runtime control details.

The public point is simple: semantic identity, surface realization, and observed behavior should not be collapsed into one row. The implementation method used to turn that principle into audits remains private.

10. Relationship To LIP

The Latent Invariance Principle (LIP) states that, under indirect observation, stability across valid representational variation is admissible evidence of latent tracking.

CSR applies that principle to semantic systems by giving the measurement a unit, a realization layer, and an outcome layer.

LIP is the principle.
CSR is the semantic measurement frame.

11. Non-Claims

CSR does not guarantee correctness, truth, robustness, or normative resolution.

CSR does not discover semantics autonomously.

CSR does not prescribe a model architecture.

CSR does not replace domain expertise or statistical validation.

CSR makes a narrower claim: when semantic meaning is measured through observable expressions, controlled variation should be treated as part of the measurement rather than as incidental noise.
