Canonical Semantic Realization
A Measurement Framework For Controlled Semantic Variation
Public note: April 24, 2026
Abstract
Many evaluation workflows operate on semantic artifacts: prompts, instructions, policy descriptions, clinical notes, legal documents, support transcripts, survey items, and other representations whose meaning is not determined by surface form alone.
Canonical Semantic Realization (CSR) separates three layers that are often conflated:
- canonical semantic units, which define what is being measured;
- realizations, which define how that meaning is expressed;
- observed outcomes, which record empirical behavior under a realization.
CSR treats canonical meaning as the experimental unit and controlled realizations as repeated measurements. It preserves disagreement under valid variation as evidence rather than noise.
CSR is not a correctness oracle, and this note does not publish a product architecture. Its contribution is measurement structure: it makes semantic brittleness, uncertainty, and representation sensitivity easier to observe and reason about.
1. Why Row-Level Evaluation Is Not Enough
Semantic systems are often evaluated row by row. A prompt, document, test item, symptom description, or policy question is treated as a sample. A response is observed. A score or outcome is assigned.
That view is often inadequate. Multiple rows may express the same underlying condition. A user intent may appear across languages, formats, wrappers, or phrasings. A policy question may be reworded without changing the governing issue. A clinical or legal fact pattern may be reordered while preserving the relevant facts.
When these rows are treated as independent, aggregate metrics can hide the structure that matters most. A system may look stable overall while behaving inconsistently across valid realizations of the same canonical meaning.
CSR changes the unit of analysis.
Meaning is the unit.
Realization is controlled variation.
Outcome is empirical measurement.
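Here is a toy illustration of the point above. The rows and the helper names (`row_level_accuracy`, `unit_level_consistency`) are hypothetical and are not part of CSR itself. Scored row by row, the system reaches 0.75 accuracy. Grouping the same rows by canonical unit shows that one unit is handled consistently while the other flips with the realization.

```python
from collections import defaultdict

# Hypothetical rows: (canonical_unit_id, realization_id, observed_outcome).
ROWS = [
    ("u1", "en-plain", "refuse"), ("u1", "fr-plain", "refuse"),
    ("u1", "en-json", "refuse"), ("u1", "en-reordered", "refuse"),
    ("u2", "en-plain", "comply"), ("u2", "fr-plain", "refuse"),
    ("u2", "en-json", "comply"), ("u2", "en-reordered", "refuse"),
]
# Hypothetical expected handling for each canonical unit.
EXPECTED = {"u1": "refuse", "u2": "comply"}


def row_level_accuracy(rows, expected):
    """Score every row as an independent sample."""
    hits = sum(outcome == expected[unit] for unit, _, outcome in rows)
    return hits / len(rows)


def unit_level_consistency(rows):
    """Group rows by canonical unit; a unit is consistent if all of its
    realizations produced the same outcome."""
    outcomes_by_unit = defaultdict(set)
    for unit, _, outcome in rows:
        outcomes_by_unit[unit].add(outcome)
    return {unit: len(seen) == 1 for unit, seen in outcomes_by_unit.items()}


print(row_level_accuracy(ROWS, EXPECTED))  # 0.75
print(unit_level_consistency(ROWS))        # {'u1': True, 'u2': False}
```

The aggregate number alone cannot distinguish "uniformly 75% right" from "always right on one meaning, a coin flip on another." Only the second view exposes that difference.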
2. The Three-Layer View
2.1 Canonical Semantic Unit
A canonical semantic unit is the semantic condition under study. It is defined independently of any one observable expression. It may represent an intent, condition, concept, policy-relevant situation, diagnostic pattern, legal meaning, survey construct, or other semantic object.
The canonical semantic unit is the experimental unit.
2.2 Realization
A realization is an observable expression of a canonical semantic unit. Realizations may differ by language, phrasing, format, ordering, modality, channel, or presentation frame.
Variation at this layer is controlled. A realization should vary the measurement channel without changing the relevant semantic unit.
2.3 Observed Outcome
An observed outcome records empirical behavior under a realization. It may be a decision, answer, category, score, label, action, refusal, escalation, uncertainty marker, or other measurable result.
Observed outcomes are measurements of behavior under specified conditions. They are not semantic truth by themselves.
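One way to keep the three layers apart in code is to give each layer its own record type. The class and field names below are hypothetical, and this is an illustrative sketch, not the implementation this note withholds. It shows only two structural commitments: a realization points back to exactly one unit, and an outcome is attached to a realization rather than directly to the unit.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CanonicalUnit:
    """Layer 1: what is being measured, defined independently of any wording."""
    unit_id: str
    specification: str           # semantic specification
    constraints: FrozenSet[str]  # relevant constraint set
    expected_regime: str         # expected class of handling


@dataclass(frozen=True)
class Realization:
    """Layer 2: one observable expression of exactly one canonical unit."""
    unit_id: str
    condition: str   # e.g. language, channel, modality, or medium
    transform: str   # name of the surface transformation applied
    text: str


@dataclass(frozen=True)
class ObservedOutcome:
    """Layer 3: behavior recorded under one realization; a measurement,
    not a statement of semantic truth."""
    realization: Realization
    outcome: str


unit = CanonicalUnit(
    unit_id="u1",
    specification="account owner asks to delete their account",
    constraints=frozenset({"requester is the account owner"}),
    expected_regime="comply",
)
realization = Realization("u1", "fr", "paraphrase", "Je voudrais supprimer mon compte.")
observed = ObservedOutcome(realization, "comply")
```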
3. Formal Sketch
Let $\mathcal{C}$ denote the canonical semantic space. Each element $c \in \mathcal{C}$ is a canonical semantic unit.
A unit may be represented abstractly as:
$$c = (s, K, \rho)$$
where $s$ is the semantic specification, $K$ is the relevant constraint set, and $\rho$ is the expected regime or class of handling.
Let $\mathcal{X}$ denote the space of observable representations. Let $\mathcal{L}$ denote languages, channels, modalities, or media, and let $\mathcal{T}$ denote admissible surface transformations.
A realization may be written:
$$x = R(c, \ell, \tau)$$
where $x \in \mathcal{X}$, $c \in \mathcal{C}$, $\ell \in \mathcal{L}$, and $\tau \in \mathcal{T}$.
The notation matters less than the separation. Semantic identity, representational condition, and observed behavior should remain analytically distinct.
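The realization step takes a unit, a representational condition, and an admissible transformation, and returns an observable representation. It can be sketched as a small function that records its inputs next to its output. This sketch rests on several assumptions. The transformation and channel tables are hypothetical. A base phrasing stands in for the unit, which is a simplification, since a canonical unit is defined independently of any expression. Whether a transformation actually preserves meaning is not checked here; that is the job of the preservation condition in Section 4.

```python
import json
from itertools import product
from typing import Callable, Dict

# Hypothetical admissible surface transformations: each maps text to text.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "identity": lambda text: text,
    "polite_prefix": lambda text: "Please help with the following. " + text,
}

# Hypothetical representational conditions, here modeled as output channels.
CHANNELS: Dict[str, Callable[[str], str]] = {
    "plain": lambda text: text,
    "json": lambda text: json.dumps({"message": text}),
}


def realize(unit_id: str, base_text: str, channel: str, transform: str) -> dict:
    """Apply a surface transformation, then render under a channel.

    The condition and transformation are stored next to the text so that
    representational conditions remain separable from semantic identity.
    """
    surface = TRANSFORMS[transform](base_text)
    return {
        "unit_id": unit_id,
        "channel": channel,
        "transform": transform,
        "text": CHANNELS[channel](surface),
    }


# Cross every channel with every transformation for one unit.
grid = [
    realize("u1", "Delete my account.", channel, transform)
    for channel, transform in product(CHANNELS, TRANSFORMS)
]
```

Crossing conditions with transformations yields a controlled grid of repeated measurements for one unit, here four realizations that all carry the same `unit_id`.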
4. Semantic Preservation
CSR is interpretable only if realizations preserve the canonical semantic unit they claim to express.
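For a valid realization, the relevant meaning-bearing commitments must remain fixed. Abstractly:
$$\operatorname{sem}\big(R(c, \ell, \tau)\big) = \operatorname{sem}(c)$$
where $\operatorname{sem}(\cdot)$ extracts the meaning-bearing commitments; for a unit $c = (s, K, \rho)$, these are the specification $s$ and the constraint set $K$.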
This is a measurement-validity requirement. It is not an assumption about the evaluated system.
If the preservation condition fails, the realization is invalid for that measurement. If the condition holds and outcomes differ, the disagreement is evidence.
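The preservation condition can act as a validity filter. The following is a minimal sketch. It assumes that each realization has already been annotated, for example by domain reviewers, with the set of meaning-bearing commitments it expresses. The annotation step, the commitment strings, and the function names are hypothetical. A realization that fails the check is excluded from the measurement rather than counted as a system error.

```python
from typing import FrozenSet, Iterable, List, Tuple


def is_valid_realization(unit_commitments: FrozenSet[str],
                         realization_commitments: FrozenSet[str]) -> bool:
    """Preservation check: the realization must express exactly the unit's
    meaning-bearing commitments, with none dropped and none added."""
    return realization_commitments == unit_commitments


def partition_realizations(
    unit_commitments: FrozenSet[str],
    annotated: Iterable[Tuple[str, FrozenSet[str]]],
) -> Tuple[List[str], List[str]]:
    """Split annotated realizations into valid and invalid ids.
    Invalid ones are excluded from the measurement, not scored as errors."""
    valid: List[str] = []
    invalid: List[str] = []
    for realization_id, commitments in annotated:
        target = valid if is_valid_realization(unit_commitments, commitments) else invalid
        target.append(realization_id)
    return valid, invalid


UNIT = frozenset({"requester owns the account", "request is to delete"})
ANNOTATED = [
    ("r1", frozenset({"requester owns the account", "request is to delete"})),
    ("r2", frozenset({"request is to delete", "requester owns the account"})),
    ("r3", frozenset({"request is to delete"})),  # ownership commitment dropped
]
valid, invalid = partition_realizations(UNIT, ANNOTATED)
```

Exact set equality is the strictest possible check. The order in which commitments are stated does not matter (r2 passes), but dropping one does (r3 fails).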
5. Outcome Mapping
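Let $\mathcal{Y}$ denote the response space and $\mathcal{O}$ denote the outcome space. An outcome mapping may be written:
$$\phi : \mathcal{X} \times \mathcal{Y} \to \mathcal{O}, \qquad o = \phi(x, y)$$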
In words:
observed outcome = outcome mapping(realization, response)
The expected regime belongs to the semantic specification. The observed outcome records what happened. A mismatch is not automatically bad data; it may be the measurement result that matters.
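Below is a hypothetical sketch of an outcome mapping that takes both the realization and the response. The keyword rules are a stand-in for whatever mapping a domain actually uses. The sketch illustrates two points from the text. First, the realization can matter to the mapping: a response on a JSON channel must be unwrapped before it is classified. Second, a mismatch with the expected regime is recorded as data rather than discarded.

```python
import json


def phi(x: dict, y: str) -> str:
    """Hypothetical outcome mapping o = phi(x, y).

    The realization matters: a response on the JSON channel is unwrapped
    before it is classified into a coarse outcome.
    """
    text = json.loads(y)["message"] if x["channel"] == "json" else y
    text = text.lower()
    if "cannot" in text:
        return "refuse"
    if text.rstrip().endswith("?"):
        return "clarify"
    return "comply"


def measure(x: dict, y: str, expected_regime: str) -> dict:
    """Record the observed outcome next to the expected regime.
    A mismatch is kept as a data point, never filtered out."""
    observed = phi(x, y)
    return {"observed": observed, "expected": expected_regime,
            "mismatch": observed != expected_regime}


m1 = measure({"channel": "plain"}, "Done, your account has been deleted.", "comply")
m2 = measure({"channel": "json"}, '{"message": "I cannot do that."}', "comply")
```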
6. Invariance Gap
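For a canonical semantic unit $c$, let $V(c)$ denote the set of valid realizations of that unit.
For a behavior function $f : \mathcal{X} \to \mathcal{O}$ and a disagreement measure $d : \mathcal{O} \times \mathcal{O} \to \mathbb{R}_{\ge 0}$, the invariance gap for $c$ may be written:
$$\Delta(c) = \sup_{x,\, x' \in V(c)} d\big(f(x), f(x')\big)$$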
A nonzero gap indicates that behavior depends on realization details despite fixed canonical semantics. Whether that dependence is acceptable, expected, or problematic depends on the domain.
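The sketch below computes an invariance gap for one unit as the largest pairwise disagreement in behavior across its valid realizations. For a finite set of realizations, that largest value is a maximum over pairs. This worst-case form is an assumption made for illustration; averaging over pairs would be a less conservative alternative. The 0/1 `mismatch` measure suits categorical outcomes. The recorded behaviors are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Iterable


def invariance_gap(valid_realizations: Iterable[Any],
                   behavior: Callable[[Any], Any],
                   disagreement: Callable[[Any, Any], float]) -> float:
    """Largest pairwise disagreement in behavior across valid realizations
    of one canonical unit (0.0 if there are fewer than two)."""
    outcomes = [behavior(x) for x in valid_realizations]
    return max(
        (disagreement(a, b) for a, b in combinations(outcomes, 2)),
        default=0.0,
    )


def mismatch(a: Any, b: Any) -> float:
    """0/1 disagreement measure for categorical outcomes."""
    return 0.0 if a == b else 1.0


# Hypothetical recorded behavior for two units' valid realizations.
stable = {"en-plain": "comply", "fr-plain": "comply", "en-json": "comply"}
brittle = {"en-plain": "comply", "fr-plain": "refuse", "en-json": "comply"}

print(invariance_gap(stable, stable.get, mismatch))    # 0.0
print(invariance_gap(brittle, brittle.get, mismatch))  # 1.0
```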
7. Disagreement As Evidence
CSR preserves disagreement under valid variation as structured evidence. Disagreement may arise from:
- representational sensitivity;
- semantic ambiguity;
- weak or invalid realization;
- boundary conditions;
- mapping uncertainty;
- system behavior under controlled variation.
CSR does not immediately decide which explanation is correct. It keeps the measurement layers separate enough for each disagreement to be investigated.
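Keeping disagreement as evidence means keeping it as separate records rather than resolving it by majority vote or averaging. The following hypothetical sketch emits one record per disagreeing pair of realizations and leaves the explanation open. An explanation can be attached later, but only from the candidate list above. The record fields and function names are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Candidate explanations from the list above, kept as open labels.
EXPLANATIONS = (
    "representational sensitivity",
    "semantic ambiguity",
    "weak or invalid realization",
    "boundary conditions",
    "mapping uncertainty",
    "system behavior under controlled variation",
)


def disagreement_records(unit_id: str,
                         observations: List[Tuple[str, str, str]]) -> List[Dict]:
    """observations: (realization_id, condition, outcome) triples for one unit.
    Emit one record per disagreeing pair instead of averaging the pair away;
    the explanation is left unassigned for later investigation."""
    records = []
    for (r1, c1, o1), (r2, c2, o2) in combinations(observations, 2):
        if o1 != o2:
            records.append({
                "unit_id": unit_id,
                "pair": (r1, r2),
                "conditions": (c1, c2),
                "outcomes": (o1, o2),
                "explanation": None,
            })
    return records


def assign_explanation(record: Dict, label: str) -> Dict:
    """Return a copy of the record with an explanation from EXPLANATIONS."""
    if label not in EXPLANATIONS:
        raise ValueError(f"unknown explanation: {label!r}")
    return {**record, "explanation": label}


obs = [("r1", "en/plain", "comply"), ("r2", "fr/plain", "refuse"), ("r3", "en/json", "comply")]
records = disagreement_records("u2", obs)
reviewed = assign_explanation(records[0], "representational sensitivity")
```

Both disagreeing pairs, (r1, r2) and (r2, r3), share the French realization. A pattern like that is visible only because the records keep each pair's conditions.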
8. Domain Of Applicability
CSR is useful when meaning is the primary object of measurement and multiple valid expressions of the same condition exist.
It is well suited to natural-language evaluation, policy and compliance analysis, legal and regulatory interpretation, multilingual evaluation, survey design, educational assessment, safety review, and audit contexts.
CSR is less useful when the target is directly observed, low-dimensional, naturally independent, or not meaning-bearing in the relevant sense.
9. Public Boundary
This note gives the public research structure. It intentionally does not publish operational corpus construction, transformation libraries, validation procedures, deterministic planning machinery, provenance schemas, scoring logic, evaluator configuration, thresholds, report templates, client protocols, or runtime control details.
The public point is simple: semantic identity, surface realization, and observed behavior should not be collapsed into one row. The implementation method used to turn that principle into audits remains private.
10. Relationship To LIP
The Latent Invariance Principle states that, under indirect observation, stability across valid representational variation is admissible evidence of latent tracking.
CSR applies that principle to semantic systems by giving the measurement a unit, a realization layer, and an outcome layer.
LIP is the principle.
CSR is the semantic measurement frame.
11. Non-Claims
CSR does not guarantee correctness, truth, robustness, or normative resolution.
CSR does not discover semantics autonomously.
CSR does not prescribe a model architecture.
CSR does not replace domain expertise or statistical validation.
CSR makes a narrower claim: when semantic meaning is measured through observable expressions, controlled variation should be treated as part of the measurement rather than as incidental noise.