

Portable AI Memory (PAM) — Specification v1.0


Status: Published
Date: 2026-02-17
Authors: Daniel Gines [email protected] (https://github.com/danielgines)
License: CC BY 4.0 (specification), Apache 2.0 (schema and reference implementation)


This document defines Portable AI Memory (PAM), an interchange format for user memories generated by AI assistants. PAM enables the portability of user context, preferences, knowledge, and conversation history across any LLM provider without vendor lock-in or semantic data loss.

PAM is to AI memory what vCard is to contacts and iCalendar is to events: a universal interchange format that decouples user data from specific implementations.


AI assistants (ChatGPT, Claude, Gemini, Grok, etc.) accumulate knowledge about users over time — preferences, expertise, projects, goals, and behavioral patterns. This knowledge is stored in proprietary, undocumented formats with no interoperability between providers. Users cannot:

  • Migrate their AI context when switching providers
  • Maintain a unified identity across multiple AI assistants
  • Audit, correct, or manage memories systematically
  • Own and control their AI-generated personal knowledge

PAM defines a standardized JSON interchange format with:

  • A closed taxonomy of memory types
  • Full provenance tracking (which platform, conversation, and method produced each memory)
  • Temporal lifecycle management (creation, validity, supersession, archival)
  • Confidence scoring with decay models
  • Content hashing for deterministic deduplication
  • A semantic relations graph between memories
  • Access control for multi-agent and federation scenarios
  • Optional embeddings as a separate companion file
  • Integrity verification for corruption and tampering detection

PAM is an interchange format, not a storage format. Implementations SHOULD use databases (SQLite, PostgreSQL, vector databases, graph databases) for internal storage and MUST support export and import using this format.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHOULD”, “SHOULD NOT”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.


A PAM export consists of one required file and optional companion files:

| File | Required | Description |
|---|---|---|
| memory-store.json | Yes | Main interchange file with memories, relations, conversations index, and integrity block |
| conversations/*.json | No | Individual conversation files referenced by conversations_index[].storage.ref |
| embeddings.json | No | Separate file containing vector embeddings for memory objects |

Each file type is validated against its own schema (JSON Schema Draft 2020-12):

  • schemas/portable-ai-memory.schema.json — validates memory-store.json
  • schemas/portable-ai-memory-conversation.schema.json — validates conversation files (see §25)
  • schemas/portable-ai-memory-embeddings.schema.json — validates embeddings.json

{
  "schema": "portable-ai-memory",
  "schema_version": "1.0",
  "spec_uri": "https://portable-ai-memory.org/spec/v1.0",
  "export_id": "e47ac10b-58cc-4372-a567-0e02b2c3d479",
  "exported_by": "system-name/1.0.0",
  "export_date": "2026-02-15T22:00:00Z",
  "owner": { ... },
  "memories": [ ... ],
  "relations": [ ... ],
  "conversations_index": [ ... ],
  "integrity": { ... },
  "export_type": "full",
  "type_registry": "https://portable-ai-memory.org/types/",
  "signature": { ... }
}

Required root fields:

| Field | Type | Description |
|---|---|---|
| schema | string | MUST be "portable-ai-memory" |
| schema_version | string | Semantic version. Current: "1.0" |
| owner | object | Owner identification |
| memories | array | Array of memory objects |

Optional root fields:

| Field | Type | Description |
|---|---|---|
| spec_uri | string or null | URI or URN of the specification version. Implementations MUST NOT require spec_uri to resolve over the network |
| export_id | string or null | Unique identifier for this export (UUID v4). Enables tracking and duplicate detection |
| exported_by | string or null | System that generated the export. Format: "name/semver" |
| export_date | string | ISO 8601 timestamp of export |
| relations | array | Semantic relationships between memories |
| conversations_index | array | Lightweight conversation references |
| integrity | object | Integrity verification block |
| export_type | string | "full" or "incremental". Default: "full" (Section 16) |
| base_export_id | string or null | For incremental exports: export_id of the base export (Section 16) |
| since | string or null | For incremental exports: ISO 8601 cutoff timestamp (Section 16) |
| type_registry | string or null | URI of the official type registry (Section 19) |
| signature | object or null | Cryptographic signature (Section 18) |

The memory object is the fundamental unit of the format. Each memory represents a discrete piece of knowledge about the user.

Required fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier. SHOULD be UUID v4 |
| type | string | Memory type from the closed taxonomy (Section 5) |
| content | string | Natural language content. Primary semantic payload |
| content_hash | string | SHA-256 of normalized content (Section 6) |
| temporal | object | Temporal metadata. created_at is required |
| provenance | object | Origin metadata. platform is required |

Conditional field:

| Field | Condition | Description |
|---|---|---|
| custom_type | REQUIRED when type == "custom" | Custom type identifier. MUST be null when type is not "custom" |

Optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| status | string | "active" | Lifecycle status (Section 7) |
| summary | string or null | null | Short summary for display |
| tags | array | [] | Lowercase tags. Pattern: ^[a-z0-9][a-z0-9_-]*$ |
| confidence | object | | Confidence scoring (Section 8) |
| access | object | | Access control (Section 9) |
| embedding_ref | string or null | null | Reference to the embeddings file (Section 12) |
| metadata | object | | Additional metadata (Section 10) |
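
An illustrative memory object combining the required, conditional, and optional fields. All values below are placeholders; in particular, the content_hash shown is not a computed digest.

{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d480",
  "type": "preference",
  "custom_type": null,
  "content": "Prefers concise answers with code examples in Python.",
  "content_hash": "sha256:0f1e2d3c4b5a69788796a5b4c3d2e1f00f1e2d3c4b5a69788796a5b4c3d2e1f0",
  "status": "active",
  "summary": "Prefers concise, code-first answers",
  "tags": ["communication", "python"],
  "temporal": {
    "created_at": "2026-02-15T21:30:00Z"
  },
  "provenance": {
    "platform": "chatgpt",
    "extraction_method": "llm_inference"
  },
  "confidence": {
    "initial": 0.8,
    "current": 0.8,
    "decay_model": "time_linear",
    "last_reinforced": "2026-02-15T21:30:00Z"
  }
}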

PAM defines a closed taxonomy of memory types. The taxonomy is extensible via the "custom" type.

| Type | Description |
|---|---|
| fact | Objective, verifiable information about the user |
| preference | User preference, taste, or stated desire |
| skill | Competency, expertise, or demonstrated ability |
| context | Situational or temporal context |
| relationship | Relation to another person, entity, or organization |
| goal | Active objective or aspiration |
| instruction | How the user wants to be treated or addressed |
| identity | Personal identity information |
| environment | Technical or physical environment details |
| project | Active project or initiative |
| custom | Extensible type. REQUIRES the custom_type field |

Constraints:

  • If type == "custom", custom_type MUST be a non-empty string
  • If type != "custom", custom_type MUST be null

Example:

{
  "type": "custom",
  "custom_type": "security_clearance"
}

The content_hash field enables deterministic deduplication across exports from different platforms.

normalize(content):
1. Trim leading and trailing whitespace
2. Convert to lowercase
3. Apply Unicode NFC normalization
4. Collapse multiple consecutive spaces to a single space
content_hash = "sha256:" + hex(SHA-256(UTF-8(normalize(content))))

Pattern: ^sha256:[a-f0-9]{64}$

Example: "sha256:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2"


Each memory has a status field tracking its lifecycle state.

| Status | Description |
|---|---|
| active | Current and valid. Default state |
| superseded | Replaced by a newer memory. temporal.superseded_by SHOULD reference the replacement |
| deprecated | Still valid but no longer prioritized |
| retracted | Explicitly invalidated by the user |
| archived | Retained for historical purposes only |

Valid transitions:

  • active → superseded (new information replaces old)
  • active → deprecated (relevance diminished)
  • active → retracted (user explicitly invalidates)
  • active → archived (user archives for history)
  • superseded → archived (historical retention)
  • deprecated → retracted (user explicitly invalidates)
  • deprecated → archived (historical retention)
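
A minimal sketch of a lifecycle validator derived from the transitions above. The mapping and function name are illustrative, not normative.

ALLOWED_TRANSITIONS = {
    "active": {"superseded", "deprecated", "retracted", "archived"},
    "superseded": {"archived"},
    "deprecated": {"retracted", "archived"},
    "retracted": set(),
    "archived": set(),
}


def is_valid_transition(current: str, new: str) -> bool:
    """Return True if the lifecycle transition is listed in Section 7."""
    return new in ALLOWED_TRANSITIONS.get(current, set())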

The confidence block contains system-computed scores. This is NOT user-defined priority.

| Field | Type | Description |
|---|---|---|
| initial | number [0.0, 1.0] | Confidence at time of extraction |
| current | number [0.0, 1.0] | Current confidence after decay and reinforcement |
| decay_model | string or null | Decay model: "time_linear", "time_exponential", "none", or null |
| last_reinforced | string or null | ISO 8601 timestamp of last reinforcement |

  • time_linear: Confidence decreases linearly with time since last reinforcement
  • time_exponential: Confidence decreases exponentially with time
  • none: No automatic decay (e.g., identity facts)

The specific decay rate is implementation-defined. PAM records the model, not the parameters.
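
Because decay parameters are implementation-defined, the rate below is an arbitrary placeholder; this sketch only shows how the two declared models might be applied when recomputing current.

import math


def apply_decay(initial: float, days_since_reinforced: float,
                decay_model: str | None, rate: float = 0.01) -> float:
    """Recompute current confidence from a decay model (Section 8).
    The rate parameter is implementation-defined, not part of PAM."""
    if decay_model in (None, "none"):
        return initial
    if decay_model == "time_linear":
        return max(0.0, initial - rate * days_since_reinforced)
    if decay_model == "time_exponential":
        return initial * math.exp(-rate * days_since_reinforced)
    raise ValueError(f"unknown decay model: {decay_model}")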


The access block enables multi-agent and federation scenarios.

| Field | Type | Default | Description |
|---|---|---|---|
| visibility | string | "private" | "private", "shared", or "public" |
| exportable | boolean | true | Whether this memory may be included in exports |
| shared_with | array | [] | List of access grants |

Each grant specifies an entity and its permissions:

{
  "entity": "agent-work-assistant",
  "permissions": ["read"]
}

Permissions: "read", "write", "delete".
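
A minimal grant check, assuming the grant shape shown above; the function name is illustrative.

def entity_has_permission(memory: dict, entity: str, permission: str) -> bool:
    """Check whether an entity holds a permission via access.shared_with.
    Visibility semantics beyond explicit grants are implementation-defined."""
    grants = memory.get("access", {}).get("shared_with", [])
    return any(
        g["entity"] == entity and permission in g["permissions"]
        for g in grants
    )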


The metadata block contains non-semantic additional properties. This block allows additionalProperties for extensibility.

| Field | Type | Description |
|---|---|---|
| language | string or null | BCP 47 language tag. Pattern: ^[a-z]{2,3}(-[A-Z][a-z]{3})?(-[A-Z]{2})?$ |
| domain | string or null | Knowledge domain (e.g., "technical", "personal", "professional") |

Implementations MAY add custom fields to this block.


The provenance block enables auditability and cross-platform conflict resolution.

Required field:

| Field | Type | Description |
|---|---|---|
| platform | string | Source platform identifier |

Optional fields:

| Field | Type | Description |
|---|---|---|
| platform_user_id | string or null | User ID on the source platform |
| conversation_ref | string or null | Reference to a conversations_index entry |
| message_ref | string or null | Reference to a specific message |
| extraction_method | string or null | How the memory was extracted |
| extracted_at | string or null | ISO 8601 timestamp of extraction |
| extractor | string or null | System that performed the extraction |

Extraction methods:

| Method | Description |
|---|---|
| llm_inference | An LLM inferred the memory from conversation |
| explicit_user_input | The user explicitly stated the information |
| api_export | Extracted from a platform API or export |
| browser_extraction | Extracted via browser automation or extension |
| manual | Manually entered by a user or operator |

Platform identifiers MUST be lowercase ASCII matching the pattern:

^[a-z0-9_-]{2,32}$

Identifiers SHOULD be registered in a public registry to prevent collisions.

The same identifier namespace MUST be used across all PAM schemas: provenance.platform in the memory store, conversations_index[].platform, and provider.name in the conversation schema. Use product names, not company names.

Recommended identifiers (not an exhaustive list):

chatgpt, claude, gemini, grok, perplexity, copilot, local, manual


Embeddings are OPTIONAL. They are stored in a separate embeddings.json file.

  1. Embeddings MAY be omitted entirely from an export
  2. When omitted, embedding_ref in memory objects MUST be null
  3. Consumers MUST NOT fail if embedding_ref is null or if embeddings.json is missing
  4. Consumers MAY regenerate embeddings from the content field at any time using any model
  5. The content field in the memory object is ALWAYS the authoritative source of semantic content, never the embedding
  6. Each memory object MUST have at most one corresponding embedding in the embeddings file — the memory_id field MUST be unique across all embedding objects. Implementations that maintain multiple embeddings internally (e.g., for different models) SHOULD export only the most recent or preferred embedding
{
  "schema": "portable-ai-memory-embeddings",
  "schema_version": "1.0",
  "embeddings": [
    {
      "id": "emb-uuid",
      "memory_id": "mem-uuid",
      "model": "text-embedding-3-small",
      "dimensions": 1536,
      "created_at": "2026-02-15T22:00:00Z",
      "vector": [0.1, 0.2, ...],
      "storage": null
    }
  ]
}

| Field | Required | Type | Description |
|---|---|---|---|
| id | Yes | string | Unique identifier. Referenced by memory.embedding_ref |
| memory_id | Yes | string | ID of the associated memory object |
| model | Yes | string | Embedding model identifier |
| dimensions | Yes | integer | Vector dimensionality |
| created_at | Yes | string | ISO 8601 timestamp |
| vector | No | array or null | Inline vector. Null if stored externally |
| storage | No | object or null | External storage reference |
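
A consumer-side sketch honoring rules 2 and 3 above: embeddings load opportunistically, and their absence is never an error. The helper name is illustrative.

import json
from pathlib import Path


def load_embeddings(export_dir: str) -> dict:
    """Return a memory_id -> embedding map, or {} when embeddings.json
    is absent. Consumers MUST NOT fail if embeddings are missing."""
    path = Path(export_dir) / "embeddings.json"
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    return {e["memory_id"]: e for e in data["embeddings"]}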

The relations array defines semantic relationships between memory objects, forming a knowledge graph.

| Field | Required | Type | Description |
|---|---|---|---|
| id | Yes | string | Unique identifier |
| from | Yes | string | Source memory ID |
| to | Yes | string | Target memory ID |
| type | Yes | string | Relationship type |
| confidence | No | number or null | Confidence in this relationship [0.0, 1.0] |
| created_at | Yes | string | ISO 8601 timestamp |

| Type | Semantics |
|---|---|
| supports | Source provides evidence for target |
| contradicts | Source conflicts with target |
| extends | Source adds detail to target |
| supersedes | Source replaces target |
| related_to | General semantic relation |
| derived_from | Source was inferred from target |

The conversations index provides lightweight references to conversations without embedding full message history.

Exporters MUST ensure consistency between memory.provenance.conversation_ref and the corresponding conversations_index[].derived_memories entry.

Importers SHOULD treat derived_memories as advisory and MAY reconstruct from provenance using:

for memory in memories:
    conv_id = memory.provenance.conversation_ref
    if conv_id:
        conversations_index[conv_id].derived_memories.append(memory.id)

Full conversation data is stored externally and referenced via:

{
  "storage": {
    "type": "file",
    "ref": "conversations/conv-001.json",
    "format": "json"
  }
}

Storage types: "file", "database", "object_storage", "vector_db", "uri".
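
A sketch resolving "file" storage references relative to the export root; the other storage types need implementation-specific resolvers, and the helper name is illustrative.

import json
from pathlib import Path


def load_conversation(export_dir: str, entry: dict) -> dict | None:
    """Load a conversation referenced by conversations_index[].storage.
    Only the "file" storage type is handled here."""
    storage = entry.get("storage")
    if not storage or storage.get("type") != "file":
        return None
    path = Path(export_dir) / storage["ref"]
    return json.loads(path.read_text(encoding="utf-8"))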


The integrity block enables corruption and tampering detection.

PAM uses RFC 8785 (JSON Canonicalization Scheme — JCS) for deterministic serialization. The canonicalization field declares the method used:

| Value | Standard | Description |
|---|---|---|
| RFC8785 | RFC 8785 | JSON Canonicalization Scheme. Default and currently the only supported method |

This eliminates implementation ambiguity across languages and platforms. RFC 8785 defines deterministic rules for key ordering, number serialization, string escaping, and whitespace elimination.

The checksum is computed using the following deterministic pipeline:

1. Take the memories array
2. Sort memory objects by id ascending
3. Canonicalize per RFC 8785 (JCS):
- Sort all object keys lexicographically (recursive)
- Serialize numbers per ECMAScript/IEEE 754 rules (e.g., 1.0 → 1)
- Apply RFC 8785 string escaping
- No whitespace
- UTF-8 encoding
4. Compute SHA-256 over the canonical UTF-8 bytes
5. Format as "sha256:<hex>"

IMPORTANT: Standard json.dumps() in most languages is NOT RFC 8785 compliant. Implementations MUST use a dedicated JCS library. See Appendix C for library recommendations per language.

| Field | Required | Type | Description |
|---|---|---|---|
| canonicalization | No | string | Canonicalization method. Default: "RFC8785" |
| checksum | Yes | string | SHA-256 of the canonicalized memories. Format: sha256:<hex> |
| total_memories | Yes | integer | MUST equal len(memories) |

Validation rules:

  • integrity.total_memories MUST equal len(memories)
  • integrity.checksum MUST match the computed checksum of the canonicalized memories array
  • If integrity.canonicalization is absent, implementations MUST assume RFC8785
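
A verification sketch applying these rules, reusing compute_integrity_checksum from Appendix C:

def verify_integrity(export: dict) -> bool:
    """Validate the integrity block per Section 15."""
    integrity = export["integrity"]
    # An absent canonicalization field means RFC8785 per the rules above.
    if integrity.get("canonicalization", "RFC8785") != "RFC8785":
        raise ValueError("unsupported canonicalization method")
    if integrity["total_memories"] != len(export["memories"]):
        return False
    return integrity["checksum"] == compute_integrity_checksum(export["memories"])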

PAM supports both full and incremental (delta) exports for efficient synchronization.

| Type | Description |
|---|---|
| full | Complete memory store. Default. Self-contained |
| incremental | Delta since a previous export. Contains only new or updated memories |

When export_type is "incremental":

| Field | Required | Description |
|---|---|---|
| base_export_id | SHOULD | The export_id of the base export this delta applies to |
| since | SHOULD | ISO 8601 timestamp. Only memories created or updated after this time are included |

Importers processing incremental exports MUST:

  1. Match base_export_id to a previously imported full export
  2. For each memory in the delta: if id exists in the base, update it; otherwise, insert it
  3. Recompute integrity.checksum after merge
  4. Memories with status: "retracted" in the delta MUST be marked as retracted in the base
  5. Importers MUST NOT physically delete memories marked as "retracted". They MUST preserve the memory object and update its status. This ensures auditability and enables undo operations

Importers MAY reject incremental exports if base_export_id does not match any known export.


PAM supports W3C Decentralized Identifiers for universal cross-platform identity resolution.

The owner.did field accepts any valid DID method:

did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
did:web:example.com:user:alice
did:ion:EiAnKD8...
did:pkh:eip155:1:0xab16a96D359eC26a11e2C2b3d8f8B8942d5Bfcdb
  1. owner.did is OPTIONAL but RECOMMENDED for exports shared between systems
  2. When present, the DID MUST be resolvable to a DID Document per W3C DID Core (https://www.w3.org/TR/did-1.0/)
  3. If signature is present, signature.public_key SHOULD correspond to a verification method in the DID Document
  4. owner.id remains REQUIRED even when did is present, for backward compatibility
| Method | Use Case | Key Properties |
|---|---|---|
| did:key | Self-contained, no resolution needed | Simplest. The key is the identifier |
| did:web | Organization-hosted identity | DNS-based, easy to set up |
| did:ion | Decentralized, Bitcoin-anchored | Maximum decentralization |
| did:pkh | Blockchain wallet-based | Reuses existing crypto keys |

PAM exports MAY be cryptographically signed to verify authenticity and detect tampering.

{
  "signature": {
    "algorithm": "Ed25519",
    "public_key": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
    "value": "eyJhbGciOiJFZERTQSJ9..base64url-signature",
    "signed_at": "2026-02-15T22:00:01Z",
    "key_id": "did:key:z6Mk...#z6Mk..."
  }
}

| Algorithm | Type | Recommended |
|---|---|---|
| Ed25519 | EdDSA | Yes (fast, small keys, side-channel resistant) |
| ES256 | ECDSA P-256 | Yes (widely supported) |
| ES384 | ECDSA P-384 | Optional |
| RS256 | RSA 2048+ | Legacy compatibility |
| RS384 | RSA 3072+ | Legacy compatibility |
| RS512 | RSA 4096+ | Legacy compatibility |

The signature MUST cover not only the memories integrity but also export identity and ownership, to prevent replay attacks and export spoofing.

When signature is present (not null), the fields export_id and export_date MUST also be present and non-null. This is enforced by the schema via a conditional dependency.

The signature MUST be computed as follows:

1. Compute integrity.checksum (Section 15)
2. Construct the signature payload object:
   {
     "checksum": integrity.checksum,
     "export_id": export_id,
     "export_date": export_date,
     "owner_id": owner.id
   }
3. Canonicalize the payload using RFC 8785 (JCS)
4. Sign the canonical UTF-8 bytes with the private key using the specified algorithm
5. Base64url-encode the signature (RFC 4648 §5)
6. Store the result in signature.value

This ensures that altering memories (which changes the checksum), export_id, export_date, or owner.id will invalidate the signature. Note that changes to relations, conversations_index, or other owner fields are NOT covered by the signature payload.

To verify a signature:

1. Recompute integrity.checksum from the memories array
2. Verify computed checksum matches integrity.checksum
3. Reconstruct the signature payload object from the export
4. Canonicalize with RFC 8785
5. Decode signature.value from Base64url
6. Verify the signature against the canonical payload using signature.public_key
7. If owner.did is present, optionally resolve the DID Document and verify the key matches
  1. Signature is OPTIONAL but RECOMMENDED for exports shared between systems or users
  2. The signature payload MUST include checksum, export_id, export_date, and owner_id
  3. signature.signed_at MUST be equal to or after export_date
  4. If signature.key_id is present and owner.did is present, key_id SHOULD be a DID URL referencing a verification method in the owner’s DID Document
  5. Importers SHOULD verify signatures when present but MUST NOT reject unsigned exports

PAM provides a centralized registry for custom memory types to enable interoperability between implementations.

The type_registry root field specifies the registry URI:

{
  "type_registry": "https://portable-ai-memory.org/types/"
}

The registry is a publicly accessible JSON document listing registered custom types:

{
  "registry_version": "1.0",
  "types": {
    "security_clearance": {
      "description": "Security clearance level held by the user",
      "proposed_by": "my-exporter/1.0.0",
      "status": "registered",
      "registered_at": "2026-03-01T00:00:00Z"
    },
    "medical_condition": {
      "description": "Known medical condition or diagnosis",
      "proposed_by": "healthai/1.0.0",
      "status": "registered",
      "registered_at": "2026-04-15T00:00:00Z"
    }
  }
}
unregistered → registered → candidate → standard

| Status | Description |
|---|---|
| unregistered | Custom type used locally, not in the registry |
| registered | Listed in the registry, available for cross-platform use |
| candidate | Nominated for promotion to the standard taxonomy |
| standard | Promoted to the core taxonomy in a future spec version |
  1. Custom types SHOULD be registered at the official registry for interoperability
  2. Implementations MUST accept any custom_type value regardless of registry status
  3. The registry is advisory, not prescriptive — implementations MUST NOT reject unregistered types
  4. Community-adopted custom types MAY be promoted to the standard taxonomy in future spec versions
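
An advisory lookup sketch consistent with these rules, assuming the registry URI serves the JSON document shown above. Any failure falls back to "unregistered" rather than rejecting the type.

import json
import urllib.request


def lookup_custom_type(registry_uri: str, custom_type: str) -> str:
    """Return the registry status of a custom type, or "unregistered".
    The registry is advisory: network or lookup failures MUST NOT
    cause an import to be rejected (Section 19)."""
    try:
        with urllib.request.urlopen(registry_uri, timeout=5) as resp:
            registry = json.load(resp)
        return registry["types"].get(custom_type, {}).get("status", "unregistered")
    except Exception:
        return "unregistered"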

20. Interoperability and Migration Compatibility Matrix


IMPORTANT: The interoperability paths described in this section reflect observed export formats and extraction strategies as of the time of publication. AI providers do not natively support PAM at the time of this specification. Implementations SHOULD treat these mappings as best-effort compatibility guidance, not guaranteed or officially supported migration paths. Provider export formats may change without notice. Importers MUST be versioned and resilient to format variations.

Migration sources:

| Source | Export Method | PAM Coverage |
|---|---|---|
| ChatGPT | conversations.json + memory prompt | Full: conversations, memories, preferences |
| Claude | JSON export + memory prompt + memory edits | Full: conversations, memories, instructions |
| Gemini | Google Takeout + prompt extraction | Partial: conversations; memories via prompt |
| Copilot | Privacy Dashboard CSV | Partial: conversations only |
| Grok | Data export (grok.com settings) | Full: conversations, projects, media posts, assets |
| Perplexity | Form request + prompt | Partial: limited conversation access |
| Local LLMs | Direct database access | Full: complete control |
Migration targets:

| Target | Method |
|---|---|
| ChatGPT | Custom instructions, conversation priming |
| Claude | Memory edits, Projects, system prompts |
| Gemini | Gems, conversation priming |
| Any LLM | System prompt injection from PAM memories |
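
A sketch of the "system prompt injection" row above, rendering active, exportable memories into a prompt preamble. Selection, ordering, and wording are entirely implementation-defined.

def memories_to_system_prompt(export: dict, limit: int = 50) -> str:
    """Render active, exportable memories as a system prompt preamble,
    highest-confidence first."""
    active = [
        m for m in export["memories"]
        if m.get("status", "active") == "active"
        and m.get("access", {}).get("exportable", True)
    ]
    active.sort(key=lambda m: m.get("confidence", {}).get("current", 0.0),
                reverse=True)
    lines = [f"- [{m['type']}] {m['content']}" for m in active[:limit]]
    return "Known context about the user:\n" + "\n".join(lines)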

PAM files contain personal information. Implementations MUST:

  • Encrypt PAM files at rest when stored locally
  • Use TLS for any network transmission of PAM files
  • Respect the access.exportable flag when generating exports
  • Not include memories marked exportable: false in exports (a filtering sketch follows this list)
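
A minimal filtering sketch for the last two requirements; the helper name is illustrative.

def filter_exportable(memories: list[dict]) -> list[dict]:
    """Drop memories whose access.exportable flag is false (Section 21).
    The default is exportable: true (Section 9)."""
    return [m for m in memories
            if m.get("access", {}).get("exportable", True)]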

The content_hash uses SHA-256 for deduplication, not for cryptographic authentication. For tamper-proof verification, use the signature block (Section 18).

The provenance.platform_user_id field is OPTIONAL specifically to allow privacy-preserving exports. Implementations SHOULD allow users to strip platform identifiers before sharing.

When the signature block is present, implementations SHOULD verify it before processing the export. A failed verification SHOULD result in a warning to the user, not a silent failure.


New types are added via the "custom" type mechanism and the type registry (Section 19). If a custom type achieves broad adoption, it MAY be promoted to the standard taxonomy in a future version.

The metadata block allows additionalProperties, enabling implementations to add custom fields without breaking schema validation.

Schema versions follow semantic versioning:

  • Patch (1.0.x): Documentation clarifications, no schema changes
  • Minor (1.x.0): Backward-compatible additions (new optional fields, new enum values)
  • Major (x.0.0): Breaking changes requiring migration

A reference implementation is available for PAM. It provides:

  1. Platform extractors — Parse exports from ChatGPT, Claude, Gemini, Copilot, Grok into PAM format
  2. Validator — CLI tool for schema validation (pam validate)
  3. Converter — Export PAM memories to platform-specific import formats
  4. Integrity checker — Verify checksums and consistency rules
  5. Signature tools — Sign and verify exports (pam sign, pam verify)

This specification was informed by:

  • University of Stavanger PKG research — Krisztian Balog, Martin G. Skjæveland, and the Personal Knowledge Graph research group
  • Solid Project — Tim Berners-Lee’s vision of user-owned data stores
  • Mem0, Zep, Letta — Commercial memory layer implementations that demonstrated practical memory management patterns
  • Samsung Personal Data Engine — Production-scale personal knowledge graph deployment
  • EU Digital Markets Act — Regulatory framework driving data portability requirements
  • W3C DID Core — Decentralized identity standard enabling cross-platform identity resolution

PAM defines a companion schema for storing full conversation data referenced by conversations_index[].storage.ref. While the main memory-store schema contains extracted knowledge, the conversation schema preserves the raw dialogue from which memories were derived.

The conversation schema serves as the normalized intermediate format between provider-specific exports and the PAM memory store. The import pipeline is:

Raw Provider Export → Parse → Normalize → Conversation Schema → Extract Memories → PAM Memory Store

portable-ai-memory-conversation.schema.json — JSON Schema Draft 2020-12

DAG support: OpenAI conversations use branching (mapping with parent/children). The schema supports parent_id and children_ids per message to preserve this structure. Linear conversations (Claude, Gemini) set parent_id: null and children_ids: [].

Role normalization: Each provider uses different role names (human, Request, AI, ASSISTANT, model names). The schema normalizes to: user, assistant, system, tool. See importer-mappings.md section 6 for the full verified mapping table.

Multipart content: Messages may contain text, images, code, and files. The content field supports both simple text (type: "text") and multipart content (type: "multipart" with parts[]).

Import metadata: Normalized conversations SHOULD record the importer version, source file, and source checksum. This enables debugging, re-import, and format version tracking. The import_metadata block is OPTIONAL in the schema to allow lightweight exports, but implementations SHOULD populate it for auditability.

raw_metadata: Provider-specific fields that don’t map to PAM are preserved verbatim in raw_metadata for lossless round-tripping.

See importer-mappings.md for field-by-field mappings from each provider format to the normalized conversation schema:

  • OpenAI (ChatGPT): conversations.json with DAG mapping
  • Anthropic (Claude): conversations.json with chat_messages array
  • Google (Gemini): Google Takeout MyActivity.json (single activity log array)
  • Microsoft (Copilot): Privacy Dashboard CSV
  • xAI (Grok): Data export via grok.com settings (conversations, projects, assets)

Provider export formats change without notice. Importers MUST be versioned:

importer_version: "openai-importer/2025.01"

When a format changes, create a new importer version while keeping the old one for processing older exports.
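
One way to satisfy this requirement, sketched below with hypothetical parser names: keep every importer version registered and dispatch on the recorded importer_version, so older exports stay processable.

from typing import Callable

# Versioned importer registry. Parser names and versions are illustrative.
IMPORTERS: dict[str, Callable[[dict], dict]] = {}


def importer(version: str):
    """Register a parser under its importer_version string."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        IMPORTERS[version] = fn
        return fn
    return register


@importer("openai-importer/2025.01")
def parse_openai_2025_01(raw: dict) -> dict:
    ...  # map the conversations.json DAG to the normalized schema


def normalize_conversation(raw: dict, importer_version: str) -> dict:
    """Dispatch to the importer that matches the recorded version."""
    return IMPORTERS[importer_version](raw)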


This specification document (spec.md) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You may share and adapt this document for any purpose, provided appropriate credit is given.

Full text: https://creativecommons.org/licenses/by/4.0/

The JSON Schema files (portable-ai-memory.schema.json, portable-ai-memory-conversation.schema.json, portable-ai-memory-embeddings.schema.json) and all reference implementation code are licensed under the Apache License, Version 2.0.

Full text: https://www.apache.org/licenses/LICENSE-2.0


  • Memory store: examples/example-memory-store.json
  • Conversation: examples/example-conversation.json
  • Embeddings: examples/example-embeddings.json
  • schemas/portable-ai-memory.schema.json — Main memory store schema (JSON Schema Draft 2020-12)
  • schemas/portable-ai-memory-embeddings.schema.json — Embeddings schema (JSON Schema Draft 2020-12)
  • schemas/portable-ai-memory-conversation.schema.json — Normalized conversation schema (JSON Schema Draft 2020-12)

Appendix C: Content Hash Reference Implementation

import hashlib
import unicodedata


def normalize_content(content: str) -> str:
    """Normalize content for deterministic hashing (Section 6)."""
    text = content.strip()
    text = text.lower()
    text = unicodedata.normalize("NFC", text)
    text = " ".join(text.split())
    return text


def compute_content_hash(content: str) -> str:
    """Compute the SHA-256 hash of normalized content."""
    normalized = normalize_content(content)
    hash_hex = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"sha256:{hash_hex}"


def compute_integrity_checksum(memories: list) -> str:
    """Compute the deterministic checksum of the memories array using RFC 8785.

    IMPORTANT: This function MUST use an RFC 8785 (JCS) compliant
    serializer. Standard json.dumps() is NOT sufficient because:
    - json.dumps serializes 1.0 as "1.0"; RFC 8785 requires "1"
    - json.dumps does not guarantee RFC 8785 Unicode escaping rules
    - json.dumps number formatting differs from IEEE 754/ECMAScript

    Python:  pip install rfc8785
    Node.js: npm install canonicalize
    Go:      github.com/nicktrav/canonicaljson
    Java:    org.erdtman:java-json-canonicalization
    """
    import rfc8785

    sorted_memories = sorted(memories, key=lambda m: m["id"])
    canonical_bytes = rfc8785.dumps(sorted_memories)
    hash_hex = hashlib.sha256(canonical_bytes).hexdigest()
    return f"sha256:{hash_hex}"

WARNING: Do NOT use json.dumps(..., sort_keys=True, separators=(",", ":")) for checksum computation. It produces different output than RFC 8785 for floating-point numbers, which will result in checksum mismatches across implementations.

Appendix D: Signature Reference Implementation

from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
import base64

import rfc8785


def build_signature_payload(export: dict) -> bytes:
    """Construct and canonicalize the signature payload per Section 18.

    The payload MUST include checksum, export_id, export_date, and
    owner_id to prevent replay attacks and export spoofing.
    """
    payload = {
        "checksum": export["integrity"]["checksum"],
        "export_id": export["export_id"],
        "export_date": export["export_date"],
        "owner_id": export["owner"]["id"],
    }
    return rfc8785.dumps(payload)


def sign_export(export: dict, private_key: Ed25519PrivateKey) -> str:
    """Sign an export with Ed25519 over the canonical payload."""
    payload_bytes = build_signature_payload(export)
    signature_bytes = private_key.sign(payload_bytes)
    return base64.urlsafe_b64encode(signature_bytes).decode("ascii")


def verify_export(export: dict, signature_b64: str, public_key_bytes: bytes) -> bool:
    """Verify an Ed25519 signature over the canonical payload."""
    public_key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
    payload_bytes = build_signature_payload(export)
    signature_bytes = base64.urlsafe_b64decode(signature_b64)
    try:
        public_key.verify(signature_bytes, payload_bytes)
        return True
    except Exception:
        return False
# Relies on compute_integrity_checksum from Appendix C.


def merge_incremental(base: dict, delta: dict) -> dict:
    """Merge an incremental export into a base export (Section 16).

    Retracted memories arriving in the delta replace their base versions
    with status "retracted"; they are preserved, never deleted.
    """
    base_memories = {m["id"]: m for m in base["memories"]}
    for mem in delta["memories"]:
        base_memories[mem["id"]] = mem  # insert or update (incl. retractions)
    merged = list(base_memories.values())
    base["memories"] = merged
    base["integrity"]["total_memories"] = len(merged)
    base["integrity"]["checksum"] = compute_integrity_checksum(merged)
    return base

See importer-mappings.md for complete field-by-field mappings from each provider to the normalized PAM format.