8.0.15007673299 • Published 5 months ago

@kodexa/kodexa-document v8.0.15007673299

Kodexa Document TypeScript SDK

A TypeScript implementation of the Kodexa Document model for working with structured documents.

Installation

npm install @kodexa/kodexa-document

Overview

The SDK models a document as a hierarchy of content nodes. It lets developers create, load, manipulate, and query documents, and it provides a selector language (similar to XPath) for extracting the content that matches given criteria.

Key Features

  • Create and manipulate hierarchical document structures
  • Add, update, and remove content nodes and features
  • Query documents using a powerful selector language
  • Tag content for classification and extraction
  • Track document processing steps
  • Store and retrieve external data

Usage Examples

Creating a Document

import { Document, DocumentMetadata } from '@kodexa/kodexa-document';

// Create a new document
const document = new Document(new DocumentMetadata());

// Create a root node
const rootNode = document.createNode('root', 'Root content');
document.contentNode = rootNode;

// Add child nodes
rootNode.addChild(document.createNode('paragraph', 'This is a paragraph'));
rootNode.addChild(document.createNode('paragraph', 'This is another paragraph'));
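
For deeper hierarchies, the same createNode and addChild calls can be driven from a nested plain-object description. The following is a minimal sketch, not part of the SDK: NodeSpec, buildTree, and the stub classes are hypothetical names, and the two interfaces only mirror the createNode and addChild signatures listed in this README. The stubs exist so the sketch runs without the package installed.

```typescript
// Structural stand-ins for the two SDK calls this sketch relies on
// (createNode and addChild, as listed in this README). All other names are hypothetical.
interface NodeFactory<N> {
  createNode(nodeType: string, content?: string): N;
}
interface Attachable<N> {
  addChild(child: N, index?: number): unknown;
}

// Hypothetical plain-object description of a subtree.
interface NodeSpec {
  type: string;
  content?: string;
  children?: NodeSpec[];
}

// Create a node for the spec, then create and attach its children in order.
function buildTree<N extends Attachable<N>>(doc: NodeFactory<N>, spec: NodeSpec): N {
  const node = doc.createNode(spec.type, spec.content);
  for (const childSpec of spec.children ?? []) {
    node.addChild(buildTree(doc, childSpec));
  }
  return node;
}

// In-memory stubs so the sketch runs without the package installed.
class StubNode {
  readonly children: StubNode[] = [];
  constructor(readonly nodeType: string, readonly content?: string) {}
  addChild(child: StubNode): void {
    this.children.push(child);
  }
}
class StubDocument {
  createNode(nodeType: string, content?: string): StubNode {
    return new StubNode(nodeType, content);
  }
}

const tree = buildTree(new StubDocument(), {
  type: 'root',
  content: 'Root content',
  children: [
    { type: 'paragraph', content: 'This is a paragraph' },
    { type: 'paragraph', content: 'This is another paragraph' },
  ],
});
console.log(tree.children.length); // 2
```

With the real SDK, the Document instance would presumably take the place of StubDocument, and the returned node would be assigned to document.contentNode as in the example above. Whether the SDK classes satisfy these structural types exactly is an assumption.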

Creating a Document from Text

import { Document } from '@kodexa/kodexa-document';

// Create a document from text
const document = Document.fromText('Hello World');

Querying Documents

import { Document } from '@kodexa/kodexa-document';

// Create a document with some content
const document = Document.fromText('Hello World');

// Select nodes using selectors
const nodes = document.select('//text');

// Select the first matching node
const firstNode = document.selectFirst('//text');

Adding Features to Nodes

import { Document } from '@kodexa/kodexa-document';

// Create a document with some content
const document = Document.fromText('Hello World');

// Add a feature to the root node
document.contentNode?.addFeature('metadata', 'language', 'en');

// Get features
const features = document.contentNode?.getFeatures();

Tagging Content

import { Document } from '@kodexa/kodexa-document';

// Create a document with some content
const document = Document.fromText('Hello World');

// Tag the content
document.contentNode?.tag('important', { confidence: 0.95 });

// Get tags
const tags = document.contentNode?.getTags();

API Reference

Document

The main class for working with documents.

  • constructor(metadata?: DocumentMetadata, source?: SourceMetadata, ref?: string): Create a new document
  • static fromText(text: string): Create a document from text
  • createNode(nodeType: string, content?: string, virtual?: boolean): Create a new content node
  • select(selector: string, params?: Record<string, any>): Select nodes using a selector
  • selectFirst(selector: string, params?: Record<string, any>): Select the first matching node
  • getRoot(): Get the root node of the document
  • getSteps(): Get the processing steps
  • setSteps(steps: Array<ProcessingStep>): Set the processing steps
  • getExternalData(): Get external data
  • setExternalData(externalData: Record<string, any>): Set external data
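
Since external data is exposed through a getter and setter pair, adding a key without discarding existing keys takes a read, a merge, and a write. The sketch below shows one way to wrap that. mergeExternalData, ExternalDataHolder, and StubHolder are hypothetical names, and the return type of getExternalData() is assumed to be a plain object (it is not stated above).

```typescript
// Structural stand-in for the two Document methods used here.
// Assumption: getExternalData() returns a plain object, or nothing if unset.
interface ExternalDataHolder {
  getExternalData(): Record<string, any> | null | undefined;
  setExternalData(externalData: Record<string, any>): void;
}

// Shallow-merge `patch` into the existing external data and write it back.
// Keys in `patch` override existing keys with the same name.
function mergeExternalData(
  holder: ExternalDataHolder,
  patch: Record<string, any>,
): Record<string, any> {
  const merged = { ...(holder.getExternalData() ?? {}), ...patch };
  holder.setExternalData(merged);
  return merged;
}

// In-memory stub so the sketch runs without the package installed.
class StubHolder implements ExternalDataHolder {
  private data: Record<string, any> = {};
  getExternalData(): Record<string, any> {
    return this.data;
  }
  setExternalData(externalData: Record<string, any>): void {
    this.data = externalData;
  }
}

const holder = new StubHolder();
holder.setExternalData({ source: 'upload' });
mergeExternalData(holder, { reviewed: true });
console.log(holder.getExternalData()); // { source: 'upload', reviewed: true }
```

The merge is shallow on purpose: nested objects are replaced rather than combined, which keeps the behavior easy to predict.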

ContentNode

Represents a node in the document hierarchy.

  • constructor(document: Document, nodeType: string, id?: number, content?: string): Create a new content node
  • getParent(): Get the parent node
  • getChildren(): Get child nodes
  • addChild(child: ContentNode, index?: number): Add a child node
  • removeChild(contentNode: ContentNode): Remove a child node
  • addFeature(featureType: string, name: string, value: any): Add a feature to the node
  • getFeatures(): Get all features
  • getFeature(featureType: string, name: string): Get a specific feature
  • tag(name: string, options?: any): Add a tag to the node
  • getTags(): Get all tags
  • getTag(name: string): Get tags by name
  • removeTag(name: string): Remove a tag
  • select(selector: string, params?: Record<string, any>): Select nodes using a selector
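
When a condition is easier to express in code than as a selector, the hierarchy can also be walked directly through getChildren(). The following is a minimal sketch under the assumption that getChildren() returns an array (the return type is not stated above). collectNodes, Walkable, and OutlineNode are hypothetical names, and OutlineNode is a stub standing in for ContentNode.

```typescript
// Structural stand-in: anything with a getChildren() that returns an array.
interface Walkable<N> {
  getChildren(): N[];
}

// Depth-first, pre-order walk that collects every node the predicate accepts.
// `depth` is 0 for the starting node and increases by one per level.
function collectNodes<N extends Walkable<N>>(
  root: N,
  predicate: (node: N, depth: number) => boolean,
): N[] {
  const matches: N[] = [];
  const visit = (node: N, depth: number): void => {
    if (predicate(node, depth)) {
      matches.push(node);
    }
    for (const child of node.getChildren()) {
      visit(child, depth + 1);
    }
  };
  visit(root, 0);
  return matches;
}

// In-memory stub so the sketch runs without the package installed.
class OutlineNode {
  constructor(readonly label: string, private readonly kids: OutlineNode[] = []) {}
  getChildren(): OutlineNode[] {
    return this.kids;
  }
}

const outline = new OutlineNode('root', [
  new OutlineNode('section', [new OutlineNode('paragraph-1')]),
  new OutlineNode('paragraph-2'),
]);

// Collect leaf nodes, i.e. nodes without children.
const leaves = collectNodes(outline, (node) => node.getChildren().length === 0);
console.log(leaves.map((n) => n.label)); // [ 'paragraph-1', 'paragraph-2' ]
```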

ContentFeatureClass

Represents a feature associated with a content node.

  • constructor(featureType: string, name: string, value: any): Create a new feature
  • getValue(): Get the feature value
  • toString(): Get a string representation of the feature
  • toDict(): Convert the feature to a dictionary
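
Reading a feature value combines getFeature() on the node with getValue() on the feature. The sketch below wraps both calls with a fallback for the missing case. featureValueOr, FeatureLike, FeatureSource, and StubFeatureNode are hypothetical names, and it is an assumption that getFeature() returns null or undefined when no matching feature exists.

```typescript
// Structural stand-ins mirroring getFeature() and getValue() as listed in this README.
interface FeatureLike {
  getValue(): any;
}
interface FeatureSource {
  // Assumption: a missing feature is reported as null or undefined.
  getFeature(featureType: string, name: string): FeatureLike | null | undefined;
}

// Return the feature's value, or `fallback` if the node or the feature is missing.
function featureValueOr<T>(
  node: FeatureSource | null | undefined,
  featureType: string,
  name: string,
  fallback: T,
): T {
  const feature = node?.getFeature(featureType, name);
  return feature ? (feature.getValue() as T) : fallback;
}

// In-memory stub so the sketch runs without the package installed.
class StubFeatureNode implements FeatureSource {
  private readonly features: Record<string, FeatureLike> = {};
  addFeature(featureType: string, name: string, value: any): void {
    this.features[`${featureType}:${name}`] = { getValue: () => value };
  }
  getFeature(featureType: string, name: string): FeatureLike | undefined {
    return this.features[`${featureType}:${name}`];
  }
}

const featureNode = new StubFeatureNode();
featureNode.addFeature('metadata', 'language', 'en');

console.log(featureValueOr(featureNode, 'metadata', 'language', 'unknown')); // en
console.log(featureValueOr(featureNode, 'metadata', 'author', 'unknown')); // unknown
console.log(featureValueOr(undefined, 'metadata', 'language', 'unknown')); // unknown
```

Accepting an undefined node fits the usage examples, where document.contentNode is accessed with optional chaining.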

Tag

Represents a tag applied to a content node.

  • constructor(start?: number, end?: number, value?: string, uuid?: string, data?: any): Create a new tag
  • toDict(): Convert the tag to a dictionary

Running Tests

To run the tests:

# From the lib/typescript directory
npm install
npm test

Building the Package

To build the package:

# From the lib/typescript directory
npm run build

License

ISC
