3.0.0 • Published 5 years ago

@datafire/azure_cognitiveservices_formrecognizer v3.0.0

Weekly downloads
1
License
MIT
Repository
github
Last release
5 years ago

@datafire/azure_cognitiveservices_formrecognizer

Client library for Form Recognizer Client

Installation and Usage

npm install --save @datafire/azure_cognitiveservices_formrecognizer
let azure_cognitiveservices_formrecognizer = require('@datafire/azure_cognitiveservices_formrecognizer').create({
  apim_key: ""
});

.then(data => {
  console.log(data);
});

Description

Extracts information from forms and images into structured data.

Actions

GetCustomModels

Get information about all custom models

azure_cognitiveservices_formrecognizer.GetCustomModels({}, context)

Input

  • input object
    • op string (values: full, summary): Specify whether to return summary or full list of models.

Output

TrainCustomModelAsync

Create and train a custom model. The request must include a source parameter that is either an externally accessible Azure storage blob container Uri (preferably a Shared Access Signature Uri) or valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be an absolute path rooted to the input mount configuration setting value e.g., if '{Mounts:Input}' configuration setting value is '/input' then a valid source path would be '/input/contosodataset'. All data to be trained is expected to be under the source folder or sub folders under it. Models are trained using documents that are of the following content type - 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff'. Other type of content is ignored.

azure_cognitiveservices_formrecognizer.TrainCustomModelAsync({
  "trainRequest": {
    "source": ""
  }
}, context)

Input

Output

Output schema unknown

DeleteCustomModel

Mark model for deletion. Model artifacts will be permanently removed within a predetermined period.

azure_cognitiveservices_formrecognizer.DeleteCustomModel({
  "modelId": ""
}, context)

Input

  • input object
    • modelId required string: Model identifier.

Output

Output schema unknown

GetCustomModel

Get detailed information about a custom model.

azure_cognitiveservices_formrecognizer.GetCustomModel({
  "modelId": ""
}, context)

Input

  • input object
    • modelId required string: Model identifier.
    • includeKeys boolean: Include list of extracted keys in model information.

Output

AnalyzeWithCustomModel

Extract key-value pairs, tables, and semantic values from a given document. The input document must be of one of the supported content types - 'application/pdf', 'image/jpeg', 'image/png' or 'image/tiff'. Alternatively, use 'application/json' type to specify the location (Uri or local path) of the document to be analyzed.

azure_cognitiveservices_formrecognizer.AnalyzeWithCustomModel({
  "modelId": ""
}, context)

Input

  • input object
    • modelId required string: Model identifier.
    • includeTextDetails boolean: Include text lines and element references in the result.
    • fileStream SourceDataStream

Output

Output schema unknown

GetAnalyzeFormResult

Obtain current status and the result of the analyze form operation.

azure_cognitiveservices_formrecognizer.GetAnalyzeFormResult({
  "modelId": "",
  "resultId": ""
}, context)

Input

  • input object
    • modelId required string: Model identifier.
    • resultId required string: Analyze operation result identifier.

Output

AnalyzeLayoutAsync

Extract text and layout information from a given document. The input document must be of one of the supported content types - 'application/pdf', 'image/jpeg', 'image/png' or 'image/tiff'. Alternatively, use 'application/json' type to specify the location (Uri or local path) of the document to be analyzed.

azure_cognitiveservices_formrecognizer.AnalyzeLayoutAsync({}, context)

Input

Output

Output schema unknown

GetAnalyzeLayoutResult

Track the progress and obtain the result of the analyze layout operation

azure_cognitiveservices_formrecognizer.GetAnalyzeLayoutResult({
  "resultId": ""
}, context)

Input

  • input object
    • resultId required string: Analyze operation result identifier.

Output

AnalyzeReceiptAsync

Extract field text and semantic values from a given receipt document. The input document must be of one of the supported content types - 'application/pdf', 'image/jpeg', 'image/png' or 'image/tiff'. Alternatively, use 'application/json' type to specify the location (Uri or local path) of the document to be analyzed.

azure_cognitiveservices_formrecognizer.AnalyzeReceiptAsync({}, context)

Input

  • input object
    • includeTextDetails boolean: Include text lines and element references in the result.
    • fileStream SourceDataStream

Output

Output schema unknown

GetAnalyzeReceiptResult

Track the progress and obtain the result of the analyze receipt operation.

azure_cognitiveservices_formrecognizer.GetAnalyzeReceiptResult({
  "resultId": ""
}, context)

Input

  • input object
    • resultId required string: Analyze operation result identifier.

Output

Definitions

AnalyzeOperationResult

  • AnalyzeOperationResult object: Status and result of the queued analyze operation.
    • analyzeResult AnalyzeResult
    • createdDateTime required string: Date and time (UTC) when the analyze operation was submitted.
    • lastUpdatedDateTime required string: Date and time (UTC) when the status was last updated.
    • status required OperationStatus

AnalyzeResult

  • AnalyzeResult object: Analyze operation result.
    • documentResults array: Document-level information extracted from the input.
    • errors array: List of errors reported during the analyze operation.
    • pageResults array: Page-level information extracted from the input.
    • readResults required array: Text extracted from the input.
    • version required string: Version of schema used for this result.

BoundingBox

  • BoundingBox array: Quadrangle bounding box, with coordinates specified relative to the top-left of the original image. The eight numbers represent the four points, clockwise from the top-left corner relative to the text orientation. For image, the (x, y) coordinates are measured in pixels. For PDF, the (x, y) coordinates are measured in inches.
    • items number

Confidence

  • Confidence number: Confidence value.

DataTable

  • DataTable object: Information about the extracted table contained in a page.
    • cells required array: List of cells contained in the table.
    • columns required integer: Number of columns.
    • rows required integer: Number of rows.

DataTableCell

  • DataTableCell object: Information about the extracted cell in a table.
    • boundingBox required BoundingBox
    • columnIndex required integer: Column index of the cell.
    • columnSpan integer: Number of columns spanned by this cell.
    • confidence required Confidence
    • elements array: When includeTextDetails is set to true, a list of references to the text elements constituting this table cell.
    • isFooter boolean: Is the current cell a footer cell?
    • isHeader boolean: Is the current cell a header cell?
    • rowIndex required integer: Row index of the cell.
    • rowSpan integer: Number of rows spanned by this cell.
    • text required string: Text content of the cell.

DocumentResult

  • DocumentResult object: A set of extracted fields corresponding to the input document.
    • docType required string: Document type.
    • fields required object: Dictionary of named field values.
    • pageRange required array: First and last page number where the document is found.
      • items integer

ElementReference

  • ElementReference string: Reference to a line or word.

ErrorInformation

  • ErrorInformation object
    • code required string
    • message required string

ErrorResponse

FieldValue

  • FieldValue object: Recognized field value.
    • boundingBox BoundingBox
    • confidence Confidence
    • elements array: When includeTextDetails is set to true, a list of references to the text elements constituting this field.
    • page integer: The 1-based page number in the input document.
    • text string: Text content of the extracted field.
    • type required FieldValueType
    • valueArray array: Array of field values.
    • valueDate string: Date value.
    • valueInteger integer: Integer value.
    • valueNumber number: Floating point value.
    • valueObject object: Dictionary of named field values.
    • valuePhoneNumber string: Phone number value.
    • valueString string: String value.
    • valueTime string: Time value.

FieldValueType

  • FieldValueType string (values: string, date, time, phoneNumber, number, integer, array, object): Semantic data type of the field value.

FormFieldsReport

  • FormFieldsReport object: Report for a custom model training field.
    • accuracy required number: Estimated extraction accuracy for this field.
    • fieldName required string: Training field name.

KeyValueElement

  • KeyValueElement object: Information about the extracted key or value in a key-value pair.
    • boundingBox BoundingBox
    • elements array: When includeTextDetails is set to true, a list of references to the text elements constituting this key or value.
    • text required string: The text content of the key or value.

KeyValuePair

  • KeyValuePair object: Information about the extracted key-value pair.

KeysResult

  • KeysResult object: Keys extracted by the custom model.
    • clusters required object: Object mapping clusterIds to a list of keys.

Language

  • Language string (values: en, es): Language code

Model

ModelInfo

  • ModelInfo object: Basic custom model information.
    • createdDateTime required string: Date and time (UTC) when the model was created.
    • lastUpdatedDateTime required string: Date and time (UTC) when the status was last updated.
    • modelId required string: Model identifier.
    • status required string (values: creating, ready, invalid): Status of the model.

Models

  • Models object: Response to the list custom models operation.
    • modelList array: Collection of trained custom models.
    • nextLink string: Link to the next page of custom models.
    • summary object: Summary of all trained custom models.
      • count required integer: Current count of trained custom models.
      • lastUpdatedDateTime required string: Date and time (UTC) when the summary was last updated.
      • limit required integer: Max number of models that can be trained for this subscription.

OperationStatus

  • OperationStatus string (values: notStarted, running, succeeded, failed): Status of the queued operation.

PageResult

  • PageResult object: Extracted information from a single page.
    • clusterId integer: Cluster identifier.
    • keyValuePairs array: List of key-value pairs extracted from the page.
    • page required integer: Page number.
    • tables array: List of data tables extracted from the page.

ReadResult

  • ReadResult object: Text extracted from a page in the input document.
    • angle required number: The general orientation of the text in clockwise direction, measured in degrees between (-180, 180].
    • height required number: The height of the image/PDF in pixels/inches, respectively.
    • language Language
    • lines array: When includeTextDetails is set to true, a list of recognized text lines. The maximum number of lines returned is 300 per page. The lines are sorted top to bottom, left to right, although in certain cases proximity is treated with higher priority. As the sorting order depends on the detected text, it may change across images and OCR version updates. Thus, business logic should be built upon the actual line location instead of order.
    • page required integer: The 1-based page number in the input document.
    • unit required string (values: pixel, inch): The unit used by the width, height and boundingBox properties. For images, the unit is "pixel". For PDF, the unit is "inch".
    • width required number: The width of the image/PDF in pixels/inches, respectively.

SourceDataStream

  • SourceDataStream object: A PDF document, image (jpg/png/tiff), or JSON file to analyze.

SourcePath

  • SourcePath object: Uri or local path to source data.
    • source string: File source path.

TextLine

  • TextLine object: An object representing an extracted text line.
    • boundingBox required BoundingBox
    • language Language
    • text required string: The text content of the line.
    • words required array: List of words in the text line.

TextWord

  • TextWord object: An object representing a word.

TrainRequest

  • TrainRequest object: Request parameter to train a new custom model.
    • source required string: Source path containing the training documents.
    • sourceFilter TrainSourceFilter
    • useLabelFile boolean: Use label file for training a model.

TrainResult

  • TrainResult object: Custom model training result.
    • averageModelAccuracy number: Average accuracy.
    • errors array: Errors returned during the training operation.
    • fields array: List of fields used to train the model and the train operation error reported by each.
    • trainingDocuments required array: List of the documents used to train the model and any errors reported in each document.

TrainSourceFilter

  • TrainSourceFilter object: Filter to apply to the documents in the source path for training.
    • includeSubFolders boolean: A flag to indicate if sub folders within the set of prefix folders will also need to be included when searching for content to be preprocessed.
    • prefix string: A case-sensitive prefix string to filter documents in the source path for training. For example, when using a Azure storage blob Uri, use the prefix to restrict sub folders for training.

TrainingDocumentInfo

  • TrainingDocumentInfo object: Report for a custom model training document.
    • documentName required string: Training document name.
    • errors required array: List of errors.
    • pages required integer: Total number of pages trained.
    • status required string (values: succeeded, partiallySucceeded, failed): Status of the training operation.