@datafire/azure_cognitiveservices_formrecognizer v3.0.0
@datafire/azure_cognitiveservices_formrecognizer
Client library for Form Recognizer Client
Installation and Usage
npm install --save @datafire/azure_cognitiveservices_formrecognizer
let azure_cognitiveservices_formrecognizer = require('@datafire/azure_cognitiveservices_formrecognizer').create({
apim_key: ""
});
.then(data => {
console.log(data);
});
Description
Extracts information from forms and images into structured data.
Actions
GetCustomModels
Get information about all custom models
azure_cognitiveservices_formrecognizer.GetCustomModels({}, context)
Input
- input
object
- op
string
(values: full, summary): Specify whether to return summary or full list of models.
- op
Output
- output Models
TrainCustomModelAsync
Create and train a custom model. The request must include a source parameter that is either an externally accessible Azure storage blob container Uri (preferably a Shared Access Signature Uri) or valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be an absolute path rooted to the input mount configuration setting value e.g., if '{Mounts:Input}' configuration setting value is '/input' then a valid source path would be '/input/contosodataset'. All data to be trained is expected to be under the source folder or sub folders under it. Models are trained using documents that are of the following content type - 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff'. Other type of content is ignored.
azure_cognitiveservices_formrecognizer.TrainCustomModelAsync({
"trainRequest": {
"source": ""
}
}, context)
Input
- input
object
- trainRequest required TrainRequest
Output
Output schema unknown
DeleteCustomModel
Mark model for deletion. Model artifacts will be permanently removed within a predetermined period.
azure_cognitiveservices_formrecognizer.DeleteCustomModel({
"modelId": ""
}, context)
Input
- input
object
- modelId required
string
: Model identifier.
- modelId required
Output
Output schema unknown
GetCustomModel
Get detailed information about a custom model.
azure_cognitiveservices_formrecognizer.GetCustomModel({
"modelId": ""
}, context)
Input
- input
object
- modelId required
string
: Model identifier. - includeKeys
boolean
: Include list of extracted keys in model information.
- modelId required
Output
- output Model
AnalyzeWithCustomModel
Extract key-value pairs, tables, and semantic values from a given document. The input document must be of one of the supported content types - 'application/pdf', 'image/jpeg', 'image/png' or 'image/tiff'. Alternatively, use 'application/json' type to specify the location (Uri or local path) of the document to be analyzed.
azure_cognitiveservices_formrecognizer.AnalyzeWithCustomModel({
"modelId": ""
}, context)
Input
- input
object
- modelId required
string
: Model identifier. - includeTextDetails
boolean
: Include text lines and element references in the result. - fileStream SourceDataStream
- modelId required
Output
Output schema unknown
GetAnalyzeFormResult
Obtain current status and the result of the analyze form operation.
azure_cognitiveservices_formrecognizer.GetAnalyzeFormResult({
"modelId": "",
"resultId": ""
}, context)
Input
- input
object
- modelId required
string
: Model identifier. - resultId required
string
: Analyze operation result identifier.
- modelId required
Output
- output AnalyzeOperationResult
AnalyzeLayoutAsync
Extract text and layout information from a given document. The input document must be of one of the supported content types - 'application/pdf', 'image/jpeg', 'image/png' or 'image/tiff'. Alternatively, use 'application/json' type to specify the location (Uri or local path) of the document to be analyzed.
azure_cognitiveservices_formrecognizer.AnalyzeLayoutAsync({}, context)
Input
- input
object
- fileStream SourceDataStream
Output
Output schema unknown
GetAnalyzeLayoutResult
Track the progress and obtain the result of the analyze layout operation
azure_cognitiveservices_formrecognizer.GetAnalyzeLayoutResult({
"resultId": ""
}, context)
Input
- input
object
- resultId required
string
: Analyze operation result identifier.
- resultId required
Output
- output AnalyzeOperationResult
AnalyzeReceiptAsync
Extract field text and semantic values from a given receipt document. The input document must be of one of the supported content types - 'application/pdf', 'image/jpeg', 'image/png' or 'image/tiff'. Alternatively, use 'application/json' type to specify the location (Uri or local path) of the document to be analyzed.
azure_cognitiveservices_formrecognizer.AnalyzeReceiptAsync({}, context)
Input
- input
object
- includeTextDetails
boolean
: Include text lines and element references in the result. - fileStream SourceDataStream
- includeTextDetails
Output
Output schema unknown
GetAnalyzeReceiptResult
Track the progress and obtain the result of the analyze receipt operation.
azure_cognitiveservices_formrecognizer.GetAnalyzeReceiptResult({
"resultId": ""
}, context)
Input
- input
object
- resultId required
string
: Analyze operation result identifier.
- resultId required
Output
- output AnalyzeOperationResult
Definitions
AnalyzeOperationResult
- AnalyzeOperationResult
object
: Status and result of the queued analyze operation.- analyzeResult AnalyzeResult
- createdDateTime required
string
: Date and time (UTC) when the analyze operation was submitted. - lastUpdatedDateTime required
string
: Date and time (UTC) when the status was last updated. - status required OperationStatus
AnalyzeResult
- AnalyzeResult
object
: Analyze operation result.- documentResults
array
: Document-level information extracted from the input.- items DocumentResult
- errors
array
: List of errors reported during the analyze operation.- items ErrorInformation
- pageResults
array
: Page-level information extracted from the input.- items PageResult
- readResults required
array
: Text extracted from the input.- items ReadResult
- version required
string
: Version of schema used for this result.
- documentResults
BoundingBox
- BoundingBox
array
: Quadrangle bounding box, with coordinates specified relative to the top-left of the original image. The eight numbers represent the four points, clockwise from the top-left corner relative to the text orientation. For image, the (x, y) coordinates are measured in pixels. For PDF, the (x, y) coordinates are measured in inches.- items
number
- items
Confidence
- Confidence
number
: Confidence value.
DataTable
- DataTable
object
: Information about the extracted table contained in a page.- cells required
array
: List of cells contained in the table.- items DataTableCell
- columns required
integer
: Number of columns. - rows required
integer
: Number of rows.
- cells required
DataTableCell
- DataTableCell
object
: Information about the extracted cell in a table.- boundingBox required BoundingBox
- columnIndex required
integer
: Column index of the cell. - columnSpan
integer
: Number of columns spanned by this cell. - confidence required Confidence
- elements
array
: When includeTextDetails is set to true, a list of references to the text elements constituting this table cell.- items ElementReference
- isFooter
boolean
: Is the current cell a footer cell? - isHeader
boolean
: Is the current cell a header cell? - rowIndex required
integer
: Row index of the cell. - rowSpan
integer
: Number of rows spanned by this cell. - text required
string
: Text content of the cell.
DocumentResult
- DocumentResult
object
: A set of extracted fields corresponding to the input document.- docType required
string
: Document type. - fields required
object
: Dictionary of named field values. - pageRange required
array
: First and last page number where the document is found.- items
integer
- items
- docType required
ElementReference
- ElementReference
string
: Reference to a line or word.
ErrorInformation
- ErrorInformation
object
- code required
string
- message required
string
- code required
ErrorResponse
- ErrorResponse
object
- error required ErrorInformation
FieldValue
- FieldValue
object
: Recognized field value.- boundingBox BoundingBox
- confidence Confidence
- elements
array
: When includeTextDetails is set to true, a list of references to the text elements constituting this field.- items ElementReference
- page
integer
: The 1-based page number in the input document. - text
string
: Text content of the extracted field. - type required FieldValueType
- valueArray
array
: Array of field values.- items FieldValue
- valueDate
string
: Date value. - valueInteger
integer
: Integer value. - valueNumber
number
: Floating point value. - valueObject
object
: Dictionary of named field values. - valuePhoneNumber
string
: Phone number value. - valueString
string
: String value. - valueTime
string
: Time value.
FieldValueType
- FieldValueType
string
(values: string, date, time, phoneNumber, number, integer, array, object): Semantic data type of the field value.
FormFieldsReport
- FormFieldsReport
object
: Report for a custom model training field.- accuracy required
number
: Estimated extraction accuracy for this field. - fieldName required
string
: Training field name.
- accuracy required
KeyValueElement
- KeyValueElement
object
: Information about the extracted key or value in a key-value pair.- boundingBox BoundingBox
- elements
array
: When includeTextDetails is set to true, a list of references to the text elements constituting this key or value.- items ElementReference
- text required
string
: The text content of the key or value.
KeyValuePair
- KeyValuePair
object
: Information about the extracted key-value pair.- confidence required Confidence
- key required KeyValueElement
- label
string
: A user defined label for the key/value pair entry. - value required KeyValueElement
KeysResult
- KeysResult
object
: Keys extracted by the custom model.- clusters required
object
: Object mapping clusterIds to a list of keys.
- clusters required
Language
- Language
string
(values: en, es): Language code
Model
- Model
object
: Response to the get custom model operation.- keys KeysResult
- modelInfo required ModelInfo
- trainResult TrainResult
ModelInfo
- ModelInfo
object
: Basic custom model information.- createdDateTime required
string
: Date and time (UTC) when the model was created. - lastUpdatedDateTime required
string
: Date and time (UTC) when the status was last updated. - modelId required
string
: Model identifier. - status required
string
(values: creating, ready, invalid): Status of the model.
- createdDateTime required
Models
- Models
object
: Response to the list custom models operation.- modelList
array
: Collection of trained custom models.- items ModelInfo
- nextLink
string
: Link to the next page of custom models. - summary
object
: Summary of all trained custom models.- count required
integer
: Current count of trained custom models. - lastUpdatedDateTime required
string
: Date and time (UTC) when the summary was last updated. - limit required
integer
: Max number of models that can be trained for this subscription.
- count required
- modelList
OperationStatus
- OperationStatus
string
(values: notStarted, running, succeeded, failed): Status of the queued operation.
PageResult
- PageResult
object
: Extracted information from a single page.- clusterId
integer
: Cluster identifier. - keyValuePairs
array
: List of key-value pairs extracted from the page.- items KeyValuePair
- page required
integer
: Page number. - tables
array
: List of data tables extracted from the page.- items DataTable
- clusterId
ReadResult
- ReadResult
object
: Text extracted from a page in the input document.- angle required
number
: The general orientation of the text in clockwise direction, measured in degrees between (-180, 180]. - height required
number
: The height of the image/PDF in pixels/inches, respectively. - language Language
- lines
array
: When includeTextDetails is set to true, a list of recognized text lines. The maximum number of lines returned is 300 per page. The lines are sorted top to bottom, left to right, although in certain cases proximity is treated with higher priority. As the sorting order depends on the detected text, it may change across images and OCR version updates. Thus, business logic should be built upon the actual line location instead of order.- items TextLine
- page required
integer
: The 1-based page number in the input document. - unit required
string
(values: pixel, inch): The unit used by the width, height and boundingBox properties. For images, the unit is "pixel". For PDF, the unit is "inch". - width required
number
: The width of the image/PDF in pixels/inches, respectively.
- angle required
SourceDataStream
- SourceDataStream
object
: A PDF document, image (jpg/png/tiff), or JSON file to analyze.
SourcePath
- SourcePath
object
: Uri or local path to source data.- source
string
: File source path.
- source
TextLine
- TextLine
object
: An object representing an extracted text line.- boundingBox required BoundingBox
- language Language
- text required
string
: The text content of the line. - words required
array
: List of words in the text line.- items TextWord
TextWord
- TextWord
object
: An object representing a word.- boundingBox required BoundingBox
- confidence Confidence
- text required
string
: The text content of the word.
TrainRequest
- TrainRequest
object
: Request parameter to train a new custom model.- source required
string
: Source path containing the training documents. - sourceFilter TrainSourceFilter
- useLabelFile
boolean
: Use label file for training a model.
- source required
TrainResult
- TrainResult
object
: Custom model training result.- averageModelAccuracy
number
: Average accuracy. - errors
array
: Errors returned during the training operation.- items ErrorInformation
- fields
array
: List of fields used to train the model and the train operation error reported by each.- items FormFieldsReport
- trainingDocuments required
array
: List of the documents used to train the model and any errors reported in each document.- items TrainingDocumentInfo
- averageModelAccuracy
TrainSourceFilter
- TrainSourceFilter
object
: Filter to apply to the documents in the source path for training.- includeSubFolders
boolean
: A flag to indicate if sub folders within the set of prefix folders will also need to be included when searching for content to be preprocessed. - prefix
string
: A case-sensitive prefix string to filter documents in the source path for training. For example, when using a Azure storage blob Uri, use the prefix to restrict sub folders for training.
- includeSubFolders
TrainingDocumentInfo
- TrainingDocumentInfo
object
: Report for a custom model training document.- documentName required
string
: Training document name. - errors required
array
: List of errors.- items ErrorInformation
- pages required
integer
: Total number of pages trained. - status required
string
(values: succeeded, partiallySucceeded, failed): Status of the training operation.
- documentName required
5 years ago