@gmod/gtf v0.0.9
@gmod/gtf
GTF, or the General Transfer Format, is identical to GFF version 2. This module reads and writes GTF data and aims to be a complete implementation of the GTF specification.
- streaming parsing and streaming formatting
- creates transcript features with child_features
- only compatible with GTF
Note: For JBrowse, we generally encourage GFF3 over GTF.
For GFF3, check out the @gmod/gff-js package.
Install
$ npm install --save @gmod/gtf
Usage
import gtf from '@gmod/gtf'
// parse a file from a file name
gtf
  .parseFile('path/to/my/file.gtf', { parseAll: true })
  .on('data', data => {
    if (data.directive) {
      console.log('got a directive', data)
    } else if (data.comment) {
      console.log('got a comment', data)
    } else if (data.sequence) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  })
// parse a stream of GTF text
import fs from 'fs'
fs.createReadStream('path/to/my/file.gtf')
  .pipe(gtf.parseStream())
  .on('data', data => {
    console.log('got item', data)
  })
  .on('end', () => {
    console.log('done parsing!')
  })
// parse a string of GTF synchronously
const stringOfGTF = fs.readFileSync('my_annotations.gtf').toString()
const arrayOfThings = gtf.parseStringSync(stringOfGTF)
// format an array of items to a string
const formattedGTF = gtf.formatSync(arrayOfThings)
// format a stream of things to a stream of text.
// inserts sync marks automatically.
// note: this could create new gtf lines for transcript features
myStreamOfGTFObjects
  .pipe(gtf.formatStream())
  .pipe(fs.createWriteStream('my_new.gtf'))
// format a stream of things and write it to
// a gtf file. inserts sync marks
// note: this could create new gtf lines for transcript features
// formatFile takes the stream as its first argument and returns a promise
gtf.formatFile(myStreamOfGTFObjects, 'path/to/destination.gtf')
Object format
features
Because GTF cannot handle a three-level hierarchy (gene -> transcript -> exon), we parse GTF by creating transcript features with child features.
We do not create features from the gene_id. Values that are . in the GTF are
null in the output.
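The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: `groupByTranscript` is a hypothetical helper, and widening the parent to span all of its children is an assumption made for this example.

```javascript
// Hypothetical helper (not part of @gmod/gtf): nest flat feature objects
// under one transcript feature per transcript_id.
function groupByTranscript(features) {
  const transcripts = new Map()
  for (const feature of features) {
    const id = (feature.attributes.transcript_id || [])[0]
    if (id === undefined) continue // this sketch ignores lines without a transcript_id
    let transcript = transcripts.get(id)
    if (!transcript) {
      // Synthesize the parent from the first line seen for this id.
      transcript = {
        ...feature,
        featureType: 'transcript',
        attributes: { transcript_id: [id] },
        child_features: [],
        derived_features: [],
      }
      transcripts.set(id, transcript)
    }
    // An explicit transcript line is the parent itself, not a child.
    if (feature.featureType === 'transcript') continue
    // Assumption: the parent spans all of its children.
    transcript.start = Math.min(transcript.start, feature.start)
    transcript.end = Math.max(transcript.end, feature.end)
    transcript.child_features.push([
      { ...feature, child_features: [], derived_features: [] },
    ])
  }
  // Each top-level item is wrapped in its own array, as in the output format.
  return [...transcripts.values()].map(t => [t])
}
```

Calling it with a single parsed CDS feature yields one transcript wrapping one CDS child, the same nesting as the example that follows.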
ctgA bare_predicted CDS 10000 11500 . + 0 transcript_id "Apple1";
Note that this creates an additional transcript feature from the transcript_id when featureType is not 'transcript'. It then creates a child CDS feature from the line of GTF shown above.
[
  [
    {
      "seq_name": "ctgA",
      "source": "bare_predicted",
      "featureType": "transcript",
      "start": 10000,
      "end": 11500,
      "score": null,
      "strand": "+",
      "frame": "0",
      "attributes": { "transcript_id": [ "\"Apple1\"" ] },
      "child_features": [[
        {
          "seq_name": "ctgA",
          "source": "bare_predicted",
          "featureType": "CDS",
          "start": 10000,
          "end": 11500,
          "score": null,
          "strand": "+",
          "frame": "0",
          "attributes": { "transcript_id": [ "\"Apple1\"" ] },
          "child_features": [],
          "derived_features": []
        }
      ]],
      "derived_features": []
    }
  ]
]
directives, comments, sequences
parseDirective("##gtf\n")
// returns
{
  "directive": "gtf"
}
parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}
// These come from any embedded `##FASTA` section in the GTF file.
{
  "id": "ctgA",
  "description": "test contig",
  "sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
}
API
Table of Contents
- parseStream
- parseFile
- parseStringSync
- formatSync
- formatStream
- formatFile
parseStream
Parse a stream of text data into a stream of feature, directive, and comment objects.
Parameters
options Object optional options object (optional, default {})
options.encoding string text encoding of the input GTF. default 'utf8'
options.parseAll boolean default false. if true, will parse all items. overrides other flags
options.parseFeatures boolean default true
options.parseDirectives boolean default false
options.parseComments boolean default false
options.parseSequences boolean default true
options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000
Returns ReadableStream stream (in objectMode) of parsed items
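As a rough illustration of how the documented defaults and the parseAll override fit together, here is a minimal sketch. `resolveParseOptions` and `DEFAULT_PARSE_OPTIONS` are hypothetical names introduced for this example; the module's internal option handling may differ.

```javascript
// Defaults as documented in the parameter list above.
const DEFAULT_PARSE_OPTIONS = {
  encoding: 'utf8',
  parseAll: false,
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
  bufferSize: 1000,
}

// Hypothetical helper: merge user options over the defaults, then let
// parseAll switch on every item type regardless of the other flags.
function resolveParseOptions(options = {}) {
  const resolved = { ...DEFAULT_PARSE_OPTIONS, ...options }
  if (resolved.parseAll) {
    resolved.parseFeatures = true
    resolved.parseDirectives = true
    resolved.parseComments = true
    resolved.parseSequences = true
  }
  return resolved
}
```

With no arguments, comments and directives stay off; passing `{ parseAll: true }` turns everything on even if another flag was explicitly set to false.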
parseFile
Read and parse a GTF file from the filesystem.
Parameters
filename string the filename of the file to parse
options Object optional options object
options.encoding string the file's string encoding, defaults to 'utf8'
options.parseAll boolean default false. if true, will parse all items. overrides other flags
options.parseFeatures boolean default true
options.parseDirectives boolean default false
options.parseComments boolean default false
options.parseSequences boolean default true
options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000
Returns ReadableStream stream (in objectMode) of parsed items
parseStringSync
Synchronously parse a string containing GTF and return an array of the parsed items.
Parameters
Returns Array array of parsed features, directives, and/or comments
formatSync
Format an array of GTF items (features, directives, comments) into a string of GTF. Does not insert synchronization (###) marks. Does not insert a directive if one is not already there.
Parameters
items
Returns String the formatted GTF
formatStream
Format a stream of items (of the type produced by this script) into a stream of GTF text.
Inserts synchronization (###) marks automatically.
Parameters
options Object
formatFile
Format a stream of items (of the type produced by this script) into a GTF file and write it to the filesystem.
Inserts synchronization (###) marks and a ##gtf directive automatically (if one is not already present).
Parameters
stream ReadableStream the stream to write to the file
filename String the file path to write to
options Object (optional, default {})
Returns Promise promise for the written filename
util
Table of Contents
- util
- unescape
- _escape
- escapeColumn
- parseAttributes
- parseFeature
- parseDirective
- formatAttributes
- formatFeature
- formatDirective
- formatComment
- formatSequence
- formatItem
util
unescape
Unescape a string/text value used in a GTF attribute. Textual attributes should be surrounded by double quotes. Source info: https://mblab.wustl.edu/GTF22.html and https://en.wikipedia.org/wiki/Gene_transfer_format
Parameters
s String
Returns String
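A minimal sketch of what unescaping could look like, assuming values are percent-encoded (%XX) in the style of GFF3. That encoding is an assumption for this example, and `unescapeValue` is a hypothetical name rather than the module's own function.

```javascript
// Hypothetical sketch: decode %XX sequences back into single characters.
// Assumes GFF3-style percent-encoding; null-like input is passed through as null.
function unescapeValue(s) {
  if (s === null || s === undefined) return null
  return String(s).replace(/%([0-9A-Fa-f]{2})/g, (_match, hex) =>
    String.fromCharCode(parseInt(hex, 16)),
  )
}
```

Strings without any %XX sequence come back unchanged.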
_escape
Escape a value for use in a GTF attribute value.
Parameters
regex
s String
Returns String
escapeColumn
Escape a value for use in a GTF column value.
Parameters
s String
Returns String
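The two escaping helpers above share one idea: replace every character matched by a regular expression with an escaped form. A hedged sketch follows, again assuming GFF3-style %XX encoding. The function names and the exact character classes are assumptions for illustration, not the module's definitions.

```javascript
// Hypothetical sketch: percent-encode every character matched by `regex`.
function escapeWith(regex, s) {
  return String(s).replace(regex, ch => {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
    return `%${hex}`
  })
}

// Assumed character classes: attribute values also escape the separators
// that are meaningful inside the 9th column; plain columns only escape
// tabs, newlines, percent signs, and control characters.
const escapeAttributeValue = s =>
  escapeWith(/[\n;\r\t=%&,\x00-\x1f\x7f-\xff]/g, s)
const escapeColumnValue = s => escapeWith(/[\n\r\t%\x00-\x1f\x7f-\xff]/g, s)
```

Escaping the percent sign itself keeps the transformation reversible by a matching unescape step.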
parseAttributes
Parse the 9th column (attributes) of a GTF feature line.
Parameters
attrString String
Returns Object
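To illustrate the shape of the result, here is a simplified sketch of attribute parsing. It keeps the surrounding double quotes on values, matching the `"\"Apple1\""` value shown in the object format example. `parseAttributesSketch` is a hypothetical name, and splitting on every semicolon is a simplification that would break on quoted values containing `;`.

```javascript
// Hypothetical sketch: turn 'tag "value"; tag2 "value2";' into
// { tag: ['"value"'], tag2: ['"value2"'] }.
// Simplification: splits on every ';', so quoted semicolons are not handled.
function parseAttributesSketch(attrString) {
  const attrs = {}
  if (!attrString || attrString === '.') return attrs
  for (const part of attrString.split(';')) {
    const trimmed = part.trim()
    if (!trimmed) continue
    const spaceIndex = trimmed.indexOf(' ')
    const tag = spaceIndex === -1 ? trimmed : trimmed.slice(0, spaceIndex)
    const value = spaceIndex === -1 ? '' : trimmed.slice(spaceIndex + 1).trim()
    if (!attrs[tag]) attrs[tag] = []
    if (value) attrs[tag].push(value) // repeated tags accumulate values
  }
  return attrs
}
```

Each tag maps to an array so that a tag appearing more than once on a line keeps all of its values.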
parseFeature
Parse a GTF feature line.
Parameters
line String
Returns the parsed line in an object
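A minimal sketch of feature-line parsing, using the field names from the object format section. `parseFeatureSketch` is a hypothetical name, and leaving the 9th column as raw text unless a parser is passed in is a choice made to keep the example self-contained. It assumes tab-separated columns.

```javascript
// Hypothetical sketch: split a tab-separated GTF line into the nine
// documented columns, mapping "." (and missing columns) to null.
function parseFeatureSketch(line, parseAttrs = raw => raw) {
  const columns = line.replace(/\r?\n$/, '').split('\t')
  const col = i =>
    columns[i] === undefined || columns[i] === '.' ? null : columns[i]
  const toInt = v => (v === null ? null : parseInt(v, 10))
  const toFloat = v => (v === null ? null : parseFloat(v))
  return {
    seq_name: col(0),
    source: col(1),
    featureType: col(2),
    start: toInt(col(3)),
    end: toInt(col(4)),
    score: toFloat(col(5)),
    strand: col(6),
    frame: col(7), // kept as a string, as in the example output
    // Raw column text by default; pass an attribute parser to get an object.
    attributes: col(8) === null ? null : parseAttrs(col(8)),
  }
}
```

For the example line from the object format section, this gives numeric start and end, a null score, and frame as the string "0".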
parseDirective
Parse a GTF directive/comment line.
Parameters
line String
Returns Object the information in the directive
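A rough sketch consistent with the `##gtf` example shown earlier. `parseDirectiveSketch` is a hypothetical name, and the `value` key for trailing text and the `null` return for non-directive lines are assumptions, not documented behaviour.

```javascript
// Hypothetical sketch: "##name rest of line" -> { directive, value? }.
// Returns null for lines that do not start with "##" (an assumption).
function parseDirectiveSketch(line) {
  const match = /^##\s*(\S+)\s*(.*)$/.exec(line.trim())
  if (!match) return null
  const [, name, value] = match
  const parsed = { directive: name }
  if (value.length) parsed.value = value // assumed key for trailing text
  return parsed
}
```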
formatAttributes
Format an attributes object into a string suitable for the 9th column of GTF.
Parameters
attrs Object
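The reverse direction can be sketched as below, assuming attribute values are stored already quoted, as in the parsed example (`"\"Apple1\""`). `formatAttributesSketch` is a hypothetical name, and emitting `.` for an empty attribute set is an assumption.

```javascript
// Hypothetical sketch: { tag: ['"v1"', '"v2"'] } -> 'tag "v1"; tag "v2";'
// Values are written as-is, so they are expected to carry their own quotes.
function formatAttributesSketch(attrs) {
  if (!attrs) return '.'
  const parts = []
  for (const [tag, values] of Object.entries(attrs)) {
    // Accept either a single value or an array of values.
    for (const value of [].concat(values)) {
      parts.push(`${tag} ${value};`)
    }
  }
  return parts.length ? parts.join(' ') : '.'
}
```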
formatFeature
Format a feature object or array of feature objects into one or more lines of GTF.
Parameters
featureOrFeatures
formatDirective
Format a directive into a line of GTF.
Parameters
directive Object
Returns String
formatComment
Format a comment into a GTF comment. Yes I know this is just adding a # and a newline.
Parameters
comment Object
Returns String
formatSequence
Format a sequence object as FASTA
Parameters
seq Object
Returns String formatted single FASTA sequence
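For the sequence objects shown earlier (`id`, `description`, `sequence`), FASTA formatting can be sketched as follows. `formatSequenceSketch` is a hypothetical name, and the 60-character line wrap is an assumption; the module may wrap differently or not at all.

```javascript
// Hypothetical sketch: one FASTA record with a ">id description" header
// and the sequence wrapped at `width` characters per line (assumed 60).
function formatSequenceSketch(seq, width = 60) {
  const header = seq.description
    ? `>${seq.id} ${seq.description}`
    : `>${seq.id}`
  const lines = seq.sequence.match(new RegExp(`.{1,${width}}`, 'g')) || []
  return [header, ...lines].join('\n') + '\n'
}
```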
formatItem
Format a directive, comment, or feature, or array of such items, into one or more lines of GTF.
Parameters
Notes and resources
- This is an adaptation of the JBrowse GTF parser
- GTF docs
License
MIT © Robert Buels