seq-tools v0.1.1
Table of Contents
- formats
- alignment
- aligner
- basics
- graph
- mapping
- random
- range
- sequence
- splice
- streams
formats
The formats collection of modules contains a variety of more specific types (see members for types). These should draw heavily on general objects and extend them whenever possible.
compression
The compression subset of formats contains compression formats
bgzf
classes for accessing bgzf compression
BGZFBlock
Give on-demand block data and stats and take either unzipped or zipped data as an input
Parameters
options
Object
archive
getter
Returns Buffer archive - bgzf compressed data
data
getter
Returns Buffer data - uncompressed data
ToBGZFBlocks
Extends Transform
Stream data into BGZF Blocks for writing data, objects emitted have the property 'archive' that contains a Buffer of compressed data
Parameters
options
Object
BGZFBlockCache
buffer data, and allow emitting entire gzip blocks at a time
Parameters
options
Object
add
method to put more data in the buffer
Parameters
indata
Buffer
remove
remove data from the buffer
Returns Buffer data_chunk - returns false if not enough data is ready
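The add/remove pattern described above can be sketched as follows. This is only an illustration of the idea, not the library's implementation: the class name ChunkCache and its chunkSize argument are hypothetical, and a real BGZF cache would size its chunks from the block format rather than a fixed number.

```javascript
// Hypothetical sketch of an add/remove buffer cache; not seq-tools code.
class ChunkCache {
  constructor(chunkSize) {
    this.chunkSize = chunkSize;       // assumed fixed size of each emitted chunk
    this.buffer = Buffer.alloc(0);
  }
  // Append more data to the internal buffer.
  add(indata) {
    this.buffer = Buffer.concat([this.buffer, indata]);
  }
  // Return one full chunk, or false if not enough data is buffered yet.
  remove() {
    if (this.buffer.length < this.chunkSize) return false;
    const chunk = this.buffer.subarray(0, this.chunkSize);
    this.buffer = this.buffer.subarray(this.chunkSize);
    return chunk;
  }
}

const cache = new ChunkCache(4);
cache.add(Buffer.from('abc'));
console.log(cache.remove());            // false
cache.add(Buffer.from('defgh'));
console.log(cache.remove().toString()); // abcd
```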
BGZFDecompress
Extends Transform
stream compressed data in and uncompressed data out
Parameters
options
Object
gunzip_block
PRE: Takes a data block that begins with a gzip member and may contain extra data. POST: Returns the uncompressed data and the extra data separately.
Parameters
indata
Buffer compressed data
Returns Object outdata - uncompressed data in an Object {data:Buffer,remainder:Buffer}
BGZFCompress
Extends Transform
stream uncompressed data in and compressed data out
Parameters
options
Object
gzip_block
PRE: Takes the uncompressed data, which is not checked for maximum length because it must already fit before it is passed in. POST: Outputs a bgzf zipped block.
Parameters
Returns Buffer outdata - compressed data
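The PRE/POST contracts of gunzip_block and gzip_block can be illustrated with Node's built-in zlib. This is a rough sketch under stated assumptions, not the library's code: the function names bgzfBlock and gunzipFirstBlock are hypothetical, and the caller is assumed to keep each input small enough that the compressed block stays within the 64 KiB BGZF limit.

```javascript
const zlib = require('zlib');

// Wrap data in one BGZF block: a gzip member whose extra field ("BC"
// subfield) stores BSIZE, the total block length minus one.
function bgzfBlock(data) {
  const gz = zlib.gzipSync(data);            // plain gzip member, 10-byte header
  const header = Buffer.from(gz.subarray(0, 10));
  header[3] = 4;                             // FLG: set FEXTRA
  const extra = Buffer.alloc(8);
  extra.writeUInt16LE(6, 0);                 // XLEN
  extra.write('BC', 2, 'latin1');            // SI1, SI2
  extra.writeUInt16LE(2, 4);                 // SLEN
  extra.writeUInt16LE(gz.length + 8 - 1, 6); // BSIZE = block length - 1
  return Buffer.concat([header, extra, gz.subarray(10)]);
}

// Split the first BGZF block off the front of a buffer and inflate it.
function gunzipFirstBlock(indata) {
  const blockLength = indata.readUInt16LE(16) + 1; // BSIZE + 1
  return {
    data: zlib.gunzipSync(indata.subarray(0, blockLength)),
    remainder: indata.subarray(blockLength),
  };
}

const two = Buffer.concat([bgzfBlock(Buffer.from('ACGT')), bgzfBlock(Buffer.from('TTAA'))]);
const first = gunzipFirstBlock(two);
console.log(first.data.toString());                             // ACGT
console.log(gunzipFirstBlock(first.remainder).data.toString()); // TTAA
```

Reading BSIZE from the header is what lets the block be separated from the remainder without inflating anything past the block boundary.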
BGZFDecompressionCache
Cache class for decompressing BGZF files
end
trigger the flag that the data has ended
has_data
getter
Returns bool has_data - true if there is data in there
ready
getter
Returns bool ready - data is ready to be read
write
write data to the buffer
Parameters
indata
Buffer data is added to the cache
read
read / decompress data from the cache, one block at a time. Call read repeatedly to read everything when there is a lot of data
Returns Buffer outdata - data is decompressed from one block
alignment
The alignment subset of formats contains specific types of alignments
bam
Extends Transform
classes for accessing BAM files
DecompressedToBAMObj
Extends Transform
Take decompressed data and transform it to a bam object stream
Parameters
options
Object Transform options are passed to Transform parent
BAMObjToDecompressed
Extends Transform
To facilitate writing, convert bam objects into an uncompressed (pre bgzf compression) stream
Parameters
options
Object passed to Transform class
BAMInputStream
Extends Transform
Given input stream, output header and BAM objects
Parameters
options
Object passed on to parent Transform
BAMHeader
Extends formats.alignment.sam.SAMHeader
A BAM Header
Parameters
add_ref
Add a reference sequence name and corresponding length
Parameters
n_ref
Getter for the number of reference sequences
Returns Number number of reference sequences
n_ref
Setter for number of reference sequences present
Parameters
n_ref
Number as Int32LE
indata
magic
Setter for the magic number. More of a sanity check, because we know what it should be, and this will throw an error if it is incorrect.
Parameters
magic
Number as UInt32LE number (Not Buffer)
indata
text
Setter for text data of a header
Parameters
header
String text
indata
set_from_sam_header2
This may be deprecated
Parameters
n_ref
Number as Int32LE
intext
bam_data
Getter for the bam_data. return a buffer of BAM bytes. The start of the BAM file but not yet bgzf compressed.
Returns Buffer bam header data
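As an illustration of what the bam_data bytes of a header contain, here is a minimal sketch that follows the header layout in the SAM/BAM specification (magic, text length, text, n_ref, then a name and length per reference). The helper name bamHeaderBytes and the { name, length } reference objects are hypothetical and are not part of seq-tools.

```javascript
// Hypothetical helper: serialize a BAM header (before BGZF compression).
// refs is an assumed array of { name, length } objects.
function bamHeaderBytes(text, refs) {
  const int32 = (n) => {
    const b = Buffer.alloc(4);
    b.writeInt32LE(n, 0);
    return b;
  };
  const textBuf = Buffer.from(text, 'latin1');
  const parts = [
    Buffer.from('BAM\u0001', 'latin1'), // magic
    int32(textBuf.length),              // l_text
    textBuf,                            // header text
    int32(refs.length),                 // n_ref
  ];
  for (const ref of refs) {
    const name = Buffer.from(ref.name + '\0', 'latin1'); // NUL-terminated name
    parts.push(int32(name.length), name, int32(ref.length));
  }
  return Buffer.concat(parts);
}

const header = bamHeaderBytes('@HD\tVN:1.6\n', [{ name: 'chr1', length: 1000 }]);
console.log(header.readUInt32LE(0)); // 21840194, the magic read as a UInt32LE
```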
BAM
Extends sam.SAM
BAM is a child of SAM; pretty much every getter of a SAM should get overridden
Parameters
options
Object
options.bam_data
Buffer required uncompressed bam data
options.header
BAMHeader required BAMHeader object
bam_data
Getter for the bam_data. return a buffer of BAM bytes. The start of the BAM file but not yet bgzf compressed. Easy since it was required to create the BAM object in the first place.
Returns Buffer bam entry data
refName
this one is always cached in case we are working as a SAM
Returns String rname
next_refName
getter for the next_refName
Returns String next_rname
block_size
A BAM specific property
Returns Number block_size in bytes
refID
A BAM specific property the refID is the index in the header
Returns Number refID index
pos
The position in the reference sequence of the first matching base
Returns Number pos
name_l
Getter for property of the length of the query name
Returns Number name_l
mapq
Getter for MAPQ
Returns Number MAPQ
bin
Getter for bin
Returns Number bin
flag
Getter for flag
Returns Number flag
n_cigar_op
Getter for number of cigar Ops
Returns Number cigar op count
l_seq
Getter for the sequence length
Returns Number query sequence length
next_refID
Getter for the next_refID index into header reference sequences
Returns Number next_refID
next_pos
Getter for the next_pos
Returns Number next_pos
tlen
Getter for the target sequence length
Returns Number target sequence length
read_name
Getter for the query name. gets buffered upon reading
Returns String query name
cigar
Getter for the cigar, gets cached on read
Returns CIGAR cigar object
seq
Getter for seq, gets cached on read
Returns BAMSeq BAMSeq object
qual
Getter for quality
Returns String Quality as a string
auxillary
Getter for auxiliary data
Returns BAMAuxillary Auxiliary information object
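Most of the getters above correspond to fixed-offset fields at the start of an uncompressed BAM record. The sketch below decodes them using the offsets given in the SAM/BAM specification. The function name parseBamCore is hypothetical, and how seq-tools itself reads or caches these values is not shown here.

```javascript
// Hypothetical sketch: decode the fixed-length fields at the start of one
// uncompressed BAM record, using the layout from the SAM/BAM specification.
function parseBamCore(buf) {
  const name_l = buf.readUInt8(12);
  return {
    block_size: buf.readInt32LE(0),   // excludes these 4 bytes
    refID: buf.readInt32LE(4),
    pos: buf.readInt32LE(8),          // stored 0-based in the BAM encoding
    name_l,
    mapq: buf.readUInt8(13),
    bin: buf.readUInt16LE(14),
    n_cigar_op: buf.readUInt16LE(16),
    flag: buf.readUInt16LE(18),
    l_seq: buf.readInt32LE(20),
    next_refID: buf.readInt32LE(24),
    next_pos: buf.readInt32LE(28),
    tlen: buf.readInt32LE(32),
    // read_name is NUL-terminated; name_l counts the terminator
    read_name: buf.toString('latin1', 36, 36 + name_l - 1),
  };
}

const rec = Buffer.alloc(39);
rec.writeInt32LE(35, 0);        // block_size
rec.writeInt32LE(99, 8);        // pos
rec.writeUInt8(3, 12);          // name_l: "r1" plus NUL
rec.writeUInt16LE(16, 18);      // flag
rec.write('r1', 36, 'latin1');
console.log(parseBamCore(rec).read_name); // r1
```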
BAMAuxillary
BAMAuxillary provides a way to access the auxiliary data inside a BAM
load_data
load data into the BAMAuxillary object
Parameters
indata
Buffer
tags
getter for tag
Returns Array tag information in list [tag,valuetype,val,....]
toString
Get the string representation of the auxiliary data
Returns String auxillary_data
BAMSeq
BAMSeq is an object to access the sequence data in the bam
load_data
load data into the BAMSeq object
Parameters
toString
Return the string representation of the sequence
Returns String sequence
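For context, BAM stores the sequence as 4-bit codes, two bases per byte. A minimal decoding sketch follows, using the code table from the SAM/BAM specification. The function name decodeBamSeq is hypothetical and this is not necessarily how BAMSeq is implemented.

```javascript
// Hypothetical sketch: decode BAM's 4-bit packed bases into a string.
const BAM_BASES = '=ACMGRSVTWYHKDBN';

function decodeBamSeq(buf, l_seq) {
  let seq = '';
  for (let i = 0; i < l_seq; i++) {
    const byte = buf[i >> 1];
    // even positions use the high nibble, odd positions the low nibble
    seq += BAM_BASES[i % 2 === 0 ? byte >> 4 : byte & 0x0f];
  }
  return seq;
}

console.log(decodeBamSeq(Buffer.from([0x12, 0x48, 0xf0]), 5)); // ACGTN
```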
CIGAR
CIGAR is an object to access the cigar string in the bam
toString
Return the string representation of the CIGAR
Returns String cigar string
load_data
load data into the CIGAR object
Parameters
indata
Buffer
BamDataReader
A class to read data from a BAM. It is a helper class for other streamers
Parameters
header
BAMHeader (can be left unset or undefined)
add
add an arbitrary amount of data to the data buffer
Parameters
indata
Buffer
remove
remove available data from the data buffer, which could be a header or an entry
Returns Object output
Returns Buffer output.data - a bam entry as a Buffer
Returns BAMHeader output.header - a bam header
sam
Extends GenericAlignment
classes for accessing SAM files
SAM
Extends alignment.GenericAlignment
SAM is a version of a GenericAlignment
Parameters
qseq
override query sequence getter for generic access
Returns String qseq - query sequence
rseq
Get or Set the reference sequence
Returns Sequence rseq - reference sequence
qname
Return the query name
Returns String qname
direction
provide the direction
Returns Char direction - the strand + or -
to_query_map
get the mapping on the query
Parameters
options
Object can pass options to AlignmentDerivedMapping
Returns AlignmentDerivedMapping mapping
to_reference_map
get the mapping on the reference
Parameters
options
Object can pass options to AlignmentDerivedMapping
Returns AlignmentDerivedMapping mapping
sam_line
set/get the sam line
Parameters
intext
String provide a sam line
Returns String sam_line
refName
getter
Returns String refName - the name of the reference sequence
next_refName
getter
Returns String next_refName - the name of the next reference sequence
pos
getter
Returns Number pos - the 1-based index where first aligned base is
mapq
getter
Returns Number mapq - the number representation of the mapping quality
flag
getter
Returns Number flag - the number representation of the map
next_pos
getter
Returns Number next_pos - the 1-based index of the next position
tlen
getter
Returns Number tlen - the length of the target (reference) sequence
read_name
getter
Returns String read_name - the query name
cigar
getter - cached if called
Returns CIGAR cigar - return a cigar object
seq
getter
Returns String seq - return a seq string representation
qual
getter
Returns String qual - return a qual string representation
auxillary
getter
Returns SAMAuxillary auxillary - return an object for accessing auxiliary data
bam_data
getter - a very difficult function that constructs the bam data from a buffer (prior to bgzf compression)
Returns Buffer bam_data - return byte-wise buffer data
SAMAuxillary
SAMAuxillary provides access to the auxiliary tags and data in the SAM file
Parameters
aux
Array An array of auxiliary tags/information
tags
getter - get an array of extra data. The data (3rd) field of each entry is a Number if it is numerical.
Returns Array tags - return an array of data [tag1,type1,data1,...]
bam_data
getter - get uncompressed bytes representing the auxiliary info
Returns Buffer bam_data
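The flat [tag1,type1,data1,...] layout returned by tags can be illustrated with a small parser for SAM optional fields of the form TAG:TYPE:VALUE. The function name parseAuxTags is hypothetical, and treating only the i and f types as numeric is an assumption about what "numerical" covers.

```javascript
// Hypothetical sketch: flatten SAM optional fields such as "NM:i:2" into
// [tag1, type1, data1, ...], converting numeric types to Number.
function parseAuxTags(aux) {
  const tags = [];
  for (const field of aux) {
    const tag = field.slice(0, 2);
    const type = field[3];
    const raw = field.slice(5);   // the value may itself contain ':'
    const value = (type === 'i' || type === 'f') ? Number(raw) : raw;
    tags.push(tag, type, value);
  }
  return tags;
}

console.log(JSON.stringify(parseAuxTags(['NM:i:2', 'RG:Z:grp:1'])));
// ["NM","i",2,"RG","Z","grp:1"]
```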
CIGAR
Class to describe a CIGAR sequence
Parameters
cigar_string
String input the cigar string as a string type
ops
getter - get cigar ops
Returns Array ops - Ops like [op1,length1,op2,length2,...]
bam_data
getter - get uncompressed bytes representing the CIGAR
Returns Buffer bam_data
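To make the ops and bam_data getters concrete, here is a sketch that parses a CIGAR string into the [op1,length1,...] layout and packs it the way the SAM/BAM specification describes (one UInt32LE per operation, length in the upper 28 bits and the op code in the lower 4). The function names parseCigar and cigarToBam are hypothetical.

```javascript
// Hypothetical sketch: parse a CIGAR string and pack it the way BAM does.
const CIGAR_OPS = 'MIDNSHP=X';

// '3M1I2M' -> ['M', 3, 'I', 1, 'M', 2]
function parseCigar(cigarString) {
  const ops = [];
  for (const [, len, op] of cigarString.matchAll(/(\d+)([MIDNSHP=X])/g)) {
    ops.push(op, Number(len));
  }
  return ops;
}

// Each operation becomes one UInt32LE: (length << 4) | opCode.
function cigarToBam(ops) {
  const buf = Buffer.alloc((ops.length / 2) * 4);
  for (let i = 0; i < ops.length; i += 2) {
    const packed = ops[i + 1] * 16 + CIGAR_OPS.indexOf(ops[i]);
    buf.writeUInt32LE(packed, i * 2); // op k lives at byte offset 4k
  }
  return buf;
}

const ops = parseCigar('3M1I2M');
console.log(JSON.stringify(ops));             // ["M",3,"I",1,"M",2]
console.log(cigarToBam(ops).readUInt32LE(0)); // 48
```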
DataToSAMObj
Extends Transform
Class to read in SAM objects from a data stream - will emit data objects in the form of an Object with either the header property which contains the header object, or the sam_line property with the SAM line string
Parameters
options
Object options for the transform object
SAMHeader
Class to describe a SAM header
Parameters
n_ref
getter - the number of reference sequences
Returns Number n_ref
bam_data
getter - the uncompressed bam data representation of the header
Returns Buffer bam_data
alignment
This module has the most general classes for defining an alignment. These should be extended whenever possible by more specific alignment formats under formats.alignment.*
GenericAlignment
Parameters
options
Object
options.min_intron
Number? The smallest size gap to consider an intron. (optional, default 68)
min_intron
Getter and setter for minimum gap in reference to be considered an intron
Parameters
min_intron
Number
Returns Number min_intron
ref
Set the reference dictionary
Parameters
refDict
Object Dictionary keys are chromosome names, values are sequences
cigar
A getter to retrieve the CIGAR string
Returns String CIGAR - is a getter for the cigar string
aligned_length
A getter for the aligned length of the sequence
Returns Number length
psl_line
A getter for the aligned length of the sequence. Requires qseq and rseq or cigar and tlen.
Returns Number length
sam_line
A getter for the SAM format line. Requires qname and rname.
Returns String sam_line
pretty_print
Make a pretty print of the alignment. Requires qseq and rseq.
Parameters
linelength
Number? how long to make the pretty print line (optional, default 50)
Returns String a pretty printing of the alignment
to_query_map
Convert the alignment into a mapping along the query
Parameters
options
Object? not necessary to be set to call (optional, default {})
Returns Object Returns an AlignmentDerivedMapping object
to_reference_map
Convert the alignment into a mapping along the reference
Parameters
options
Object? not necessary to be set to call (optional, default {})
Returns Object Returns an AlignmentDerivedMapping object
aligner
This module contains classes for DOING alignments. See alignment for the general definition of an alignment.
private
Members are not exported
SmithWatermanResults
A class for generating results from a Smith-Waterman aligner
Parameters
H_matrix_positve_strand
Object
sequence_1
Object
sequence_2
Object
H_matrix_negative_strand
Object
sequence_1_reverse_complement
Object
get_entry
Get the alignment stored at the input index
Parameters
index
Number
Returns Object SmithWatermanAlignment
SmithWatermanAlignment
Extends GenericAlignment
A single alignment from among Smith-Waterman aligner results. This is not created by the user. It is created by SmithWatermanResults
Parameters
options
Object These must be set when generating the alignment
qual
getter, but Quality is not set and not available
Returns undefined undefined
qname
getter for name of Query
Returns String qname
rname
getter for name of Reference
Returns String rname
tlen
getter for length of the reference (target) sequence
Returns Number tlen
qseq
getter for query sequence
Returns Object Sequence
rseq
getter for reference sequence
Returns Object Sequence
direction
getter for direction
Returns String direction - Strand +/-
SmithWatermanAligner
A class for performing a local alignment
Parameters
options
Object
align
Execute the alignment
Parameters
Returns Object SmithWatermanResults - returns an object for executing and retrieving Smith-Waterman alignments
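For readers unfamiliar with the H matrix that these classes refer to, the sketch below fills a Smith-Waterman scoring matrix with a linear gap penalty and returns the best local score. It is a textbook illustration, not the aligner in this module: the function name smithWatermanScore and the scoring values (match 2, mismatch -1, gap -2) are assumptions, and no traceback or reverse-strand handling is shown.

```javascript
// Hypothetical sketch: fill a Smith-Waterman H matrix with a linear gap
// penalty and return the best local score. Scoring values are assumptions.
function smithWatermanScore(a, b, match = 2, mismatch = -1, gap = -2) {
  const H = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  let best = 0;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const diag = H[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch);
      // a local alignment may restart anywhere, so scores never drop below 0
      H[i][j] = Math.max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap);
      best = Math.max(best, H[i][j]);
    }
  }
  return best;
}

console.log(smithWatermanScore('ACGT', 'TTACGTTT')); // 8
```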
basics
basic objects for very general use
Matrix
Parameters
dim
Get the dimensions of the matrix
Returns Object dimensions object with m and n properties
zero
Set all elements of the matrix to zero
toString
Get a string with what the matrix looks like
Returns String output value
graph
module with classes for describing graphs
private
classes for describing graphs
GenericGraph
Generic graph should be overridden
Parameters
node_count
getter for node_count
Returns Number number of nodes
edge_count
getter for edge_count
Returns Number number of edges
add_node
add a node to the graph. It cannot be one that has already been added
Parameters
input_node
Object
id
getter for the unique id for this graph
Returns Object id - the uuid4 that was set for this object
id
getter for the unique id for this node
Returns Object id - the uuid4 that was set for this object
id
getter for id
Returns Object uuid
nodes
getter for a list of the nodes in this graph
Returns Array<Object> nodes - list of nodes
nodes
getter for dictionary of nodes
Returns Object node_id_dictionary
split_unconnected
get a list of graphs that are not connected that are subsets of the original
Returns Array<Object> graphs - list of unconnected graphs
get_edges_by_node
get a list of edges that are associated with a node
Parameters
innode
Object input a node
Returns Array<Object> edges - list of edges
get_connected_nodes
get a list of nodes that are connected to a node. this can be called recursively
Parameters
Returns Array<Object> edges - list of nodes
name
getter for the name
Returns String name
payload
getter for the payload
Returns Object payload
payload
getter for the payload
Returns Object payload
node1
getter for node1
Returns Object node1
node2
getter for node2
Returns Object node2
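The idea behind split_unconnected and get_connected_nodes, grouping nodes into connected components, can be sketched on plain data as follows. The function name splitUnconnected and the representation of nodes as ids and edges as [id1, id2] pairs are assumptions made for the example. The library works on its own Node and Edge objects.

```javascript
// Hypothetical sketch: group node ids into connected components, treating
// every edge as undirected. Edges are assumed to be [id1, id2] pairs.
function splitUnconnected(nodeIds, edges) {
  const neighbors = new Map(nodeIds.map((id) => [id, []]));
  for (const [a, b] of edges) {
    neighbors.get(a).push(b);
    neighbors.get(b).push(a);
  }
  const seen = new Set();
  const components = [];
  for (const start of nodeIds) {
    if (seen.has(start)) continue;
    const component = [];
    const stack = [start];        // iterative depth-first traversal
    seen.add(start);
    while (stack.length > 0) {
      const id = stack.pop();
      component.push(id);
      for (const next of neighbors.get(id)) {
        if (!seen.has(next)) {
          seen.add(next);
          stack.push(next);
        }
      }
    }
    components.push(component);
  }
  return components;
}

console.log(JSON.stringify(splitUnconnected(['a', 'b', 'c', 'd'], [['a', 'b'], ['c', 'd']])));
// [["a","b"],["c","d"]]
```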
DirectedGraph
Extends GenericGraph
Class for a directed graph
Parameters
options
Object is also passed to GenericGraph constructor
add_edge
Extends graph.DirectedGraph
Add an edge to the graph
Parameters
input_edge
Object
UndirectedGraph
Extends GenericGraph
Class for an undirected graph
Parameters
options
Object is also passed to GenericGraph constructor
add_edge
Add an edge to the graph
Parameters
input_edge
Object
Node
Class for a node on a graph
Parameters
options
Object
Edge
Class for an edge on a graph
Parameters
mapping
Extends Bed
This module contains the most general classes for describing how an object is mapped to a sequence. This would include the very general basis for gpd and bed12 formats etc...
Exon
Extends Bed
Only mapping a single exon, requires a direction
GenericMapping
A generic mapping is like a transcript but does not require direction
Transcript
Extends GenericMapping
Transcript is a direction specific mapping
AlignmentDerivedMapping
Extends GenericMapping
GenericTranscriptome
Gene
Extends GenericTranscriptome
A canonical gene is a collection of transcripts at a single locus in a single direction. This is a specific type of transcriptome.
random
classes for generating random sequences or other randomness
RandomSeeded
uuid4
range
classes for dealing with range data, including genomic range and building from range, to genomic range, to exons etc...
Range
Bed
Extends Range
BedArray
An object to work with bed arrays
sequence
module with classes to describe sequence data
GenericNucleotideSequence
rc_nt
Parameters
nt
NucleotideSequence2Bit
Extends GenericNucleotideSequence
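As a rough illustration of the two ideas named in this module, reverse complementing and 2-bit storage, consider the sketch below. The function names reverseComplement and pack2Bit, the code assignment A=0, C=1, G=2, T=3, and the choice to place the first base in the high bits are all assumptions. The source does not describe how NucleotideSequence2Bit lays out its data.

```javascript
// Hypothetical sketch: reverse-complement a nucleotide string and pack
// A/C/G/T into 2 bits per base (assumed codes: A=0, C=1, G=2, T=3).
const COMPLEMENT = { A: 'T', C: 'G', G: 'C', T: 'A', N: 'N' };

function reverseComplement(seq) {
  return seq
    .toUpperCase()
    .split('')
    .reverse()
    .map((nt) => COMPLEMENT[nt] || 'N') // unknown symbols become N
    .join('');
}

function pack2Bit(seq) {
  const out = Buffer.alloc(Math.ceil(seq.length / 4));
  for (let i = 0; i < seq.length; i++) {
    const code = 'ACGT'.indexOf(seq[i]);
    if (code < 0) throw new Error(`cannot 2-bit encode ${seq[i]}`);
    out[i >> 2] |= code << (6 - 2 * (i % 4)); // first base in the high bits
  }
  return out;
}

console.log(reverseComplement('AACG')); // CGTT
console.log(pack2Bit('ACGT')[0]);       // 27
```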
splice
classes for analyzing splices and isoform composition
SpliceAnalysis
streams
module with classes to help work with streams
PipeFitterGeneric
A Generic buffer for data in and out to fit output to a certain size
constructor
length of cached data
Type: Number
length
length of cached data
Type: Number
drain
remove any remaining bits
Returns Buffer
add
Add data to the buffer
Parameters
indata
putback
Add data to the read-end of the buffer
Parameters
indata
PipeFitterLowpass
Extends PipeFitterGeneric
Ensure that chunks of maxsize are the largest data that can be read from the pipe
Parameters
maxsize
Number maximum size of output chunks
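The low-pass behaviour, capping the size of every chunk that leaves the pipe, can be sketched with a plain Node Transform. The class name MaxChunk is hypothetical and this is not the PipeFitterLowpass implementation. It only shows the general technique.

```javascript
const { Transform } = require('stream');

// Hypothetical sketch: a Transform that never emits a chunk larger than maxsize.
class MaxChunk extends Transform {
  constructor(maxsize, options) {
    super(options);
    this.maxsize = maxsize;
  }
  _transform(chunk, encoding, callback) {
    // slice the incoming chunk into pieces of at most maxsize bytes
    for (let start = 0; start < chunk.length; start += this.maxsize) {
      this.push(chunk.subarray(start, start + this.maxsize));
    }
    callback();
  }
}

const fitter = new MaxChunk(4);
fitter.on('data', (piece) => console.log(piece.toString()));
fitter.end(Buffer.from('ACGTACGTAC')); // logs ACGT, ACGT, AC
```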
PipeFitterHighpass
Extends PipeFitterGeneric
Ensure chunks are at least