0.1.1 • Published 8 years ago

seq-tools v0.1.1

License
(Apache-2.0 OR MI...

formats

The formats collection of modules contains a variety of more specific types (see members for types). These should draw heavily on general objects and extend them whenever possible.

compression

The compression subset of formats contains compression formats

bgzf

classes for accessing bgzf compression

BGZFBlock

Provides block data and stats on demand, and takes either unzipped or zipped data as input

Parameters

  • options Object
    • options.data Buffer data to use to make the BGZF block. must be smaller than max_block_size
    • options.archive Buffer data is already compressed
    • options.level Number compression level (default 9)
archive

getter

Returns Buffer archive - bgzf compressed data

data

getter

Returns Buffer data - uncompressed data

ToBGZFBlocks

Extends Transform

Stream data into BGZF blocks for writing. Emitted objects have the property 'archive', which contains a Buffer of compressed data

Parameters

BGZFBlockCache

buffer data, and allow emitting entire gzip blocks at a time

Parameters

add

method to put more data in the buffer

Parameters

remove

remove data from the buffer

Returns Buffer data_chunk - returns false if not enough data is ready

BGZFDecompress

Extends Transform

stream compressed data in and uncompressed data out

Parameters

gunzip_block

PRE: Takes a data block that begins with a gzip member and may contain extra data after it. POST: Returns the uncompressed data and the extra data separately

Parameters

Returns Object outdata - uncompressed data in an Object {data:Buffer,remainder:Buffer}
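
The PRE/POST contract above can be sketched with Node's built-in zlib module. This is a minimal illustration, not the seq-tools implementation: `splitBgzfBlock` is a hypothetical name, and the sketch assumes the BGZF 'BC' subfield is the first extra subfield, so that BSIZE (total block length minus one) sits at byte offset 16.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of seq-tools): split one BGZF block off the
// front of `input` and return its uncompressed data plus any trailing bytes.
function splitBgzfBlock(input) {
  // A BGZF block starts with a gzip header whose FEXTRA flag (bit 2) is set.
  if (input.length < 18 || input[0] !== 0x1f || input[1] !== 0x8b || !(input[3] & 4)) {
    throw new Error('input does not start with a BGZF block');
  }
  // Assumption: the 'BC' subfield comes first in the extra field.
  if (input[12] !== 0x42 || input[13] !== 0x43) {
    throw new Error('BC subfield not found at the expected offset');
  }
  const blockLength = input.readUInt16LE(16) + 1; // BSIZE is total length - 1
  if (input.length < blockLength) {
    throw new Error('incomplete block');
  }
  return {
    data: zlib.gunzipSync(input.subarray(0, blockLength)),
    remainder: input.subarray(blockLength),
  };
}
```

Calling the helper repeatedly on the returned `remainder` walks through a buffer one block at a time.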

BGZFCompress

Extends Transform

stream uncompressed data in and compressed data out

Parameters

gzip_block

PRE: Takes the uncompressed data. Its length is not checked against the maximum, since the caller must make sure it fits before passing it in. POST: Outputs a bgzf zipped block

Parameters

  • indata Buffer uncompressed data
  • inlevel Number compression level (9 recommended)

Returns Buffer outdata - compressed data
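
A minimal sketch of producing one BGZF block with Node's built-in zlib follows. It is not the seq-tools implementation: `makeBgzfBlock` is a hypothetical name, and the approach (reusing zlib's gzip output and swapping in a BGZF header) is chosen here only to keep the example short.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not the seq-tools implementation): wrap `indata` in a
// single BGZF block. The caller must keep `indata` small enough that the
// finished block fits in 65536 bytes.
function makeBgzfBlock(indata, level = 9) {
  // zlib's gzip output is: 10-byte header, raw deflate stream, CRC-32, ISIZE.
  const gz = zlib.gzipSync(indata, { level });
  const body = gz.subarray(10); // deflate stream plus the 8-byte trailer

  // BGZF header: gzip header with FEXTRA set and one 'BC' subfield for BSIZE.
  const header = Buffer.from([
    0x1f, 0x8b, 8, 4, // magic bytes, CM = deflate, FLG = FEXTRA
    0, 0, 0, 0,       // MTIME
    0, 0xff,          // XFL, OS = unknown
    6, 0,             // XLEN = 6
    0x42, 0x43, 2, 0, // 'B', 'C', SLEN = 2
    0, 0,             // BSIZE placeholder, filled in below
  ]);
  // BSIZE is the total block length minus one; writeUInt16LE throws if the
  // block is too large to be represented in 16 bits.
  header.writeUInt16LE(header.length + body.length - 1, 16);
  return Buffer.concat([header, body]);
}
```

Because the result is still a valid gzip member, `zlib.gunzipSync` can read it back, which is a convenient sanity check.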

BGZFDecompressionCache

Cache class for decompressing BGZF files

end

trigger the flag that the data has ended

has_data

getter

Returns bool has_data - true if there is data in there

ready

getter

Returns bool ready - data is ready to be read

write

write data to the buffer

Parameters

  • indata Buffer data is added to the cache
read

Read and decompress data from the cache, one block per call. Call read repeatedly to retrieve everything when several blocks are cached

Returns Buffer outdata - data is decompressed from one block

alignment

The alignment subset of formats contains specific types of alignments

bam

Extends Transform

classes for accessing BAM files

DecompressedToBAMObj

Extends Transform

Take decompressed data and transform it to a bam object stream

Parameters

  • options Object Transform options are passed to Transform parent
BAMObjToDecompressed

Extends Transform

To facilitate writing, convert bam objects into an uncompressed (pre bgzf compression) stream

Parameters

  • options Object passed to Transform class
BAMInputStream

Extends Transform

Given input stream, output header and BAM objects

Parameters

  • options Object passed on to parent Transform
BAMHeader

Extends formats.alignment.sam.SAMHeader

A BAM Header

Parameters

  • options Object
    • options.bam_data Buffer can give the decompressed bytes
add_ref

Add a reference sequence name and corresponding length

Parameters

n_ref

Getter for the number of reference sequences

Returns Number number of reference sequences

n_ref

Setter for number of reference sequences present

Parameters

  • n_ref Number as Int32LE
  • indata
magic

Setter for the magic number. More of a sanity check, because we know what it should be, and this will throw an error if it is incorrect.

Parameters

  • magic Number number as UInt32LE number (Not Buffer)
  • indata
text

Setter for text data of a header

Parameters

set_from_sam_header2

This may be deprecated

Parameters

  • n_ref Number as Int32LE
  • intext
bam_data

Getter for the bam_data. return a buffer of BAM bytes. The start of the BAM file but not yet bgzf compressed.

Returns Buffer bam header data
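
To make the byte layout behind `magic`, `text`, `n_ref` and `bam_data` concrete, here is a minimal sketch of building and parsing the uncompressed BAM header layout (magic "BAM\1", l_text, text, n_ref, then l_name, NUL-terminated name and l_ref per reference). `buildBamHeader` and `parseBamHeader` are hypothetical helpers, not part of seq-tools.

```javascript
// Hypothetical helpers (not part of seq-tools) for the uncompressed BAM header.
function buildBamHeader(text, refs) {
  const textBuf = Buffer.from(text, 'latin1');
  const head = Buffer.alloc(8);
  head.write('BAM\u0001', 0, 'latin1'); // magic; 0x014d4142 when read as UInt32LE
  head.writeInt32LE(textBuf.length, 4);  // l_text
  const nRef = Buffer.alloc(4);
  nRef.writeInt32LE(refs.length, 0);     // n_ref
  const refBufs = refs.map(({ name, length }) => {
    const nameBuf = Buffer.from(name + '\0', 'latin1');
    const buf = Buffer.alloc(4 + nameBuf.length + 4);
    buf.writeUInt32LE(nameBuf.length, 0); // l_name counts the trailing NUL
    nameBuf.copy(buf, 4);
    buf.writeUInt32LE(length, 4 + nameBuf.length); // l_ref
    return buf;
  });
  return Buffer.concat([head, textBuf, nRef, ...refBufs]);
}

function parseBamHeader(buf) {
  if (buf.readUInt32LE(0) !== 0x014d4142) throw new Error('bad BAM magic');
  const lText = buf.readInt32LE(4);
  const text = buf.toString('latin1', 8, 8 + lText);
  let offset = 8 + lText;
  const nRef = buf.readInt32LE(offset);
  offset += 4;
  const refs = [];
  for (let i = 0; i < nRef; i++) {
    const lName = buf.readUInt32LE(offset);
    // Drop the trailing NUL when decoding the name.
    const name = buf.toString('latin1', offset + 4, offset + 4 + lName - 1);
    const length = buf.readUInt32LE(offset + 4 + lName);
    refs.push({ name, length });
    offset += 4 + lName + 4;
  }
  return { text, refs };
}
```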

BAM

Extends sam.SAM

BAM is a child of SAM; nearly every getter of a SAM should be overridden

Parameters

  • options Object
    • options.bam_data Buffer required uncompressed bam data
    • options.header BAMHeader required BAMHeader object
bam_data

Getter for the bam_data. return a buffer of BAM bytes. The start of the BAM file but not yet bgzf compressed. Easy since it was required to create the BAM object in the first place.

Returns Buffer bam entry data

refName

this one is always cached in case we are working as a SAM

Returns String rname

next_refName

getter for the next_refName

Returns String next_rname

block_size

A BAM specific property

Returns Number block_size in bytes

refID

A BAM specific property: the refID is the index of the reference sequence in the header

Returns Number refID index

pos

The position in the reference sequence of the first match

Returns Number pos

name_l

Getter for property of the length of the query name

Returns Number name_l

mapq

Getter for MAPQ

Returns Number MAPQ

bin

Getter for bin

Returns Number bin

flag

Getter for flag

Returns Number flag

n_cigar_op

Getter for number of cigar Ops

Returns Number cigar op count

l_seq

Getter for the sequence length

Returns Number query sequence length

next_refID

Getter for the next_refID index into header reference sequences

Returns Number next_refID

next_pos

Getter for the next_pos

Returns Number next_pos

tlen

Getter for the target sequence length

Returns Number target sequence length
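
The numeric getters above correspond to the fixed-length fields at the start of an uncompressed BAM record. The sketch below decodes them at the offsets given in the BAM format; `parseFixedFields` is a hypothetical helper rather than seq-tools code, and it returns the raw stored values (for example, the stored `pos` is 0-based), which the library's getters may convert.

```javascript
// Hypothetical helper (not part of seq-tools): decode the fixed-length fields
// at the start of one uncompressed BAM alignment record.
function parseFixedFields(buf) {
  return {
    block_size: buf.readInt32LE(0),   // bytes in the record after this field
    refID: buf.readInt32LE(4),        // index into header references, -1 if none
    pos: buf.readInt32LE(8),          // raw value, 0-based leftmost position
    name_l: buf.readUInt8(12),        // read name length, including the NUL
    mapq: buf.readUInt8(13),
    bin: buf.readUInt16LE(14),
    n_cigar_op: buf.readUInt16LE(16),
    flag: buf.readUInt16LE(18),
    l_seq: buf.readInt32LE(20),
    next_refID: buf.readInt32LE(24),
    next_pos: buf.readInt32LE(28),
    tlen: buf.readInt32LE(32),
  };
}
```

The variable-length fields (read name, CIGAR, sequence, qualities, auxiliary data) start at byte 36.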

read_name

Getter for the query name; it is cached upon reading

Returns String query name

cigar

Getter for the cigar, gets cached on read

Returns CIGAR cigar object

seq

Getter for seq, gets cached on read

Returns BAMSeq BAMSeq object

qual

Getter for quality

Returns String Quality as a string

auxillary

Getter for auxiliary data

Returns BAMAuxillary auxiliary information object

BAMAuxillary

BAMAuxillary provides a way to access the auxiliary data inside a bam

load_data

load data into the BAMAuxillary object

Parameters

tags

getter for tag

Returns Array tag information in list [tag,valuetype,val,....]

toString

Get the string representation of the auxiliary data

Returns String auxillary_data

BAMSeq

BAMSeq is an object to access the sequence data in the bam

load_data

load data into the BAMSeq object

Parameters

toString

Return the string representation of the sequence

Returns String sequence
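
For context, BAM stores the sequence as 4-bit codes, two bases per byte with the first base in the high nibble. The following sketch shows that decoding step; `decodeBamSeq` is a hypothetical name and not necessarily how BAMSeq is implemented.

```javascript
// Lookup table for BAM's 4-bit base codes (code 0 is '=', code 15 is 'N').
const NT16 = '=ACMGRSVTWYHKDBN';

// Hypothetical helper (not part of seq-tools): decode `lSeq` bases from the
// packed sequence bytes of a BAM record.
function decodeBamSeq(bytes, lSeq) {
  let out = '';
  for (let i = 0; i < lSeq; i++) {
    const byte = bytes[i >> 1];
    // Even-numbered bases use the high nibble, odd-numbered bases the low one.
    out += NT16[i % 2 === 0 ? byte >> 4 : byte & 0x0f];
  }
  return out;
}
```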

CIGAR

CIGAR is an object to access the cigar string in the bam

toString

Return the string representation of the CIGAR

Returns String cigar string
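
In a BAM record each CIGAR operation is a little-endian 32-bit integer holding `length << 4 | op`, where the op code indexes into "MIDNSHP=X". A minimal sketch of turning those bytes into a CIGAR string, using the hypothetical helper name `decodeBamCigar`:

```javascript
const CIGAR_OPS = 'MIDNSHP=X';

// Hypothetical helper (not part of seq-tools): convert the packed CIGAR bytes
// of a BAM record into the SAM text form.
function decodeBamCigar(buf, nCigarOp) {
  let out = '';
  for (let i = 0; i < nCigarOp; i++) {
    const value = buf.readUInt32LE(4 * i);
    // The upper 28 bits hold the length, the lower 4 bits the operation code.
    out += (value >>> 4) + CIGAR_OPS[value & 0xf];
  }
  return out || '*'; // SAM writes '*' when no CIGAR is available
}
```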

load_data

load data into the CIGAR object

Parameters

BamDataReader

A class to read data from a BAM. It is a helper class for other streamers

Parameters

  • header BAMHeader (can be left unset or undefined)
add

add an arbitrary amount of data to the data buffer

Parameters

remove

remove available data from the data buffer, which could be a header or an entry

Returns Object output

Returns Buffer output.data - a bam entry as a Buffer

Returns BAMHeader output.header - a bam header

sam

Extends GenericAlignment

classes for accessing SAM files

SAM

Extends alignment.GenericAlignment

SAM is a version of a GenericAlignment

Parameters

qseq

override query sequence getter for generic access

Returns String qseq - query sequence

rseq

Get or Set the reference sequence

Returns Sequence rseq - reference sequence

qname

Return the query name

Returns String qname

direction

provide the direction

Returns Char direction - the strand + or -

to_query_map

get the mapping on the query

Parameters

  • options Object can pass options to AlignmentDerivedMapping

Returns AlignmentDerivedMapping mapping

to_reference_map

get the mapping on the reference

Parameters

  • options Object can pass options to AlignmentDerivedMapping

Returns AlignmentDerivedMapping mapping

sam_line

set/get the sam line

Parameters

  • intext String provide a sam line

Returns String sam_line
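
As an illustration of what a SAM line carries, the sketch below splits one line into its eleven mandatory tab-separated fields plus optional tags, and derives the strand from flag bit 0x10. `parseSamLine` and its property names are hypothetical and do not reflect the internals of the SAM class.

```javascript
// Hypothetical helper (not part of seq-tools): split a SAM alignment line
// into named fields.
function parseSamLine(line) {
  const fields = line.replace(/\r?\n$/, '').split('\t');
  if (fields.length < 11) {
    throw new Error('a SAM alignment line needs at least 11 fields');
  }
  const flag = Number(fields[1]);
  return {
    qname: fields[0],
    flag,
    rname: fields[2],
    pos: Number(fields[3]),   // 1-based leftmost position
    mapq: Number(fields[4]),
    cigar: fields[5],
    rnext: fields[6],
    pnext: Number(fields[7]),
    tlen: Number(fields[8]),
    seq: fields[9],
    qual: fields[10],
    aux: fields.slice(11),    // optional TAG:TYPE:VALUE strings
    // Flag bit 0x10 marks a read aligned to the reverse strand.
    direction: (flag & 0x10) ? '-' : '+',
  };
}
```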

refName

getter

Returns String refName - the name of the reference sequence

next_refName

getter

Returns String next_refName - the name of the next reference sequence

pos

getter

Returns Number pos - the 1-based index where first aligned base is

mapq

getter

Returns Number mapq - the number representation of the mapping quality

flag

getter

Returns Number flag - the number representation of the flag

next_pos

getter

Returns Number next_pos - the 1-based index of the next position

tlen

getter

Returns Number tlen - the length of the target (reference) sequence

read_name

getter

Returns String read_name - the query name

cigar

getter - cached if called

Returns CIGAR cigar - return a cigar object

seq

getter

Returns String seq - return a seq string representation

qual

getter

Returns String qual - return a qual string representation

auxillary

getter

Returns SAMAuxillary auxillary - return an object for accessing auxiliary data

bam_data

getter - a very difficult function that constructs the bam data from a buffer (prior to bgzf compression)

Returns Buffer bam_data - return byte-wise buffer data

SAMAuxillary

SAMAuxillary provides access to the auxiliary tags and data in the SAM file

Parameters

  • aux Array An array of auxiliary tags/information
tags

getter - get an array of extra data. The data (3rd) field of each entry is a Number if it is numerical.

Returns Array tags - return an array of data [tag1,type1,data1,...]
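
A minimal sketch of producing such a flat [tag, type, data, ...] array from SAM optional fields is shown here. `parseAuxTags` is a hypothetical helper, and treating only the `i` and `f` types as numeric is an assumption of this example.

```javascript
// Hypothetical helper (not part of seq-tools): flatten SAM optional fields of
// the form TAG:TYPE:VALUE into [tag1, type1, data1, tag2, type2, data2, ...].
function parseAuxTags(aux) {
  const out = [];
  for (const field of aux) {
    const [tag, type, ...rest] = field.split(':');
    const raw = rest.join(':'); // values may themselves contain ':'
    // Assumption: integer ('i') and float ('f') values become Numbers.
    const numeric = type === 'i' || type === 'f';
    out.push(tag, type, numeric ? Number(raw) : raw);
  }
  return out;
}
```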

bam_data

getter - get uncompressed bytes representing the auxiliary info

Returns Buffer bam_data

CIGAR

Class to describe a CIGAR sequence

Parameters

  • cigar_string String input the cigar string as a string type
ops

getter - get cigar ops

Returns Array ops - Ops like [op1,length1,op2,length2,...]

bam_data

getter - get uncompressed bytes representing the CIGAR

Returns Buffer bam_data
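
The two representations above can be sketched as follows: a CIGAR string parsed into a flat [op, length, ...] array, and that array packed into BAM's 32-bit `length << 4 | op` encoding. `parseCigar` and `cigarToBam` are hypothetical names, not the class's actual methods.

```javascript
const CIGAR_CODES = 'MIDNSHP=X';

// Hypothetical helper: parse '10M2I5S' into ['M', 10, 'I', 2, 'S', 5].
function parseCigar(cigarString) {
  const ops = [];
  if (cigarString === '*') return ops; // '*' means no CIGAR is available
  const re = /(\d+)([MIDNSHP=X])/g;
  let match;
  let consumed = 0;
  while ((match = re.exec(cigarString)) !== null) {
    ops.push(match[2], Number(match[1]));
    consumed += match[0].length;
  }
  // Any character not covered by a match means the string was malformed.
  if (consumed !== cigarString.length) {
    throw new Error('malformed CIGAR string: ' + cigarString);
  }
  return ops;
}

// Hypothetical helper: pack the flat ops array into BAM bytes,
// 4 bytes per (op, length) pair.
function cigarToBam(ops) {
  const buf = Buffer.alloc(2 * ops.length);
  for (let i = 0; i < ops.length; i += 2) {
    const value = ((ops[i + 1] << 4) | CIGAR_CODES.indexOf(ops[i])) >>> 0;
    buf.writeUInt32LE(value, 2 * i);
  }
  return buf;
}
```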

DataToSAMObj

Extends Transform

Class to read in SAM objects from a data stream - will emit data objects in the form of an Object with either the header property which contains the header object, or the sam_line property with the SAM line string

Parameters

  • options Object options for the transform object
SAMHeader

Class to describe a SAMheader

Parameters

  • options Object options for the transform object
    • options.text Object the header data
n_ref

getter - the number of reference sequences

Returns Number n_ref

bam_data

getter - the uncompressed bam data representation of the header

Returns Buffer bam_data

alignment

This module has the most general classes for defining an alignment. These should be extended whenever possible by more specific alignment formats under formats.alignment.*

GenericAlignment

Parameters

  • options Object
    • options.min_intron Number? The smallest size gap to consider an intron. (optional, default 68)

min_intron

Getter and setter for minimum gap in reference to be considered an intron

Parameters

Returns Number min_intron
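
One plausible way a threshold like `min_intron` could be applied is sketched below: reference gaps (CIGAR `N` or `D`) at least that long split the alignment into separate blocks, while shorter gaps stay inside a block. This is an assumption made for illustration, not a description of how GenericAlignment uses the value; `referenceBlocks` is a hypothetical helper and its intervals are 0-based and half-open.

```javascript
// Hypothetical helper (not part of seq-tools): split an alignment into
// reference blocks, treating long reference gaps as introns.
// `ops` is a flat array like ['M', 10, 'N', 100, 'M', 5].
function referenceBlocks(ops, start, minIntron = 68) {
  const blocks = [];
  let blockStart = start; // reference start of the current block
  let pos = start;        // current reference position
  for (let i = 0; i < ops.length; i += 2) {
    const op = ops[i];
    const len = ops[i + 1];
    const isGap = op === 'N' || op === 'D';
    if (isGap && len >= minIntron) {
      // Close the current block and start a new one after the intron.
      if (pos > blockStart) blocks.push([blockStart, pos]);
      pos += len;
      blockStart = pos;
    } else if ('MDN=X'.includes(op)) {
      pos += len; // these operations consume reference bases
    }
  }
  if (pos > blockStart) blocks.push([blockStart, pos]);
  return blocks;
}
```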

ref

Set the reference dictionary

Parameters

  • refDict Object Dictionary keys are chromosome names, values are sequences

cigar

A getter to retrieve the CIGAR string

Returns String CIGAR - is a getter for the cigar string

aligned_length

A getter for the aligned length of the sequence

Returns Number length

psl_line

A getter for the aligned length of the sequence. Requires qseq and rseq or cigar and tlen.

Returns Number length

sam_line

A getter for the SAM format line. Requires qname and rname.

Returns String sam_line

pretty_print

Make a pretty print of the alignment. Requires qseq and rseq.

Parameters

  • linelength Number? how long to make the pretty print line (optional, default 50)

Returns String a pretty printing of the alignment

to_query_map

Convert the alignment into a mapping along the query

Parameters

  • options Object? not necessary to be set to call (optional, default {})

Returns Object Returns an AlignmentDerivedMapping object

to_reference_map

Convert the alignment into a mapping along the reference

Parameters

  • options Object? not necessary to be set to call (optional, default {})

Returns Object Returns an AlignmentDerivedMapping object

aligner

This module contains classes for DOING alignments. See alignment for the general definition of an alignment.

private

Members are not exported

SmithWatermanResults

A class for generating results from a Smith-Waterman aligner

Parameters

get_entry

Get the alignment stored at the input index

Parameters

Returns Object SmithWatermanAlignment

SmithWatermanAlignment

Extends GenericAlignment

A single alignment from among Smith-Waterman aligner results. This is not created by the user. It is created by SmithWatermanResults

Parameters

qual

getter, but Quality is not set and not available

Returns undefined undefined

qname

getter for name of Query

Returns String qname

rname

getter for name of Reference

Returns String rname

tlen

getter for length of the reference (target) sequence

Returns Number tlen

qseq

getter for query sequence

Returns Object Sequence

rseq

getter for reference sequence

Returns Object Sequence

direction

getter for direction

Returns String direction - Strand +/-

SmithWatermanAligner

A class for performing a local alignment

Parameters

  • options Object
    • options.match Number? (optional, default 2)
    • options.mismatch Number? (optional, default -2)
    • options.gap_open Number? (optional, default -5)
    • options.gap_extend Number? (optional, default -2)
    • options.max_gap Number? (optional, default -10)

align

Execute the alignment

Parameters

  • inputs Object
    • inputs.query Object Query Sequence
    • inputs.reference Object Reference Sequence

Returns Object SmithWatermanResults - an object for executing and retrieving Smith-Waterman alignments
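
For readers unfamiliar with local alignment, here is a minimal Smith-Waterman scoring sketch. It deliberately uses a single linear gap penalty rather than the affine `gap_open`/`gap_extend` scheme listed above, performs no traceback, and is not the SmithWatermanAligner implementation; `smithWatermanScore` and its `gap` option are hypothetical.

```javascript
// Hypothetical helper (not part of seq-tools): best local alignment score
// between two strings, with a linear gap penalty and no traceback.
function smithWatermanScore(query, reference, { match = 2, mismatch = -2, gap = -5 } = {}) {
  const cols = reference.length + 1;
  let prev = new Array(cols).fill(0); // previous row of the score matrix
  let best = { score: 0, queryEnd: 0, referenceEnd: 0 };
  for (let i = 1; i <= query.length; i++) {
    const curr = new Array(cols).fill(0);
    for (let j = 1; j < cols; j++) {
      const diag = prev[j - 1] + (query[i - 1] === reference[j - 1] ? match : mismatch);
      // Local alignment: scores never drop below zero.
      curr[j] = Math.max(0, diag, prev[j] + gap, curr[j - 1] + gap);
      if (curr[j] > best.score) {
        best = { score: curr[j], queryEnd: i, referenceEnd: j };
      }
    }
    prev = curr;
  }
  return best; // end coordinates are 1-based and inclusive
}
```

Only two rows are kept in memory because the score alone does not require the full matrix; recovering the alignment itself would need a traceback over the whole matrix.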

basics

basic objects for very general use

Matrix

Parameters

dim

Get the dimensions of the matrix

Returns Object dimensions object with m and n properties

zero

Set all elements of the matrix to zero

toString

Get a string with what the matrix looks like

Returns String output value
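
A minimal sketch of a matrix with the three members described above follows. The class name `SimpleMatrix`, the `(m, n)` constructor arguments, the `get`/`set` accessors and the row-major storage are all assumptions of this example, since the original parameters are not listed.

```javascript
// Hypothetical class (not the seq-tools Matrix): an m-by-n numeric matrix.
class SimpleMatrix {
  constructor(m, n) {
    this.m = m;
    this.n = n;
    this.data = new Float64Array(m * n); // row-major, starts zero-filled
  }

  // Dimensions as an object with m (rows) and n (columns).
  get dim() {
    return { m: this.m, n: this.n };
  }

  get(i, j) {
    return this.data[i * this.n + j];
  }

  set(i, j, value) {
    this.data[i * this.n + j] = value;
  }

  // Reset every element to zero.
  zero() {
    this.data.fill(0);
  }

  // One line per row, values separated by spaces.
  toString() {
    const rows = [];
    for (let i = 0; i < this.m; i++) {
      const row = this.data.subarray(i * this.n, (i + 1) * this.n);
      rows.push(Array.from(row).join(' '));
    }
    return rows.join('\n');
  }
}
```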

graph

module with classes for describing graphs

private

classes for describing graphs

GenericGraph

Generic graph should be overridden

Parameters

  • options Object
    • options.payload Object add on a payload to a graph
node_count

getter for node_count

Returns Number number of nodes

edge_count

getter for edge_count

Returns Number number of edges

add_node

add a node to the graph. It cannot be one that has already been added

Parameters

id

getter for the unique id for this graph

Returns Object id - the uuid4 that was set for this object

id

getter for the unique id for this node

Returns Object id - the uuid4 that was set for this object

id

getter for id

Returns Object uuid

nodes

getter for a list of the nodes in this graph

Returns Array<Object> nodes - list of nodes

nodes

getter for dictionary of nodes

Returns Object node_id_dictionary

split_unconnected

get a list of graphs that are not connected that are subsets of the original

Returns Array<Object> graphs - list of unconnected graphs

get_edges_by_node

get a list of edges that are associated with a node

Parameters

Returns Array<Object> edges - list of edges

get_connected_nodes

get a list of nodes that are connected to a node. this can be called recursively

Parameters

  • innode Object input a node
  • traversed Object? set of nodes that have been traversed

Returns Array<Object> nodes - list of nodes
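
The idea behind `get_connected_nodes` and `split_unconnected` can be sketched with plain arrays and a `Set` of traversed nodes. The helpers below are hypothetical, treat every edge as undirected, and use an explicit stack instead of recursion; they are not the GenericGraph implementation.

```javascript
// Hypothetical helper: collect every node reachable from `start`.
// `adjacency` is a Map from node to an array of neighbouring nodes.
function connectedNodes(adjacency, start) {
  const traversed = new Set([start]);
  const stack = [start];
  while (stack.length > 0) {
    const node = stack.pop();
    for (const next of adjacency.get(node) || []) {
      if (!traversed.has(next)) {
        traversed.add(next);
        stack.push(next);
      }
    }
  }
  return traversed;
}

// Hypothetical helper: group nodes into unconnected components.
// `edges` is an array of [nodeA, nodeB] pairs.
function splitUnconnected(nodes, edges) {
  const adjacency = new Map(nodes.map((node) => [node, []]));
  for (const [a, b] of edges) {
    adjacency.get(a).push(b);
    adjacency.get(b).push(a);
  }
  const seen = new Set();
  const components = [];
  for (const node of nodes) {
    if (seen.has(node)) continue;
    const component = connectedNodes(adjacency, node);
    component.forEach((member) => seen.add(member));
    components.push([...component]);
  }
  return components;
}
```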

name

getter for the name

Returns String name

payload

getter for the payload

Returns Object payload

payload

getter for the payload

Returns Object payload

node1

getter for node1

Returns Object node1

node2

getter for node2

Returns Object node2

DirectedGraph

Extends GenericGraph

Class for a directed graph

Parameters

  • options Object is also passed to GenericGraph constructor

add_edge

Extends graph.DirectedGraph

Add an edge to the graph

Parameters

UndirectedGraph

Extends GenericGraph

Class for an undirected graph

Parameters

  • options Object is also passed to GenericGraph constructor

add_edge

Add an edge to the graph

Parameters

Node

Class for a node on a graph

Parameters

  • options Object
    • options.payload Object set a payload if you like
    • options.name Object set a name if you want

Edge

Class for an edge on a graph

Parameters

mapping

Extends Bed

This module contains the most general classes for describing how an object is mapped to a sequence. This would include the very general basis for gpd and bed12 formats etc...

Exon

Extends Bed

Only mapping a single exon, requires a direction

GenericMapping

A generic mapping is like a transcript but does not require direction

Transcript

Extends GenericMapping

Transcript is a direction specific mapping

AlignmentDerivedMapping

Extends GenericMapping

GenericTranscriptome

Gene

Extends GenericTranscriptome

A canonical gene is a collection of transcripts at a single locus in a single direction. This is a specific type of transcriptome.

random

classes for generating random sequences or other randomness

RandomSeeded

uuid4

range

classes for dealing with range data, including genomic range and building from range, to genomic range, to exons etc...

Range

Bed

Extends Range

BedArray

An object to work with bed arrays

sequence

module with classes to describe sequence data

GenericNucleotideSequence

rc_nt

Parameters

  • nt
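
Assuming `rc_nt` stands for reverse complement of a nucleotide string (the original gives no description), the operation can be sketched as follows. `reverseComplement` is a hypothetical helper that handles only A, C, G, T and N in either case.

```javascript
// Complement lookup for the bases handled by this sketch.
const COMPLEMENT = {
  A: 'T', C: 'G', G: 'C', T: 'A', N: 'N',
  a: 't', c: 'g', g: 'c', t: 'a', n: 'n',
};

// Hypothetical helper (not part of seq-tools): reverse-complement a sequence.
function reverseComplement(nt) {
  let out = '';
  // Walk the input backwards and append the complement of each base.
  for (let i = nt.length - 1; i >= 0; i--) {
    const base = COMPLEMENT[nt[i]];
    if (base === undefined) {
      throw new Error('unsupported nucleotide: ' + nt[i]);
    }
    out += base;
  }
  return out;
}
```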

NucleotideSequence2Bit

Extends GenericNucleotideSequence
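
A 2-bit representation stores four bases per byte. The sketch below illustrates the general idea only: the code assignment (A=0, C=1, G=2, T=3), the bit order and the helper names `pack2bit`/`unpack2bit` are assumptions and may differ from what NucleotideSequence2Bit does.

```javascript
// Assumed code assignment for this sketch: A=0, C=1, G=2, T=3.
const CODES = 'ACGT';

// Hypothetical helper: pack a string of A/C/G/T into 2 bits per base,
// first base in the highest bits of each byte.
function pack2bit(seq) {
  const out = Buffer.alloc(Math.ceil(seq.length / 4));
  for (let i = 0; i < seq.length; i++) {
    const code = CODES.indexOf(seq[i]);
    if (code < 0) throw new Error('only A, C, G and T can be packed');
    out[i >> 2] |= code << (6 - 2 * (i & 3));
  }
  return out;
}

// Hypothetical helper: unpack `length` bases; the length must be supplied
// because the last byte may be only partly used.
function unpack2bit(buf, length) {
  let seq = '';
  for (let i = 0; i < length; i++) {
    seq += CODES[(buf[i >> 2] >> (6 - 2 * (i & 3))) & 3];
  }
  return seq;
}
```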

splice

classes for analyzing splices and isoform composition

SpliceAnalysis

streams

module with classes to help work with streams

PipeFitterGeneric

A Generic buffer for data in and out to fit output to a certain size

constructor

length of cached data

Type: Number

length

length of cached data

Type: Number

drain

remove any remaining bits

Returns Buffer

add

Add data to the buffer

Parameters

  • indata

putback

Add data to the read-end of the buffer

Parameters

  • indata

PipeFitterLowpass

Extends PipeFitterGeneric

Ensure that chunks of maxsize are the largest data that can be read from the pipe

Parameters

  • maxsize Number maximum size of output chunks
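
The buffering behaviour described for the pipe fitters (add, putback, drain, length, and reads capped at `maxsize`) can be sketched as below. `ChunkLimiter` and its `read` method are hypothetical names for illustration, not the PipeFitterLowpass class itself.

```javascript
// Hypothetical class (not part of seq-tools): buffer incoming data and hand
// it back in chunks of at most `maxsize` bytes.
class ChunkLimiter {
  constructor(maxsize) {
    this.maxsize = maxsize;
    this.chunks = [];
    this.length = 0; // number of bytes currently cached
  }

  // Append data to the write end of the buffer.
  add(indata) {
    this.chunks.push(indata);
    this.length += indata.length;
  }

  // Return data to the read end so it is the next thing read.
  putback(indata) {
    this.chunks.unshift(indata);
    this.length += indata.length;
  }

  // Read up to maxsize bytes, or null when nothing is cached.
  read() {
    if (this.length === 0) return null;
    const all = Buffer.concat(this.chunks);
    const out = all.subarray(0, this.maxsize);
    const rest = all.subarray(out.length);
    this.chunks = rest.length > 0 ? [rest] : [];
    this.length = rest.length;
    return out;
  }

  // Remove and return everything that is left, regardless of size.
  drain() {
    const all = Buffer.concat(this.chunks);
    this.chunks = [];
    this.length = 0;
    return all;
  }
}
```

Concatenating on every read keeps the sketch short; a production version would avoid copying the whole cache each time.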

PipeFitterHighpass

Extends PipeFitterGeneric

Ensure chunks are at least