gtdb-local v0.0.13-0
gtdb-local
A JavaScript implementation of a PouchDB database that hosts the GTDB data locally. Written in TypeScript.
GTDB Version: 95
Please cite the original authors:
Parks, D. H., et al. (2018). "A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life." Nature Biotechnology.
Please read this first
Installing this package builds the GTDB database with PouchDB in a hidden folder, .gtdb-local, in your home directory. The database can take almost 400 MB of disk space.
To install the package without setting up the database, pass a variable that skips the postinstall setup. See the Install section.
Also, this is a very early version. Please use at your own risk.
Install
We can install the module like this:
npm install gtdb-local
This will install the module and set up the database files in the local home directory.
If you want to install just the package:
skip_setup='yes' npm install gtdb-local
After installing like this, there will be no data. We must download the data from the GTDB website and then build the DB and index for faster search.
Usage (assuming install and setup)
Select Proteobacterial genomes
import { Gtdb } from 'gtdb-local'

const gtdb = new Gtdb()
gtdb.connectDB()
  .then((db) => {
    return db.find({
      selector: {
        p: 'Proteobacteria'
      }
    })
  })
  .then((data) => {
    // do something with Proteobacteria genomes
  })
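As a sketch of what "do something" could look like, the snippet below tallies documents by a taxonomic rank. It assumes the query result exposes the matching documents as an array (PouchDB's find plugin returns them under `docs`). Only the `p` field appears in the example above; the `c` field, the `GenomeDoc` shape, the helper name `countByRank` and the sample records are assumptions made for illustration.

```typescript
// Assumed minimal shape of a stored genome document. Only `p` (phylum) is
// confirmed by the query example; `c` (class) is an assumption.
interface GenomeDoc {
  _id: string
  p?: string
  c?: string
  [key: string]: unknown
}

// Count how many documents carry each value of the given rank field.
// Documents without a usable value are grouped under 'unclassified'.
function countByRank(docs: GenomeDoc[], rank: string): Map<string, number> {
  const counts = new Map<string, number>()
  for (const doc of docs) {
    const value = doc[rank]
    const label = typeof value === 'string' && value.length > 0 ? value : 'unclassified'
    counts.set(label, (counts.get(label) ?? 0) + 1)
  }
  return counts
}

// Made-up records standing in for `data.docs` from the query above.
const exampleDocs: GenomeDoc[] = [
  { _id: 'genome-1', p: 'Proteobacteria', c: 'Gammaproteobacteria' },
  { _id: 'genome-2', p: 'Proteobacteria', c: 'Alphaproteobacteria' },
  { _id: 'genome-3', p: 'Proteobacteria', c: 'Gammaproteobacteria' },
  { _id: 'genome-4', p: 'Proteobacteria' }
]

const classCounts = countByRank(exampleDocs, 'c')
console.log(classCounts)
```

Inside the second `.then` of the example, the same helper could be called as `countByRank(data.docs, 'c')`, provided the documents really carry that field.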
Search taxonomy info for genomes in bulk
The GTDB data is stored in PouchDB with each genome's NCBI accession code (new as of 2019) as the main index, so genomes can be fetched in bulk by accession.
import { Gtdb } from 'gtdb-local'

const genomeIds = [
  'UBA10210',
  'RS_GCF_002214165.1',
  'UBA10214',
  'GB_GCA_001871475.1',
  'GB_GCA_001871595.1',
  'UBA8261',
  'GB_GCA_001871495.1',
  'GB_GCA_001871535.1',
  'GB_GCA_001889985.1',
  'GB_GCA_002763345.1'
]

const gtdb = new Gtdb()
gtdb.connectDB().then((db) => {
  // Fetch the full documents for the listed accession codes in one call.
  const searchOptions = {
    include_docs: true,
    keys: genomeIds
  }
  return db.allDocs(searchOptions).then((results: any) => {
    // do something with the results.
  })
})
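When looking up many accessions at once, some may not be in the database. The sketch below separates found documents from missing keys. It relies on the standard PouchDB `allDocs` response shape, where each entry in `rows` carries the requested `key` and, for unknown keys, an `error` field instead of a `doc`. The helper name `splitFoundAndMissing`, the simplified interfaces and the mocked response are assumptions for illustration.

```typescript
// Simplified view of one row in a PouchDB allDocs response.
interface AllDocsRow {
  key: string
  id?: string
  error?: string
  value?: { rev: string; deleted?: boolean }
  doc?: Record<string, unknown> | null
}

interface AllDocsResponse {
  rows: AllDocsRow[]
}

// Split a bulk lookup into the documents that were found and the keys that
// were not (unknown keys, or rows that carry no document).
function splitFoundAndMissing(response: AllDocsResponse) {
  const found = new Map<string, Record<string, unknown>>()
  const missing: string[] = []
  for (const row of response.rows) {
    if (row.error || !row.doc) {
      missing.push(row.key)
    } else {
      found.set(row.key, row.doc)
    }
  }
  return { found, missing }
}

// Mocked response standing in for `results`; the taxonomy values are made up.
const mockResults: AllDocsResponse = {
  rows: [
    {
      key: 'UBA10210',
      id: 'UBA10210',
      value: { rev: '1-abc' },
      doc: { _id: 'UBA10210', p: 'Proteobacteria' }
    },
    { key: 'NOT_A_GENOME', error: 'not_found' }
  ]
}

const lookup = splitFoundAndMissing(mockResults)
console.log([...lookup.found.keys()], lookup.missing)
```

In the example above, the helper could be called on `results` inside the `allDocs` callback.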
Balanced selection of 100 random genomes from Proteobacteria
Certain clades of organisms have been sequenced more than others. For this reason, we implemented a balanced sample of genomes under a GTDB node.
Note that we don't need to connect to the DB itself because the algorithm uses the newick tree and not the database.
This is just a wrapper around the Phylogician-TS selectBalancedLeafs()
import { Gtdb } from 'gtdb-local'
const gtdb = new Gtdb()
const data = gtdb.selectBalancedSample('p', 'Proteobacteria', [], 100)
// do something with the 100 genomes randomly selected from Proteobacteria clade
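To illustrate why sampling over the tree helps with unevenly sequenced clades, here is a toy sketch of one possible balancing scheme: each pick walks down from the root and chooses uniformly among child clades rather than uniformly among leaves. This is not the algorithm used by Phylogician-TS or gtdb-local, whose details are not described here; the `TreeNode` type, the `balancedSample` function and the example tree are all hypothetical.

```typescript
// Hypothetical tree node: a leaf is a node without children.
interface TreeNode {
  name: string
  children: TreeNode[]
}

// Pick up to `n` distinct leaves. Every pick descends from the root and
// chooses uniformly among the children that still have unpicked leaves,
// so a heavily sampled clade does not dominate the result.
function balancedSample(root: TreeNode, n: number, rng: () => number = Math.random): string[] {
  // Number of leaves still available under each node.
  const remaining = new Map<TreeNode, number>()
  const fill = (node: TreeNode): number => {
    const total = node.children.length === 0
      ? 1
      : node.children.reduce((sum, child) => sum + fill(child), 0)
    remaining.set(node, total)
    return total
  }
  fill(root)

  const picked: string[] = []
  while (picked.length < n && remaining.get(root)! > 0) {
    const path: TreeNode[] = [root]
    let node = root
    while (node.children.length > 0) {
      const open = node.children.filter((child) => remaining.get(child)! > 0)
      node = open[Math.floor(rng() * open.length)]
      path.push(node)
    }
    picked.push(node.name)
    // The chosen leaf is no longer available anywhere along its path.
    for (const visited of path) {
      remaining.set(visited, remaining.get(visited)! - 1)
    }
  }
  return picked
}

const leaf = (name: string): TreeNode => ({ name, children: [] })

// Made-up tree: clade A has six leaves, clade B only two.
const exampleTree: TreeNode = {
  name: 'root',
  children: [
    { name: 'A', children: ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'].map(leaf) },
    { name: 'B', children: ['b1', 'b2'].map(leaf) }
  ]
}

console.log(balancedSample(exampleTree, 4))
```

With uniform sampling over leaves, clade A would contribute about three quarters of the picks. With the descent above, each pick is equally likely to come from A or B until B runs out of leaves.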
Uninstall
We can uninstall gtdb-local with npm:
npm uninstall gtdb-local
To get rid of the main database files in our home directory, we can just remove ~/.gtdb-local.
Be careful when removing directories.
Documentation
Todo
- Version control of gtdb data
Written with ❤ in Typescript.