0.0.7 • Published 5 years ago

@peterwmwong/gto v0.0.7

GTO: Gremlin TypeScript ORM

WARNING: This project is an experiment and has not been put into production yet.

Developer Getting Started

npm ci
npm run test-db-build-docker-image
npm run test-db-start
npm run test

Enforced consistency and correctness

Currently, our repository methods are ad-hoc groupings of raw read/write DB queries, which makes it hard to enforce consistency or correctness.

Example: IngestionRow's rowNumber

The Excel import path added the rowNumber property; the IRI/CSV import path did not. If repositories were object-oriented, with a consistent read/write view of properties, this would not have happened.

Example: IngestionRow's rowNumber PART 2.

Mike and I attempted to set rowNumber in the IRI/CSV import path, but only found out later that it was incorrectly set as a string instead of a number, which effectively caused ingestion row chunking to take forever or blow up.

Benefits from GTO

  • Nodes and Edges are created and filtered with the correct properties with the correct types
  • Traversals between Nodes and Edges are always correct
    • ex. Prevent accidentally going from Ingestion to IngestionRow through the wrong edge (HAS_???)
    • ex. Prevent accidentally using the wrong direction (in_? out?)
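
A minimal sketch of how types can give these guarantees. This is not GTO's actual implementation: GraphNode, createIngestionRow, defineEdge and the "EXAMPLE_EDGE" label are hypothetical names made up for illustration, and the traversal is only built as text rather than sent to a database.

```typescript
// Hypothetical property schemas; only `rowNumber` and `source` appear in the text above.
interface IngestionProps { source: string }
interface IngestionRowProps { rowNumber: number }

// The label is kept as a literal type so the compiler can tell node kinds apart.
interface GraphNode<L extends string, P> {
  readonly label: L;
  readonly props: P;
}
type Ingestion = GraphNode<"Ingestion", IngestionProps>;
type IngestionRow = GraphNode<"IngestionRow", IngestionRowProps>;

// Creation goes through one function, so every import path writes the same typed properties.
function createIngestionRow(props: IngestionRowProps): IngestionRow {
  return { label: "IngestionRow", props };
}

// An edge fixes its label and both endpoints once, when it is defined.
function defineEdge<From extends string, To extends string>(label: string, from: From, to: To) {
  return {
    label, from, to,
    // Builds the text of an outgoing traversal; `start` must be a node of kind `From`.
    out(start: GraphNode<From, unknown>): string {
      return `g.V().hasLabel('${start.label}').out('${label}').hasLabel('${to}')`;
    },
  };
}

// "EXAMPLE_EDGE" is a placeholder, not the real edge label.
const ingestionToRow = defineEdge("EXAMPLE_EDGE", "Ingestion", "IngestionRow");

const ingestion: Ingestion = { label: "Ingestion", props: { source: "Annotator" } };
const row = createIngestionRow({ rowNumber: 12 });
const step = ingestionToRow.out(ingestion);

// Both of these would be rejected at compile time:
//   createIngestionRow({ rowNumber: "12" });  // string is not a number
//   ingestionToRow.out(row);                  // traversal started from the wrong node kind
```

Because the edge knows its endpoints, the wrong edge or the wrong starting node becomes a type error rather than an empty result at runtime.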

FUTURE IDEA: DB/Query Metrics/Statistics

  • Individual
  • Aggregate
    • What are the longest taking queries?
    • What are the most frequent queries?
    • What are the biggest queries?
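
As a rough sketch of what such metrics could look like, assuming every query is funnelled through one wrapper: QueryMetrics, QuerySample and their methods are hypothetical names, and "biggest" is assumed here to mean the number of results returned.

```typescript
// One record per executed query (the "Individual" view). All names are hypothetical.
interface QuerySample {
  name: string;        // caller-chosen label for the query
  durationMs: number;  // wall-clock time of the query
  resultCount: number; // number of items returned
}

class QueryMetrics {
  private samples: QuerySample[] = [];

  // Runs a query, times it, and stores one sample for it.
  async record<T>(name: string, run: () => Promise<T[]>): Promise<T[]> {
    const start = Date.now();
    const results = await run();
    this.add({ name, durationMs: Date.now() - start, resultCount: results.length });
    return results;
  }

  // Stores a sample that was measured elsewhere.
  add(sample: QuerySample): void {
    this.samples.push(sample);
  }

  // Aggregate: the n samples that took the longest.
  longest(n: number): QuerySample[] {
    return this.samples.slice().sort((a, b) => b.durationMs - a.durationMs).slice(0, n);
  }

  // Aggregate: the n samples that returned the most results.
  biggest(n: number): QuerySample[] {
    return this.samples.slice().sort((a, b) => b.resultCount - a.resultCount).slice(0, n);
  }

  // Aggregate: the n query names that ran most often, as [name, count] pairs.
  mostFrequent(n: number): Array<[string, number]> {
    const counts = new Map<string, number>();
    for (const s of this.samples) {
      counts.set(s.name, (counts.get(s.name) || 0) + 1);
    }
    return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]).slice(0, n);
  }
}
```

Since GTO would already sit between the application and the database, a wrapper like record could be called from one place instead of from every repository method.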

FUTURE IDEA: Automated DB validation

It is still possible for the database's structure to be tampered with outside of the application (JupyterHub, direct '/gremlin' access).

As GTO provides a single source of truth/schema for the DB, we could easily build a script that runs through each GTO Node, Edge, and their properties, and makes sure the database is still in sync and valid.

  • ex. Using Node.name and new Node(g).properties, query nodes that don't have all the required properties, mis-typed properties, extra-properties, etc.
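
The core check of such a script could look like the sketch below. It assumes the expected schema is available as a plain object mapping property names to primitive type names (in GTO this would presumably be derived from Node.name and new Node(g).properties); Schema, Violation and validateProperties are hypothetical names.

```typescript
type PropType = "string" | "number" | "boolean";
type Schema = Record<string, PropType>;

interface Violation {
  kind: "missing" | "mistyped" | "extra";
  property: string;
}

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Compares the properties read from one stored node/edge against the expected schema.
function validateProperties(schema: Schema, actual: Record<string, unknown>): Violation[] {
  const violations: Violation[] = [];
  for (const property of Object.keys(schema)) {
    if (!has(actual, property)) {
      violations.push({ kind: "missing", property });
    } else if (typeof actual[property] !== schema[property]) {
      violations.push({ kind: "mistyped", property });
    }
  }
  for (const property of Object.keys(actual)) {
    if (!has(schema, property)) {
      violations.push({ kind: "extra", property });
    }
  }
  return violations;
}

// A row whose rowNumber was stored as a string, plus a property the schema does not know.
const problems = validateProperties(
  { rowNumber: "number" },
  { rowNumber: "12", unexpected: true },
);
```

Here problems would hold one "mistyped" violation for rowNumber and one "extra" violation for unexpected, which is the kind of report the rowNumber incident above would have produced.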

A more accessible Graph DB

Currently, the learning curve for Product, QA, and Developers to access data in the DB is steep, for a number of reasons:

  • Gremlin Querying
    • Not as widely known as other DB querying languages (ex. SQL)
    • Fewer Stack Overflow answers
    • Less documentation
    • Little-to-no tooling support (is this gremlin query syntactically correct?)
  • No Schema
    • Unlike SQL DBs, where out-of-the-box tooling can surface tables, columns (name, type), relationships between tables... Neptune does not.
    • This makes it hard to even know where to begin when trying to access data:
      • What nodes/edges are available?
      • What properties for nodes/edges and their types (number? string?)
      • Which direction is the edge? (inE? outE? in_? out?)
    • Currently, the structure of the Graph DB is enforced by our code.
    • Even worse, the code currently has no single source of truth about which nodes/edges exist, or about the properties (name/type) on those nodes/edges.
  • Constants
    • Labels for Nodes/Edges and property names are mostly in flat "lists" of constants
    • Incredibly easy to use the wrong constant, in the wrong place. Nothing stopping you from trying to use P_VERTEX_TYPE when querying against an Edge.

Benefits from GTO

  • Single source of truth for a Node/Edge and relationship between Nodes and Edges
  • Type/Editor driven querying
    • Type information provides users accurate hints on what's possible and valid
      • ex. Typing "Ingestion." offers the options: all, byId
      • ex. Typing "Ingestion.all(g, {" offers the options: source (property)
      • ex. Typing "Ingestion.all(g, {source: 'Annotator'})." offers the options: having, count, fetchOne, fetchAll, IngestionRows
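
To make the editor-driven flow concrete, here is a toy stand-in, not GTO's implementation. The names all, byId, count, fetchOne and fetchAll come from the examples above; everything else is assumed for illustration, including the id property, the FakeGraph type that replaces the real Gremlin g, and the synchronous return values.

```typescript
// Hypothetical property schema; `id` is assumed because of `byId`.
interface IngestionProps { id: string; source: string }

// Stand-in for the Gremlin traversal source `g`: just an in-memory list.
interface FakeGraph { ingestions: IngestionProps[] }

// What the editor offers after `Ingestion.all(...)` or `Ingestion.byId(...)`.
class IngestionQuery {
  private readonly rows: IngestionProps[];
  constructor(rows: IngestionProps[]) {
    this.rows = rows;
  }
  count(): number {
    return this.rows.length;
  }
  fetchAll(): IngestionProps[] {
    return this.rows.slice();
  }
  fetchOne(): IngestionProps | undefined {
    return this.rows[0];
  }
}

const Ingestion = {
  // The filter type only allows known property names with matching value types.
  all(g: FakeGraph, filter: Partial<IngestionProps> = {}): IngestionQuery {
    const keys = Object.keys(filter) as Array<keyof IngestionProps>;
    return new IngestionQuery(
      g.ingestions.filter((row) => keys.every((k) => row[k] === filter[k])),
    );
  },
  byId(g: FakeGraph, id: string): IngestionQuery {
    return new IngestionQuery(g.ingestions.filter((row) => row.id === id));
  },
};

const g: FakeGraph = {
  ingestions: [
    { id: "1", source: "Annotator" },
    { id: "2", source: "Excel" },
    { id: "3", source: "Annotator" },
  ],
};
const annotatorCount = Ingestion.all(g, { source: "Annotator" }).count();

// Rejected at compile time: `sorce` is not a known property, and `source` must be a string.
//   Ingestion.all(g, { sorce: "Annotator" });
//   Ingestion.all(g, { source: 42 });
```

The point of the sketch is that each step narrows what can be typed next, so someone unfamiliar with Gremlin can discover valid queries from completion alone.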

Discoveries

Gremlin: GraphTraversalSource, GraphTraversal, Statics (Anonymous Traversal) have different steps.

Step | GraphTraversalSource | GraphTraversal | Statics
E
V
addE
addV
toList
iterate
next