0.0.1 • Published 8 years ago

bloom-harvesting-neo4j-import v0.0.1

Weekly downloads: 3
License: ISC
Repository: gitlab
Last release: 8 years ago

Neo4J Import

This project contains the classes that transform TripleModel instances into the table-like structures needed for batch import of data into a Neo4J database.

Data Model

This section describes the main model entities and their relations. Entities:

  • doc - a document; it could be a tweet, Facebook post, blog post, comment, etc.
  • actor - a person or service that creates documents; it could also be a Facebook group or a media site generating posts
  • topic - a topic that documents can be part of (see the partOf relation)

Each entity contains the following fields:

  • doc

    • uri - the main URI uniquely identifying the document
    • source - source of the document (twitter, facebook, web, etc)
    • date - date of the creation of this document
    • href - a URL giving access to the document
    • content - content of the document
    • tags - a string containing a comma-separated list of tags
    • type - type of the document; from synthesio
    • country - country of the document; could be empty; from synthesio
    • language - document language; from synthesio
    • sentiment - sentiment associated with the document; positive/negative/neutral/undefined
    • influence_document - numerical influence value; extracted from synthesio
    • influence_author - numerical influence value for the document author; from synthesio
    • followers - number of author followers; extracted from synthesio
    • favorits - number of likes, favorites, etc.
    • retweets - number of retweets; from synthesio
  • actor

    • uri - a URI uniquely identifying the actor
    • source - twitter/facebook/web/...
    • type - type of the actor
    • name - human-readable name of the actor
    • description - description of the actor
    • key - key from the original social network
  • topic

    • uri - unique identifier of the topic
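
As an illustration, the field lists above could be mirrored by TypeScript types along the following lines. This is only a sketch, not the package's actual API: the names `Doc`, `Actor`, `Topic`, `Sentiment` and `parseTags`, the choice of `string` and `number` types, and the decision to mark `country` as optional are all assumptions. Only the field names themselves come from the lists above.

```typescript
// Hypothetical types; only the field names are taken from the README.
type Sentiment = "positive" | "negative" | "neutral" | "undefined";

interface Doc {
  uri: string;
  source: string;             // e.g. "twitter", "facebook", "web"
  date: string;               // assumed to be an ISO 8601 string
  href: string;
  content: string;
  tags: string;               // comma-separated list of tags
  type: string;
  country?: string;           // could be empty
  language: string;
  sentiment: Sentiment;
  influence_document: number;
  influence_author: number;
  followers: number;
  favorits: number;           // spelled as in the field list
  retweets: number;
}

interface Actor {
  uri: string;
  source: string;
  type: string;
  name: string;
  description: string;
  key: string;                // key from the original social network
}

interface Topic {
  uri: string;
}

// Split the comma-separated `tags` field into a list of trimmed, non-empty tags.
function parseTags(tags: string): string[] {
  return tags
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
}
```

A helper such as `parseTags` is useful because `tags` is stored as a single string rather than as a list, so consumers have to split it themselves.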

Relations:

  • author - relation between the document and the actor authoring the document
  • refersTo - relation between two documents; e.g. one document contains an HTTP link to another one
  • mention - a document referencing an actor; e.g. Twitter mentions such as @john
  • partOf - relation between document and topic

These relations can be represented as follows:

  doc --author--> actor
  doc --refersTo--> doc
  doc --mention--> actor
  doc --partOf--> topic
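
To make the "table-like structures" more concrete, the four relations listed above could be flattened into a CSV relationship table. The sketch below uses the `:START_ID`, `:END_ID` and `:TYPE` header fields understood by Neo4j's bulk import tool. Whether this package emits exactly this layout is not stated here, so treat it as an assumption. The names `Relation`, `csvField` and `relationsToCsv` are hypothetical, as is the choice to treat the document as the start node of every relation.

```typescript
// Hypothetical shape of a relation row; not taken from the package itself.
type RelationType = "author" | "refersTo" | "mention" | "partOf";

interface Relation {
  from: string;        // URI of the document the relation starts from (assumed direction)
  type: RelationType;
  to: string;          // URI of the target actor, document or topic
}

// Quote a CSV field: wrap it in double quotes and double any inner quotes.
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

// Render relations as CSV text with a Neo4j-import-style header line.
function relationsToCsv(relations: Relation[]): string {
  const header = ":START_ID,:END_ID,:TYPE";
  const rows = relations.map((r) =>
    [r.from, r.to, r.type].map(csvField).join(",")
  );
  return [header, ...rows].join("\n");
}

// Example with made-up URIs.
const csv = relationsToCsv([
  { from: "doc:1", type: "author", to: "actor:1" },
  { from: "doc:1", type: "partOf", to: "topic:1" },
]);
console.log(csv);
```

Quoting every field keeps the output valid even when a URI contains a comma or a quote character.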

The Main Package Class

The main class of this package is Neo4JImportBatchGenerator. It is responsible for transforming TripleModel instances into individual entities and relations.
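
The kind of transformation described above can be sketched as follows. This is not the code of Neo4JImportBatchGenerator, and it does not use the real TripleModel API, which is not documented here. The `Triple` and `Batch` shapes, the `splitTriples` function, and the rule "a predicate named after one of the four relations becomes a relation, anything else becomes an entity field" are all assumptions made for illustration.

```typescript
// Hypothetical triple shape standing in for the contents of a TripleModel.
interface Triple {
  subject: string;    // URI of the entity the statement is about
  predicate: string;  // field name or relation name
  object: string;     // field value or URI of the related entity
}

// Relation names taken from the "Relations" list in this README.
const RELATION_PREDICATES = new Set(["author", "refersTo", "mention", "partOf"]);

interface Batch {
  entities: Map<string, Record<string, string>>; // entity URI -> field values
  relations: Triple[];                           // one entry per relation
}

// Separate triples into per-entity field records and relation rows.
function splitTriples(triples: Triple[]): Batch {
  const entities = new Map<string, Record<string, string>>();
  const relations: Triple[] = [];
  for (const triple of triples) {
    if (RELATION_PREDICATES.has(triple.predicate)) {
      relations.push(triple);
    } else {
      const fields = entities.get(triple.subject) || {};
      fields[triple.predicate] = triple.object;
      entities.set(triple.subject, fields);
    }
  }
  return { entities, relations };
}

// Example with made-up data.
const batch = splitTriples([
  { subject: "doc:1", predicate: "source", object: "twitter" },
  { subject: "doc:1", predicate: "language", object: "en" },
  { subject: "doc:1", predicate: "author", object: "actor:1" },
]);
console.log(batch.entities.get("doc:1"), batch.relations.length);
```

Grouping fields by subject URI gives one row per entity, which is the shape a batch import table needs.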
