0.0.1 • Published 10 years ago

bloom-harvesting-neo4j-import v0.0.1

License
ISC
Repository
gitlab

Neo4J Import

This project contains the classes that transform TripleModel instances into the table-like structures needed for batch import of data into a Neo4J database.
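
To make "table-like structures" concrete, here is a minimal sketch that writes a list of records as a header line plus one delimited line per record. The comma-separated format, the helper names `escapeCsv` and `toTable`, and the sample data are all assumptions made for illustration; the file format the package actually produces is not stated here.

```javascript
// Hypothetical helper: escape one value for a comma-separated line.
function escapeCsv(value) {
  const text = value === undefined || value === null ? "" : String(value);
  // Quote the value if it contains a delimiter, a quote or a line break.
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: a header line followed by one line per record.
function toTable(columns, records) {
  const lines = [columns.map(escapeCsv).join(",")];
  for (const record of records) {
    lines.push(columns.map((column) => escapeCsv(record[column])).join(","));
  }
  return lines.join("\n");
}

// Invented sample data, for illustration only.
const sampleRecords = [
  { uri: "doc:1", source: "twitter", content: 'He said "hi", twice' },
  { uri: "doc:2", source: "web" },
];
console.log(toTable(["uri", "source", "content"], sampleRecords));
```

Missing values become empty cells, so every line keeps the same number of columns, which is what a batch importer typically expects.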

Data Model

This section describes the main model entities and the relations between them. Entities:

  • doc - a document; it could be a tweet, Facebook post, blog post, comment, etc.
  • actor - a person or service that creates documents; it could also be a Facebook group or a media site generating posts
  • topic - a topic that documents can be part of

Each entity contains the following fields:

  • doc

    • uri - the main URI uniquely identifying the document
    • source - source of the document (twitter, facebook, web, etc.)
    • date - creation date of the document
    • href - a URL giving access to the document
    • content - content of the document
    • tags - character string containing a comma-separated list of tags
    • type - type of the document; from Synthesio
    • country - country of the document; may be empty; from Synthesio
    • language - language of the document; from Synthesio
    • sentiment - sentiment associated with the document; positive/negative/neutral/undefined
    • influence_document - numerical influence value of the document; from Synthesio
    • influence_author - numerical influence value of the document author; from Synthesio
    • followers - number of followers of the author; from Synthesio
    • favorits - likes, etc.
    • retweets - number of retweets; from Synthesio
  • actor

    • uri - a URI uniquely identifying the actor
    • source - twitter/facebook/web/...
    • type - type of the actor
    • name - human-readable name of the actor
    • description - description of the actor
    • key - key from the original social network
  • topic

    • uri - unique identifier of the topic
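
As an illustration of the field lists above, the following sketch builds plain records that always carry exactly the listed fields. The field names come from the lists; the helper `makeRecord`, the choice of an empty string for missing values, and the sample values are assumptions, not part of the package.

```javascript
// Field names copied from the lists above.
const DOC_FIELDS = [
  "uri", "source", "date", "href", "content", "tags", "type", "country",
  "language", "sentiment", "influence_document", "influence_author",
  "followers", "favorits", "retweets",
];
const ACTOR_FIELDS = ["uri", "source", "type", "name", "description", "key"];
const TOPIC_FIELDS = ["uri"];

// Hypothetical helper: keep only the listed fields, in the listed order,
// and fill missing ones with an empty string (an assumed convention).
function makeRecord(fields, values) {
  const record = {};
  for (const field of fields) {
    record[field] = values[field] !== undefined ? values[field] : "";
  }
  return record;
}

// Invented sample values, for illustration only.
const exampleActor = makeRecord(ACTOR_FIELDS, {
  uri: "actor:john",
  source: "twitter",
  name: "John",
  unexpected: "dropped because it is not a listed field",
});
console.log(exampleActor);
```

Fixing the field order per entity type keeps every record of that type aligned with the same set of columns.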

Relations:

  • author - relation between a document and the actor who authored it
  • refersTo - relation between two documents; e.g. one document contains an HTTP reference to another
  • mention - a document referencing an actor; e.g. Twitter mentions such as @john
  • partOf - relation between a document and a topic

These relations can be represented as follows:
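
One way to write the four relations down, as a minimal sketch: each relation is an edge between two entity kinds. The endpoint kinds are read from the descriptions in the list; the direction (document first), the helper `makeRelation`, and the `kind`, `start` and `end` property names are assumptions made for illustration.

```javascript
// Endpoint kinds per relation, read from the Relations list.
// The direction (document first) is an assumption; the source does not state it.
const RELATION_ENDPOINTS = {
  author: ["doc", "actor"],
  refersTo: ["doc", "doc"],
  mention: ["doc", "actor"],
  partOf: ["doc", "topic"],
};

// Hypothetical helper: build an edge record after checking the endpoint kinds.
function makeRelation(type, from, to) {
  if (!Object.prototype.hasOwnProperty.call(RELATION_ENDPOINTS, type)) {
    throw new Error("Unknown relation: " + type);
  }
  const endpoints = RELATION_ENDPOINTS[type];
  if (from.kind !== endpoints[0] || to.kind !== endpoints[1]) {
    throw new Error(type + " expects " + endpoints[0] + " -> " + endpoints[1]);
  }
  return { type: type, start: from.uri, end: to.uri };
}

// Invented sample nodes, for illustration only.
const tweetNode = { kind: "doc", uri: "doc:1" };
const johnNode = { kind: "actor", uri: "actor:john" };
console.log(makeRelation("mention", tweetNode, johnNode));
```

Checking the endpoint kinds up front rejects edges such as a partOf relation pointing at an actor before they reach an import file.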

The Main Package Class

The main class of this package is Neo4JImportBatchGenerator. It is responsible for transforming TripleModel instances into individual entities and relations.
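
The kind of transformation described above can be sketched as follows. This is not the Neo4JImportBatchGenerator implementation: the TripleModel API is not shown here, so the sketch assumes triples are plain `{ subject, predicate, object }` objects, and the function `splitTriples` and its output shape are hypothetical.

```javascript
// Predicates treated as relations (names taken from the Relations list).
const RELATION_NAMES = ["author", "refersTo", "mention", "partOf"];

// Hypothetical stand-in for the transformation: relation predicates become
// edges, every other predicate becomes a property of the subject entity.
function splitTriples(triples) {
  const entities = new Map(); // uri -> { uri, ...properties }
  const relations = [];       // { type, start, end }
  for (const { subject, predicate, object } of triples) {
    if (!entities.has(subject)) entities.set(subject, { uri: subject });
    if (RELATION_NAMES.includes(predicate)) {
      // The target entity is only created if it appears as a subject elsewhere.
      relations.push({ type: predicate, start: subject, end: object });
    } else {
      entities.get(subject)[predicate] = object;
    }
  }
  return { entities: Array.from(entities.values()), relations: relations };
}

// Invented sample triples, for illustration only.
const splitResult = splitTriples([
  { subject: "doc:1", predicate: "source", object: "twitter" },
  { subject: "doc:1", predicate: "author", object: "actor:john" },
  { subject: "actor:john", predicate: "name", object: "John" },
]);
console.log(JSON.stringify(splitResult, null, 2));
```

Here the three sample triples yield two entities (doc:1 and actor:john) and one author relation between them.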
