2.5.2 • Published 8 months ago

ldbc-snb-enhancer v2.5.2

Weekly downloads
-
License
MIT
Repository
github
Last release
8 months ago

LDBC SNB Enhancer

Build status Coverage Status npm version

Generates auxiliary data based on an LDBC SNB social network dataset.

For example, it can generate fake names for existing people such as:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.

All auxiliary data that is generated is annotated with the predicate http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator, which can refer to an existing person, that acts as a malicious actor.

Installation

$ npm install -g ldbc-snb-enhancer

or

$ yarn global add ldbc-snb-enhancer

Usage

Invoke from the command line

This tool can be used on the command line as ldbc-snb-enhancer, which takes as single parameter the path to a config file:

$ ldbc-snb-enhancer path/to/config.json

Config file

The config file that should be passed to the command line tool has the following JSON structure:

{
  "@context": "https://linkedsoftwaredependencies.org/bundles/npm/ldbc-snb-enhancer/^2.0.0/components/context.jsonld",
  "@id": "urn:ldbc-snb-enhancer:default",
  "@type": "Enhancer",
  "personsPath": "path/to/social_network_person_0_0.ttl",
  "activitiesPath": "path/to/social_network_activity_0_0.ttl",
  "staticPath": "path/to/social_network_static_0_0.ttl",
  "destinationPathData": "path/to/social_network_auxiliary.ttl",
  "logger": {
    "@type": "LoggerStdout"
  },
  "dataSelector": {
    "@type": "DataSelectorRandom",
    "seed": 12345
  },
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3
    }
  ]
}

The important parts in this config file are:

  • "personsPath": Path to the persons output file of LDBC SNB.
  • "destinationPath": Path of the destination file to create.
  • "logger": An optional logger for tracking the generation process. (LoggerStdout prints to standard output)
  • "dataSelector": A strategy for selecting values from a collection. (DataSelectorRandom selects random values based on a given seed)
  • "handlers": An array of enhancement handlers, which are strategies for generating data.
  • "parameterEmitterPosts"": An optional parameter emitter for the extracted posts.
  • "parameterEmitterComments"": An optional parameter emitter for the extracted comments.

Configure

Handlers

The following handlers can be configured.

Person Names Handler

Generate additional names for existing people. People are selected randomly from the friends that are known by the given person.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for a name to be generated. The number of new names will be the number of people times this chance, where names are randomly assigned to names.
  • "parameterEmitter"": An optional parameter emitter.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person> 
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.

Person Names in Cities Handler

Generate additional names for existing people where the malicious creator refers to a city. Cities will be selected based on the city the random person is located in.

This is a variant of the Person Names Handler.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNamesCities",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for a name to be generated. The number of new names will be the number of people times this chance, where names are randomly assigned to names.
  • "parameterEmitter"": An optional parameter emitter.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000021990232555617> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://dbpedia.org/resource/Dingzhou>.

Person Noise Handler

Generate additional triples attached to existing people. People are selected randomly.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNoise",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for an additional triple to be generated. The number of new triples will be the number of people times this chance. This value can be larger than 1 to generate multiple triples per person.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471-noise-1> 
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/noise> "NOISE-1";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471>.

Posts Handler

Generate posts and assign them to existing people.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPosts",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for posts to be generated. The number of posts will be the number of people times this chance, where people are randomly assigned to posts.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post-fake2967> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Post>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "2967";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000004398046512167>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000010995116283441>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/creationDate> "2021-02-22T10:39:31.595Z";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> "200.200.200.200";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/length> "17";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/language> "en";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locatedIn> <http://dbpedia.org/resource/Belgium>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasTag> <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/Georges_Bizet>.

Comments Handler

Generate comments and assign them to existing people as reply to existing posts

{
  "handlers": [
    {
      "@type": "EnhancementHandlerComments",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for comments to be generated. The number of comments will be the number of people times this chance, where people are randomly assigned to comments.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/comment-fake9> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "9";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000008796093024878>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348839704>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/creationDate> "2021-02-22T10:39:31.595Z";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> "200.200.200.200";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/length> "17";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/replyOf> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000274877908873>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/language> "en";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locatedIn> <http://dbpedia.org/resource/Belgium>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasTag> <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/Georges_Bizet>.

Post Contents Handler

Generate additional contents for existing posts.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostContents",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for post content to be generated. The number of new post contents will be the number of posts times this chance, where contents are randomly assigned to posts. @range {double}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000206158430485> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "962072675046";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000017592186048516>.

Post Authors Handler

Generate additional authors for existing posts.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostAuthors",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for a post author to be generated. The number of new post authors will be the number of posts times this chance, where authors are randomly assigned to posts.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000962072675046> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "962072675046";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000006597069770017>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000019791209301543>.

Vocabuary Handler

Generates vocabulary information.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerVocabulary"
    }
  ]
}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> a rdf:Property.
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> a rdf:Property.
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person> a rdfs:Class.

Vocabulary Predicate Domain Handler

Generates vocabulary information about the domain of a specific predicate.

{
  "handlers": [
     {
      "@type": "EnhancementHandlerVocabularyPredicateDomain",
      "classIRI": "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment",
      "predicateIRI": "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP"
    }
  ]
}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> rdfs:domain
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment>.

Posts Multiply Handler

Multiply the number of posts by a given amount.

{
  "handlers": [
     {
       "@type": "EnhancementHandlerPostsMultiply",
       "factor": 10
    }
  ]
}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000618475290624000001>
    a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Post>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "618475290624000001";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "About Rupert Murdoch ... COPY 1";

Parameter Emitters

Certain handlers allow their internal parameters to be emitted.

Such parameters may then for instance be valuable as query substitution parameters.

CSV Parameter Emitter

Emits parameters as CSV files.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3,
      "parameterEmitter": {
        "@type": "ParameterEmitterCsv",
        "destinationPath": "parameters-person-names.csv"
      }
    }
  ]
}

License

This software is written by Ruben Taelman.

This code is released under the MIT license.