1.0.7 • Published 7 years ago

PoDiGG

POpulation DIstribution-based Gtfs Generator

A generator of realistic public transport datasets, which are serialized as GTFS.

It is based on five sub-generators:

  • Region: A geographical area of cells where each cell contains a population value.
  • Stops: Tagging of cells with stop or no stop.
  • Edges: Adding transport edges between stops.
  • Routes: Routes over one or more edges.
  • Connections: Instantiation of routes at times.

Install

This generator is a Node.js application that can be installed by running:

[sudo] npm install -g podigg

Usage

Command line

The easiest way to run the generator is using the command line tool:

podigg [output folder [path to a JSON config file]]

The default output folder is output_data.

This config file contains parameters for the generator, as explained below. Example of a config file:

{
    "seed": 1,
    "stops:stops": 100,
    "connections:connections": 3000
}

Alternatively, the generator can also be configured using environment variables, as explained below. In that case, the generator must be called as follows:

podigg-env [output folder]

From code

The generator can be included in your application as follows:

const PodiggGenerator = require('podigg');
new PodiggGenerator({
    "seed": 1,
    "stops:stops": 100,
    "connections:connections": 3000
}).generate('output_data');

Docker

Downloading and running the container from the Docker hub:

docker pull podigg/podigg
docker run --rm -it -v $(pwd)/docker-out:/output_data -e GTFS_GEN_SEED=100 podigg/podigg

Building and running the container from this repo:

git clone git@github.com:PoDiGG/podigg.git
cd podigg
docker build -t podigg .
docker run --rm -it -v $(pwd)/docker-out:/output_data -e GTFS_GEN_SEED=100 podigg

Parameters must be passed using environment variables.

Parameters

All parameters are scoped by their generator name in lower-case, except for the general parameters. For example, choosing a region's latitude offset is done with the parameter region:lat_offset.

When configuring parameters via environment variables, each parameter should be defined with the prefix GTFS_GEN_, followed by the generator name and __ (both omitted for general parameters), and then the parameter name. The generator and parameter names can be in either upper or lower case. For example, choosing a region's latitude offset is done with the variable GTFS_GEN_REGION__LAT_OFFSET, and choosing the seed is done with GTFS_GEN_SEED.
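
The naming rule above can be sketched as a small helper. This is only an illustration: toEnvName is a hypothetical function and not part of podigg. It assumes config keys have the form name or generator:name, as in the examples above.

```javascript
// Hypothetical helper (not part of podigg): maps a config key such as
// "region:lat_offset" to the environment variable name described above.
function toEnvName(key) {
  const parts = key.split(':');
  if (parts.length > 2) {
    throw new Error('Expected "name" or "generator:name", got: ' + key);
  }
  // General parameters have no generator scope, so no "__" separator is added.
  const scoped = parts.length === 2 ? parts[0] + '__' + parts[1] : parts[0];
  return 'GTFS_GEN_' + scoped.toUpperCase();
}

console.log(toEnvName('region:lat_offset')); // GTFS_GEN_REGION__LAT_OFFSET
console.log(toEnvName('seed'));              // GTFS_GEN_SEED
```

Upper case is used here for readability; according to the text above, lower-case generator and parameter names are accepted as well.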

General

Name | Default Value | Description
seed | 1 | The random seed

Region

Several region generators exist, as explained below. One of them must be selected.

Config prefix: region:

Name | Default Value | Description
region_generator | isolated | Name of a region generator. (isolated, noisy or region)
lat_offset | 0 | The value to add to all generated latitudes
lon_offset | 0 | The value to add to all generated longitudes
cells_per_latlon | 100 | The precision of the cells: how many cells fit in one degree of latitude or longitude.

File

Name | Default Value | Description
region_file_path | null | Path to a CSV file containing the cells. This can also be the filename of an internal region file from the data directory, for example region_BE.csv. Expected columns: (x:integer, y:integer, lat:float, long:float, density:float)
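
As a rough illustration of the expected columns, the sketch below builds CSV text for a tiny grid. Several things here are assumptions and not taken from podigg itself: the presence of a header row, the comma separator, the mapping of x to longitude and y to latitude, and the made-up density function. buildRegionCsv is a hypothetical helper.

```javascript
// Hypothetical helper (not part of podigg): builds CSV text with the columns
// x, y, lat, long, density for a sizeX by sizeY grid of cells.
// Assumptions: a header row is present, x maps to longitude, y to latitude.
function buildRegionCsv(sizeX, sizeY, cellsPerLatLon, latOffset, lonOffset, densityAt) {
  const lines = ['x,y,lat,long,density'];
  for (let x = 0; x < sizeX; x++) {
    for (let y = 0; y < sizeY; y++) {
      // One degree is divided into cellsPerLatLon cells.
      const lat = latOffset + y / cellsPerLatLon;
      const lon = lonOffset + x / cellsPerLatLon;
      lines.push([x, y, lat, lon, densityAt(x, y)].join(','));
    }
  }
  return lines.join('\n');
}

// A 2x2 grid with an invented density of x + y per cell.
const csv = buildRegionCsv(2, 2, 100, 50, 4, (x, y) => x + y);
console.log(csv);
```

Comparing the output against one of the internal region files, such as region_BE.csv, is the safest way to confirm the exact layout before using a custom file.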

Noisy

A noise-based generator, where population values are influenced by nearby cells.

Name | Default Value | Description
size_x | 300 | The width of the region in number of cells
size_y | 300 | The height of the region in number of cells
pop_average | 0 | The average population value for a cell
pop_deviation | 10 | The standard deviation of the population value for a cell

Isolated

A generator that creates a given number of circular clusters of population. The population density is high at the center of the cluster and decreases to zero when going to the border of the cluster.

Name | Default Value | Description
size_x | 300 | The width of the region in number of cells
size_y | 300 | The height of the region in number of cells
pop_average | 0 | The average population value for a cell
pop_deviation | 10 | The standard deviation of the population value for a cell
pop_clusters | 50 | The number of clusters to generate.
max_radius | 50 | The maximum cluster radius.
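
The idea of a density that is high at the cluster center and falls to zero at the border can be sketched as follows. The linear falloff and the clusterDensity helper are assumptions made for illustration; the actual falloff curve used by the generator is not described here.

```javascript
// Hypothetical helper (not part of podigg): density contribution of one
// circular cluster at cell (x, y).
// Assumption: linear falloff from `peak` at the center to 0 at the border.
function clusterDensity(x, y, cluster) {
  const dist = Math.hypot(x - cluster.cx, y - cluster.cy);
  if (dist >= cluster.radius) {
    return 0; // outside the cluster, or exactly on its border
  }
  return cluster.peak * (1 - dist / cluster.radius);
}

const cluster = { cx: 10, cy: 10, radius: 4, peak: 8 };
console.log(clusterDensity(10, 10, cluster)); // 8 (center)
console.log(clusterDensity(10, 12, cluster)); // 4 (halfway to the border)
console.log(clusterDensity(10, 14, cluster)); // 0 (on the border)
```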

Stops

The generation of stops

Config prefix: stops:

Name | Default Value | Description
stops | 600 | How many stops should be generated
min_station_size | 0.01 | The minimum population value in a cell for a station to form
max_station_size | 30 | The maximum population value in a cell for a station to form
start_stop_choice_power | 4 | The power for selecting cells with a large population value as stops
min_interstop_distance | 1 | The minimum distance between stops in number of cells
factor_stops_post_edges | 0.66 | The factor of stops that should be generated after edge generation
edge_choice_power | 2 | The power for selecting longer edges to generate stops on
stop_around_edge_choice_power | 4 | The power for selecting cells with a large population value around edges as stops
stop_around_edge_radius | 2 | The radius in number of cells around an edge to select points from
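
One plausible reading of the "choice power" parameters is a weighted random selection in which each candidate's weight is its value raised to the given power, so higher powers favor large values more strongly. The sketch below illustrates that reading only; pickWeightedByPower is a hypothetical helper, and the generator's actual selection procedure may differ.

```javascript
// Hypothetical helper (not part of podigg): picks an index with probability
// proportional to value^power. `rand` is a number in [0, 1), passed in
// explicitly so that the result is reproducible.
function pickWeightedByPower(values, power, rand) {
  const weights = values.map((v) => Math.pow(v, power));
  const total = weights.reduce((a, b) => a + b, 0);
  let threshold = rand * total;
  for (let i = 0; i < weights.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) {
      return i;
    }
  }
  return weights.length - 1; // guards against floating-point rounding
}

const populations = [1, 2, 3];
// With power 4 the weights are 1, 16 and 81, so the largest cell dominates.
console.log(pickWeightedByPower(populations, 4, 0.5)); // 2
// With power 0 all cells are equally likely.
console.log(pickWeightedByPower(populations, 0, 0.5)); // 1
```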

Edges

The generation of edges

Config prefix: edges:

Name | Default Value | Description
max_intracluster_distance | 100 | The maximum distance stops in one cluster can have from each other
max_intracluster_distance_growthfactor | 0.1 | The lower this value, the larger the chance that closer stops will be clustered first before further away stations
post_cluster_max_intracluster_distancefactor | 1.5 | The larger the value, the larger the chance that a stop will be connected to more stops
loosestations_neighbourcount | 3 | The number of neighbours around a loose station that should define its area
loosestations_max_range_factor | 0.3 | The maximum range to check around a loose station relative to the total region size
loosestations_max_iterations | 10 | The max number of iterations to try to connect one loose station
loosestations_search_radius_factor | 0.5 | The number to multiply with the loose station neighbourhood size to get the search radius for each step

Routes

The generation of trips and routes

Config prefix: routes:

Name | Default Value | Description
routes | 1000 | The number of routes to generate
largest_stations_fraction | 0.05 | The fraction of (largest) stops between which routes need to be formed
penalize_station_size_area | 10 | The area in which stop sizes should be penalized
max_route_length | 10 | The maximum number of edges a route can have in the macro-step. Larger values make this generator slower.
min_route_length | 4 | The minimum number of edges a route must have in the macro-step

Connections

The generation of connections

Config prefix: connections:

Name | Default Value | Description
time_initial | 0 | The initial timestamp (ms) of trip starting times
time_final | 24 * 3600000 | The final timestamp (ms) of trip starting times
connections | 30000 | The number of connections to generate
stop_wait_min | 60000 | The minimum waiting time per stop in milliseconds
stop_wait_size_factor | 60000 | The factor in milliseconds of stop waiting time to add depending on the station size
route_choice_power | 2 | The power for selecting longer routes for instantiating connections
vehicle_max_speed | 160 | The maximum speed of a vehicle in km/h, used to calculate the duration of a connection
vehicle_speedup | 1000 | The vehicle speedup in km/(h^2), used to calculate the duration of a connection
hourly_weekday_distribution | [0.05,0.01,0.01,0.48,2.46,5.64,7.13,6.23,5.44,5.43,5.41,5.49,5.42,5.41,5.57,6.70,6.96,6.21,5.40,4.95,4.33,3.31,1.56,0.42] | The chance (percentage) for each hour to have a connection on a weekday
hourly_weekend_distribution | [0.09,0.01,0.01,0.08,0.98,3.56,5.23,5.79,5.82,5.89,5.84,5.91,5.88,5.95,5.87,5.95,5.89,5.96,5.92,5.94,5.62,4.61,2.45,0.76] | The chance (percentage) for each hour to have a connection on a weekend day
delay_chance | 0 | The chance (between 0 and 1) that a connection will have a delay. 0 will not produce any delays (default)
delay_max | 3600000 | The maximum delay in milliseconds
delay_choice_power | 1 | Higher values mean a higher chance of larger delays
delay_reasons | { 'td:DamagedVehicle': 0.4, 'td:Strike': 0.2, 'td:Accident': 0.2, 'td:BadWeather': 0.1, 'td:Obstruction': 0.1} | Default reasons for having delays with their respective chance. Keys must use the td: prefix (http://purl.org/td/transportdisruption#)
delay_reduction_duration_fraction | 0.1 | The maximum fraction of connection duration that can be subtracted when there is a delay
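
To get a feel for what the hourly distributions imply, the sketch below estimates how many of the default 30000 connections would start in each hour of a weekday. It assumes the percentages are normalized by their sum (the default weekday values add up to roughly 100) and that connections are spread proportionally. expectedPerHour is a hypothetical helper, and the generator itself samples randomly and may treat weekdays and weekend days separately.

```javascript
// Default hourly_weekday_distribution, copied from the table above.
const weekday = [0.05, 0.01, 0.01, 0.48, 2.46, 5.64, 7.13, 6.23, 5.44, 5.43,
  5.41, 5.49, 5.42, 5.41, 5.57, 6.70, 6.96, 6.21, 5.40, 4.95, 4.33, 3.31,
  1.56, 0.42];

// Hypothetical helper (not part of podigg): expected number of connections
// per hour, assuming the distribution is normalized by its own sum.
function expectedPerHour(distribution, total) {
  const sum = distribution.reduce((a, b) => a + b, 0);
  return distribution.map((p) => (total * p) / sum);
}

const perHour = expectedPerHour(weekday, 30000);
const peakHour = perHour.indexOf(Math.max(...perHour));
console.log('Busiest hour of the day (0-23):', peakHour);
console.log('Expected connections in that hour:', Math.round(perHour[peakHour]));
```

With the default values, the busiest hour is the one starting at 06:00, followed by a second peak in the late afternoon.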

Query Set

Optionally, PoDiGG can also generate realistic route planning query sets based on the generated dataset. For this, the queryset:generate option must be set to true.

Config prefix: queryset:

Name | Default Value | Description
start_stop_choice_power | 4 | Higher values mean a higher chance of selecting larger stations as starting stations
query_count | 100 | The number of queries that should be generated
time_initial | 0 | The initial timestamp (ms)
time_final | 24 * 3600000 | The final timestamp (ms)
max_time_before_departure | 3600000 | The maximum time (ms) before a given departure time at which a query for that departure can be issued
hourly_weekday_distribution | [0.05,0.01,0.01,0.48,2.46,5.64,7.13,6.23,5.44,5.43,5.41,5.49,5.42,5.41,5.57,6.70,6.96,6.21,5.40,4.95,4.33,3.31,1.56,0.42] | The chance (percentage) for each hour to have a connection on a weekday
hourly_weekend_distribution | [0.09,0.01,0.01,0.08,0.98,3.56,5.23,5.79,5.82,5.89,5.84,5.91,5.88,5.95,5.87,5.95,5.89,5.96,5.92,5.94,5.62,4.61,2.45,0.76] | The chance (percentage) for each hour to have a connection on a weekend day
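
A config file that enables query set generation might be prepared as sketched below. The keys come from the parameters documented above; the file name podigg-config.json and its location in the temporary directory are arbitrary choices made for this example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Config keys use the "generator:parameter" form described under Parameters.
const config = {
  'seed': 1,
  'queryset:generate': true,
  'queryset:query_count': 100,
  'queryset:max_time_before_departure': 3600000
};

// Arbitrary file location chosen for this example.
const configPath = path.join(os.tmpdir(), 'podigg-config.json');
fs.writeFileSync(configPath, JSON.stringify(config, null, 4));

// The file can then be passed to the command line tool:
console.log('podigg output_data ' + configPath);
```

The same object could also be passed directly to the PodiggGenerator constructor when the generator is used from code.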

License

The PoDiGG generator is written by Ruben Taelman.

This code is copyrighted by Ghent University – imec and released under the MIT license.
