# worker-map-reduce v0.0.3

MapReduce with Web Workers

This is a MapReduce implementation for the browser. It uses Web Workers to run tasks in parallel.
Features (see Release Notes at the bottom):

- Spawns a limited number of workers.
- Mapper and reducer tasks are sent to workers. (REDUCER NOT YET)
- Sends a new task to a worker as soon as it becomes free. (ALMOST)
- Fault tolerance: if a worker fails with a runtime exception, its task will be re-sent. (NOT YET)
WorkerMapReduce uses `Task` from the `data.task` module and the immutable list from `immutable`.

For how to use `SharedArrayBuffer` to pass data to workers more efficiently, see the demo in `demo/array-max/` as well as the `worker-task-runner` examples. Multiple workers can work with the same `SharedArrayBuffer` as long as they operate on different parts of the buffer.
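For example, two workers could safely share one buffer by each writing through a typed-array view over its own range. A single-threaded sketch of that partitioning (the sizes and values here are arbitrary):

```js
// Sketch: two consumers share one SharedArrayBuffer but write to
// disjoint halves, so no locking is needed.
const buffer = new SharedArrayBuffer(8 * Float64Array.BYTES_PER_ELEMENT);

// Each "worker" gets a view over its own, non-overlapping region.
const firstHalf = new Float64Array(buffer, 0, 4);
const secondHalf = new Float64Array(buffer, 4 * Float64Array.BYTES_PER_ELEMENT, 4);

firstHalf.fill(1);  // worker A writes only here
secondHalf.fill(2); // worker B writes only here

// A full view over the buffer sees both writes.
const all = new Float64Array(buffer);
console.log(Array.from(all)); // [1, 1, 1, 1, 2, 2, 2, 2]
```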
## Intro

### Two models of parallelism

- Data parallelism <<< current use case with parallelized mappers
  - The same task is executed several times in parallel on different elements of the same dataset.
- Task parallelism
  - Different tasks are executed in parallel.
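The distinction, sketched in plain JavaScript (no workers here; the functions are purely illustrative):

```js
const data = [1, 2, 3, 4];

// Data parallelism: the SAME function applied to different elements.
// Each call is independent, so each could run on a separate worker.
const square = x => x * x;
const squares = data.map(square); // [1, 4, 9, 16]

// Task parallelism: DIFFERENT functions running at the same time.
const sumTask = xs => Promise.resolve(xs.reduce((a, b) => a + b, 0));
const maxTask = xs => Promise.resolve(Math.max(...xs));
Promise.all([sumTask(data), maxTask(data)])
  .then(([sum, max]) => console.log(sum, max)); // logs: 10 4
```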
### Use cases

- Small data, expensive calculation <<< main demo
  - Can pass data via `postMessage`.
- Large data, fast calculation
  - There is no benefit to sending the data with `postMessage`; consider having the spawned process retrieve the data itself (e.g. from an API).
  - Use `SharedArrayBuffer`.
- Large data, slow calculation <<< demo/array-max
  - Use `SharedArrayBuffer`.
## Usage

### Main demo

The task is:

- given an array of strings,
- run an expensive (simulated) calculation per string with a provided mapper function,
- and calculate a sum in the provided reducer.

To see the demo, load `demo/index.html` in a browser and open the console. You will see something like:
```
[mapReduceWorker] starting: list.size=5, workerPoolSize=3
...
Exec: took 2.243 seconds.
Result: 73
```

### Example
To try it yourself, you need to create two modules that export a task: a mapper and a reducer. Then create your app:
```js
// Main MapReduce module:
const mapReduce = require('worker-map-reduce/src/map-reduce')

// Your reducer module that, given data, returns a Task:
const reducer = require('./reducer')

// The name of your mapper module (given data, it returns a Task), which will
// be loaded via a module loader (StealJS) into a worker:
const mapperUrl = 'demo/mapper'

// Data array:
const array = [1, 2, 3]

// Main app:
const app = mapReduce(mapperUrl, reducer, array)
app.fork(err => console.log('Error: ', err),
         res => console.log('Result: ', res))
```

### Parallelizing a task for MapReduce
The simplest example applies MapReduce to an array of data so that the mapper is executed once per element. But this is often not the case.
What you can do is:
- Divide your data into chunks.
- Run your mapper on every chunk of data in parallel.
- Run your reducer on the results collected from mappers.
See the following demo: `demo/array-max/`.
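A plain, single-threaded sketch of that chunking scheme, mirroring the max-of-an-array task from `demo/array-max/` (the helper names are made up):

```js
// Split an array into `n` chunks of roughly equal size.
const chunk = (array, n) => {
  const size = Math.ceil(array.length / n);
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
};

// Mapper: runs once per chunk (in the real setup, inside a worker).
const chunkMax = xs => Math.max(...xs);

// Reducer: combines the per-chunk results collected from the mappers.
const overallMax = maxes => Math.max(...maxes);

const data = [3, 41, 7, 15, 8, 23, 4, 16];
const result = overallMax(chunk(data, 3).map(chunkMax));
console.log(result); // 41
```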
## Types and Signatures

Note: using Haskell notation.

```js
// Types:
// Reducer a b = List a -> Task b
// Mapper a b = a -> Task b

// mapReduce :: MapperUrl -> Reducer a b -> List a -> Task b
// mapReduce :: String -> (List a -> Task b) -> List a -> Task b
const mapReduce = require('worker-map-reduce')

// mapper :: a -> Task b
const mapper = require('your-mapper-module')

// reducer :: List a -> Task b
const reducer = require('your-reducer-module')
```
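Concretely, a mapper and a reducer following these signatures might look like the sketch below. The `Task` constructor here is a minimal stand-in for `data.task`'s (a real module would `require('data.task')`), and the doubling/summing work is purely illustrative:

```js
// Minimal stand-in for data.task's Task: fork(onRejected, onResolved).
function Task(computation) {
  this.fork = (reject, resolve) => computation(reject, resolve);
}

// mapper :: a -> Task b -- runs once per element, inside a worker.
const mapper = x => new Task((reject, resolve) => resolve(x * 2));

// reducer :: List a -> Task b -- runs once over all mapper results.
const reducer = results =>
  new Task((reject, resolve) => resolve(results.reduce((a, b) => a + b, 0)));

// Forking the tasks by hand (mapReduce does this via workers):
mapper(21).fork(console.error, v => console.log(v));         // 42
reducer([1, 2, 3]).fork(console.error, v => console.log(v)); // 6
```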
## Release Notes

### 0.0.2

Worker pool:

- Schedule tasks for all available workers.
- Once the whole pool of workers is done, schedule the next bunch of tasks.

### 0.0.1

Initial version:

- Send mapper tasks to workers; the number of workers is the same as the number of elements in the data array.
- The reducer task is executed inside the main app.
- Uses `data.task` for asynchronous actions.