0.2.0 • Published 3 years ago

node-pyspark v0.2.0

Weekly downloads: 18
License: SEE LICENSE IN FI...
Repository: github

node-pyspark

This module brings the Apache Spark API to Node.js.

WARNING: This package is still in the early stages of development, and not all pyspark APIs have been ported yet.

Usage

The API is very similar to the pyspark API, with two notable differences:

  • All functions and methods take a single object argument whose keys correspond to the parameter names of the equivalent pyspark function.
  • Most functions appear synchronous to the caller, but some return a promise that can be awaited.

// importing is similar to that in pyspark
const { SparkSession, DataFrame, types } = require('./index').sql;

// in a CommonJS module, await is only valid inside an async function
async function main() {
  // create the Spark session
  const spark = SparkSession.builder.appName("SimpleApp").getOrCreate();

  // create a DataFrame
  const df = spark.createDataFrame({ data: [1, 2, 3], schema: types.IntegerType() });

  // show returns a promise
  await df.show();

  // stop the session
  spark.stop();
}

main().catch(console.error);
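
Because some calls return promises, a rejected promise can skip the final `spark.stop()` and leave the session running. The sketch below shows one way to guard against that with `try`/`finally`. `withSession` is a hypothetical helper, not part of node-pyspark, and the fake session object exists only so the snippet runs without Spark. The only thing assumed about a real session is the `stop()` method shown above.

```javascript
// Hypothetical helper (not part of node-pyspark): runs `work` with a session
// and stops the session afterwards, even if `work` throws or rejects.
async function withSession(createSession, work) {
  const spark = createSession();
  try {
    return await work(spark);
  } finally {
    // await is harmless if stop() turns out to be synchronous
    await spark.stop();
  }
}

// Stand-in session for demonstration only. A real one would come from
// SparkSession.builder.appName("SimpleApp").getOrCreate().
const fakeSession = {
  stopped: false,
  stop() { this.stopped = true; },
};

withSession(() => fakeSession, async (spark) => {
  throw new Error("job failed");
}).catch((err) => {
  // The error still reaches the caller, and the session has been stopped.
  console.log(err.message, fakeSession.stopped);
});
```

With the real module, the first argument would be a function returning the session from `getOrCreate()`, and the second would hold the DataFrame work, including any awaited calls such as `df.show()`.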

APIs implemented

pyspark APIs that have been ported to node-pyspark

| Class/Object | API | Comments |
| --- | --- | --- |
| Builder | appName | |
| Builder | config | |
| Builder | enableHiveSupport | |
| Builder | master | |
| Builder | getOrCreate | |
| SparkSession | range | |
| SparkSession | createDataFrame | |
| SparkSession | sql | |
| SparkSession | stop | |
| SparkSession | table | |
| SparkSession | put | non-standard API |
| SparkSession | wait | non-standard API |
| SparkSession | _exec | |
| SparkSession | _eval | |
| DataFrameReader | csv | |
| DataFrameReader | format | |
| DataFrameReader | jdbc | |
| DataFrameReader | json | |
| DataFrameReader | load | |
| DataFrameReader | option | |
| DataFrameReader | orc | |
| DataFrameReader | parquet | |
| DataFrameReader | schema | |
| DataFrameReader | table | |
| DataFrameReader | text | |
| DataFrameWriter | bucketBy | |
| DataFrameWriter | csv | |
| DataFrameWriter | format | |
| DataFrameWriter | insertInto | |
| DataFrameWriter | jdbc | |
| DataFrameWriter | json | |
| DataFrameWriter | mode | |
| DataFrameWriter | option | |
| DataFrameWriter | orc | |
| DataFrameWriter | parquet | |
| DataFrameWriter | partitionBy | |
| DataFrameWriter | save | |
| DataFrameWriter | saveAsTable | |
| DataFrameWriter | sortBy | |
| DataFrameWriter | text | |
| DataFrame | agg | |
| DataFrame | alias | |
| DataFrame | approxQuantile | |
| DataFrame | cache | |
| DataFrame | checkpoint | |
| DataFrame | coalesce | |
| DataFrame | _exec | |
| DataFrame | _eval | |
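
Since only part of the pyspark API has been ported, code translated from Python may call a method that does not exist yet. A minimal sketch of a runtime guard follows. It assumes ported methods are exposed as ordinary function properties on the objects, which the source does not confirm. `isPorted` is a hypothetical helper, and `fakeDf` is a stand-in carrying a few of the DataFrame methods listed above.

```javascript
// Hypothetical helper (not part of node-pyspark): reports whether `obj`
// exposes `method` as a callable property.
function isPorted(obj, method) {
  return obj != null && typeof obj[method] === 'function';
}

// Stand-in object with a few DataFrame methods from the list above.
const fakeDf = { agg() {}, alias() {}, cache() {} };

console.log(isPorted(fakeDf, 'cache')); // listed as ported
console.log(isPorted(fakeDf, 'join'));  // not in the list
```

A check like this lets calling code fail early with a clear message instead of a generic "is not a function" error.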