0.2.1 • Published 2 months ago

@stdlib/datasets-spam-assassin v0.2.1

License
Apache-2.0

Spam Assassin

The Spam Assassin public mail corpus.

Installation

npm install @stdlib/datasets-spam-assassin

Usage

var corpus = require( '@stdlib/datasets-spam-assassin' );

corpus()

Returns the Spam Assassin public mail corpus.

var data = corpus();
// returns [{...},{...},...]

Each array element has the following fields:

  • id: message id (relative to message group)
  • group: message group
  • checksum: object containing checksum info
  • text: message text (including headers)
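Because the text field includes the headers, a common first step is to separate them from the message body. The following is a minimal sketch, assuming that headers and body are separated by the first blank line (the usual e-mail convention). The splitMessage helper and the sample message are hypothetical and not part of this package.

```javascript
// Hypothetical helper (not part of the package): split a raw message
// into its header block and body at the first blank line.
function splitMessage( text ) {
    var match = /\r?\n\r?\n/.exec( text );
    if ( match === null ) {
        // No blank line found: treat the whole text as headers.
        return {
            'headers': text,
            'body': ''
        };
    }
    return {
        'headers': text.slice( 0, match.index ),
        'body': text.slice( match.index + match[ 0 ].length )
    };
}

// Made-up message text standing in for `corpus()[ i ].text`:
var text = 'From: a@example.com\nSubject: Hello\n\nSee you at noon.\n';

var parts = splitMessage( text );
console.log( 'Headers:\n%s', parts.headers );
console.log( 'Body:\n%s', parts.body );
```

With the real dataset, the same helper would be applied to data[ i ].text for each element returned by corpus().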

The message group may be one of the following:

  • easy-ham-1: easier-to-detect non-spam e-mails (2500 messages)
  • easy-ham-2: easier-to-detect non-spam e-mails collected at a later date (1400 messages)
  • hard-ham-1: harder-to-detect non-spam e-mails (250 messages)
  • spam-1: spam e-mails (500 messages)
  • spam-2: spam e-mails collected at a later date (1396 messages)
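The group field can be used to tally messages per group or to derive a spam/non-spam label. The following is a sketch under stated assumptions: countByGroup and isSpam are hypothetical helpers, and the sample array is made up to mirror the shape of the elements returned by corpus().

```javascript
// Hypothetical helper: count how many records fall into each group.
function countByGroup( data ) {
    var counts = {};
    var g;
    var i;
    for ( i = 0; i < data.length; i++ ) {
        g = data[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;
    }
    return counts;
}

// Hypothetical helper: treat any group whose name starts with "spam" as spam.
function isSpam( record ) {
    return record.group.indexOf( 'spam' ) === 0;
}

// Made-up records standing in for the output of `corpus()`:
var sample = [
    { 'group': 'easy-ham-1' },
    { 'group': 'easy-ham-1' },
    { 'group': 'hard-ham-1' },
    { 'group': 'spam-1' },
    { 'group': 'spam-2' }
];

var counts = countByGroup( sample );
console.log( counts );

var nspam = sample.filter( isSpam ).length;
console.log( 'Spam messages: %d', nspam );
```

Passing the array returned by corpus() instead of sample should yield one count per group listed above.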

The checksum object contains the following fields:

  • type: checksum type (e.g., MD5)
  • value: checksum value
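The checksum fields could be used to check that a message text is intact. The sketch below uses Node.js's built-in crypto module and rests on an assumption this document does not confirm: that the checksum was computed over exactly the string stored in text (same encoding and line endings). A mismatch on real data may therefore reflect a difference in how the checksum was derived rather than corruption. The verifyChecksum helper and the sample records are hypothetical.

```javascript
var crypto = require( 'crypto' );

// Hypothetical helper: recompute an MD5 digest of `record.text` and compare
// it to `record.checksum.value`. Returns `null` for non-MD5 checksum types.
// ASSUMPTION: the stored checksum covers the exact `text` string.
function verifyChecksum( record ) {
    var digest;
    if ( record.checksum.type.toLowerCase() !== 'md5' ) {
        return null;
    }
    digest = crypto.createHash( 'md5' ).update( record.text, 'utf8' ).digest( 'hex' );
    return digest === record.checksum.value.toLowerCase();
}

// Made-up records with the same shape as the dataset elements:
var intact = {
    'text': 'hello',
    'checksum': {
        'type': 'MD5',
        'value': '5d41402abc4b2a76b9719d911017c592'
    }
};
var altered = {
    'text': 'hello!',
    'checksum': intact.checksum
};

console.log( verifyChecksum( intact ) );
console.log( verifyChecksum( altered ) );
```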

Examples

var corpus = require( '@stdlib/datasets-spam-assassin' );

var data;
var i;

data = corpus();
for ( i = 0; i < data.length; i++ ) {
    console.log( 'Character Count: %d', data[ i ].text.length );
}

License

The data files (databases) are licensed under an Open Data Commons Public Domain Dedication & License 1.0 and their contents are licensed under Creative Commons Zero v1.0 Universal. The software is licensed under Apache License, Version 2.0.

Notice

This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high-performance libraries for mathematics, statistics, streams, utilities, and more.

For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.

Copyright

Copyright © 2016-2024. The Stdlib Authors.