predictive-sentence-generator v1.1.1
Overview
predictive-sentence-generator is a package that generates random sentences based on source text. The grammar isn't great, and varies widely in accuracy based on the length/writing style of the source text. Despite this, many of the sentences sound like they could have been part of the source (though sometimes they actually are; see Common Issues). This sentence generator should be compatible with any (Latin alphabet) language, though I can't confirm as I only know English.
Simple Use
If you're impatient or don't really care about settings that much, here's the minimum you'll need to get started:
const psg = require('predictive-sentence-generator');
psg.load({
  type: 'preset',
  content: 'random'
});
console.log(psg.generate());
Overview
First, require the sentence generator.
Next, run the load() method, which in this case loads a random preset (see the Presets and Load Settings sections below). This is usually undesirable, since the presets range from Shakespearean language (Romeo and Juliet) to the scripts of Shrek 2 and the Bee Movie.
Finally, run the generate() method. Without any parameters, this generates a single, formatted sentence of any length. Depending on the source, this can often be a single word; to prevent this, see the minLength property in Generate Settings.
Load Settings
Before generating any sentences, you must call the load() method. It tells the generator what to use as the source, and the generator will throw an error if you try to do anything without calling it first. As seen above, this method has two required parameters: type and content. The method must be called with both, and will throw an error if one or both are missing.
type
Specifies the format of the content property.
- 'preset' declares the use of a preset; see the Presets section for more information.
- 'save file' loads a save from a file using fs; see Saving and Loading for more information.
- 'save url' loads a save from an external URL using sync-fetch.
- 'save internal' loads a save directly from the content property of the method.
- 'text file' loads raw text from a file and internally converts it to an object (see Saving and Loading) so the generator can use it.
- 'text url' loads text from a URL.
- 'text internal' loads text directly from the content property.
Most of these values can be formatted differently and still work. For example, Save File means the same thing as file-save and saveFromFile.
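The README does not say how the package matches these variants, so the following is only a sketch of one way such lenient matching could work. normalizeType is a hypothetical helper, not part of the package's API. It assumes the value is matched by looking for one format keyword (save or text) and one location keyword (file, url or internal) in any order.

```javascript
// Hypothetical helper (not part of predictive-sentence-generator).
function normalizeType(rawType) {
  // Put a space at camelCase boundaries, then lowercase: 'saveFromFile' -> 'save from file'
  const spaced = String(rawType).replace(/([a-z])([A-Z])/g, '$1 $2').toLowerCase();
  // Split on anything that is not a letter (spaces, hyphens, underscores, ...)
  const words = spaced.split(/[^a-z]+/).filter(Boolean);

  if (words.includes('preset')) return 'preset';

  const format = ['save', 'text'].find((w) => words.includes(w));
  const location = ['file', 'url', 'internal'].find((w) => words.includes(w));
  if (!format || !location) {
    throw new Error(`Unrecognized type: ${rawType}`);
  }
  return `${format} ${location}`;
}

console.log(normalizeType('Save File'));    // save file
console.log(normalizeType('file-save'));    // save file
console.log(normalizeType('saveFromFile')); // save file
```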
content
- When type is set to 'preset', content is the name of the preset (e.g., 'Bee Movie').
- When type is set to '[] file', content specifies the path to the file.
- When type is set to '[] url', content specifies the full URL (e.g., https://example.com/).
- When type is set to '[] internal', content is the entirety of the data (text, JSON, Object, etc.).
Content in 'save []' types can be either a JavaScript Object or a JSON string. Content in 'text []' types should only be a string.
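Since a save can arrive as either an Object or a JSON string, code that prepares saves before passing them to load() may want to treat both forms the same way. The sketch below shows one way to do that in plain JavaScript. toSaveObject is a hypothetical helper and says nothing about how the package handles this internally.

```javascript
// Hypothetical helper (not part of the package): accept a save as an Object or a JSON string.
function toSaveObject(content) {
  // Strings are assumed to be JSON; anything else is used as-is.
  const data = typeof content === 'string' ? JSON.parse(content) : content;
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new TypeError('A save must be an Object or a JSON string encoding one');
  }
  return data;
}

const fromObject = toSaveObject({ Hello: ['world.'] });
const fromJson = toSaveObject('{"Hello":["world."]}');
console.log(fromObject.Hello, fromJson.Hello);
```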
Example
An example use of the load() method with settings.
psg.load({
  type: 'preset',
  content: 'Bee Movie'
});
Setting Ignored Characters
Sometimes, your source text might have characters you want to ignore (double quotes, for example). The load() method has an optional property to do this: ignoredCharacters.
ignoredCharacters should be a RegExp that matches the characters to remove. For example, use /[^A-Z'.,;:?!]/gi (^ meaning 'match everything but the following') to remove anything that isn't a letter, apostrophe or punctuation, and /[\"]/g to remove all double quotes.
This property is only used if the source is text, not a save or a preset.
Note: all ignored characters are interpreted as whitespace, so replacing the 1 in 'Ch1cken' will turn it into two words, Ch and cken.
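The whitespace behaviour described in the note can be reproduced with plain string methods. This is a sketch of the documented effect, not the package's own code. stripIgnored is a hypothetical name, and the RegExp needs the g flag or only the first match is replaced.

```javascript
// Sketch only: mimics the documented behaviour with plain string methods.
function stripIgnored(text, ignoredCharacters) {
  // Each ignored character becomes a space, then the text is split on whitespace.
  return text.replace(ignoredCharacters, ' ').split(/\s+/).filter(Boolean);
}

console.log(stripIgnored('Ch1cken soup', /1/g));        // [ 'Ch', 'cken', 'soup' ]
console.log(stripIgnored('She said "hello."', /["]/g)); // [ 'She', 'said', 'hello.' ]
```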
Generate Settings
The generate() method also has optional settings. Any combination of these can be included: none, all, or anything in between.
- sentenceCount specifies the number of sentences to generate. A similar effect can be achieved by running the generate() method multiple times. Defaults to 1.
- sentenceSeparator specifies a character or string to separate sentences (default: space). Ignored if formatted is set to false.
- wordSeparator is essentially the same as sentenceSeparator, but for each word.
- formatted specifies whether to format the sentence as a string. If set to false, generate() will return a 2D array: the first level contains each sentence and the next contains each word. Defaults to true.
Formatted output:
Pretty pony wants to live her honeymoon. Boss!
Unformatted output:
[
[ 'Pretty', 'pony', 'wants', 'to', 'live', 'her', 'honeymoon.' ],
[ 'Boss!' ]
]
- minLength sets a minimum word count for each sentence. Slow at high values, since the generator has to run repeatedly until it finds a sentence of the correct length. Default: 1.
- maxLength sets the maximum word count. Slow at low values for the same reason, though usually not quite as much as minLength. Default: Infinity.
- seed defines a starting word or words for the generator. If set to '(random)' (the default value), it will choose any word starting with a capital letter. Note: if you omit the parentheses, it will look for the word 'random' as a seed and probably throw an error. This value can also be an array (instead of a string), in which case a seed will be selected at random for each sentence.
- seedCaseSensitive specifies whether the seed word should be searched for case-sensitively. Default: true.
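The minLength and maxLength descriptions say the generator reruns until a sentence fits, which is why strict limits are slow. The sketch below shows that retry idea in isolation. generateOne stands in for whatever produces a single sentence as an array of words, and the maxAttempts cap is an assumption added here so the loop cannot run forever. Neither is part of the package's API.

```javascript
// Hypothetical sketch of retry-until-it-fits; generateOne and maxAttempts are assumptions.
function generateWithinLength(
  generateOne,
  { minLength = 1, maxLength = Infinity, maxAttempts = 10000 } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const words = generateOne();
    // Accept the first sentence whose word count is inside the limits.
    if (words.length >= minLength && words.length <= maxLength) {
      return words;
    }
  }
  throw new Error(
    `No sentence between ${minLength} and ${maxLength} words after ${maxAttempts} attempts`
  );
}

// Stand-in generator that cycles through fixed sentences so the result is predictable.
const samples = [['Boss!'], ['Pretty', 'pony', 'wants', 'to', 'live', 'her', 'honeymoon.']];
let calls = 0;
const fakeGenerateOne = () => samples[calls++ % samples.length];

console.log(generateWithinLength(fakeGenerateOne, { minLength: 5 }));
```

The stricter the limits relative to what the source tends to produce, the more attempts are thrown away, which matches the slowdown described above.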
Example
An example use of the generate() method with settings.
psg.generate({
  minLength: 5,
  sentenceCount: 3,
  formatted: false
});
Note: the generate() method returns the sentence(s), so if you want to print them to the console you'll have to wrap the call in console.log().
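If you ask for unformatted output and later want a string after all, the 2D array can be joined by hand. The sketch below does this for the unformatted sample shown earlier. formatSentences is a hypothetical helper, and the space default for wordSeparator is an assumption, since only the sentenceSeparator default is stated above.

```javascript
// Sketch: turn unformatted output (sentences, each an array of words) into a string.
function formatSentences(sentences, { wordSeparator = ' ', sentenceSeparator = ' ' } = {}) {
  return sentences
    .map((words) => words.join(wordSeparator))
    .join(sentenceSeparator);
}

const unformatted = [
  ['Pretty', 'pony', 'wants', 'to', 'live', 'her', 'honeymoon.'],
  ['Boss!'],
];

console.log(formatSentences(unformatted));
// Pretty pony wants to live her honeymoon. Boss!
console.log(formatSentences(unformatted, { sentenceSeparator: '\n' }));
```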
Presets
The package comes with multiple source texts for you to try out. I update these sometimes, but this page might not always be up to date, so go to the source to know for sure.
The current list includes:
- The Bee Movie (Script)
- Jane Eyre
- Romeo and Juliet
- Shrek 2 (Script)
- The Bible
Saving and Loading
Realistically, you probably won't need this feature. Usually you can just load your source as text, but if your source text is very long, you might want to create a save to cut down on loading time.
Saving works by loading an Object with word relationships, rather than a string of text (which is converted to word relationships internally). To save, first run the getSave() method, which returns a JSON string or JavaScript Object with word relationships. Store this data somewhere, then load it using the load() method with the type set to save [location type] (see Load Settings).
getSave() has an optional parameter, asString, which specifies whether the data should be returned as a JSON string or a JavaScript Object. The example below passes false and gets an Object back.
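To make "word relationships" concrete, here is a minimal sketch of how text could be turned into such an Object: each word maps to the list of words that follow it in the source. buildRelationships is a hypothetical function written for illustration. Whether the package keeps repeated followers or removes them is not documented, so keeping them here is an assumption.

```javascript
// Sketch: map each word to the words that follow it in the source text.
function buildRelationships(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const relationships = {};
  // The last word has no follower, so it never becomes a key on its own.
  for (let i = 0; i < words.length - 1; i++) {
    const word = words[i];
    if (!Object.prototype.hasOwnProperty.call(relationships, word)) {
      relationships[word] = [];
    }
    // Assumption: repeated followers are kept, which preserves how often each pair occurs.
    relationships[word].push(words[i + 1]);
  }
  return relationships;
}

const save = buildRelationships('The cat sat. The dog sat.');
console.log(save);
// { The: [ 'cat', 'dog' ], cat: [ 'sat.' ], 'sat.': [ 'The' ], dog: [ 'sat.' ] }
```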
Example Usage
Saving:
const psg = require('predictive-sentence-generator');
psg.load({
  type: 'text internal',
  content: "This is an example. I don't actually know what to write here, so I'll just write some random example text."
});
const savedData = psg.getSave(false);
console.log(savedData);
savedData is now set to:
{
  This: [ 'is' ],
  is: [ 'an' ],
  an: [ 'example.' ],
  'example.': [ 'I' ],
  I: [ "don't" ],
  "don't": [ 'actually' ],
  actually: [ 'know' ],
  know: [ 'what' ],
  what: [ 'to' ],
  to: [ 'write' ],
  write: [ 'here,', 'some' ],
  'here,': [ 'so' ],
  so: [ "I'll" ],
  "I'll": [ 'just' ],
  just: [ 'write' ],
  some: [ 'random' ],
  random: [ 'example' ],
  example: [ 'text.' ]
}
Now, loading this object in as a save:
const psg = require('predictive-sentence-generator');
psg.load({
  type: 'save internal',
  content: {
    This: [ 'is' ],
    is: [ 'an' ],
    an: [ 'example.' ],
    'example.': [ 'I' ],
    I: [ "don't" ],
    "don't": [ 'actually' ],
    actually: [ 'know' ],
    know: [ 'what' ],
    what: [ 'to' ],
    to: [ 'write' ],
    write: [ 'here,', 'some' ],
    'here,': [ 'so' ],
    so: [ "I'll" ],
    "I'll": [ 'just' ],
    just: [ 'write' ],
    some: [ 'random' ],
    random: [ 'example' ],
    example: [ 'text.' ]
  }
});
console.log(psg.generate());
...produces this result:
I don't actually know what to write some random example text.
It's not very random, but that's related to how short the source is, not saving (see Common Issues below).
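The reason a short source gives so little variety is easier to see with a sketch of how a sentence could be produced from a save: start at a capitalized word, keep picking a follower at random, and stop at end-of-sentence punctuation. The walk function below is a hypothetical illustration, not the package's implementation. The error messages are borrowed from the Common Issues table, the conditions that raise them are an assumption, and the maxWords guard is an addition to stop endless loops.

```javascript
// Sketch of a random walk over a save. pickIndex is injectable so runs can be made repeatable.
function walk(relationships, pickIndex = (n) => Math.floor(Math.random() * n), maxWords = 1000) {
  // Capitalized keys are treated as possible sentence starts.
  const starts = Object.keys(relationships).filter((word) => /^[A-Z]/.test(word));
  if (starts.length === 0) throw new Error('Could not find a starting point');

  const words = [starts[pickIndex(starts.length)]];
  // Follow word relationships until a word ends with . ? or !
  while (!/[.?!]$/.test(words[words.length - 1])) {
    const current = words[words.length - 1];
    const followers = Object.prototype.hasOwnProperty.call(relationships, current)
      ? relationships[current]
      : [];
    if (followers.length === 0) throw new Error(`Incomplete sentence at ${current}`);
    if (words.length >= maxWords) throw new Error(`Gave up after ${maxWords} words`);
    words.push(followers[pickIndex(followers.length)]);
  }
  return words.join(' ');
}

// A tiny save with a single branching point, at 'know'.
const exampleSave = {
  I: ["don't"],
  "don't": ['know'],
  know: ['what', 'this.'],
  what: ['to'],
  to: ['write.'],
};

console.log(walk(exampleSave));
```

With only one word that has more than one follower, this save can produce just two different sentences. A longer source has more branching points, so more combinations become possible.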
Misc. Methods
getSourceText([formatted = true]): returns the text used as the source, if it was loaded as text; throws an error otherwise. With formatted = false, the method returns the words as an array; otherwise it returns a space-separated string.
Common Issues
| Issue/Error | Possible Cause | Solution |
|---|---|---|
| Unoriginal sentences (generated sentences are in the source) | The source text is too short | Get a longer source text, or write a script to check for unoriginal sentences |
| Sentences are too short | If the source has a lot of 1-2 word sentences, these can get picked frequently | See the minLength property in Generate Settings |
| Error: Could not find a starting point | The source text has no capitalized words, which are interpreted as starting points | Use proper grammar (capitalize words), or get a different source |
| Error: Could not find an ending point | The source text has no end-of-sentence punctuation (. ? or !) | Again, use proper punctuation or get a different source |
| Error: Source not loaded | A method has been called before load() | See Load Settings |
| Error: Could not find source preset | The content property of load() has an invalid preset name | See Presets |
| Error: Incomplete sentence at (word) | The source text ended before the sentence was closed with punctuation | Check the end of the source text, add punctuation if needed |
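For the first row, a script to check for unoriginal sentences can be quite small. The sketch below tests whether a sentence appears verbatim in the source text. isOriginal is a hypothetical helper. When the source was loaded as text, getSourceText() could supply the second argument.

```javascript
// Hypothetical helper: true if the sentence does not appear verbatim in the source text.
function isOriginal(sentence, sourceText) {
  // Collapse runs of whitespace so line breaks in the source do not hide a match.
  const normalize = (s) => s.trim().split(/\s+/).join(' ');
  return !normalize(sourceText).includes(normalize(sentence));
}

const source =
  'This is an example. I do not know what to write here, so I will just write some text.';

console.log(isOriginal('This is an example.', source));                    // false
console.log(isOriginal('I do not know what to write some text.', source)); // true
```

A caller could keep generating until isOriginal returns true, at the cost of extra attempts when the source is short.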