hyper-replace v0.1.0 · MIT license · published 8 years ago

hyper-replace

The point of hyper-replace is to be able to replace patterns in strings with things that aren't strings. Here are a few things hyper-replace can help you with:

  • Replace all occurrences of a pattern (a hashtag, a YouTube URL, etc.) in a comment with a React component. hyper-replace will return a suitable array of children. This will, of course, work just as well for other frameworks!
  • Parse simple markup into a data structure, for example Markdown to virtual-dom, instead of outputting a string in a specific format like HTML.

hyper-replace returns an array of string parts interspersed with arbitrary objects. It is also powerful enough to apply regular expressions to that same data structure, which means that you can compose calls to it.

Basic usage

var hyperReplace = require('hyper-replace');
hyperReplace('I have the __POWER__!',
             {pattern: /__([^_]+)__/g,
              replacement: (_, text) => ({tag: 'strong', children: text})});

// => ['I have the ', {tag: 'strong', children: 'POWER'}, '!']

hyper-replace can also operate on mixed arrays of strings and objects. By that I don't mean that it applies the regexes to each string in the array separately; I mean that it stitches everything together and applies them to the whole thing, so a single regex match can span several elements of the array, including non-strings. Let me demonstrate:

hyperReplace(['This is __', {tag: 'em', children: 'very important'}, '__'],
             {pattern: /__([^_]+)__/g,
              replacement: (_, text) => ({tag: 'strong', children: text})});

// => ['This is ',
//     {tag: 'strong', children: [{tag: 'em', children: 'very important'}]}]

Note that because of this feature, the arguments given to the replacement function (the full match first, then the groups of the regular expression) are not necessarily strings: each one is either a string or an array of the strings and objects spanned by the match.
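If a replacement needs plain text no matter what the match spanned (for example, to build a URL), a small normalizing helper can be handy. This is a sketch that assumes objects follow the {tag, children} shape used in these examples; textOf and link are hypothetical names, not part of hyper-replace.

```javascript
// Hypothetical helper (not part of hyper-replace): recover the plain text of a
// replacement argument, which may be a string or an array of strings and
// objects. Assumes objects use the {tag, children} shape from the examples;
// any other value contributes no text.
function textOf(part) {
  if (typeof part === 'string') return part;
  if (Array.isArray(part)) return part.map(textOf).join('');
  if (part !== null && typeof part === 'object' && 'children' in part) {
    return textOf(part.children);
  }
  return '';
}

// A link replacement whose href must be a string even when the matched
// text spans earlier replacements; children keep the original structure.
const link = (_, text) => ({tag: 'a', href: textOf(text), children: text});
```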

Multiple patterns

For convenience, hyperReplace can take a list of patterns:

hyperReplace('__Emphasis__ on `code`',
             [{pattern: /__([^_]+)__/g,
               replacement: (_, text) => ({tag: 'strong', children: text})},
              {pattern: /`([^`]+)`/g,
               replacement: (_, text) => ({tag: 'code', children: text})}]);

// => [{tag: 'strong', children: 'Emphasis'},
//     ' on ',
//     {tag: 'code', children: 'code'}]

Overlapping patterns

The order in which the patterns are specified matters: hyperReplace applies them in that order. Furthermore, all replacements are "closed off": every pattern sees previous replacements as surrogate (invalid) characters, so a later match can contain a previous replacement whole but can never cut through one. (This is arguably a feature: you don't have to worry that the return value of a replacement function will interfere with the next pattern.)
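Here is a hypothetical illustration of why closing off matters, written with plain String.prototype.replace rather than hyper-replace itself. It assumes, as the library does, that each replacement is swapped for a single lone high surrogate; this sketch numbers them from \uD800.

```javascript
// Hypothetical illustration with String.prototype.replace, not hyper-replace:
// each replacement is swapped for one lone high surrogate (from \uD800 here).
const stash = [];
let text = '__a `b__ c`';

// Without closing off, a bold pattern would cut through the code span.
const naive = text.match(/__([^_]+)__/g); // ['__a `b__']

// Pattern 1 (code spans) runs first; its match becomes one opaque character.
text = text.replace(/`([^`]+)`/g, (_, code) => {
  stash.push({tag: 'code', children: code});
  return String.fromCharCode(0xD800 + stash.length - 1);
});

// Pattern 2 (bold) now sees '__a \uD800': the "__" that lived inside the
// code span is hidden in the placeholder, so nothing can match through it.
const closedOff = text.match(/__([^_]+)__/g); // null
```

Swapping the order of the two patterns would give a different result, which is why the order of the list matters.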

If you wish to match patterns that can be nested, your two options are:

  • Call hyperReplace recursively in the replacement function.
  • Apply the patterns from the inside out (smaller first). The applyUntilEquilibrium option, described below, can help.

Apply until equilibrium

hyperReplace can be told to apply one or more patterns over and over until there is nothing left to replace. Use this feature wisely.

hyperReplace('<span>Do <b>not</b> <s>parse HTML</s> with hyper-replace</span>',
             /<([a-z]+)>([^<>]*)<\/\1>/g,
             (_, tag, text) => ({tag: tag, children: text}),
             {applyUntilEquilibrium: true});

// => [{tag: 'span',
//      children: ['Do ',
//                 {tag: 'b', children: 'not'},
//                 ' ',
//                 {tag: 's', children: 'parse HTML'},
//                 ' with hyper-replace']}]

Basically, the regular expression starts by matching the innermost tags (those that contain no sub-tags) and replaces them all with surrogates. Once that's done, the next layer of tags becomes innermost, so the pattern is applied again, and so on, until it reaches the top level.
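For plain-string input, the loop can be sketched as follows. This is a hypothetical reimplementation, not the library's source: replaceUntilEquilibrium, decode and decodeArg are illustrative names, placeholders are numbered from \uD800, and patterns are assumed to be global regexes without named capture groups. A pattern that can match the empty string would keep producing placeholders on every pass; in this sketch it only stops when they run out.

```javascript
// Hypothetical sketch of applyUntilEquilibrium for plain-string input; not
// hyper-replace's source. Placeholders are numbered from \uD800.
const FIRST = 0xD800;
const LIMIT = 1024; // size of the \uD800-\uDBFF range

// Split a string at lone high surrogates, swapping each for its stashed value.
function decode(str, stash) {
  const parts = [];
  let buf = '';
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    const next = str.charCodeAt(i + 1); // NaN past the end
    const isLowNext = next >= 0xDC00 && next <= 0xDFFF;
    if (c >= FIRST && c < FIRST + LIMIT && !isLowNext) {
      if (buf) parts.push(buf);
      buf = '';
      parts.push(stash[c - FIRST]);
    } else {
      buf += str[i];
    }
  }
  if (buf) parts.push(buf);
  return parts;
}

// Replacement arguments: a string if nothing stashed is inside, else an array.
function decodeArg(str, stash) {
  const parts = decode(str, stash);
  return parts.every(p => typeof p === 'string') ? parts.join('') : parts;
}

function replaceUntilEquilibrium(input, pattern, replacement) {
  const stash = [];
  let str = input;
  for (;;) {
    let changed = false;
    str = str.replace(pattern, (...args) => {
      changed = true;
      // replace() passes (match, ...groups, offset, whole); drop the last two.
      const groups = args.slice(0, -2).map(g => (g === undefined ? g : decodeArg(g, stash)));
      if (stash.length === LIMIT) throw new Error('out of placeholders');
      stash.push(replacement(...groups));
      return String.fromCharCode(FIRST + stash.length - 1);
    });
    // A pass with no matches means equilibrium has been reached.
    if (!changed) return decode(str, stash);
  }
}

const result = replaceUntilEquilibrium(
  '<span>Do <b>not</b> <s>parse HTML</s> with hyper-replace</span>',
  /<([a-z]+)>([^<>]*)<\/\1>/g,
  (_, tag, text) => ({tag: tag, children: text}));
```

The first pass replaces the b and s elements, the second pass can then match the span, and the third pass finds nothing and stops.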

How it works

When it is given a list like ['I ate ', chowder, ' and ', cake], hyperReplace stashes all non-strings in an array and creates the string 'I ate \uD801 and \uD802'. Code units in the \uD800-\uDBFF range are high surrogates: they are only meant to appear as the first half of a surrogate pair. On their own they are invalid, so they make good placeholders.

So \uD801 is a placeholder for the first non-string element in the array, \uD802 stands for the second, and so on. Then, when a replacement is made, the replacement is stashed in the list and a new surrogate is put in its place in the string.
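The stashing step might look like this. It is a hypothetical sketch, not hyper-replace's source; encode is an illustrative name, and unlike the example above it numbers placeholders from \uD800 so that all 1024 high surrogates are usable.

```javascript
// Hypothetical sketch of the stashing step; not hyper-replace's source.
// Placeholders are numbered from \uD800 here.
function encode(parts) {
  const stash = [];
  let str = '';
  for (const part of parts) {
    if (typeof part === 'string') {
      str += part;
    } else {
      // Each non-string gets the next unused high surrogate as its stand-in.
      if (stash.length === 1024) throw new Error('out of placeholders');
      str += String.fromCharCode(0xD800 + stash.length);
      stash.push(part);
    }
  }
  return {str, stash};
}

const chowder = {food: 'chowder'};
const cake = {food: 'cake'};
const encoded = encode(['I ate ', chowder, ' and ', cake]);
// encoded.str is 'I ate \uD800 and \uD801'; encoded.stash is [chowder, cake]
```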

At the end, the string is split along the lone surrogates we introduced and filled in with the appropriate objects. This is also done with the match strings before they are given to a replacement function, so the placeholders are never exposed.
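Putting the pieces together, a minimal version of the whole pipeline fits in a few dozen lines. This is a hypothetical sketch rather than the library's implementation: miniReplace and its helpers are illustrative names, placeholders start at \uD800, rules are {pattern, replacement} objects with global regexes, and named capture groups are not supported. On the first three examples in this README it produces the outputs shown there.

```javascript
// Hypothetical sketch of the whole pipeline; not hyper-replace's source.
// Placeholders are numbered from \uD800; rules use global regexes without
// named capture groups.
const FIRST = 0xD800;
const LIMIT = 1024; // size of the \uD800-\uDBFF range

// Split a string at lone high surrogates, swapping each for its stashed value.
function decode(str, stash) {
  const parts = [];
  let buf = '';
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    const next = str.charCodeAt(i + 1); // NaN past the end
    const isLowNext = next >= 0xDC00 && next <= 0xDFFF;
    if (c >= FIRST && c < FIRST + LIMIT && !isLowNext) {
      if (buf) parts.push(buf);
      buf = '';
      parts.push(stash[c - FIRST]);
    } else {
      buf += str[i];
    }
  }
  if (buf) parts.push(buf);
  return parts;
}

// Replacement arguments: a string if nothing stashed is inside, else an array.
function decodeArg(str, stash) {
  const parts = decode(str, stash);
  return parts.every(p => typeof p === 'string') ? parts.join('') : parts;
}

function miniReplace(input, rules) {
  const stash = [];
  // Stash a value and return the placeholder that now stands for it.
  const place = value => {
    if (stash.length === LIMIT) throw new Error('out of placeholders');
    stash.push(value);
    return String.fromCharCode(FIRST + stash.length - 1);
  };
  // Encode: strings are kept, every other element becomes a placeholder.
  const parts = typeof input === 'string' ? [input] : input;
  let str = parts.map(p => (typeof p === 'string' ? p : place(p))).join('');
  // Apply each rule in order; its results become placeholders too.
  for (const {pattern, replacement} of [].concat(rules)) {
    str = str.replace(pattern, (...args) => {
      // replace() passes (match, ...groups, offset, whole); drop the last two.
      const groups = args.slice(0, -2).map(g => (g === undefined ? g : decodeArg(g, stash)));
      return place(replacement(...groups));
    });
  }
  // Decode: split at the placeholders and fill in the stashed objects.
  return decode(str, stash);
}

const strong = {pattern: /__([^_]+)__/g,
                replacement: (_, text) => ({tag: 'strong', children: text})};
const demo = miniReplace(['This is __', {tag: 'em', children: 'very important'}, '__'], strong);
// demo: ['This is ', {tag: 'strong', children: [{tag: 'em', children: 'very important'}]}]
```

Because match strings pass through decodeArg before reaching a replacement function, the placeholders never leak out of the sketch, mirroring the behaviour described above.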

Limitations

  • It is only guaranteed to work on valid Unicode strings.
  • It'll choke if it has to do more than 1024 replacements, because then it runs out of surrogates; an exception is raised when the limit is reached. (This should be fixable; I'm just waiting to run into the problem, or for somebody else to.)
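Since lone surrogates are exactly what the placeholders look like, it can be worth rejecting ill-formed input before calling hyperReplace. The guard below is a hypothetical helper, not part of hyper-replace; runtimes that implement ES2024 also offer String.prototype.isWellFormed for the same check. The capacity constant just spells out where the number 1024 comes from.

```javascript
// Hypothetical guard (not part of hyper-replace): true when a string contains
// no lone surrogates, i.e. every high surrogate is followed by a low one.
function hasNoLoneSurrogates(str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF) {
      const next = str.charCodeAt(i + 1); // NaN past the end
      if (!(next >= 0xDC00 && next <= 0xDFFF)) return false;
      i++; // skip the low half of a valid pair
    } else if (c >= 0xDC00 && c <= 0xDFFF) {
      return false; // low surrogate without a preceding high one
    }
  }
  return true;
}

// The placeholder range \uD800-\uDBFF holds 0xDBFF - 0xD800 + 1 code units.
const PLACEHOLDER_CAPACITY = 0xDBFF - 0xD800 + 1; // 1024
```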