textobserver v3.0.0
TextObserver
TextObserver replaces text on a webpage with whatever you want, including text injected dynamically after initial page load.
const badWordFilter = new TextObserver(text => text.replaceAll(/heck/gi, 'h*ck'));
Inspired by findAndReplaceDOMText, but with a different use case. Use findAndReplaceDOMText if you need to robustly substitute or wrap text that may span multiple nodes a set number of times.
TextObserver uses the MutationObserver interface to keep track of changes in the DOM. Any dynamically updated content is processed automatically, in time linear in the number of changes rather than the total number of nodes. This enables usage on modern infinite-scrolling websites that make heavy use of AJAX, like Reddit and Discord, without performance degrading as more content is added.
Installation
All the code is self-contained in TextObserver.js. If you just want to try it out, paste the contents of TextObserver.js into the browser console. It is also available as an NPM package.
Usage
Constructor
TextObserver(callback, target = document, processExisting = true, performanceOptions = {...})
- callback: a function that takes a string as its only argument and returns a string to replace it with
- target (optional): the subtree of the DOM to watch (set to document.body if you want to exclude the <title>)
- processExisting (optional): whether to do an initial scan over the existing content
- performanceOptions (optional): a dictionary with keys specifying what to watch; see the Performance section below
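As a rough sketch of how the four arguments fit together, the snippet below passes each one explicitly. It assumes TextObserver.js has already been loaded so that TextObserver is available as a global; the censor function and the chosen option values are illustrative, not part of the library.

```javascript
// Illustrative callback: takes a string, returns the replacement string.
function censor(text) {
    return text.replaceAll(/heck/gi, 'h*ck');
}

// Only construct the observer where a DOM and the library are available
// (assumes TextObserver.js was loaded as a global beforehand).
if (typeof TextObserver !== 'undefined' && typeof document !== 'undefined') {
    const observer = new TextObserver(
        censor,
        document.body,   // target: watch <body> only, leaving <title> alone
        true,            // processExisting: scan content already on the page
        { contentEditable: true, attributes: true, shadows: true }
    );
}
```

Keeping the callback as a named function makes it easy to try on plain strings before attaching it to a page.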
Methods
disconnect(flush = true)
- flush (optional): whether to do a "clean-up" run of changes that have been detected but not yet processed by the observer's callback
reconnect(reprocess = true)
- reprocess (optional): whether to do a complete "re-sweep" of the page; if false, changes made while the observer was disconnected are ignored
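One way these two methods might be combined is to pause the observer around a piece of your own code. The helper below is a minimal sketch; withObserverPaused is a hypothetical name, and it relies only on the disconnect(flush) and reconnect(reprocess) signatures documented above.

```javascript
// Runs `action` with the observer paused, then resumes observation.
// `withObserverPaused` is a hypothetical helper, not part of TextObserver.
function withObserverPaused(observer, action) {
    observer.disconnect(true);       // flush: process pending changes first
    try {
        return action();
    } finally {
        observer.reconnect(false);   // skip the re-sweep; changes made while paused are ignored
    }
}
```

This could be useful, for instance, when your own script writes text to the page that the callback should not touch.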
Examples
A more advanced regex that uses capturing groups/backreferences to convert miles to kilometers:
const unitConverter = new TextObserver(text => text.replaceAll(
    // \b ends the match at a word boundary, so whatever follows the unit is left intact
    /(\d+\.?\d*) ?mi(?:les?)?\b/gi,
    (match, number) => (parseFloat(number) * 1.609).toFixed(2) + ' km'
));
The callback is not limited to regular expressions. Here's a more complex example that transforms everything into "mOcKiNg SpOnGeBoB" case. Useful for heated Internet discussions!
const spongebobCase = new TextObserver(text => {
    let characters = Array.from(text);
    for (let i = 0; i < characters.length; i++) {
        if (Math.random() < 0.5) {
            if (characters[i].toUpperCase() === characters[i]) {
                characters[i] = characters[i].toLowerCase();
            } else {
                characters[i] = characters[i].toUpperCase();
            }
        }
    }
    return characters.join('');
});
You don't have to modify the text at all if you're doing something like sentiment analysis on your Internet reading. Just remember to return the text at the end of your callback, or else every string on the page will get blanked out.
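A read-only callback could look something like the sketch below. The word list and the wordCounts tally are purely illustrative stand-ins for real sentiment analysis, and the snippet assumes TextObserver is available as a global when run on a page.

```javascript
// Tally of how often each watched word has been seen (illustrative only).
const wordCounts = { good: 0, bad: 0 };

// Inspects the text, then returns it unchanged so nothing on the page is altered.
function countSentimentWords(text) {
    for (const word of Object.keys(wordCounts)) {
        const matches = text.match(new RegExp(`\\b${word}\\b`, 'gi'));
        if (matches) wordCounts[word] += matches.length;
    }
    return text; // always hand the original string back
}

if (typeof TextObserver !== 'undefined' && typeof document !== 'undefined') {
    new TextObserver(countSentimentWords);
}
```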
Performance
Note: All measurements were taken on the front page of New Reddit on a Core i5-8210Y running Chromium 91, roughly equivalent to an iPhone 7 on Geekbench. Your numbers should be better if you're targeting laptops and desktops, while older netbooks and phones may be slower.
There are two main pieces to TextObserver. One is the "observation" part, that is, the code that watches for added nodes or changes to existing text content and processes them. This step runs in time linear in the number of mutations and should use negligible resources.
The other piece is the scan of the entire target subtree that occurs whenever the constructor or reconnect() is called with its default arguments. This runs in time linear in the total number of nodes in the target subtree and can take more than 100 ms on the heaviest pages. That is fast enough not to perceptibly affect load time, but slow enough that whole-page scans should run only once, during initial load. The time it takes to walk through the DOM dominates the time spent in the callback, which is why, if you want to perform multiple replacements, it is recommended to extend the callback instead of creating a separate observer for each replacement.
Baseline: ~100 ms range
const grammarPolice = new TextObserver(text => text.replaceAll(/would of/gi, 'would have'));
Good: performs practically identically to baseline
const grammarPolice = new TextObserver(text => text.replaceAll(
    /would of/gi, 'would have').replaceAll(
    /should of/gi, 'should have'
));
Bad: ~200 ms range
const grammarPolice = new TextObserver(text => text.replaceAll(/would of/gi, 'would have'));
const grammarPolice2 = new TextObserver(text => text.replaceAll(/should of/gi, 'should have'));
Narrowing down the target as much as possible and calling the constructor and reconnect() sparingly will also greatly help. To further customize performance, you can set the following keys in the constructor's performanceOptions dictionary (any missing keys will be assumed to be off):
- cssContent (default false): Check for text generated with the CSS content property. This may cause a style recalculation bottleneck and push execution time to over a second, so it is turned off by default.
- iconFonts (default false): Skip text whose font-family contains the word "icon". This is computationally expensive for the same reason as the above setting, and having both enabled can take seconds to process the whole document.
- contentEditable (default true): Only process text whose parent is not contentEditable. Disable only if you're sure the page has no contentEditable elements, because modifying their text messes up the cursor position, resulting in a frustrating user experience. It also doesn't make much sense to disable this or any of the options below when either cssContent or iconFonts has been enabled, because the cost of computing styles is massive compared to everything else.
- attributes (default true): Process HTML attributes that get rendered by the browser as user-visible text, such as image alt text and input placeholders. Disabling this and contentEditable can cut runtime to about half that of the default config.
- shadows (default true): Search for and process open Shadow DOMs, and override Element.prototype.attachShadow() to force Shadow DOMs created in the future into open mode. This option doesn't have a big impact on performance; it mostly exists in case you want to respect the encapsulation of Shadow DOMs.
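As an illustration of the keys above, the sketch below builds an options dictionary aimed at the faster configuration described for attributes and contentEditable. The name fastOptions is made up for this example, and the snippet assumes TextObserver is available as a global when run on a page.

```javascript
// Options for a faster scan: leave the style-dependent checks off and also
// turn off attribute and contentEditable handling. Every key is spelled out
// because missing keys are treated as off.
const fastOptions = {
    cssContent: false,
    iconFonts: false,
    contentEditable: false, // only safe if the page has no contentEditable elements
    attributes: false,
    shadows: true,
};

if (typeof TextObserver !== 'undefined' && typeof document !== 'undefined') {
    new TextObserver(
        text => text.replaceAll(/would of/gi, 'would have'),
        document.body,
        true,
        fastOptions
    );
}
```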
Hacking
Issues and pull requests are welcome. I have written an article that goes over the basics of how the library works which may be useful.
Known Issues
TextObserver should work out of the box 98% of the time. Unfortunately, the modern web is an extraordinarily complex beast, and the following edge cases (hopefully rare for most users) exist.
Doesn't see <iframe>s
An observer created on a page cannot see inside an <iframe> from another domain due to the browser-enforced same-origin policy. However, if you are using this as part of a web extension's content script, you can set "all_frames": true in your manifest to tell the browser to inject the script into every frame's execution context.
Misses closed Shadow DOMs
A closed Shadow DOM is inaccessible to outside JavaScript. If you're making an extension, you could try injecting the script into the page's execution context before any other scripts run to force Shadow DOMs created in the future to open mode (if you manage to get this to work, let me know and I'll put up example code).
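For reference, the override itself might look like the sketch below. This is not a confirmed working solution: it covers only the wrapping of attachShadow, and getting it to run inside the page's own execution context before any page script is the part that remains open. forceOpenShadowRoots is a hypothetical helper name.

```javascript
// Wraps attachShadow on the given prototype so every shadow root is created
// in open mode. `forceOpenShadowRoots` is a hypothetical helper name.
function forceOpenShadowRoots(proto) {
    const originalAttachShadow = proto.attachShadow;
    proto.attachShadow = function (init) {
        // Preserve the caller's other options, but override the mode.
        return originalAttachShadow.call(this, { ...init, mode: 'open' });
    };
}

// In a browser this would be applied to Element.prototype, and it only helps
// if it runs in the page's own execution context before the page's scripts.
if (typeof Element !== 'undefined') {
    forceOpenShadowRoots(Element.prototype);
}
```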
Repeated "recursive" replacements
If your "before" text is a substring of your "after" text (e.g. replacing "expands" with "physically expands") you may find that the replacement has happened more than once ("physically physically ... expands"). TextObserver contains mitigations for this but every site is coded in a different way and it is impossible to account for every case. For example, Wikipedia reads off the References section to create the tooltip citations that pop up when you hover over a boxed number, so the tooltip's text will inevitably be replaced twice.
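On the callback side, one possible extra safeguard is to make the replacement idempotent, so that a second pass over already-replaced text changes nothing. The sketch below uses a negative lookbehind for this; it assumes a JavaScript engine with lookbehind support, which some older browsers lack, and addPhysically is just an illustrative name.

```javascript
// Replaces "expands" with "physically expands" unless the word is already
// preceded by "physically ", so applying the callback again is a no-op.
function addPhysically(text) {
    return text.replaceAll(/(?<!physically )expands/g, 'physically expands');
}

if (typeof TextObserver !== 'undefined' && typeof document !== 'undefined') {
    new TextObserver(addPhysically);
}
```

This does not help when a site copies text between nodes in a way that splits the phrase, but it covers the common case of the same string being processed twice.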
Messes with document editors
Complex online code editors and word processors can exhibit quirks if TextObserver changes their contents, especially if the replacement is of a different length than the old value ("heck" to "heckerino!" is more likely to cause issues than "heck" to "h*ck"). I would suggest allowing the user to disable replacements on a site-by-site basis if applicable.
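A site-by-site switch could be sketched roughly as follows. The hostnames and the disabledHosts set are placeholders; a real extension would presumably load the list from its own settings storage, which is not shown here.

```javascript
// Hostnames where the user has switched replacements off (placeholder values).
const disabledHosts = new Set(['docs.example.com', 'editor.example.org']);

// Returns true if replacements should run on the given hostname.
function replacementsEnabledFor(hostname) {
    return !disabledHosts.has(hostname);
}

// Only attach the observer on sites the user has not opted out of.
if (typeof TextObserver !== 'undefined' && typeof location !== 'undefined'
        && replacementsEnabledFor(location.hostname)) {
    new TextObserver(text => text.replaceAll(/heck/gi, 'h*ck'));
}
```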