1.1.9 • Published 3 years ago

@vereign/lib-mime v1.1.9

License
AGPL-3.0-or-later

MIME parser

Prerequisites

  • Node.js v12
  • yarn package manager

Installation

  • Set up authentication to the company NPM account
  • $ yarn add @vereign/lib-mime or $ npm install @vereign/lib-mime

Usage

Basic sample

import MIMEParser from "@vereign/lib-mime";

// mimeString holds the raw MIME source of the message
const mimeParser = new MIMEParser(mimeString);

const html = await mimeParser.getHTML();
const plain = await mimeParser.getPlain();
const attachments = mimeParser.getAttachments();
const from = mimeParser.getGlobalHeaderValue("from");
const to = mimeParser.getGlobalHeaderValue("to");
const cc = mimeParser.getGlobalHeaderValue("cc");
const bcc = mimeParser.getGlobalHeaderValue("bcc");
const subject = mimeParser.getGlobalHeaderValue("subject");

Custom DOM parser. The example below uses @vereign/dom.

import MIMEParser from "@vereign/lib-mime";
import { DOM } from "@vereign/dom";

const mimeParser = new MIMEParser(mimeString);
mimeParser.parseHTML = (htmlString: string) => {
  return new DOM(htmlString).window.document;
};

Separation of the replied/forwarded fragments from the HTML part of the MIME body

Gmail

Gmail wraps forwarded/replied parts in <div class="gmail_quote">. The goal is to extract only the top-level ones, since quotes may be nested inside each other.

HTML sample:

<div dir="ltr">
  Hello!
  <br />
  <!-- Consider everything within as the quoted fragment -->
  <div class="gmail_quote">
    <div dir="ltr" class="gmail_attr">
      ---------- Forwarded message ---------<br />
      From:
      <strong class="gmail_sendername" dir="auto">Sender Name</strong>
      <span dir="auto">
        <a href="mailto:markin.io210@gmail.com">sender@gmail.com</a>
      </span>
      <br />
      Date: Tue, Feb 9, 2021 at 6:29 PM<br />
      Subject: Fwd: Gmail-Gmail-Forward<br />
      To: Recipient Name
      <a href="mailto:sender@gmail.com">sender@gmail.com</a>
    </div>
  </div>
</div>
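
The "top-level only" rule can be sketched as a depth-first walk that stops descending once it meets a quote. This is an illustration, not the library's actual implementation: the QuoteCandidate interface, findTopLevelQuotes and makeQuoteCandidate are hypothetical names, and the interface is a minimal shape that DOM elements should also fit.

```typescript
// Hypothetical minimal view of an element: only what the walk needs.
interface QuoteCandidate {
  className: string;
  children: ArrayLike<QuoteCandidate>;
}

// True when the space-separated class list contains the wanted class.
function hasClass(node: QuoteCandidate, wanted: string): boolean {
  return node.className.split(/\s+/).indexOf(wanted) !== -1;
}

// Collects quote elements that are not nested inside another quote.
function findTopLevelQuotes(
  root: QuoteCandidate,
  quoteClass: string = "gmail_quote"
): QuoteCandidate[] {
  const found: QuoteCandidate[] = [];
  const visit = (node: QuoteCandidate): void => {
    for (let i = 0; i < node.children.length; i++) {
      const child = node.children[i];
      if (hasClass(child, quoteClass)) {
        found.push(child); // top-level quote: do not descend into it
      } else {
        visit(child);
      }
    }
  };
  visit(root);
  return found;
}

// Usage with plain objects standing in for DOM elements.
const makeQuoteCandidate = (
  className: string,
  ...children: QuoteCandidate[]
): QuoteCandidate => ({ className, children });

const nestedQuote = makeQuoteCandidate("gmail_quote");
const outerQuote = makeQuoteCandidate(
  "gmail_quote",
  makeQuoteCandidate("gmail_attr"),
  nestedQuote
);
const messageRoot = makeQuoteCandidate("", makeQuoteCandidate("", outerQuote));

console.log(findTopLevelQuotes(messageRoot).length); // 1
```

The nested quote is skipped because the walk never enters an element that has already been collected.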

Outlook

Outlook.com

The Outlook.com engine uses an empty <div> element with the id appendonsend as a separator. The id is sometimes prefixed with `x` as part of Outlook's anti-XSS protection.

Everything below this separator can be considered part of the forwarded/replied fragment.

HTML sample

<body>
  <div>
    <span>Hello!</span>
    <div style="margin: 0px; background-color: rgb(255, 255, 255)">and</div>
  </div>
  <div>
    <!-- Consider everything below as the quoted fragment -->
    <div id="appendonsend"></div>
    <hr tabindex="-1" style="display: inline-block; width: 98%" />
    <div id="divRplyFwdMsg" dir="ltr">
      <font face="Calibri, sans-serif" color="#000000" style="font-size: 11pt"
        ><b>From:</b> Igor Markin &lt;markin.io@hotmail.com&gt;<br /><b
          >Sent:</b
        >
        Tuesday, February 9, 2021 6:32 PM<br /><b>To:</b> Stephan Morphis
        &lt;sepahimmelen@live.com&gt;<br /><b>Subject:</b> Fw:
        Outlook-Outlook-Forwarded</font
      >
      <div>&nbsp;</div>
    </div>
  </div>
</body>
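
A rough sketch of the separator lookup follows. It is not the library's code: MailNode, findQuotedFromSeparator and makeMailNode are hypothetical names, and matching the id by suffix is an assumption made here to cover the prefixed variant, whose exact form is not specified above. The sketch only gathers the separator and its later siblings under the same parent.

```typescript
// Hypothetical minimal view of an element: only what the search needs.
interface MailNode {
  id: string;
  children: ArrayLike<MailNode>;
}

const SEPARATOR_ID = "appendonsend";

// Assumption: accept the plain id and any prefixed variant by comparing the suffix.
function isSeparatorId(id: string): boolean {
  return id.length >= SEPARATOR_ID.length &&
    id.slice(-SEPARATOR_ID.length) === SEPARATOR_ID;
}

// Depth-first search for the separator; returns it together with all of
// its following siblings, or an empty array when no separator exists.
function findQuotedFromSeparator(root: MailNode): MailNode[] {
  for (let i = 0; i < root.children.length; i++) {
    const child = root.children[i];
    if (isSeparatorId(child.id)) {
      const quoted: MailNode[] = [];
      for (let j = i; j < root.children.length; j++) {
        quoted.push(root.children[j]);
      }
      return quoted;
    }
    const nested = findQuotedFromSeparator(child);
    if (nested.length > 0) {
      return nested;
    }
  }
  return [];
}

// Usage with plain objects mirroring the HTML sample above.
const makeMailNode = (id: string, ...children: MailNode[]): MailNode => ({
  id,
  children,
});

const sampleBody = makeMailNode(
  "",
  makeMailNode("", makeMailNode(""), makeMailNode("")),
  makeMailNode(
    "",
    makeMailNode("appendonsend"),
    makeMailNode(""), // the <hr> element
    makeMailNode("divRplyFwdMsg")
  )
);

// Logs the ids of the separator, the <hr>, and divRplyFwdMsg.
console.log(findQuotedFromSeparator(sampleBody).map((node) => node.id));
```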

Outlook for Office 365, version 16.48 (Build 21041102) macOS

This version does not provide an explicit identifier for the forwarded/replied content. However, it has a specific structure and style. The algorithm looks for the first occurrence of the structure provided below and treats all subsequent siblings as belonging to the forwarded/replied fragment.

<div
  style="border: none; border-top: solid #b5c4df 1pt; padding: 3pt 0cm 0cm 0cm;"
/>

Outlook Desktop allows the end user to completely alter the contents and style of the forwarded/replied parts, which can make them unrecognizable to the MIME parser. In that case, the library will not be able to detect them.

HTML sample

<body lang="en-RU" link="#0563C1" vlink="#954F72" style="word-wrap: break-word">
  <div class="WordSection1">
    <p class="MsoNormal">
      <span lang="EN-US">Hello!<o:p></o:p></span>
    </p>
    <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
    <!-- Consider everything below as the quoted fragment -->
    <div
      style="border: none; border-top: solid #b5c4df 1pt; padding: 3pt 0cm 0cm 0cm;"
    >
      <p class="MsoNormal">
        <b><span style="font-size: 12pt; color: black">From: </span></b
        ><span style="font-size: 12pt; color: black"
          >Stephan Morphis &lt;sepahimmelen@live.com&gt;<br /><b>Date: </b
          >Monday, 26 April 2021, 18:31<br /><b>To: </b>Stephan Morphis
          &lt;sepahimmelen@live.com&gt;<br /><b>Subject: </b>Re:
          macos-outlook-outlook-direct</span
        ><span
          style="font-size: 12pt; color: black; mso-fareast-language: EN-GB"
          ><o:p></o:p
        ></span>
      </p>
    </div>
    <div>
      <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
    </div>
    <p class="MsoNormal"><span lang="EN-US">Back to you!</span><o:p></o:p></p>
    <p class="MsoNormal">&nbsp;<o:p></o:p></p>
  </div>
</body>
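
Because the separator is identified by its inline style rather than by an id, a comparison that tolerates whitespace and letter-case differences is safer than exact string equality. The following is a minimal sketch under that assumption, not the library's actual matcher; parseInlineStyle and isMacOutlookSeparatorStyle are hypothetical helpers, and the naive split on ";" does not handle values that themselves contain a semicolon.

```typescript
// Parses an inline style string into lower-cased property/value pairs.
function parseInlineStyle(style: string): Record<string, string> {
  const declarations: Record<string, string> = {};
  for (const part of style.split(";")) {
    const colon = part.indexOf(":");
    if (colon === -1) {
      continue; // empty or malformed declaration
    }
    const property = part.slice(0, colon).trim().toLowerCase();
    const value = part
      .slice(colon + 1)
      .trim()
      .toLowerCase()
      .replace(/\s+/g, " "); // collapse runs of whitespace
    if (property) {
      declarations[property] = value;
    }
  }
  return declarations;
}

// Checks the two declarations that distinguish the macOS separator shown above.
function isMacOutlookSeparatorStyle(style: string): boolean {
  const declarations = parseInlineStyle(style);
  return (
    declarations["border-top"] === "solid #b5c4df 1pt" &&
    declarations["padding"] === "3pt 0cm 0cm 0cm"
  );
}

console.log(
  isMacOutlookSeparatorStyle(
    "border: none; border-top: solid #b5c4df 1pt; padding: 3pt 0cm 0cm 0cm;"
  )
); // true
```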

Outlook for Office 365, version 2103 (Build 13901.20462) Windows

As with the macOS version, this version does not provide an explicit identifier for the forwarded/replied content. It has a similar structure and style, with slight differences.
The algorithm looks for the first occurrence of the structure provided below and treats all subsequent siblings as belonging to the forwarded/replied fragment.

<div>
  <div
    style="border: none; border-top: solid #e1e1e1 1pt; padding: 3pt 0in 0in 0in;"
  >
    ...
  </div>
</div>

Outlook Desktop allows the end user to completely alter the contents and style of the forwarded/replied parts, which can make them unrecognizable to the MIME parser. In that case, the library will not be able to detect them.

HTML sample

<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap: break-word">
  <div class="WordSection1">
    <p class="MsoNormal">Hello!<o:p></o:p></p>
    <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
    <!-- Consider everything below as the quoted fragment -->
    <div>
      <div
        style="border: none; border-top: solid #e1e1e1 1pt; padding: 3pt 0in 0in 0in;"
      >
        <p class="MsoNormal">
          <b>From:</b> Rosen Georgiev &lt;rosen.georgiev@vereign.com&gt; <br />
          <b>Sent:</b> Tuesday, April 27, 2021 10:46 AM<br />
          <b>To:</b> Rosen Georgiev &lt;rosen.georgiev@vereign.com&gt;<br />
          <b>Subject:</b> FW: 2 forwards simple text<o:p></o:p>
        </p>
      </div>
    </div>
    <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
    <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
    <p class="MsoNormal">Forward 1<o:p></o:p></p>
    <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
  </div>
</body>
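
The wrapper-plus-styled-child structure can be matched with a sketch like the one below. It is an illustration under stated assumptions rather than the library's implementation: OutlookNode, isWindowsSeparator, findWindowsQuotedNodes and makeOutlookNode are hypothetical names, and the regular expressions encode only the border-top and padding values shown above.

```typescript
// Hypothetical minimal view of an element: only what the matcher needs.
interface OutlookNode {
  tagName: string;
  getAttribute(name: string): string | null;
  children: ArrayLike<OutlookNode>;
}

const WINDOWS_BORDER = /border-top:\s*solid\s+#e1e1e1\s+1pt/i;
const WINDOWS_PADDING = /padding:\s*3pt\s+0in\s+0in\s+0in/i;

// A separator is a <div> whose first child is a <div> with the expected style.
function isWindowsSeparator(node: OutlookNode): boolean {
  if (node.tagName.toLowerCase() !== "div" || node.children.length === 0) {
    return false;
  }
  const inner = node.children[0];
  const style = inner.getAttribute("style") || "";
  return (
    inner.tagName.toLowerCase() === "div" &&
    WINDOWS_BORDER.test(style) &&
    WINDOWS_PADDING.test(style)
  );
}

// Returns the first separator and all of its following siblings,
// or an empty array when the structure is not found.
function findWindowsQuotedNodes(root: OutlookNode): OutlookNode[] {
  for (let i = 0; i < root.children.length; i++) {
    const child = root.children[i];
    if (isWindowsSeparator(child)) {
      return Array.prototype.slice.call(root.children, i);
    }
    const nested = findWindowsQuotedNodes(child);
    if (nested.length > 0) {
      return nested;
    }
  }
  return [];
}

// Usage with plain objects mirroring the HTML sample above.
const makeOutlookNode = (
  tagName: string,
  style: string | null,
  ...children: OutlookNode[]
): OutlookNode => ({
  tagName,
  getAttribute: (name: string) => (name === "style" ? style : null),
  children,
});

const windowsSeparator = makeOutlookNode(
  "DIV",
  null,
  makeOutlookNode(
    "DIV",
    "border: none; border-top: solid #e1e1e1 1pt; padding: 3pt 0in 0in 0in;"
  )
);
const wordSection = makeOutlookNode(
  "DIV",
  null,
  makeOutlookNode("P", null),
  windowsSeparator,
  makeOutlookNode("P", null),
  makeOutlookNode("P", null)
);
const windowsBody = makeOutlookNode("BODY", null, wordSection);

console.log(findWindowsQuotedNodes(windowsBody).length); // 3
```

If the user has restyled the quoted header, neither pattern matches and the function returns an empty array, which corresponds to the limitation described above.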

Changelog

CHANGELOG.md

License

AGPLv3
