Middleware

Middleware functions run on the HTML string before it is parsed into DocumentElement nodes. They allow you to transform, sanitize, or minify the HTML content.

Signature

import type { Middleware } from 'html-to-document';

// Middleware is defined as:
// type Middleware = (html: string) => Promise<string>;

See the Types Reference for the full definition.
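Because a middleware is just an async function from string to string, it can be written and called on its own. The sketch below restates the type locally so it runs without the package installed; stripDoctype is a hypothetical example, not part of html-to-document.

```typescript
// Local copy of the signature so this sketch runs standalone.
// In a real project, import it instead: import type { Middleware } from 'html-to-document';
type Middleware = (html: string) => Promise<string>;

// Hypothetical example: drop a leading byte-order mark and any <!DOCTYPE ...> declaration.
const stripDoctype: Middleware = async (html) =>
  html.replace(/^\uFEFF/, '').replace(/<!DOCTYPE[^>]*>/i, '');

// A middleware is an ordinary async function, so it can be called directly.
stripDoctype('<!DOCTYPE html><p>Hi</p>').then((out) => console.log(out)); // prints "<p>Hi</p>"
```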

Default Middleware

Unless you pass clearMiddleware: true, init() applies a built-in whitespace-minifying middleware. This default middleware:

  • Strips HTML comments (<!-- ... -->)
  • Collapses consecutive whitespace into a single space (outside <pre>)
  • Removes unnecessary whitespace between tags
  • Trims leading and trailing whitespace
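The four steps above can be approximated with a few regular expressions. This is a rough sketch of similar behavior, not the library's actual minifier; minifyWhitespace is a hypothetical name. Note that the naive tag-gap rule also removes a meaningful space between adjacent inline elements (for example <b>a</b> <i>b</i>), which a production minifier would need to handle more carefully.

```typescript
type Middleware = (html: string) => Promise<string>;

// Hypothetical approximation of the default minifier; the real implementation may differ.
const minifyWhitespace: Middleware = async (html) => {
  // 1. Strip HTML comments.
  const noComments = html.replace(/<!--[\s\S]*?-->/g, '');

  // Split around <pre> blocks. The capturing group keeps each <pre>...</pre>
  // in the result array at an odd index, so it can be left untouched.
  const parts = noComments.split(/(<pre\b[\s\S]*?<\/pre>)/i);

  const minified = parts.map((part, i) => {
    if (i % 2 === 1) return part; // a <pre> block: preserve its whitespace
    return part
      .replace(/\s+/g, ' ')      // 2. collapse whitespace runs into one space
      .replace(/>\s+</g, '><');  // 3. remove whitespace between tags (naive)
  });

  // 4. Trim leading and trailing whitespace.
  return minified.join('').trim();
};
```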

Custom Middleware

There are two ways to register your own middleware:

1. Via init options

import { init, type Middleware } from 'html-to-document';

// Example: remove all <script> elements (case-insensitive)
const stripScripts: Middleware = async (html) =>
  html.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/gi, '');

const converter = init({
  clearMiddleware: true, // skip the default minifier
  middleware: [stripScripts],
});

2. Programmatically

const converter = init();
// Register stripScripts (defined above) after initialization
converter.useMiddleware(stripScripts);

Note: Middleware functions run in the order they are passed in or registered, and each one receives the output of the previous one. If one middleware depends on another's output, order them accordingly.
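One way to picture the ordering rule is a loop that awaits each middleware in turn and feeds its result to the next. runMiddleware below is a hypothetical helper for illustration, not the converter's internal code; the two middleware show how swapping their order changes the result.

```typescript
type Middleware = (html: string) => Promise<string>;

// Hypothetical helper: runs each middleware in order, passing along the
// previous result. Illustrative only, not html-to-document's internal code.
async function runMiddleware(html: string, chain: Middleware[]): Promise<string> {
  let result = html;
  for (const middleware of chain) {
    result = await middleware(result);
  }
  return result;
}

const stripScripts: Middleware = async (html) =>
  html.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/gi, '');

// Removes <p> elements that contain only whitespace.
const removeEmptyParagraphs: Middleware = async (html) =>
  html.replace(/<p>\s*<\/p>/gi, '');

const input = '<p><script>track()</script></p><p>Hi</p>';

// Scripts are removed first, so the emptied <p> is then dropped.
runMiddleware(input, [stripScripts, removeEmptyParagraphs])
  .then((out) => console.log(out)); // prints "<p>Hi</p>"

// Reversed: the <p> still holds the script when checked, so "<p></p>" survives.
runMiddleware(input, [removeEmptyParagraphs, stripScripts])
  .then((out) => console.log(out)); // prints "<p></p><p>Hi</p>"
```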

Example: Sanitizing HTML

import { init, type Middleware } from 'html-to-document';

// Remove all inline style attributes (double- or single-quoted)
const removeStyles: Middleware = async (html) =>
  html.replace(/\s+style\s*=\s*("[^"]*"|'[^']*')/gi, '');

const converter = init({
  middleware: [removeStyles],
});

converter.convert('<p style="color:red">Hello</p>', 'docx')
  .then((buffer) => {
    // write or send the generated document
  })
  .catch(console.error);
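Other sanitizing steps follow the same pattern. As a sketch under stated assumptions, removeEventHandlers (a hypothetical name) strips inline event-handler attributes such as onclick. Because it is regex-based, treat it as a convenience filter rather than a replacement for a dedicated HTML sanitizer when the input is untrusted.

```typescript
type Middleware = (html: string) => Promise<string>;

// Hypothetical sanitizing middleware: removes on*="..." attributes, whether
// double-quoted, single-quoted, or unquoted. Matching is case-insensitive.
const removeEventHandlers: Middleware = async (html) =>
  html.replace(/\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, '');

removeEventHandlers('<img src="a.png" onerror="alert(1)"><p onclick=\'go()\'>Hi</p>')
  .then((out) => console.log(out)); // prints '<img src="a.png"><p>Hi</p>'
```

To use it with the converter, list it next to removeStyles, e.g. middleware: [removeStyles, removeEventHandlers].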