Middleware
Middleware functions run on the HTML string before it is parsed into DocumentElement
nodes. They allow you to transform, sanitize, or minify the HTML content.
Signature
import { Middleware } from 'html-to-document';
// Middleware is equivalent to:
// type Middleware = (html: string) => Promise<string>;
See the Types Reference for the full definition.
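The signature only asks for a function that returns a `Promise<string>`, so a synchronous string transform can be lifted into a middleware with a one-line wrapper. The sketch below redeclares the type locally so it runs without the package installed. `fromSync` and `normalizeLineEndings` are hypothetical names used for illustration, not exports of html-to-document.

```typescript
// Local copy of the documented signature, so this sketch runs standalone.
type Middleware = (html: string) => Promise<string>;

// Hypothetical helper: lift a synchronous (html) => string transform into a Middleware.
const fromSync =
  (fn: (html: string) => string): Middleware =>
  async (html) =>
    fn(html);

// Hypothetical middleware: convert CRLF and lone CR line endings to LF.
const normalizeLineEndings: Middleware = fromSync((html) =>
  html.replace(/\r\n?/g, '\n'),
);
```

Because the result is an ordinary `Middleware`, it can be passed to `init` or `useMiddleware` like any other.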
Default Middleware
By default, init() applies a built-in whitespace-minifying middleware (unless you set clearMiddleware: true). This default middleware:
- Strips HTML comments (<!-- ... -->)
- Collapses consecutive whitespace into a single space (outside <pre> elements)
- Removes unnecessary whitespace between tags
- Trims leading and trailing whitespace
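The four steps above can be approximated with a few regular expressions. This is a hypothetical sketch (`minifyWhitespace` is an illustrative name), not the library's actual implementation. It splits the input so that `<pre>` blocks are left untouched; note that removing whitespace-only runs between tags in this sketch can also drop a meaningful space between inline elements such as `<b>a</b> <i>b</i>`.

```typescript
type Middleware = (html: string) => Promise<string>;

// Hypothetical approximation of a whitespace-minifying middleware.
const minifyWhitespace: Middleware = async (html) => {
  // 1. Strip HTML comments.
  const noComments = html.replace(/<!--[\s\S]*?-->/g, '');
  // Split on <pre>...</pre>; because the pattern has one capture group,
  // the <pre> blocks land at odd indices of the resulting array.
  const parts = noComments.split(/(<pre\b[\s\S]*?<\/pre>)/i);
  const minified = parts.map((part, i) =>
    i % 2 === 1
      ? part // keep <pre> content verbatim
      : part
          .replace(/\s+/g, ' ') // 2. collapse whitespace runs
          .replace(/>\s+</g, '><'), // 3. drop whitespace between tags
  );
  // 4. Trim leading and trailing whitespace.
  return minified.join('').trim();
};
```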
Custom Middleware
There are two ways to register your own middleware:
1. Via init options
import { init, Middleware } from 'html-to-document';
// Example: remove all <script> elements (case-insensitive, with or without attributes)
const stripScripts: Middleware = async (html) =>
  html.replace(/<script\b[\s\S]*?>[\s\S]*?<\/script>/gi, '');
const converter = init({
clearMiddleware: true, // skip default minifier
middleware: [stripScripts],
});
2. Programmatically
const converter = init();
// Add another middleware after initialization
converter.useMiddleware(stripScripts);
Note: Middleware functions run in the order they are passed in or registered, and each one receives the output of the previous one. If a middleware depends on another's output, place it after the middleware it depends on.
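One way to picture this ordering is as a left-to-right chain. The `runPipeline` helper below is a hypothetical illustration of that behavior, not the library's internal code; `boldToStrong` and `addStrongClass` are made-up middleware chosen so that the second only works if the first has already run.

```typescript
type Middleware = (html: string) => Promise<string>;

// Hypothetical helper: apply middleware strictly in array order,
// feeding each one's output into the next.
async function runPipeline(html: string, middleware: Middleware[]): Promise<string> {
  let result = html;
  for (const step of middleware) {
    result = await step(result);
  }
  return result;
}

// Rewrites <b>...</b> to <strong>...</strong>.
const boldToStrong: Middleware = async (html) =>
  html.replace(/<(\/?)b>/gi, '<$1strong>');

// Depends on boldToStrong: it only matches <strong> tags.
const addStrongClass: Middleware = async (html) =>
  html.replace(/<strong>/g, '<strong class="em">');
```

With `[boldToStrong, addStrongClass]`, `<b>hi</b>` becomes `<strong class="em">hi</strong>`; with the order reversed, `addStrongClass` finds nothing to match and the result is just `<strong>hi</strong>`.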
Example: Sanitizing HTML
import { init, Middleware } from 'html-to-document';
// Remove all inline style attributes (double- or single-quoted values)
const removeStyles: Middleware = async (html) =>
  html.replace(/\s+style\s*=\s*("[^"]*"|'[^']*')/gi, '');
const converter = init({
middleware: [removeStyles],
});
converter.convert('<p style="color:red">Hello</p>', 'docx')
  .then((buffer) => {
    // e.g. write the buffer to disk or send it in an HTTP response
  })
.catch(console.error);
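The same pattern extends to other attributes. As a hypothetical example (`stripEventHandlers` is an illustrative name, not part of the library), the middleware below drops inline event-handler attributes such as `onclick`. Like the style example, it is regex-based and only handles quoted attribute values.

```typescript
type Middleware = (html: string) => Promise<string>;

// Hypothetical middleware: remove on*="..." and on*='...' attributes
// (onclick, onload, onMouseOver, ...). Unquoted values are not handled.
const stripEventHandlers: Middleware = async (html) =>
  html.replace(/\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*')/gi, '');
```

It can be listed alongside `removeStyles` in the `middleware` array passed to `init`.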