Plugins
Plugins are the primary extension point for the converter pipeline. They can transform the raw HTML before parsing, inspect or mutate the parsed Document, and replace the parsed DocumentElement[] after parsing.
Signature
import { Plugin } from 'html-to-document';
interface Plugin {
name?: string;
beforeParse?(context: BeforeParseContext): void | Promise<void>;
onDocument?(context: OnDocumentContext): void | Promise<void>;
afterParse?(context: AfterParseContext): void | Promise<void>;
}
name is optional and intended for diagnostics.
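As a rough illustration of the signature, the sketch below defines a plugin that implements only beforeParse and exercises it against a stub context. The types BeforeParseContextLike and PluginLike are local stand-ins that model only the members used elsewhere on this page (html and setHtml). The real context types may expose more. The plugin name and createStubContext are hypothetical.

```typescript
// Local stand-ins for the library's types (assumption: only the members
// used on this page are modelled; the real types may have more).
interface BeforeParseContextLike {
  readonly html: string;
  setHtml(html: string): void;
}

interface PluginLike {
  name?: string;
  beforeParse?(context: BeforeParseContextLike): void | Promise<void>;
}

// A plugin that implements a single hook; the other hooks are simply omitted.
const stripStyleAttributes: PluginLike = {
  name: 'strip-style-attributes', // hypothetical name, useful in diagnostics
  beforeParse(context) {
    // Remove double-quoted inline style attributes from the raw HTML.
    context.setHtml(context.html.replace(/\sstyle="[^"]*"/gi, ''));
  },
};

// Hypothetical stub context, used only to exercise the hook in isolation.
function createStubContext(initialHtml: string): BeforeParseContextLike {
  let current = initialHtml;
  return {
    get html() {
      return current;
    },
    setHtml(next: string) {
      current = next;
    },
  };
}
```

Because every hook is optional, a plugin only needs to declare the phases it cares about.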
Hook Order
Within each phase, plugins run in registration order. The phases are:
- Every beforeParse hook runs against the HTML string.
- The converter parses the final HTML into a Document and runs every onDocument hook.
- The parser converts that Document into DocumentElement[].
- Every afterParse hook runs against the parsed elements.
Each parse session starts with a fresh stylesheet clone. Every hook receives the same per-parse stylesheet and data object.
If any plugin throws or rejects, parsing fails immediately and the original error is surfaced.
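The ordering and fail-fast behavior described above can be pictured with a simplified model of one parse session. This is a sketch under stated assumptions, not the library's implementation: SessionContext, SketchPlugin, and runSession are illustrative names, the parsing steps are stubbed out, and the per-parse stylesheet is omitted for brevity.

```typescript
// Simplified model of one parse session. All names are illustrative
// stand-ins, not the library's internals.
interface SessionContext {
  html: string;
  elements: string[]; // stand-in for DocumentElement[]
  data: Record<string, unknown>; // shared per-parse data object
}

interface SketchPlugin {
  name?: string;
  beforeParse?(ctx: SessionContext): void | Promise<void>;
  onDocument?(ctx: SessionContext): void | Promise<void>;
  afterParse?(ctx: SessionContext): void | Promise<void>;
}

async function runSession(
  html: string,
  plugins: SketchPlugin[]
): Promise<SessionContext> {
  // A fresh context per parse, so state never leaks between sessions.
  const ctx: SessionContext = { html, elements: [], data: {} };

  // Phase 1: every beforeParse hook, in registration order.
  for (const plugin of plugins) await plugin.beforeParse?.(ctx);

  // Phase 2: parsing into a Document is elided; then every onDocument hook.
  for (const plugin of plugins) await plugin.onDocument?.(ctx);

  // Phase 3: conversion to elements is stubbed; then every afterParse hook.
  ctx.elements = [ctx.html];
  for (const plugin of plugins) await plugin.afterParse?.(ctx);

  // No try/catch: a throw or rejection in any hook aborts the session and
  // propagates the original error to the caller.
  return ctx;
}
```

Note that a phase completes for all plugins before the next phase starts, so the second plugin's beforeParse runs before the first plugin's onDocument.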
Default Plugin
init() and new Converter() enable a built-in minify plugin by default. It:
- strips HTML comments
- collapses consecutive whitespace outside <pre>
- removes unnecessary whitespace between tags
- trims leading and trailing whitespace
Use enableDefaultPlugins: false to disable it.
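To make the four steps concrete, here is a regex-based approximation of that kind of minification. It is only a sketch: minifySketch is a hypothetical function, not the built-in plugin, and the real implementation may handle edge cases (nested or malformed <pre> blocks, comments inside <pre>, other whitespace-sensitive elements) differently.

```typescript
// Approximate, regex-based minifier (hypothetical; not the built-in plugin).
function minifySketch(html: string): string {
  // Split on <pre>...</pre> blocks. Because the pattern has one capture
  // group, the matched blocks are kept in the result at odd indices.
  const parts = html.split(/(<pre\b[\s\S]*?<\/pre>)/gi);

  return parts
    .map((part, index) =>
      index % 2 === 1
        ? part // keep <pre> content verbatim
        : part
            .replace(/<!--[\s\S]*?-->/g, '') // strip HTML comments
            .replace(/\s+/g, ' ') // collapse consecutive whitespace
            .replace(/>\s+</g, '><') // drop whitespace between tags
    )
    .join('')
    .trim(); // trim leading and trailing whitespace
}
```

In this sketch, whitespace directly adjacent to a <pre> block is collapsed to a single space rather than removed, which is one place where it may differ from the built-in plugin.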
Registering Plugins
Via init()
import { init } from 'html-to-document';
const converter = init({
plugins: [
{
name: 'strip-scripts',
beforeParse: async (context) => {
context.setHtml(
context.html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
);
},
},
{
name: 'append-review-flag',
afterParse: async (context) =>
context.replaceElements(
context.elements.map((element) =>
element.type === 'paragraph' && element.text
? {
...element,
metadata: { ...element.metadata, reviewed: true },
}
: element
)
),
},
],
});
Deprecated Middleware Compatibility
Legacy middleware and clearMiddleware are still supported for backward compatibility, but they are deprecated in favor of plugins.
- middleware entries are internally adapted into plugins with a beforeParse hook
- clearMiddleware: true implies enableDefaultPlugins: false by default
- explicit enableDefaultPlugins overrides that implication
- useMiddleware() still works and registers a beforeParse plugin after construction
This means the following is still valid:
const converter = init({
middleware: [async (html) => html.replace('foo', 'bar')],
});
But new integrations should prefer:
const converter = init({
plugins: [
{
beforeParse: async (context) => {
context.setHtml(context.html.replace('foo', 'bar'));
},
},
],
});
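The adaptation of middleware entries into beforeParse plugins can be pictured roughly as follows. This is a sketch, not the library's internal adapter: adaptMiddleware, LegacyMiddleware, HtmlContext, and AdaptedPlugin are hypothetical names, and the middleware shape (an HTML string in, a string or promise of a string out) is an assumption inferred from the example above.

```typescript
// Assumed middleware shape, inferred from the example on this page.
type LegacyMiddleware = (html: string) => string | Promise<string>;

// Local stand-in modelling only the context members used on this page.
interface HtmlContext {
  readonly html: string;
  setHtml(html: string): void;
}

interface AdaptedPlugin {
  name?: string;
  beforeParse(context: HtmlContext): Promise<void>;
}

// Wrap a middleware function in a plugin whose beforeParse hook feeds the
// current HTML through the middleware and stores the result.
function adaptMiddleware(middleware: LegacyMiddleware): AdaptedPlugin {
  return {
    async beforeParse(context) {
      context.setHtml(await middleware(context.html));
    },
  };
}
```

Seen this way, the two init() calls above are equivalent in effect: the middleware version is the plugin version with the setHtml call made on the caller's behalf.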