Initialization
The init function is your main entry point to configure and initialize the converter engine. It returns a Converter instance that can parse HTML and convert it into document formats like DOCX, PDF, or Markdown. Through init, you can register custom adapters, tag handlers, plugins, and default styles to control how HTML is interpreted and styled.
Quick Start
Here's a minimal example to get started:
import { init, DocxAdapter } from 'html-to-document';
const converter = init({
adapters: {
register: [{ format: 'docx', adapter: DocxAdapter }],
},
});
const html = '<h1>Hello</h1><p>This is a paragraph</p>';
converter.convert(html, 'docx').then((blobOrBuffer) => {
// Save or download the result
console.log('Document generated');
});
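The parameter name blobOrBuffer suggests the result is a Blob in browsers and a Buffer in Node. The sketch below normalizes both into raw bytes before you save or upload them. It assumes only that the result is either a Uint8Array (Node's Buffer is a subclass) or an object with a Blob-style arrayBuffer() method. The helper toBytes and the BlobLike interface are hypothetical and are not part of html-to-document.

```typescript
// Hypothetical structural type: anything exposing a Blob-style arrayBuffer().
interface BlobLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical helper (not part of html-to-document): turn a conversion
// result into raw bytes, whether it is a Node Buffer or a browser Blob.
async function toBytes(result: Uint8Array | BlobLike): Promise<Uint8Array> {
  // Node's Buffer extends Uint8Array, so it is returned unchanged.
  if (result instanceof Uint8Array) {
    return result;
  }
  // Blob or Blob-like: read the whole payload into an ArrayBuffer.
  return new Uint8Array(await result.arrayBuffer());
}
```

In Node you could then write the bytes with fs.writeFileSync('out.docx', bytes). In a browser you would more likely wrap a Blob in an object URL for download.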
For full customization, refer to the options below.
Signature
import { init } from 'html-to-document';
declare function init(options?: InitOptions): Converter;
The options object conforms to the InitOptions type and supports the following properties:
plugins?: Plugin[]
Register one or more plugins for the converter pipeline.
- Type: Plugin[]
- Default: the built-in minify plugin is enabled unless it is disabled by enableDefaultPlugins: false, or implicitly by the legacy clearMiddleware: true
- Hooks:
  - beforeParse?(context) can inspect and replace the raw HTML via context.setHtml(...)
  - onDocument?(context) can inspect or mutate the parsed Document and the fresh per-parse stylesheet
  - afterParse?(context) can inspect the parsed DocumentElement[] and replace them via context.replaceElements(...)
- Order: the phases run in sequence (beforeParse, onDocument, then afterParse), and within each phase plugins run in array order
- Errors: plugin failures fail fast and surface their original errors
- Example:
import { init } from 'html-to-document';
const converter = init({
plugins: [
{
name: 'strip-scripts',
beforeParse: async (context) => {
context.setHtml(
context.html.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/g, '')
);
},
},
{
name: 'mark-paragraphs',
afterParse: async (context) => {
context.replaceElements(
context.elements.map((element) =>
element.type === 'paragraph'
? {
...element,
metadata: { ...element.metadata, sanitized: true },
}
: element
)
);
},
},
],
});
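The ordering and error rules above can be modeled in a few lines. This is a simplified, hypothetical model: the real hooks receive rich context objects (HTML, Document, stylesheet, elements), while this sketch only records which hook ran when. It shows that every plugin's hook for one phase runs, in array order, before the next phase starts. It also shows that a thrown error stops the pipeline and surfaces unchanged. SketchPlugin and runPhases are illustrative names, not library APIs.

```typescript
// Hypothetical, simplified model of the three plugin phases.
type Phase = 'beforeParse' | 'onDocument' | 'afterParse';

interface SketchPlugin {
  name: string;
  beforeParse?: () => void | Promise<void>;
  onDocument?: () => void | Promise<void>;
  afterParse?: () => void | Promise<void>;
}

const PHASES: Phase[] = ['beforeParse', 'onDocument', 'afterParse'];

// Runs each phase to completion (plugins in array order) before the next.
// Errors are not caught, so the first failure rejects with the original error.
async function runPhases(plugins: SketchPlugin[]): Promise<string[]> {
  const trace: string[] = [];
  for (const phase of PHASES) {
    for (const plugin of plugins) {
      const hook = plugin[phase];
      if (hook) {
        trace.push(`${plugin.name}.${phase}`);
        await hook();
      }
    }
  }
  return trace;
}
```

For plugins a (beforeParse, afterParse) and b (beforeParse, onDocument), the trace is a.beforeParse, b.beforeParse, b.onDocument, a.afterParse.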
enableDefaultPlugins?: boolean
Controls whether built-in plugins are registered.
- Type: boolean
- Default: true, unless clearMiddleware: true is set and enableDefaultPlugins is not explicitly provided
- Current built-in plugin: minify
See Plugins for details.
middleware?: Middleware[]
Deprecated compatibility layer for HTML preprocessing.
- Type: Middleware[]
- Status: deprecated; prefer plugins with beforeParse
- Behavior: each middleware entry is internally adapted into a plugin
- Example:
import { init } from 'html-to-document';
import { customMiddleware1, customMiddleware2 } from './middleware';
const converter = init({
middleware: [customMiddleware1, customMiddleware2],
});
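Adapting a middleware into a plugin comes down to wrapping it in a beforeParse hook. The following is a minimal sketch under an explicit assumption: a middleware is taken to be a function from HTML to HTML, which may not match the library's real Middleware type. The context interface, the plugin interface, and the generated plugin names are all hypothetical.

```typescript
// Assumed middleware shape for this sketch; the real Middleware type may differ.
type SketchMiddleware = (html: string) => string | Promise<string>;

// Minimal stand-in for the beforeParse context described in the plugins section.
interface BeforeParseContext {
  html: string;
  setHtml(html: string): void;
}

interface BeforeParsePlugin {
  name: string;
  beforeParse: (context: BeforeParseContext) => Promise<void>;
}

// Wraps each middleware in a plugin whose beforeParse hook passes the
// current HTML through the middleware and stores the result.
function middlewareToPlugins(middleware: SketchMiddleware[]): BeforeParsePlugin[] {
  return middleware.map((mw, index) => ({
    name: `legacy-middleware-${index}`, // hypothetical naming scheme
    beforeParse: async (context) => {
      context.setHtml(await mw(context.html));
    },
  }));
}
```

Because plugins run in array order, the wrapped middleware keeps its original sequence.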
clearMiddleware?: boolean
Deprecated compatibility switch for the old middleware model.
- Type: boolean
- Default: false
- Status: deprecated; prefer enableDefaultPlugins: false
- Behavior: implies enableDefaultPlugins: false by default, but an explicit enableDefaultPlugins value overrides that legacy implication
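Taken together, enableDefaultPlugins and clearMiddleware follow a simple precedence rule. The function below restates that documented rule as code so the combinations are easy to check. It is an illustration, not the library's source, and DefaultPluginFlags and defaultPluginsEnabled are hypothetical names.

```typescript
// Hypothetical subset of InitOptions containing only the two relevant flags.
interface DefaultPluginFlags {
  enableDefaultPlugins?: boolean;
  clearMiddleware?: boolean; // deprecated
}

// An explicit enableDefaultPlugins value always wins. Otherwise, the legacy
// clearMiddleware: true turns the built-in plugins off; by default they are on.
function defaultPluginsEnabled(options: DefaultPluginFlags = {}): boolean {
  if (options.enableDefaultPlugins !== undefined) {
    return options.enableDefaultPlugins;
  }
  return options.clearMiddleware !== true;
}

console.log(defaultPluginsEnabled()); // true
console.log(defaultPluginsEnabled({ clearMiddleware: true })); // false
console.log(defaultPluginsEnabled({ clearMiddleware: true, enableDefaultPlugins: true })); // true
```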
styleInheritance?
Customize how CSS properties flow from parent to child elements. This allows you to override the default inheritance behavior defined in the core engine.
- Type: Partial<Record<keyof CSS.Properties, Partial<StyleMeta>>>
- Example:
init({
styleInheritance: {
// Force 'border' to inherit (standard CSS does not inherit borders, but you might want to)
border: {
inherits: true,
scopes: ['block', 'tableCell'],
},
// Prevent 'color' from inheriting (just as an example)
color: {
inherits: false,
scopes: ['block'],
},
},
});
Understanding Scopes and Cascading
- scopes: Defines which element types can have this property.
  - It answers: "Is this property allowed on this type of element?"
  - It does not control inheritance (that's inherits). If a child attempts to inherit a property, the property must also be valid for that child's scope (unless cascadeTo overrides this).
  - Example: textAlign has scopes: ['block', 'tableCell']. A <span> (inline) cannot have textAlign because inline is not one of its scopes.
- inherits: Defines whether the property flows down to children naturally.
  - Example: textAlign has inherits: true. A <p> (block) can inherit textAlign from its parent <div> (block).
- cascadeTo: Rarely needed but powerful. It forces a property to pass from a parent to a specific type of child even if local logic would otherwise filter it out.
  - Example: textAlign on a tableCell should affect the block (paragraph) inside it.
  - Why? A table cell isn't just a box; it sets the alignment context for its contents. cascadeTo: ['block'] tells the engine: "If I'm on a cell, pass me down to the paragraph inside."
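The interaction of scopes, inherits, and cascadeTo can be modeled as a single yes/no check: does a property set on a parent reach a given child? This is a simplified model written from the rules above, not the engine's actual algorithm. The scope names are limited to the three mentioned here, and cellShading is a made-up property used only to exercise cascadeTo.

```typescript
// Hypothetical scope names; the engine's real element types may differ.
type Scope = 'block' | 'inline' | 'tableCell';

// Hypothetical subset of StyleMeta with the three fields described above.
interface SketchStyleMeta {
  inherits: boolean;
  scopes: Scope[];
  cascadeTo?: Scope[];
}

// Simplified model: does a property set on the parent reach the child?
function reachesChild(meta: SketchStyleMeta, parentScope: Scope, childScope: Scope): boolean {
  // The parent can only pass on a property that is valid for its own scope.
  if (!meta.scopes.includes(parentScope)) {
    return false;
  }
  // cascadeTo forces the property down to the listed child scopes.
  if (meta.cascadeTo?.includes(childScope)) {
    return true;
  }
  // Otherwise it must inherit naturally AND be valid for the child's scope.
  return meta.inherits && meta.scopes.includes(childScope);
}

const textAlign: SketchStyleMeta = { inherits: true, scopes: ['block', 'tableCell'] };
// Made-up property: valid only on cells and not inherited, but cascaded to blocks.
const cellShading: SketchStyleMeta = { inherits: false, scopes: ['tableCell'], cascadeTo: ['block'] };

console.log(reachesChild(textAlign, 'block', 'block')); // true:  <div> -> <p>
console.log(reachesChild(textAlign, 'block', 'inline')); // false: <p> -> <span>
console.log(reachesChild(cellShading, 'tableCell', 'block')); // true via cascadeTo
console.log(reachesChild(cellShading, 'tableCell', 'inline')); // false
```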
tags?
{
tagHandlers?: TagHandlerObject[];
defaultStyles?: { key: string; styles: Styles }[];
defaultAttributes?: { key: string; attributes: Record<string, any> }[];
}
Customize how HTML tags are parsed and styled before conversion.
- tagHandlers: Provide custom TagHandlerObject overrides:
const customHandler: TagHandlerObject = {
/* ... */
};
init({ tags: { tagHandlers: [customHandler] } });
- defaultStyles: Fallback style definitions per HTML tag:
init({
tags: {
defaultStyles: [
{ key: 'p', styles: { marginBottom: 10, lineHeight: 1.5 } },
],
},
});
- defaultAttributes: Fallback attributes per HTML tag:
init({
tags: {
defaultAttributes: [{ key: 'img', attributes: { width: 600 } }],
},
});
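Because defaultStyles are described as fallbacks, styles set on the element itself should take precedence over the tag defaults. The sketch below shows that resolution order under that assumption. The flattened style map and the resolveTagStyles helper are hypothetical simplifications, not the library's internals.

```typescript
// Hypothetical, simplified style map; real Styles objects are richer.
type SketchStyles = Record<string, string | number>;

interface TagDefaultStyles {
  key: string; // HTML tag name, e.g. 'p'
  styles: SketchStyles;
}

// Resolve an element's styles: tag defaults first, element styles on top.
function resolveTagStyles(tag: string, ownStyles: SketchStyles, defaults: TagDefaultStyles[]): SketchStyles {
  const fallback: SketchStyles = {};
  for (const entry of defaults) {
    if (entry.key === tag) {
      Object.assign(fallback, entry.styles);
    }
  }
  // Spread order lets the element's own styles override the fallback.
  return { ...fallback, ...ownStyles };
}

const tagDefaults: TagDefaultStyles[] = [{ key: 'p', styles: { marginBottom: 10, lineHeight: 1.5 } }];
console.log(resolveTagStyles('p', { marginBottom: 4 }, tagDefaults));
// { marginBottom: 4, lineHeight: 1.5 }
```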
adapters?
{
register?: {
format: string;
adapter: AdapterProvider;
config?: object;
createAdapter?: (args: {
format: string;
Adapter: AdapterProvider;
config?: object;
dependencies: IConverterDependencies;
}) => IDocumentConverter;
}[];
defaultStyles?: { format: string; styles: Record<ElementType, Styles> }[];
}
Adapters determine how the parsed content is rendered into a final document format. This option controls which adapters are registered and which default styles they receive. You can register your own adapter (e.g., for Markdown) or extend existing ones, such as the built-in DOCX adapter.
- register: List of custom adapters implementing IDocumentConverter:
init({
adapters: {
register: [{ format: 'md', adapter: MyAdapter }],
},
});
- register.createAdapter: Per-registration factory hook for adapter construction. init() computes a fresh dependency object for each registration and passes it to this factory together with the adapter class and config. Use this when you want to customize defaultStyles or styleMeta, or to wrap the adapter instance before registration:
init({
adapters: {
register: [
{
format: 'docx',
adapter: DocxAdapter,
createAdapter: ({ Adapter, config, format, dependencies }) => {
if (format === 'docx') {
return new Adapter(
{
...dependencies,
defaultStyles: {
...dependencies.defaultStyles,
heading: { color: 'darkred' },
},
styleMeta: {
...dependencies.styleMeta,
color: {
...dependencies.styleMeta?.color,
inherits: false,
},
},
},
config
);
}
return new Adapter(dependencies, config);
},
},
],
},
});
Each dependencies object is isolated per adapter registration, so mutating it in one factory call does not affect the next adapter.
- defaultStyles: Fallback styles per element type for each format:
init({
adapters: {
defaultStyles: [
{
format: 'docx',
styles: { paragraph: { color: 'darkblue', fontSize: 24 } },
},
],
},
});
- config: Optional adapter-specific configuration object for each registered adapter. For example, the built-in DocxAdapter supports custom block, inline, and fallthrough converters, plus DOCX-specific style mapping:
init({
adapters: {
register: [
{
format: 'docx',
adapter: DocxAdapter,
config: {
blockConverters: [new MyBlockConverter()],
inlineConverters: [new MyInlineConverter()],
fallthroughConverters: [new MyFallthroughConverter()],
styleMappings: {
fontWeight: (v) => ({ bold: v === 'bold' }),
},
},
},
],
},
});
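The per-registration isolation of dependencies means each factory works on its own copy of the shared defaults. The sketch below illustrates that guarantee with a deliberately simplified dependency shape. SketchDependencies, registerAdapters, and the JSON-based copy are illustrative assumptions and do not describe how init() is implemented. In the sketch the "adapter" returned by each factory is just its dependencies, which makes the isolation easy to inspect.

```typescript
// Hypothetical, simplified dependency shape; the real IConverterDependencies
// interface has more (and differently typed) members.
interface SketchDependencies {
  defaultStyles: Record<string, Record<string, string | number>>;
  styleMeta: Record<string, { inherits: boolean }>;
}

interface SketchRegistration<T> {
  format: string;
  createAdapter: (args: { format: string; dependencies: SketchDependencies }) => T;
}

const sharedDefaults: SketchDependencies = {
  defaultStyles: { paragraph: { fontSize: 24 } },
  styleMeta: { color: { inherits: true } },
};

// Plain-data deep copy; adequate here because the sketch holds only JSON values.
function cloneDependencies(deps: SketchDependencies): SketchDependencies {
  return JSON.parse(JSON.stringify(deps)) as SketchDependencies;
}

// Each registration gets a fresh copy, so a factory that mutates its
// dependencies cannot leak those changes into later registrations.
function registerAdapters<T>(registrations: SketchRegistration<T>[]): Map<string, T> {
  const adapters = new Map<string, T>();
  for (const registration of registrations) {
    const dependencies = cloneDependencies(sharedDefaults);
    adapters.set(registration.format, registration.createAdapter({ format: registration.format, dependencies }));
  }
  return adapters;
}
```

If the docx factory adds a heading style and turns off color inheritance, the md registration still sees the untouched shared defaults.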
domParser?: IDOMParser
Use a custom DOM parser implementation.
- Type: IDOMParser
- Example:
class CustomParser implements IDOMParser {
parse(html: string) {
/* ... */
}
}
init({ domParser: new CustomParser() });
Example Usage
import { init } from 'html-to-document';
import { MyAdapter } from './my-adapter';
import { CustomParser } from './parser';
const converter = init({
plugins: [
{
name: 'strip-scripts',
beforeParse: async (context) =>
context.setHtml(
context.html.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/g, '')
),
},
],
tags: {
defaultStyles: [{ key: 'p', styles: { marginBottom: 8 } }],
},
adapters: {
register: [{ format: 'md', adapter: MyAdapter }],
defaultStyles: [{ format: 'md', styles: { paragraph: { indent: 20 } } }],
},
domParser: new CustomParser(), // custom DOM parser implementation
});
converter
.convert('<h1>Title</h1><p>Text</p>', 'md')
.then((result) => console.log('Generated Markdown:', result))
.catch(console.error);