Initialization
The init function is your main entry point to configure and initialize the converter engine. It returns a Converter instance that can parse HTML and convert it into document formats like DOCX, PDF, or Markdown. Through init, you can register custom adapters, tag handlers, middleware, and default styles to control how HTML is interpreted and styled.
Quick Start
Here's a minimal example to get started:
import { init, DocxAdapter } from 'html-to-document';
const converter = init({
  adapters: {
    register: [{ format: 'docx', adapter: DocxAdapter }],
  },
});
const html = '<h1>Hello</h1><p>This is a paragraph</p>';
converter.convert(html, 'docx').then((blobOrBuffer) => {
  // Save or download the result
  console.log('Document generated');
});
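The callback above leaves the "save or download" step open. As a minimal sketch, assuming the result is either a Blob (browser) or a Buffer (Node.js, where Buffer is a Uint8Array subclass), a hypothetical helper named toBytes can normalize both to raw bytes. toBytes is not part of html-to-document; it is only an illustration.

```typescript
// Hypothetical helper, not part of html-to-document: normalizes a conversion
// result to raw bytes. Assumes the result is a Blob or a Uint8Array
// (a Node.js Buffer is a Uint8Array subclass).
async function toBytes(result: Blob | Uint8Array): Promise<Uint8Array> {
  if (result instanceof Uint8Array) {
    return result;
  }
  // Blob.arrayBuffer() resolves to the blob's contents as an ArrayBuffer.
  return new Uint8Array(await result.arrayBuffer());
}
```

In Node.js the resulting bytes can be passed to fs.promises.writeFile; in a browser they can be wrapped in a Blob again to build a download link.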
For full customization, refer to the options below.
Signature
import { init } from 'html-to-document';
declare function init(options?: InitOptions): Converter;
The options object conforms to the InitOptions type and supports the following properties:
middleware?: Middleware[]
Register one or more middleware functions that transform or sanitize the HTML before it is parsed, e.g., stripping scripts, normalizing whitespace, or injecting metadata.
- Type: Middleware[]
- Default: [minifyMiddleware], applied automatically unless clearMiddleware is true
- Example:
import { init } from 'html-to-document';
import { customMiddleware1, customMiddleware2 } from './middleware';
const converter = init({
  middleware: [customMiddleware1, customMiddleware2],
});
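As a rough sketch of what such middleware might look like, the example below assumes a middleware is an async function that takes an HTML string and resolves to the transformed string; check the Middleware type exported by the library for the exact signature. The names HtmlMiddleware, stripScripts, collapseWhitespace, and runMiddleware are hypothetical, and the real engine's pipeline may differ.

```typescript
// Assumed shape of a middleware: HTML string in, transformed HTML string out.
type HtmlMiddleware = (html: string) => Promise<string>;

// Hypothetical middleware that removes <script> elements with a simple regex.
// A regex is not a robust HTML sanitizer; it only illustrates the idea.
const stripScripts: HtmlMiddleware = async (html) =>
  html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');

// Hypothetical middleware that removes whitespace between adjacent tags.
const collapseWhitespace: HtmlMiddleware = async (html) =>
  html.replace(/>\s+</g, '><').trim();

// Illustrative runner: applies each middleware in order, feeding the output
// of one into the next.
async function runMiddleware(
  html: string,
  chain: HtmlMiddleware[],
): Promise<string> {
  let current = html;
  for (const mw of chain) {
    current = await mw(current);
  }
  return current;
}
```

Keeping each middleware small and single-purpose makes the order of the array the only thing that determines how transformations compose.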
clearMiddleware?: boolean
Skips registering the default minifyMiddleware. When true, only your provided middleware functions will be used.
- Type: boolean
- Default: false
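The interaction between middleware and clearMiddleware can be pictured with the sketch below. It assumes the default middleware runs before any custom ones, which this page does not state; resolveMiddleware and minifyStandIn are hypothetical names, not library exports.

```typescript
type HtmlTransform = (html: string) => Promise<string>;

// Stand-in for the library's default minifyMiddleware (behavior assumed).
const minifyStandIn: HtmlTransform = async (html) =>
  html.replace(/\s+/g, ' ').trim();

interface MiddlewareOptions {
  middleware?: HtmlTransform[];
  clearMiddleware?: boolean;
}

// Returns the list that would run: the default is kept unless
// clearMiddleware is true. The ordering (default first) is an assumption.
function resolveMiddleware(options: MiddlewareOptions = {}): HtmlTransform[] {
  const custom = options.middleware ?? [];
  return options.clearMiddleware ? [...custom] : [minifyStandIn, ...custom];
}
```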
styleInheritance?
Customize how CSS properties flow from parent to child elements. This allows you to override the default inheritance behavior defined in the core engine.
- Type: Partial<Record<keyof CSS.Properties, Partial<StyleMeta>>>
- Example:
init({
  styleInheritance: {
    // Force 'border' to inherit (standard CSS does not inherit borders, but you might want to)
    border: {
      inherits: true,
      scopes: ['block', 'tableCell'],
    },
    // Prevent 'color' from inheriting (just as an example)
    color: {
      inherits: false,
      scopes: ['block'],
    },
  },
});
Understanding Scopes and Cascading
- scopes: Defines which element types can have this property.
  - It answers: "Is this property allowed on this type of element?"
  - It does not control inheritance (that's inherits). If a child attempts to inherit a property, it must also be valid for that child's scope (unless cascadeTo overrides this).
  - Example: textAlign has scopes: ['block', 'tableCell']. A <span> (inline) cannot have textAlign because it is not a valid scope.
- inherits: Defines if the property flows down to children naturally.
  - Example: textAlign has inherits: true. A <p> (block) can inherit textAlign from its parent <div> (block).
- cascadeTo: Usage is rare but powerful. It forces a property to pass from a parent to a specific type of child even if local logic might otherwise filter it.
  - Example: textAlign on a tableCell should affect the block (paragraph) inside it.
  - Why? A table cell isn't just a box; it sets the alignment context for its contents. cascadeTo: ['block'] tells the engine: "If I'm on a cell, pass me down to the paragraph inside."
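The three fields can be read together as a small decision rule. The sketch below is a simplified model of that rule, not the engine's actual code: StyleMetaSketch and flowsToChild are hypothetical names, the scope list is reduced to three values, and the parent's own scope is ignored for brevity.

```typescript
type Scope = 'block' | 'inline' | 'tableCell';

// Simplified stand-in for the library's StyleMeta (field names taken from
// the text above; the real type may carry more fields).
interface StyleMetaSketch {
  inherits: boolean;
  scopes: Scope[];
  cascadeTo?: Scope[];
}

// Decides whether a property set on a parent reaches a child of the given scope.
function flowsToChild(meta: StyleMetaSketch, childScope: Scope): boolean {
  // cascadeTo forces the property down to the listed child scopes.
  if (meta.cascadeTo?.includes(childScope)) return true;
  // Otherwise the property must inherit and be valid for the child's scope.
  return meta.inherits && meta.scopes.includes(childScope);
}

const textAlignMeta: StyleMetaSketch = {
  inherits: true,
  scopes: ['block', 'tableCell'],
  cascadeTo: ['block'],
};

const borderMeta: StyleMetaSketch = {
  inherits: false,
  scopes: ['block', 'tableCell'],
};
```

Under this model, textAlignMeta reaches a block child but not an inline one, and borderMeta reaches no children until inherits is switched on through styleInheritance.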
tags?
{
  tagHandlers?: TagHandlerObject[];
  defaultStyles?: { key: string; styles: Styles }[];
  defaultAttributes?: { key: string; attributes: Record<string, any> }[];
}
Customize how HTML tags are parsed and styled before conversion.
- tagHandlers: Provide custom TagHandlerObject overrides:
const customHandler: TagHandlerObject = {
  /* ... */
};
init({ tags: { tagHandlers: [customHandler] } });
- defaultStyles: Fallback style definitions per HTML tag:
init({
  tags: {
    defaultStyles: [
      { key: 'p', styles: { marginBottom: 10, lineHeight: 1.5 } },
    ],
  },
});
- defaultAttributes: Fallback attributes per HTML tag:
init({
  tags: {
    defaultAttributes: [{ key: 'img', attributes: { width: 600 } }],
  },
});
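"Fallback" here suggests that a tag's defaults apply only where the element does not set its own value. A minimal sketch of that reading follows; the merge order is an assumption, and StyleMap, TagDefault, and withDefaultStyles are hypothetical names rather than library exports.

```typescript
type StyleMap = Record<string, string | number>;

interface TagDefault {
  key: string; // HTML tag name, e.g. 'p'
  styles: StyleMap;
}

// Merges the tag's default styles under the element's own styles,
// so explicitly set values win over the fallback (assumed merge order).
function withDefaultStyles(
  tag: string,
  own: StyleMap,
  defaults: TagDefault[],
): StyleMap {
  const fallback = defaults.find((d) => d.key === tag)?.styles ?? {};
  return { ...fallback, ...own };
}

const tagDefaults: TagDefault[] = [
  { key: 'p', styles: { marginBottom: 10, lineHeight: 1.5 } },
];
```

With these defaults, a paragraph that sets only lineHeight would keep its own lineHeight and pick up marginBottom from the fallback.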
adapters?
{
  register?: { format: string; adapter: AdapterProvider; config?: object }[];
  defaultStyles?: { format: string; styles: Record<ElementType, Styles> }[];
  styleMappings?: { format: string; handlers: StyleMapping }[];
}
Adapters determine how the parsed content is rendered into a final document format. You can register your own adapter (e.g., for Markdown) or extend existing ones like the built-in DOCX adapter. The adapters option controls which adapters are registered and how CSS styles map to document properties.
- register: List of custom adapters implementing IDocumentConverter:
init({
  adapters: {
    register: [{ format: 'md', adapter: MyAdapter }],
  },
});
- defaultStyles: Fallback styles per element type for each format:
init({
  adapters: {
    defaultStyles: [
      {
        format: 'docx',
        styles: { paragraph: { color: 'darkblue', fontSize: 24 } },
      },
    ],
  },
});
- styleMappings: Custom CSS → document property mappings via StyleMapping:
init({
  adapters: {
    styleMappings: [
      {
        format: 'docx',
        handlers: { fontWeight: (v) => ({ bold: v === 'bold' }) },
      },
    ],
  },
});
- config: Optional adapter-specific configuration object for each registered adapter. For example, the built-in DocxAdapter supports custom block, inline, and fallthrough converters:
init({
  adapters: {
    register: [
      {
        format: 'docx',
        adapter: DocxAdapter,
        config: {
          blockConverters: [new MyBlockConverter()],
          inlineConverters: [new MyInlineConverter()],
          fallthroughConverters: [new MyFallthroughConverter()],
        },
      },
    ],
  },
});
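To make the styleMappings idea concrete, the sketch below shows one way handlers of the shape used above could be applied to an element's CSS styles. It is an illustration under assumptions, not the adapter's real implementation: DocProps, StyleHandlers, and mapStyles are hypothetical names, and values are assumed to be strings.

```typescript
type DocProps = Record<string, unknown>;
type StyleHandler = (value: string) => DocProps;
type StyleHandlers = Record<string, StyleHandler>;

// Applies each handler whose CSS property is present on the element and
// merges the returned document properties. CSS properties without a handler
// are ignored in this sketch.
function mapStyles(
  styles: Record<string, string>,
  handlers: StyleHandlers,
): DocProps {
  const props: DocProps = {};
  for (const [cssProp, value] of Object.entries(styles)) {
    const handler = handlers[cssProp];
    if (handler) {
      Object.assign(props, handler(value));
    }
  }
  return props;
}

// Handlers in the same shape as the styleMappings examples on this page.
const exampleHandlers: StyleHandlers = {
  fontWeight: (v) => ({ bold: v === 'bold' }),
  fontStyle: (v) => ({ italic: v === 'italic' }),
};
```

Because each handler returns a partial property object, several CSS properties can contribute to the same output without knowing about each other.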
domParser?: IDOMParser
Use a custom DOM parser implementation.
- Type: IDOMParser
- Example:
class CustomParser implements IDOMParser {
  parse(html: string) {
    /* ... */
  }
}
init({ domParser: new CustomParser() });
Example Usage
import { init } from 'html-to-document';
import { MyAdapter } from './my-adapter';
import { customMiddleware } from './middleware';
import { CustomParser } from './parser';
const converter = init({
  clearMiddleware: false,
  middleware: [customMiddleware],
  tags: {
    defaultStyles: [{ key: 'p', styles: { marginBottom: 8 } }],
  },
  adapters: {
    register: [{ format: 'md', adapter: MyAdapter }],
    defaultStyles: [{ format: 'md', styles: { paragraph: { indent: 20 } } }],
    styleMappings: [
      {
        format: 'md',
        handlers: { fontStyle: (v) => ({ italic: v === 'italic' }) },
      },
    ],
  },
  domParser: new CustomParser(), // custom DOM parser implementation
});
converter
  .convert('<h1>Title</h1><p>Text</p>', 'docx')
  .then((buffer) => console.log('Generated DOCX:', buffer))
  .catch(console.error);