
Custom Tag Handlers & Default Element Behavior

The parser in html-to-document allows you to intercept or override how specific HTML tags are transformed into intermediate DocumentElement nodes.

This lets you adapt the parser to different HTML inputs, for example by customizing list levels, handling custom tags, or preprocessing structural elements.


Tag Handlers

You can provide your own tag handlers using the tags.tagHandlers array when initializing the converter via init.

Each handler implements the TagHandlerObject interface, specifying a key (HTML tag name) and a handler function.

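import { TagHandlerObject, init } from 'html-to-document';

// Render <custom-element> tags as italic paragraphs.
const customHandler: TagHandlerObject = {
  key: 'custom-element',
  handler: (element, options) => ({
    type: 'paragraph',
    text: element.textContent || '',
    styles: {
      fontStyle: 'italic',
      // Precomputed styles are spread last, so they override the italic default.
      ...(options?.styles || {}),
    },
  }),
};

// Register the handler when creating the converter.
const converter = init({
  tags: {
    tagHandlers: [customHandler],
  },
});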

The handler() receives:

  • element: The raw HTMLElement
  • options: Precomputed values like:
    • text (plain content)
    • content (parsed children)
    • styles (flattened computed + inline + defaults)
    • attributes (HTML attributes + defaults)
    • metadata (e.g., list level)
    • level, colspan, rowspan, etc.
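To make the shape of options more concrete, here is a hypothetical TypeScript model of the list above. The field names follow this page, but the types (for example, styles as a string-or-number record) and the ElementLike stand-in for HTMLElement are assumptions, not the library's published type definitions. The handler reads the precomputed text first and only falls back to the raw element.

```typescript
// Hypothetical model of the options a tag handler receives.
// Field names follow the docs; the types are assumptions.
interface HandlerOptionsSketch {
  text?: string;                                // plain text content
  content?: unknown[];                          // already-parsed children
  styles?: Record<string, string | number>;     // computed + inline + default styles
  attributes?: Record<string, string | number>; // HTML attributes + defaults
  metadata?: Record<string, unknown>;           // e.g. list level
  level?: number;
  colspan?: number;
  rowspan?: number;
}

// Minimal stand-in for a DOM element so the sketch runs outside a browser.
interface ElementLike {
  tagName: string;
  textContent: string | null;
}

interface ParagraphSketch {
  type: 'paragraph';
  text: string;
  styles: Record<string, string | number>;
}

// Prefer the precomputed text; merge precomputed styles over an italic default.
function noteHandler(element: ElementLike, options?: HandlerOptionsSketch): ParagraphSketch {
  return {
    type: 'paragraph',
    text: options?.text ?? element.textContent ?? '',
    styles: { fontStyle: 'italic', ...(options?.styles ?? {}) },
  };
}

const result = noteHandler(
  { tagName: 'NOTE', textContent: 'raw text' },
  { text: 'precomputed text', styles: { color: '#555' } },
);
console.log(result.text); // "precomputed text"
```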

The handler must return either:

  • A single DocumentElement, or
  • An array of DocumentElement nodes
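Because a handler may return either one element or an array, any code that consumes handler output has to accept both. The sketch below assumes a simplified DocumentElementSketch type and a hypothetical toElementArray helper; it is not the library's internal code. The multi-line handler shows why the array form is useful: one HTML element can expand into several document nodes.

```typescript
// Simplified stand-ins (assumptions, not the library's types).
interface ElementLike {
  tagName: string;
  textContent: string | null;
}
interface DocumentElementSketch {
  type: string;
  text?: string;
}
type HandlerResult = DocumentElementSketch | DocumentElementSketch[];

// Hypothetical helper: normalize a handler's return value to an array.
function toElementArray(result: HandlerResult): DocumentElementSketch[] {
  return Array.isArray(result) ? result : [result];
}

// Returns one paragraph per non-empty line of the element's text.
function multiLineHandler(element: ElementLike): HandlerResult {
  const lines = (element.textContent ?? '')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.map((line) => ({ type: 'paragraph', text: line }));
}

const single = toElementArray({ type: 'paragraph', text: 'one' });
const many = toElementArray(
  multiLineHandler({ tagName: 'ADDRESS', textContent: '1 Main St\n  Springfield \n\nUSA' }),
);
console.log(single.length, many.length); // 1 3
```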

Built-in Tag Support

The parser includes internal support for many tags:

  • block elements: p, div, h1–h6, ul, ol, li, blockquote, pre, code
  • inline elements: strong, em, u, sup, sub, span, a, br
  • media: img, figure, figcaption
  • table: table, thead, tbody, tfoot, tr, td, th, col, colgroup, caption
  • semantic: dl, dt, dd

The default handler falls back to type: 'custom' if a tag is unknown.

You can override any of these by providing your own handler for that tag.
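Conceptually, handler lookup behaves like a map keyed by tag name: a user-supplied handler wins over a built-in one, and a tag with neither falls back to a custom element. The sketch below models that lookup order with plain Maps. The names resolveHandler, builtIn, userHandlers, and fallback are hypothetical, handlers here return a single element for brevity, and the real parser may organize its registry differently.

```typescript
interface ElementLike {
  tagName: string;
  textContent: string | null;
}
interface DocumentElementSketch {
  type: string;
  text?: string;
}
type Handler = (element: ElementLike) => DocumentElementSketch;

// Hypothetical built-in handlers, keyed by lowercase tag name.
const builtIn = new Map<string, Handler>([
  ['p', (el) => ({ type: 'paragraph', text: el.textContent ?? '' })],
  ['h1', (el) => ({ type: 'heading', text: el.textContent ?? '' })],
]);

// User handlers (e.g. from configuration) take precedence over built-ins.
const userHandlers = new Map<string, Handler>([
  ['h1', (el) => ({ type: 'paragraph', text: (el.textContent ?? '').toUpperCase() })],
]);

// Fallback for unknown tags, mirroring the documented type: 'custom'.
const fallback: Handler = (el) => ({ type: 'custom', text: el.textContent ?? '' });

function resolveHandler(tagName: string): Handler {
  const key = tagName.toLowerCase();
  return userHandlers.get(key) ?? builtIn.get(key) ?? fallback;
}

const p = resolveHandler('P')({ tagName: 'P', textContent: 'body' });
const h1 = resolveHandler('H1')({ tagName: 'H1', textContent: 'title' });
const unknown = resolveHandler('X-WIDGET')({ tagName: 'X-WIDGET', textContent: '?' });
console.log(p.type, h1.type, unknown.type); // paragraph paragraph custom
```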


Default Styles and Attributes

Besides tag handlers, you can specify default styles and attributes that apply before parsing:

const converter = init({
  tags: {
    // Fallback styles for every <p> element.
    defaultStyles: [
      { key: 'p', styles: { marginBottom: 10, lineHeight: 1.5 } },
    ],
    // Fallback attributes for every <img> element.
    defaultAttributes: [
      { key: 'img', attributes: { width: 600 } },
    ],
  },
});

  • defaultStyles: Set fallback styles per HTML tag before parsing
  • defaultAttributes: Set fallback attributes per tag (e.g., size, alignment)

These defaults are combined with any inline styles or attributes present in the HTML.
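One way to picture the merge is sketched below. It assumes inline values win over defaults for the same property, which is what calling the defaults "fallbacks" suggests, and it uses hypothetical DefaultStyleEntry and mergeStyles names that mirror the { key, styles } shape of the defaultStyles option. It is an illustration, not the library's merge code.

```typescript
// Mirrors the { key, styles } entries passed to defaultStyles.
interface DefaultStyleEntry {
  key: string; // HTML tag name
  styles: Record<string, string | number>;
}

const defaultStyles: DefaultStyleEntry[] = [
  { key: 'p', styles: { marginBottom: 10, lineHeight: 1.5 } },
];

// Merge all defaults for a tag, then apply the element's inline styles.
// Assumption: inline values override defaults for the same property.
function mergeStyles(
  tag: string,
  inline: Record<string, string | number>,
  defaults: DefaultStyleEntry[],
): Record<string, string | number> {
  const base = defaults
    .filter((entry) => entry.key === tag)
    .reduce<Record<string, string | number>>((acc, entry) => ({ ...acc, ...entry.styles }), {});
  return { ...base, ...inline };
}

const merged = mergeStyles('p', { lineHeight: 2, color: 'red' }, defaultStyles);
console.log(merged); // { marginBottom: 10, lineHeight: 2, color: 'red' }
```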


Runtime Handler Registration

You can also register handlers after initialization:

// handlerFn: a handler function like the one in the TagHandlerObject example above
converter.parser.registerTagHandler('custom-element', handlerFn);

This is useful for dynamic extensions or plugin-style behavior.
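A registry that accepts new handlers after construction is what makes plugin-style extension possible: anything registered later applies to every subsequent parse. The class below is a hypothetical model of that idea; only the method name registerTagHandler and its (tag, handler) argument order come from this page. ParserSketch, calloutPlugin, and the "later registration replaces earlier" rule are assumptions made for illustration.

```typescript
interface ElementLike {
  tagName: string;
  textContent: string | null;
}
interface DocumentElementSketch {
  type: string;
  text?: string;
}
type Handler = (element: ElementLike) => DocumentElementSketch;

// Hypothetical parser model with a mutable handler registry.
class ParserSketch {
  private handlers = new Map<string, Handler>();

  registerTagHandler(tag: string, handler: Handler): void {
    // Assumption: a later registration replaces an earlier one for the same tag.
    this.handlers.set(tag.toLowerCase(), handler);
  }

  parse(element: ElementLike): DocumentElementSketch {
    const handler = this.handlers.get(element.tagName.toLowerCase());
    return handler ? handler(element) : { type: 'custom', text: element.textContent ?? '' };
  }
}

// A "plugin" here is just a function that registers a group of handlers.
function calloutPlugin(parser: ParserSketch): void {
  parser.registerTagHandler('callout', (el) => ({
    type: 'paragraph',
    text: `Note: ${el.textContent ?? ''}`,
  }));
}

const parser = new ParserSketch();
const callout: ElementLike = { tagName: 'CALLOUT', textContent: 'Check the output.' };

const before = parser.parse(callout); // no handler registered yet, falls back to custom
calloutPlugin(parser);
const after = parser.parse(callout); // the plugin's handler now applies
console.log(before.type, after.type); // custom paragraph
```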


Summary

| Feature | Description |
| --- | --- |
| tagHandlers | Override how specific HTML tags are parsed |
| defaultStyles | Set base styles for HTML tags |
| defaultAttributes | Set base attributes (like width for img) |
| TagHandlerObject | Defines a key (tag) and a handler function |
| handler() return | Must return a DocumentElement or an array of DocumentElement nodes |
| registerTagHandler() | Add handlers dynamically after init |

Need more control? You can also subclass the Parser directly or extend the converter behavior for advanced parsing workflows.