Skip to main content

Types Reference

TypeDescription
InitOptionsOptions for initializing the converter via init.
ConverterOptionsInternal options for the Converter constructor.
ConverterMain class for conversion and parsing (methods: convert, parse).
MiddlewareAsync function (html: string) => Promise<string>.
TagHandlerHandler (element, options?) => DocumentElement | DocumentElement[].
TagHandlerObject{ key: string; handler: TagHandler }.
DocumentElementUnion of intermediate document element types (paragraph, heading, etc.).
ElementTypeString literal type of element kinds ('paragraph', 'heading', etc.).
StylesMap of style keys to string/number, with CSS property support.
IDOMParserInterface for custom DOM parser with parse(html: string): Document.
IDocumentConverterAdapter interface (convert(elements): Promise<Buffer | Blob>).
AdapterProviderConstructor type (new(deps) => IDocumentConverter).
StyleMappingMap of CSS keys to transform functions (value, el) => unknown.
StyleMapperClass for mapping CSS styles to document styles.

For full definitions, refer to the TypeScript source in src/core/types.ts.

Document Elements

All built-in document element interfaces extend the BaseElement type. Below are the default element types you'll encounter when parsing HTML:

Paragraph (paragraph)

Interface: ParagraphElement (type: 'paragraph')
Represents a block of text or grouping of inline content. Commonly produced from HTML <p>, <pre>, <blockquote>, <figure>, and <figcaption>.
Key properties:

  • text?: string — Plain text content.
  • content?: DocumentElement[] — Nested child elements (inline text, images).
  • styles?, attributes?, metadata?.

Heading (heading)

Interface: HeadingElement (type: 'heading')
Represents section titles. Produced from HTML <h1><h6>.
Key properties:

  • text: string — Heading text.
  • level: number — Heading level (1–6).
  • content?: DocumentElement[].
  • styles?, attributes?, metadata?.

Image (image)

Interface: ImageElement (type: 'image')
Represents an image resource. Produced from <img> tags.
Key properties:

  • src: string — Image URL or data URI.
  • styles?, attributes?, metadata?.

Text (text)

Interface: TextElement (type: 'text')
The basic inline text node. Produced for text content and many inline tags (<span>, <strong>, <em>, etc.).
Key properties:

  • text: string — Text content.
  • content?: DocumentElement[] — Nested formatting elements.
  • styles?, attributes?, metadata?.

Line (line)

Interface: LineElement (type: 'line')
A horizontal divider or rule. Produced from <hr> elements.
Key properties:

  • styles?, attributes?, metadata?.

List (list)

Interface: ListElement (type: 'list')
An ordered or unordered list container. Produced from <ol> and <ul>.
Key properties:

  • listType: 'ordered' | 'unordered' — List style.
  • level: number — Nesting level.
  • content: ListItemElement[] — List items.
  • markerStyle?: string — Custom bullet or marker style.
  • text?, styles?, attributes?, metadata?.

List Item (list-item)

Interface: ListItemElement (type: 'list-item')
A single item within a list. Produced from <li>.
Key properties:

  • text?: string — Inline text content.
  • level: number — Nesting level.
  • content: DocumentElement[] — Item content (paragraphs, nested lists).
  • styles?, attributes?, metadata?.

Table (table)

Interface: TableElement (type: 'table')
Represents a table container. Produced from <table>.
Key properties:

  • rows: TableRowElement[] — Table rows.
  • content?: DocumentElement[] — Non-row content (e.g., captions).
  • styles?, attributes?, metadata?.

Table Row (table-row)

Interface: TableRowElement (type: 'table-row')
A row within a table. Produced from <tr>.
Key properties:

  • cells: TableCellElement[] — Cells in the row.
  • styles?, attributes?, metadata?.

Table Cell (table-cell)

Interface: TableCellElement (type: 'table-cell')
A cell within a table row. Produced from <td> and <th>.
Key properties:

  • colspan?: number — Number of columns to span.
  • rowspan?: number — Number of rows to span.
  • content?: DocumentElement[] — Cell content.
  • styles?, attributes?, metadata?.

Fragment (fragment)

Interface: FragmentElement (type: 'fragment')
A generic grouping container without specific semantics. Produced from <div> or <dl>.
Key properties:

  • text?, content?: DocumentElement[], styles?, attributes?, metadata?.

Attribute (attribute)

Interface: AttributeElement (type: 'attribute')
An attribute-like element for table metadata (e.g., column definitions via <colgroup>, <col>, or captions). Captures structural attributes.
Key properties:

  • name?: string — Name of the attribute (e.g., 'colgroup', 'col', 'caption').
  • attributes?: Record<string, string \| number> — Attribute key/value pairs.
  • text?, content?: DocumentElement[], styles?, metadata?.

Custom Element Types

You are not limited to the built-in element types. The ElementType union includes (string & {}), so you can define custom types simply by returning a DocumentElement with a type set to your custom string.

For example, you can handle a <widget> tag and produce a custom widget element:

import { init, TagHandlerObject, DocumentElement } from 'html-to-document';

const widgetHandler: TagHandlerObject = {
key: 'widget',
handler: (element, options) => {
const id = element.getAttribute('data-id');
return {
type: 'widget',
metadata: { id },
content: options.content,
styles: options.styles,
attributes: options.attributes,
} as DocumentElement;
},
};

const converter = init({
tags: {
tagHandlers: [widgetHandler],
},
});

This will produce DocumentElement nodes with type: 'widget', which you can process in your custom adapter or middleware.