Types Reference

| Type | Description |
| --- | --- |
| `InitOptions` | Options for initializing the converter via `init`. |
| `ConverterOptions` | Internal options for the `Converter` constructor. |
| `Converter` | Main class for conversion and parsing (methods: `convert`, `parse`). |
| `Middleware` | Async function `(html: string) => Promise<string>`. |
| `TagHandler` | Handler `(element, options?) => DocumentElement \| DocumentElement[]`. |
| `TagHandlerObject` | `{ key: string; handler: TagHandler }`. |
| `DocumentElement` | Union of intermediate document element types (paragraph, heading, etc.). |
| `ElementType` | String literal type of element kinds (`'paragraph'`, `'heading'`, etc.). |
| `Styles` | Map of style keys to string/number values, with CSS property support. |
| `IDOMParser` | Interface for a custom DOM parser with `parse(html: string): Document`. |
| `IDocumentConverter` | Adapter interface (`convert(elements): Promise<Buffer \| Blob>`). |
| `AdapterProvider` | Constructor type (`new (deps) => IDocumentConverter`). |
| `StyleMapping` | Map of CSS keys to transform functions `(value, el) => unknown`. |
| `StyleMapper` | Class for mapping CSS styles to document styles. |

For full definitions, refer to the TypeScript source in `src/core/types.ts`.
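
The `Middleware` signature is a plain async string transform, so a middleware can be sketched without the library. This is a minimal sketch under the following assumptions:

- `Middleware` is redeclared locally from the table above rather than imported.
- `stripComments`, `trimHtml`, and the chaining helper `runMiddleware` are hypothetical names used only for illustration.
- How the converter itself registers and orders middleware is not covered here.

```typescript
// Local copy of the documented Middleware signature (not imported from the library).
type Middleware = (html: string) => Promise<string>;

// Hypothetical middleware: remove HTML comments.
const stripComments: Middleware = async (html) => html.replace(/<!--[\s\S]*?-->/g, '');

// Hypothetical middleware: trim leading and trailing whitespace.
const trimHtml: Middleware = async (html) => html.trim();

// Hypothetical helper: apply middleware in order, feeding each result to the next.
async function runMiddleware(html: string, chain: Middleware[]): Promise<string> {
  let result = html;
  for (const middleware of chain) {
    result = await middleware(result);
  }
  return result;
}

runMiddleware('  <p>Hi</p><!-- note -->  ', [stripComments, trimHtml]).then((out) => {
  console.log(out); // prints <p>Hi</p>
});
```

Because each middleware only sees and returns a string, the order of the chain matters: here comments are removed first, and the whitespace they left behind is trimmed afterwards.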
Document Elements

All built-in document element interfaces extend the `BaseElement` type. Below are the default element types you'll encounter when parsing HTML:

Paragraph (`paragraph`)

Interface: `ParagraphElement` (`type: 'paragraph'`)

Represents a block of text or a grouping of inline content. Commonly produced from HTML `<p>`, `<pre>`, `<blockquote>`, `<figure>`, and `<figcaption>`.

Key properties:
- `text?: string`: Plain text content.
- `content?: DocumentElement[]`: Nested child elements (inline text, images).
- `styles?`, `attributes?`, `metadata?`.
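
A paragraph can hold its text directly in `text` or spread across nested `content`, so consumers often need to flatten it back to a single string. This is a minimal sketch under the following assumptions:

- It uses a simplified, locally declared `TreeElement` type, not the library's own export.
- `plainText` is a hypothetical helper.
- A node with children is assumed to keep its text in those children. This may not match what the parser actually emits.

```typescript
// Simplified local element shape (not the library's DocumentElement).
interface TreeElement {
  type: string;
  text?: string;
  content?: TreeElement[];
}

// Hypothetical helper. ASSUMPTION: a node with children keeps its text in
// those children, so only childless nodes contribute their own `text`.
function plainText(el: TreeElement): string {
  if (el.content && el.content.length > 0) {
    return el.content.map(plainText).join('');
  }
  return el.text ?? '';
}

// Hand-written tree roughly resembling <p>Hello <strong>world</strong></p>;
// the parser's real output may differ (e.g. styles on the bold run).
const paragraph: TreeElement = {
  type: 'paragraph',
  content: [
    { type: 'text', text: 'Hello ' },
    { type: 'text', text: 'world' },
  ],
};

console.log(plainText(paragraph)); // prints Hello world
```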

Heading (`heading`)

Interface: `HeadingElement` (`type: 'heading'`)

Represents section titles. Produced from HTML `<h1>` through `<h6>`.

Key properties:
- `text: string`: Heading text.
- `level: number`: Heading level (1 to 6).
- `content?: DocumentElement[]`.
- `styles?`, `attributes?`, `metadata?`.
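
The `level` field is enough to rebuild a document outline, for example for a table of contents. The sketch below redeclares only the `HeadingElement` fields it uses. `outline` is a hypothetical helper that indents each heading relative to the shallowest level present.

```typescript
// Local copy of the HeadingElement fields used here.
interface HeadingElement {
  type: 'heading';
  text: string;
  level: number; // 1 to 6, from <h1> to <h6>
}

// Hypothetical helper: one line per heading, indented two spaces per level
// below the shallowest heading present.
function outline(headings: HeadingElement[]): string {
  if (headings.length === 0) return '';
  const top = Math.min(...headings.map((h) => h.level));
  return headings.map((h) => `${'  '.repeat(h.level - top)}- ${h.text}`).join('\n');
}

const sections: HeadingElement[] = [
  { type: 'heading', text: 'Setup', level: 2 },
  { type: 'heading', text: 'Install', level: 3 },
  { type: 'heading', text: 'Usage', level: 2 },
];

console.log(outline(sections));
// - Setup
//   - Install
// - Usage
```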

Image (`image`)

Interface: `ImageElement` (`type: 'image'`)

Represents an image resource. Produced from `<img>` tags.

Key properties:
- `src: string`: Image URL or data URI.
- `styles?`, `attributes?`, `metadata?`.
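
Since `src` may hold either a URL or a data URI, an adapter typically has to tell the two apart before embedding the image. This is a minimal sketch with a locally redeclared `ImageElement` and two hypothetical helpers, `partitionImages` and `dataUriMimeType`. The `data:` scheme is matched case-insensitively because URI schemes are case-insensitive.

```typescript
// Local copy of the ImageElement field used here.
interface ImageElement {
  type: 'image';
  src: string;
}

// Hypothetical helper: split images into inline data URIs and everything else.
function partitionImages(images: ImageElement[]): { inline: ImageElement[]; remote: ImageElement[] } {
  const inline: ImageElement[] = [];
  const remote: ImageElement[] = [];
  for (const img of images) {
    // URI schemes are case-insensitive, hence the `i` flag.
    (/^data:/i.test(img.src) ? inline : remote).push(img);
  }
  return { inline, remote };
}

// Hypothetical helper: MIME type declared in a data URI, if any.
function dataUriMimeType(src: string): string | undefined {
  const match = /^data:([^;,]*)/i.exec(src);
  return match && match[1] ? match[1] : undefined;
}

const { inline, remote } = partitionImages([
  { type: 'image', src: 'data:image/png;base64,iVBORw0KGgo=' },
  { type: 'image', src: 'https://example.com/logo.svg' },
]);

console.log(inline.length, remote.length); // prints 1 1
console.log(inline.map((img) => dataUriMimeType(img.src))); // prints [ 'image/png' ]
```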

Text (`text`)

Interface: `TextElement` (`type: 'text'`)

The basic inline text node. Produced for text content and many inline tags (`<span>`, `<strong>`, `<em>`, etc.).

Key properties:
- `text: string`: Text content.
- `content?: DocumentElement[]`: Nested formatting elements.
- `styles?`, `attributes?`, `metadata?`.

Line (`line`)

Interface: `LineElement` (`type: 'line'`)

A horizontal divider or rule. Produced from `<hr>` elements.

Key properties:
- `styles?`, `attributes?`, `metadata?`.

List (`list`)

Interface: `ListElement` (`type: 'list'`)

An ordered or unordered list container. Produced from `<ol>` and `<ul>`.

Key properties:
- `listType: 'ordered' | 'unordered'`: List style.
- `level: number`: Nesting level.
- `content: ListItemElement[]`: List items.
- `markerStyle?: string`: Custom bullet or marker style.
- `text?`, `styles?`, `attributes?`, `metadata?`.

List Item (`list-item`)

Interface: `ListItemElement` (`type: 'list-item'`)

A single item within a list. Produced from `<li>`.

Key properties:
- `text?: string`: Inline text content.
- `level: number`: Nesting level.
- `content: DocumentElement[]`: Item content (paragraphs, nested lists).
- `styles?`, `attributes?`, `metadata?`.
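
Lists nest by placing a `list` inside a `list-item`'s `content`, so rendering them is naturally recursive. This is a minimal sketch under the following assumptions:

- It uses simplified local copies of `ListElement` and `ListItemElement`.
- Item `content` is narrowed to nested lists, and `level` and `markerStyle` are omitted.
- `renderList` is a hypothetical helper. It takes indentation from recursion depth rather than from `level`, because this reference does not say whether `level` starts at 0 or 1.

```typescript
// Simplified local copies: `level` and `markerStyle` are omitted, and item
// `content` is narrowed to nested lists (the documented type is DocumentElement[]).
interface ListItemElement {
  type: 'list-item';
  text?: string;
  content: ListElement[];
}
interface ListElement {
  type: 'list';
  listType: 'ordered' | 'unordered';
  content: ListItemElement[];
}

// Hypothetical helper: "1." markers for ordered lists, "-" for unordered ones,
// two spaces of indentation per nesting depth.
function renderList(list: ListElement, depth = 0): string[] {
  const lines: string[] = [];
  list.content.forEach((item, i) => {
    const marker = list.listType === 'ordered' ? `${i + 1}.` : '-';
    lines.push(`${'  '.repeat(depth)}${marker} ${item.text ?? ''}`);
    for (const nested of item.content) {
      lines.push(...renderList(nested, depth + 1));
    }
  });
  return lines;
}

const steps: ListElement = {
  type: 'list',
  listType: 'ordered',
  content: [
    {
      type: 'list-item',
      text: 'Install',
      content: [
        { type: 'list', listType: 'unordered', content: [{ type: 'list-item', text: 'npm', content: [] }] },
      ],
    },
    { type: 'list-item', text: 'Configure', content: [] },
  ],
};

console.log(renderList(steps).join('\n'));
// 1. Install
//   - npm
// 2. Configure
```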

Table (`table`)

Interface: `TableElement` (`type: 'table'`)

Represents a table container. Produced from `<table>`.

Key properties:
- `rows: TableRowElement[]`: Table rows.
- `content?: DocumentElement[]`: Non-row content (e.g., captions).
- `styles?`, `attributes?`, `metadata?`.

Table Row (`table-row`)

Interface: `TableRowElement` (`type: 'table-row'`)

A row within a table. Produced from `<tr>`.

Key properties:
- `cells: TableCellElement[]`: Cells in the row.
- `styles?`, `attributes?`, `metadata?`.

Table Cell (`table-cell`)

Interface: `TableCellElement` (`type: 'table-cell'`)

A cell within a table row. Produced from `<td>` and `<th>`.

Key properties:
- `colspan?: number`: Number of columns to span.
- `rowspan?: number`: Number of rows to span.
- `content?: DocumentElement[]`: Cell content.
- `styles?`, `attributes?`, `metadata?`.
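
Because `colspan` and `rowspan` let one cell cover several grid slots, the number of columns in a table is not simply the number of cells in a row. This is a minimal sketch of the usual occupancy-grid approach. It uses local copies of the table interfaces and a hypothetical `columnCount` helper.

```typescript
// Local copies of the table interfaces, limited to the fields used here.
interface TableCellElement {
  type: 'table-cell';
  colspan?: number;
  rowspan?: number;
}
interface TableRowElement {
  type: 'table-row';
  cells: TableCellElement[];
}
interface TableElement {
  type: 'table';
  rows: TableRowElement[];
}

// Hypothetical helper: place each cell on an occupancy grid, skipping slots
// already claimed by rowspans from earlier rows, and return the grid width.
function columnCount(table: TableElement): number {
  const occupied: Set<number>[] = table.rows.map(() => new Set<number>());
  let width = 0;
  table.rows.forEach((row, r) => {
    let c = 0;
    for (const cell of row.cells) {
      while (occupied[r].has(c)) c++;
      // Missing or zero spans are treated as 1 (a simplification).
      const colspan = Math.max(1, cell.colspan ?? 1);
      const rowspan = Math.max(1, cell.rowspan ?? 1);
      for (let dr = 0; dr < rowspan && r + dr < table.rows.length; dr++) {
        for (let dc = 0; dc < colspan; dc++) occupied[r + dr].add(c + dc);
      }
      c += colspan;
      width = Math.max(width, c);
    }
  });
  return width;
}

// Row 1: a cell spanning two rows, then one spanning two columns.
// Row 2: two plain cells, pushed right by the rowspan above.
const spanned: TableElement = {
  type: 'table',
  rows: [
    { type: 'table-row', cells: [{ type: 'table-cell', rowspan: 2 }, { type: 'table-cell', colspan: 2 }] },
    { type: 'table-row', cells: [{ type: 'table-cell' }, { type: 'table-cell' }] },
  ],
};

console.log(columnCount(spanned)); // prints 3
```

Counting only the second row's cells would report 2. The grid approach reports 3 because the first cell of the first row still occupies column 0. Treating a `rowspan` of 0 as 1 is a simplification: in HTML, a `rowspan` of 0 means the cell extends to the end of its row group.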

Fragment (`fragment`)

Interface: `FragmentElement` (`type: 'fragment'`)

A generic grouping container without specific semantics. Produced from `<div>` or `<dl>`.

Key properties:
- `text?`, `content?: DocumentElement[]`, `styles?`, `attributes?`, `metadata?`.
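
Fragments carry no semantics of their own. An adapter with no use for generic grouping may therefore prefer to splice their children into the parent. This is a minimal sketch with a simplified local `TreeElement` type and a hypothetical `unwrapFragments` helper. Note that it discards any `text`, `styles`, or `attributes` set on the fragment itself.

```typescript
// Simplified local element shape; styles and attributes are left out.
interface TreeElement {
  type: string;
  text?: string;
  content?: TreeElement[];
}

// Hypothetical helper: replace every fragment with its (recursively unwrapped)
// children. A fragment's own text, styles, and attributes are discarded.
function unwrapFragments(elements: TreeElement[]): TreeElement[] {
  const out: TreeElement[] = [];
  for (const el of elements) {
    const children = el.content ? unwrapFragments(el.content) : undefined;
    if (el.type === 'fragment') {
      out.push(...(children ?? []));
    } else {
      // Build a new node rather than mutating `el`, so the input tree is untouched.
      out.push(children ? { ...el, content: children } : el);
    }
  }
  return out;
}

const tree: TreeElement[] = [
  {
    type: 'fragment',
    content: [
      { type: 'paragraph', text: 'A' },
      { type: 'fragment', content: [{ type: 'paragraph', text: 'B' }] },
    ],
  },
];

console.log(unwrapFragments(tree).map((el) => el.text)); // prints [ 'A', 'B' ]
```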

Attribute (`attribute`)

Interface: `AttributeElement` (`type: 'attribute'`)

An attribute-like element for table metadata, such as column definitions via `<colgroup>` and `<col>`, or captions. It captures structural attributes.

Key properties:
- `name?: string`: Name of the attribute (e.g., `'colgroup'`, `'col'`, `'caption'`).
- `attributes?: Record<string, string | number>`: Attribute key/value pairs.
- `text?`, `content?: DocumentElement[]`, `styles?`, `metadata?`.
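
Every interface above pairs its fields with a string-literal `type`. TypeScript can therefore treat a union of them as a discriminated union, where checking `type` narrows the value to the matching interface. The sketch below shows the pattern on four simplified, locally declared interfaces rather than the library's exported `DocumentElement`. `KnownElement` and `summarize` are hypothetical names.

```typescript
// Simplified local copies of four documented interfaces.
interface BaseElement {
  styles?: Record<string, string | number>;
  attributes?: Record<string, string | number>;
  metadata?: Record<string, unknown>;
}
interface ParagraphElement extends BaseElement { type: 'paragraph'; text?: string }
interface HeadingElement extends BaseElement { type: 'heading'; text: string; level: number }
interface ImageElement extends BaseElement { type: 'image'; src: string }
interface LineElement extends BaseElement { type: 'line' }

// Hypothetical union; the library's DocumentElement covers more types.
type KnownElement = ParagraphElement | HeadingElement | ImageElement | LineElement;

// Switching on `type` narrows `el`, so `el.level` and `el.src` type-check.
function summarize(el: KnownElement): string {
  switch (el.type) {
    case 'paragraph':
      return `paragraph: ${el.text ?? ''}`;
    case 'heading':
      return `h${el.level}: ${el.text}`;
    case 'image':
      return `image: ${el.src}`;
    case 'line':
      return 'horizontal rule';
  }
}

const elements: KnownElement[] = [
  { type: 'heading', text: 'Intro', level: 1 },
  { type: 'paragraph', text: 'Hello' },
  { type: 'line' },
];

console.log(elements.map(summarize).join('; ')); // prints h1: Intro; paragraph: Hello; horizontal rule
```

Code that switches over the library's full `DocumentElement` union would also want a `default` branch, since custom element types can appear as well.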
Custom Element Types

You are not limited to the built-in element types. The `ElementType` union includes `(string & {})`, which accepts any string while keeping editor autocompletion for the built-in literals. You can therefore define a custom type simply by returning a `DocumentElement` whose `type` is set to your own string.
For example, you can handle a `<widget>` tag and produce a custom `widget` element:
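
```typescript
import { init, TagHandlerObject, DocumentElement } from 'html-to-document';

// Handle <widget> tags by emitting a custom 'widget' element.
const widgetHandler: TagHandlerObject = {
  key: 'widget',
  handler: (element, options) => {
    // Read the widget id from the data-id attribute (null if absent).
    const id = element.getAttribute('data-id');
    return {
      type: 'widget',
      metadata: { id },
      // `options` is optional in the TagHandler signature, so guard each access.
      content: options?.content,
      styles: options?.styles,
      attributes: options?.attributes,
    } as DocumentElement;
  },
};

// Register the handler when initializing the converter.
const converter = init({
  tags: {
    tagHandlers: [widgetHandler],
  },
});
```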
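
This produces `DocumentElement` nodes with `type: 'widget'`, which a custom adapter (`IDocumentConverter`) can then process. Middleware cannot see these nodes: its signature is `(html: string) => Promise<string>`, so it works on HTML strings rather than parsed elements.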
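
On the consuming side, a type guard keeps widget handling type-safe. This sketch assumes only the shape produced by the `widgetHandler` above: `type: 'widget'` with `metadata.id`. `GenericElement`, `WidgetElement`, `isWidget`, and `renderElement` are hypothetical local names. How `renderElement` would plug into a real `IDocumentConverter` is not shown.

```typescript
// Hypothetical local shapes: a generic element plus the custom widget
// produced by widgetHandler (metadata.id holds the data-id attribute).
interface GenericElement {
  type: string;
  text?: string;
  metadata?: Record<string, unknown>;
}
interface WidgetElement extends GenericElement {
  type: 'widget';
  metadata: { id: string | null };
}

// Type guard: trusts that anything tagged 'widget' came from widgetHandler.
function isWidget(el: GenericElement): el is WidgetElement {
  return el.type === 'widget';
}

// Hypothetical rendering step, e.g. inside a custom adapter: widgets become a
// placeholder, other elements fall back to their text.
function renderElement(el: GenericElement): string {
  if (isWidget(el)) {
    return `[widget ${el.metadata.id ?? 'unknown'}]`;
  }
  return el.text ?? '';
}

console.log(renderElement({ type: 'widget', metadata: { id: 'chart-1' } })); // prints [widget chart-1]
console.log(renderElement({ type: 'paragraph', text: 'Hello' })); // prints Hello
```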