Skip to main content

Getting Started

html-to-document is a flexible library for converting HTML into structured document formats like DOCX, PDF, and more. This guide helps you get started quickly.

Installation

Install the library using npm:

npm install html-to-document

Quick Start

Here's a minimal example to convert HTML into a DOCX file:

import { init, DocxAdapter } from 'html-to-document';
import fs from 'fs';

const converter = init({
adapters: {
register: [
{ format: 'docx', adapter: DocxAdapter },
],
},
});

const html = '<h1>Hello World</h1>';
const buffer = await converter.convert(html, 'docx'); // ↩️ Buffer in Node / Blob in browser
fs.writeFileSync('output.docx', buffer);

This will return a Buffer (in Node.js) or a Blob (in browsers) representing the DOCX file.

How It Works

Below is a high-level overview of the conversion pipeline. The library processes the HTML input through optional middleware steps, parses it into a structured intermediate representation, and then delegates to an adapter to generate the desired output format.

Conversion Pipeline Diagram

The stages are:

  • Input: Raw HTML input as a string.
  • Middleware: One or more middleware functions can inspect or transform the HTML string before parsing (e.g., sanitization, custom tags).
  • Parser: Converts the (possibly modified) HTML string into an array of DocumentElement objects, representing a structured AST.
  • Adapter: Takes the parsed DocumentElement[] and renders it into the target format (e.g., DOCX, PDF, Markdown) via a registered adapter.

Learn More

Explore how to customize the conversion process:

Or dive into the API Reference for full method documentation.