HTML → Markdown
Paste HTML on the left, get clean Markdown on the right. Works with pasted Word docs, Google Docs, web pages, anywhere HTML lives.
What this is for
The most common use cases:
- Migrate from a WYSIWYG editor. WordPress, Notion, Confluence, Google Docs all export HTML; converting to Markdown lets you store content in a more portable format.
- Build a static site from CMS content. Pull from your CMS as HTML, convert to MD, drop into Astro/Hugo/Jekyll.
- Clean up pasted email or document content. Pasting from Word into a Markdown editor breaks formatting; converting through this tool first preserves structure.
- Reverse-engineer a doc. View source on a rendered page, copy the article, get Markdown back.
What gets converted
Turndown, together with turndown-plugin-gfm for strikethrough, tables, and task lists, maps every common HTML tag to a sensible Markdown equivalent:
| HTML | Markdown |
|---|---|
| <h1>…<h6> | # … through ###### … |
| <strong> / <b> | **bold** |
| <em> / <i> | *italic* |
| <del> / <s> | ~~struck~~ |
| <a href> | [text](url) |
| <img src alt> | ![alt](src) |
| <ul> / <ol> | - item / 1. item |
| <blockquote> | > text |
| <pre><code> | ```\n...\n``` |
| <code> (inline) | `code` |
| <table> | GFM table |
| <input type="checkbox"> | - [x] task |
| <hr> | --- |
What doesn't convert cleanly
- Inline styles (style="color: red"): Markdown has no equivalent; styling is dropped.
- Class attributes: likewise lost.
- Custom HTML elements: passed through as raw HTML in the Markdown output.
- Tables with rowspan / colspan: Markdown tables are rectangular grids only; spans get flattened.
- Form elements (<input>, <select>), except checkboxes inside lists, which become task list items.
- Embedded video / iframe: passed through as raw HTML.
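Before converting, it can help to know which parts of a document will be lost. The sketch below is a hypothetical pre-flight check, not part of this tool or of Turndown. It uses Python's standard html.parser to flag the constructs listed above. The names LossyMarkupScanner and scan_html and the two tag sets are assumptions made for illustration. It treats any hyphenated tag name as a custom element, and it does not check whether a checkbox actually sits inside a list.

```python
from html.parser import HTMLParser

# Hypothetical tag sets, chosen to mirror the list above.
PASSTHROUGH_TAGS = {"iframe", "video"}
FORM_TAGS = {"select", "textarea", "button"}


class LossyMarkupScanner(HTMLParser):
    """Collects warnings about markup that Markdown cannot represent."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attr_map = dict(attrs)
        if "style" in attr_map:
            self.warnings.append(f"<{tag}>: inline style will be dropped")
        if "class" in attr_map:
            self.warnings.append(f"<{tag}>: class attribute will be dropped")
        if tag in ("td", "th") and ("rowspan" in attr_map or "colspan" in attr_map):
            self.warnings.append(f"<{tag}>: rowspan/colspan will be flattened")
        # Assumption: a hyphen in the tag name marks a custom element.
        if tag in PASSTHROUGH_TAGS or "-" in tag:
            self.warnings.append(f"<{tag}>: passed through as raw HTML")
        # Checkboxes are skipped here because they can become task list items.
        if tag in FORM_TAGS or (tag == "input" and attr_map.get("type") != "checkbox"):
            self.warnings.append(f"<{tag}>: form element has no Markdown equivalent")


def scan_html(html):
    """Return a list of human-readable warnings for the given HTML string."""
    scanner = LossyMarkupScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.warnings
```

Calling scan_html('<p style="color: red">hi</p>') returns a single warning about the inline style, while plain structural markup returns an empty list.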
Microsoft Word and Google Docs specifically
When you copy from Word or Google Docs and paste into a text area, the clipboard contains HTML alongside plain text. The browser puts the plain text in the textarea by default, so you'll see Word's text without formatting.
To paste the HTML version: in most browsers, Edit → Paste Special → HTML, or paste into a contenteditable field on this site (we don't have one yet; it's coming in a future update). Alternatively, copy from Word and paste into a free online "extract HTML from clipboard" tool first.
Word's exported HTML is famously cluttered: Microsoft adds mso-* styles, <o:p> tags, and Office namespaces. Turndown handles the structural tags fine; the cosmetic markup gets dropped.
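If you would rather strip the Office clutter yourself before converting, a few regular expressions go a long way. The following is a minimal sketch, not what this tool or Turndown does internally. The function name strip_word_clutter is hypothetical, and regex-based cleanup is an assumption that suits typical Word clipboard HTML but is not a general HTML parser.

```python
import re


def strip_word_clutter(html):
    """Remove Office-specific markup that carries no document structure."""
    # Comments, including conditional ones like <!--[if gte mso 9]>...<![endif]-->.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Office-namespaced tags such as <o:p>, <v:shape>, <w:wrap>; their text content is kept.
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # Quoted style attributes, which is where mso-* declarations live.
    html = re.sub(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", "", html, flags=re.IGNORECASE)
    # Class attributes whose value starts with Mso (MsoNormal, MsoListParagraph, ...).
    html = re.sub(r"""\s+class\s*=\s*("?)Mso\w*\1""", "", html, flags=re.IGNORECASE)
    return html
```

The cleaned string can then be pasted into the converter or passed to any of the libraries in the next section.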
Programmatic alternative
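// JavaScript / Node (npm install turndown turndown-plugin-gfm)
import TurndownService from "turndown";
import { gfm } from "turndown-plugin-gfm";
// Turndown defaults to Setext headings and indented code blocks; opt in to ATX and fences.
const td = new TurndownService({ headingStyle: "atx", codeBlockStyle: "fenced" });
td.use(gfm); // adds tables, strikethrough, and task lists
const md = td.turndown(htmlString);
# Python (pip install markdownify)
from markdownify import markdownify
md = markdownify(html_string, heading_style="ATX")
# CLI (pandoc, supports many input formats)
pandoc input.html -o output.md
FAQ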
Can I paste from Microsoft Word or Google Docs?
Yes. Both place HTML on the clipboard alongside plain text when you copy. Word's HTML is messy (lots of inline mso-* styles and Mso* classes), but once the HTML version is pasted, Turndown extracts the structural tags and produces clean Markdown.
Does it handle tables?
Yes. The 'Tables' option (on by default) uses turndown-plugin-gfm to convert <table> to GFM table syntax (pipes and dashes). Complex tables with rowspans / colspans get flattened — Markdown tables don't support spans.
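For reference, the pipes-and-dashes syntax is simple enough to generate by hand. The sketch below is an illustration of the output format only, not the plugin's implementation. The function name to_gfm_table is hypothetical, and padding short rows with empty cells is an assumption made here to keep the grid rectangular.

```python
def to_gfm_table(header, rows):
    """Render a header and rows of cell strings as a GFM pipe table."""

    def fmt(cells):
        # Pipes inside a cell must be escaped so they are not read as column breaks.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), fmt(["---"] * len(header))]
    for row in rows:
        # Pad or truncate so every row has exactly as many cells as the header.
        padded = (list(row) + [""] * len(header))[: len(header)]
        lines.append(fmt(padded))
    return "\n".join(lines)
```

A row that is shorter than the header, which is roughly what is left once a spanned cell is removed, simply ends up with empty cells.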
Will it convert the entire web page if I paste a URL?
No — paste actual HTML, not a URL. To convert a live page, View Source (Ctrl-U / Cmd-Opt-U), copy the body HTML, and paste here. For a richer 'fetch and convert' workflow, use pandoc or turndown in a CLI.
What's the difference between heading style options?
ATX uses # prefixes — universally supported, what most modern Markdown looks like. Setext underlines headings with === or --- — older style, only supports H1 and H2. ATX is recommended.
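The difference is easiest to see side by side. This is a small illustrative sketch with hypothetical function names (atx_heading, setext_heading). It only shows what each style looks like and why Setext stops at level 2.

```python
def atx_heading(text, level):
    """Return an ATX heading: one to six # characters, a space, then the text."""
    if not 1 <= level <= 6:
        raise ValueError("ATX headings support levels 1 through 6")
    return "#" * level + " " + text


def setext_heading(text, level):
    """Return a Setext heading: the text underlined with = (H1) or - (H2)."""
    if level not in (1, 2):
        raise ValueError("Setext headings support only levels 1 and 2")
    underline = "=" if level == 1 else "-"
    # Matching the underline to the text length is a readability convention.
    return text + "\n" + underline * len(text)
```

With these, atx_heading("Title", 3) gives "### Title", while setext_heading("Title", 3) raises an error because there is no underline character for a third level.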
Why does pasted Word content have extra blank lines?
Word inserts empty paragraphs (<p>&nbsp;</p>) between paragraphs, and Turndown converts those to blank lines. After pasting, scan the output and remove unintended blanks. Pasting the Word content into a plain text editor first also removes them, but it strips all other formatting too.
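The cleanup can also be scripted. Below is a minimal sketch, assuming the converted Markdown is available as a string. The function name collapse_blank_lines is hypothetical. It treats lines containing only spaces, tabs, or non-breaking spaces as blank, and it leaves the contents of triple-backtick fenced code blocks untouched.

```python
def collapse_blank_lines(markdown):
    """Collapse runs of blank lines into one, outside fenced code blocks."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        # Toggle on every ``` line so blank lines inside code are preserved.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        # Lines holding only spaces, tabs, or non-breaking spaces count as blank.
        blank = not in_fence and line.strip(" \t\u00a0") == ""
        if blank:
            if out and out[-1] == "":
                continue  # already emitted one blank line for this run
            out.append("")
        else:
            out.append(line)
    # Drop leading and trailing blank lines and end with a single newline.
    return "\n".join(out).strip("\n") + "\n"
```

One blank line between paragraphs is kept, since Markdown needs it to separate them.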