Support inline HTML in markdown blocks #98

CNSeniorious000 · 2024-10-12T17:02:25Z

As described in the docs:

While the most common flavours of markdown let you use HTML in markdown paragraphs, due to how Svelte handles plain HTML it is currently not possible to do this with this package. A paragraph must be either all HTML or all markdown.

One way to solve this is to walk the tokens in a single pass and join everything between the outermost html tokens into one token.

For example:

Input Markdown

a<sub>1</sub>

Ideal Result

a₁

Tokens

[
    {
        "type": "text",
        "raw": "a",
        "text": "a"
    },
    {
        "type": "html",
        "raw": "<sub>",
        "inLink": false,
        "inRawBlock": false,
        "block": false,
        "text": "<sub>"
    },
    {
        "type": "text",
        "raw": "1",
        "text": "1"
    },
    {
        "type": "html",
        "raw": "</sub>",
        "inLink": false,
        "inRawBlock": false,
        "block": false,
        "text": "</sub>"
    }
]

After Patching

[
    {
        "type": "text",
        "raw": "a",
        "text": "a"
    },
    {
        "type": "html",
        "raw": "<sub>",
        "inLink": false,
        "inRawBlock": false,
        "block": false,
        "text": "<sub>1</sub>"  // the 3 tokens were joined into this one
    }
]

CNSeniorious000 · 2024-10-12T17:07:36Z

My implementation is like this:

import type { marked } from "marked"

export function patchTokens(tokens: marked.Token[]) {
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i]
    if (token.type === "html") {
      // Derive the expected closing tag; this assumes a bare opening tag with no attributes
      const idealCloseTag = token.raw.replace("<", "</")

      while (true) {
        if (i + 1 === tokens.length) {
          // incomplete html
          token.text += idealCloseTag
          break
        }

        const nextToken = tokens[i + 1]
        if (nextToken.type === "html" && nextToken.raw === idealCloseTag) {
          token.text += idealCloseTag
          tokens.splice(i + 1, 1)
          break
        }
        else {
          // treat as html children
          token.text += nextToken.raw
          tokens.splice(i + 1, 1)
        }
      }
    }
    else if ("tokens" in token) {
      patchTokens(token.tokens!)
    }
  }
  return tokens
}

If you think this is a common use case, I can open a PR.
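
The implementation above derives the closing tag with a plain string replace, which only lines up for bare tags such as <sub>. Below is a minimal sketch of a variant that compares tag names instead, so that attributes and nested same-name tags are tolerated. MiniToken, VOID_ELEMENTS, openingTagName and joinInlineHtml are hypothetical names used only for illustration; they are not part of marked or svelte-markdown. Unlike the original, this sketch returns a new array and writes the joined string to both raw and text.

```typescript
// Minimal stand-in for marked's token shape (an assumption for this sketch).
interface MiniToken {
  type: string
  raw: string
  text?: string
  tokens?: MiniToken[]
}

// Elements that never have a closing tag, so they are left untouched.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "wbr"])

// Hypothetical helper: returns the lower-cased tag name of an opening tag such as
// `<span class="x">`, or null for closing tags, comments and self-closing tags.
function openingTagName(raw: string): string | null {
  const match = /^<([a-zA-Z][\w-]*)\b[^>]*?(\/?)>$/.exec(raw.trim())
  if (!match || match[2] === "/") return null
  return match[1].toLowerCase()
}

function joinInlineHtml(tokens: MiniToken[]): MiniToken[] {
  const result: MiniToken[] = []
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i]
    const name = token.type === "html" ? openingTagName(token.raw) : null

    if (name === null || VOID_ELEMENTS.has(name)) {
      // Not an opening tag: keep the token, but still patch its children if it has any.
      if (token.tokens) token.tokens = joinInlineHtml(token.tokens)
      result.push(token)
      continue
    }

    // Scan forward until the matching close tag, counting nested same-name tags.
    let depth = 1
    let joined = token.raw
    let j = i + 1
    for (; j < tokens.length && depth > 0; j++) {
      const next = tokens[j]
      if (next.type === "html") {
        if (openingTagName(next.raw) === name) depth++
        else if (next.raw.trim().toLowerCase() === `</${name}>`) depth--
      }
      joined += next.raw
    }

    // Incomplete html: append the missing close tags, as the original does.
    joined += `</${name}>`.repeat(depth)
    result.push({ ...token, raw: joined, text: joined })
    i = j - 1 // skip the tokens that were merged
  }
  return result
}

const patched = joinInlineHtml([
  { type: "text", raw: "a", text: "a" },
  { type: "html", raw: "<sub>", text: "<sub>" },
  { type: "text", raw: "1", text: "1" },
  { type: "html", raw: "</sub>", text: "</sub>" },
])
console.log(patched.map((t) => t.text)) // → ["a", "<sub>1</sub>"]
```

This is still only token-level matching: it does not validate the HTML, and mismatched tags inside the span are simply copied through as they are.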

Rudedog9d · 2024-11-06T22:27:26Z

+1 to this, although I think instead of something like patchTokens, an extension should be written to parse any HTML as a block rather than inline.

I'm not sure if this would have any significant side effects.

Here's an example of parsing out a custom block (it looks for section names written between colons, like :name:)

const sectionExtension = {
  name: 'section',
  // Is this a block-level or inline-level tokenizer?
  level: 'block',
  // Hint to Marked.js to stop and check for a match
  start(src: string) {
    // Marked expects the index of the next potential match, not the match array
    return src.match(/:[^:\n]+:/)?.index;
  },
  tokenizer(src: string) {
    // Match a :label: header at the start of the source, capturing the label
    const sectionHeaderRule = /^:([^:\n]+):/;

    const headerMatch = sectionHeaderRule.exec(src);

    if (!headerMatch) return;
    const label = headerMatch[1];

    // Find the start of the next section or end of text
    const remainingText = src.slice(headerMatch[0].length);
    const nextSectionIndex = remainingText.search(/(?:\s|^):[^:\n]+:/);
    let sectionContent = '';

    if (nextSectionIndex === -1) {
      // No next section found, consume all remaining text
      sectionContent = remainingText;
    } else {
      // Found next section, consume text up to that point
      sectionContent = remainingText.slice(0, nextSectionIndex);
    }

    // Create the token
    const token = {
      type: 'section',
      raw: headerMatch[0] + sectionContent,
      label, // Store the text between colons as label
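      // NOTE: color and bgColor are not defined anywhere in this snippet;
      // they must be in scope for this object literal to evaluate
      color,
      bgColor,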
      text: sectionContent, // Store the content after the label
      tokens: [],
    };

    // Parse the section content for inline and block tokens
    this.lexer.blockTokens(token.text, token.tokens);
    return token;
  },
};
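
Following the same extension shape, here is a rough sketch of how a paired HTML tag could be captured as a single token, which is the idea suggested above. Everything in it is an assumption made for illustration: the pairedHtml name, the PairedHtmlToken type and the regular expression are hypothetical, and it uses the inline level because the a<sub>1</sub> case sits in the middle of a paragraph. It has not been checked against svelte-markdown's renderer.

```typescript
// Hypothetical token shape for this sketch; not a type exported by marked.
interface PairedHtmlToken {
  type: "pairedHtml"
  raw: string
  tag: string
  text: string
}

// An opening tag (optionally with attributes), the shortest possible body,
// then a closing tag with the same name. Nested same-name tags are NOT handled.
const pairedHtmlRule = /^<([a-zA-Z][\w-]*)(?:\s[^>]*)?>[\s\S]*?<\/\1\s*>/

const pairedHtmlExtension = {
  name: "pairedHtml",
  level: "inline" as const,
  // Index of the next character sequence that could begin a tag, if any
  start(src: string): number | undefined {
    const index = src.search(/<[a-zA-Z]/)
    return index === -1 ? undefined : index
  },
  tokenizer(src: string): PairedHtmlToken | undefined {
    const match = pairedHtmlRule.exec(src)
    if (!match) return undefined
    return {
      type: "pairedHtml",
      raw: match[0], // the whole <tag>…</tag> span is consumed
      tag: match[1].toLowerCase(),
      text: match[0], // kept as one string so it can be rendered as one piece of HTML
    }
  },
}

// The tokenizer can be exercised on its own, without marked:
console.log(pairedHtmlExtension.tokenizer("<sub>1</sub> and more")?.raw) // → "<sub>1</sub>"
```

A regex like this stops at the first matching close tag, so deeply nested markup would need a real scan with a depth counter. Rendering the resulting token would also need a matching renderer on the consuming side.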

jaysin586 mentioned this issue Nov 4, 2024

Enhancement: HTML + TEXT + HTML - Should render the HTML requested by the user humanspeak/svelte-markdown#2

Closed
