Support inline HTML in markdown blocks #98

Open
CNSeniorious000 opened this issue Oct 12, 2024 · 2 comments

CNSeniorious000 commented Oct 12, 2024

As described in the docs:

While the most common flavours of markdown let you use HTML in markdown paragraphs, due to how Svelte handles plain HTML it is currently not possible to do this with this package. A paragraph must be either all HTML or all markdown.

One way to solve this is a single pass over the tokens that joins everything between an outermost opening html token and its matching closing html token into one html token.

For example:

Input Markdown

a<sub>1</sub>

Ideal Result

a₁ (the 1 rendered as a subscript)

Tokens

[
    {
        "type": "text",
        "raw": "a",
        "text": "a"
    },
    {
        "type": "html",
        "raw": "<sub>",
        "inLink": false,
        "inRawBlock": false,
        "block": false,
        "text": "<sub>"
    },
    {
        "type": "text",
        "raw": "1",
        "text": "1"
    },
    {
        "type": "html",
        "raw": "</sub>",
        "inLink": false,
        "inRawBlock": false,
        "block": false,
        "text": "</sub>"
    }
]

After Patching

[
    {
        "type": "text",
        "raw": "a",
        "text": "a"
    },
    {
        "type": "html",
        "raw": "<sub>",
        "inLink": false,
        "inRawBlock": false,
        "block": false,
        "text": "<sub>1</sub>"  // the 3 tokens were joined into this one
    }
]

CNSeniorious000 commented Oct 12, 2024

My implementation looks like this:

import type { marked } from "marked"

export function patchTokens(tokens: marked.Token[]) {
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i]
    if (token.type === "html") {
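      // Derive the expected closing tag, e.g. "<sub>" -> "</sub>".
      // This only works for opening tags without attributes.
      const idealCloseTag = token.raw.replace("<", "</")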

      while (true) {
        if (i + 1 === tokens.length) {
          // incomplete html
          token.text += idealCloseTag
          break
        }

        const nextToken = tokens[i + 1]
        if (nextToken.type === "html" && nextToken.raw === idealCloseTag) {
          token.text += idealCloseTag
          tokens.splice(i + 1, 1)
          break
        }
        else {
          // treat as html children
          token.text += nextToken.raw
          tokens.splice(i + 1, 1)
        }
      }
    }
    else if ("tokens" in token) {
      patchTokens(token.tokens!)
    }
  }
  return tokens
}

If you think this is a common use case, I can open a PR.
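
For reference, a slightly more defensive variant of the same idea could look like the sketch below. It rests on assumptions: the `Tok` interface is a hypothetical stand-in for marked's token type, and `mergeInlineHtml`, `openTagName`, `isCloseTag` and `VOID_TAGS` are illustrative names that do not exist in marked or in this package. Compared with the version above, it accepts opening tags with attributes, skips void elements such as `<br>`, tracks nesting of the same tag, leaves an unmatched opening tag untouched, and updates `raw` as well as `text`.

```typescript
// Hypothetical minimal token shape so the sketch is self-contained;
// real code would use marked's own token type instead.
interface Tok {
  type: string
  raw: string
  text?: string
  block?: boolean
  tokens?: Tok[]
}

// Void elements never have a closing tag, so they must not start a merge.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
])

// Returns the lowercased tag name if `raw` is a lone opening tag
// such as <sub> or <span class="x">, otherwise null.
function openTagName(raw: string): string | null {
  const m = /^<([A-Za-z][\w-]*)(?:\s[^>]*)?>\s*$/.exec(raw)
  if (!m || raw.trimEnd().endsWith("/>")) return null
  const name = m[1].toLowerCase()
  return VOID_TAGS.has(name) ? null : name
}

// True if `raw` is a lone closing tag for `name`, e.g. </sub>.
function isCloseTag(raw: string, name: string): boolean {
  const m = /^<\/([A-Za-z][\w-]*)\s*>\s*$/.exec(raw)
  return m !== null && m[1].toLowerCase() === name
}

export function mergeInlineHtml(tokens: Tok[]): Tok[] {
  const out: Tok[] = []
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i]
    const name =
      token.type === "html" && !token.block ? openTagName(token.raw) : null
    if (name === null) {
      // Not an inline opening tag: recurse into children and keep as-is.
      if (token.tokens) token.tokens = mergeInlineHtml(token.tokens)
      out.push(token)
      continue
    }
    // Scan forward for the matching closing tag, counting nested
    // occurrences of the same tag name.
    let depth = 1
    let j = i + 1
    let merged = token.raw
    for (; j < tokens.length && depth > 0; j++) {
      const next = tokens[j]
      if (next.type === "html") {
        if (openTagName(next.raw) === name) depth++
        else if (isCloseTag(next.raw, name)) depth--
      }
      merged += next.raw
    }
    if (depth > 0) {
      // No matching closing tag: leave the opening tag untouched.
      out.push(token)
      continue
    }
    out.push({ ...token, raw: merged, text: merged })
    i = j - 1 // resume after the closing tag
  }
  return out
}

// Usage with the tokens from the example in the first comment.
const example: Tok[] = [
  { type: "text", raw: "a", text: "a" },
  { type: "html", raw: "<sub>", text: "<sub>", block: false },
  { type: "text", raw: "1", text: "1" },
  { type: "html", raw: "</sub>", text: "</sub>", block: false },
]
const mergedExample = mergeInlineHtml(example)
console.log(mergedExample[1].text) // prints "<sub>1</sub>"
```

As in the version above, children are joined by their `raw` source, so markdown nested inside the HTML element is passed through as text rather than rendered.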

@Rudedog9d

+1 to this, although I think instead of something like patchTokens, an extension should be written to parse any HTML as a block rather than inline.

I'm not sure if this would have any significant side effects.

Here's an example of parsing out a custom block (it looks for section names written between colons, like :name:)

const sectionExtension = {
  name: 'section',
  // Is this a block-level or inline-level tokenizer?
  level: 'block',
  // Hint to Marked.js to stop and check for a match
  start(src: string) {
    // Marked expects the index of the next potential match, not the match itself
    return src.match(/:[^:\n]+:/)?.index;
  },
  tokenizer(src: string) {
    // Match any :text: pattern, capturing the label
    // simple match
    const sectionHeaderRule = /^:([^:\n]+):/;

    const headerMatch = sectionHeaderRule.exec(src);

    if (!headerMatch) return;
    const label = headerMatch[1];

    // Find the start of the next section or end of text
    const remainingText = src.slice(headerMatch[0].length);
    const nextSectionIndex = remainingText.search(/(?:\s|^):[^:\n]+:/);
    let sectionContent = '';

    if (nextSectionIndex === -1) {
      // No next section found, consume all remaining text
      sectionContent = remainingText;
    } else {
      // Found next section, consume text up to that point
      sectionContent = remainingText.slice(0, nextSectionIndex);
    }

    // Create the token
    const token = {
      type: 'section',
      raw: headerMatch[0] + sectionContent,
      label, // Store the text between colons as label
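      color, // not defined in this snippet; assumed to come from surrounding code
      bgColor, // likewise assumed to be defined elsewhere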
      text: sectionContent, // Store the content after the label
      tokens: [],
    };

    // Parse the section content for inline and block tokens
    this.lexer.blockTokens(token.text, token.tokens);
    return token;
  },
};
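
Applied to the HTML case, the same extension shape might look roughly like the sketch below. It is an illustration, not a tested integration: `inlineHtmlPair` and `HtmlPairToken` are hypothetical names, the sketch works at the inline level rather than the block level, and it is an unverified assumption that marked (and this package's renderers) would accept a token with the built-in `html` type coming from an extension. The non-greedy regular expression also does not handle an element nested inside another element of the same tag name.

```typescript
// Hypothetical token shape; `type: "html"` mirrors the built-in inline
// html token shown in the first comment.
interface HtmlPairToken {
  type: "html"
  raw: string
  text: string
  block: boolean
}

const inlineHtmlPair = {
  name: "inlineHtmlPair",
  level: "inline" as const,
  // Index of the next character sequence that could start an opening tag
  start(src: string): number | undefined {
    return src.match(/<[A-Za-z]/)?.index
  },
  tokenizer(src: string): HtmlPairToken | undefined {
    // Opening tag (optionally with attributes), any content, then the
    // first closing tag with the same name.
    const m = /^<([A-Za-z][\w-]*)(?:\s[^>]*)?>[\s\S]*?<\/\1\s*>/.exec(src)
    if (!m) return undefined
    // Emit the whole element as a single html token
    return { type: "html", raw: m[0], text: m[0], block: false }
  },
}

// Calling the tokenizer directly, outside of marked:
const pair = inlineHtmlPair.tokenizer("<sub>1</sub> and more text")
console.log(pair?.raw) // prints "<sub>1</sub>"
```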
