Store as HTML, Edit as LML

Incredibly, I was able to get the basics working in just one evening. The key to success: Extreme simplicity!

The two-pane editor remains identical in appearance, but I scrapped the two-way WYSIWYG and HTML code sync and replaced it with a more traditional one-way sync from the LML code view to the "rendered" HTML view.

As I often do with DSLs, I have restricted the HTML WARDen LML to a line-based format, like the roff formats of yore. This means that there is no tricky inline formatting (like '*' for bold). EXTREME SIMPLICITY!

Putting the Lightweight in Lightweight Markup Language

The HTML WARDen LML supports a tiny number of elements. Just what I need for this project and not a single thing extra:

A page title (the first line of the document).
Paragraphs of text separated by blank lines.
Internal and external links.
Two heading levels.

Here’s a syntax cheatsheet:

    = Heading
    == Sub-Heading
    internal_page_link[]
    page_link[Page Link With Display Text]
    http://example.com[External Link]

The page editing button interface also needed an update, but mostly I removed formatting features and replaced them with a Help button that displays the above cheatsheet.

Arbitrary HTML still okay

Wait, but isn’t that LML too limited? What if I need to create or include something a little more complex in a page - can I still edit it?

Absolutely. The final syntax feature of the LML lets arbitrary HTML with no direct LML equivalent stay in the page untouched. Any HTML not understood by HTML WARDen’s editor is automatically wrapped in a protective bubble of <html> and </html> tags and kept verbatim. You can edit the raw HTML if you want, but the editor itself won’t mess with it.

You can see an example of that in this screenshot of the new editor interface:

[Screenshot of the HTML WARDen split-screen editing interface]

As you can see on the left side, there’s a green box. And on the right side, you can see that it’s just a chunk of raw HTML with inline style like so:

This "escape hatch" may feel like cheating, but it’s not.

Line-based parsers are so easy

The LML parser is 71 lines of, honestly, pretty trashy code.

Here’s the bulk of it:

The hardest part, as is the case with these things, is just keeping track of whether or not we’re currently in a paragraph.
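To make the paragraph-tracking point concrete, here is a minimal sketch of a line-based converter for the syntax in the cheatsheet. It is not HTML WARDen's actual parser. The function names (escapeHtml, renderInline, lmlToHtml), the mapping of internal links to "name.html", and the choice of h1/h2/h3 for the title and two heading levels are all assumptions made for illustration.

```javascript
// Hypothetical sketch of a line-based LML-to-HTML converter.
// Names and output details are assumptions, not HTML WARDen's actual code.

function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Turn target[text] tokens into anchors. Assumption: internal links
// point at "<name>.html" and an empty [] reuses the target as the text.
function renderInline(line) {
  return escapeHtml(line).replace(/(\S+?)\[([^\]]*)\]/g, function (m, target, text) {
    var href = /^https?:\/\//.test(target) ? target : target + '.html';
    return '<a href="' + href + '">' + (text || target) + '</a>';
  });
}

function lmlToHtml(lml) {
  var lines = lml.split('\n');
  var out = [];
  var inParagraph = false; // the one piece of state that is easy to get wrong
  var inHtml = false;      // inside an <html>...</html> passthrough block

  function closeParagraph() {
    if (inParagraph) { out.push('</p>'); inParagraph = false; }
  }

  // The first line of the document is the page title.
  out.push('<h1>' + escapeHtml(lines[0] || 'Untitled') + '</h1>');

  for (var i = 1; i < lines.length; i++) {
    var line = lines[i];
    if (inHtml) {
      if (line === '</html>') { inHtml = false; }
      else { out.push(line); }            // raw HTML is copied verbatim
    } else if (line === '<html>') {
      closeParagraph();
      inHtml = true;
    } else if (line.trim() === '') {
      closeParagraph();                   // a blank line ends a paragraph
    } else if (line.indexOf('== ') === 0) {
      closeParagraph();
      out.push('<h3>' + escapeHtml(line.slice(3)) + '</h3>');
    } else if (line.indexOf('= ') === 0) {
      closeParagraph();
      out.push('<h2>' + escapeHtml(line.slice(2)) + '</h2>');
    } else {
      if (!inParagraph) { out.push('<p>'); inParagraph = true; }
      out.push(renderInline(line));
    }
  }
  closeParagraph();                       // the document may end mid-paragraph
  return out.join('\n');
}

console.log(lmlToHtml('My Page\n\n= Intro\n\nSee page_link[this page].'));
```

Every branch that starts something other than paragraph text calls closeParagraph() first, so the "am I in a paragraph?" question is answered in exactly one place.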

So that’s the LML to HTML conversion, but…

Wait, but one does not simply parse HTML!

There’s still the initial one-time conversion of HTML to LML to populate the code editor pane. Trying to parse even a tiny subset of HTML is going to be fraught with danger, right?

So…the other fun thing about working with HTML in a browser is…you have the most powerful piece of HTML parsing software ever created right at your fingertips.

How do I parse the HTML? I don’t! I’m accessing the DOM of the page from JavaScript. It’s just 43 lines of, again, trashy code. Here’s most of it:

    // Start by writing title as the first line
    var title_tag = html_view.querySelector('h1');
    var txt = title_tag ? title_tag.textContent : 'Untitled';
    if(txt.length < 1){ txt = 'Untitled'; }
    txt += "\n\n";

    // old skool 'for' loop required for element collection
    for(var i=0; i<html_view.children.length; i++){
        e = html_view.children[i];
        switch(e.nodeName){
            case 'H1': continue; // We've already taken care of the title
            case 'H2': p = '# ' + e.textContent + "\n\n"; break;
            case 'H3': p = '## ' + e.textContent + "\n\n"; break;
            ...
            case 'P':
                p='';
                for(j=0; j<e.childNodes.length; j++){
                    e2 = e.childNodes[j]; // the current inline child node
                    if(e2.nodeType === Node.TEXT_NODE){
                        ...
                    if(e2.nodeType === Node.ELEMENT_NODE){
                        if(e2.nodeName === 'A'){
                            ...
                        }
                    }
                }
                p += "\n";
                break;
            default:
                // Anything unrecognized goes into the <html> escape hatch
                p = '<html>\n' + e.outerHTML + '\n</html>\n\n';
        }
        txt += p;
    }
    code_area.value = txt;
    }

The DOM API ain’t pretty, but compared to parsing HTML myself, it’s very, very nice.
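The excerpt above elides how a paragraph's text and link nodes become LML, so here is one way that step could look. This is a guess, not the author's code: paragraphToLml is a hypothetical helper, and both the short [] form for links whose text equals their target and the flattening of other inline elements to plain text are assumptions.

```javascript
// Hypothetical helper: turn the child nodes of a <p> into one LML line.
// Not the author's code; the link rules below are assumptions.
var TEXT_NODE = 3;    // same value as Node.TEXT_NODE in browsers
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers

function paragraphToLml(p) {
  var out = '';
  for (var j = 0; j < p.childNodes.length; j++) {
    var n = p.childNodes[j];
    if (n.nodeType === TEXT_NODE) {
      out += n.textContent;
    } else if (n.nodeType === ELEMENT_NODE && n.nodeName === 'A') {
      var href = n.getAttribute('href') || '';
      var text = n.textContent;
      // Assumption: a link whose text equals its target uses the short [] form.
      out += href + '[' + (text === href ? '' : text) + ']';
    } else if (n.nodeType === ELEMENT_NODE) {
      // Assumption: other inline elements are flattened to their text.
      out += n.textContent;
    }
  }
  return out;
}
```

In a browser this would be called with a real paragraph element, for example paragraphToLml(document.querySelector('p')). Because it only touches childNodes, nodeType, nodeName, textContent and getAttribute, it can also be exercised outside a browser with plain objects that mimic those properties.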
