Rich Text Editors

A look into the Rich Text Editor ecosystem

Why would anyone care about Rich Text Editors?

I don’t think many people who were raised in today’s age do. I certainly didn’t. Rich Text Editors aren’t something people think about when they are using them, nor something they usually think about when building things. Most people’s introduction to building software and web apps comes through something simpler, like drag-and-drop builders, and depending on how old you are, those are Framer, Wix, WordPress and so on. I honestly don’t remember what came before that. A key feature of these was that they all had Rich Text Editors for adding styled text to our websites. We grew up with them without seeing them as much of a complexity, more or less as a solved problem.

Websites like forums and social networks also had some sort of rich text editor. Enjin and other forums where a bunch of us grew up featured perfectly working ones, with extensions and whatnot. Reddit and some other websites lean towards Markdown today, but I always thought that was due to a pivot towards formatting you can type. I thought people were moving away from editors where the formatting and styling was handled implicitly by the editor, through the buttons and keybinds you pressed, rather than living in the text itself.

A combination of growing up with them built into our software and leaning away from them in the modern day has made us not think about them much, is what I’d like to say. It was the same for me until I was building an application with Payload CMS, where I first had to think about this more clearly. I was given a choice between Slate and Lexical. I just went with whatever wasn’t deprecated. I think it was Lexical? I’ve had to think about this more at my day job, where we’ve built applications that work with TipTap editors, and now with PlateUI as well. There’s so much depth to these, and it’s a pretty rich ecosystem.

From the ground up

Rich Text Editors are basically a container in an application (primarily discussing the web here) where you can edit text, apply styles and formatting, and see the updates in real time, without having to write any “markup” per se. We don’t have to add code like <b> tags around the text we want to make bold. We don’t have to add * asterisks around the text we want to italicize. None of that: we press a keyboard shortcut or a dedicated button instead. Think of your email editor.

The engineering that happens behind the scenes to make this work is far from simple, though. Websites are in essence HTML rendered in your browser. Every time you want to change what text is displayed in a user’s browser, or how, someone has to change the HTML that displays it. Someone has to ensure the cursor is tracked at the right place. Someone has to track historical states so that undo and redo work. Someone has to ensure that you can safely paste an image into the editor and see a preview inline. Someone has to make sense of what happens when you press the bold button with an image selected. And worst of all, if your text editor is something like Notion or Google Docs, who is gonna reconcile the real-time changes you make AGAINST the changes of all the other people using the same editor? Who does all this?
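To make the "track historical states" part a bit more concrete, here is a minimal sketch of an undo/redo history built on immutable snapshots. Everything in it (the History class, the Snapshot shape, the method names) is made up for illustration and is not how any particular engine does it; real editors usually store changes rather than whole snapshots.

```typescript
// Toy undo/redo history over immutable snapshots.
// All names here are hypothetical; this is not a library API.
interface Snapshot {
  readonly text: string;
  readonly cursor: number; // caret offset into `text`
}

class History<T> {
  private past: T[] = [];
  private future: T[] = [];
  private present: T;

  constructor(initial: T) {
    this.present = initial;
  }

  get current(): T {
    return this.present;
  }

  // Record a new state. Any pending redo branch is discarded.
  push(next: T): void {
    this.past.push(this.present);
    this.present = next;
    this.future = [];
  }

  // Step back one state. Returns false when there is nothing to undo.
  undo(): boolean {
    if (this.past.length === 0) return false;
    this.future.push(this.present);
    this.present = this.past.pop() as T;
    return true;
  }

  // Step forward one state. Returns false when there is nothing to redo.
  redo(): boolean {
    if (this.future.length === 0) return false;
    this.past.push(this.present);
    this.present = this.future.pop() as T;
    return true;
  }
}

const history = new History<Snapshot>({ text: "", cursor: 0 });
history.push({ text: "Hello", cursor: 5 });
history.push({ text: "Hello world", cursor: 11 });
history.undo(); // history.current.text is "Hello" again
```

Note that the cursor is stored alongside the text, so undoing also puts the caret back where it was. That is one small reason cursor tracking and history end up tangled together.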

This is where text editing engines enter the conversation. The ones I’ve heard of are Slate, Lexical and ProseMirror. These are the foundational text editing engines. There are other frameworks built on top of them that improve DX by abstracting away some of the complexity the framework can handle itself. Frameworks like TipTap, BlockNote and Plate expose a friendlier, easier-to-work-with API so that developers who aren’t Rich Text Editor experts can build their own Rich Text Editor features and extend them in their own applications. I worked with ProseMirror for a TipTap editor assignment at my day job, and now I’ve been given another assignment that has to do with Slate. This has led me to look into these more deeply.

ProseMirror

ProseMirror is a transactional text editing engine that approaches the problem in the most data-safety-oriented manner. It treats the document almost like a record in a database that can only be modified through a transaction. State cannot be modified directly; each transaction produces a new copy of the state and leaves the old one untouched, which keeps edits and operations clean and makes undo and redo much simpler. The document content is based on a strictly defined schema that ProseMirror validates against, ensuring only safe edits persist as a transaction.
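Here is how I picture that transactional idea, as a toy model in plain TypeScript. To be clear, this is not ProseMirror’s API: Step, EditorState, applyTransaction and the single-line "schema" are all names I made up, and the real thing works on a document tree rather than a flat string.

```typescript
// Toy model of transactional editing. All names are hypothetical
// and deliberately do not mirror ProseMirror's real API.
type Step =
  | { kind: "insert"; at: number; text: string }
  | { kind: "delete"; from: number; to: number };

// Stand-in for a schema: a predicate the whole document must satisfy.
type Schema = (text: string) => boolean;

interface EditorState {
  readonly text: string;
  readonly schema: Schema;
}

function applyStep(text: string, step: Step): string {
  if (step.kind === "insert") {
    if (step.at < 0 || step.at > text.length) {
      throw new RangeError("insert position out of range");
    }
    return text.slice(0, step.at) + step.text + text.slice(step.at);
  }
  if (step.from < 0 || step.to > text.length || step.from > step.to) {
    throw new RangeError("delete range out of range");
  }
  return text.slice(0, step.from) + text.slice(step.to);
}

// Apply every step in order; either all of them land in a NEW state,
// or the transaction throws and the old state is left as it was.
function applyTransaction(state: EditorState, steps: readonly Step[]): EditorState {
  let text = state.text;
  for (const step of steps) {
    text = applyStep(text, step);
  }
  if (!state.schema(text)) {
    throw new Error("transaction would violate the schema");
  }
  return Object.freeze({ ...state, text });
}

const singleLine: Schema = (text) => !text.includes("\n");
const initial: EditorState = Object.freeze({ text: "Hello", schema: singleLine });
const next = applyTransaction(initial, [
  { kind: "insert", at: 5, text: " world" },
  { kind: "delete", from: 0, to: 1 },
]);
// initial.text is still "Hello"; next.text is "ello world"
```

Because the old state is never touched, keeping a reference to it is all an undo needs in this toy version.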

Schema: This is a strictly defined “type” for your document. No operation or transformation should violate this schema.
Transaction: A sequence of primitive operations (steps) on the editor content that can be applied to the editor state to create a new, updated copy of that state.
Node: This is any part of the document content structure that you have in your editor state. Even the document itself is a node. Each node can have attributes and children, and is associated with a type from the schema.
Mark: These are defined in the schema alongside node types, but they are not nodes themselves. They attach to inline content such as text nodes to add formatting. Applying a mark to part of a text node splits it, so that each resulting text node carries the specified formatting uniformly.
Position: Literally an integer count of where we are in the document. Node boundaries and characters each count towards it, which makes it a really simple location tracker for the cursor.
ResolvedPosition: A position resolved with extra context about where it is and which node it sits in, with methods for navigation and whatnot.
Selection: Self-explanatory, but it comes in multiple types: TextSelection, NodeSelection, AllSelection.
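The Position and ResolvedPosition entries clicked for me once I wrote the counting out by hand. Below is a toy version in plain TypeScript, assuming the counting rule from the ProseMirror guide: entering or leaving a non-leaf node counts as one, each character counts as one, and a leaf node counts as one. The types and function names (ToyNode, nodeSize, resolve and so on) are mine, not ProseMirror’s.

```typescript
// Toy document tree with integer positions. Hypothetical names throughout;
// this only imitates the counting rule, not ProseMirror's data structures.
interface TextNode { kind: "text"; text: string }
interface LeafNode { kind: "leaf"; name: string }
interface ElementNode { kind: "element"; name: string; content: ToyNode[] }
type ToyNode = TextNode | LeafNode | ElementNode;

interface ToyResolvedPosition {
  parent: string;       // name of the node the position sits directly inside
  depth: number;        // 0 means directly inside the document node
  parentOffset: number; // offset from the start of the parent's content
}

function contentSize(node: ElementNode): number {
  return node.content.reduce((sum, child) => sum + nodeSize(child), 0);
}

function nodeSize(node: ToyNode): number {
  if (node.kind === "text") return node.text.length; // one per character
  if (node.kind === "leaf") return 1;                // e.g. an image
  return 2 + contentSize(node);                      // open + content + close
}

function resolveIn(parent: ElementNode, offset: number, depth: number): ToyResolvedPosition {
  let start = 0;
  for (const child of parent.content) {
    const end = start + nodeSize(child);
    // Strictly inside an element child: step over its opening token and descend.
    if (child.kind === "element" && offset > start && offset < end) {
      return resolveIn(child, offset - start - 1, depth + 1);
    }
    start = end;
  }
  return { parent: parent.name, depth, parentOffset: offset };
}

function resolve(doc: ElementNode, pos: number): ToyResolvedPosition {
  if (pos < 0 || pos > contentSize(doc)) {
    throw new RangeError("position out of range");
  }
  return resolveIn(doc, pos, 0);
}

const text = (value: string): TextNode => ({ kind: "text", text: value });
const el = (name: string, ...content: ToyNode[]): ElementNode => ({ kind: "element", name, content });

// <p>One</p><blockquote><p>Two<img></p></blockquote>
const doc = el(
  "doc",
  el("paragraph", text("One")),
  el("blockquote", el("paragraph", text("Two"), { kind: "leaf", name: "image" })),
);

const afterO = resolve(doc, 2); // inside the first paragraph, right after "O"
const afterW = resolve(doc, 9); // inside the nested paragraph, right after "Tw"
```

The document’s own opening and closing are not counted, so position 0 is the very start of its content and the last valid position equals the content size (13 for this document).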