JSON is an omnipresent format that can be read by every mainstream language in the wild. It is well-specified and simple-yet-complex enough to make writing a parser a challenging and fun test of our programming skill.
In the first post of this two part series, we lay down the basic components for building a general string parser. In the second post, we will focus more on the actual JSON format specification and parsing every part of it. In the end, we will have a fully functional JSON parser (think of the native JSON.parse function).
We will use TypeScript and the fp-ts library, and basic knowledge of functional concepts (function composition, immutability) and data types (Option, Either) is expected. That said, the code should be understandable even if you are not familiar with seemingly scary stuff like functors, monoids or monads.
The beauty of functional programming is that it allows us to build complex things from simple ones, and that is exactly how we are going to build our JSON parser - by combining simple things.
First we’ll give our parser some shape by defining some types. Every parser needs an input and ours is no exception.
With <inline-code>input<inline-code> present, we can define a type representing the parser - <inline-code>Parser<A><inline-code>. It's a function that, when given some input, returns a parsed value of type A and the rest of the input. Parsing can fail, so we’ll wrap return value in Either type and define <inline-code>ParserError<inline-code> to represent all the stuff that can go wrong during parsing.
The <inline-code>advance<inline-code> function will read one character from the <inline-code>input<inline-code> and return it with the rest of the <inline-code>input<inline-code>. The return value is wrapped inside Option in order to represent the state when there are no more chars to read (<inline-code>none<inline-code> is returned in that case).
Note that <inline-code>input<inline-code> is immutable — we are creating a new instance every time we advance the position.
Now we have everything in place to build our first parser — one capable of no less than parsing a specific character!
We will read a single character from <inline-code>input<inline-code> using an advanced function and check to see if it matches the expected character.
Time to see if our parser behaves correctly.
First, we'll create a helper for displaying parser results as a human-readable string. We handle both possible outcomes, failure and success (represented by Either type's left & right value).
Works like a charm! Notice that our parser is returning a string value. If we want it to return a number, we can transform parsers the same way we transform values inside an array using the map function.
We have defined a map function that can be easily used inside a pipe function. Implementation was simple — we just applied function f to a successfully parsed value.
Using map, we can transform our <inline-code>Parser<string><inline-code> into <inline-code>Parser<number><inline-code>.
Parsing a simple char was fun, but how about combining multiple parsers into one that could parse the whole word?
Let's start simple by combining two parsers into one. The product function does nothing more than apply the first parser, applying the second one and combining their results into one tuple.
We can generalize this concept further to create a function sequence that can combine any number of <inline-code>Parser<A><inline-code> instances. Notice that we are turning the type inside out — we provide the function with <inline-code>ReadonlyArray<Parser<A>>><inline-code>, and get <inline-code>Parser<ReadonlyArray<A>>><inline-code> as a result.
In order to implement the sequence, we also need a parser that always succeeds as it will be used as a starting value for reducing the array of parsers.
Do you wonder why we decided that an empty sequence yields a successful result? A sequence of parsers succeeds when all parsers succeed, and vice versa. Our parser fails when a parser fails. Obviously, this will never happen for an empty sequence, so by combining empty sequences we get the parser that always succeeds.
Now we have a parser that executes all character parsers one by one in the sequence and returns an array of parsed characters. But, it would be nice to have a parser that concatenates all characters and returns the whole word. We will solve it by using a map and concatenating all chars in the resulting array.
We often come across a situation where we want our parser to choose from multiple options. For those cases, we need a function that combines multiple parsers into one in such a way that it succeeds when one of the already-passed parsers succeeds.
Let's implement it and call it oneOf!
In contrast to sequence, the oneOf function yields a failing parser when it passes an empty list of possible parsers. The reason is simple: it must be true that there is a parser that succeeds, but this condition is not met for an empty list.
Now, one last function combining parsers - many. It will run a given parser repeatedly until it fails, and will succeed when parsing succeeds at least once.
Now we have the basic building blocks to create more complex parsers. As promised, next time we will dig deeper into the JSON specifications, try to represent it in TypesScript, and parse it using our own parser.
Stay tuned...and functional!