How does nom v.5 work?

What is nom?

nom is a parser combinators library written in Rust. Its goal is to provide tools to build safe parsers without compromising the speed or memory consumption. To that end, it uses extensively Rust’s strong typing and memory safety to produce fast and correct parsers, and provides functions, macros and traits to abstract most of the error prone plumbing.

Why this passage?

nom version 5 makes a very large change to the syntax used to write parsers: combinators are now plain functions rather than macros. Code written in the old style may need to be rewritten, so this post explains the new way.

Hello Color

This is the hello-world example for a parser: we are going to parse a hex color such as #2F14DF.

extern crate nom;
use nom::{
    IResult,
    bytes::complete::{tag, take_while_m_n},
    combinator::map_res,
    sequence::tuple
};

#[derive(Debug,PartialEq)]
pub struct Color {
    pub red: u8,
    pub green: u8,
    pub blue: u8,
}

fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
    u8::from_str_radix(input, 16)
}

fn is_hex_digit(c: char) -> bool {
    c.is_digit(16)
}

fn hex_primary(input: &str) -> IResult<&str, u8> {
    map_res(
        take_while_m_n(2, 2, is_hex_digit),
        from_hex
    )(input)
}

fn hex_color(input: &str) -> IResult<&str, Color> {
    let (input, _) = tag("#")(input)?;
    let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;

    Ok((input, Color { red, green, blue }))
}

fn main() {}

#[test]
fn parse_color() {
    assert_eq!(hex_color("#2F14DF"), Ok(("", Color {
        red: 47,
        green: 20,
        blue: 223,
    })));
}
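To make the two steps inside hex_primary concrete, here is a minimal sketch in plain Rust, without nom, of what take_while_m_n(2, 2, is_hex_digit) followed by map_res with from_hex amounts to. The name hex_primary_sketch and the String error type are assumptions made for illustration; nom's own implementation and error types differ.

```rust
// Hypothetical, nom-free stand-in for hex_primary.
// Returns (remaining input, parsed byte), mirroring the shape of nom's
// IResult, but with a plain String as the error type.
fn hex_primary_sketch(input: &str) -> Result<(&str, u8), String> {
    // Rough equivalent of take_while_m_n(2, 2, is_hex_digit):
    // require exactly two leading hex digits.
    let digits: String = input.chars().take(2).collect();
    if digits.chars().count() != 2 || !digits.chars().all(|c| c.is_digit(16)) {
        return Err(format!("expected two hex digits at {:?}", input));
    }
    // Rough equivalent of map_res(..., from_hex):
    // a failed conversion is turned into a parse error.
    let value = u8::from_str_radix(&digits, 16).map_err(|e| e.to_string())?;
    // Hex digits are ASCII, so slicing at byte offset 2 is safe here.
    Ok((&input[2..], value))
}
```

Calling hex_primary_sketch("2F14DF") consumes the first two digits and hands back the rest of the input, which is what lets three such parsers run one after another.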

The combinators are written in a very functional style.

let (input, _) = tag("#")(input)?;
let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;

Ok((input, Color { red, green, blue }))

That is, tag("#") and tuple((hex_primary, hex_primary, hex_primary)) each return a function, which is then called with the input.
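The idea of a combinator that returns a function can be sketched in plain Rust without nom. The names PResult and tag_sketch below are hypothetical, and the error type is deliberately simplified to the input at the point of failure; nom's real tag is generic over input and error types.

```rust
// Simplified result type (an assumption for illustration):
// Ok((remaining input, output)) or Err(input where parsing failed).
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

// Like nom's tag, calling tag_sketch("#") parses nothing by itself.
// It builds and returns a closure; parsing happens when that closure
// is called with some input.
fn tag_sketch<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| {
        if input.starts_with(expected) {
            // starts_with succeeded, so expected.len() is a valid split point.
            let (matched, rest) = input.split_at(expected.len());
            Ok((rest, matched))
        } else {
            Err(input)
        }
    }
}
```

With this sketch, tag_sketch("#")("#2F14DF") reads exactly like the nom call: the first pair of parentheses configures the parser, the second pair runs it.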

Parse Json

Now we come to the real world. The nom authors provide a very efficient JSON parser as an example of nom.

Please find the example here.

As a refresher, this is the definition (in effect, the state machine) of a JSON value.

pub enum JsonValue {
    Str(String),
    Boolean(bool),
    Num(f64),
    Array(Vec<JsonValue>),
    Object(HashMap<String, JsonValue>),
}

The following are lexical elements:

fn sp<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    let chars = " \t\r\n";

    take_while(move |c| chars.contains(c))(i)
}

The whitespace parser should be self-explanatory: it consumes any leading spaces, tabs, carriage returns and newlines.
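For readers who want to see the behavior spelled out, here is a nom-free sketch of the same whitespace split. The name sp_sketch is hypothetical, and the plain tuple return type is a simplification; the point is only that, like take_while, it never fails and may match nothing.

```rust
// Hypothetical nom-free version of sp: returns
// (remaining input, leading whitespace that was consumed).
// With no leading whitespace, the consumed part is simply empty.
fn sp_sketch(input: &str) -> (&str, &str) {
    let chars = " \t\r\n";
    // Byte index of the first character that is NOT whitespace,
    // or the end of the input if everything is whitespace.
    let end = input
        .find(|c: char| !chars.contains(c))
        .unwrap_or(input.len());
    let (matched, rest) = input.split_at(end);
    (rest, matched)
}
```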

fn float<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, f64, E> {
    flat_map!(i, recognize_float, parse_to!(f64))
}

nom provides recognize_float and parse_to to handle this automatically: the first recognizes the text of a float, the second converts that text to an f64.
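The recognize-then-convert pattern can be sketched in plain Rust. This is a much cruder recognizer than nom's recognize_float (it just grabs a run of float-like characters), and the name float_sketch and the Option return type are assumptions for illustration only.

```rust
// Hypothetical two-step float parser, without nom.
// Returns Some((remaining input, value)) or None on failure.
fn float_sketch(input: &str) -> Option<(&str, f64)> {
    // Step 1 (the "recognize" step): take the longest prefix made of
    // characters that can appear in a float. This is a simplification;
    // it may grab text that is not a valid float, in which case step 2 fails.
    let end = input
        .find(|c: char| !(c.is_ascii_digit() || "+-.eE".contains(c)))
        .unwrap_or(input.len());
    let (text, rest) = input.split_at(end);
    // Step 2 (the "parse_to" step): convert the recognized text to f64.
    text.parse::<f64>().ok().map(|value| (rest, value))
}
```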

fn parse_str<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    escaped!(i, call!(alphanumeric), '\\', one_of!("\"n\\"))
}

call!() is a macro that calls a named function as a callback. It can also pass extra arguments, as shown below.

fn take_wrapper(input: &[u8], i: u8) -> IResult<&[u8], &[u8]> { take!(input, i * 10) }

// will make a parser taking 20 bytes
named!(parser, call!(take_wrapper, 2));

For one_of!():

named!(simple<char>, one_of!(&b"abc"[..]));
assert_eq!(simple(b"a123"), Ok((&b"123"[..], 'a')));

named!(a_or_b<&str, char>, one_of!("ab汉"));
assert_eq!(a_or_b("汉jiosfe"), Ok(("jiosfe", '汉')));

The following code extracts string lexical elements of the form "...". Note how the quote delimiters actually help the programmer write the parser.

fn string<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    let (i, _) = char('\"')(i)?;
    context("string", terminated(parse_str, char('\"')))(i)
}

A boolean is parsed with a simple tag for each of its two literals.

fn boolean<'a, E: ParseError<&'a str>>(input: &'a str) -> IResult<&'a str, bool, E> {
    alt((
        |i| tag("false")(i).map(|(i,_)| (i, false)),
        |i| tag("true")(i).map(|(i,_)| (i, true))
    ))(input)
}

fn array<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, Vec<JsonValue>, E> {
    let (i, _) = char('[')(i)?;

    context(
        "array",
        terminated(
            |i| separated_listc(i, preceded(sp, char(',')), value),
            preceded(sp, char(']')))
    )(i)
}

fn key_value<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (&'a str, JsonValue), E> {
    separated_pair!(i, preceded!(sp, string), preceded!(sp, char!(':')), value)
}

preceded runs two parsers in sequence, discards the result of the first, and returns the result of the second.

named!(parser<&str, &str>, preceded!(char!('-'), alpha1));

assert_eq!(parser("-abc"), Ok(("", "abc")));
assert_eq!(parser("abc"), Err(Err::Error(("abc", ErrorKind::Char))));
assert_eq!(parser("-123"), Err(Err::Error(("123", ErrorKind::Alpha))));
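How such a combinator can be built is sketched below in plain Rust, without nom. All names here (PResult, preceded_sketch, dash, alpha1_sketch) are hypothetical, and the error is simplified to the input at the point of failure, which is enough to reproduce the three cases from the example above.

```rust
// Simplified result type (an assumption for illustration):
// Ok((remaining input, output)) or Err(input where parsing failed).
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

// Runs `first`, throws its output away, then runs `second` on what is left.
fn preceded_sketch<'a, O1, O2>(
    first: impl Fn(&'a str) -> PResult<'a, O1>,
    second: impl Fn(&'a str) -> PResult<'a, O2>,
) -> impl Fn(&'a str) -> PResult<'a, O2> {
    move |input: &'a str| {
        let (rest, _discarded) = first(input)?; // fails early if `first` fails
        second(rest)
    }
}

// Helper: matches a single '-' character.
fn dash(input: &str) -> PResult<'_, char> {
    match input.strip_prefix('-') {
        Some(rest) => Ok((rest, '-')),
        None => Err(input),
    }
}

// Helper: matches one or more ASCII letters.
fn alpha1_sketch(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        Err(input)
    } else {
        let (matched, rest) = input.split_at(end);
        Ok((rest, matched))
    }
}
```

Building preceded_sketch(dash, alpha1_sketch) and calling it on "-abc" yields "abc"; on "abc" it fails at the dash, and on "-123" it fails at "123", after the dash has been consumed.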

separated_list works like a string split in native Rust (or other languages): it parses items separated by a delimiter and collects them into a Vec.

fn hash<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, HashMap<String, JsonValue>, E> {
    let (i, _) = char('{')(i)?;
    context(
        "map",
        terminated(
            |i| map!(i,
                separated_list!(preceded!(sp, char!(',')), key_value),
                |tuple_vec| tuple_vec
                    .into_iter()
                    .map(|(k, v)| (String::from(k), v))
                    .collect()
            ),
            preceded(sp, char('}')))
    )(i)
}
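The split-like behavior of separated_list, which both array and hash rely on, can be sketched without nom as follows. The names PResult, separated_list_sketch, comma and digits are hypothetical, and unlike nom this sketch does not guard against parsers that succeed without consuming input.

```rust
// Simplified result type (an assumption for illustration):
// Ok((remaining input, output)) or Err(input where parsing failed).
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

// Parses `item (sep item)*` and collects the items into a Vec.
// An empty list is a success, so this sketch never returns Err.
fn separated_list_sketch<'a, O, S>(
    sep: impl Fn(&'a str) -> PResult<'a, S>,
    item: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Vec<O>> {
    move |mut input: &'a str| {
        let mut items = Vec::new();
        match item(input) {
            Ok((rest, first)) => {
                items.push(first);
                input = rest;
            }
            Err(_) => return Ok((input, items)),
        }
        loop {
            // Stop at the first missing separator or missing item.
            // A trailing separator is left unconsumed.
            let after_sep = match sep(input) {
                Ok((rest, _)) => rest,
                Err(_) => break,
            };
            match item(after_sep) {
                Ok((rest, next)) => {
                    items.push(next);
                    input = rest;
                }
                Err(_) => break,
            }
        }
        Ok((input, items))
    }
}

// Helper: matches a single ',' character.
fn comma(input: &str) -> PResult<'_, char> {
    match input.strip_prefix(',') {
        Some(rest) => Ok((rest, ',')),
        None => Err(input),
    }
}

// Helper: matches one or more ASCII digits.
fn digits(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err(input)
    } else {
        let (matched, rest) = input.split_at(end);
        Ok((rest, matched))
    }
}
```

Because the closing bracket is left in the remaining input, a surrounding terminated-style parser can then check for it, which is the shape the array and hash parsers have.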

terminated!() returns the result of its first parser if both succeed:

named!(parser<&str, &str>, terminated!(alpha1, char!(';')));

assert_eq!(parser("abc;"), Ok(("", "abc")));
assert_eq!(parser("abc,"), Err(Err::Error((",", ErrorKind::Char))));
assert_eq!(parser("123;"), Err(Err::Error(("123;", ErrorKind::Alpha))));

Inside hash, map!() is called with the input i as its first argument, so the callback is its third parameter; it plays the same role as the second parameter of map!() in nom 4.3.

map!(i,
    separated_list!(preceded!(sp, char!(',')), key_value),
    |tuple_vec| tuple_vec
        .into_iter()
        .map(|(k, v)| (String::from(k), v))
        .collect()
)

Finally, the top-level value parser ties everything together:

fn value<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, JsonValue, E> {
    preceded!(i,
        sp,
        alt!(
            hash    => { |h| JsonValue::Object(h) } |
            array   => { |v| JsonValue::Array(v) } |
            string  => { |s| JsonValue::Str(String::from(s)) } |
            float   => { |f| JsonValue::Num(f) } |
            boolean => { |b| JsonValue::Boolean(b) }
        ))
}

alt tries a list of parsers in order and returns the result of the first successful one.