Start

Let's take the example of hello world of nom:

  assert_eq!(hex_color("#2F14DF"), Ok(("", Color {
    red: 47,
    green: 20,
    blue: 223,
  })));

RGB Hex code is very similar to low level TCP/UDP messages. That, for instance of #AABBCC, # means "This is a color, the following message length is 6: 0-1 is for R, 2-3:G, 4-5:B`.

That one can make a theoretically O(n) memory model for it by reading bytes by bytes.

We can also expect a worse performance from a parser doing let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;

Common Functions

To be fair, here are the common functions shared by two parsers.

// tools.rs
pub fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
    u8::from_str_radix(input, 16)
}
pub fn to_i64(input: &str) -> Result<i64, std::num::ParseIntError> {
    input.parse::<i64>()
}
pub fn is_hex_digit(_c: char) -> bool {
    true
}
pub fn is_num(c: char) -> bool {
    c.is_digit(10)
}
pub fn to_string(c: &str) -> Result<String, ()> {
    Ok(c.to_string().trim().to_string())
}

is_hex_digit is for validation of the input byte by take_while_m_n, which is a very common operation by parsers, but we keep it always returning true so the have the baseline first, before we want the power of the lexical engine.

Nom Parser

We got main.rs, which is using nom as below.

#![feature(test)]

#[macro_use]
extern crate nom;

mod tools;
use tools::*;

#[derive(Debug, PartialEq)]
pub struct Color {
    pub red: u8,
    pub green: u8,
    pub blue: u8,
    pub number1: i64,
    pub number2: i64,
    pub number3: i64,
    pub number4: i64,
    pub number5: String,
}

named!(hex_primary<&str, u8>,
  map_res!(take_while_m_n!(2, 2, is_hex_digit), from_hex)
);
named!(four_length_int<&str, i64>,
  map_res!(take!(8), to_i64)
);
named!(length8_string<&str, String>,
  map_res!(take!(8), to_string)
);
named!(length100_string<&str, String>,
  map_res!(take!(100), to_string)
);

fn from_u8(c: &str) -> Result<String, ()> {
    print!("{:?}", c);
    Ok(c.to_string())
}
named!(structure1<&str, Color>,
  do_parse!(
           tag!("#")   >>
    red:   hex_primary >>
    green: hex_primary >>
    blue:  hex_primary >>
    number1: four_length_int >>
    number2: four_length_int >>
    number3: four_length_int >>
    number4: four_length_int >>
    number5: length8_string >>
    (
      Color {
        red,
        green,
        blue,
        number1,
        number2,
        number3,
        number4,
        number5
      }
    )
  )
);

fn main() {

}

#[cfg(test)]
mod tests_main {

    extern crate test;
    use super::*;
    use test::Bencher;

    #[bench]
    fn parse_color1(b: &mut Bencher) {
        b.iter(|| structure1("#2F14DF888888887777777766666666555555551234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"));
    }

}

&str Slice Reader

And direct.rs, which is a direct slice access to &str.

#![feature(test)]

mod tools;
use tools::*;

#[derive(Debug, PartialEq)]
pub struct Color {
    pub red: u8,
    pub green: u8,
    pub blue: u8,
    pub number1: i64,
    pub number2: i64,
    pub number3: i64,
    pub number4: i64,
    pub number5: String,
}

fn main() {}
fn structure1(c: &str) -> Result<Color, String> {
    Ok(Color {
        red: from_hex(&c[1..3]).unwrap(),
        green: from_hex(&c[3..5]).unwrap(),
        blue: from_hex(&c[5..7]).unwrap(),
        number1: to_i64(&c[7..7 + 8]).unwrap(),
        number2: to_i64(&c[7 + 8..7 + 8 + 8]).unwrap(),
        number3: to_i64(&c[7 + 8 + 8..7 + 8 + 8 + 8]).unwrap(),
        number4: to_i64(&c[7 + 8 + 8 + 8..7 + 8 + 8 + 8 + 8]).unwrap(),
        number5: to_string(&c[7 + 8 * 4..7 + 8 * 4 + 100]).unwrap(),
    })
}

#[cfg(test)]
mod tests_direct {

    extern crate test;
    use super::*;
    use test::Bencher;

    #[bench]
    fn parse_color1(b: &mut Bencher) {
        b.iter(|| structure1("#2F14DF888888887777777766666666555555551234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"));
    }

}

Both of the are zero-copy during parsing.

Result

After running a cargo +nightly bench, we got:

running 1 test
test tests_direct::parse_color1 ... bench:         263 ns/iter (+/- 22)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out

     Running target/release/deps/nomtest-cdc895365399f2bf

running 1 test
test tests_main::parse_color1 ... bench:         670 ns/iter (+/- 76)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out

This is quite close to the expection that how nom is punishing on its lexical techniques. (Testing on a Apple MacBook Pro 2018 i7 16GB RAM)