Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 1 | NDN Regular Expression |
| 2 | ====================== |
| 3 | |
| 4 | NDN regular expression matching is done at two levels: one at the name |
| 5 | level and one at the name component level. |
| 6 | |
| 7 | We use ``<`` and ``>`` to enclose a name component matcher which |
| 8 | specifies the pattern of a name component. The component pattern is |
| 9 | expressed using the `Perl Regular Expression |
| 10 | Syntax <http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html>`__. |
| 11 | For example, ``<ab*c>`` can match the 1st, 3rd, and 4th components of |
| 12 | ``/ac/dc/abc/abbc``, but it cannot match the 2nd component. A special |
| 13 | case is that ``<>`` is a wildcard matcher that can match **ANY** |
| 14 | component. |
| 15 | |
| 16 | Note that a component match can match only one name component. In order |
| 17 | to match a name, you need to specify the pattern of a name based on the |
| 18 | name component matchers. For example, ``<ndn><edu><ucla>`` can match the |
| 19 | name ``/ndn/edu/ucla``. In order to describe a more complicated name |
| 20 | pattern, we borrow some syntaxes from the standard regular expressions. |
| 21 | |
| 22 | NDN Regex Syntax |
| 23 | ---------------- |
| 24 | |
| 25 | Anchors |
| 26 | ~~~~~~~ |
| 27 | |
| 28 | A ``'^'`` character shall match the start of a name. For example, |
| 29 | ``^<ndn>`` shall match any names starting with a component ``ndn``, and |
| 30 | it will exclude a name like ``/local/broadcast``. |
| 31 | |
| 32 | A ``'$'`` character shall match the end of a name. For example, |
| 33 | ``^<ndn><edu>$`` shall match only one name: ``/ndn/edu``. |
| 34 | |
| 35 | Repeats |
| 36 | ~~~~~~~ |
| 37 | |
| 38 | A component matcher can be followed by a repeat syntax to indicate how |
| 39 | many times the preceding component can be matched. |
| 40 | |
| 41 | Syntax ``*`` for zero or more times. For example, |
| 42 | ``^<ndn><KEY><>*<ID-CERT>`` shall match ``/ndn/KEY/ID-CERT/``, or |
| 43 | ``/ndn/KEY/edu/ID-CERT``, or ``/ndn/KEY/edu/ksk-12345/ID-CERT`` and so |
| 44 | on. |
| 45 | |
| 46 | Syntax ``+`` for one or more times. For example, |
| 47 | ``^<ndn><KEY><>+<ID-CERT>`` shall match ``/ndn/KEY/edu/ID-CERT``, or |
| 48 | ``/ndn/KEY/edu/ksk-12345/ID-CERT`` and so on, but it cannot match |
| 49 | ``/ndn/KEY/ID-CERT/``. |
| 50 | |
| 51 | Syntax ``?`` for zero or one times. For example, |
| 52 | ``^<ndn><KEY><>?<ID-CERT>`` shall match ``/ndn/KEY/ID-CERT/``, or |
| 53 | ``/ndn/KEY/edu/ID-CERT``, but it cannot match |
| 54 | ``/ndn/KEY/edu/ksk-12345/ID-CERT``. |
| 55 | |
| 56 | Repetition can also be bounded: |
| 57 | |
| 58 | ``{n}`` for exactly ``n`` times. ``{n,}`` for at least ``n`` times. |
| 59 | ``{,n}`` for at most ``n`` times. And ``{n, m}`` for ``n`` to ``m`` |
| 60 | times. |
| 61 | |
| 62 | Note that the repeat matching is **greedy**, that is it will consume as |
| 63 | many matched components as possible. We do not support non-greedy repeat |
| 64 | matching and possessive repeat matching for now. |
| 65 | |
| 66 | Sets |
| 67 | ~~~~ |
| 68 | |
| 69 | Name component set is a bracket-expression starting with ``'['`` and |
| 70 | ending with ``']'``, it defines a set of name components, and matches |
| 71 | any single name component that is a member of that set. |
| 72 | |
| 73 | Unlike the standard regular expression, NDN regular expression only |
| 74 | supports **Single Components Set**, that is, you have to list all the |
| 75 | set members one by one between the bracket. For example, |
| 76 | ``^[<ndn><localhost>]`` shall match any names starting with either a |
| 77 | component ``ndn"`` or ``localhost``. |
| 78 | |
| 79 | When a name component set starts with a ``'^'``, the set becomes a |
| 80 | **Negation Set**, that is, it matches the complement of the name |
| 81 | components it contains. For example, ``^[^<ndn>]`` shall match any names |
| 82 | that does not start with a component ``ndn``. |
| 83 | |
| 84 | Some other types of sets, such as Range Set, will be supported later. |
| 85 | |
| 86 | Note that component set can be repeated as well. |
| 87 | |
| 88 | Sub-pattern and Back Reference |
| 89 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 90 | |
| 91 | A section beginning ``(`` and ending ``)`` acts as a marked sub-pattern. |
| 92 | Whatever matched the sub-pattern is split out in a separate field by the |
| 93 | matching algorithms. For example ``^([^<DNS>])<DNS>(<>*)<NS>`` shall |
| 94 | match a data name of NDN DNS NS record, and the first sub-pattern |
| 95 | captures the zone name while the second sub-pattern captures the |
| 96 | relative record name. |
| 97 | |
| 98 | Marked sub-patterns can be referred to by a back-reference ``\n``. The |
| 99 | same example above shall match a name |
| 100 | ``/ndn/edu/ucla/DNS/irl/NS/123456``, and a back reference ``\1\2`` shall |
| 101 | extract ``/ndn/edu/ucla/irl`` out of the name. |
| 102 | |
| 103 | Note that marked sub-patterns can be also repeated. |