blob: 88934b610c90dcc9ead15965471c2990445792b9 [file] [log] [blame]
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -07001NDN Regular Expression
2======================
3
4NDN regular expression matching is done at two levels: one at the name
5level and one at the name component level.
6
7We use ``<`` and ``>`` to enclose a name component matcher which
8specifies the pattern of a name component. The component pattern is
9expressed using the `Perl Regular Expression
10Syntax <http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html>`__.
11For example, ``<ab*c>`` can match the 1st, 3rd, and 4th components of
12``/ac/dc/abc/abbc``, but it cannot match the 2nd component. A special
13case is that ``<>`` is a wildcard matcher that can match **ANY**
14component.
15
16Note that a component match can match only one name component. In order
17to match a name, you need to specify the pattern of a name based on the
18name component matchers. For example, ``<ndn><edu><ucla>`` can match the
19name ``/ndn/edu/ucla``. In order to describe a more complicated name
20pattern, we borrow some syntaxes from the standard regular expressions.
21
22NDN Regex Syntax
23----------------
24
25Anchors
26~~~~~~~
27
28A ``'^'`` character shall match the start of a name. For example,
29``^<ndn>`` shall match any names starting with a component ``ndn``, and
30it will exclude a name like ``/local/broadcast``.
31
32A ``'$'`` character shall match the end of a name. For example,
33``^<ndn><edu>$`` shall match only one name: ``/ndn/edu``.
34
35Repeats
36~~~~~~~
37
38A component matcher can be followed by a repeat syntax to indicate how
39many times the preceding component can be matched.
40
41Syntax ``*`` for zero or more times. For example,
42``^<ndn><KEY><>*<ID-CERT>`` shall match ``/ndn/KEY/ID-CERT/``, or
43``/ndn/KEY/edu/ID-CERT``, or ``/ndn/KEY/edu/ksk-12345/ID-CERT`` and so
44on.
45
46Syntax ``+`` for one or more times. For example,
47``^<ndn><KEY><>+<ID-CERT>`` shall match ``/ndn/KEY/edu/ID-CERT``, or
48``/ndn/KEY/edu/ksk-12345/ID-CERT`` and so on, but it cannot match
49``/ndn/KEY/ID-CERT/``.
50
51Syntax ``?`` for zero or one times. For example,
52``^<ndn><KEY><>?<ID-CERT>`` shall match ``/ndn/KEY/ID-CERT/``, or
53``/ndn/KEY/edu/ID-CERT``, but it cannot match
54``/ndn/KEY/edu/ksk-12345/ID-CERT``.
55
56Repetition can also be bounded:
57
58``{n}`` for exactly ``n`` times. ``{n,}`` for at least ``n`` times.
59``{,n}`` for at most ``n`` times. And ``{n, m}`` for ``n`` to ``m``
60times.
61
62Note that the repeat matching is **greedy**, that is it will consume as
63many matched components as possible. We do not support non-greedy repeat
64matching and possessive repeat matching for now.
65
66Sets
67~~~~
68
69Name component set is a bracket-expression starting with ``'['`` and
70ending with ``']'``, it defines a set of name components, and matches
71any single name component that is a member of that set.
72
73Unlike the standard regular expression, NDN regular expression only
74supports **Single Components Set**, that is, you have to list all the
75set members one by one between the bracket. For example,
76``^[<ndn><localhost>]`` shall match any names starting with either a
77component ``ndn"`` or ``localhost``.
78
79When a name component set starts with a ``'^'``, the set becomes a
80**Negation Set**, that is, it matches the complement of the name
81components it contains. For example, ``^[^<ndn>]`` shall match any names
82that does not start with a component ``ndn``.
83
84Some other types of sets, such as Range Set, will be supported later.
85
86Note that component set can be repeated as well.
87
88Sub-pattern and Back Reference
89~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
90
91A section beginning ``(`` and ending ``)`` acts as a marked sub-pattern.
92Whatever matched the sub-pattern is split out in a separate field by the
93matching algorithms. For example ``^([^<DNS>])<DNS>(<>*)<NS>`` shall
94match a data name of NDN DNS NS record, and the first sub-pattern
95captures the zone name while the second sub-pattern captures the
96relative record name.
97
98Marked sub-patterns can be referred to by a back-reference ``\n``. The
99same example above shall match a name
100``/ndn/edu/ucla/DNS/irl/NS/123456``, and a back reference ``\1\2`` shall
101extract ``/ndn/edu/ucla/irl`` out of the name.
102
103Note that marked sub-patterns can be also repeated.