blob: 11ccf8af5e0ea910be1848771394651641895f1c [file] [log] [blame]
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -07001NDN Regular Expression
2======================
3
Alexander Afanasyevc381bca2017-10-15 14:51:49 -04004NDN regular expression is a kind of regular expression that can match NDN names. Matching is
5performed at two levels: the name level and the name component level.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -07006
Davide Pesavento861e0942021-02-06 01:07:13 -05007A name component matcher, enclosed in ``<`` and ``>``, specifies the pattern of a name component.
8The component pattern is expressed using the `Modified ECMAScript Regular Expression Syntax
Davide Pesavento78338c52020-04-20 23:00:02 -04009<https://en.cppreference.com/w/cpp/regex/ecmascript>`_.
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040010For example, ``<ab*c>`` matches the 1st, 3rd, and 4th components of ``/ac/dc/abc/abbc``, but does
Davide Pesavento861e0942021-02-06 01:07:13 -050011not match the 2nd component. A special case is that ``<>`` denotes a wildcard matcher that can
12match **ANY** name component.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070013
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040014A component matcher can match only one name component. To match a name, you need to compose an NDN
15regular expression with zero or more name component matchers. For example, ``<ndn><edu><ucla>``
16matches the name ``/ndn/edu/ucla``. To describe a more complicated name pattern, we borrow some
17syntaxes from the standard regular expressions.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070018
19NDN Regex Syntax
20----------------
21
22Anchors
23~~~~~~~
24
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040025The ``^`` character matches the start of a name. For example, ``^<ndn>`` matches any name starting
26with the component ``ndn``, but does not match a name like ``/local/broadcast``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070027
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040028The ``$`` character matches the end of a name. For example, ``^<ndn><edu>$`` matches only one
29name: ``/ndn/edu``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070030
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040031Repetition
32~~~~~~~~~~
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070033
Davide Pesavento861e0942021-02-06 01:07:13 -050034A component matcher can be followed by a **repeat quantifier** to indicate how many times the
35preceding component may appear.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070036
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040037The ``*`` quantifier denotes "zero or more times". For example, ``^<A><B>*<C>$`` matches ``/A/C``,
38``/A/B/C``, ``/A/B/B/C``, and so on.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070039
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040040The ``+`` quantifier denotes "one or more times". For example, ``^<A><B>+<C>$`` matches ``/A/B/C``,
41``/A/B/B/C``, and so on, but does not match ``/A/C``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070042
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040043The ``?`` quantifier denotes "zero or one time". For example, ``^<A><B>?<C>`` matches ``/A/C`` and
44``/A/B/C``, but does not match ``/A/B/B/C``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070045
Davide Pesavento861e0942021-02-06 01:07:13 -050046A **bounded quantifier** specifies either a minimum or a maximum number of permitted matches, or
47both. ``{n}`` means "exactly ``n`` times"; ``{n,}`` means "at least ``n`` times"; ``{,n}`` means
48"at most ``n`` times"; ``{m,n}`` (with ``m n``) means "between ``m`` and ``n`` times (inclusive)".
49For example, ``^<A><B>{2,4}<C>$`` matches ``/A/B/B/C`` and ``/A/B/B/B/B/C``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070050
Davide Pesavento861e0942021-02-06 01:07:13 -050051Note that the quantifiers are *greedy*, meaning that they will consume as many matching components
52as possible. For example, for the name ``/A/B/C/C/C``, ``^<A><B><C>+`` will match the entire name
53instead of only ``/A/B/C``. NDN regular expressions do not currently support non-greedy quantifiers
54and possessive quantifiers.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070055
56Sets
57~~~~
58
Davide Pesavento861e0942021-02-06 01:07:13 -050059A **name component set**, denoted by a bracket expression starting with ``[`` and ending with ``]``,
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040060defines a set of name components. It matches any single name component that is a member of that set.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070061
Davide Pesavento861e0942021-02-06 01:07:13 -050062Unlike standard regular expressions, NDN regular expressions support only single components sets,
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040063that is, you have to list all the set members one by one between the brackets. For example,
Davide Pesavento861e0942021-02-06 01:07:13 -050064``^[<ndn><localhost>]`` matches any names starting with either ``ndn`` or ``localhost`` component.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070065
Davide Pesavento861e0942021-02-06 01:07:13 -050066When a name component set starts with a ``'^'``, the set becomes a **negation set**. It matches the
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040067complement of the contained name components. For example, ``^[^<ndn>]`` matches any non-empty name
Davide Pesavento861e0942021-02-06 01:07:13 -050068that does *not* start with ``/ndn``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070069
Davide Pesavento861e0942021-02-06 01:07:13 -050070Some other types of sets, such as range sets, will be supported later.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070071
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040072Note that component sets may be repeated in the same way as component matchers.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070073
74Sub-pattern and Back Reference
75~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
76
Davide Pesavento861e0942021-02-06 01:07:13 -050077A section beginning ``(`` and ending ``)`` acts as a **marked sub-pattern**. Whatever matched the
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040078sub-pattern is split out in a separate field by the matching algorithm. For example
79``^<A>(<>{2})<B>(<>)`` matches the name ``/A/C/D/B/E``, and the first sub-pattern captures ``C/D``.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070080
Davide Pesavento861e0942021-02-06 01:07:13 -050081Marked sub-patterns can be referred to by a **back-reference** of the form ``\N``, which references
82one or more capturing groups. In the example above, a back reference ``\1\2`` extracts ``/C/D/E``
83out of the name.
Alexander Afanasyev7c6aeb02014-04-10 19:59:19 -070084
Alexander Afanasyevc381bca2017-10-15 14:51:49 -040085Marked sub-patterns can also be repeated. The regex engine does not permanently substitute
86back-references in a regular expression, but will use the last match saved into the back-reference.
87If a new match is found by capturing parentheses, the previous match is overwritten. For example,
88both ``^([<A><B><C>]+)$`` and ``^([<A><B><C>])+$`` match ``/C/A/B``. However, the former regex
89stores ``C/A/B`` as the first back-reference, while the latter one stores ``B``. That is because the
90``+`` quantifier in the latter NDN regular expression causes the marked sub-pattern to be matched
91three times, and ``B`` is the last saved match.