Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 1 | NDN Regular Expression |
| 2 | ====================== |
| 3 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 4 | NDN regular expression is a kind of regular expression that can match NDN names. Matching is |
| 5 | performed at two levels: the name level and the name component level. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 6 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 7 | A name component matcher, enclosed in ``<`` and ``>``, specifies the pattern of a name component. |
| 8 | The component pattern is expressed using the `Modified ECMAScript Regular Expression Syntax |
Davide Pesavento | 78338c5 | 2020-04-20 23:00:02 -0400 | [diff] [blame] | 9 | <https://en.cppreference.com/w/cpp/regex/ecmascript>`_. |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 10 | For example, ``<ab*c>`` matches the 1st, 3rd, and 4th components of ``/ac/dc/abc/abbc``, but does |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 11 | not match the 2nd component. A special case is that ``<>`` denotes a wildcard matcher that can |
| 12 | match **ANY** name component. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 13 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 14 | A component matcher can match only one name component. To match a name, you need to compose an NDN |
| 15 | regular expression with zero or more name component matchers. For example, ``<ndn><edu><ucla>`` |
| 16 | matches the name ``/ndn/edu/ucla``. To describe a more complicated name pattern, we borrow some |
| 17 | syntaxes from the standard regular expressions. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 18 | |
| 19 | NDN Regex Syntax |
| 20 | ---------------- |
| 21 | |
| 22 | Anchors |
| 23 | ~~~~~~~ |
| 24 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 25 | The ``^`` character matches the start of a name. For example, ``^<ndn>`` matches any name starting |
| 26 | with the component ``ndn``, but does not match a name like ``/local/broadcast``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 27 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 28 | The ``$`` character matches the end of a name. For example, ``^<ndn><edu>$`` matches only one |
| 29 | name: ``/ndn/edu``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 30 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 31 | Repetition |
| 32 | ~~~~~~~~~~ |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 33 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 34 | A component matcher can be followed by a **repeat quantifier** to indicate how many times the |
| 35 | preceding component may appear. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 36 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 37 | The ``*`` quantifier denotes "zero or more times". For example, ``^<A><B>*<C>$`` matches ``/A/C``, |
| 38 | ``/A/B/C``, ``/A/B/B/C``, and so on. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 39 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 40 | The ``+`` quantifier denotes "one or more times". For example, ``^<A><B>+<C>$`` matches ``/A/B/C``, |
| 41 | ``/A/B/B/C``, and so on, but does not match ``/A/C``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 42 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 43 | The ``?`` quantifier denotes "zero or one time". For example, ``^<A><B>?<C>`` matches ``/A/C`` and |
| 44 | ``/A/B/C``, but does not match ``/A/B/B/C``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 45 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 46 | A **bounded quantifier** specifies either a minimum or a maximum number of permitted matches, or |
| 47 | both. ``{n}`` means "exactly ``n`` times"; ``{n,}`` means "at least ``n`` times"; ``{,n}`` means |
| 48 | "at most ``n`` times"; ``{m,n}`` (with ``m ≤ n``) means "between ``m`` and ``n`` times (inclusive)". |
| 49 | For example, ``^<A><B>{2,4}<C>$`` matches ``/A/B/B/C`` and ``/A/B/B/B/B/C``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 50 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 51 | Note that the quantifiers are *greedy*, meaning that they will consume as many matching components |
| 52 | as possible. For example, for the name ``/A/B/C/C/C``, ``^<A><B><C>+`` will match the entire name |
| 53 | instead of only ``/A/B/C``. NDN regular expressions do not currently support non-greedy quantifiers |
| 54 | and possessive quantifiers. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 55 | |
| 56 | Sets |
| 57 | ~~~~ |
| 58 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 59 | A **name component set**, denoted by a bracket expression starting with ``[`` and ending with ``]``, |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 60 | defines a set of name components. It matches any single name component that is a member of that set. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 61 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 62 | Unlike standard regular expressions, NDN regular expressions support only single components sets, |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 63 | that is, you have to list all the set members one by one between the brackets. For example, |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 64 | ``^[<ndn><localhost>]`` matches any names starting with either ``ndn`` or ``localhost`` component. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 65 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 66 | When a name component set starts with a ``'^'``, the set becomes a **negation set**. It matches the |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 67 | complement of the contained name components. For example, ``^[^<ndn>]`` matches any non-empty name |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 68 | that does *not* start with ``/ndn``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 69 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 70 | Some other types of sets, such as range sets, will be supported later. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 71 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 72 | Note that component sets may be repeated in the same way as component matchers. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 73 | |
| 74 | Sub-pattern and Back Reference |
| 75 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 76 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 77 | A section beginning ``(`` and ending ``)`` acts as a **marked sub-pattern**. Whatever matched the |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 78 | sub-pattern is split out in a separate field by the matching algorithm. For example |
| 79 | ``^<A>(<>{2})<B>(<>)`` matches the name ``/A/C/D/B/E``, and the first sub-pattern captures ``C/D``. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 80 | |
Davide Pesavento | 861e094 | 2021-02-06 01:07:13 -0500 | [diff] [blame^] | 81 | Marked sub-patterns can be referred to by a **back-reference** of the form ``\N``, which references |
| 82 | one or more capturing groups. In the example above, a back reference ``\1\2`` extracts ``/C/D/E`` |
| 83 | out of the name. |
Alexander Afanasyev | 7c6aeb0 | 2014-04-10 19:59:19 -0700 | [diff] [blame] | 84 | |
Alexander Afanasyev | c381bca | 2017-10-15 14:51:49 -0400 | [diff] [blame] | 85 | Marked sub-patterns can also be repeated. The regex engine does not permanently substitute |
| 86 | back-references in a regular expression, but will use the last match saved into the back-reference. |
| 87 | If a new match is found by capturing parentheses, the previous match is overwritten. For example, |
| 88 | both ``^([<A><B><C>]+)$`` and ``^([<A><B><C>])+$`` match ``/C/A/B``. However, the former regex |
| 89 | stores ``C/A/B`` as the first back-reference, while the latter one stores ``B``. That is because the |
| 90 | ``+`` quantifier in the latter NDN regular expression causes the marked sub-pattern to be matched |
| 91 | three times, and ``B`` is the last saved match. |