oval_developer@lists.cisecurity.org

A list for people interested in developing the OVAL language.


regex_capture quantified subpattern behavior clarification

Jan Lieskovsky
Mon, Aug 8, 2016 10:14 AM

Hello OVAL Developers,

there's the following documentation section in RegexCaptureFunctionType
description:

"Note that a quantified capturing sub-pattern does not produce multiple
substrings.  Standard regular expression semantics are such that if a
capturing sub-pattern is required to match multiple times in order for
the overall regular expression to match, the capture produced is the
last substring to have matched the sub-pattern."

(from https://github.com/OVALProject/Language/blob/master/specifications/oval-language-specification.docx )

If I am reading the above section correctly, when the text contains multiple
"substrings" that the regex_capture pattern could match, and pattern
quantification is used, the last matched item should be collected / returned
by the scanner.

But when I check this behaviour in the OpenSCAP scanner, the first matched
instance is always returned (regardless of whether pattern quantification was
used / specified or not).

Consider the attached example OVAL file.

Unless I have misunderstood something, the regex_capture()'s collected value
should be "fs.suid_dumpable = 4" (IOW the last one), not
"fs.suid_dumpable = 1", as is currently the case, right? IMHO the last one should
be collected, since a quantified sub-pattern was used in the regex_capture specification.

Is this correct? Or have I overlooked something? If the latter, could you
please provide an example of a pattern for which "regex_capture" would return the
last substring that matched the sub-pattern, as described in the specification?

Thank you && Regards, Jan

Jan iankko Lieskovsky / Red Hat Security Technologies Team
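
For readers following the thread, the quoted note about quantified capturing sub-patterns can be illustrated with a minimal sketch. It assumes Python's `re` module as a stand-in for the scanner's regular expression engine, and the target string is hypothetical (it is not taken from the attached OVAL file):

```python
import re

# Hypothetical target string, chosen so that the quantified group must
# iterate several times within a single match.
target = "fs.suid_dumpable = 1234"

# The capturing group (\d) is quantified with '+', so it matches four
# times in a row here (once per digit).
match = re.search(r"fs\.suid_dumpable = (\d)+", target)

# Only one substring is retained for the group: the last iteration.
last_iteration = match.group(1)
print(last_iteration)
```

Under these assumptions the group yields the final digit only. The note therefore describes repeated iterations of the group inside one match, which is a different situation from a pattern that could match at several places in the text.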

Jan Lieskovsky
Mon, Aug 8, 2016 12:56 PM

----- Original Message -----
> From: "Jan Lieskovsky" <jlieskov@redhat.com>
> To: "OVAL Developer List" <oval_developer@lists.cisecurity.org>
> Sent: Monday, August 8, 2016 12:14:21 PM
> Subject: [OVAL DEVELOPER] regex_capture quantified subpattern behavior clarification
>
> Hello OVAL Developers,
>
> there's the following documentation section in RegexCaptureFunctionType
> description:

To express the question more clearly: when the original regex pattern is
modified to read

<regex_capture pattern=".*(fs\.suid_dumpable = \d)+">

it is possible to obtain the last matched item from the string.

> "Note that a quantified capturing sub-pattern does not produce multiple
> substrings.  Standard regular expression semantics are such that if a
> capturing sub-pattern is required to match multiple times in order for
> the overall regular expression to match, the capture produced is the
> last substring to have matched the sub-pattern."
>
> (from
> https://github.com/OVALProject/Language/blob/master/specifications/oval-language-specification.docx
> )

But the question was whether it would be possible to provide an example of a
pattern for which "the capture produced is the last substring to have matched
the sub-pattern".

Could we enhance the OVAL documentation with such an example?

Thank you && Regards, Jan

Jan iankko Lieskovsky / Red Hat Security Technologies Team

> If I am reading the above section correctly, in the case there are multiple
> "substrings" within the text, the regex_capture pattern could match against,
> and pattern quantification is used, the last matched item should be collected
> / returned by the scanner.
>
> But checking this behaviour in the OpenSCAP scanner, always the first matched
> instance is returned (regardless if pattern quantification was used /
> specified or not).
>
> Suppose the attached example OVAL file.
>
> Unless I have misunderstood something, the regex_capture()'s collected value
> should be the "fs.suid_dumpable = 4" (IOW the last one), not the
> "fs.suid_dumpable = 1", like it's done currently, right? IMHO last one should
> be collected, since quantified sub-pattern was used in regex_capture
> specification.
>
> Is this correct? Or I have overlooked something? If the latter, could you
> hopefully provide an example of an pattern, when "regex_capture" would return
> last substring that matched the sub-pattern, as specified in the
> specification?
>
> Thank you && Regards, Jan
>
> Jan iankko Lieskovsky / Red Hat Security Technologies Team
>
> ...


OVAL_Developer mailing list
OVAL_Developer@lists.cisecurity.org
http://lists.cisecurity.org/mailman/listinfo/oval_developer_lists.cisecurity.org

David Solin
Mon, Aug 8, 2016 2:10 PM

Hi Jan,

The preceding paragraph in the RegexCaptureFunctionType schema documentation states:

“If the regular expression contains multiple capturing sub-patterns, only the first capture is used. If there are no capturing sub-patterns, the result for each target string must be the empty string. Otherwise, if the regular expression could match the target string in more than one place, only the first match (and its first capture) is used. If no matches are found in a target string, the result for that target must be the empty string.”

This indicates that, given the content you specified, the FIRST match should be the result (i.e., “fs.suid_dumpable = 1”).

An example of a “quantified capturing sub-pattern” whose capturing sub-pattern would match multiple times (thus yielding the last match) in order for the overall pattern to match is:

“^.*(fs\.suid_dumpable = \d).*$”

If you use that value as the pattern, in fact, the very last match will be returned!

Best regards,
—David A. Solin
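
The difference between the first match and the last occurrence can be sketched as follows. This is an illustration only: it assumes Python's `re` module behaves like the scanner's engine for these patterns, the single-line target string is hypothetical (the attached OVAL file is not reproduced in the thread), and `first_capture` is a made-up helper that mimics the "first match, first capture, else empty string" rule quoted above:

```python
import re

# Hypothetical single-line target; only the first and last values (1 and 4)
# come from the thread.
target = "fs.suid_dumpable = 1 fs.suid_dumpable = 2 fs.suid_dumpable = 4"


def first_capture(pattern: str, text: str) -> str:
    """Return the first capture of the first match, or "" if there is none."""
    compiled = re.compile(pattern)
    if compiled.groups == 0:
        return ""  # no capturing sub-pattern at all
    match = compiled.search(text)
    if match is None or match.group(1) is None:
        return ""
    return match.group(1)


# Unanchored pattern: the first place the pattern can match wins.
plain = first_capture(r"(fs\.suid_dumpable = \d)", target)

# Anchored pattern with a greedy '.*' prefix: the prefix consumes as much
# as possible, so the group lands on the last occurrence.
anchored = first_capture(r"^.*(fs\.suid_dumpable = \d).*$", target)

# The variant from the follow-up message behaves the same way on this input.
variant = first_capture(r".*(fs\.suid_dumpable = \d)+", target)

print(plain)
print(anchored)
print(variant)
```

On this input the unanchored pattern yields the "= 1" entry while the other two yield the "= 4" entry. Note that in the last two patterns it is the greedy `.*` prefix that pushes the group to the final occurrence. If the real target spans several lines, the result also depends on whether `.` is allowed to match newlines, which this sketch does not cover.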

