Perl Best Practices by Conway Damian

Perl Best Practices by Conway Damian

Author:Conway, Damian [Damian Conway]
Language: eng
Format: epub
Tags: COMPUTERS / Programming Languages / JavaScript
ISBN: 9780596555023
Publisher: O'Reilly Media, Inc.
Published: 2009-06-17T16:00:00+00:00


Backtracking

Prevent useless backtracking.

In the final example of the previous guideline:

qr{ with \s+ (?: each \s+ (?:$EXPR | $VAR \s* in \s* [(] $LIST [)] ) | [(] $LIST [)] ) \s* $BLOCK}xms

if the match successfully reaches the shared \s* $BLOCK suffix but subsequently fails to match the trailing block, then the regex engine will immediately backtrack. That backtracking will cause it to reconsider the various (nested) alternatives: first by backtracking within the previous successful alternative, and then by trying any remaining unexamined alternatives. That's potentially a lot of expensive matching, all of which is utterly useless. For a start, the syntaxes of the various options are mutually exclusive, so if one of them already matched, none of the subsequent candidates ever will.

Even if that weren't the case, the regex is backtracking only because there wasn't a valid block at the end of the loop specification. But backtracking and messing around with the other alternatives won't change that fact. Even if the regex does find another way to match the first part of the loop specification, there still won't be a valid block at the end of the string when matching reaches that point again.

This particular situation arises every time an alternation consists of mutually exclusive alternatives. The "dumb but fast" behaviour of the regex engine forces it to go back and mindlessly try every other possibility, even when—to an outside observer—that's provably a complete waste of time and the engine would do much better to just forget about backtracking into the alternation.

As before, you have to explicitly point that optimization out to Perl. In this case, that's done by enclosing the alternation in a special form of parentheses: (?>…). These are Perl's "don't-ever-backtrack-into-me" markers. They tell the regex engine that the enclosed subpattern can safely be skipped over during backtracking, because you're confident that re-matching the contents either won't succeed or, if it does succeed, won't help the overall match.

In practical terms, you just need to replace the (?:…) parentheses of any mutually exclusive set of alternatives with (?>…) parentheses. For example, like this:

m{ with \s+ (?> each \s+ # (?> means: (?: $EXPR # There can be only | $VAR \s* in \s* [(] $LIST [)] # one way to match ) # the enclosed set | [(] $LIST [)] # of alternatives ) # ) \s* $BLOCK}xms;

This kind of optimization is even more important for repeated subpatterns (especially those containing alternations).

Suppose you wanted to write a regular expression to match a parenthesized list of comma-separated items. You might write:

$str =~ m{ [(] # A literal opening paren $ITEM # At least one item (?: # Followed by... , # a comma $ITEM # and another item )* # as many times as possible (but none is okay too) [)] # A literal closing paren }xms;

That pattern works fine: it matches every parenthesized list you give it and fails to match everything else. But consider what actually happens when you give it nearly the right input. If, for example,



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.