Regular Expressions

In some cases, you can use regular expressions to increase the flexibility and adaptability of your Vuser scripts.

Regular expressions can be used in functions such as web_reg_save_param_regexp.

Note: In RTE Vuser scripts only, all regular expressions must begin with an exclamation point (!).

Regular expressions are a major programming topic, far beyond what is described here. They are often referred to as "REs" or "regexes", or in the singular, "RE" or "regex". You can get more information from the many textbooks dedicated to the subject and by searching the Internet.

This topic describes the features of REs most commonly used in Vuser scripts.

Note that load testing rarely requires the full power of regular expressions, and that sophisticated use of regular expressions makes a Vuser script harder to understand and debug. Moreover, using REs adversely affects performance.

Avoid careless and unnecessary use of REs. For example, REs like ".+" and ".*" can have unexpected results, because they match as many characters as possible.
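
The following sketch illustrates how ".*" can capture more than intended. It is not LoadRunner code: it assumes the POSIX <regex.h> library as a stand-in for the regular expression engine used by Vuser functions, and match_length is a hypothetical helper written for this example. Details of syntax may differ between engines.

```c
#include <regex.h>

/* Hypothetical helper: returns the length of the first match of `pattern`
   in `text`, or -1 if there is no match or the pattern does not compile.
   Assumes POSIX extended regular expressions, not the LoadRunner engine. */
int match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

With the text "<b>one</b> and <b>two</b>", the pattern "<b>.*</b>" matches all 25 characters, running from the first "<b>" to the last "</b>". The more specific pattern "<b>[^<]*</b>" matches only the 10 characters of "<b>one</b>".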

With functions that take left and right boundaries, do not use REs to specify an open-ended boundary. To search from the beginning of a string to the right boundary, do not specify the left boundary. To search from the left boundary to the end of a string, do not specify the right boundary.

Regular Expression Syntax

An RE specifies a pattern. A string in which the pattern occurs is said to match the RE. Generally, there is a match if the pattern occurs anywhere in the string, but an RE can be anchored to the beginning or end of the string. If the RE is preceded by a caret (^), it matches the string only if the pattern occurs at the beginning of the string. If the RE ends with a dollar sign ($), it matches only if the pattern occurs at the end of the string. The dollar sign is a common source of confusion, because there may be invisible characters at the end of a string, so that no match is found although casual inspection of the source seems to indicate that there is a match.

If an RE is anchored both to the beginning and the end, then it matches only if the pattern accounts for the entire string. For example, ^abc$ matches "abc", but not "_abc" or "abc_".
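
The anchoring rules above can be checked with a small sketch. It assumes the POSIX <regex.h> library rather than the LoadRunner engine, and re_matches is a hypothetical helper name chosen for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in `text`,
   0 if it does not, and -1 if the pattern fails to compile.
   Assumes POSIX extended regular expressions. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, re_matches("abc", "_abc_") returns 1, while re_matches("^abc", "_abc") and re_matches("abc$", "abc_") both return 0. The invisible-character problem shows up in re_matches("^abc$", "abc\r"), which returns 0 because of the trailing carriage return.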

Any character or phrase that is not one of the special metacharacters described below is matched literally. For example, a regular expression of "abc" matches exactly that: "abc".

Some characters, called metacharacters, have special meanings. The most common are square brackets ([ and ]), the backslash (\), the caret (^), the dollar sign ($), the dot (.), the vertical bar (|), the question mark (?), the asterisk (*), the plus sign (+), and parentheses ( and ). To search for an occurrence of a literal character that serves as a metacharacter, "escape" it by preceding it with a backslash. For example, to search for the string "Enter a \ or |", use the regular expression, "Enter a \\ or \|".

Note that ANSI C will treat a single backslash as part of the C syntax, rather than as part of the regular expression. In C scripts, to pass a regular expression that contains a backslash, precede the RE backslash with another backslash. For example, to pass the regular expression, \*, meaning to find a literal asterisk, pass \\*.

web_reg_save_param_regexp( "ParamName=stam", "RegExp=\\*", LAST);
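
The two layers of escaping can be seen in isolation in the sketch below. It assumes the POSIX <regex.h> library in place of the LoadRunner engine, and literal_found is a hypothetical helper written for this example. The C compiler reduces each "\\" in the source to one backslash before the regular expression library sees the pattern.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `pattern` is found in `text`, 0 if not,
   and -1 if the pattern fails to compile.
   Assumes POSIX extended regular expressions. */
int literal_found(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The C literal "\\*" contains two characters, a backslash and an asterisk, so strlen("\\*") is 2 and literal_found("\\*", "2 * 3") returns 1. To match the string "Enter a \ or |", the regular expression Enter a \\ or \| is written in C source as "Enter a \\\\ or \\|".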

Some metacharacters only function as metacharacters when they appear in specific contexts or positions, as shown in the following table. If you want to use a metacharacter as a literal character and it's not clear how it will function, you can always escape it to avoid doubt.

Metacharacter: [
    Means: Begins a character set or range.
    When it occurs: Anywhere except between brackets.

Metacharacter: ]
    Means: Ends a character set or range.
    When it occurs: As first ] after an opening bracket, [.

Metacharacter: ( )
    Means: Indicates that the characters between the parentheses are treated as a group.
    When it occurs: Anywhere.

Metacharacter: ^
    Means (1): Anchors the RE to the start of the string.
    When it occurs (1): As first character in the RE.
    Means (2): Negates a character set.
    When it occurs (2): As first character after opening bracket ([) in character set.

Metacharacter: $
    Means: Anchors the RE to end of string.
    When it occurs: As last character in the RE.

Metacharacter: .
    Means: Matches any one character.
    When it occurs: Anywhere.

Metacharacter: ?
    Means: Indicates that the preceding character or group is optional. For example, "ab?c" matches both "abc" and "ac". "A(123)?B" matches both "AB" and "A123B".
    When it occurs: Anywhere except the first character in the RE.

Metacharacter: +
    Means: Indicates that the preceding character or group appears one or more times.
    When it occurs: Anywhere except the first character in the RE.

Metacharacter: *
    Means: Indicates that the preceding character or group is optional but can appear any number of times.
    When it occurs: Anywhere except the first character in the RE.

Metacharacter: \
    Means: The escape character. Indicates that the following character is interpreted literally.
    When it occurs: Anywhere except after itself.

Metacharacter: |
    Means: Indicates that either the preceding or following character or group appears. "(ABC)|(123)" matches "ABC" or "123".
    When it occurs: Anywhere except the first character in the RE.
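
The examples in the table can be tried out with the sketch below. It assumes the POSIX <regex.h> library rather than the LoadRunner engine, and full_match is a hypothetical helper written for this example. The helper wraps each pattern as ^(pattern)$ so that only whole-string matches count.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `pattern` accounts for the whole of
   `text`, 0 if it does not, and -1 if the pattern is too long or fails to
   compile. Assumes POSIX extended regular expressions. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern as ^(pattern)$ so partial matches do not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, full_match("ab?c", "ac") and full_match("A(123)?B", "A123B") return 1, while full_match("ab+c", "ac") returns 0 because the plus sign requires at least one "b". full_match("(ABC)|(123)", "123") returns 1, and full_match("[^0-9]+", "a1c") returns 0 because the negated set excludes the digit.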

A common pitfall in the use of REs is copying a phrase containing metacharacters from another source into the script. Always check phrases pasted into REs for metacharacters and escape them as necessary.
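
One way to guard against this pitfall is to escape a pasted phrase mechanically. The sketch below is an illustration only: re_escape is a hypothetical helper, not a LoadRunner function, and its list of metacharacters is taken from the list given earlier in this topic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst`, placing a backslash before
   every metacharacter. Returns 0 on success, or -1 if `dst` is too small
   (in which case the contents of `dst` are unspecified). */
int re_escape(const char *src, char *dst, size_t dst_size)
{
    const char *meta = "[]\\^$.|?*+()";
    size_t out = 0;

    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; ++src) {
        size_t need = strchr(meta, *src) != NULL ? 2 : 1;
        /* Leave room for the terminating null character. */
        if (out + need >= dst_size)
            return -1;
        if (need == 2)
            dst[out++] = '\\';
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return 0;
}
```

For the input text Enter a \ or |, the helper produces the regular expression Enter a \\ or \|. The result is the RE itself. If it is then typed into C source as a string literal, each backslash must be doubled again.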

The following are among the options that can be used to create regular expressions: