The parsing instructions are ARG, PARSE, and PULL.
The data to parse is a source string. Parsing splits up the data in a source string and assigns pieces of it into the variables named in a template. A template is a model specifying how to split the source string. The simplest kind of template consists of only a list of variable names. Here is an example:
variable1 variable2 variable3
This kind of template parses the source string into blank-delimited words. More complicated templates contain patterns in addition to variable names.
String patterns match characters in the source string to specify where to split it.
Positional patterns indicate the character positions at which to split the source string.
Parsing is essentially a two-step process.
1. Parse the source string into appropriate substrings using patterns.
2. Parse each substring into words.
Simple Templates for Parsing into Words
Here is a parsing instruction:
parse value 'time and tide' with var1 var2 var3
The template in this instruction is var1 var2 var3. The data to parse lies between the keywords PARSE VALUE and the keyword WITH; here it is the source string time and tide. Parsing divides the source string into blank-delimited words and assigns them to the variables named in the template as follows:
var1='time'
var2='and'
var3='tide'
In this example, the source string to parse is a literal string, time and tide. In the next example, the source string is a variable.
/* PARSE VALUE using a variable as the source string to parse */
string='time and tide'
parse value string with var1 var2 var3    /* same results */
PARSE VALUE does not convert lowercase a–z in the source string to uppercase A–Z. If you want to convert characters to uppercase, use PARSE UPPER VALUE.
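The difference in case handling can be seen in a short sketch. The variable names up1, up2, and up3 are illustrative only and do not come from the text above.

```rexx
/* PARSE VALUE keeps the case of the source string */
parse value 'time and tide' with var1 var2 var3
say var1 var2 var3            /* time and tide */

/* PARSE UPPER VALUE translates a-z to A-Z before parsing */
parse upper value 'time and tide' with up1 up2 up3
say up1 up2 up3               /* TIME AND TIDE */
```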
All of the parsing instructions assign the parts of a source string into the variables named in a template. There are various parsing instructions because of differences in the nature or origin of source strings.
The PARSE VAR instruction is similar to PARSE VALUE except that the source string to parse is always a variable. In PARSE VAR, the name of the variable containing the source string follows the keywords PARSE VAR. In the next example, the variable stars contains the source string. The template is star1 star2 star3.
/* PARSE VAR example */
stars='Sirius Polaris Rigil'
parse var stars star1 star2 star3    /* star1='Sirius'  */
                                     /* star2='Polaris' */
                                     /* star3='Rigil'   */
All variables in a template receive new values. If there are more variables in the template than words in the source string, the leftover variables receive null (empty) values. This is true for all parsing: for parsing into words with simple templates and for parsing with templates containing patterns. Here is an example using parsing into words.
/* More variables in template than (words in) the source string */
satellite='moon'
parse var satellite Earth Mercury    /* Earth='moon' */
                                     /* Mercury=''   */
If there are more words in the source string than variables in the template, the last variable in the template receives all leftover data. Here is an example:
/* More (words in the) source string than variables in template */
satellites='moon Io Europa Callisto...'
parse var satellites Earth Jupiter    /* Earth='moon'                    */
                                      /* Jupiter='Io Europa Callisto...' */
Parsing into words removes leading and trailing blanks from each word before it is assigned to a variable. The exception to this is the word or group of words assigned to the last variable. The last variable in a template receives leftover data, preserving extra leading and trailing blanks. Here is an example:
/* Preserving extra blanks */
solar5='Mercury Venus  Earth   Mars     Jupiter  '
parse var solar5 var1 var2 var3 var4
/* var1 ='Mercury'              */
/* var2 ='Venus'                */
/* var3 ='Earth'                */
/* var4 ='  Mars     Jupiter  ' */
In the source string, Earth has two leading blanks. Parsing removes both of them (the word-separator blank and the extra blank) before assigning var3=’Earth’. Mars has three leading blanks. Parsing removes one word-separator blank and keeps the other two leading blanks. It also keeps all five blanks between Mars and Jupiter and both trailing blanks after Jupiter.
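Because blanks are hard to see in printed output, one way to check the counts described above is to display the lengths of the results. This is a small sketch; the brackets in the last line are only there to make the preserved blanks visible.

```rexx
/* Source string: 2 blanks before Earth, 3 before Mars,  */
/* 5 between Mars and Jupiter, 2 after Jupiter           */
solar5 = 'Mercury Venus  Earth   Mars     Jupiter  '
parse var solar5 var1 var2 var3 var4

say length(var3)      /* 5: both blanks before Earth were removed  */
say length(var4)      /* 20: 2 + 4 + 5 + 7 + 2 characters are kept */
say '['var4']'        /* [  Mars     Jupiter  ]                    */
```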
Parsing removes no blanks if the template contains only one variable. For example:
parse value ' Pluto ' with var1    /* var1=' Pluto ' */
The Period as a Placeholder
A period in a template is a placeholder. It is used instead of a variable name, but it receives no data. It is useful:
As a “dummy variable” in a list of variables
Or to collect unwanted information at the end of a string.
The period in the first example is a placeholder. Be sure to separate adjacent periods with spaces; otherwise, an error results.
/* Period as a placeholder */
stars='Arcturus Betelgeuse Sirius Rigil'
parse var stars . . brightest .    /* brightest='Sirius' */

/* Alternative to period as placeholder */
stars='Arcturus Betelgeuse Sirius Rigil'
parse var stars drop junk brightest rest    /* brightest='Sirius' */
A placeholder saves the overhead of unneeded variables.
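The second use, collecting unwanted information at the end of a string, might look like the following sketch. The variable names first and planet and the padded 'Pluto' string are illustrative assumptions. Because a variable followed by a period is no longer the last item in the template, it also loses any extra blanks.

```rexx
stars = 'Arcturus Betelgeuse Sirius Rigil'
parse var stars first .            /* first='Arcturus'; the rest is discarded */
say first                          /* Arcturus */

/* The trailing period also strips blanks from the word before it */
parse value 'Pluto   ' with planet .
say '['planet']'                   /* [Pluto] */
```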
Templates Containing String Patterns
A string pattern matches characters in the source string to indicate where to split it. A string pattern can be a:
Literal string pattern: One or more characters within quotation marks.
Variable string pattern: A variable within parentheses with no plus (+) or minus (-) or equal sign (=) before the left parenthesis.
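As a brief sketch of a variable string pattern, the following uses a hypothetical variable named sep to hold the delimiter. The data and all variable names are illustrative.

```rexx
list = 'moon|Io|Europa'
sep  = '|'
parse var list a (sep) b (sep) c   /* (sep) acts like the literal pattern '|' */
say a                              /* moon   */
say b                              /* Io     */
say c                              /* Europa */
```

Keeping the delimiter in a variable lets the same template be reused when the delimiter is only known at run time.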
Here are two templates: a simple template and a template containing a literal string pattern:
var1 var2         /* simple template */
var1 ', ' var2    /* template with literal string pattern */
The literal string pattern is: ', '. This template:
Puts characters from the start of the source string up to (but not including) the first character of the match (the comma) into var1
Puts characters starting with the character after the last character of the match (the character after the blank that follows the comma) and ending with the end of the string into var2.
A template with a string pattern omits data in the source string that matches the pattern. For example, when the template ln ', ' fn parses a source string such as 'Smith, John', the pattern ', ' (with a blank) is a better choice than ',' (no blank) because, without the blank in the pattern, the variable fn receives ' John' (including a blank).
If the source string does not contain a match for a string pattern, then any variables preceding the unmatched string pattern get all the data in question. Any variables after that pattern receive the null string.
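The following sketch contrasts the two patterns and shows what happens when the pattern is not found. The source strings and the numbered variable names are assumptions chosen for illustration.

```rexx
name = 'Smith, John'
parse var name ln ', ' fn          /* pattern includes the blank   */
say '['ln']' '['fn']'              /* [Smith] [John]               */

parse var name ln2 ',' fn2         /* pattern without the blank    */
say '['ln2']' '['fn2']'            /* [Smith] [ John]              */

/* No comma in the source string: the pattern is not matched */
parse value 'Smith John' with ln3 ', ' fn3
say '['ln3']' '['fn3']'            /* [Smith John] []              */
```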
A null string is never found. It always matches the end of the source string.
A positional pattern is a number that identifies the character position at which to split data in the source string. The number must be a whole number.
An absolute positional pattern is one of the following:
A number with no plus (+) or minus (-) sign preceding it or with an equal sign (=) preceding it
A variable in parentheses with an equal sign before the left parenthesis.
The number specifies the absolute character position at which to split the source string.
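The variable form can be sketched as follows. The variables col2 and col3 and the sample record are hypothetical; the record is padded so that each of the first two fields is 10 characters wide.

```rexx
record = 'Clemens   Samuel    Mark Twain'
col2 = 11                          /* start of the second field */
col3 = 21                          /* start of the third field  */
parse var record lastname =(col2) firstname =(col3) pseudonym
say '['lastname']'                 /* [Clemens   ] */
say '['firstname']'                /* [Samuel    ] */
say '['pseudonym']'                /* [Mark Twain] */
```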
Here is a template with absolute positional patterns:
variable1 11 variable2 21 variable3
The numbers 11 and 21 are absolute positional patterns. The number 11 refers to the 11th position in the input string, 21 to the 21st position. This template:
Puts characters 1 through 10 of the source string into variable1
Puts characters 11 through 20 into variable2
Puts characters 21 to the end into variable3.
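Applied to a sample string whose character positions are easy to count, the template behaves as in this sketch. The source string is an assumption made up for illustration.

```rexx
/* Positions 1-10 are digits, 11-20 uppercase, 21-25 lowercase */
source = '1234567890ABCDEFGHIJklmno'
parse var source variable1 11 variable2 21 variable3
say variable1                      /* 1234567890 */
say variable2                      /* ABCDEFGHIJ */
say variable3                      /* klmno      */
```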
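Positional patterns are probably most useful for working with a file of records, such as one in which each record has these fields:
lastname in character positions 1 through 10
firstname in character positions 11 through 20
pseudonym in character positions 21 through 40 (the end of the record)
The following example uses this record structure.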
/* Parsing with absolute positional patterns in template */
record.1='Clemens   Samuel    Mark Twain          '
record.2='Evans     Mary Ann  George Eliot        '
record.3='Munro     H.H.      Saki                '
do n=1 to 3
  parse var record.n lastname 11 firstname 21 pseudonym
  if lastname='Evans' & firstname='Mary Ann' then say 'By George!'
end    /* Says 'By George!' after record 2 */
The source string is first split at character position 11 and at position 21. The language processor assigns characters 1 to 10 into lastname, characters 11 to 20 into firstname, and characters 21 to 40 into pseudonym.
The template could have been:
1 lastname 11 firstname 21 pseudonym
instead of:
lastname 11 firstname 21 pseudonym
Specifying the 1 is optional.
Optionally, you can put an equal sign before a number in a template. An equal sign is the same as no sign before a number in a template. The number refers to a particular character position in the source string. These two templates work the same:
lastname 11 first 21 pseudonym
lastname =11 first =21 pseudonym
A relative positional pattern is a number with a plus (+) or minus (-) sign preceding it. (It can also be a variable within parentheses, with a plus (+) or minus (-) sign preceding the left parenthesis.) The number specifies the relative character position at which to split the source string. The plus or minus indicates movement right or left, respectively, from the start of the string (for the first pattern) or from the position of the last match. The position of the last match is the first character of the last match. Here is the same example as for absolute positional patterns done with relative positional patterns:
/* Parsing with relative positional patterns in template */
record.1='Clemens   Samuel    Mark Twain          '
record.2='Evans     Mary Ann  George Eliot        '
record.3='Munro     H.H.      Saki                '
do n=1 to 3
  parse var record.n lastname +10 firstname + 10 pseudonym
  if lastname='Evans' & firstname='Mary Ann' then say 'By George!'
end    /* same results */
Blanks between the sign and the number are insignificant. Therefore, +10 and + 10 have the same meaning. Note that +0 is a valid relative positional pattern.
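A relative positional pattern can also take its number from a variable in parentheses. The following is a minimal sketch in which width is a hypothetical variable holding the field width; STRIP is used only to tidy the displayed values.

```rexx
record = 'Evans     Mary Ann  George Eliot'
width = 10                         /* width of the first two fields */
parse var record lastname +(width) firstname +(width) pseudonym
say strip(lastname)',' strip(firstname)',' pseudonym
/* Displays: Evans, Mary Ann, George Eliot */
```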
Absolute and relative positional patterns are interchangeable (except in the special case when a string pattern precedes a variable name and a positional pattern follows the variable name). The templates from the examples of absolute and relative positional patterns give the same results:
lastname 11 firstname 21 pseudonym
lastname +10 firstname +10 pseudonym
Section by section:
lastname 11 or lastname +10: (Implied starting point is position 1.) Put characters 1 through 10 in lastname. (Non-inclusive stopping point is 11 (1+10).)
firstname 21 or firstname +10: Put characters 11 through 20 in firstname. (Non-inclusive stopping point is 21 (11+10).)
pseudonym: Put characters 21 through end of string in pseudonym.
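The special case mentioned above can be illustrated with a sketch. It relies on the rule that a relative positional pattern counts from the first character of the last match, so the variable between the string pattern and the relative pattern receives the matched data itself. The source string and variable names are assumptions for illustration.

```rexx
pair = 'key=value'                 /* the '=' is at position 4 */

/* String pattern, variable, relative pattern: match is kept */
parse var pair name '=' sep +1 val
say name sep val                   /* key = value */

/* String pattern, variable, absolute pattern: match is skipped */
parse var pair name2 '=' ch 6 val2
say name2 ch val2                  /* key v alue */
```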
Only with positional patterns can a matching operation back up to an earlier position in the source string. Here is an example using absolute positional patterns:
/* Backing up to an earlier position (with absolute positional) */
string='astronomers'
parse var string 2 var1 4 1 var2 2 4 var3 5 11 var4
say string 'study' var1||var2||var3||var4
/* Displays: "astronomers study stars" */
The absolute positional pattern 1 backs up to the first character in the source string.
With relative positional patterns, a number preceded by a minus sign backs up to an earlier position. Here is the same example using relative positional patterns:
/* Backing up to an earlier position (with relative positional) */
string='astronomers'
parse var string 2 var1 +2 -3 var2 +1 +2 var3 +1 +6 var4
say string 'study' var1||var2||var3||var4    /* same results */
In the previous example, the relative positional pattern -3 backs up to the first character in the source string.
The templates in the last two examples are equivalent.
You can use templates with positional patterns to make multiple assignments:
/* Making multiple assignments */
books='Silas Marner, Felix Holt, Daniel Deronda, Middlemarch'
parse var books 1 Eliot 1 Evans
/* Assigns the (entire) value of books to Eliot and to Evans. */
Combining Patterns and Parsing Into Words
What happens when a template contains patterns that divide the source string into sections containing multiple words? String and positional patterns divide the source string into substrings. The language processor then applies a section of the template to each substring, following the rules for parsing into words.
/* Combining string pattern and parsing into words */
name='   John      Q. Public'
parse var name fn init '.' ln    /* Assigns: fn='John'    */
                                 /*        init='     Q'  */
                                 /*          ln=' Public' */
The pattern divides the template into two sections:
fn init
ln
The matching pattern splits the source string into two substrings:
'   John      Q'
' Public'
The language processor parses these substrings into words based on the appropriate template section.
John has three leading blanks. All are removed because parsing into words removes leading and trailing blanks except from the last variable. Q has six leading blanks. Parsing removes one word-separator blank and keeps the rest because init is the last variable in that section of the template.
For the substring ' Public', parsing assigns the entire string into ln without removing any blanks. This is because ln is the only variable in this section of the template.
/* Combining positional patterns with parsing into words */
string='R E X X'
parse var string var1 var2 4 var3 6 var4    /* Assigns: var1='R'  */
                                            /*          var2='E'  */
                                            /*          var3=' X' */
                                            /*          var4=' X' */
The patterns divide the template into three sections:
var1 var2
var3
var4
The matching patterns split the source string into three substrings that are individually parsed into words:
'R E'
' X'
' X'
The variable var1 receives 'R'; var2 receives 'E'. Both var3 and var4 receive ' X' (with a blank before the X) because each is the only variable in its section of the template.
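As a further sketch of the two-step process, the following splits a timestamp with string patterns and then parses the final substring into words. The sample value and all variable names are hypothetical.

```rexx
stamp = '2024-03-15 10:30 UTC'
parse var stamp year '-' month '-' day time zone

/* The two '-' patterns produce the substrings '2024', '03', and */
/* '15 10:30 UTC'; the last one is parsed into words by the      */
/* template section: day time zone                               */
say year month day                 /* 2024 03 15 */
say time zone                      /* 10:30 UTC  */
```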