Patterns and Actions
As you have already seen, each awk statement consists of a pattern with an associated action. This chapter describes how you build patterns and actions.
- Pattern Overview: What goes into a pattern.
- Action Overview: What goes into an action.
Pattern Elements
Patterns in awk control the execution of rules: a rule is executed when its pattern matches the current input record. This section explains all about how to write patterns.
- Kinds of Patterns: A list of all kinds of patterns.
- Regexp Patterns: Using regexps as patterns.
- Expression Patterns: Any expression can be used as a pattern.
- Ranges: Pairs of patterns specify record ranges.
- BEGIN/END: Specifying initialization and cleanup rules.
- Empty: The empty pattern, which matches every record.
Kinds of Patterns
Here is a summary of the types of patterns supported in awk.
/regular expression/
- A regular expression as a pattern. It matches when the text of the input record fits the regular expression. (See section Regular Expressions.)
expression
- A single expression. It matches when its value is non-zero (if a number) or non-null (if a string). (See section Expressions as Patterns.)
pat1, pat2
- A pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches pat1, and the final record that matches pat2. (See section Specifying Record Ranges with Patterns.)
BEGIN
END
- Special patterns for you to supply start-up or clean-up actions for your awk program. (See section The BEGIN and END Special Patterns.)
empty
- The empty pattern matches every input record. (See section The Empty Pattern.)
Regular Expressions as Patterns
We have been using regular expressions as patterns since our early examples. This kind of pattern is simply a regexp constant in the pattern part of a rule. Its meaning is `$0 ~ /pattern/'. The pattern matches when the input record matches the regexp. For example:
/foo|bar|baz/ { buzzwords++ }
END { print buzzwords, "buzzwords seen" }
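The equivalence with `$0 ~ /pattern/' can be checked directly. The following is a minimal sketch; the three input lines are made up for illustration and are not part of `BBS-list'.

```shell
# Hypothetical sample input (not from the manual's BBS-list file).
input='foo one
nothing here
a baz line'

# Bare regexp constant used as the pattern.
short=$(printf '%s\n' "$input" | awk '/foo|bar|baz/ { n++ } END { print n }')

# The same test written out explicitly against $0.
long=$(printf '%s\n' "$input" | awk '$0 ~ /foo|bar|baz/ { n++ } END { print n }')

echo "$short $long"
```

Both forms should count the same two records.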
Expressions as Patterns
Any awk expression is valid as an awk pattern. Then the pattern matches if the expression's value is non-zero (if a number) or non-null (if a string).
The expression is reevaluated each time the rule is tested against a new input record. If the expression uses fields such as $1, the value depends directly on the new input record's text; otherwise, it depends only on what has happened so far in the execution of the awk program, but that may still be useful.
A very common kind of expression used as a pattern is the comparison expression, using the comparison operators described in section Variable Typing and Comparison Expressions.
Regexp matching and non-matching are also very common expressions. The left operand of the `~' and `!~' operators is a string. The right operand is either a constant regular expression enclosed in slashes (/regexp/), or any expression, whose string value is used as a dynamic regular expression (see section Using Dynamic Regexps).
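As a small sketch of the dynamic case, the right operand below is a variable rather than a regexp constant. The variable name pat and the input lines are assumptions chosen for illustration; the variable is set with awk's standard -v option.

```shell
# pat is a hypothetical variable; its string value is used as the regexp.
out=$(printf 'foo 1\nbar 2\nfood 3\n' |
      awk -v pat='^foo$' '$1 ~ pat { print $2 }')

echo "$out"
```

Because the regexp is anchored at both ends, only the record whose first field is exactly `foo' should be selected.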
The following example prints the second field of each input record whose first field is precisely `foo'.
$ awk '$1 == "foo" { print $2 }' BBS-list
(There is no output, since there is no BBS site named "foo".) Contrast this with the following regular expression match, which would accept any record with a first field that contains `foo':
$ awk '$1 ~ /foo/ { print $2 }' BBS-list
-| 555-1234
-| 555-6699
-| 555-6480
-| 555-2127
Boolean expressions are also commonly used as patterns. Whether the pattern matches an input record depends on whether its subexpressions match.
For example, the following command prints all records in `BBS-list' that contain both `2400' and `foo'.
$ awk '/2400/ && /foo/' BBS-list
-| fooey 555-1234 2400/1200/300 B
The following command prints all records in `BBS-list' that contain either `2400' or `foo', or both.
$ awk '/2400/ || /foo/' BBS-list
-| alpo-net 555-3412 2400/1200/300 A
-| bites 555-1675 2400/1200/300 A
-| fooey 555-1234 2400/1200/300 B
-| foot 555-6699 1200/300 B
-| macfoo 555-6480 1200/300 A
-| sdace 555-3430 2400/1200/300 A
-| sabafoo 555-2127 1200/300 C
The following command prints all records in `BBS-list' that do not contain the string `foo'.
$ awk '! /foo/' BBS-list
-| aardvark 555-5553 1200/300 B
-| alpo-net 555-3412 2400/1200/300 A
-| barfly 555-7685 1200/300 A
-| bites 555-1675 2400/1200/300 A
-| camelot 555-0542 300 C
-| core 555-2912 1200/300 C
-| sdace 555-3430 2400/1200/300 A
The subexpressions of a boolean operator in a pattern can be constant regular expressions, comparisons, or any other awk expressions. Range patterns are not expressions, so they cannot appear inside boolean patterns. Likewise, the special patterns BEGIN and END, which never match any input record, are not expressions and cannot appear inside boolean patterns.
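To illustrate mixing kinds of subexpressions, here is a rough sketch that combines a regexp constant with a numeric comparison in one boolean pattern. The two-column input is invented for the example and is simpler than `BBS-list'.

```shell
# Hypothetical input: a name and a speed on each line.
out=$(printf 'fooey 2400\nfoot 1200\nbites 2400\n' |
      awk '/foo/ && $2 >= 2400 { print $1 }')

echo "$out"
```

Only `fooey' satisfies both halves of the pattern: `foot' fails the comparison and `bites' fails the regexp match.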
A regexp constant as a pattern is also a special case of an expression pattern. /foo/ as an expression has the value one if `foo' appears in the current input record; thus, as a pattern, /foo/ matches any record containing `foo'.
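The value of a regexp constant used as an expression can be inspected by assigning it to a variable. This is only a sketch; the variable name found and the two input lines are assumptions.

```shell
# found (a hypothetical name) receives 1 when the record contains "foo", else 0.
out=$(printf 'fooey\nbarfly\n' | awk '{ found = /foo/; print found, $0 }')

echo "$out"
```

The first record should be reported with the value 1 and the second with 0.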
Specifying Record Ranges with Patterns
A range pattern is made of two patterns separated by a comma, of the form `begpat, endpat'. It matches ranges of consecutive input records. The first pattern, begpat, controls where the range begins, and the second one, endpat, controls where it ends. For example,
awk '$1 == "on", $1 == "off"'
prints every record between `on'/`off' pairs, inclusive.
A range pattern starts out by matching begpat against every input record; when a record matches begpat, the range pattern becomes turned on. The range pattern matches this record. As long as it stays turned on, it automatically matches every input record read. It also matches endpat against every input record; when that succeeds, the range pattern is turned off again for the following record. Then it goes back to checking begpat against each record.
The record that turns on the range pattern and the one that turns it off both match the range pattern. If you don't want to operate on these records, you can write if statements in the rule's action to distinguish them from the records you are interested in.
It is possible for a range pattern to be turned both on and off by the same record, if the record satisfies both conditions. Then the action is executed for just that record.
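The on/off behavior described above can be traced with a short run. This is a sketch on made-up input, in which the second line satisfies both halves of the range at once.

```shell
# Hypothetical input; the line "on off" matches both /on/ and /off/.
out=$(printf 'a\non off\nb\non\nc\noff\nd\n' | awk '/on/, /off/')

echo "$out"
```

The line `on off' opens and closes a range by itself, so `b' is not printed; the later `on' through `off' lines form an ordinary three-record range.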
For example, suppose you have text between two identical markers (say the `%' symbol) that you wish to ignore. You might try to combine a range pattern that describes the delimited text with the next statement (not discussed yet, see section The next Statement), which causes awk to skip any further processing of the current record and start over again with the next input record. Such a program would look like this:
/^%$/,/^%$/ { next }
{ print }
This program fails because the range pattern is both turned on and turned off by the first line with just a `%' on it. To accomplish this task, you must write the program this way, using a flag:
/^%$/     { skip = ! skip; next }   # toggle the flag on each marker line
skip == 1 { next }                  # skip lines with `skip' set
          { print }                 # print everything else
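Here is a runnable sketch of the flag approach on a small made-up input. The final print rule is an assumption about the intended output, carried over from the failed range-pattern attempt.

```shell
# Hypothetical input: two lines to keep around a block delimited by "%".
out=$(printf 'keep 1\n%%\ndrop me\n%%\nkeep 2\n' | awk '
/^%$/     { skip = ! skip; next }   # toggle the flag on each marker line
skip == 1 { next }                  # inside a marked block: discard
          { print }                 # assumed final rule: print the rest
')

echo "$out"
```

Both marker lines and the text between them are dropped, leaving only the two `keep' lines.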
Note that in a range pattern, the `,' has the lowest precedence (is evaluated last) of all the operators. Thus, for example, the following program attempts to combine a range pattern with another, simpler test.
echo Yes | awk '/1/,/2/ || /Yes/'
The author of this program intended it to mean `(/1/,/2/) || /Yes/'. However, awk interprets this as `/1/, (/2/ || /Yes/)'. This cannot be changed or worked around; range patterns do not combine with other patterns.
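The effect of this parse can be observed in a brief sketch. The first command is the one from the text; the four-line input to the second command is made up for illustration.

```shell
# "Yes" never matches /1/, so the range never starts and nothing is printed.
out1=$(echo Yes | awk '/1/,/2/ || /Yes/')

# Here the range starts at "1" and is ended by "Yes" (the end pattern is /2/ || /Yes/).
out2=$(printf '1\nmiddle\nYes\nafter\n' | awk '/1/,/2/ || /Yes/')

echo "[$out1]"
echo "$out2"
```

The first run produces no output at all, and in the second run `Yes' acts as the end of the range rather than as an independent match.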
The BEGIN and END Special Patterns
BEGIN and END are special patterns. They are not used to match input records. Rather, they supply start-up or clean-up actions for your awk script.
- Using BEGIN/END: How and why to use BEGIN/END rules.
- I/O And BEGIN/END: I/O issues in BEGIN/END rules.
Startup and Cleanup Actions
A BEGIN rule is executed, once, before the first input record has been read. An END rule is executed, once, after all the input has been read. For example:
$ awk '
> BEGIN { print "Analysis of \"foo\"" }
> /foo/ { ++n }
> END { print "\"foo\" appears " n " times." }' BBS-list
-| Analysis of "foo"
-| "foo" appears 4 times.
This program finds the number of records in the input file `BBS-list' that contain the string `foo'. The BEGIN rule prints a title for the report. There is no need to use the BEGIN rule to initialize the counter n to zero, as awk does this automatically (see section Variables).
The second rule increments the variable n every time a record containing the pattern `foo' is read. The END rule prints the value of n at the end of the run.
The special patterns BEGIN and END cannot be used in ranges or with boolean operators (indeed, they cannot be used with any operators).
An awk program may have multiple BEGIN and/or END rules. They are executed in the order they appear, all the BEGIN rules at start-up and all the END rules at termination. BEGIN and END rules may be intermixed with other rules. This feature was added in the 1987 version of awk, and is included in the POSIX standard. The original (1978) version of awk required you to put the BEGIN rule at the beginning of the program, and the END rule at the end, and only allowed one of each. This is no longer required, but it is a good idea in terms of program organization and readability.
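The ordering can be seen in a small sketch where the rules are deliberately intermixed. The messages and the one-line input are made up for illustration.

```shell
out=$(printf 'x\n' | awk '
END   { print "end 1" }
BEGIN { print "begin 1" }
      { print "record", $0 }
BEGIN { print "begin 2" }
END   { print "end 2" }
')

echo "$out"
```

Both BEGIN rules run first, in the order they appear, then the ordinary rule runs for the single record, and finally both END rules run in order.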
Multiple BEGIN and END rules are useful for writing library functions, since each library file can have its own BEGIN and/or END rule to do its own initialization and/or cleanup. Note that the order in which library functions are named on the command line controls the order in which their BEGIN and END rules are executed. Therefore you have to be careful to write such rules in library files so that the order in which they are executed doesn't matter. See section Command Line Options, for more information on using library functions. See section A Library of awk Functions, for a number of useful library functions.
If an awk program only has a BEGIN rule, and no other rules, then the program exits after the BEGIN rule has been run. (The original version of awk used to keep reading and ignoring input until end of file was seen.) However, if an END rule exists, then the input will be read, even if there are no other rules in the program. This is necessary in case the END rule checks the FNR and NR variables (d.c.).
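A short sketch of both cases follows. The message text and the three-line input are assumptions; redirecting from /dev/null simply makes it plain that the first program has nothing to read.

```shell
# Only a BEGIN rule: the program runs it and exits without reading input.
begin_only=$(awk 'BEGIN { print "no input needed" }' </dev/null)

# Only an END rule: the input is still read, so NR holds the record count.
end_only=$(printf 'a\nb\nc\n' | awk 'END { print NR }')

echo "$begin_only"
echo "$end_only"
```

The second program has no rule that touches the records, yet NR still reflects that three of them were read.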
BEGIN and END rules must have actions; there is no default action for these rules since there is no current record when they run.
Input/Output from BEGIN and END Rules
There are several (sometimes subtle) issues involved when doing I/O from a BEGIN or END rule.
The first has to do with the value of $0 in a BEGIN rule. Since BEGIN rules are executed before any input is read, there simply is no input record, and therefore no fields, when executing BEGIN rules. References to $0 and the fields yield a null string or zero, depending upon the context. One way to give $0 a real value is to execute a getline command without a variable (see section Explicit Input with getline). Another way is to simply assign a value to it.
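The assignment approach might look like the following sketch. The string assigned to $0 is made up for illustration.

```shell
# Assigning to $0 inside BEGIN splits it into fields, just as for a record read from input.
out=$(awk 'BEGIN { $0 = "a b c"; print NF, $2 }')

echo "$out"
```

After the assignment, NF and the individual fields are available even though no input has been read.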
The second point is similar to the first, but from the other direction. Inside an END rule, what is the value of $0 and NF? Traditionally, due largely to implementation issues, $0 and NF were undefined inside an END rule. The POSIX standard specified that NF was available in an END rule, containing the number of fields from the last input record. Due most probably to an oversight, the standard does not say that $0 is also preserved, although logically one would think that it should be. In fact, gawk does preserve the value of $0 for use in END rules. Be aware, however, that Unix awk, and possibly other implementations, do not.
The third point follows from the first two. What is the meaning of `print' inside a BEGIN or END rule? The meaning is the same as always, `print $0'. If $0 is the null string, then this prints an empty line. Many long-time awk programmers use `print' in BEGIN and END rules, to mean `print ""', relying on $0 being null. While you might generally get away with this in BEGIN rules, in gawk at least, it is a very bad idea in END rules. It is also poor style, since if you want an empty line in the output, you should say so explicitly in your program.
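As a sketch of the explicit style, the END rule below asks for the empty line by name instead of relying on the contents of $0. The input and the report text are invented for the example.

```shell
# print "" always produces an empty line, whatever $0 happens to hold in END.
out=$(printf 'a\nb\n' | awk '{ n++ } END { print ""; print n, "records" }')

echo "$out"
```

The output starts with an empty line followed by the count, regardless of whether the implementation preserves $0 in END rules.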
The Empty Pattern
An empty (i.e. non-existent) pattern is considered to match every input record. For example, the program:
awk '{ print $1 }' BBS-list
prints the first field of every record.
Overview of Actions
An awk program or script consists of a series of rules and function definitions, interspersed. (Functions are described later. See section User-defined Functions.)
A rule contains a pattern and an action, either of which (but not both) may be omitted. The purpose of the action is to tell awk what to do once a match for the pattern is found. Thus, in outline, an awk program generally looks like this:
[pattern] [{ action }]
[pattern] [{ action }]
...
function name(args) { ... }
...
An action consists of one or more awk statements, enclosed in curly braces (`{' and `}'). Each statement specifies one thing to be done. The statements are separated by newlines or semicolons.
The curly braces around an action must be used even if the action contains only one statement, or even if it contains no statements at all. However, if you omit the action entirely, omit the curly braces as well. An omitted action is equivalent to `{ print $0 }'.
/foo/ { }   # match foo, do nothing - empty action
/foo/       # match foo, print the record - omitted action
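The difference between an empty action and an omitted one shows up as soon as the two rules are run. This is a minimal sketch on a two-line made-up input.

```shell
# Empty action: the record matches, but nothing is done with it.
empty=$(printf 'foo\nbar\n' | awk '/foo/ { }')

# Omitted action: equivalent to { print $0 }.
omitted=$(printf 'foo\nbar\n' | awk '/foo/')

echo "[$empty] [$omitted]"
```

The first command prints nothing, while the second prints the matching record.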
Here are the kinds of statements supported in awk:
- Expressions, which can call functions or assign values to variables (see section Expressions). Executing this kind of statement simply computes the value of the expression. This is useful when the expression has side effects (see section Assignment Expressions).
- Control statements, which specify the control flow of awk programs. The awk language gives you C-like constructs (if, for, while, and do) as well as a few special ones (see section Control Statements in Actions).
- Compound statements, which consist of one or more statements enclosed in curly braces. A compound statement is used in order to put several statements together in the body of an if, while, do or for statement.
- Input statements, using the getline command (see section Explicit Input with getline), the next statement (see section The next Statement), and the nextfile statement (see section The nextfile Statement).
- Output statements, print and printf. See section Printing Output.
- Deletion statements, for deleting array elements. See section The delete Statement.
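To make the list above more concrete, here is a rough sketch of a program that uses one statement of each kind. The variable names (total, negatives, seen) and the numeric input are assumptions made for illustration only.

```shell
out=$(printf '3\n-1\n4\n' | awk '
{
    total += $1                 # expression statement with a side effect
    if ($1 < 0) {               # control statement with a compound body
        negatives++
        next                    # input statement: go on to the next record
    }
    seen[$1] = 1                # remember each non-negative value
}
END {
    delete seen[3]              # deletion statement: remove one array element
    n = 0
    for (k in seen) n++         # count the elements that remain
    printf "total=%d negatives=%d left=%d\n", total, negatives, n   # output statement
}')

echo "$out"
```

The negative record is still added to the total before next abandons it, so only the two non-negative values reach the array, and one of those is deleted in the END rule.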
The next chapter covers control statements in detail.