Two loops for the price of one awk?

Programming languages, Coding, Executables, Package Creation, and Scripting.
Ahtiga Saraz
Posts: 1014
Joined: 2009-06-15 01:19

Two loops for the price of one awk?

#1 Post by Ahtiga Saraz »

According to information I can find, awk has the following model:
  • input: a file consisting of lines all having the same form
and an awk program has the form
  • BEGIN by doing something
  • loop over the lines, doing the same things with each line
  • END by doing something
But it seems that awk would be much more useful if one could loop twice:
  • BEGIN by doing something
  • loop over the lines, doing the same things with each line
  • record a result in a variable
  • loop over the lines again, doing the same things to each line
  • END by doing something
Is it possible?

Put another way, the built in NF function in awk must do something to count up the number of fields. I want an SF function, where the fields are numeric and I sum them.
Ahtiga Saraz

Le peuple debout contre les tyrans! De l'audace, encore de l'audace, toujours l'audace!

drl
Posts: 427
Joined: 2006-09-20 02:27
Location: Saint Paul, Minnesota, USA

Re: Two loops for the price of one awk?

#2 Post by drl »

Hi.

I think of awk as the preeminent data-in-fields-processor. The form of an awk program is a series of statements:

Code: Select all

pattern { action }
where BEGIN and END are optional, special patterns that allow actions to be performed before any of the files is read, and after all files are read. The action uses a syntax very much like C. The input text data files can be of almost any form. I think early on I had an awk program that implemented much of nroff (not written by me).
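
To make that concrete, here is a minimal sketch using both optional patterns (the field counting is only an illustration):

Code: Select all

awk '
BEGIN  { print "starting" }               # runs before any input is read
NF > 0 { nfields += NF }                  # pattern { action } for each non-empty record
END    { print "fields seen:", nfields }  # runs after all files are read
' file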

The object NF is a variable maintained by awk, not a function.

One reason that I might prefer perl over awk is that perl can read non-text files, awk cannot.

Modern awk allows user functions so that you can do modular coding.
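
For instance, a sketch of a user-defined function along the lines asked about above; the name sumfields is only illustrative:

Code: Select all

awk '
# sumfields() adds up the numeric fields of the current record;
# the extra parameters i and s are the usual awk idiom for local variables
function sumfields(    i, s) {
        for (i = 1; i <= NF; i++)
                s += $i
        return s
}
{ print "line", NR, "sum:", sumfields() }
' file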

The article at http://en.wikipedia.org/wiki/AWK has a lot of information and references.

The site http://awk.info/ is for everything awk.

The http://www.unix.com forum has an amazing group of gifted awk coders.

The book by A.W.K. (Aho, Weinberger, and Kernighan) is still in print, but I have no idea why it is priced so high at $80 / $50 at Amazon; I think I paid about $20 for it long ago.

Best wishes ... cheers, drl
["Sure, I can help you with that." -- USBank voice recognition system.
( Mn, 2.6.x, AMD-64 3000+, ASUS A8V Deluxe, 3 GB, SATA + IDE, NV34 )
Debian Wiki | Packages | Backports | Netinstall

tukuyomi
Posts: 150
Joined: 2006-12-05 19:53

Re: Two loops for the price of one awk?

#3 Post by tukuyomi »

BEGIN by doing something
loop over the lines, doing the same things with each line
record a result in a variable
loop over the lines again, doing the same things to each line
END by doing something

Code: Select all

awk '
BEGIN{Do stuff}
NR==FNR{Do stuff for file, record in a variable if you want; next}
{Do stuff for file again}
END{Do stuff}
' file file
NR is incremented on each line; FNR is too, with the difference that FNR is reset at the start of each input file:
Say file has 10 lines. As it is read twice, FNR will go 1~10, then 1~10 again, while NR will go 1~20. NR==FNR is therefore only true during the first pass over the file, which explains the first loop (NR==FNR{...; next})
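
A quick made-up example of that two-pass idea, taking the first field as the value of interest (adapt as needed):

Code: Select all

awk '
NR==FNR { total += $1; next }                        # first pass: accumulate a grand total
        { printf "%s\t%.1f%%\n", $1, 100*$1/total }  # second pass: print each value as a share of it
' file file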

Telemachus
Posts: 4574
Joined: 2006-12-25 15:53
Been thanked: 2 times

Re: Two loops for the price of one awk?

#4 Post by Telemachus »

Ahtiga Saraz wrote:According to information I can find, awk has the following model:
  • input: a file consisting of lines all having the same form
and an awk program has the form
  • BEGIN by doing something
  • loop over the lines, doing the same things with each line
  • END by doing something
But it seems that awk would be much more useful if one could loop twice:
  • BEGIN by doing something
  • loop over the lines, doing the same things with each line
  • record a result in a variable
  • loop over the lines again, doing the same things to each line
  • END by doing something
Is it possible?
I'm worried this is an XY problem. Can you tell us what the real goal is? Ideally give a concrete example with a small amount of realistic data.
Ahtiga Saraz wrote:Put another way, the built in NF function in awk must do something to count up the number of fields. I want an SF function, where the fields are numeric and I sum them.
NF is not a function. It's a built-in variable that stores the number of fields for each line. The code that computes that value is somewhere in the awk interpreter (presumably written in C), and I don't think it's directly available to you as a user of awk. Having said that, summing columns or rows is not very hard in awk, and you can define your own functions as part of a larger awk script.
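
For instance (a quick sketch, not tailored to your data):

Code: Select all

awk '
{ rowsum = 0
  for (i = 1; i <= NF; i++) rowsum += $i   # sum across the current row
  colsum += $1                             # running sum down the first column
  print "row sum:", rowsum }
END { print "column 1 total:", colsum }
' file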

Again it would help if you told us more concretely what you're trying to do.
"We have not been faced with the need to satisfy someone else's requirements, and for this freedom we are grateful."
Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System

Ahtiga Saraz
Posts: 1014
Joined: 2009-06-15 01:19

Cowardly reluctance to address XY problem

#5 Post by Ahtiga Saraz »

Hi Telemachus, I sure am glad to see you!

Yes, an XY problem.

The origin of my project is that there are FOSSware items--- including a few "toys" in the Debian repos--- which claim to solve certain problems, but they don't work very well and have very limited utility even for "toy problems".

Since I know from experience solving such problems "by hand" that one can do much better than the FOSSware I have found on the web, I decided to try to develop my own set of scripts, each performing various specific small tasks, with the goal of eventually formulating an outline for a modular package, as a way of trying to give something back to the community (in the unlikely event I ever actually came up with anything useful). The general area involves text processing, but I am reluctant to say more in public.

As time and energy permit, I have been writing a few sample scripts to build my skills, generally by modifying an example I found in a book. For example, here is a script I wrote the other day which computes averages across lines in a file of numbers and which uses an associative array:

Code: Select all

#!/bin/bash
# Ahtiga Saraz; modified from an example in Dougherty and Robbins, sed and awk
# Input: a text file consisting of lines of numbers separated by spaces
# (where the lines can contain different numbers of fields)
# Output: average of each line, followed by average of averages
cat "$1" | mawk '
BEGIN { OFS = "\t" }
# Do this to each line
{
# compute line average
        total = 0
        for (i = 1; i <= NF; ++i)
                total += $i
        avg = total / NF
# assign average to element of an array for later reference
        line_avg[NR] = avg
# assign number of fields to an array for later reference
        numf[NR] = NF
# print number of observations and line average
        print "Fields: ", NF, "Line Average: ", avg
}
# Compute average of line averages
END {
        totnum = 0
        for (x = 1; x <= NR; x++)
                totnum += numf[x]
        cum = 0
        for (y = 1; y <= NR; y++)
                cum += numf[y]*line_avg[y]
        cavg = cum / totnum
        print "Total fields: ", totnum, "Cumulative Average: ", cavg
}'
For example, given as input the file

Code: Select all

10 11 12
1 2 3 4 5 6 7 8 9
100
this produces the output

Code: Select all

Fields:         3       Line Average:   11
Fields:         9       Line Average:   5
Fields:         1       Line Average:   100
Total fields:   13      Cumulative Average:     13.6923
(Because other scripts I use consist of long chains where pipes pass data on to another script, I usually use the cat file | awk ' stuff ' style of writing awk scripts. I mention this as an example because some regulars here are making me feel defensive about my alleged unwillingness to try to learn.)

The problems I am having so much trouble with are superficially similar: given a file like

Code: Select all

110 5 25
3 17
1 2 34 13
I want to replace each entry by the value of a function which depends upon that entry and also upon the other entries in the line. I'd settle for doing this for just one line, but it seems clear that if I can do it for one line I can do it for many, and that should increase flexibility. I see no way of doing this kind of task except with at least two loops run one after the other on each line: the first computes a quantity depending on all the values in the line, and the second computes the function value for each entry. Then do this for each line.
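
For the simplest case, suppose the function is just each entry divided by the line total. I can manage a rough sketch like the following (the division is only a stand-in for my real function), but I don't see how to generalize it cleanly:

Code: Select all

#!/bin/bash
# Rough sketch: replace each entry by its share of the line total
cat "$1" | mawk '
{
        tot = 0
        for (i = 1; i <= NF; i++)       # first loop: a quantity depending on the whole line
                tot += $i
        for (i = 1; i <= NF; i++)       # second loop: transform each entry using that quantity
                printf "%s%s", $i/tot, (i < NF ? OFS : ORS)
}'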

There must be a way to do this using defined functions and associative arrays. tukuyomi suggested something which sounds interesting but I couldn't quite figure out how to make it work.

While I am asking here about awk, and naturally prefer to do as much early development as possible using tools I understand better than Python, I am aware that Python is often used these days for text processing tasks and am not averse to learning to use it by and by. So far I have not found a Python book which offers examples which seem relevant. (Quite possibly because I don't yet know Python, or I would perhaps better recognize relevance.) The on-line book called something like "Dive Into Python" proved frustrating.
Ahtiga Saraz

Le peuple debout contre les tyrans! De l'audace, encore de l'audace, toujours l'audace!

tukuyomi
Posts: 150
Joined: 2006-12-05 19:53

Re: Cowardly reluctance to address XY problem

#6 Post by tukuyomi »

Ahtiga Saraz wrote: For example, given as input the file

Code: Select all

10 11 12
1 2 3 4 5 6 7 8 9
100
this produces the output

Code: Select all

Fields:         3       Line Average:   11
Fields:         9       Line Average:   5
Fields:         1       Line Average:   100
Total fields:   13      Cumulative Average:     13.6923
As an awk example:

Code: Select all

#!/bin/sh

awk '
{sum=0
for(i=1;i<=NF;i++)sum+=$i      # sum of the fields on this line
nf+=NF; cumul+=sum             # running totals over all lines
print "Fields:\t"NF"\tLine Average:\t"sum/NF
}
END{print "Total Fields:\t"nf"\tCumulative Avg:\t"cumul/nf}
' file

Ahtiga Saraz
Posts: 1014
Joined: 2009-06-15 01:19

I like your awk style

#7 Post by Ahtiga Saraz »

Nice!

Just to clarify: I was trying to use the weighted average as an example of the general kind of problem where I want to compute the values of a function which depends on all the entries in a line of data.
Ahtiga Saraz

Le peuple debout contre les tyrans! De l'audace, encore de l'audace, toujours l'audace!
