Hi Telemachus, I sure am glad to see you!
Yes, an XY problem.
The origin of my project is that there are FOSSware items (including a few "toys" in the Debian repos) which claim to solve certain problems, but which don't work very well and have very limited utility even for "toy problems".
Since I know from experience solving such problems "by hand" that one can do
much better than the FOSSware I have found on the web, I decided to try to develop my own set of scripts, each performing a specific small task, with the goal of eventually formulating an outline for a modular package, as a way of giving something back to the community (in the unlikely event I ever actually come up with anything useful). The general area involves text processing, but I am reluctant to say more in public.
As time and energy permit, I have been writing a few sample scripts to build my skills, generally by modifying an example I found in a book. For example, here is a script I wrote the other day which computes the average of each line in a file of numbers, using associative arrays:
Code: Select all
#!/bin/bash
# Ahtiga Saraz; modified from an example in Dougherty and Robbins, sed and awk
# Input: a text file consisting of lines of numbers separated by spaces
# (where the lines can contain different numbers of fields)
# Output: average of each line, followed by average of averages
cat "$1" | mawk '
BEGIN { OFS = "\t" }
# Do this to each line
{
    # compute line average
    total = 0
    for (i = 1; i <= NF; ++i)
        total += $i
    avg = total / NF
    # assign average to element of an array for later reference
    line_avg[NR] = avg
    # assign number of fields to an array for later reference
    numf[NR] = NF
    # print number of observations and line average
    print "Fields: ", NF, "Line Average: ", avg
}
# Compute average of line averages
END {
    totnum = 0
    for (x = 1; x <= NR; x++)
        totnum += numf[x]
    cum = 0
    for (y = 1; y <= NR; y++)
        cum += numf[y] * line_avg[y]
    cavg = cum / totnum
    print "Total fields: ", totnum, "Cumulative Average: ", cavg
}'
For example, given as input a suitable three-line file of numbers, this produces the output
Code: Select all
Fields: 3 Line Average: 11
Fields: 9 Line Average: 5
Fields: 1 Line Average: 100
Total fields: 13 Cumulative Average: 13.6923
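As a check, that last line is the field-weighted mean of the three line averages shown above, which can be verified with a one-liner:

```shell
# field-weighted mean of the line averages above:
# (3*11 + 9*5 + 1*100) / 13 = 178/13
awk 'BEGIN { printf "%.4f\n", (3*11 + 9*5 + 1*100) / 13 }'
# prints 13.6923
```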
(Because my other scripts often consist of long chains in which pipes pass data from one script to the next, I usually use the cat file | awk ' stuff ' style of writing awk scripts. I mention this because some regulars here are making me feel defensive about my alleged unwillingness to learn.)
The problems I am having so much trouble with are superficially similar: given a similar file of numbers, I want to replace each entry by the value of a function which depends on that entry and also on the other entries in the same line. I'd settle for doing this for just one line, but it seems clear that if I can do it for one line I can do it for many, which should increase flexibility. I see no way of doing this kind of task except with at least two loops run one after the other on each line: the first computes a quantity depending on all the values in the line, and the second computes the function value for each entry. Then repeat for each line.
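To make the two-loop idea concrete, here is a minimal sketch in the same pipe-into-awk style. The particular function (each entry divided by the line total) is only a stand-in for whatever the real per-entry function is:

```shell
# Sketch: replace each entry by f(entry, whole-line quantity).
# Stand-in f: entry divided by the line total (swap in the real function).
printf '1 2 3\n10 30 60\n' | awk '
{
    # first loop: a quantity depending on the whole line
    total = 0
    for (i = 1; i <= NF; i++)
        total += $i
    # second loop: overwrite each field with the function value
    for (i = 1; i <= NF; i++)
        $i = $i / total
    print
}'
# first output line: 0.166667 0.333333 0.5
```

Assigning to $i rebuilds the record with OFS, so print emits the transformed line directly.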
There must be a way to do this using defined functions and associative arrays. tukuyomi suggested something which sounds interesting, but I couldn't quite figure out how to make it work.
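In case it helps to see the shape such a solution might take, here is a hedged sketch combining both ingredients; f() is just a placeholder for the real function, and the linetotal[] array keeps each line's quantity around in case later processing needs it:

```shell
# Sketch: user-defined function plus an associative array of line totals.
# f() is a placeholder per-entry function; linetotal[] is indexed by NR.
printf '2 4 6\n' | awk '
function f(x, s) { return x / s }   # placeholder: entry over line total
{
    linetotal[NR] = 0
    for (i = 1; i <= NF; i++)
        linetotal[NR] += $i
    for (i = 1; i <= NF; i++)
        $i = f($i, linetotal[NR])
    print
}'
# prints: 0.166667 0.333333 0.5
```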
While I am asking here about awk, and naturally prefer to do as much early development as possible using tools I understand better than Python, I am aware that Python is often used these days for text-processing tasks, and I am not averse to learning it by and by. So far I have not found a Python book which offers examples that seem relevant (quite possibly because I don't yet know Python, or I would perhaps better recognize relevance). The on-line book called something like "Dive Into Python" proved frustrating.