1

I have several fairly large CSV files. For each column, I need to count the occurrences of each value and transpose the result so that each column header starts its own row.

Sample csv:

col1,col2,col3
enabled,disabled,active
disabled,disabled,enabled
N/A,enabled,active
enabled,N/A,disabled

Desired output:

col1,2 enabled,1 disabled,1 N/A
col2,1 enabled,2 disabled,1 N/A
col3,1 enabled,1 disabled,2 active

The actual CSV has many more columns and rows, so I would prefer something that iterates over the file automatically. I could probably come up with some hack job of an awk program to do one column at a time, but I would prefer to process the entire file at once and don't know where to start with that. The output doesn't need to be in the exact format I've included, but it should be at least similar.

2
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==1 { numRows = split($0,keys); next }
{
    for (i=1; i<=NF; i++) {
        sum[i,$i]++
        vals[$i]
    }
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        printf "%s", keys[rowNr]
        for (val in vals) {
            printf "%s%d %s", OFS, sum[rowNr,val], val
        }
        print ""
    }
}

$ awk -f tst.awk file
col1,1 disabled,2 enabled,1 N/A,0 active
col2,2 disabled,1 enabled,1 N/A,0 active
col3,1 disabled,1 enabled,0 N/A,2 active

or probably more usefully:

$ cat tst.awk
BEGIN { FS=OFS="," }
NR==1 { numRows = split($0,keys); next }
{
    for (i=1; i<=NF; i++) {
        sum[i,$i]++
        vals[$i]
    }
}
END {
    printf "%s", "key"
    for (val in vals) {
        printf "%s%s", OFS, val
    }
    print ""

    for (rowNr=1; rowNr<=numRows; rowNr++) {
        printf "%s", keys[rowNr]
        for (val in vals) {
            printf "%s%d", OFS, sum[rowNr,val]
        }
        print ""
    }
}

$ awk -f tst.awk file
key,disabled,enabled,N/A,active
col1,1,2,1,0
col2,2,1,1,0
col3,1,1,0,2
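
Note that the order in which `for (val in vals)` visits the values is unspecified in awk, so the column order shown above may differ between awk implementations. Below is a minimal sketch of one way to make it deterministic, assuming first-seen order is acceptable. The names `seen`, `order`, `numVals` and `numCols` are illustrative and not part of the script above, and the temporary file just recreates the sample input from the question.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
col1,col2,col3
enabled,disabled,active
disabled,disabled,enabled
N/A,enabled,active
enabled,N/A,disabled
EOF

result=$(awk '
BEGIN { FS=OFS="," }
NR==1 { numCols = split($0,keys); next }
{
    for (i=1; i<=NF; i++) {
        sum[i,$i]++
        # Remember each distinct value once, in the order it first appears.
        if (!($i in seen)) { seen[$i]; order[++numVals] = $i }
    }
}
END {
    # Header row: one output column per distinct value.
    printf "%s", "key"
    for (v=1; v<=numVals; v++) printf "%s%s", OFS, order[v]
    print ""

    # One output row per input column, counts in the same value order.
    for (c=1; c<=numCols; c++) {
        printf "%s", keys[c]
        for (v=1; v<=numVals; v++) printf "%s%d", OFS, sum[c,order[v]]
        print ""
    }
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With the sample input this should print the values in the order enabled, disabled, active, N/A, since that is the order in which they first appear when reading the file row by row.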
  • 1
    Works perfect, thanks! – jesse_b Feb 25 at 22:39
1

You can get pretty close with Miller:

mlr --icsvlite --odkvp put -q 'for(k,v in $*) { @count[k][v] += 1; } end {emit @count,"col"}' sample.csv
col=col1,enabled=2,disabled=1,N/A=1
col=col2,disabled=2,enabled=1,N/A=1
col=col3,active=2,enabled=1,disabled=1
