Workshop in Computational Bioskills - Lesson 1

Workshop in Computational Bioskills - Spring 2008

Lesson 1 - Unix Shell Programming

Part I - The basics of text manipulation in UNIX
Part II - Primitive motif search - E.Coli sigma70 sites
Part III - Statistics on E.Coli genes' length
Part IV - Whole genomes nucleotide composition
Part V - foreach & friends
Part VI - Writing a shell script
Part VII - Regular Expressions


Part I - The basics of text manipulation in UNIX

<201|0>bioskill:~> cal
   February 2005
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28
<202|0>bioskill:~> cal 2005

                            2005

      January               February               March
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
                   1         1  2  3  4  5         1  2  3  4  5
 2  3  4  5  6  7  8   6  7  8  9 10 11 12   6  7  8  9 10 11 12
 9 10 11 12 13 14 15  13 14 15 16 17 18 19  13 14 15 16 17 18 19
16 17 18 19 20 21 22  20 21 22 23 24 25 26  20 21 22 23 24 25 26
23 24 25 26 27 28 29  27 28                 27 28 29 30 31
30 31                                       
       April                  May                   June
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
                1  2   1  2  3  4  5  6  7            1  2  3  4
 3  4  5  6  7  8  9   8  9 10 11 12 13 14   5  6  7  8  9 10 11
10 11 12 13 14 15 16  15 16 17 18 19 20 21  12 13 14 15 16 17 18
17 18 19 20 21 22 23  22 23 24 25 26 27 28  19 20 21 22 23 24 25
24 25 26 27 28 29 30  29 30 31              26 27 28 29 30
                                            
        July                 August              September
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
                1  2      1  2  3  4  5  6               1  2  3
 3  4  5  6  7  8  9   7  8  9 10 11 12 13   4  5  6  7  8  9 10
10 11 12 13 14 15 16  14 15 16 17 18 19 20  11 12 13 14 15 16 17
17 18 19 20 21 22 23  21 22 23 24 25 26 27  18 19 20 21 22 23 24
24 25 26 27 28 29 30  28 29 30 31           25 26 27 28 29 30
31                                          
      October               November              December
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
                   1         1  2  3  4  5               1  2  3
 2  3  4  5  6  7  8   6  7  8  9 10 11 12   4  5  6  7  8  9 10
 9 10 11 12 13 14 15  13 14 15 16 17 18 19  11 12 13 14 15 16 17
16 17 18 19 20 21 22  20 21 22 23 24 25 26  18 19 20 21 22 23 24
23 24 25 26 27 28 29  27 28 29 30           25 26 27 28 29 30 31
30 31                                       

<204|0>bioskill:~> cal 2005 | cat -n
     1	                             2005
     2	
     3	      January               February               March
     4	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
     5	                   1         1  2  3  4  5         1  2  3  4  5
     6	 2  3  4  5  6  7  8   6  7  8  9 10 11 12   6  7  8  9 10 11 12
     7	 9 10 11 12 13 14 15  13 14 15 16 17 18 19  13 14 15 16 17 18 19
     8	16 17 18 19 20 21 22  20 21 22 23 24 25 26  20 21 22 23 24 25 26
     9	23 24 25 26 27 28 29  27 28                 27 28 29 30 31
    10	30 31                                       
    11	       April                  May                   June
    12	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    13	                1  2   1  2  3  4  5  6  7            1  2  3  4
    14	 3  4  5  6  7  8  9   8  9 10 11 12 13 14   5  6  7  8  9 10 11
    15	10 11 12 13 14 15 16  15 16 17 18 19 20 21  12 13 14 15 16 17 18
    16	17 18 19 20 21 22 23  22 23 24 25 26 27 28  19 20 21 22 23 24 25
    17	24 25 26 27 28 29 30  29 30 31              26 27 28 29 30
    18	                                            
    19	        July                 August              September
    20	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    21	                1  2      1  2  3  4  5  6               1  2  3
    22	 3  4  5  6  7  8  9   7  8  9 10 11 12 13   4  5  6  7  8  9 10
    23	10 11 12 13 14 15 16  14 15 16 17 18 19 20  11 12 13 14 15 16 17
    24	17 18 19 20 21 22 23  21 22 23 24 25 26 27  18 19 20 21 22 23 24
    25	24 25 26 27 28 29 30  28 29 30 31           25 26 27 28 29 30
    26	31                                          
    27	      October               November              December
    28	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    29	                   1         1  2  3  4  5               1  2  3
    30	 2  3  4  5  6  7  8   6  7  8  9 10 11 12   4  5  6  7  8  9 10
    31	 9 10 11 12 13 14 15  13 14 15 16 17 18 19  11 12 13 14 15 16 17
    32	16 17 18 19 20 21 22  20 21 22 23 24 25 26  18 19 20 21 22 23 24
    33	23 24 25 26 27 28 29  27 28 29 30           25 26 27 28 29 30 31
    34	30 31                                       
<205|0>bioskill:~> cal 2005 | head -10
                             2005

      January               February               March
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
                   1         1  2  3  4  5         1  2  3  4  5
 2  3  4  5  6  7  8   6  7  8  9 10 11 12   6  7  8  9 10 11 12
 9 10 11 12 13 14 15  13 14 15 16 17 18 19  13 14 15 16 17 18 19
16 17 18 19 20 21 22  20 21 22 23 24 25 26  20 21 22 23 24 25 26
23 24 25 26 27 28 29  27 28                 27 28 29 30 31
30 31              
<206|0>bioskill:~> cal 2005 | cat -n | head -10
     1	                             2005
     2	
     3	      January               February               March
     4	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
     5	                   1         1  2  3  4  5         1  2  3  4  5
     6	 2  3  4  5  6  7  8   6  7  8  9 10 11 12   6  7  8  9 10 11 12
     7	 9 10 11 12 13 14 15  13 14 15 16 17 18 19  13 14 15 16 17 18 19
     8	16 17 18 19 20 21 22  20 21 22 23 24 25 26  20 21 22 23 24 25 26
     9	23 24 25 26 27 28 29  27 28                 27 28 29 30 31
    10	30 31         
<207|0>bioskill:~> cal 2005 | cat -n | tail -10
    25	24 25 26 27 28 29 30  28 29 30 31           25 26 27 28 29 30
    26	31                                          
    27	      October               November              December
    28	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    29	                   1         1  2  3  4  5               1  2  3
    30	 2  3  4  5  6  7  8   6  7  8  9 10 11 12   4  5  6  7  8  9 10
    31	 9 10 11 12 13 14 15  13 14 15 16 17 18 19  11 12 13 14 15 16 17
    32	16 17 18 19 20 21 22  20 21 22 23 24 25 26  18 19 20 21 22 23 24
    33	23 24 25 26 27 28 29  27 28 29 30           25 26 27 28 29 30 31
    34	30 31                        
<208|0>bioskill:~> cal 2005 | cat -n | tail -n +20
    20	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    21	                1  2      1  2  3  4  5  6               1  2  3
    22	 3  4  5  6  7  8  9   7  8  9 10 11 12 13   4  5  6  7  8  9 10
    23	10 11 12 13 14 15 16  14 15 16 17 18 19 20  11 12 13 14 15 16 17
    24	17 18 19 20 21 22 23  21 22 23 24 25 26 27  18 19 20 21 22 23 24
    25	24 25 26 27 28 29 30  28 29 30 31           25 26 27 28 29 30
    26	31                                          
    27	      October               November              December
    28	Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    29	                   1         1  2  3  4  5               1  2  3
    30	 2  3  4  5  6  7  8   6  7  8  9 10 11 12   4  5  6  7  8  9 10
    31	 9 10 11 12 13 14 15  13 14 15 16 17 18 19  11 12 13 14 15 16 17
    32	16 17 18 19 20 21 22  20 21 22 23 24 25 26  18 19 20 21 22 23 24
    33	23 24 25 26 27 28 29  27 28 29 30           25 26 27 28 29 30 31
    34	30 31                 
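
head and tail can be chained to cut an arbitrary range of lines out of a stream. The following is a minimal sketch, written for sh/bash rather than tcsh; seq stands in for the numbered calendar so the result does not depend on the local cal layout, and the variable names are illustrative only.

```shell
# Extract lines 11 to 18 of a 34-line stream
# (the position of the second quarter in `cal 2005 | cat -n`).
first=11
last=18

# tail -n +N starts printing at line N; head -n K keeps the next K lines.
range=$(seq 1 34 | tail -n +"$first" | head -n "$((last - first + 1))")
echo "$range"
```

The same pattern applied to the calendar would be: cal 2005 | tail -n +11 | head -n 8
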
<203|0>bioskill:~> man grep 

!?! What is the meaning of the flags -i,-v,-A,-B in the grep command?

<203|0>bioskill:~> cal 2005 | grep 30

23 24 25 26 27 28 29  27 28                 27 28 29 30 31
30 31                                       
24 25 26 27 28 29 30  29 30 31              26 27 28 29 30
24 25 26 27 28 29 30  28 29 30 31           25 26 27 28 29 30
23 24 25 26 27 28 29  27 28 29 30           25 26 27 28 29 30 31
30 31     
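
The four grep flags from the question above can be tried on a small input where the expected lines are easy to check by eye. This is a sketch for sh/bash; the five-word sample and the variable names are made up for illustration.

```shell
sample=$(printf '%s\n' alpha Beta gamma delta epsilon)

# -i ignores case: the pattern 'beta' matches the line 'Beta'.
ignore_case=$(echo "$sample" | grep -i 'beta')

# -v inverts the match: keep only lines that do NOT contain 'a'.
inverted=$(echo "$sample" | grep -v 'a')

# -A 1 also prints one line After each match, -B 1 one line Before it.
after=$(echo "$sample" | grep -A 1 'gamma')
before=$(echo "$sample" | grep -B 1 'gamma')

echo "$ignore_case"; echo "$inverted"; echo "$after"; echo "$before"
```
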
<209|0>bioskill:~> cal 2005 | tr 'A-Z' 'a-z'
                             2005

      january               february               march
su mo tu we th fr sa  su mo tu we th fr sa  su mo tu we th fr sa
                   1         1  2  3  4  5         1  2  3  4  5
 2  3  4  5  6  7  8   6  7  8  9 10 11 12   6  7  8  9 10 11 12
 9 10 11 12 13 14 15  13 14 15 16 17 18 19  13 14 15 16 17 18 19
16 17 18 19 20 21 22  20 21 22 23 24 25 26  20 21 22 23 24 25 26
23 24 25 26 27 28 29  27 28                 27 28 29 30 31
30 31                                       
       april                  may                   june
su mo tu we th fr sa  su mo tu we th fr sa  su mo tu we th fr sa
                1  2   1  2  3  4  5  6  7            1  2  3  4
 3  4  5  6  7  8  9   8  9 10 11 12 13 14   5  6  7  8  9 10 11
10 11 12 13 14 15 16  15 16 17 18 19 20 21  12 13 14 15 16 17 18
17 18 19 20 21 22 23  22 23 24 25 26 27 28  19 20 21 22 23 24 25
24 25 26 27 28 29 30  29 30 31              26 27 28 29 30
                                            
        july                 august              september
su mo tu we th fr sa  su mo tu we th fr sa  su mo tu we th fr sa
                1  2      1  2  3  4  5  6               1  2  3
 3  4  5  6  7  8  9   7  8  9 10 11 12 13   4  5  6  7  8  9 10
10 11 12 13 14 15 16  14 15 16 17 18 19 20  11 12 13 14 15 16 17
17 18 19 20 21 22 23  21 22 23 24 25 26 27  18 19 20 21 22 23 24
24 25 26 27 28 29 30  28 29 30 31           25 26 27 28 29 30
31                                          
      october               november              december
su mo tu we th fr sa  su mo tu we th fr sa  su mo tu we th fr sa
                   1         1  2  3  4  5               1  2  3
 2  3  4  5  6  7  8   6  7  8  9 10 11 12   4  5  6  7  8  9 10
 9 10 11 12 13 14 15  13 14 15 16 17 18 19  11 12 13 14 15 16 17
16 17 18 19 20 21 22  20 21 22 23 24 25 26  18 19 20 21 22 23 24
23 24 25 26 27 28 29  27 28 29 30           25 26 27 28 29 30 31
30 31                                       
<210|0>bioskill:~> cal | sed 's/february/bioskill/'
   February 2005
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28
<211|0>bioskill:~> cal | sed 's/february/bioskill/i'
   bioskill 2005
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28
!?! What does the g flag of the sed s command do (as in 's/old/new/g')?
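
A quick way to see the effect of g is to run the same substitution with and without it on one line. A small sketch for sh/bash; the variable names are illustrative.

```shell
line='Su Mo Tu We Th Fr Sa'

# Without g, only the first match on each line is replaced.
first_only=$(echo "$line" | sed 's/ /_/')

# With g, every match on the line is replaced.
every=$(echo "$line" | sed 's/ /_/g')

echo "$first_only"
echo "$every"
```
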
<212|0>bioskill:~> cal | fold -w 7
   Febr
uary 20
05
Su Mo T
u We Th
 Fr Sa
       
1  2  3
  4  5
 6  7  
8  9 10
 11 12
13 14 1
5 16 17
 18 19
20 21 2
2 23 24
 25 26
27 28
!?! Instead of using cat to view a file, try more or less (see their man pages)

Part II - Primitive motif search - E.Coli sigma70 sites

<201|0>bioskill:~> cd Data/EColi/Promoters/

<202|0>bioskill:Promoters> cat 50-TSS.tfa | head -10

>ORF83.1
GCATAAAAAACTCTGCTGGCATTCACAAATGCGCAGGGGTAAAACGTTTC
>ORF83.2
AATGCGCAGGGGTAAAACGTTTCCTGTAGCACCGTGAGTTATACTTTGTA
>accA
TATGCTCGCGGGCTTGCTATCTCGCTGACGGACAGGCAAATTGATGACCA
>accB
TTACATGTTAGCTGTTGATTATCTTCCCTGATAAGACCAGTATTTAGCTG
>accD
TCGACCACTTTTTTATCCAAAGTTTCGGGCTGTTATGTTTTAATGTGCAA
!?! Do you know what the FASTA format is?

<203|141>bioskill:Promoters> cat 50-TSS.tfa | FASTA2line.pl | head -10

>ORF83.1        GCATAAAAAACTCTGCTGGCATTCACAAATGCGCAGGGGTAAAACGTTTC
>ORF83.2        AATGCGCAGGGGTAAAACGTTTCCTGTAGCACCGTGAGTTATACTTTGTA
>accA   TATGCTCGCGGGCTTGCTATCTCGCTGACGGACAGGCAAATTGATGACCA
>accB   TTACATGTTAGCTGTTGATTATCTTCCCTGATAAGACCAGTATTTAGCTG
>accD   TCGACCACTTTTTTATCCAAAGTTTCGGGCTGTTATGTTTTAATGTGCAA
>aceBAK TAAAATGGAAATTGTTTTTGATTTTGCATTTTAAATGAGTAGTCTTAGTT
>aceE   ACTAAACGTAGAACCTGTCTTATTGAGCTTTCCGGCGAGAGTTCAATGGG
>acnAP1 CGTTATTCCAGACGACTGGCAACTAACATCGCAGCAGCAAGCCTTTATAG
>acnAP2 ATTTGGGTTGTTATCAAATCGTTACGCGATGTTTGTGTTATCTTTAATAT
>acnB   TTTTTTGTAAACAGATTAACACCTCGTCAAAATCCTGCTATTCTGCCCGT
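
FASTA2line.pl and FASTAfromline.pl are course scripts whose source is not shown here. Judging only from the output above, the first puts each record on a single line (name, tab, sequence) and the second reverses that. The awk functions below are a rough stand-in under that assumption, not the course implementation; the function names are hypothetical.

```shell
# FASTA -> one record per line: header, TAB, sequence joined across lines.
fasta_to_line() {
    awk '
        /^>/ { if (name != "") print name "\t" s; name = $0; s = ""; next }
             { s = s $0 }
        END  { if (name != "") print name "\t" s }
    '
}

# One record per line -> FASTA (the whole sequence on a single line).
line_to_fasta() {
    awk -F '\t' '{ print $1; print $2 }'
}

one_line=$(printf '>g1\nACGT\nTTAA\n>g2\nGGCC\n' | fasta_to_line)
echo "$one_line"
echo "$one_line" | line_to_fasta
```

With one record per line, line-oriented tools such as grep, sort and wc act on whole sequences instead of on fragments of them.
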

<204|141>bioskill:Promoters> cat 50-TSS.tfa | FASTA2line.pl | grep -v ORF | head -10

>accA   TATGCTCGCGGGCTTGCTATCTCGCTGACGGACAGGCAAATTGATGACCA
>accB   TTACATGTTAGCTGTTGATTATCTTCCCTGATAAGACCAGTATTTAGCTG
>accD   TCGACCACTTTTTTATCCAAAGTTTCGGGCTGTTATGTTTTAATGTGCAA
>aceBAK TAAAATGGAAATTGTTTTTGATTTTGCATTTTAAATGAGTAGTCTTAGTT
>aceE   ACTAAACGTAGAACCTGTCTTATTGAGCTTTCCGGCGAGAGTTCAATGGG
>acnAP1 CGTTATTCCAGACGACTGGCAACTAACATCGCAGCAGCAAGCCTTTATAG
>acnAP2 ATTTGGGTTGTTATCAAATCGTTACGCGATGTTTGTGTTATCTTTAATAT
>acnB   TTTTTTGTAAACAGATTAACACCTCGTCAAAATCCTGCTATTCTGCCCGT
>ada    GCGCAAGATTGTTGGTTTTTGCGTGATGGTGACCGGGCAGCCTAAAGGCT
>adhE   GTTGTGCAAAACATGCTAATGTAGCCACCAAATCATACTACAATTTATTA

<205|141>bioskill:Promoters> cat 50-TSS.tfa | FASTA2line.pl | grep -v ORF | FASTAfromline.pl > 50-TSS.no-ORF.tfa

<206|141>bioskill:Promoters> head -7 50-TSS.no-ORF.tfa

>accA
TATGCTCGCGGGCTTGCTATCTCGCTGACGGACAGGCAAATTGATGACCA
>accB
TTACATGTTAGCTGTTGATTATCTTCCCTGATAAGACCAGTATTTAGCTG
>accD
TCGACCACTTTTTTATCCAAAGTTTCGGGCTGTTATGTTTTAATGTGCAA
>aceBAK

<207|0>bioskill:Promoters> FASTA2line.pl 50-TSS.tfa | wc -l

    447

<208|0>bioskill:Promoters> FASTA2line.pl 50-TSS.tfa | grep TATAAT | wc -l

     25

<209|1>bioskill:Promoters> FASTA2line.pl 50-TSS.tfa | grep TTGACA | wc -l

     18

<210|0>bioskill:Promoters> FASTA2line.pl 50-TSS.tfa | grep TATAAT | grep TTGACA | wc -l

      0
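
TATAAT and TTGACA are the consensus -10 and -35 hexamers of sigma70 promoters. The counting pattern used above can be checked on a toy input where the answers are known in advance. A sketch for sh/bash: the four sequences are invented, grep -c replaces grep | wc -l, and grep -E adds an "either motif" count that is not part of the session above.

```shell
# Four made-up promoters in the one-line format (name, TAB, sequence).
promoters=$(printf '%s\t%s\n' \
    '>p1' 'AATTGACAGGCTATAATCC' \
    '>p2' 'CCTATAATGGGCCC' \
    '>p3' 'GGTTGACACCCGGG' \
    '>p4' 'ACGTACGTACGT')

# grep -c prints the number of matching lines.
n_minus10=$(echo "$promoters" | grep -c 'TATAAT')
n_minus35=$(echo "$promoters" | grep -c 'TTGACA')

# Chaining two greps keeps only lines that contain both motifs.
n_both=$(echo "$promoters" | grep 'TATAAT' | grep -c 'TTGACA')

# -E with | keeps lines that contain at least one of them.
n_either=$(echo "$promoters" | grep -c -E 'TATAAT|TTGACA')

echo "$n_minus10 $n_minus35 $n_both $n_either"
```
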

Part III - Statistics on E.Coli genes' length

<201|0>bioskill:~> cd Data/EColi/

<202|0>bioskill:EColi> head -10 Colibri_Gene.list

Colibri name      Gene length     SWISS-PROT      Location (kb)   Description
aarF    1641    P27854  4017.80 Regulator of 2'-N-acetyltransferase; involved in respiratory cofactor ubiquinone production
aas     2160    P31119  2974.00 2-Acyl-glycerophosphoethanolamine acyltransferase; acyl-ACP synthetase; salvage pathway for reacylation; inner membrane; bifunctional for turnover/incorporation
aat     705     P23885   926.70 Aminoacyl-tRNA-protein-transferase
abc     1032    P30750   222.60 ABC protein family homolog
abgA    1311    P77357  1402.60 Gene in putative abgABT operon; function unknown
abgB    1446    P76052  1401.30 Gene in putative abgABT operon; function unknown
abgR    909     P77744  1402.80 Putative regulator of abgABT operon
abgT    1533    P46133  1399.80 Para-aminobenzoyl-glutamate utilization; cryptic gene
abrB    1047    P75747   747.00 Possible regulator of aidB expression
!?! Are you familiar with the awk pattern scanning and processing language?
awk reads its input line by line and tests each line against a set of patterns. Each pattern can have an associated action, which runs on every line that matches. A statement has the form: (pattern){ action } where the parentheses around the pattern are optional.
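
The three shapes a statement can take (pattern only, pattern with action, action only) can be seen on a tiny tab-separated table. This is a sketch for sh/bash; the table reuses three gene lengths from the listing above, and the variable names are illustrative.

```shell
genes=$(printf '%s\t%s\n' 'name' 'length' 'aat' '705' 'abgR' '909' 'aas' '2160')

# Pattern only: the default action prints the whole matching line.
long_genes=$(echo "$genes" | awk -F '\t' 'FNR > 1 && $2 > 1000')

# Pattern with action: print the first field of each matching line.
short_names=$(echo "$genes" | awk -F '\t' '(FNR > 1 && $2 < 1000) { print $1 }')

# Action only: runs on every line; NF is the number of fields on that line.
n_fields=$(echo "$genes" | awk -F '\t' '{ print NF }' | sort -u)

echo "$long_genes"; echo "$short_names"; echo "$n_fields"
```
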

<203|141>bioskill:EColi> cat Colibri_Gene.list | awk -F '\t' '{print $2}' | head -10

Gene length
1641
2160
705
1032
1311
1446
909
1533
1047

<204|141>bioskill:EColi> cat Colibri_Gene.list | awk -F'\t' '(FNR>1){print $2}' | head -10

1641
2160
705
1032
1311
1446
909
1533
1047
960
!?! What will be the result of: awk -F'\t' '(FNR>1){print $0}' and awk -F'\t' '(FNR>1)'

<205|141>bioskill:EColi> cat Colibri_Gene.list | awk -F'\t' '(FNR>1){print NF}'|uniq

5
!?! What will be the result of the same command when omitting the -F'\t'?

<205|141>bioskill:EColi> cat Colibri_Gene.list | awk -F'\t' '(FNR>1){sum+=$2;n++}END{print sum/n}'

948.826

<206|141>bioskill:EColi> tail -n +2 Colibri_Gene.list | cut -f 2 | stats.pl

Average : 948.826452064381
Sum     : 4067619
Sum Sqrs: 5631951129
Variance: 413552.642295925
Std dev.: 643.080587715043
N       : 4287
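
stats.pl is a course script whose source is not shown. The figures it reports appear consistent with a sample variance, that is (Sum Sqrs - Sum^2/N) / (N - 1). The awk function below is a sketch of the same summary under that assumption; the function name and the four-decimal output format are choices made here, not taken from stats.pl.

```shell
# Mean, sample variance (N-1 denominator) and standard deviation
# of a column of numbers read from standard input.
stats() {
    awk '{ sum += $1; sumsq += $1 * $1; n++ }
         END {
             if (n < 2) { print "need at least two values"; exit 1 }
             mean = sum / n
             var  = (sumsq - sum * sum / n) / (n - 1)
             printf "Average : %.4f\n", mean
             printf "Variance: %.4f\n", var
             printf "Std dev.: %.4f\n", sqrt(var)
             printf "N       : %d\n", n
         }'
}

result=$(printf '%s\n' 1 2 3 4 5 | stats)
echo "$result"
```

On the gene list it would be used in place of stats.pl: tail -n +2 Colibri_Gene.list | cut -f 2 | stats
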

<207|0>bioskill:EColi> tail -n +2 Colibri_Gene.list | awk '{a[int($2/100)]++}END{for (i in a){print 100*i,a[i]}}' | sort -g > tmp

<208|0>bioskill:EColi> gnuplot

gnuplot> plot 'tmp' with lines
gnuplot> exit
!?! We will learn more about gnuplot and other graphical tools in Lesson 10
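
The file tmp plotted above is a histogram: int($2/100) maps each length to a 100-bp bin, the array counts how many genes fall in each bin, and the sort is needed because awk's for (i in a) visits keys in no particular order. A sketch of the binning step alone, for sh/bash, using nine lengths from the listing above; sort -n is used here in place of sort -g, and the variable names are illustrative.

```shell
lengths=$(printf '%s\n' 705 909 1032 1047 1311 1446 1533 1641 2160)

# Count values per 100-wide bin, print "bin_start count", sort by bin start.
histogram=$(echo "$lengths" |
    awk '{ bins[int($1 / 100)]++ }
         END { for (b in bins) print 100 * b, bins[b] }' |
    sort -n)

echo "$histogram"
```

Only the 1000 bin holds two genes here (1032 and 1047); every other occupied bin holds one.
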

Part IV - Whole genomes nucleotide composition
Can we find the nucleotide composition by using simple shell commands?

<201|0>bioskill:~> cd Data/Bacteria

<202|0>bioskill:Bacteria> ls -l

total 1954
-rw-r--r--   1 bioskill users      923807 Mar 10 19:48 Borrelia_Burgdorferi.tfa
-rw-r--r--   1 bioskill users      828126 Mar 10 19:44 Mycoplasma_Pneumonia.tfa

<203|0>bioskill:Bacteria> grep -v '^>' Borrelia_Burgdorferi.tfa | head -10

TAAATATAATTTAATAGTATAAAAAAAATTAAATCAAATTAATAATAGTTTAAAAAACTGTTTGTATAAT
ATAATATTATTATATATAATATTAAGCAACTACTATGATACTAATGAAGTATAGTGCTATTTTATTAATA
TGTAGCGTTAATTTATTTTGTTTTCAAAATAAATTAACTACTTCTCGATGGGAATTCCCTAAAGAAGATT
TAATTAAAAAAAAAATAAAAATAGGCATAATTTACCATAATTACATAAATTCTATCTTTTACAATGAAAA
TTATAAATACATTGCCTTTATCGGAATATTGACATCTTATAATGAATGGATTGAAATACAATTTAGCCCC
ATAAATTTTTTTACTATCCCAACAAATAAAGATTTTATTTCAAATACTTATTTCAATTTAGCTTTCACTA
TTTACATTACCAAGTATTCAATTTTAACTGATACACTTGCTATAAAATTTTTTATTGGAACCCAAATCGA
TTTAACTCTGAGAACTACTATATTTACAGGAAAAACAACTCATGCATTTCTCTATCCAATTCTTCCCATA
ATTACCTTCAAATTTGAAATTGATTTCATACCTAATAACTATAGTATTTACTATAAATTATCGACTTCTT
TTAAAGAATTTATCCTTTTAGATCTAGGAATTTCTATATTTATATAATCCTTTTTTTATTATAGAACTTT

<204|0>bioskill:Bacteria> grep -v '^>' Borrelia_Burgdorferi.tfa | tr 'acgt' 'ACGT' | tr -cd 'ACGT' | fold -w 1 | sort | uniq -c

323079 A
130760 C
129646 G
327196 T

<205|0>bioskill:Bacteria> grep -v '^>' Mycoplasma_Pneumonia.tfa | tr 'acgt' 'ACGT' | tr -cd 'ACGT' | fold -w 1 | sort | uniq -c

249211 A
162920 C
163703 G
240560 T

<205|0>bioskill:Bacteria> grep -v '^>' Mycoplasma_Pneumonia.tfa | fold -w 1 | awk '{a[$1]++}END{for (i in a){print i,a[i]}}'

A 249211
C 162920
G 163703
T 240560

<206|0>bioskill:Bacteria> cd ../Virus

<207|0>bioskill:Virus> ls -l

total 242
-rw-r--r--   1 bioskill users      235788 Mar 10 19:59 Amsacta_Moorei_Entomopoxvirus.tfa
-rw-r--r--   1 bioskill users       10589 Mar 10 19:48 Human_Immunodeficiency_Virus.tfa

<208|0>bioskill:Virus> grep -v '^>' Amsacta_Moorei_Entomopoxvirus.tfa | head -10

ATTTTTTTAAAATGAAAAAAAAAAATATCATAACTACTAACTATGGATTTACCTATAGAAATTTTAGAAA
TTATATTTAATTATACAGATACATACATAAAATTATAATTTATATATTTAAAATATTTAGAATTTATTGA
AAATTAGTAAAATTAGATTGTTCTAAAACATATATTGATTCTCTAAAAGGAATACATTATCTTACTAATT
TACAAAAATTAATTCTTTAAAAGAAATATGTTGCCTTAATAATATTAAAAAAATAAATTGTTCATATACA
ATCATTGATTCTCTAAAAGGAATAAGTCTTAATAATTTAGAAGAATTATATTGTTATAATATAAAAATTT
ATTCTTTAAATATAATAATAAAAAATCTGCTTATTAAAAATATTAAATGGTTATAAATACATAAATTAAT
TATTTTATATAAATTATTGTTAAACATTTATATTAATATTCTAATATTAAAAATTGAAAAAAAAAATAAT
TATGTTAAAATGGAGTTACCTGTAGAAATGTTAGAAATTATATTTAATTATTTAGATAATGATACTAAAT
TACAATTTATAGATTCAAAATGTATTATATCAAAACTTATATATAAATTAAAATATAATTCTTGTTTAAA
AGAAATAAAGAATTTTATTAATTTAAAAGAATTAATATATAATAATTATTATATAAAATCTTTAGAAGGT

<209|0>bioskill:Virus> grep -v '^>' Amsacta_Moorei_Entomopoxvirus.tfa | tr 'acgt' 'ACGT' | tr -cd 'ACGT' | fold -w 1 | awk '{a[$1]++}END{for (i in a){print i,a[i]}}'

A 94121
C 20868
G 20454
T 96949

<210|0>bioskill:Virus> grep -v '^>' Human_Immunodeficiency_Virus.tfa | tr 'acgt' 'ACGT' | tr -cd 'ACGT' | fold -w 1 | awk '{a[$1]++}END{for (i in a){print i,a[i]}}'

A 3506
C 2132
G 2598
T 2123
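
The four counts are usually summarised as a G+C percentage. The function below is a sketch for sh/bash that extends the pipeline used above by one awk step; the function name and the two-decimal output are choices made here.

```shell
# Percentage of G and C among the A/C/G/T characters of a FASTA stream.
gc_content() {
    grep -v '^>' | tr 'acgt' 'ACGT' | tr -cd 'ACGT' | fold -w 1 |
        awk '{ n[$1]++; total++ }
             END { if (total > 0) printf "%.2f\n", 100 * (n["G"] + n["C"]) / total }'
}

# Toy record: ACGTAC + ggtt has 3 G and 2 C among 10 bases.
gc=$(printf '>toy\nACGTAC\nggtt\n' | gc_content)
echo "$gc"
```

Applying the same arithmetic to the Borrelia counts above gives (130760 + 129646) / 910681, about 28.6% G+C.
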

Text:    cat, sort, uniq, grep, echo
         head, tail, wc
         tr, cut, fold
         awk, sed
         paste, join
Web:     lynx, wget
Math:    bc
Control: if, foreach, while
         jobs, kill, ctrl-D, ctrl-S, ctrl-Q, bg, fg
         [or in general: tcsh manual, (ba)sh manual]
Files:   File redirections (<, >, >>, >&, >!, >&!), tee
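
A short sketch of the redirections and tee from the list, written for sh/bash. The tcsh forms differ: >& sends both output streams to a file and >! overrides noclobber, while sh/bash spell the former as > file 2>&1. The temporary directory and file names are illustrative.

```shell
workdir=$(mktemp -d)

echo 'first'  >  "$workdir/out.txt"    # > creates or overwrites the file
echo 'second' >> "$workdir/out.txt"    # >> appends to it

# tee copies its input to a file and also passes it down the pipe.
count=$(printf '%s\n' a b c | tee "$workdir/copy.txt" | wc -l | tr -d ' ')

# 2>&1 sends error messages to the same place as normal output.
ls "$workdir/no-such-file" > "$workdir/err.txt" 2>&1 || true

# < feeds a file to a command as its standard input.
n_out=$(wc -l < "$workdir/out.txt" | tr -d ' ')

echo "$count $n_out"
```
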