Workshop in Computational Bioskills - Lesson 3 - Spring 2011

Part 1 - Control flow
Part 2 - Basic I/O
Part 3 - File Handles (Opening files for read/write)
Part 4 - File Tests
Part 5 - Data Manipulation
Part 6 - Functions
Part 7 - More Functions (Functions on Strings)

Control Flow:
(See perldoc perlsyn)

Basic I/O:

- To read one line from the STDIN into a scalar
$line = <STDIN>;

- To read the entire STDIN into an array (each line in a cell of its own)
@lines = <STDIN>;

- Reading STDIN, a line at a time (into $_)
while (defined($_ = <STDIN>)) {
...
}

- Same thing, only nicer...
while (<STDIN>) {
...
}

The <> operator:
Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames.
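
A minimal sketch of <> in action. To keep it self-contained, it first writes a small temporary file (using the standard File::Temp module) and then places its name in @ARGV, as if it had been given on the command line. The file contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "first line\nsecond line\nthird line\n";
close($tmp_fh);

# Pretend the file was named on the command line.
@ARGV = ($tmp_name);

my $count = 0;
while (<>) {        # reads every file listed in @ARGV, one line at a time, into $_
    $count++;
}
print "Read $count lines\n";
```

In a real script you would leave @ARGV alone: run it with file names to read those files, or with no arguments to read STDIN.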

Output:
Nothing new to say here.

- You can use print to write a list of values to a file handle. (STDOUT is the default one.)
- printf does the same, but formats the values according to a format string given as its first argument.
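
A short sketch of the difference between print and printf. The variable names and values are made up for illustration.

```perl
use strict;
use warnings;

# Made-up values, for illustration only.
my $name   = "geneA";
my $length = 1200;
my $gc     = 0.4567;

# print writes its arguments as they are.
print "$name $length $gc\n";

# printf takes a format first: %-10s is a left-justified string padded to
# 10 characters, %8d an integer right-justified in 8, %.2f a number with
# 2 decimals.
printf("%-10s %8d %.2f\n", $name, $length, $gc);

# sprintf builds the same string without printing it.
my $formatted = sprintf("%-10s %8d %.2f", $name, $length, $gc);
```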

File Handles:

Reading from STDIN is always a pleasure, but sometimes we'll want to work
directly with files.

So how do we open files:
(Why don't you read for yourselves ?! Try perldoc -f open)

- For reading
open (IN, "$file"); # will open $file for reading
open (IN, "<$file"); # will do the same.
open (IN, "-"); # will open STDIN

- For writing
open (OUT, ">$outfile"); # $outfile will be truncated and opened for writing.
open (OUT, ">>$outfile"); # $outfile will be appended.
(In both cases, $outfile will be created unless it already exists.)
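
open returns false when it fails (for example, if the file does not exist or is not readable), so it is worth checking the result. The sketch below does that with "or die", and it uses the three-argument form of open with lexical file handles, a style not used elsewhere in this lesson but standard Perl. The file name is hypothetical and lives in a temporary directory so the sketch can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $outfile = "$dir/example.txt";    # hypothetical file name

# ">" truncates (or creates) the file; $! holds the system error message.
open(my $out, ">", $outfile) or die "Cannot open $outfile for writing: $!";
print $out "line 1\n";
close($out);

# ">>" appends to the existing content.
open($out, ">>", $outfile) or die "Cannot open $outfile for appending: $!";
print $out "line 2\n";
close($out);

# "<" opens for reading.
open(my $in, "<", $outfile) or die "Cannot open $outfile for reading: $!";
my @lines = <$in>;
close($in);

print "Read ", scalar(@lines), " lines\n";
```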

Reading from file handles:
As we've seen with STDIN, reading from a file handle is easy:
while (<IN>) {
... # parse each line (in $_)
}

Writing to file handles:
print OUT "Hello, world"; # prints into (already opened) OUT
print "Hello, world"; # prints into STDOUT
print STDERR "Hello, world"; # prints into STDERR (no need to open.)

Closing files:
Very important (yet simple):
close(IN);
close(OUT);

Let's see an example: Intersections between two files: intersect.pl
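
intersect.pl itself is not reproduced here. As a rough sketch of one way such a script could work (not necessarily the way intersect.pl does it), the code below remembers every line of the first file in a hash and then keeps the lines of the second file that were already seen. The file names and contents are made up, and the files are created in a temporary directory so the sketch is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small input files so the sketch runs on its own.
my $dir = tempdir(CLEANUP => 1);
my ($file_a, $file_b) = ("$dir/a.txt", "$dir/b.txt");   # hypothetical names
open(OUT, ">", $file_a) or die "Cannot write $file_a: $!";
print OUT "geneA\ngeneB\ngeneC\n";
close(OUT);
open(OUT, ">", $file_b) or die "Cannot write $file_b: $!";
print OUT "geneB\ngeneC\ngeneD\n";
close(OUT);

# Remember every line of the first file.
my %seen;
open(IN, "<", $file_a) or die "Cannot read $file_a: $!";
while (<IN>) {
    chomp;
    $seen{$_} = 1;
}
close(IN);

# Keep the lines of the second file that also appeared in the first.
my @common;
open(IN, "<", $file_b) or die "Cannot read $file_b: $!";
while (<IN>) {
    chomp;
    push(@common, $_) if $seen{$_};
}
close(IN);

print "$_\n" foreach @common;    # prints geneB and geneC, one per line
```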

File Tests:

Many properties of a file can be tested simply, using the -X operators.
(See perldoc -f -X)

Some favorite features:
-r File is readable by effective uid/gid.
-w File is writable by effective uid/gid.
-x File is executable by effective uid/gid.
-o File is owned by effective uid.

-e File exists.
-z File has zero size.
-s File has nonzero size (returns size).

-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
-p File is a named pipe (FIFO), or Filehandle is a pipe.
-S File is a socket.
-b File is a block special file.
-c File is a character special file.
-t Filehandle is opened to a tty.

-T File is a text file.
-B File is a binary file (opposite of -T).
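
A small sketch of a few of these tests in use. The file name is hypothetical, and the file is created in a temporary directory so the sketch can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";                  # hypothetical file name

my $existed_before = (-e $file) ? 1 : 0;     # 0: nothing there yet

# Create an empty file.
open(OUT, ">", $file) or die "Cannot open $file: $!";
close(OUT);
my $is_empty = (-z $file) ? 1 : 0;           # 1: exists, but has zero size

# Add some content.
open(OUT, ">>", $file) or die "Cannot open $file: $!";
print OUT "ACGT\n";
close(OUT);
my $size = -s $file;                         # nonzero size, in bytes

print "$file is a plain file\n"   if -f $file;
print "$dir is a directory\n"     if -d $dir;
print "$file holds $size bytes\n" if -e $file and -r $file;
```

A common use is to check -e and -r on an input file before trying to open it.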

Data Manipulation

o split

(See perldoc -f split)

Splits a string into an array of strings, and returns it. By default, empty leading fields are preserved, and empty trailing ones are deleted.

$delim = " ";
@after = split ( /$delim/,"Hello, world!" ); # $after[0] is 'Hello,' and $after[1] is 'world!'
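
The rule about empty fields is easier to see on a string that has them. A minimal sketch with a made-up comma-separated record; passing a negative LIMIT as the third argument (see perldoc -f split) keeps the empty trailing fields.

```perl
use strict;
use warnings;

my $record = ",a,b,,";     # one empty field first, two empty fields last

my @default = split(/,/, $record);        # ("", "a", "b")
my @all     = split(/,/, $record, -1);    # ("", "a", "b", "", "")

print scalar(@default), " fields by default\n";          # 3
print scalar(@all), " fields with a negative LIMIT\n";   # 5
```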

o join

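(See perldoc -f join)

Joins the strings of a list into a single string, with the given separator between each pair of elements, and returns it.

$result = join ( "\t\t", @after ); # $result is "Hello,\t\tworld!"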

Sorting Arrays:
(See perldoc -f sort)

o sort SUBNAME LIST

Sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK is omitted, sort()s in standard string comparison order. If SUBNAME is specified, it gives the name of a subroutine that returns an integer less than, equal to, or greater than 0, depending on how the elements of the array are to be ordered. Instead of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.

- sort lexically
@articles = sort @files;

- same thing, but with explicit sort routine
@articles = sort {$a cmp $b} @files;

- now case-insensitively
@articles = sort {uc($a) cmp uc($b)} @files;

- lexically, in reversed order
@articles = sort {$b cmp $a} @files;

- sort numerically ascending
@articles = sort {$a <=> $b} @files;

- sort numerically descending
@articles = sort {$b <=> $a} @files;

!?! How can we sort a hash according to its keys?
And how can we sort a hash according to its values (Look at: sort_gene.pl)?
Can we sort a hash first by its values and then by its keys?
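
One possible answer to the first two questions, as a sketch only (it is not taken from sort_gene.pl). A hash has no order of its own, so what gets sorted is the list of its keys. The hash name and its contents are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical gene lengths, made up for illustration.
my %length = (geneC => 300, geneA => 900, geneB => 150);

# By key: sort the list of keys.
my @by_key = sort keys %length;              # geneA geneB geneC

# By value: sort the keys, but compare the values they map to.
my @by_value = sort { $length{$a} <=> $length{$b} } keys %length;
                                             # geneB geneC geneA

foreach my $gene (@by_value) {
    print "$gene\t$length{$gene}\n";
}
```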

Functions:
(See perldoc perlsub)

Defining a User Function:
sub function_name {
STATEMENT_1;
STATEMENT_2;
STATEMENT_3;
}

For example:
sub hello {
print "hello, world!\n";
}

- Put subroutines at the end of your program file.

- Within the subroutine body, you may access or change global variables.

Invoking a User Function:
hello();

Return a Value from a subroutine:
sub a_plus_b {
return $a+$b;
}

$a = 2; $b = 6;
$c = a_plus_b();

Passing Arguments:
In Perl, the subroutine invocation is followed by a list within parentheses,
causing the list to be automatically assigned to a special variable named @_.

- Advanced comment
Please note that the elements of @_ are aliases for the actual arguments (a form of pass by reference): assigning to $_[0] changes the caller's variable.
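
A minimal sketch of what this means in practice. The two subroutine names are hypothetical: the first changes its argument through @_, the second copies the argument first and leaves the caller's variable alone.

```perl
use strict;
use warnings;

sub double_in_place {
    $_[0] *= 2;           # $_[0] is the caller's first argument itself
}

sub double_copy {
    my ($n) = @_;         # copy the argument into a private variable
    $n *= 2;              # only the copy changes
    return $n;
}

my $x = 5;
double_in_place($x);      # $x is now 10
my $y = double_copy($x);  # $y is 20, $x is still 10
print "$x $y\n";          # prints "10 20"
```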

o 1st example
sub say {
print "$_[0], $_[1]!\n";
}
say("hello","world"); # prints "hello, world!"

o 2nd example
sub add {
$sum = 0; # initialize the sum
foreach $_ (@_) {
$sum += $_; # add each element
}
return $sum; # last expression evaluated: sum of all elements
}
$a = add(4,5,6); # adds 4+5+6 = 15, and assigns to $a
print add(1,2,3,4,5); # prints 15
print add(1..5); # also prints 15, because 1..5 is expanded

Now let's try to sort a hash first by its values and then by its keys: sort_gene_byLengthAndName.pl
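
The linked script is not reproduced here. As a sketch of the general idea (not necessarily how sort_gene_byLengthAndName.pl does it), a named sort subroutine can compare the values first and fall back to the keys when the values are equal. The hash and the subroutine name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical gene lengths; geneB and geneC tie on purpose.
my %length = (geneC => 300, geneA => 900, geneB => 300, geneD => 150);

sub by_length_then_name {
    return $length{$a} <=> $length{$b}    # first by value (numerically)
        || $a cmp $b;                     # ties are broken by key (lexically)
}

my @ordered = sort by_length_then_name keys %length;
print "@ordered\n";                       # prints "geneD geneB geneC geneA"
```

This works because <=> returns 0 for equal values, and || then moves on to the second comparison.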

o But what if we already had a variable called $a or $sum ? Oops ...
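
One common way around this, sketched here on the assumption that it is where the question is heading, is to declare the subroutine's variables with my, which makes them private to the enclosing block. The global $sum and its value of 100 are made up for illustration.

```perl
use strict;
use warnings;

our $sum = 100;              # a global $sum used elsewhere in the program

sub add {
    my $sum = 0;             # private to this call of add()
    foreach my $n (@_) {     # $n is private to the loop
        $sum += $n;
    }
    return $sum;
}

my $total = add(4, 5, 6);
print "$total $sum\n";       # prints "15 100": the global $sum is untouched
```

$a and $b deserve extra care, since sort uses them for the elements it compares; it is safest not to use those names for your own variables.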

Functions on strings:

Getting a substring from a string:
(See perldoc -f substr)

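o substr EXPR,OFFSET,LEN,REPLACEMENT
Returns the substring of EXPR that starts at index OFFSET and is LEN characters long. A negative OFFSET counts from the end of the string. If REPLACEMENT is given as a fourth argument, that part of EXPR is replaced with it; the same effect can be achieved by assigning to substr as an lvalue.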

$str = "hello, world!";
$grab = substr($str, 5, 4); # $grab gets ", wo"
$grab = substr($str, -4, 4); # last 4 letters ("rld!")
substr($str, 0, 5) = "hi"; # $str is now "hi, world!"
substr($str, 0, 2, "Hello"); # $str is now "Hello, world!"

Finding a substring in a string:
(See perldoc -f index and perldoc -f rindex)

o index STR,SUBSTR,POSITION
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, the search starts from the beginning of the string. If the substring is not found, it returns -1.

o rindex STR,SUBSTR,POSITION
rindex works just like index, except that it returns the position of the LAST occurrence of SUBSTR in STR. If POSITION is specified, it returns the last occurrence at or before that position.
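
A short sketch of both functions on the string used in the substr examples. Positions are counted from 0.

```perl
use strict;
use warnings;

my $str = "hello, world!";

my $first   = index($str, "o");        # 4: the "o" in "hello"
my $next    = index($str, "o", 5);     # 8: first "o" at or after position 5
my $last    = rindex($str, "o");       # 8: the "o" in "world"
my $before  = rindex($str, "o", 7);    # 4: last "o" at or before position 7
my $missing = index($str, "xyz");      # -1: not found

print "$first $next $last $before $missing\n";   # prints "4 8 8 4 -1"
```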

More Perl functions:
(See perldoc perlfunc, section "Perl Functions by Category")