Workshop in Computational Bioskills - Spring 2011
Part 1 - Control flow
Part 2 - Basic I/O
Part 3 - File Handles (Opening files for read/write)
Part 4 - File Tests
Part 5 - Data Manipulation
Part 6 - Functions
Part 7 - More Functions (Functions on Strings)
- To read one line from STDIN into a scalar
    $line = <STDIN>;
- To read all of STDIN into an array (each line in a cell of its own)
    @lines = <STDIN>;
- Reading STDIN, a line at a time (into $_)
    while (defined($_ = <STDIN>)) {
        ...
    }
- Same thing, only nicer...
    while (<STDIN>) {
        ...
    }
The <> operator:
Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames.
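As a rough illustration of the behaviour described above, the sketch below numbers the lines that <> delivers. The temporary file and its contents are assumptions made only so the example runs on its own; in a real script @ARGV would be filled from the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file, created only so the example is self-contained.
my ($fh, $tmpfile) = tempfile(UNLINK => 1);
print $fh "first line\nsecond line\n";
close($fh);

# Pretend the script was invoked with this file name on the command line.
@ARGV = ($tmpfile);

my @numbered;
while (<>) {                      # reads each file named in @ARGV, line by line
    chomp;                        # remove the trailing newline from $_
    push @numbered, "$.: $_";     # $. holds the current input line number
}
print "$_\n" for @numbered;
```

With an empty @ARGV the same loop would read from standard input instead.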
Output:
Nothing new to say here.
- You can use print to write a scalar (or a list) to a file handle. (STDOUT is the default one)
- printf does the same, but formats its arguments according to a format string. (See perldoc -f printf)
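To make the difference between print and printf concrete, here is a small sketch. The variable names and values are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for illustration.
my ($name, $length, $gc) = ("geneA", 7088, 0.4167);

print "$name $length $gc\n";                     # plain print: values as they are
printf "%-8s %6d %.2f\n", $name, $length, $gc;   # padded columns, 2 decimal places

# sprintf builds the same formatted string without printing it.
my $line = sprintf "%-8s %6d %.2f", $name, $length, $gc;
print "$line\n";
```

Here %-8s left-justifies a string in 8 columns, %6d right-justifies an integer in 6 columns, and %.2f rounds a number to two decimal places.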
Reading from STDIN is always a pleasure, but sometimes we'll want to work directly with files.
So how do we open files?
(Why don't you read for yourselves?! Try perldoc -f open)
- For reading
    open (IN, "$file");          # will open $file for reading
    open (IN, "<$file");         # will do the same.
    open (IN, "-");              # will open STDIN
- For writing
    open (OUT, ">$outfile");     # $outfile will be truncated and opened for writing.
    open (OUT, ">>$outfile");    # $outfile will be opened for appending.
(In both cases, $outfile will be created unless it already exists.)
Reading from file handles:
As we've seen with STDIN, reading from a file handle is easy:
    while (<IN>) {
        ...    # parse each line (in $_)
    }
Writing to file handles:
    print OUT "Hello, world";       # prints into (already opened) OUT
    print "Hello, world";           # prints into STDOUT
    print STDERR "Hello, world";    # prints into STDERR (no need to open.)
Closing files:
Very important (yet simple):
    close(IN);
    close(OUT);
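The open, write, read and close steps can be put together as in the sketch below. Note that it swaps in the three-argument form of open with lexical file handles and checks every open for failure; this is a more defensive variant of the bareword handles used in this lesson, not a different mechanism. The file name and directory are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file location inside a temporary directory.
my $dir     = tempdir(CLEANUP => 1);
my $outfile = "$dir/greeting.txt";

# Open for writing (creates or truncates the file); $! holds the OS error.
open(my $out, '>', $outfile) or die "Cannot open $outfile for writing: $!";
print $out "Hello, world\n";
close($out) or die "Cannot close $outfile: $!";

# Open for appending and add a second line.
open(my $app, '>>', $outfile) or die "Cannot open $outfile for appending: $!";
print $app "Hello again\n";
close($app) or die "Cannot close $outfile: $!";

# Open for reading and collect the lines.
open(my $in, '<', $outfile) or die "Cannot open $outfile for reading: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;
    push @lines, $line;
}
close($in);

print scalar(@lines), " lines read\n";
```

Without the "or die" checks, a failed open goes unnoticed and the later reads or prints silently do nothing useful.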
The -X operators offer a simple way to test many properties of files.
(See perldoc -f -X)
Some favorite features:
-r File is readable by effective uid/gid.
-w File is writable by effective uid/gid.
-x File is executable by effective uid/gid.
-o File is owned by effective uid.
-e File exists.
-z File has zero size.
-s File has nonzero size (returns size).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
-p File is a named pipe (FIFO), or Filehandle is a pipe.
-S File is a socket.
-b File is a block special file.
-c File is a character special file.
-t Filehandle is opened to a tty.
-T File is a text file.
-B File is a binary file (opposite of -T).
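A few of the tests listed above can be combined as in the following sketch. The scratch directory and file names are hypothetical and exist only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);     # hypothetical scratch directory
my $empty = "$dir/empty.txt";
my $data  = "$dir/data.txt";

# Create one empty file and one file with some content.
open(my $e, '>', $empty) or die "Cannot create $empty: $!";
close($e);
open(my $d, '>', $data) or die "Cannot create $data: $!";
print $d "ACGT\n";
close($d);

my %report;
for my $path ($dir, $empty, $data, "$dir/missing.txt") {
    my $kind = !-e $path ? "does not exist"
             : -d $path  ? "directory"
             : -z $path  ? "empty file"
             :             "file of " . (-s $path) . " bytes";
    $report{$path} = $kind;
    print "$path: $kind\n";
}
```

The parentheses around (-s $path) keep the file test from swallowing the string concatenation that follows it.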
o split
(See perldoc -f split)
Splits a string into an array of strings, and returns it. By default, empty leading fields are preserved, and empty trailing ones are deleted.
    $delim = " ";
o join
(See perldoc -f join)
    $result = join("\t\t", @after);    # $result is "Hello\t\tworld!"
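Since the split example above is incomplete, here is a minimal sketch of how split and join fit together. The variable $before and its value are assumptions; @after and $result follow the names used in the text.

```perl
use strict;
use warnings;

my $before = "Hello world!";              # hypothetical input string
my @after  = split(" ", $before);         # ("Hello", "world!")
my $result = join("\t\t", @after);        # "Hello\t\tworld!"

# Trailing empty fields are dropped by default; leading ones are kept.
my @fields = split(/,/, ",a,b,,");        # ("", "a", "b")
print scalar(@fields), " fields\n";       # prints "3 fields"

# A negative LIMIT keeps the trailing empty fields as well.
my @all = split(/,/, ",a,b,,", -1);       # ("", "a", "b", "", "")
print scalar(@all), " fields\n";          # prints "5 fields"
```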
Sorting Arrays:
(See perldoc -f sort)
o sort SUBNAME LIST
Sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK is omitted, sort()s in standard string comparison order. If SUBNAME is specified, it gives the name of a subroutine that returns an integer less than, equal to, or greater than 0, depending on how the elements of the array are to be ordered. Instead of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.
- sort lexically
    @articles = sort @files;
- same thing, but with explicit sort routine
    @articles = sort {$a cmp $b} @files;
- now case-insensitively
    @articles = sort {uc($a) cmp uc($b)} @files;
- lexically, in reversed order
    @articles = sort {$b cmp $a} @files;
- sort numerically ascending
    @articles = sort {$a <=> $b} @files;
- sort numerically descending
    @articles = sort {$b <=> $a} @files;
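The difference between string order and numeric order is easiest to see on a small list. The data below is hypothetical and serves only as an illustration.

```perl
use strict;
use warnings;

my @values = (10, 9, 100, 2);                      # hypothetical data

my @as_strings = sort @values;                     # string order: 10 100 2 9
my @ascending  = sort { $a <=> $b } @values;       # numeric order: 2 9 10 100
my @descending = sort { $b <=> $a } @values;       # numeric order: 100 10 9 2

my @names  = ("banana", "Cherry", "apple");
my @plain  = sort @names;                          # Cherry apple banana
my @nocase = sort { uc($a) cmp uc($b) } @names;    # apple banana Cherry

print "@as_strings\n@ascending\n@descending\n@plain\n@nocase\n";
```

In the default string comparison, uppercase letters sort before lowercase ones, which is why "Cherry" comes first in @plain.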
Functions:
(See perldoc perlsub)
Defining a User Function:
    sub function_name {
        STATEMENT_1;
        STATEMENT_2;
        STATEMENT_3;
    }
For example:
    sub hello {
        print "hello, world!\n";
    }
- Put subroutines at the end of your program file.
- Within the subroutine body, you may access or change global variables.
Invoking a User Function:
    hello();
Return a Value from a subroutine:
    sub a_plus_b {
        return $a+$b;
    }
    $a = 2;
    $b = 6;
    $c = a_plus_b();
Passing Arguments:
In Perl, the subroutine invocation is followed by a list within parentheses, causing the list to be automatically assigned to a special variable named @_.
- Advanced comment
  Please note that arguments in @_ are passed by reference: the elements of @_ are aliases for the caller's arguments, so assigning to $_[0] changes the caller's variable.
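The following sketch shows what passing by reference means in practice. Both subroutine names are hypothetical; the first changes its arguments through @_, the second copies them first.

```perl
use strict;
use warnings;

# Modifies its arguments in place: $_ aliases each element of @_,
# and each element of @_ aliases the caller's variable.
sub double_in_place {
    $_ *= 2 for @_;
    return;
}

# Copies the arguments first, so the caller's variables are left alone.
sub doubled {
    my @copy = @_;
    $_ *= 2 for @copy;
    return @copy;
}

my ($x, $y) = (3, 4);
my @new = doubled($x, $y);     # @new is (6, 8); $x and $y are still 3 and 4
double_in_place($x, $y);       # now $x is 6 and $y is 8
print "$x $y @new\n";          # prints "6 8 6 8"
```

Copying @_ into private variables at the top of a subroutine is the usual way to avoid changing the caller's data by accident.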
o 1st example
    sub say {
        print "$_[0], $_[1]!\n";
    }
    say("hello","world");    # prints "hello, world!"
o 2nd example
    sub add {
        $sum = 0;                # initialize the sum
        foreach $_ (@_) {
            $sum += $_;          # add each element
        }
        return $sum;             # last expression evaluated: sum of all elements
    }
    $a = add(4,5,6);             # adds 4+5+6 = 15, and assigns to $a
    print add(1,2,3,4,5);        # prints 15
    print add(1..5);             # also prints 15, because 1..5 is expanded
Now let's try to sort a hash first by its values and then by its keys: sort_gene_byLengthAndName.pl
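The script sort_gene_byLengthAndName.pl is not reproduced here, so the following is only a minimal sketch of the idea, assuming a hash that maps gene names to lengths. The hash name %length and its contents are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical gene names and lengths, for illustration only.
my %length = (geneC => 1200, geneA => 850, geneB => 1200, geneD => 300);

# Compare by value first; when two values are equal, <=> returns 0,
# so the "or" falls through to comparing the keys.
my @sorted = sort { $length{$a} <=> $length{$b} or $a cmp $b } keys %length;

print "$_\t$length{$_}\n" for @sorted;
```

With this data the keys come out as geneD, geneA, geneB, geneC: geneB and geneC have the same length, so their names decide the order.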
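o But what if we already had a variable called $a or $sum? Oops ... the assignments in the examples above would overwrite it, because these are global variables. Declaring variables with my inside the subroutine keeps them private.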
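One way to avoid the clash is to declare the subroutine's variables with my. The sketch below rewrites the add subroutine along those lines; the outer $sum and the variable $total are hypothetical.

```perl
use strict;
use warnings;

my $sum = 100;                  # a variable the rest of the program cares about

sub add {
    my $sum = 0;                # private to this call; hides the outer $sum
    foreach my $n (@_) {
        $sum += $n;             # add each element
    }
    return $sum;
}

my $total = add(4, 5, 6);
print "$total $sum\n";          # prints "15 100"
```

Starting scripts with use strict makes Perl insist on such declarations, which catches this kind of accident early.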
Functions on strings:
Getting a substring from a string:
o substr EXPR,OFFSET,LEN,REPLACEMENT
(See perldoc -f substr)
Returns a substring of length LEN from EXPR, starting from index OFFSET. It's also possible to replace that substring by passing REPLACEMENT as a fourth argument (instead of using substr as an lvalue).
    $str = "hello, world!";
    $grab = substr($str, 5, 4);     # $grab gets ", wo"
    $grab = substr($str, -4, 4);    # last 4 letters ("rld!")
    substr($str, 0, 5) = "hi";      # $str is now "hi, world!"
    substr($str, 0, 2, "Hello");    # $str is now "Hello, world!"
Finding a substring in a string:
o index STR,SUBSTR,POSITION
(See perldoc -f index)
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. If the substring is not found, returns -1.
o rindex STR,SUBSTR,POSITION
(See perldoc -f rindex)
rindex works just like index except that it returns the position of the LAST occurrence of SUBSTR in STR. If POSITION is specified, returns the last occurrence at or before that position.
More Perl functions:
(See perldoc perlfunc for all Perl functions by category)
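The text gives examples for substr but none for index and rindex, so here is a short sketch using the same string. The variable names other than $str are hypothetical.

```perl
use strict;
use warnings;

my $str = "hello, world!";

my $first = index($str, "o");         # 4: the "o" in "hello"
my $next  = index($str, "o", 5);      # 8: first "o" at or after position 5
my $last  = rindex($str, "o");        # 8: the "o" in "world"
my $prev  = rindex($str, "o", 7);     # 4: last "o" at or before position 7
my $none  = index($str, "xyz");       # -1: not found

# Combine index and substr to take everything after the comma and space.
my $comma = index($str, ", ");
my $rest  = substr($str, $comma + 2); # "world!"

print "$first $next $last $prev $none $rest\n";
```

Positions are counted from 0, so the "h" of "hello" is at position 0 and the comma is at position 5.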