Workshop in Computational Bioskills - Spring 2011
Lesson 4 - Perl III
Part 1 - Safe code
Part 2 - Regular Expressions (Testing file properties)
Part 3 - References
Part 4 - Modules
my(): Private Variables:
my() takes a list of variable names and creates private (lexically scoped) versions of them, visible only inside the enclosing block.
o An improvement to the 2nd example from Lesson 3:
    sub add {
        my ($sum);             # make $sum a local variable
        $sum = 0;              # initialize the sum
        foreach $_ (@_) {
            $sum += $_;        # add each element
        }
        return $sum;
    }
o 1st example:
    sub list_abs {
        my (@l) = @_;
        foreach $_ (@l) {
            $_ = abs($_);
        }
        return @l;
    }
local(): Semi-Private Variables:
local() is similar to my(), but it is not so local: it gives a global variable a temporary value for the duration of the enclosing block.
That temporary value is also visible to functions called from within the block in which the variable is localized.
o 2nd example:
    #!/usr/bin/perl -w
    $a = 5;
    {
        local $a = 3;
        f();
    }
    f();

    sub f {
        if (defined $a) {
            print "$a\n";
        } else {
            print "\$a not defined\n";
        }
    }
!?! What will be the output of this program?
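The difference between my() and local() can be seen side by side. The following is a minimal sketch; the names $x, show(), with_my() and with_local() are made up for this illustration and do not come from the lesson files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $x = "global";            # a package (global) variable

sub show { return $x }        # always reads the package variable $x

sub with_my {
    my $x = "my";             # lexical: invisible to show()
    return show();
}

sub with_local {
    local $x = "local";       # temporary value of the global: visible to show()
    return show();
}

print with_my(), "\n";        # show() still sees the global value
print with_local(), "\n";     # show() sees the temporary value
print show(), "\n";           # the old value is restored after with_local()
```

A variable declared with my() never leaks into called functions, which is why my() is usually the safer default.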
strict: Forcing Variable Declaration:
It is convenient to use the strict pragma:
    use strict;
It forces you to declare every variable (with my(), for example) before it can be used.
- This is highly recommended!!
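A typical mistake that strict catches is a misspelled variable name. The sketch below is only an illustration: $count and the misspelled $coutn are hypothetical names, and the string eval is there just so the demo can show the error message and keep running.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 10;

# $coutn is a deliberate misspelling of $count. The expression is compiled
# inside a string eval only so that this demo can catch the compile-time
# error; in a normal script Perl would refuse to run at all.
my $result = eval q{ $coutn + 1 };
my $error  = $@;

print "without the typo: ", $count + 1, "\n";
print "strict reported: $error" if $error;
```

Without use strict, the misspelled name would silently be treated as a new, empty global variable.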
Patterns are very useful. We often want to know whether a pattern occurs in a file, how many times and where, and sometimes to replace it with another.
In UNIX, we might use grep, sed & tr to find and manipulate patterns.
Perl offers very similar commands.
Pattern Matching - m//:
(See m// in perlop manpage)
m/pattern/options searches a string for a pattern match, and in scalar context returns true (1) or false ('').
By default the search is done on $_.
Other variables can be searched using =~
    if (m/hello/) { ... };              # search for hello in $_
    if ($_ =~ m/hello/) { ... };        # same: search for hello in $_
    if (/hello/) { ... };               # same search
    print if (m/hello/);                # print $_ if it matches m/hello/
    if ($string =~ m/hello/) { ... };   # search for hello in $string
Options are:
g Match globally, i.e., find all occurrences.
i Do case-insensitive pattern matching.
c Do not reset search position on a failed
match when /g is in effect.
m Treat string as multiple lines.
o Compile pattern only once.
s Treat string as single line.
x Use extended regular expressions.
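The effect of the i and g options is easiest to see on a short string. This is a small sketch; the sequence in $seq and the variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "atgGAATTCccgaattcTAA";

# i: case-insensitive test in scalar context
my $has_site = ($seq =~ m/gaattc/i) ? 1 : 0;

# g in list context returns every match at once
my @sites = ($seq =~ m/gaattc/gi);

# g in scalar context walks through the string one match at a time;
# pos() reports the offset where the last match ended
my @ends;
while ($seq =~ m/gaattc/gi) {
    push @ends, pos($seq);
}

print "found: $has_site\n";
print "matches: @sites\n";
print "match ends: @ends\n";
```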
Regular Expression Syntax:
(See man perlre)
In particular the following meta-characters have their standard egrep-ish meanings:
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
The following standard quantifiers are recognized:
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
Must patterns be so greedy?
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match.
If you want it to match the minimum number of times possible, follow the quantifier with a "?".
Note that the meaning doesn't change, just the "greediness":
*? Match 0 or more times
+? Match 1 or more times
?? Match 0 or 1 time
{n}? Match exactly n times
{n,}? Match at least n times
{n,m}? Match at least n but not more than m times
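A short sketch of the difference, using a made-up line of text in $line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the LAST ">" that still lets the pattern match
my ($greedy) = $line =~ m/<(.+)>/;

# Minimal: .+? stops at the FIRST ">" it can
my ($minimal) = $line =~ m/<(.+?)>/;

# With g, the minimal form picks up every tag separately
my @tags = $line =~ m/<(.+?)>/g;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
print "tags:    @tags\n";
```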
Special Characters:
\w Match a "word" character (alpha-numeric plus "_")
\W Match a non-word character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
Perl also defines the following zero-width assertions:
\b Match a word boundary
\B Match a non-(word boundary)
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
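The following sketch shows a few of these classes and assertions at work. The text in $text and all variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "gene1 cat concatenate 42";

my @words         = $text =~ /(\w+)/g;        # runs of word characters
my @whole_numbers = $text =~ /\b(\d+)\b/g;    # digits that form a whole word
my @any_cat       = $text =~ /cat/g;          # "cat" anywhere
my @word_cat      = $text =~ /\bcat\b/g;      # "cat" as a separate word

# \Z tolerates a trailing newline, \z does not
my $line    = "end\n";
my $upper_z = ($line =~ /end\Z/) ? 1 : 0;
my $lower_z = ($line =~ /end\z/) ? 1 : 0;

print scalar(@words), " words, whole numbers: @whole_numbers\n";
print scalar(@any_cat), " x cat, ", scalar(@word_cat), " x cat as a word\n";
print "\\Z: $upper_z, \\z: $lower_z\n";
```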
Storing Patterns in Memory:
When the bracketing construct ( ... ) is used, \<digit> matches the text captured by the digit'th pair of parentheses.
Outside of the pattern, use "$" instead of "\" in front of the digit.
- Using \1, \2, \3 inside the pattern
    if (m/Time: (..):\1:\1/) {          # Will match "Time: 12:12:12"
        $hours = $minutes = $seconds = $1;
    }
- Using $1, $2, $3 outside the pattern
    if (m/Time: (..):(..):(..)/) {      # Will match any hour.
        $hours   = $1;
        $minutes = $2;
        $seconds = $3;
    }
Pattern Replacing - s///:
(See s// in perlop manpage)
s/pattern/replacement/options searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string).
Options are:
g Replace globally, i.e., all occurrences.
i Do case-insensitive pattern matching.
e Evaluate the right side as an expression.
m Treat string as multiple lines.
o Compile pattern only once.
s Treat string as single line.
x Use extended regular expressions.
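A small sketch of the return value and of the g and e options. The string in $text and the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "gene_a: 2, gene_b: 3";

# g: replace every occurrence; the return value is the number of substitutions
my $renamed = $text;
my $n = ($renamed =~ s/gene_/GENE-/g);

# e: the replacement is evaluated as Perl code for each match
my $doubled = $text;
$doubled =~ s/(\d+)/$1 * 2/ge;

# No match: the string is left alone and the empty string is returned
my $unchanged = $text;
my $none = ($unchanged =~ s/protein//);

print "$n substitutions: $renamed\n";
print "doubled: $doubled\n";
print "no match returned '", $none, "'\n";
```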
Pattern Transliterating - tr///:
(See tr// in perlop manpage)
tr/searchlist/replacementlist/options transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list.
tr returns the number of translations/deletions done.
A character range may be specified with a hyphen, so tr/A-J/0-9/ does the same replacement as tr/ACEGIBDFHJ/0246813579/.
Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
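tr is handy for sequence work. The sketch below counts G and C characters, builds a reverse complement, and shows the d and s options; the sequence and variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGCCGTT";

# With an empty replacement list and no d option, tr only counts:
# the string itself is left unchanged
my $gc_count = ($dna =~ tr/GC//);

# Reverse complement: reverse the string, then swap each base
my $revcomp = reverse $dna;
$revcomp =~ tr/ACGT/TGCA/;

# d: delete the characters in the search list that have no replacement
my $letters = "A1C2G3";
$letters =~ tr/0-9//d;

# s: squash runs of the same translated character into one
my $squashed = "aaabbbccc";
$squashed =~ tr/a-z//s;

print "GC count: $gc_count\n";
print "reverse complement: $revcomp\n";
print "letters only: $letters\n";
print "squashed: $squashed\n";
```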
Example - Finding the binding sites of a TF:
Part III: References:
perldoc perlref
A reference in Perl is a typed pointer. It stores the type & address of some variable.
Perl references also resemble C++ "smart pointers": Perl keeps count of the references pointing to a value and releases the value when it is no longer needed.
Creating references:
To create a reference, use the backslash (\) operator.
o Referencing variables
    $scalarref = \$scalar;
    $arrayref  = \@array;
    $hashref   = \%hash;
    $code_ref  = \&function;
o Referencing anonymous objects
We've seen that \ creates a reference to an existing variable.
But how can we reference anonymous objects, which don't have names?
We need to create them from scratch.
To create an anonymous array reference, use [ ].
    $arrayref = [1, 2, ['a', 'b', 'c']];
This is a reference to an array with 3 elements - 2 scalars and one reference to another anonymous array.
To create an anonymous hash reference, use { }.
    $hashref = {
        'Adam'  => 'Eve',
        'Clyde' => 'Bonnie',
    };
You can also take a reference to an anonymous subroutine.
    $coderef = sub { print "Hello, World!\n" };
Using references:
After referencing so many different objects, we might want to use them ...
Dereferencing
The principle is very simple: add the relevant symbol for each type of reference.
Remember that the context is what matters:
    $bar = $$scalarref;
    push(@$arrayref, $filename);
    $$arrayref[0] = "January";
    $$hashref{"KEY"} = "VALUE";
    &$code_ref(1,2,3);
Using the Arrow operator:
    $arrayref->[0] = "January";
    $hashref->{KEY} = "VALUE";
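The pieces above can be combined in one short script. This is only a sketch with made-up variable names; it shows that a reference points at the original data and that both dereferencing styles reach the same element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array    = (1, 2, 3);
my $arrayref = \@array;

push @$arrayref, 4;          # modifies @array itself, not a copy
$arrayref->[0] = 10;         # same element as $$arrayref[0] and $array[0]

my $hashref = { Adam => 'Eve' };
$hashref->{Clyde} = 'Bonnie';

# A code reference can be called with the arrow as well as with &$coderef(...)
my $coderef = sub { my $sum = 0; $sum += $_ for @_; return $sum };
my $total   = $coderef->(1, 2, 3);

# Between two subscripts the arrow is optional
my $nested = [1, 2, ['a', 'b', 'c']];
my $inner  = $nested->[2][1];

print "array: @array\n";
print "keys: ", join(", ", sort keys %$hashref), "\n";
print "total: $total, inner: $inner\n";
```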
!?! What is the difference between an array and an array-reference :
See an example: refExample.pl
Let's try writing a function factory : function_generator.pl
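One possible shape for a function factory is sketched below. It is not the contents of function_generator.pl; make_adder() and the other names are hypothetical. Each call returns a new anonymous subroutine that remembers its own copy of $increment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a code reference that adds a fixed increment to its argument
sub make_adder {
    my ($increment) = @_;
    return sub {
        my ($value) = @_;
        return $value + $increment;   # $increment is kept alive by the closure
    };
}

my $add2  = make_adder(2);
my $add10 = make_adder(10);

print $add2->(5), "\n";
print $add10->(5), "\n";
```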
List of Lists:
perldoc perldsc (Data Structures)
perldoc perllol (Lists of Lists)
An LoL is merely a list of list references.
    @LoL = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );
    print $LoL[2][2];    # bart
Ref. to List of Lists:
    # a reference to a list of list references
    $ref_to_LoL = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "elroy", "judy", ],
    ];
    print $ref_to_LoL->[2][2];    # "elroy"
A Hash of Lists:
    %HoL = (
        "flintstones" => [ "fred", "barney" ],
        "jetsons"     => [ "george", "jane", "elroy" ],
        "simpsons"    => [ "homer", "marge", "bart" ],
    );
    print $HoL{jetsons}[2];    # "elroy"
Once you get the idea, the rest is quite simple.
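A hash of lists is usually grown with push and read back with a loop over the keys. A minimal sketch follows; the hash %probes_of and its gene and probe names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %probes_of = (
    geneA => [ "probe1", "probe2" ],
    geneB => [ "probe3" ],
);

# Add to an existing list, and create a new one on the fly:
# pushing onto a missing key creates the anonymous array automatically
push @{ $probes_of{geneA} }, "probe4";
push @{ $probes_of{geneC} }, "probe5";

# An array in scalar context gives its number of elements
my $n_geneA = scalar @{ $probes_of{geneA} };

for my $gene (sort keys %probes_of) {
    print "$gene: @{ $probes_of{$gene} }\n";
}
```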
Example: Finding differentially expressed genes: parseGE.pl
Here we parse a tab-separated file which maps probes on an array to genes and to the measured log-ratio.
Each gene is represented by more than one probe on the array.
We search for genes with an average log-ratio > 2.
Example of an input file: logRatioByProbe.txt
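The idea can be sketched as follows. This is not parseGE.pl itself: the column order (probe, gene, log-ratio) is an assumption, and the few data lines are made up and stored in an array so that the sketch runs without the input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the lines of the input file.
# Assumed columns: probe <TAB> gene <TAB> log-ratio
my @lines = (
    "probe1\tgeneA\t2.5",
    "probe2\tgeneA\t3.1",
    "probe3\tgeneB\t0.4",
    "probe4\tgeneB\t1.0",
    "probe5\tgeneC\t2.2",
);

# Hash of lists: gene name => all log-ratios measured for that gene
my %ratios_of;
for my $line (@lines) {
    my ($probe, $gene, $log_ratio) = split /\t/, $line;
    push @{ $ratios_of{$gene} }, $log_ratio;
}

# Keep the genes whose average log-ratio is above 2
my @selected;
for my $gene (sort keys %ratios_of) {
    my @ratios = @{ $ratios_of{$gene} };
    my $sum = 0;
    $sum += $_ for @ratios;
    my $mean = $sum / @ratios;
    push @selected, $gene if $mean > 2;
}

print "selected genes: @selected\n";
```

To read a real file instead, the for loop over @lines could be replaced by a while(<>) loop with chomp.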
The ref function
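ref returns a string naming the kind of thing a reference points to (for example 'SCALAR', 'ARRAY', 'HASH' or 'CODE'), and the empty string if its argument is not a reference. A small sketch, in which describe() is a hypothetical helper written for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a short description of whatever was passed in
sub describe {
    my ($thing) = @_;
    my $type = ref $thing;
    return "not a reference"                               if $type eq '';
    return "array of " . scalar(@$thing) . " elements"     if $type eq 'ARRAY';
    return "hash with " . scalar(keys %$thing) . " keys"   if $type eq 'HASH';
    return "subroutine"                                    if $type eq 'CODE';
    return "reference of type $type";
}

print describe(42), "\n";
print describe([1, 2, 3]), "\n";
print describe({ Adam => 'Eve' }), "\n";
print describe(sub { 1 }), "\n";
print describe(\"text"), "\n";
```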
!?! How to use a hash with array values? What's wrong with the following code:
    my ($i,@a,%hash);
    while(<>){
        chomp;
        ($i,@a)=split "\t";
        $hash{$i}=\@a;
    }
How can you solve the problem? And how would you handle a hash of hashes with array values?
Part IV: Perl Modules
Packages, Libraries, Modules & Programs are all different types of namespaces and classes, which help us maintain simple and organized code. Perl provides mechanisms to protect packages from stomping on each other's variables.
perldoc perlmod
Package:
Quite similar to a C++ namespace. The default package of your code is 'main'.
    package Alpha;
    $name = "first";
    package Omega;
    $name = "last";
    package main;
    print "> Alpha is $Alpha::name, Omega is $Omega::name.\n";
> Alpha is first, Omega is last.
Module:
A collection of subroutines that conforms to specific conventions, stored in a *.pm file.
Modules can be loaded with use (at compile time) or require (at run time).
A Pragma:
A module that affects the compilation behavior (like strict).
Writing a Module:
You must follow some basic rules. Here is an example: the Coffee module. In the package, we'll include some variables ($with_milk and $sugar) and a function: drink()
It must sit in a file called "Coffee.pm", in one of the directories listed in @INC (the UNIX shell variable $PERL5LIB might be handy here. See perlmod and perlrun)
# ------------ Coffee.pm ------------
package Coffee;
use Exporter;
@ISA = ('Exporter');
@EXPORT = qw(&drink $with_milk $sugar);
$with_milk = "no";
$sugar = 1;
sub drink {
    print "The coffee was great ($with_milk milk, $sugar sugar)\n";
}
1;
# ------------ Coffee.pm ------------
In your program, do:
    use Coffee;
    $sugar++;
    drink();
> The coffee was great (no milk, 2 sugar)
or
    require Coffee;
    $Coffee::with_milk = "a little";
    Coffee::drink();
> The coffee was great (a little milk, 1 sugar)
You can read more about lines 2-4, and the last line, in Coffee.pm in the perlmod man page.
Installing new modules:
How do you get those cool modules from CPAN and install them on your machine?
- Search CPAN and download a package, say GD-2.39.tar.gz
    > tar zxvf GD-2.39.tar.gz
    > cd GD-2.39
    > perl Makefile.PL LIB=~/perllib PREFIX=~/perllib
    > make
    > make test
    > make install
    > cd ..
    > rm -Rf GD-2.39.tar.gz
    > setenv PERL5LIB ${HOME}/perllib
    > setenv MANPATH ${MANPATH}:${HOME}/perllib/lib/perl5/man/
What do you know about environment variables? The setenv command? And the .cshrc file under your home directory?