|
Modeling Dependencies in Protein-DNA Binding Sites:
Analysis of Aligned Data |
|
To test whether dependencies can be found in DNA binding sites, we extracted aligned sites from the
TRANSFAC database, version 6.2.
We used TRANSFAC's original alignments and built 95 datasets for proteins with at least 20 known binding sites.
For each dataset, we performed a 10-fold cross-validation test: we learned a model on 90% of the sites and
then calculated the log-likelihood of the remaining 10%.
The following images show the difference in average log-likelihood per
instance on the test data between each learned model and the learned
PSSM. |
![]() |
![]() |
![]() |
![]() |
![]() |
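
The evaluation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function names (`learn_pssm`, `log_likelihood`, `cross_validated_avg_ll`), the pseudocount of 1, the interleaved fold assignment, and the fixed shuffle seed are all assumptions made for the example. It learns a PSSM (independent per-position nucleotide distributions) on nine folds and reports the average log-likelihood per held-out site.

```python
import math
import random

ALPHABET = "ACGT"


def learn_pssm(sites, pseudocount=1.0):
    """Estimate independent per-position nucleotide probabilities.

    The pseudocount is an assumption of this sketch; it keeps the
    probability of unseen nucleotides above zero.
    """
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({base: counts[base] / total for base in ALPHABET})
    return pssm


def log_likelihood(pssm, site):
    """Log-likelihood of one aligned site under the PSSM."""
    return sum(math.log(column[base]) for column, base in zip(pssm, site))


def cross_validated_avg_ll(sites, learn, score, k=10, seed=0):
    """Average held-out log-likelihood per site over k folds."""
    shuffled = list(sites)
    random.Random(seed).shuffle(shuffled)
    total = 0.0
    for fold in range(k):
        # Every k-th site (offset by the fold index) is held out for testing.
        test = shuffled[fold::k]
        train = [s for i, s in enumerate(shuffled) if i % k != fold]
        model = learn(train)
        total += sum(score(model, s) for s in test)
    # Each site is held out exactly once, so divide by the number of sites.
    return total / len(shuffled)
```

Under these assumptions, the quantity plotted for a given model would be its `cross_validated_avg_ll` value minus the value obtained with `learn_pssm` and `log_likelihood` on the same dataset, using the same folds for both.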