Hello, Biostars. I want to look for over-represented transcription factor binding sites for a given set of genes. I have learned a lot from similar questions on Biostars and from papers, but I still have a problem and can't find a precise answer. According to the papers, the promoter region can include the 5' UTR, introns, and sequence upstream or downstream of the TSS. For the upstream and downstream parts, I don't know how many bp I should take (the species is mouse). The window differs from one paper to another, and I haven't found any basic principles for choosing it. Could you give me some suggestions or references? Thanks.
If you want to be rigorous, you could use EST data to define the UTR regions. The true distance will probably vary between genes.
When doing methylation work in humans, I think the strongest methylation changes correlated with gene expression changes occur within ~1500 bp upstream and ~500 bp downstream of the TSS.
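If you go with a fixed window like the ~1500 bp upstream / ~500 bp downstream one mentioned above, the main thing to get right is strand: on the minus strand, "upstream" means higher genomic coordinates. Below is a minimal sketch, assuming you already have 0-based TSS coordinates and want BED-style half-open intervals. The function name promoter_window and the default sizes are illustrative assumptions (the 1500/500 numbers come from human methylation work, not a mouse-specific standard), and the end coordinate is not clipped to the chromosome length.

```python
from typing import Tuple

# Assumed defaults taken from the human methylation rule of thumb above;
# tune these (or compare several windows) for mouse.
UPSTREAM_BP = 1500
DOWNSTREAM_BP = 500


def promoter_window(chrom: str, tss: int, strand: str,
                    upstream: int = UPSTREAM_BP,
                    downstream: int = DOWNSTREAM_BP) -> Tuple[str, int, int]:
    """Return a BED-style (chrom, start, end) promoter interval.

    `tss` is the 0-based coordinate of the first transcribed base and the
    interval is half-open [start, end). The TSS base is counted as part of
    the downstream segment, so the window is upstream + downstream bp long
    unless it is clipped at the chromosome start.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clip at the chromosome start; the end is NOT clipped to chromosome length.
    return chrom, max(0, start), end


if __name__ == "__main__":
    # Hypothetical genes: (name, chrom, 0-based TSS, strand)
    genes = [("geneA", "chr1", 10000, "+"), ("geneB", "chr1", 10000, "-")]
    for name, chrom, tss, strand in genes:
        c, s, e = promoter_window(chrom, tss, strand)
        # BED6 line: chrom, start, end, name, score, strand
        print(f"{c}\t{s}\t{e}\t{name}\t0\t{strand}")
```

Writing the strand into column 6 means a tool such as bedtools getfasta with -s can reverse-complement minus-strand promoters when it extracts sequences.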
The simplest solution is probably to use existing tools to address this question. For example, GATHER calculates enrichment of TRANSFAC motifs for about half a dozen species (including mouse):
http://gather.genome.duke.edu/
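To see roughly what "enrichment" means here, one common approach (not necessarily the statistic GATHER itself uses) is a hypergeometric test per motif: given N background genes, K of which have the motif in their promoter, how surprising is it that k of your n genes have it? A minimal sketch with only the standard library follows. It assumes the background N includes your list genes, and the function names and counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    N = background genes, K = background genes whose promoter has the motif,
    n = genes in your list, k = genes in your list with the motif.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("need 0 <= K <= N, 0 <= n <= N and k >= 0")
    total = comb(N, n)
    # Sum over k, k+1, ..., min(K, n) motif-bearing genes in the list.
    # math.comb returns 0 when asked for more items than exist,
    # so impossible outcomes contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k / n) / (K / N), computed as one integer ratio to limit rounding error."""
    return (k * N) / (n * K)


if __name__ == "__main__":
    # Hypothetical counts for one motif (not real data).
    k, n, K, N = 30, 50, 4000, 20000
    print("fold enrichment:", fold_enrichment(k, n, K, N))
    print("p-value:", hypergeom_sf(k, n, K, N))
```

Since you would normally test hundreds of motifs, the per-motif p-values would still need a multiple-testing correction (e.g. Bonferroni or Benjamini-Hochberg).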
I've also saved a list of TF-enrichment tools that I have found useful:
http://cdwscience.blogspot.com/2013/03/bioinformatics-101-gene-expression.html
Thanks. GATHER is a good tool; I will try it.
I can't open http://cdwscience.blogspot.com/2013/03/bioinformatics-101-gene-expression.html. Could you send it again, or paste the text?
I think you may just need to refresh your browser or something like that.
If it helps, here is the parent page:
http://cdwscience.blogspot.com/2013/03/bioinformatics-101.html
and here is the relevant text from that page:
Transcription Factor Motif Analysis:
IPA Upstream Regulator Analysis - commercial tool that searches for enrichment of known targets of regulatory genes and molecules (such as transcription factors); it can also detect whether the targets are consistent with activation or inhibition of the regulator
SCOPE - free tool that identifies upstream motifs enriched in gene lists; it works on a wide variety of species, so it is useful for motif finding in less commonly studied organisms
Whole Genome rVISTA - calculates enrichment of transcription factor motifs predicted on the basis of evolutionary conservation
TRED (Transcriptional Regulatory Element Database) - database from CSHL for transcription factors. Includes target gene lists for transcription factors in human, mouse, and rat
TRANSFAC - database of transcription factor motif sequences. There are commercial and open-source versions of the database
JASPAR - open-source database of transcription factor motif sequences
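Databases like TRANSFAC and JASPAR store motifs as matrices of base counts per position, and binding sites are usually predicted by scoring every window of a promoter sequence against a log-odds version of that matrix on both strands. A minimal sketch of that idea is below. The count matrix in the usage lines is a made-up GATA-like example rather than an actual database entry, and the uniform 0.25 background and 0.8 pseudocount are simplifying assumptions (mouse promoters are not base-balanced). Function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: Dict[str, List[int]], pseudocount: float = 0.8,
                 background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores.

    `counts` maps each base to a list of counts, one per motif position.
    The pseudocount is split across bases in proportion to the background.
    """
    bg = background or {b: 0.25 for b in BASES}
    pwm = []
    for j in range(len(counts["A"])):
        column_total = sum(counts[b][j] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2((counts[b][j] + pseudocount * bg[b]) / column_total / bg[b])
            for b in BASES
        })
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (forward-strand start, strand, score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(base not in BASES for base in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[base] for col, base in zip(pwm, window))
            if score >= threshold:
                # Map reverse-complement coordinates back to the forward strand.
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, score))
    return hits


if __name__ == "__main__":
    # Hypothetical GATA-like count matrix (not a real JASPAR/TRANSFAC entry).
    counts = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
              "G": [10, 0, 0, 0], "T": [0, 0, 10, 0]}
    pwm = log_odds_pwm(counts)
    for hit in scan("CCGATACCTATCCC", pwm, threshold=5.0):
        print(hit)
```

In practice the threshold is usually set relative to the motif's maximum score or from a score p-value, and the per-gene hits would then feed an enrichment test across your gene list.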


