To make the stratified picks, the genome is divided into the top 20%, middle 30%, and bottom 50% along two axes: gene density and nontranscribed conservation. Three random picks are then taken from each stratum, plus a fourth pick in the strata that are underrepresented in the manual picks. One additional backup pick is made in each stratum in case there is an unforeseen technical problem with a region. The backup pick is the last entry in each table section.
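
The picking scheme can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script actually used: the function names, the `(name, cons_percentile, gene_percentile)` input layout, and the caller-supplied `underrepresented` set are all hypothetical, and the text does not say how "underrepresented" was decided or which side a window exactly on a boundary falls.

```python
import random
from collections import defaultdict


def stratum_of(percentile):
    """Map a percentile rank (0-100) to 0 (bottom 50%), 1 (middle 30%), or 2 (top 20%).

    Assumption: a window exactly on a boundary goes to the higher stratum.
    """
    if percentile < 50:
        return 0
    if percentile < 80:
        return 1
    return 2


def stratified_picks(windows, underrepresented, rng, base_picks=3):
    """Pick random windows per (conservation stratum, gene stratum) cell.

    windows: iterable of (name, cons_percentile, gene_percentile) tuples.
    underrepresented: set of cells that get one extra pick (hypothetical input;
        the source does not give the rule used to choose them).
    Returns {cell: [names]}; the last name in each list is the backup pick.
    """
    by_cell = defaultdict(list)
    for name, cons_pct, gene_pct in windows:
        by_cell[(stratum_of(cons_pct), stratum_of(gene_pct))].append(name)

    picks = {}
    for cell, names in sorted(by_cell.items()):
        extra = 1 if cell in underrepresented else 0
        wanted = base_picks + extra + 1  # +1 for the backup pick
        picks[cell] = rng.sample(names, min(wanted, len(names)))
    return picks
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, which matters if the picks need to be regenerated later.
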
consNonTx 0% - 50%, gene 0% - 50% (1 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr13:28500001-29000000 | chr13:24500016-25000015 | chr13:29450016-29950015 | consNonTx 2.8%, gene 0.5% |
chr2:51700001-52200000 | chr2:51837455-52337454 | chr2:51616414-52116413 | consNonTx 3.8%, gene 0.0% |
chr4:119000001-119500000 | chr4:118527386-119027385 | chr4:118639860-119139859 | consNonTx 3.9%, gene 0.0% |
chr10:54300001-54800000 | chr10:54489120-54989119 | chr10:55376221-55876220 | consNonTx 2.8%, gene 1.2% |
chr5:15900001-16400000 | chr5:16187472-16687471 | chr5:15942554-16442553 | consNonTx 5.1%, gene 1.7% |
consNonTx 0% - 50%, gene 50% - 80% (4 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr2:115500001-116000000 | chr2:116215329-116715328 | chr2:118201388-118701387 | consNonTx 6.2%, gene 2.3% |
chr18:61100001-61600000 | chr18:61234622-61734621 | chr18:61046295-61546294 | consNonTx 3.4%, gene 3.4% |
chr12:40500001-41000000 | chr12:40239443-40739442 | chr12:40056957-40556956 | consNonTx 1.7%, gene 3.1% |
chr2:196700001-197200000 | chr2:197214044-197714043 | chr2:198465410-198965409 | consNonTx 5.4%, gene 3.3% |
consNonTx 0% - 50%, gene 80% - 100% (11 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr2:232500001-233000000 | chr2:233173598-233673597 | chr2:234508167-235008166 | consNonTx 1.3%, gene 4.6% |
chr13:111900001-112400000 | chr13:107927238-108427237 | chr13:112376702-112876701 | consNonTx 1.1%, gene 5.5% |
chr21:36900001-37400000 | chr21:36983033-37483032 | chr21:39242992-39742991 | consNonTx 2.3%, gene 5.2% |
chr4:47800001-48300000 | chr4:48032776-48532775 | chr4:47573442-48073441 | consNonTx 1.9%, gene 4.4% |
consNonTx 50% - 80%, gene 0% - 50% (2 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr16:25300001-25800000 | chr16:25969826-26469825 | chr16:25800363-26300362 | consNonTx 9.7%, gene 0.5% |
chr5:141800001-142300000 | chr5:142482586-142982585 | chr5:141883116-142383115 | consNonTx 6.7%, gene 1.7% |
chr18:25400001-25900000 | chr18:25196197-25696196 | chr18:25353226-25853225 | consNonTx 7.4%, gene 0.9% |
chr4:124800001-125300000 | chr4:124166677-124666676 | chr4:124280349-124780348 | consNonTx 6.3%, gene 0.9% |
consNonTx 50% - 80%, gene 50% - 80% (4 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr5:56000001-56500000 | chr5:57392856-57892855 | chr5:55805775-56305774 | consNonTx 7.9%, gene 2.2% |
chr6:131800001-132300000 | chr6:132023965-132523964 | chr6:132111977-132611976 | consNonTx 6.9%, gene 2.1% |
chr6:73700001-74200000 | chr6:73699933-74199932 | chr6:73683390-74183389 | consNonTx 6.4%, gene 3.6% |
chr4:53700001-54200000 | chr4:53859184-54359183 | chr4:53728692-54228691 | consNonTx 9.0%, gene 2.1% |
consNonTx 50% - 80%, gene 80% - 100% (3 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr1:149000001-149500000 | chr1:146905332-147405331 | chr1:147933156-148433155 | consNonTx 10.2%, gene 8.4% |
chr9:122800001-123300000 | chr9:123331831-123831830 | chr9:125138972-125638971 | consNonTx 8.3%, gene 5.9% |
chr15:39100001-39600000 | chr15:36628619-37128618 | chr15:41311935-41810934 (manually placed) | |
chr17:33400001-33900000 | chr17:35665792-36165791 | chr17:33478638-33978637 | consNonTx 7.7%, gene 6.1% |
consNonTx 80% - 100%, gene 0% - 50% (3 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr14:51200001-51700000 | chr14:47673341-48173340 | chr14:51867364-52367363 | consNonTx 14.9%, gene 0.1% |
chr11:133100001-133600000 | chr11:132612235-133112234 | chr11:131133068-131633067 | consNonTx 13.5%, gene 0.3% |
chr16:52600001-53100000 | chr16:62362206-62862205 | chr16:62010885-62510884 | consNonTx 15.4%, gene 0.0% |
chrX:41900001-42400000 | chrX:42149253-42649252 | chrX:42714870-43214869 | consNonTx 13.4%, gene 0.7% |
consNonTx 80% - 100%, gene 50% - 80% (1 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr8:117800001-118300000 | chr8:118874200-119374199 | chr8:118481838-118981837 | consNonTx 11.4%, gene 3.2% |
chr14:96900001-97400000 | chr14:93204045-93704044 | chr14:97378512-97878511 | consNonTx 15.9%, gene 2.9% |
chrX:117500001-118000000 | chrX:119675382-120175381 | chrX:120734591-121234590 | consNonTx 10.7%, gene 2.0% |
chr6:108100001-108600000 | chr6:108287568-108787567 | chr6:108264834-108764833 | consNonTx 18.6%, gene 2.3% |
consNonTx 80% - 100%, gene 80% - 100% (1 manual) | |||
---|---|---|---|
June 2002 | November 2002 | April 2003 | stats |
chr2:218300001-218800000 | chr2:218998720-219498719 | chr2:220241365-220741364 | consNonTx 13.3%, gene 9.1% |
chr11:66700001-67200000 | chr11:65865884-66365883 | chr11:64434365-64934364 | consNonTx 13.4%, gene 9.0% |
chr20:33600001-34100000 | chr20:33559944-34059943 | chr20:34509944-35009943 | consNonTx 11.5%, gene 9.2% |
chr6:41300001-41800000 | chr6:41294331-41794330 | chr6:41299332-41799331 | consNonTx 15.2%, gene 4.8% |
chr9:124300001-124800000 | chr9:124831831-125331830 | chr9:126638972-127138971 | consNonTx 11.4%, gene 5.4% |
Here is the noncoding conservation and gene density of non-overlapping 500 kb regions in the manual picks. The boundaries between strata are:
measure | low 50% | middle 30% | high 20% |
---|---|---|---|
gene | 0.0-1.9% | 1.9-4.2% | 4.2-100% |
consNonTx | 0.0-6.3% | 6.3-10.6% | 10.6-100% |
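
As a small illustration of how these boundaries map a window's statistics onto a table section, here is a sketch in Python. The `classify` function and the constant names are hypothetical, and the handling of a value that falls exactly on a boundary (assigned here to the higher stratum) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Stratum boundaries in percent, taken from the table above.
GENE_BOUNDS = (1.9, 4.2)
CONS_BOUNDS = (6.3, 10.6)
LABELS = ("0% - 50%", "50% - 80%", "80% - 100%")


def classify(cons_nontx_pct, gene_pct):
    """Return the table-section label for one 500 kb window.

    Assumption: a value equal to a boundary belongs to the higher stratum.
    """
    cons_label = LABELS[bisect_right(CONS_BOUNDS, cons_nontx_pct)]
    gene_label = LABELS[bisect_right(GENE_BOUNDS, gene_pct)]
    return f"consNonTx {cons_label}, gene {gene_label}"


# The chr1 row listed with "consNonTx 10.2%, gene 8.4%":
print(classify(10.2, 8.4))  # consNonTx 50% - 80%, gene 80% - 100%
```
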
Here's the stratification of the other zoo-seq regions. I recommend picking 7q21.13 and 7q31.33 to round things out.
June 2002 | November 2002 | April 2003 | Chrom band |
---|---|---|---|
chr7:88319137-89433560 | chr7:88318937-89433360 | chr7:89381916-90496339 | 7q21.13 3 |
chr7:91589227-92559635 | chr7:91589027-92559435 | chr7:92652026-93622434 | 7q21.3 3 |
chr7:93650712-94868826 | chr7:93650512-94868626 | chr7:94713518-95931632 | 7q21.3 3 |
chr7:124556444-125719632 | chr7:124556244-125719432 | chr7:125619343-126782531 | 7q31.33 3 |
chr7:126427707-127330661 | chr7:126427507-127330461 | chr7:127490599-128393553 | 7q32.1 3 |
Gene density is defined as the percentage of bases covered by either Ensembl genes or human mRNA best BLAT alignments in the UCSC browser database.
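
A minimal sketch of this definition, assuming the Ensembl genes and mRNA alignments have already been reduced to a single list of intervals: bases covered by more than one annotation are counted once. The function name and the 0-based, half-open `(start, end)` coordinate convention are assumptions made for the example, not details from the source.

```python
def percent_covered(window_start, window_end, intervals):
    """Percentage of bases in [window_start, window_end) covered by any interval.

    intervals: iterable of (start, end) pairs, 0-based half-open (an assumption).
    Overlapping intervals are merged so that no base is counted twice.
    """
    # Keep only intervals that touch the window, clipped to its edges.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in intervals
        if start < window_end and end > window_start
    )

    covered = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Close the previous merged run and begin a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return 100.0 * covered / (window_end - window_start)
```

For example, in a 1,000-base window the intervals `(100, 200)`, `(150, 300)`, and `(900, 1100)` cover 300 distinct bases, giving a density of 30%.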
Nontranscribed conservation was measured by a fairly elaborate process. Non-overlapping 125-base subwindows were taken inside each 500,000-base window. Subwindows with less than 75% of their bases in a mouse alignment were thrown out. Of the remaining subwindows, the percentage with at least 80% base identity is used as the conservation score. To get the nontranscribed conservation score, the mouse alignments in regions corresponding to Ensembl genes, all GenBank mRNA blastz alignments, Fgenesh++ gene predictions, twinScan gene predictions, spliced EST alignments, and repeats were thrown out before scoring.
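
The scoring procedure can be sketched as follows. This is a simplified illustration, not the original pipeline: the per-base boolean lists, the function name, and the parameter names are hypothetical. It also assumes that base identity is measured over the aligned bases of a subwindow, and that "throwing out" alignments in masked regions means treating those bases as unaligned; the text does not settle either point.

```python
def conservation_score(aligned, identical, masked=None,
                       sub=125, min_aligned=0.75, min_identity=0.80):
    """Percentage of usable subwindows that are conserved.

    aligned[i]:   True if base i lies in a mouse alignment.
    identical[i]: True if base i matches the aligned mouse base.
    masked[i]:    True if base i lies in a gene, mRNA/EST alignment, gene
                  prediction, or repeat (pass None for the unmasked score).
    """
    kept = conserved = 0
    for start in range(0, len(aligned) - sub + 1, sub):
        # Aligned bases in this subwindow, ignoring masked ones.
        usable = [
            i for i in range(start, start + sub)
            if aligned[i] and not (masked is not None and masked[i])
        ]
        if len(usable) < min_aligned * sub:
            continue  # too little alignment: subwindow is thrown out
        kept += 1
        matches = sum(1 for i in usable if identical[i])
        if matches >= min_identity * len(usable):
            conserved += 1
    return 100.0 * conserved / kept if kept else 0.0
```

Calling the function with `masked=None` gives the plain conservation score, while supplying a mask gives the nontranscribed variant under the assumptions above.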