We present the Homo Sapiens Comprehensive Model Collection (HOCOMOCO) of transcription factor (TF) binding models obtained by careful integration of data from different sources. HOCOMOCO contains 426 non-redundant curated binding models for 401 human TFs.

DNA sequences of TF binding regions, obtained by both pre-genomic and high-throughput methods, were collected from existing databases and other public sources. Positional weight matrices (PWMs) were constructed with the ChIPMunk software. Four motif discovery strategies were tested, each using a different motif shape prior, including a flat prior and a periodic prior associated with the DNA helix pitch. Each model was manually assigned a quality rating based on its consistency with known binding preferences. The most appropriate TF binding site (TFBS) model was then selected for each TF, ensuring that related TFs received similar models.
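For readers unfamiliar with PWMs, the sketch below shows the general idea of turning a set of aligned binding segments into a log-odds matrix and scanning a sequence with it. It is a minimal illustration, not ChIPMunk's algorithm. It assumes the sites are already aligned, of equal length, and contain only A/C/G/T, and it assumes a uniform background and a pseudocount of 1. The function names and the toy sites are hypothetical.

```python
import math

ALPHABET = "ACGT"


def count_matrix(sites):
    """Per-position nucleotide counts from pre-aligned, equal-length sites (hypothetical helper)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [{b: 0 for b in ALPHABET} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1
    return counts


def log_odds_pwm(counts, pseudocount=1.0, background=None):
    """Convert counts to natural-log odds against a background (uniform by assumption)."""
    background = background or {b: 0.25 for b in ALPHABET}
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount
        # Pseudocount is distributed across bases in proportion to the background.
        pwm.append({
            b: math.log((column[b] + pseudocount * background[b]) / total / background[b])
            for b in ALPHABET
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for a window of the PWM's length."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def best_hit(pwm, seq):
    """Return (score, start) of the best-scoring window on the given strand only."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))


# Toy, made-up sites purely for illustration.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = log_odds_pwm(count_matrix(sites))
print(best_hit(pwm, "GGGTGACTCAGGG"))
```

The best window here starts at position 3 (`TGACTCA`), the consensus of the toy sites. A real pipeline would also scan the reverse complement and choose a score threshold.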

As a rule, only one model per TF was selected, unless there was additional evidence for two distinct binding models or for different stable modes of dimerization. All TFBS models and the initial binding segment data used for motif discovery were mapped to UniProt IDs.

HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ. Nucleic Acids Res. 2013 Jan;41(Database issue):D195-202. doi: 10.1093/nar/gks1089.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences
Vavilov Institute of General Genetics, Russian Academy of Sciences
King Abdullah University of Science and Technology