The Information Content of Glutamine-Rich Sequences Defines Protein Functional Characteristics
The presence of abnormally expanded glutamine (Q) repeats within specific proteins (e.g., huntingtin) is the well-established cause of several neurodegenerative diseases, including Huntington disease and spinocerebellar ataxias. However, the impact of “expanded Q” stretches on protein function is not well understood, mostly because of a lack of knowledge about the physiological role of Q repeats and the mechanism by which these repeats achieve functional specificity. Indeed, it is intriguing that regions with such low complexity (low information content) can display exquisite functional specificity, prompting the question: where is this information stored? Applying biochemical/structural constraints and statistical analysis of protein composition, we identified Q-rich (QR) regions present in coiled coils of yeast transcription factors and endocytic proteins. Our analysis indicated the existence of non-Q amino acids (AAs) that are differentially enriched in, or excluded from, QR regions in one protein group versus the other. Importantly, when the non-Q AAs of an endocytic protein were replaced with those enriched in the QR regions of transcription factors, the resulting protein was unable to localize to the plasma membrane and was instead found in the nucleus. These results indicate that while QR repeats can efficiently engage in binding, the non-Q AAs provide essential specificity information. We speculate that coupling low-complexity regions with information-intensive determinants might be a strategy used in many protein systems involved in different biological processes.
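The abstract does not state which statistics were used. The sketch below only illustrates two ideas it relies on, under stated assumptions: "low information content" measured as the Shannon entropy of a segment's amino-acid composition, and differential enrichment of non-Q residues between two groups of Q-rich segments, expressed as a log2 ratio of pseudocount-smoothed frequencies. The function names, the pseudocount, and the toy sequences are hypothetical illustrations, not the authors' pipeline or data.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of a sequence's amino-acid composition.

    A pure poly-Q stretch scores 0; a segment using all 20 residues equally
    scores log2(20), about 4.32 bits. Low values indicate low complexity.
    """
    if not seq:
        return 0.0
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def non_q_composition(seqs: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Relative frequency of each non-Q amino acid pooled over seqs.

    Q is excluded so the comparison reflects only the "flanking" residues.
    A pseudocount (hypothetical choice) avoids zero frequencies.
    """
    alphabet = [aa for aa in AMINO_ACIDS if aa != "Q"]
    counts = Counter(aa for s in seqs for aa in s if aa != "Q")
    total = sum(counts[aa] for aa in alphabet) + pseudocount * len(alphabet)
    return {aa: (counts[aa] + pseudocount) / total for aa in alphabet}


def log2_enrichment(group_a: list[str], group_b: list[str],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """log2(freq in group A / freq in group B) for each non-Q amino acid.

    Positive values: residue enriched in A relative to B; negative: depleted.
    """
    fa = non_q_composition(group_a, pseudocount)
    fb = non_q_composition(group_b, pseudocount)
    return {aa: math.log2(fa[aa] / fb[aa]) for aa in fa}


if __name__ == "__main__":
    # Toy, made-up Q-rich segments standing in for two protein groups.
    group_a = ["QQQNQQQSQQQN"]
    group_b = ["QQQAQQQLQQQE"]
    print(shannon_entropy("QQQQQQQQ"))
    enrichment = log2_enrichment(group_a, group_b)
    print({aa: round(v, 2) for aa, v in enrichment.items() if v != 0})
```

In this toy run N scores positive (enriched in group A) and A, L, and E score negative, mirroring in miniature the kind of "enriched versus excluded" contrast the abstract describes; a real analysis would also need a significance test and background correction, which this sketch omits.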