Traditionally, the choice of codons used to encode the amino acid sequences of proteins has been assumed to be neutral in terms of fitness. We now know that synonymous substitutions, which do not change amino acid identities, can strongly influence the function of proteins. This is because codon choice affects mRNA stability, expression levels, protein folding rates, and how the protein folds as it emerges from the ribosome. We study how codon choice correlates with the three-dimensional structure of proteins and how codon usage evolves. Our work has resulted in a database that links codon usage to 3D protein structure and other features. This database can be mined to identify correlations between codon usage and structural features of proteins, such as secondary structure. We have also built large language models of codon sequences to understand which features control codon usage in coding sequences.
Project: We would like to develop better ways to visualize our data by creating a webpage that displays the contents of our database. This includes mapping codon sequence biases onto 3D structures, visualizing codon conservation in protein families, and mapping codon usage onto species phylogenetic trees.
Desired background: The candidate should have a strong interest in visualization and a solid background in Python programming.
Environment: In addition to research on codon usage, the research group (andrelab.lu.se) also works on protein structure prediction, computational protein design, and protein evolution. The group uses a range of computational approaches, from deep learning to molecular simulation, and complements its computational projects with experiments, providing a multidisciplinary environment in which a thesis student can learn.
Contact: If this sounds interesting to you, contact Ingemar André, ingemar.andre@biochemistry.lu.se, for further information.