Tamir Tuller, Ph.D., Associate Professor and Head, Laboratory of Computational Systems and Synthetic Biology, Tel Aviv University, and his colleague Hadas Zur, Ph.D., Researcher, Department of Biomedical Engineering, Tel Aviv University, will be speaking on “Combining Synthetic Biology and Molecular Evolution for Optimizing and Understanding Gene Expression” at the 10th Annual Optimising Expression Platforms conference, 15-16 November 2017, as part of the 9th Annual PEGS Europe event in Lisbon, Portugal. Below is a recent interview with Dr. Tuller.

Tamir Tuller is an Associate Professor and head of the Laboratory of Computational Systems and Synthetic Biology at Tel Aviv University. He has a multidisciplinary academic background in engineering, life science, computer science, and medical science (four BSc degrees, two MSc degrees, and two Ph.D. titles). Prof. Tuller is the author of more than 110 peer-reviewed scientific articles and has received various awards and fellowships. His multidisciplinary research focuses on various aspects of gene expression, specifically mRNA translation. Among other goals, he aims to develop novel approaches for modeling and engineering gene expression, and employs synthetic biology tools to understand the way gene expression is encoded in the genetic material. Website: Laboratory of Computational Systems and Synthetic Biology

Using 'Big Data' and Synthetic Biology for
Gene Expression Modeling, Engineering, and Understanding


Why use ‘Big Data’ for gene expression modeling and engineering? What are the advantages?

The cost of protocols for measuring large-scale intracellular gene expression variables (e.g., mRNA levels, ribosome densities, transcription and translation rates, 3D genomic organization, DNA and RNA methylation, and more), many of them based on next-generation sequencing (NGS) technologies, is decreasing at an exponential rate. In addition, this type of data, along with genomic sequences, is accumulating for many organisms across the tree of life, in different conditions and tissues. Since each of these experiments reports the expression status of thousands of genes in a certain condition, they provide vast amounts of information about the way gene expression is encoded in all these genes.

Careful usage of these data, via filtering and modeling of the gene expression codes they contain, should yield models that connect the nucleotide composition in different parts of the genome to gene expression. These models can then be used for engineering gene expression: manipulating the genetic material to obtain a desired gene expression pattern.
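One classic example of such a "gene expression code" is codon usage bias. As an illustrative sketch only (not Dr. Tuller's actual pipeline), the Codon Adaptation Index (CAI) of Sharp and Li scores a coding sequence by how closely its codon choices match those estimated from a reference set of highly expressed genes:

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, built compactly (first base varies slowest; base order TCAG).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AAS)}

def codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def cai_weights(reference_seqs):
    """Relative adaptiveness w(c) = count(c) / max count among synonymous codons,
    estimated from a reference set of highly expressed genes."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    by_aa = {}
    for c, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa.setdefault(aa, []).append(c)
    w = {}
    for aa, syns in by_aa.items():
        m = max(counts.get(c, 0) for c in syns)
        for c in syns:
            w[c] = counts.get(c, 0) / m if m else 1.0
    return w

def cai(seq, w):
    """Codon Adaptation Index: geometric mean of the weights over the gene's codons."""
    vals = [w[c] for c in codons(seq)
            if CODON_TABLE.get(c, "*") != "*" and w.get(c, 0) > 0]
    return exp(sum(log(v) for v in vals) / len(vals)) if vals else 0.0
```

A real model of this kind would be fitted to measured expression data rather than a fixed reference set, but the sketch captures the basic idea of mapping nucleotide composition to an expression-related score.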


What challenges remain with the above approach?

The first challenge is the need to develop efficient algorithms for dealing with this huge amount of data (the output of a typical NGS experiment is dozens of gigabases). The second challenge is causality: finding associations between genomic features and gene expression is only the first stage; we must also decipher whether the relations are direct and causal, and this is impossible without additional experiments. The third challenge, related to the previous one, is the fact that endogenous genes differ (e.g., in length, amino acid bias, GC content, and more) and their evolution is affected by many variables, making it challenging to statistically decipher relations between pairs of variables. The fourth challenge is the non-trivial biases in NGS experiments; these biases need to be filtered out for accurate analysis and modeling of the data. Finally, the gene expression codes are often organism-specific; this fact should be considered when analyzing the data and developing the models.
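The third challenge above can be made concrete with a small statistical sketch. In this hypothetical example (synthetic data, not from any real genome), gene length drives both GC content and measured expression, inducing a spurious marginal correlation; regressing out the confounder (a partial correlation) reveals that no direct relation remains:

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson correlation of x and y after regressing out confounder z
    (ordinary least squares with an intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic genes: length influences both GC content and expression,
# but GC content has no direct effect on expression here.
rng = np.random.default_rng(0)
length = rng.normal(size=2000)
gc = 0.8 * length + rng.normal(size=2000)
expr = 0.8 * length + rng.normal(size=2000)

marginal = float(np.corrcoef(gc, expr)[0, 1])  # substantial spurious correlation
partial = partial_corr(gc, expr, length)       # near zero once length is controlled
```

Real analyses must control simultaneously for many such correlated gene features, which is why these statistical controls are usually tailored to the data and question at hand.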


What led you to consider synthetic biology as well as molecular evolution to understand gene expression?

The two approaches have complementary advantages. Molecular evolution techniques can be applied to all genes in all genomes to detect potential gene expression codes; however, from the associations this approach reports, it is hard to infer the strength, causality, and directionality of the relations.

A carefully designed synthetic biology experiment can provide a good quantitative estimate of the strength of the effect of a genomic sequence on gene expression, and of its causality. However, these results are based on a relatively small set of genes (those participating in the experiment).

Thus, by combining the two approaches, we can screen all genes while also quantifying the causality and the strength of the effects of the gene expression codes on gene expression.


What technologies and tools are particularly promising for analysis of endogenous gene expression data?

The analysis of endogenous gene expression data is performed with a large set of tools and models; this includes, among others: 1) mapping NGS data to different parts of the genome (see, for example, Langmead et al., Genome Biol. 2009; Martin, EMBnet.journal. 2011; Trapnell et al., Nature Protocols. 2012); 2) in-house computational models for the biophysics of gene expression steps (see, for example, Zur & Tuller, Nucleic Acids Res. 2016); 3) statistical approaches for inferring gene expression models (Zur & Tuller, Bioinformatics. 2015; Heinz et al., Mol. Cell. 2010; Love et al., Genome Biol. 2014); 4) in-house approaches for detecting selection for silent codes and for controlling for various alternative relations; usually these approaches should be tailored to the analyzed data and the studied question (see examples in Diament et al., Nature Commun. 2014).
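As a rough illustration of class (2) above, a widely used biophysical model of mRNA translation is the totally asymmetric simple exclusion process (TASEP), in which ribosomes initiate at the 5' end, hop codon-to-codon at site-specific rates, and exclude one another. The following is a generic textbook Monte Carlo version, not the group's in-house model:

```python
import random

def simulate_tasep(rates, init_rate, steps, seed=0):
    """Monte Carlo TASEP: ribosomes initiate with probability `init_rate`,
    hop forward with site-specific probabilities `rates`, and exclude one
    another. Returns the mean occupancy per site and the protein-production
    flux (terminations per step)."""
    rng = random.Random(seed)
    n = len(rates)
    occ = [0] * n          # 1 if a ribosome occupies the site
    density = [0.0] * n
    produced = 0
    for _ in range(steps):
        # Sweep 3' to 5' so each ribosome moves at most once per step.
        if occ[n - 1] and rng.random() < rates[n - 1]:
            occ[n - 1] = 0                    # termination: one protein made
            produced += 1
        for i in range(n - 2, -1, -1):
            if occ[i] and not occ[i + 1] and rng.random() < rates[i]:
                occ[i], occ[i + 1] = 0, 1     # hop to the next codon
        if not occ[0] and rng.random() < init_rate:
            occ[0] = 1                        # initiation at the 5' end
        for i in range(n):
            density[i] += occ[i]
    return [d / steps for d in density], produced / steps
```

Placing a single slow codon in an otherwise fast transcript produces the characteristic ribosome queue upstream of the bottleneck and a depleted region downstream, which is the kind of behavior such biophysical models are used to predict from sequence.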

To learn more about the work of Dr. Tuller’s group and the PEGS Europe Summit, visit PEGSummitEurope.com/Optimising-Protein-Expression