New Worlds: Big Data and the Social Character of Genes



Researchers at the University of Haifa have narrowed down, from 900 million to just 340,000, the number of possibilities that must be examined when studying the connection between genetic markers and gene expression. In the process, they identified “social” genes that play a cooperative role. The study was just published in the journal PLOS ONE.

The new study used “big-data” analytical methods to reveal the “social character” of genes – a phenomenon in certain diseases whereby genes operate jointly rather than independently.

“The problem is that the possible number of combinations of different genes is enormous, and it is almost impossible to examine them all effectively and reliably,” the researchers explained. “Our study offers a solution to this problem.” The study, which was undertaken as part of a master’s thesis by Pavel Goldstein from the University’s Department of Statistics, was headed by Dr. Anat Reiner-Benaim. It proposes a new method for discovering complex and rare genetic effects that form part of the mechanisms underlying complex disorders such as autoimmune diseases.

The study focuses on the connection between genetic markers – DNA segments situated along the genome that effectively represent genes – and the expression (the creation of the proteins they encode) of different genes. Various studies over recent years have shown that in complex biological mechanisms, such as those in most diseases, the genetic expression is not the product of the action of a single marker, but rather of a combination of several markers, some close to the location of the gene on the DNA chain and others more distant.

In the Human Genome Project, for example, the researchers initially found that some 98 percent of the human genome consists of DNA that does not appear to “do” anything. However, it later emerged that some of this DNA is in fact active – not independently, but as part of a network of genes. Thus the influence of a given genetic marker may depend on the influence of other markers – a phenomenon known as epistasis.
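A toy illustration may help make epistasis concrete. The sketch below is not taken from the study: the function name `interaction_contrast` and all expression values are hypothetical. It compares the effect of one marker on a gene's mean expression when a second marker is absent versus present. A contrast of zero means the two markers act independently; a nonzero contrast means the effect of one depends on the other.

```python
def interaction_contrast(mean_expr):
    """Return the interaction contrast for two binary markers A and B.

    mean_expr[(a, b)] is the (hypothetical) mean expression level of a gene
    for individuals with marker A in state a and marker B in state b.
    """
    effect_a_when_b0 = mean_expr[(1, 0)] - mean_expr[(0, 0)]
    effect_a_when_b1 = mean_expr[(1, 1)] - mean_expr[(0, 1)]
    return effect_a_when_b1 - effect_a_when_b0


# Made-up values: each marker shifts expression on its own, independently.
additive = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.5, (1, 1): 2.5}

# Made-up values: expression rises only when both markers are present.
epistatic = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 3.0}

additive_contrast = interaction_contrast(additive)
epistatic_contrast = interaction_contrast(epistatic)
print(additive_contrast, epistatic_contrast)
```

In the second table neither marker changes expression by itself, so a search that tests markers one at a time would miss the effect entirely.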

The problem is that the theoretical number of combinations in which different genes could cooperate is enormous: it is the product of the huge number of potential connections between markers and the number of gene expressions that those connections could affect.
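The arithmetic behind this explosion can be sketched in a few lines. The marker and expression counts below are purely illustrative assumptions, not the figures from the study, and the helper name `pairwise_test_count` is hypothetical. The sketch counts every pair of markers against every gene expression, before and after both lists are shortened.

```python
from math import comb


def pairwise_test_count(n_markers: int, n_expressions: int) -> int:
    """Count (marker pair, gene expression) combinations to examine."""
    return comb(n_markers, 2) * n_expressions


# Illustrative numbers only (assumptions, not the study's data).
full = pairwise_test_count(n_markers=1000, n_expressions=500)

# Suppose filtering keeps 100 markers and clustering leaves 50 expression groups.
reduced = pairwise_test_count(n_markers=100, n_expressions=50)

print(f"full search:    {full:,} combinations")
print(f"reduced search: {reduced:,} combinations")
```

Because the count grows with the square of the number of markers, trimming the marker list pays off far more than proportionally.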

Thus it is difficult to decide where to look for these connections.

The Haifa researchers propose a new method of calculation that significantly reduces the number of possibilities, making the identification of interactions between genes feasible. The method shrinks the number of markers by applying a hierarchical filter that singles out DNA areas containing at least one epistatic effect, enabling research to focus solely on the genetic markers within these areas. It also reduces the number of gene expressions by clustering similar expressions together. The proposed method was applied to the genome of the thale cress plant (Arabidopsis thaliana).
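The clustering step can be pictured with a minimal sketch. This is not the researchers' algorithm, whose details the article does not give. It is a simple greedy grouping by Pearson correlation, and the gene names, expression values, threshold, and function names are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster_expressions(profiles, threshold=0.9):
    """Greedy grouping: an expression profile joins the first cluster whose
    representative (its first member) it correlates with at or above
    `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for name, values in profiles.items():
        for members in clusters:
            if pearson(values, profiles[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Made-up expression levels of four genes measured in five samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [1.1, 2.1, 2.9, 4.2, 5.0],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "gene_d": [4.9, 1.2, 4.1, 1.8, 3.1],
}
clusters = cluster_expressions(profiles)
print(clusters)
```

Each cluster can then be tested as a single unit, so four expressions become two in this toy case.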
