Overview

The Laboratory for Applied Computational Genomics generates and analyzes sequencing and imaging data to understand the molecular basis of cellular function in health and disease. The laboratory's work is about 70% dry (computational) and 30% wet (experimental). On the dry side, we perform bioinformatics analysis of genome, transcriptome, and chromatin conformation sequencing data. On the wet side, we perform electron microscope tomography for 3D genomics, visualizing gene regulatory processes in the cell nucleus.

Basic genomics

Transcription initiation

Cap Analysis Gene Expression (CAGE) is a unique transcriptome profiling method developed at RIKEN that identifies and quantifies transcription initiation at single-nucleotide resolution. Pinpointing the transcription start site (TSS) lets us delineate the promoter region accurately and search it for transcription factor binding sites. By combining these binding site predictions with the expression levels measured by CAGE, we infer the genome-wide regulatory activity of transcription factors (Suzuki et al., Nature Genetics, 2009; Alam et al., Genome Research, 2020).
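The idea of inferring transcription factor activity from promoter sequence and CAGE expression can be sketched as a linear model, in the spirit of the motif activity response analysis (MARA) used in FANTOM4. In this minimal sketch, the expression of promoter p in one sample is modeled as e_p ≈ Σ_m N_pm A_m, where N_pm is the number of predicted binding sites for motif m in promoter p and A_m is the unknown activity of motif m. We assume centered log expression for a single sample and plain ridge regression in place of the full Bayesian procedure; the function names and the `ridge` parameter are illustrative, not the API of any existing tool.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def motif_activities(sites: Matrix, expr: List[float],
                     ridge: float = 0.0) -> List[float]:
    """Estimate one activity per motif for a single sample (sketch).

    sites[p][m]: predicted binding sites of motif m in promoter p (N).
    expr[p]:     centered log expression of promoter p (e).
    Solves (N^T N + ridge * I) A = N^T e. A ridge > 0 shrinks the
    activities toward zero, standing in for a prior (assumption).
    """
    k = len(sites[0])
    ntn = [[sum(row[i] * row[j] for row in sites) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    nte = [sum(row[i] * e for row, e in zip(sites, expr)) for i in range(k)]
    return solve(ntn, nte)


# Toy data: 4 promoters, 2 motifs. Expression was generated from the
# activities A = [0.5, -1.0] without noise, so least squares recovers them.
N = [[1, 0], [0, 1], [1, 1], [2, 0]]
e = [0.5, -1.0, -0.5, 1.0]
print(motif_activities(N, e))              # close to [0.5, -1.0]
print(motif_activities(N, e, ridge=10.0))  # shrunk toward zero
```

Fitting one such model per sample yields an activity profile for each motif across samples, which is what makes it possible to compare transcription factor activity between conditions or patients.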

MicroRNA promoter discovery

CAGE captures transcripts with a 5’ cap, including mRNAs, long noncoding RNAs, enhancer RNAs, and primary miRNA transcripts. In FANTOM5, we used CAGE to identify the exact transcription start sites of primary miRNAs, enabling a detailed analysis of the regulatory control signals located in miRNA promoters (De Rie et al., Nature Biotechnology, 2017). This miRNA promoter atlas was later used to analyze GWAS signals in miRNA-target gene networks and to screen miRNAs as candidate biomarkers (Sakaue et al., Nucleic Acids Research, 2018).

Transcription start site of miR-22 as determined by CAGE in FANTOM5.
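How CAGE pinpoints a transcription start site can be illustrated by counting the 5’ ends of mapped reads at each genomic position. This is a minimal sketch, assuming reads are given as hypothetical (chrom, start, end, strand) tuples with 0-based, half-open coordinates; real CAGE pipelines additionally handle mapping quality and cluster nearby TSSs into promoter-level peaks. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand), 0-based half-open
Site = Tuple[str, int, str]       # (chrom, 5' end position, strand)


def tss_counts(reads: Iterable[Read]) -> Counter:
    """Count CAGE tags at each 5' end (single-nucleotide resolution)."""
    counts: Counter = Counter()
    for chrom, start, end, strand in reads:
        # The 5' end is the leftmost base on '+' and the rightmost base on '-'.
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, five_prime, strand)] += 1
    return counts


def tags_per_million(counts: Counter) -> Dict[Site, float]:
    """Normalize raw tag counts by library size (tags per million)."""
    total = sum(counts.values())
    return {site: 1e6 * n / total for site, n in counts.items()}


def dominant_tss(counts: Counter, chrom: str, lo: int, hi: int,
                 strand: str) -> Optional[int]:
    """Position with the most tags in [lo, hi) on one strand, or None."""
    window = {pos: n for (c, pos, s), n in counts.items()
              if c == chrom and s == strand and lo <= pos < hi}
    return max(window, key=window.get) if window else None


reads = [("chr1", 100, 130, "+"), ("chr1", 100, 128, "+"),
         ("chr1", 102, 140, "+"), ("chr1", 470, 500, "-")]
counts = tss_counts(reads)
print(dominant_tss(counts, "chr1", 90, 120, "+"))    # 100
print(tags_per_million(counts)[("chr1", 100, "+")])  # 500000.0
```

Because each read contributes exactly one position, the same counts serve both purposes described above: the most frequently used position marks the TSS, and the normalized counts quantify expression from that promoter.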

 

Enhancers and short capped RNAs

Enhancers are regulatory elements in DNA that can activate the expression of genes over genomic distances of up to megabases. We developed a new library preparation protocol that selectively captures short capped RNAs, including enhancer RNAs, which allowed us to identify more than 10,000 enhancers in a single cell line. The short capped RNA data also revealed truncated transcripts terminating at splice sites, which may reflect a currently unknown quality control or regulatory mechanism (De Hoon et al., Genome Research, 2022).

Truncated transcripts at the human cystatin B (CSTB) gene.
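A widely used signature for calling enhancers from the 5’ ends of capped RNAs, applied for example in the FANTOM5 enhancer atlas, is divergent (bidirectional) transcription: a minus-strand and a plus-strand initiation site close together, transcribing away from each other. Whether the short capped RNA analysis above uses exactly this rule is not stated here, so treat the following as a hypothetical sketch; `max_gap` and `min_balance` are illustrative thresholds, not published parameters.

```python
from typing import List, Tuple

Cluster = Tuple[str, int, int]            # (chrom, TSS position, tag count > 0)
Call = Tuple[str, int, int, int, float]   # (chrom, minus, plus, midpoint, balance)


def divergent_pairs(minus: List[Cluster], plus: List[Cluster],
                    max_gap: int = 400,
                    min_balance: float = 0.5) -> List[Call]:
    """Call candidate enhancers from divergent transcription (hypothetical rule).

    A minus-strand cluster must lie upstream of a plus-strand cluster on the
    same chromosome, at most max_gap bp apart, so that the two transcripts
    point away from each other. balance = min/max of the tag counts, where
    1.0 means both directions are equally transcribed. Tag counts are
    assumed to be positive.
    """
    calls: List[Call] = []
    for m_chrom, m_pos, m_n in minus:
        for p_chrom, p_pos, p_n in plus:
            gap = p_pos - m_pos
            if m_chrom != p_chrom or not 0 < gap <= max_gap:
                continue  # different chromosome, convergent, or too far apart
            balance = min(m_n, p_n) / max(m_n, p_n)
            if balance >= min_balance:
                calls.append((m_chrom, m_pos, p_pos, (m_pos + p_pos) // 2, balance))
    return calls


minus = [("chr2", 1000, 8), ("chr2", 5000, 10)]
plus = [("chr2", 1200, 10), ("chr2", 9000, 3)]
print(divergent_pairs(minus, plus))  # [('chr2', 1000, 1200, 1100, 0.8)]
```

The nested loop keeps the sketch short; for genome-scale data one would sort clusters by position and scan each chromosome once.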

 

Noncoding RNA

Most disease-associated genetic variants are located in noncoding regions of the genome. Pervasive transcription of the genome produces an abundance of noncoding RNAs, particularly in higher organisms, yet the function of 95% of long noncoding RNA (lncRNA) genes remains unknown. In FANTOM6, we generated functional annotations of lncRNAs by analyzing the cellular and transcriptomic responses after knocking down specific lncRNAs with LNA GapmeR antisense oligonucleotides (Ramilowski et al., Genome Research, 2020).

Chromatin structure

Because lncRNAs tend to be enriched in the nucleus, they may regulate genes by diffusing through the nucleus and interacting with chromatin regions that are nearby in 3D space. We functionally annotated more than 10,000 lncRNAs based on their physical association with protein-coding genes as observed in chromatin conformation data. RNA-chromatin and RNA-protein interaction data suggest that lncRNAs act as scaffolds that guide regulatory proteins to their genomic target sites (Agrawal et al., PLOS ONE, 2024). We also developed ZENBU-Reports for the interactive visualization and analysis of large next-generation sequencing datasets and for sharing them with the community (Severin et al., NAR Genomics and Bioinformatics, 2023).

Annotation of long noncoding RNA LINC02980 (ENSG00000272462) based on chromatin interactions.

Applied genomics for biomedicine

Genomics for personalized medicine

We apply bioinformatics analysis to biomedicine in close collaboration with biomedical scientists. In cancer and other complex diseases, the genomic causes of the disease vary between patients, even when their phenotypes appear similar. An important goal of personalized medicine is to predict which drugs are most likely to be effective in each patient, based on the patient's specific inherited or acquired genomic abnormalities. Using CAGE, we found that the motif activity of TP53, that is, its regulatory activity inferred from the expression of promoters containing its binding motif, is associated with drug response in acute myeloid leukemia (Hashimoto et al., Nature Cancer, 2021).

Correlation between TP53 motif activity and the in vitro responsiveness of acute myeloid leukemia cells to the drug AZD5582.
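An association like the one in this figure can be quantified with a rank correlation across patient samples. The sketch below computes Spearman's rank correlation between per-sample motif activities and drug response scores. The choice of Spearman rather than another statistic, the function names, and all values are assumptions for illustration; they are not the analysis or data of Hashimoto et al.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values (not real data).
tp53_activity = [-1.2, -0.3, 0.1, 0.8, 1.5]
drug_response = [0.9, 0.7, 0.75, 0.4, 0.2]
print(round(spearman(tp53_activity, drug_response), 3))  # -0.9
```

A rank-based statistic is a reasonable default here because it only assumes a monotonic relationship and is less sensitive to outlier samples than a linear correlation.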

 

RNA processing in cancer

By sequencing short RNAs after knocking down specific RNA processing enzymes, we revealed the degradation pathway of the oncomiR miR-21, the most extensively studied microRNA owing to its importance in cancer (Boele et al., PNAS, 2014).

Adenylation and degradation of oncomiR miR-21 in cancer.

3D genomics

Sequencing-based genomics excels at creating atlases of the genome and catalogs of biomolecules, and at finding global patterns among them using statistical methods. But what is the biophysical reality behind the statistics? Progress in molecular biology and its application in biomedicine often requires understanding the precise molecular mechanisms governing specific genes, for example, to elucidate key regulatory pathways in cell differentiation or to identify crucial oncogenes that anti-cancer drugs could target in personalized medicine. To enable detailed studies of these regulatory mechanisms, we are developing methods to image the 3D structure of the genome and the location of regulatory biomolecules in the nuclear environment.

Our goal is to develop 3D genomics as a new field that explains genome function at the biophysical level by integrating sequencing data analysis with electron microscopy (EM) imaging of the nucleus, taking advantage of the resolution revolution in genomics and genetics. Specifically, we aim to generate an annotated image of the 3D regulatory environment of a gene. This requires experimental methods to stain and label specific molecules in the nucleus, as well as computational methods for automated image analysis and annotation. By complementing sequencing with EM imaging, 3D genomics aims to explain genome function in general, and promoter and enhancer dynamics in particular, at the biophysical level.