A workshop on Hi-C data will be held at INSA Toulouse, from the 4th (pm) to the 5th (am) December 2019.
The precise location of the symposium is room 109, 1st floor of building 20 on this map.
This workshop aims at bringing together statisticians, bioinformaticians and biologists interested in the topic of chromatin conformation and Hi-C data. The following speakers have already confirmed their participation:
- Frédéric Bantignies (IGH, Montpellier, France)
- Nicolas Servant (Institut Curie, Paris, France)
- Marco Di Stefano (CNAG-CRG, Barcelona, Spain)
- 13h30-14h: Welcoming and opening
- 14h-15h: Nicolas Servant (Institut Curie, Paris, France) Efficient processing of Hi-C data and application to cancer
Over the past decade, major advances in high-throughput sequencing have allowed the development of new epigenetics approaches. Among them, the Hi-C technique was proposed as a genome-wide method to explore chromatin organization in three dimensions (3D). Since then, the spatial organization of the genome and the physical interactions occurring within and between chromosomes have been described as key factors of gene regulation and genome function in general.
However, as with any genome-wide sequencing data, Hi-C usually requires several million to billions of paired-end sequencing reads, depending on genome size and on the desired resolution. Managing these data thus requires optimized bioinformatics workflows able to extract the contact frequencies in reasonable computational time and with reasonable resource and storage requirements. In this context, a couple of years ago we developed HiC-Pro (https://github.com/nservant/HiC-Pro), an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. Today, I would like to focus on a new collaborative project called nf-core-hic (https://github.com/nf-core/hic), a Nextflow-based pipeline for Hi-C data analysis. The current version of nf-core-hic is dedicated to data processing, but in the coming months we would like to encourage the community to further develop this pipeline by including additional analytical steps.
In addition, I will discuss the current computational challenges that emerge when Hi-C is applied to cancer cells. Given the important recent insights that chromosome conformation techniques have provided into 3D genome organization in a normal context, applying such approaches to a disease context offers the possibility to further explore the genome organization of cancer cells and its impact on cell regulation. I will demonstrate why Hi-C cancer data require dedicated normalization methods, and how we can solve these issues through two recent normalization methods that we have developed for this purpose.
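To make the notion of a contact map at a given resolution concrete, here is a minimal sketch of the binning step that turns read-pair positions into contact counts. It is not taken from HiC-Pro or nf-core-hic: the function names, the input format (one tuple of positions per valid read pair on a single chromosome) and the toy values are assumptions chosen for illustration.

```python
from collections import Counter


def bin_contacts(pairs, resolution):
    """Count contacts between genomic bins of fixed size.

    pairs: iterable of (pos1, pos2) base-pair positions on one chromosome
           (hypothetical input: one tuple per valid read pair).
    resolution: bin size in base pairs.
    Returns a Counter mapping (bin_i, bin_j), with bin_i <= bin_j, to a count.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        i, j = pos1 // resolution, pos2 // resolution
        if i > j:
            i, j = j, i  # the map is symmetric, so only the upper triangle is stored
        counts[(i, j)] += 1
    return counts


def to_dense(counts, n_bins):
    """Expand the sparse upper triangle into a symmetric dense matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), c in counts.items():
        matrix[i][j] = c
        matrix[j][i] = c
    return matrix


# Toy example: four read pairs on a 100 kb region, binned at 50 kb.
pairs = [(1_200, 48_000), (3_500, 4_100), (52_000, 2_000), (98_000, 99_500)]
counts = bin_contacts(pairs, resolution=50_000)
contact_map = to_dense(counts, n_bins=2)
print(contact_map)  # [[2, 1], [1, 1]]
```

A smaller bin size spreads the same reads over more cells of the matrix, which is why the number of reads needed grows with the desired resolution.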
- 15h-15h30: Vera Pancaldi (CRCT, Toulouse, France) Chromatin 3D organization principles revealed by network theory: gene-regulation, replication, and beyond
Recent technological advances have allowed us to map chromatin conformation and uncover the spatial organization of the genome inside the nucleus. These experiments have revealed the complexities of genome folding, characterized by the presence of loops and domains at different scales which can change across development and cell types. Many approaches have been employed to describe 3D genome organization, which can be broadly divided into polymer physics models, constraint-based models and statistical approaches. An increasingly popular representation of chromatin is given by networks, in which genomic fragments are the nodes and connections represent experimentally observed spatial proximity of two genomically distant regions. This formalism, applied to promoter-centred chromatin interaction networks generated by promoter capture Hi-C (PCHiC), has allowed us to consider a variety of chromatin features in association with the 3D structure. In particular, we exploited a popular network metric to define Chromatin Assortativity: the tendency for regions of chromatin with similar properties to preferentially interact with each other. In addition to recapitulating known results, measuring chromatin assortativity of tens of features in mouse embryonic stem cells led us to novel biological insight on gene regulation.
Moreover, we have characterized DNA replication in a 3D chromatin context, generating novel maps of replication origins in mouse embryonic stem cells under normal conditions and during DNA replication stress. These origins were then contextualized by projection on a promoter-centred chromatin contact network defined at a resolution of a few kb. We found that replication origins with similar efficiency and genomic regions of similar replication timing interact with each other preferentially. These findings suggest that DNA replication takes place in the context of hierarchical multi-scale structures spanning tens of megabases and even bridging chromosomes. More specifically, origins that interact with others tend to replicate earlier and with higher efficiency. The changes of origin activation patterns in normal and stressed conditions support a stochastic model of activation in which both local and global chromatin properties modulate efficiency.
Finally, we propose tools to investigate chromatin organization at different scales using networks, in particular an R package and an online chromatin network interaction viewer building on this framework. The ChAseR package allows users to efficiently integrate genome-wide datasets or lists of genomic regions with 3D chromatin interaction networks. It then computes the Chromatin Assortativity of these features, highlighting the ones that are most strongly associated with genome architecture and performing different kinds of randomizations to assess the significance of these associations. Furthermore, we have developed GARDEN-NET (https://pancaldi.bsc.es/garden-net), a web portal where users can visualize multiple chromatin networks (>10 human PCHiC datasets and mouse embryonic stem cell PCHiC so far) in combination with pre-loaded chromatin features (histone modification peaks etc.) and can upload their own chromatin features of interest.
We will conclude by reflecting on general organization principles in genome architecture that can be revealed by applying this formalism.
Pancaldi et al. Genome Biology 17(1), 152 (2016)
Jodkowska, Pancaldi et al. bioRxiv 644971 (2019)
Madrid-Mencia, Raineri and Pancaldi. bioRxiv 717298 (2019)
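The idea of assortativity mentioned in this abstract can be illustrated on a toy network. The sketch below assumes the standard assortativity coefficient for a numeric node attribute, that is, the Pearson correlation of the attribute across the two ends of every edge. It is written in plain Python for illustration and is not the ChAseR API; the fragment names and feature values are made up.

```python
def assortativity(edges, feature):
    """Pearson correlation of a node feature across the two ends of each edge.

    edges: list of (u, v) pairs of an undirected network
           (here, pairs of interacting chromatin fragments).
    feature: dict mapping node -> numeric value (e.g. a histone mark signal).
    Each edge is counted in both directions so that the measure is symmetric.
    Assumes the feature is not constant over the edge ends (non-zero variance).
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [feature[u], feature[v]]
        ys += [feature[v], feature[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


# Hypothetical feature: 1 = fragment carries the mark, 0 = it does not.
feature = {"a": 1, "b": 1, "c": 0, "d": 0}

# Marked fragments contact each other: perfectly assortative.
print(assortativity([("a", "b"), ("c", "d")], feature))  # 1.0

# Marked fragments only contact unmarked ones: perfectly disassortative.
print(assortativity([("a", "c"), ("b", "d")], feature))  # -1.0
```

Values near 1 indicate that fragments with similar feature values tend to be in contact, values near -1 the opposite, and values near 0 no association between the feature and the network structure.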
- 15h30-16h00: Coffee break
- 16h-17h: Frédéric Bantignies (IGH, Montpellier, France) Super-resolution imaging reveals principles of physical chromatin folding in eukaryotes
The recent application of the high-throughput Chromosome Conformation Capture (Hi-C) method has revealed that the genome of many species is organized into domains of preferential internal chromatin interactions, commonly named "Topologically Associating Domains" (TADs). The presence of these domains emerged as a key feature of higher-order genome organization, and they have been proposed to define regulatory landscapes through the spatial regulation of chromatin contacts between genes and cis-regulatory elements such as enhancers. However, Hi-C data generally represent averaged interaction profiles coming from millions of cells, which makes it difficult to characterize the physical nature of TADs. Using a combination of DNA Fluorescence In Situ Hybridization and super-resolution 3D-Structured Illumination Microscopy, we imaged a large number of chromosomal loci in individual cells at sub-diffraction resolution and uncovered general features of TAD structural properties. In Drosophila, in which TADs correspond to chromatin epigenetic landscapes, we observed that repressed TADs form discrete nanocompartments interspersed with decondensed active chromatin. Single-cell analysis revealed that Drosophila TADs form dynamic yet physically insulated genome units, consistent with a steady segregation of active and repressed chromatin. These results support a physical basis for chromosomal domains in the regulation of DNA-dependent processes. Given the diversity of TAD features across species and during cell differentiation, we are currently investigating the principles underlying TAD physical folding during mouse embryonic stem cell differentiation. This study will shed light on the mechanisms of chromatin folding and provide new insights into the relationship between the structure and the function of TADs.
- 17h-17h30: Maria Marti-Marimon (CNAG-CRG, Barcelona, Spain) Major reorganization of chromosome conformation during late muscle development
The three-dimensional organization of the genome plays a major role in the regulation of gene expression. Chromosome territories, compartments, topological domains and loops are the main features of genome topology. Most of these features are quite stable, ensuring a suitable niche for maintaining either transcriptional activation or repression. However, the structural plasticity of chromatin also permits conformational changes that may lead to alterations in transcriptional activity. These dynamic changes are particularly remarkable during the gene expression reprogramming occurring in early development (i.e. zygote genome activation, transition from pluripotent to lineage-committed cells, and cell differentiation).
However, these dynamic events remain poorly understood, especially those concerning late development and tissue maturity processes. Our study offers new insights into the 3D genome organization dynamics at late gestation in mammals. More precisely, we addressed the global genome organization of porcine muscle nuclei at 90 and 110 days of gestation by performing in situ Hi-C experiments. This stage of gestation is a relevant period for porcine muscle development and maturity, as already shown in a previous transcriptome study. We obtained evidence of important topological changes in the 3D genome structure during this period that are associated with variations in gene expression. These dynamic changes correspond to a global fragmentation of the genome, switches of compartment type, differential chromatin interactions and dynamics of the telomeric regions. Overall, our study shows that extensive conformational changes occur in late development even though the gene expression program does not change as much as during early development.
- 9h-10h: Marco Di Stefano (CNAG-CRG, Barcelona, Spain) Exploring the dimensions of the genome organization: 1D chromatin tracks and 2D interaction maps for generating 4D models
The characterization of the genome structure and functional state has been boosted by the concomitant development of different genomic techniques. Each of these experiments gives a different layer of information from the localization along the (1D) genome sequence of histone epigenetic modifications to the frequency of interactions in the 3D space of specific loci. The interpretation and integration of these layers have been facilitated by complementary computational techniques. We contributed mostly to the computational efforts and, during my talk, I will discuss the latest developments.
Firstly, I will cover our attempts to reconstruct 3D models of the entire A. thaliana genome. The characterization of the genome organization in this plant is challenging because it presents specific features that still lack a seamless interpretation in terms of biophysical mechanisms. These include the preferential positioning of various structural features, such as the nucleolus in the nuclear centre, and the telomeres and the centromeres at the nucleolar periphery. The 2D Hi-C interaction maps also unveiled specific contact patterns, such as stripes and strongly interacting intra- and inter-chromosome (IHIs) regions. We tested whether the integration of 1D epigenetic tracks in physical models of chromosomes can unveil basic physical principles that recapitulate all these structural features. We partitioned the genome into epigenetic states and applied simple short-range interactions between them in molecular dynamics simulations of 3D chromosome models. Interestingly, we found that applying attractions within the eu- and the heterochromatic regions, and repulsions between heterochromatin and the other chromatin states (euchromatin and polycomb-like), can produce 3D models that account for almost all the genomic structural features, showing an intrinsic interplay between epigenetic states and 3D genome structure in A. thaliana.
Then, I will present TADdyn, our novel computational tool to characterise the genome structural organization in 4D. TADdyn combines a polymer-based chromatin representation with time-series 2D Hi-C datasets, which makes it possible to study how chromosome regions rearrange over time. We implemented and used several measures to characterize the structure and the dynamics of modelled chromatin loci, providing valuable insight into the 3D and 4D chromatin organization that goes beyond the static picture given by the 2D Hi-C interaction maps. For example, we used TADdyn to study the Sox2 activation dynamics during cell reprogramming of mouse B cells to pluripotent cells. We found that during activation Sox2 is embedded inside a structural domain (cage) that constrains the dynamics of the Sox2 transcription start site (TSS) within a confined space. The caging maximizes the contacts between the TSS and the annotated Sox2 super-enhancer region and, more generally, forms a spatial neighbourhood of open and active regions around the TSS. These results point to a strong interplay between genomic structure and function that can be further investigated and unravelled for different loci and other biological processes by using TADdyn.
- 10h-10h30: Raphaël Mourad (IBCG, Toulouse, France) Studying 3D genome evolution using genomic sequence
Motivation: The 3D genome is essential to numerous key processes such as the regulation of gene expression and the replication-timing program. In vertebrates, chromatin looping is often mediated by CTCF, and marked by CTCF motif pairs in convergent orientation. Comparative Hi-C recently revealed that chromatin looping evolves across species. However, Hi-C experiments are complex and costly, which currently limits their use for evolutionary studies over a large number of species. Results: Here, we propose a novel approach to study 3D genome evolution in vertebrates using the genomic sequence only, i.e. without the need for Hi-C data. The approach is simple and relies on comparing the distances between convergent and divergent CTCF motifs by computing a ratio we named the 3D ratio or "3DR". We show that 3DR is a powerful statistic to detect CTCF looping encoded in the human genome sequence, thus reflecting strong evolutionary constraints encoded in DNA and associated with the 3D genome. When comparing vertebrate genomes, our results reveal that 3DR, which underlies CTCF looping and TAD organization, evolves over time, and suggest that ancestral character reconstruction can be used to infer 3DR in ancestral genomes. Availability: The R code is available at https://github.com/morphos30/PhyloCTCFLooping.
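The abstract does not give the exact formula for 3DR, so the following is only one plausible reading, shown to make the idea of a sequence-only statistic tangible. It assumes that motifs are compared pairwise between consecutive neighbours along a chromosome, that a convergent pair is a forward-strand motif followed by a reverse-strand one, and that the ratio is the median convergent distance divided by the median divergent distance. All three choices, the function name and the toy coordinates are assumptions and may differ from the PhyloCTCFLooping code.

```python
from statistics import median


def three_d_ratio(motifs):
    """Ratio of median distances between convergent and divergent motif pairs.

    motifs: list of (position, strand) for CTCF motifs on one chromosome,
            with strand '+' or '-' (hypothetical input format).
    Raises statistics.StatisticsError if either orientation has no pair.
    """
    ordered = sorted(motifs)
    convergent, divergent = [], []
    # Walk over consecutive neighbours only (an assumption of this sketch).
    for (pos_a, strand_a), (pos_b, strand_b) in zip(ordered, ordered[1:]):
        if strand_a == "+" and strand_b == "-":    # motifs point towards each other
            convergent.append(pos_b - pos_a)
        elif strand_a == "-" and strand_b == "+":  # motifs point away from each other
            divergent.append(pos_b - pos_a)
    return median(convergent) / median(divergent)


# Toy chromosome with three loop-like convergent pairs and short divergent gaps.
motifs = [(100, "+"), (900, "-"), (1000, "+"), (2000, "-"), (2050, "+"), (2950, "-")]
print(three_d_ratio(motifs))  # 12.0
```

Under these assumptions, a ratio far from 1 means that motif orientation and motif spacing are not independent, which is the kind of signal a sequence-only statistic could pick up without Hi-C data.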
- 10h30-11h00: Coffee break
- 11h-11h30: Cyril Kurylo (GenPhySE, INRA, Toulouse, France) Detecting and comparing genomic compartments
Genomic compartmentalization is a biological factor affecting cell functionality. Analysis of data produced by the Hi-C protocol reveals compartmentalization of chromatin in the nucleus, which can vary as a tissue develops. Today, existing methods to detect genomic compartmentalization are limited in at least one of the following ways: detecting compartments qualitatively with no confidence measure, ignoring experimental biases, and/or dismissing replicate variability.
We propose an improvement over existing methodology to detect compartments and compare compartmentalization between conditions. First, we properly correct the diverse technological and biological biases inherent to Hi-C data. Then, we use an unsupervised learning method, constrained k-means, to detect compartments from normalized data. This method enables us to produce quantitative “concordance” values for each genomic region in each replicate, supporting our compartment predictions. Finally, we use these concordance values for differential analysis of compartmentalization between conditions. From their distributions, we obtain p-values revealing the significance of each predicted compartment change.
The method was implemented in an R package available on github.com/mzytnicki/HiCDOC, and was validated with Hi-C data originating from muscles of fetal pigs.
Our data consists of three biological replicates at 90 days of pregnancy and three biological replicates at 110 days of pregnancy. The detected compartment changes open a way towards a better understanding of neonatal mortality affecting piglets.
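As a rough illustration of clustering genomic bins into two compartments with a quantitative score, here is a plain 2-means sketch on a toy normalized contact matrix. It is deliberately simplified: it omits the constraints, the replicate handling and the bias correction described in the abstract, and its `concordance` score (a signed, scaled difference of distances to the two centroids) is a hypothetical stand-in, not the definition used in HiCDOC.

```python
def dist(a, b):
    """Euclidean distance between two contact profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_vector(rows):
    """Column-wise mean of a list of equally long rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_means(rows, n_iter=20):
    """Plain k-means with k=2 on the rows of a contact matrix."""
    # Deterministic start: the first row and the row farthest from it.
    c0 = rows[0]
    c1 = max(rows, key=lambda r: dist(r, c0))
    centroids = [c0, c1]
    for _ in range(n_iter):
        labels = [0 if dist(r, centroids[0]) <= dist(r, centroids[1]) else 1
                  for r in rows]
        updated = []
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            updated.append(mean_vector(members) if members else centroids[k])
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return labels, centroids


def concordance(row, centroids):
    """Hypothetical score in [-1, 1]: sign gives the cluster, size the confidence."""
    d0, d1 = dist(row, centroids[0]), dist(row, centroids[1])
    return (d1 - d0) / (d0 + d1) if d0 + d1 else 0.0


# Toy matrix with a checkerboard pattern: bins 0 and 2 contact each other
# more than they contact bins 1 and 3, and vice versa.
rows = [[5, 1, 4, 1],
        [1, 5, 1, 4],
        [4, 1, 5, 1],
        [1, 4, 1, 5]]
labels, centroids = two_means(rows)
scores = [concordance(r, centroids) for r in rows]
print(labels)  # [0, 1, 0, 1]
```

Comparing such per-bin scores between conditions, across replicates, is the kind of input a differential analysis of compartments would start from.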
- 11h30-12h00: Nathanaël Randriamihamison (MIAT, INRA, Toulouse, France) Hi-C differential analysis: a new method using tree representation based on Contiguity Constrained Hierarchical Agglomerative Clustering
Hi-C data measure the spatial proximity between pairs of genomic positions and give insights into the 3D organization of DNA. Hi-C data have already made it possible to show or confirm the existence of biologically relevant structures (such as Topologically Associating Domains, A/B compartments, ...) that play an important role in the regulation of gene expression. The aim of Hi-C differential analysis is to find significant differences in the 3D structure of the genome between two sets of Hi-C matrices, corresponding to two biological conditions (cell lines, fetal development stages, ...). In this presentation, we will provide a short state of the art of existing methods for Hi-C differential analysis, which usually focus on individual comparisons of the matrix entries. However, these approaches do not account for the hierarchical aspect of the data, which can make it difficult to interpret and understand the structural differences between conditions. We will present the ideas for a new differential analysis method based on Hierarchical Agglomerative Clustering with Contiguity Constraint (CCHAC). CCHAC is performed on individual Hi-C matrices to represent the hierarchical structure in the form of a binary tree, called a dendrogram. The problem of Hi-C differential analysis is then translated into a tree comparison problem and handled using tree distances.
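The contiguity constraint can be sketched in a few lines: at each step only clusters that are neighbours along the genome may be merged, so every cluster stays a contiguous interval of bins. The sketch below uses average linkage on a similarity matrix purely for simplicity. The linkage criterion, the function name and the toy matrix are assumptions, and the tree comparison step is not shown.

```python
def cchac(sim):
    """Contiguity-constrained agglomerative clustering on a similarity matrix.

    sim: square list of lists, sim[i][j] = similarity (e.g. contact count)
         between genomic bins i and j, with bins ordered along the genome.
    Returns the list of merges in order, each as a pair of inclusive
    intervals ((start_a, end_a), (start_b, end_b)); this encodes the dendrogram.
    """
    clusters = [(i, i) for i in range(len(sim))]  # one interval per bin
    merges = []

    def avg_sim(a, b):
        # Average similarity between all bins of interval a and interval b.
        total = sum(sim[i][j]
                    for i in range(a[0], a[1] + 1)
                    for j in range(b[0], b[1] + 1))
        return total / ((a[1] - a[0] + 1) * (b[1] - b[0] + 1))

    while len(clusters) > 1:
        # Only neighbouring clusters are candidates: the contiguity constraint.
        k = max(range(len(clusters) - 1),
                key=lambda i: avg_sim(clusters[i], clusters[i + 1]))
        left, right = clusters[k], clusters[k + 1]
        merges.append((left, right))
        clusters[k:k + 2] = [(left[0], right[1])]
    return merges


# Toy matrix with two domain-like blocks: bins 0-1 and bins 2-3.
sim = [[9, 8, 1, 1],
       [8, 9, 2, 1],
       [1, 2, 9, 7],
       [1, 1, 7, 9]]
merges = cchac(sim)
print(merges)
# [((0, 0), (1, 1)), ((2, 2), (3, 3)), ((0, 1), (2, 3))]
```

On this toy matrix the two blocks are each merged first and joined only at the root, which is the hierarchical structure that a tree-based comparison between conditions would then work on.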
Registration is free but mandatory here.
More information here
This symposium is supported by the CNRS Mission for Interdisciplinarity (project "SCALES") and by IMABS.
Organizers: Pierre Neuvial (CNRS, IMT), Sylvain Foissac (INRA, GenPhySE) and Nathalie Vialaneix (INRA, MIAT).