Gene-peak association

A crucial step in the Pando workflow is the association of genes with putative regulatory regions inside the function infer_grn(). Per default, this is done by proximity alone like in the Signac function LinkPeaks(). However, there are other ways to do this. GREAT also utilizes proximity, but also takes regulatory domains of neighboring genes into account. This way of gene-peak association is also implemented in Pando and accessible by setting peak_to_gene_method='GREAT'.

muo_data <- infer_grn(
    muo_data,
    peak_to_gene_method = 'GREAT'
)

An even more interesting and likely better strategy would be to determine these regulatory domains experimentally. Hi-C data can be utilized to identify topologically associating domains (TADs) which constrain regulatory interactions. This can also reveal some more distal enhancer-promoter interactions that might otherwise be missed. In Pando such information can be incorporated by using the peak_to_gene_domains argument of infer_grn(). It takes a GenomicRanges object with regulatory regions and a gene_name column indicating the corresponding gene. Here’s an example of how this could look like:

gene_domains

## GRanges object with 4 ranges and 4 metadata columns:
##       seqnames              ranges strand |         gene_id   gene_biotype
##          <Rle>           <IRanges>  <Rle> |     <character>    <character>
##   [1]     chr1   67898473-68433120      - | ENSG00000116729 protein_coding
##   [2]     chr3 193936145-194338732      + | ENSG00000114315 protein_coding
##   [3]     chr7   31137461-31540894      - | ENSG00000164600 protein_coding
##   [4]     chr5 174524533-174930893      + | ENSG00000120149 protein_coding
##         gene_name region_type
##       <character> <character>
##   [1]         WLS         TAD
##   [2]        HES1         TAD
##   [3]     NEUROD6         TAD
##   [4]        MSX2         TAD
##   -------
##   seqinfo: 25 sequences from an unspecified genome; no seqlengths

This can be quite flexibly utilized, since you can manually adapt the regulatory regions for each gene and also provide multiple regions per gene, e.g. if you have information about individual enhancer-promoter contacts. Providing this object to infer_grn() will now only associate peaks within the provided regions with the gene.

muo_data <- infer_grn(
    muo_data,
    peak_to_gene_domains = gene_domains
)

Jonas Simon Fleck

5/18/2022