association.Rmd
A crucial step in the Pando workflow is the association of genes with putative regulatory regions inside the function infer_grn()
. Per default, this is done by proximity alone like in the Signac
function LinkPeaks()
. However, there are other ways to do this. GREAT also utilizes proximity, but also takes regulatory domains of neighboring genes into account. This way of gene-peak association is also implemented in Pando and accessible by setting peak_to_gene_method='GREAT'
.
muo_data <- infer_grn(
muo_data,
peak_to_gene_method = 'GREAT'
)
An even more interesting and likely better strategy would be to determine these regulatory domains experimentally. Hi-C data can be utilized to identify topologically associating domains (TADs) which constrain regulatory interactions. This can also reveal some more distal enhancer-promoter interactions that might otherwise be missed. In Pando such information can be incorporated by using the peak_to_gene_domains
argument of infer_grn()
. It takes a GenomicRanges
object with regulatory regions and a gene_name
column indicating the corresponding gene. Here’s an example of how this could look like:
gene_domains
## GRanges object with 4 ranges and 4 metadata columns:
## seqnames ranges strand | gene_id gene_biotype
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr1 67898473-68433120 - | ENSG00000116729 protein_coding
## [2] chr3 193936145-194338732 + | ENSG00000114315 protein_coding
## [3] chr7 31137461-31540894 - | ENSG00000164600 protein_coding
## [4] chr5 174524533-174930893 + | ENSG00000120149 protein_coding
## gene_name region_type
## <character> <character>
## [1] WLS TAD
## [2] HES1 TAD
## [3] NEUROD6 TAD
## [4] MSX2 TAD
## -------
## seqinfo: 25 sequences from an unspecified genome; no seqlengths
This can be quite flexibly utilized, since you can manually adapt the regulatory regions for each gene and also provide multiple regions per gene, e.g. if you have information about individual enhancer-promoter contacts. Providing this object to infer_grn()
will now only associate peaks within the provided regions with the gene.
muo_data <- infer_grn(
muo_data,
peak_to_gene_domains = gene_domains
)