gatac.tl.DNAMotif
- class gatac.tl.DNAMotif(id, pwm, name=None, family=None, pfm=None)
Bases: object
DNA motif represented as a position weight matrix (PWM).
- pwm
Position weight matrix of shape (length, 4) with columns [A, C, G, T]
- Type:
np.ndarray
Examples
>>> import numpy as np
>>> from gatac.tl import DNAMotif
>>> pwm = np.array([
...     [0.10, 0.10, 0.70, 0.10],  # strong G
...     [0.70, 0.10, 0.10, 0.10],  # strong A
...     [0.10, 0.10, 0.10, 0.70],  # strong T
...     [0.25, 0.25, 0.25, 0.25],  # ambiguous
... ])
>>> motif = DNAMotif(id="MA0001.1", pwm=pwm, name="Example-TF")
>>> len(motif)
4
>>> motif.info_content()  # total bits of information
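The page does not state the formula behind info_content(). As a rough illustration only, the sketch below uses one common definition, the relative entropy of each position against a uniform background, summed over positions. The helper name info_content_bits is hypothetical and is not part of gatac; the library's own method may use a different formula or background.

```python
import numpy as np

def info_content_bits(pwm, bg=(0.25, 0.25, 0.25, 0.25)):
    """Total information content in bits, as relative entropy vs. background.

    Hypothetical helper for illustration; not gatac's implementation.
    """
    pwm = np.asarray(pwm, dtype=float)
    bg = np.asarray(bg, dtype=float)
    # Treat 0 * log2(0) as 0 by replacing the ratio with 1 where p == 0.
    ratio = np.where(pwm > 0, pwm / bg, 1.0)
    return float(np.sum(pwm * np.log2(ratio)))

pwm = np.array([
    [0.10, 0.10, 0.70, 0.10],  # strong G
    [0.70, 0.10, 0.10, 0.10],  # strong A
    [0.10, 0.10, 0.10, 0.70],  # strong T
    [0.25, 0.25, 0.25, 0.25],  # ambiguous, contributes 0 bits
])
total = info_content_bits(pwm)
print(round(total, 2))  # approximately 1.93 under this definition
```

Under this definition the uniform row adds nothing, and each of the three strong rows adds about 0.64 bits.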
- __init__(id, pwm, name=None, family=None, pfm=None)
Initialize a DNAMotif.
- Parameters:
- id
str Unique identifier
- pwm
np.ndarray Position weight matrix, shape (length, 4)
- name
str, optional Human-readable name
- family
str, optional Transcription factor family
- pfm
np.ndarray, optional Raw position frequency (count) matrix, shape (length, 4). When provided, to_log_odds(mode="motifmatchr") applies a MOODS-compatible pseudocount to the counts before computing log-odds, matching R’s motifmatchr scoring exactly.
Methods
__init__(id, pwm[, name, family, pfm]) Initialize a DNAMotif.
info_content() Compute total information content of the motif in bits.
Return reverse complement of this motif.
to_log_odds([bg_probs, mode, pseudocount]) Convert PWM to log-odds scores.
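Because the PWM columns are ordered [A, C, G, T], a reverse complement can be obtained by flipping the matrix along both axes: reversing the rows reverses the positions, and reversing the columns swaps A with T and C with G. The sketch below works on a bare array; the function name reverse_complement_pwm is hypothetical, and how DNAMotif handles its other fields (id, name, pfm) is not described on this page.

```python
import numpy as np

def reverse_complement_pwm(pwm):
    """Reverse-complement a (length, 4) matrix with columns [A, C, G, T].

    Hypothetical helper for illustration; not gatac's implementation.
    """
    pwm = np.asarray(pwm)
    # Row flip: reverse position order. Column flip: A<->T and C<->G.
    return pwm[::-1, ::-1].copy()

pwm = np.array([
    [0.10, 0.10, 0.70, 0.10],  # G
    [0.70, 0.10, 0.10, 0.10],  # A
    [0.10, 0.10, 0.10, 0.70],  # T
])
rc = reverse_complement_pwm(pwm)

# Consensus of the original is GAT, so the reverse complement reads ATC.
consensus = "".join("ACGT"[i] for i in rc.argmax(axis=1))
print(consensus)  # ATC
```

Applying the operation twice returns the original matrix, which makes a convenient sanity check.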
- to_log_odds(bg_probs=(0.25, 0.25, 0.25, 0.25), mode='gatac', pseudocount=0.8)
Convert PWM to log-odds scores.
- Parameters:
- bg_probs
tuple Background nucleotide probabilities (A, C, G, T)
- mode
{"gatac", "motifmatchr"}, default "gatac"
"gatac": Natural log-odds with minimal pseudocount.
"motifmatchr": Log2-odds matching MOODS/motifmatchr scoring. When raw counts (pfm) are available, applies a MOODS-compatible pseudocount: prob = (count + pseudocount * bg) / (row_sum + pseudocount)
- pseudocount
float, default 0.8 Pseudocount multiplier used in motifmatchr mode when raw counts (self.pfm) are available. Ignored for probability-only motifs (e.g. from MEME files, where the pseudocount is already baked in).
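The pseudocount formula given for "motifmatchr" mode can be sketched directly in NumPy. The example below assumes raw counts are available and takes the log2 ratio of the smoothed probability to the background; the function name log2_odds_from_counts and the sample counts are hypothetical, and this is not gatac's actual implementation. The "gatac" mode is not sketched because the size of its minimal pseudocount is not stated here.

```python
import numpy as np

def log2_odds_from_counts(pfm, bg_probs=(0.25, 0.25, 0.25, 0.25), pseudocount=0.8):
    """Log2-odds from raw counts using the documented pseudocount formula.

    Hypothetical helper for illustration; not gatac's implementation.
    """
    pfm = np.asarray(pfm, dtype=float)
    bg = np.asarray(bg_probs, dtype=float)
    row_sum = pfm.sum(axis=1, keepdims=True)
    # prob = (count + pseudocount * bg) / (row_sum + pseudocount)
    prob = (pfm + pseudocount * bg) / (row_sum + pseudocount)
    return np.log2(prob / bg)

# Made-up counts: one position dominated by G, one perfectly uniform.
pfm = np.array([
    [0, 0, 20, 0],
    [5, 5, 5, 5],
])
scores = log2_odds_from_counts(pfm)
print(scores.shape)  # (2, 4)
```

The pseudocount keeps zero counts from producing negative infinity, and a position whose counts match the background scores 0 in every column.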
- Returns:
np.ndarray Log-odds matrix of shape (length, 4)
- Return type: