gatac.tl.DNAMotif

class gatac.tl.DNAMotif(id, pwm, name=None, family=None, pfm=None)

Bases: object

DNA motif represented as a position weight matrix (PWM).

Parameters:
id (str)

pwm (np.ndarray)

name (Optional[str])

family (Optional[str])

pfm (Optional[np.ndarray])

id

Unique identifier for the motif

Type:

str

name

Human-readable name

Type:

str, optional

family

Transcription factor family

Type:

str, optional

pwm

Position weight matrix of shape (length, 4) with columns [A, C, G, T]

Type:

np.ndarray
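The column convention [A, C, G, T] is what maps matrix columns back to bases. As an illustration, a hypothetical `consensus` helper (not part of gatac) reads off the most probable base at each position. The 0.5 cutoff for reporting `N` is an arbitrary choice made for this sketch.

```python
import numpy as np

BASES = "ACGT"  # column order documented for DNAMotif.pwm

def consensus(pwm, min_prob=0.5):
    """Hypothetical helper: top base per position, or 'N' if no base exceeds min_prob."""
    pwm = np.asarray(pwm, dtype=float)
    if pwm.ndim != 2 or pwm.shape[1] != 4:
        raise ValueError("expected a (length, 4) matrix")
    best = pwm.argmax(axis=1)   # column index of the most probable base per row
    top = pwm.max(axis=1)       # probability of that base
    return "".join(BASES[j] if p > min_prob else "N" for j, p in zip(best, top))

pwm = np.array([
    [0.10, 0.10, 0.70, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.25, 0.25, 0.25, 0.25],
])
print(consensus(pwm))  # GATN
```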

Examples

>>> import numpy as np
>>> from gatac.tl import DNAMotif
>>> pwm = np.array([
...     [0.10, 0.10, 0.70, 0.10],   # strong G
...     [0.70, 0.10, 0.10, 0.10],   # strong A
...     [0.10, 0.10, 0.10, 0.70],   # strong T
...     [0.25, 0.25, 0.25, 0.25],   # ambiguous
... ])
>>> motif = DNAMotif(id="MA0001.1", pwm=pwm, name="Example-TF")
>>> len(motif)
4
>>> ic = motif.info_content()  # total information content in bits

__init__(id, pwm, name=None, family=None, pfm=None)

Initialize a DNAMotif.

Parameters:
id (str)

Unique identifier

pwm (np.ndarray)

Position weight matrix of shape (length, 4) with columns [A, C, G, T]

name (str, optional)

Human-readable name

family (str, optional)

Transcription factor family

pfm (np.ndarray, optional)

Raw position frequency (count) matrix, shape (length, 4). When provided, to_log_odds(mode="motifmatchr") applies a MOODS-compatible pseudocount to the counts before computing log-odds, matching R’s motifmatchr scoring exactly.
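When raw counts are available, the `pwm` argument can be derived from them by normalizing each row. This is a minimal sketch. It assumes `pwm` should hold per-position probabilities, as in the Examples section, where each row sums to 1. The counts below are made up for illustration.

```python
import numpy as np

# Made-up raw counts for a 3-position motif, columns ordered [A, C, G, T]
pfm = np.array([
    [ 2,  1, 16,  1],
    [17,  1,  1,  1],
    [ 0,  2,  2, 16],
], dtype=float)

# Row-normalize counts to probabilities, so each position sums to 1
pwm = pfm / pfm.sum(axis=1, keepdims=True)
print(pwm.round(2))
```

Both arrays could then be passed together, e.g. `DNAMotif(id="MOTIF_1", pwm=pwm, pfm=pfm)` (the id is a placeholder). That lets `to_log_odds(mode="motifmatchr")` work from the counts.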

Methods

__init__(id, pwm[, name, family, pfm])

Initialize a DNAMotif.

info_content()

Compute total information content of the motif in bits.

reverse_complement()

Return the reverse complement of this motif.

to_log_odds([bg_probs, mode, pseudocount])

Convert PWM to log-odds scores.

info_content()

Compute total information content of the motif in bits.

Return type:

float
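The docstring does not state the exact formula. A common definition, assumed here, is the relative entropy of each position against the background, summed over positions. With a uniform background this equals 2 minus the Shannon entropy of each row, so a fully determined position contributes 2 bits and a uniform one contributes 0. `info_content_bits` is a hypothetical helper. The library's own value may differ if it adds pseudocounts or uses another background.

```python
import numpy as np

def info_content_bits(pwm, bg=(0.25, 0.25, 0.25, 0.25)):
    """Sum over positions of sum_b p_b * log2(p_b / bg_b), with 0 * log 0 taken as 0."""
    p = np.asarray(pwm, dtype=float)
    q = np.asarray(bg, dtype=float)
    # Zero probabilities produce nan/-inf inside log2; np.where replaces them with 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / q), 0.0)
    return float(terms.sum())

pwm = np.array([
    [0.10, 0.10, 0.70, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.25, 0.25, 0.25, 0.25],
])
print(round(info_content_bits(pwm), 4))  # 1.9297
```

For the PWM from the Examples section, this sketch gives about 1.93 bits. Each of the three strong positions contributes about 0.64 bits, and the ambiguous one contributes none.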

to_log_odds(bg_probs=(0.25, 0.25, 0.25, 0.25), mode='gatac', pseudocount=0.8)

Convert PWM to log-odds scores.

Parameters:
bg_probs (tuple)

Background nucleotide probabilities in the order (A, C, G, T)

mode ({"gatac", "motifmatchr"}, default "gatac")

  • “gatac”: Natural log-odds with minimal pseudocount

  • “motifmatchr”: Log2-odds matching MOODS/motifmatchr scoring. When raw counts (pfm) are available, applies a MOODS-compatible pseudocount at each position: prob = (count + pseudocount * bg) / (row_sum + pseudocount), where row_sum is the total count at that position and bg is the background probability of the base

pseudocount (float, default 0.8)

Pseudocount multiplier used in motifmatchr mode when raw counts (self.pfm) are available. Ignored for probability-only motifs (e.g. from MEME files, where the pseudocount is already baked in).

Returns:

np.ndarray: log-odds matrix of shape (length, 4)

Return type:

ndarray
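The motifmatchr-mode pseudocount rule above can be written out in NumPy. This is a sketch under assumptions. `motifmatchr_log_odds` is a hypothetical stand-alone function, not part of gatac. It follows only the formula and the probability-only fallback stated in this docstring. It does not cover the "gatac" mode, whose pseudocount value is not documented here.

```python
import numpy as np

def motifmatchr_log_odds(pwm, pfm=None, bg=(0.25, 0.25, 0.25, 0.25), pseudocount=0.8):
    """Hypothetical re-implementation of the documented mode="motifmatchr" rule."""
    bg = np.asarray(bg, dtype=float)
    if pfm is None:
        # Probability-only motif: the pseudocount is ignored, as documented.
        prob = np.asarray(pwm, dtype=float)
    else:
        # Counts available: pwm is not used; probabilities come from the counts.
        counts = np.asarray(pfm, dtype=float)
        row_sum = counts.sum(axis=1, keepdims=True)  # total count per position
        prob = (counts + pseudocount * bg) / (row_sum + pseudocount)
    return np.log2(prob / bg)  # log2-odds against the background

pfm = np.array([[0, 0, 20, 0]])  # one position, 20 observations of G
pwm = pfm / pfm.sum(axis=1, keepdims=True)
scores = motifmatchr_log_odds(pwm, pfm=pfm)
print(scores.round(3))
```

With a single position of 20 G counts, the pseudocount keeps the unobserved bases finite at log2(1/26), about -4.70 bits, instead of -inf. When pseudocount * sum(bg) equals pseudocount (i.e. the background sums to 1), each row of prob still sums to 1.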

reverse_complement()

Return the reverse complement of this motif.

Returns:

DNAMotif: new motif whose PWM has its positions reversed and its bases complemented

Return type:

DNAMotif
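With columns ordered [A, C, G, T], complementing a position (A↔T, C↔G) is the same as reversing its columns. The reverse complement of the matrix is therefore a reversal along both axes. Below is a sketch on a bare NumPy array, with `reverse_complement_pwm` as a hypothetical helper. It does not show how the library handles `pfm` or the other attributes.

```python
import numpy as np

def reverse_complement_pwm(pwm):
    """Reverse the position order (axis 0) and complement each base (axis 1)."""
    # Column index j maps to 3 - j: A(0)<->T(3), C(1)<->G(2).
    return np.asarray(pwm)[::-1, ::-1].copy()

pwm = np.array([
    [0.10, 0.10, 0.70, 0.10],   # G
    [0.70, 0.10, 0.10, 0.10],   # A
    [0.10, 0.10, 0.10, 0.70],   # T
    [0.25, 0.25, 0.25, 0.25],   # ambiguous
])
rc = reverse_complement_pwm(pwm)
print("".join("ACGT"[j] for j in rc[1:].argmax(axis=1)))  # ATC
```

The motif "GAT followed by an ambiguous position" becomes "an ambiguous position followed by ATC", and applying the operation twice returns the original matrix.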