Chorus provides multiple visualization functions for genomic tracks and predictions.
The primary function for visualizing Chorus predictions with gene annotations.
visualize_chorus_predictions(
    predictions: Dict[str, np.ndarray],
    chrom: str,
    start: int,
    track_ids: List[str],
    output_file: Optional[str] = None,
    bin_size: int = 128,
    style: str = 'modern',
    use_pygenometracks: bool = True,
    figsize: Optional[Tuple[float, float]] = None,
    gtf_file: Optional[str] = None,
    show_gene_names: bool = True
) -> None

Features:
- Automatic track coloring based on assay type
- Gene annotation track (when GTF provided)
- PyGenomeTracks support for publication-quality figures
- Matplotlib fallback for quick visualization
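This document does not show how Chorus itself chooses between pyGenomeTracks and the matplotlib fallback. The sketch below shows one way such a decision could work. The helper name choose_backend is hypothetical, and the sketch assumes the package imports as pygenometracks.

```python
import importlib.util


def choose_backend(use_pygenometracks: bool = True) -> str:
    """Hypothetical helper: pick 'pygenometracks' if requested and installed."""
    # find_spec returns None when the top-level package is not importable
    if use_pygenometracks and importlib.util.find_spec("pygenometracks") is not None:
        return "pygenometracks"
    # Otherwise fall back to plain matplotlib
    return "matplotlib"


print(choose_backend(use_pygenometracks=False))  # matplotlib
```

Checking with find_spec avoids importing the whole package just to test whether it exists.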
Example:
# Basic visualization
visualize_chorus_predictions(
    predictions=oracle_predictions,
    chrom='chrX',
    start=48726820,  # Use the output window start for Enformer
    track_ids=['ENCFF413AHU', 'CNhs11250'],
    output_file='predictions.png'
)

# With gene annotations
visualize_chorus_predictions(
    predictions=oracle_predictions,
    chrom='chrX',
    start=48726820,
    track_ids=['ENCFF413AHU', 'CNhs11250'],
    gtf_file='gencode.v48.gtf.gz',
    show_gene_names=True,
    output_file='predictions_with_genes.png'
)

Track Colors (automatic assignment):
- DNase/ATAC: Blue (#1f77b4)
- CAGE: Orange (#ff7f0e)
- ChIP-seq: Red (#d62728)
- Others: Purple (#9467bd)
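The color rules above amount to a lookup from assay type to hex code. The sketch below is a minimal version of that lookup. The helper assign_track_color is hypothetical, and it assumes the assay type is already available as a string such as "DNase-seq". Chorus presumably resolves track IDs like ENCFF413AHU to assays through its own metadata, which is not shown here.

```python
# Colors from the list above; the substring matching rule is an assumption
ASSAY_COLORS = {
    "dnase": "#1f77b4",  # DNase: blue
    "atac": "#1f77b4",   # ATAC: blue
    "cage": "#ff7f0e",   # CAGE: orange
    "chip": "#d62728",   # ChIP-seq: red
}
DEFAULT_COLOR = "#9467bd"  # everything else: purple


def assign_track_color(assay_type: str) -> str:
    """Hypothetical helper: map an assay description to its documented color."""
    key = assay_type.lower()
    for token, color in ASSAY_COLORS.items():
        if token in key:
            return color
    return DEFAULT_COLOR


print(assign_track_color("TF ChIP-seq"))  # #d62728
```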
Visualize multiple BedGraph track files.
visualize_tracks(
    tracks_filenames: List[str],
    track_names: List[str],
    scales: Optional[List[Tuple[float, float]]] = None,
    colors: Optional[List[str]] = None,
    output_file: Optional[str] = None,
    genomic_region: Optional[str] = None,
    figure_size: Optional[Tuple[float, float]] = None,
    style: str = 'default'
) -> None

Example:
# Visualize saved BedGraph files
visualize_tracks(
    tracks_filenames=[
        'wt_ENCFF413AHU.bedgraph',
        'mutant_ENCFF413AHU.bedgraph'
    ],
    track_names=['Wild-type DNase', 'Mutant DNase'],
    genomic_region='chrX:48780000-48790000',
    scales=[(0, 20), (0, 20)],  # Same scale for comparison
    colors=['blue', 'red'],
    output_file='comparison.png'
)

Create heatmaps across multiple regions.
plot_track_heatmap(
    tracks: List[Union[str, pd.DataFrame]],
    track_names: List[str],
    genomic_regions: List[str],
    output_file: Optional[str] = None,
    cmap: str = 'RdBu_r',
    normalize_tracks: bool = True,
    cluster_tracks: bool = False,
    cluster_regions: bool = False
) -> None

Example:
# Compare signals across promoters
regions = [
    'chr11:5247000-5248000',  # HBB
    'chr11:5269000-5270000',  # HBD
    'chr16:226000-227000'     # HBA1
]
plot_track_heatmap(
    tracks=['dnase_erythroid.bedgraph', 'h3k4me3_erythroid.bedgraph'],
    track_names=['DNase', 'H3K4me3'],
    genomic_regions=regions,
    normalize_tracks=True,
    cluster_regions=True,
    output_file='promoter_heatmap.png'
)

Compare and correlate two tracks.
plot_track_comparison(
    track1_file: str,
    track2_file: str,
    track1_name: str,
    track2_name: str,
    genomic_region: Optional[str] = None,
    output_file: Optional[str] = None,
    correlation_method: str = 'pearson'
) -> Dict[str, float]

Example:
# Compare DNase and ATAC
stats = plot_track_comparison(
    track1_file='dnase_k562.bedgraph',
    track2_file='atac_k562.bedgraph',
    track1_name='DNase-seq',
    track2_name='ATAC-seq',
    genomic_region='chr1:1000000-2000000',
    correlation_method='pearson',
    output_file='dnase_vs_atac.png'
)
print(f"Correlation: {stats['correlation']:.3f}")
print(f"P-value: {stats['p_value']:.3e}")

High-quality visualization using pyGenomeTracks.
plot_tracks_with_pygenometracks(
    track_files: List[str],
    genomic_region: str,
    output_file: str,
    track_config: Optional[Dict[str, Dict]] = None,
    gtf_file: Optional[str] = None,
    height_ratios: Optional[List[float]] = None,
    width: float = 10,
    dpi: int = 300
) -> bool

Example:
# Publication-quality figure
success = plot_tracks_with_pygenometracks(
    track_files=[
        'dnase.bedgraph',
        'cage.bedgraph',
        'h3k4me3.bedgraph'
    ],
    genomic_region='chrX:48780000-48790000',
    output_file='figure_2a.pdf',
    track_config={
        'dnase.bedgraph': {
            'color': '#1f77b4',
            'style': 'fill',
            'height': 3,
            'max_value': 50
        },
        'cage.bedgraph': {
            'color': '#ff7f0e',
            'style': 'line:2',
            'height': 3,
            'max_value': 200
        }
    },
    gtf_file='gencode.gtf',
    width=8,
    dpi=300
)

When a GTF file is provided, gene tracks show:
- Gene bodies as rectangles
- Strand direction (+ strand: blue, - strand: red)
- Gene names as labels
- Arrows indicating transcription direction
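Drawing a gene track starts with selecting the GTF gene records that overlap the plotted window. The sketch below covers only that filtering step. All names in it (Gene, genes_in_region, open_gtf, STRAND_COLORS) are hypothetical, and Chorus's own parser may work differently. It assumes GTF's 1-based inclusive coordinates and a GENCODE-style gene_name attribute.

```python
import gzip
from dataclasses import dataclass
from typing import Iterable, List

STRAND_COLORS = {"+": "blue", "-": "red"}  # strand colors listed above


@dataclass
class Gene:
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # 1-based, inclusive
    strand: str
    name: str


def _attr(attributes: str, key: str) -> str:
    """Read one value from a GTF attribute column, e.g. gene_name "HBB";"""
    for field in attributes.strip().split(";"):
        field = field.strip()
        if field.startswith(key + " "):
            return field[len(key) + 1:].strip().strip('"')
    return ""


def genes_in_region(lines: Iterable[str], chrom: str, start: int, end: int) -> List[Gene]:
    """Keep 'gene' features on `chrom` that overlap [start, end]."""
    genes = []
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene" or cols[0] != chrom:
            continue
        g_start, g_end = int(cols[3]), int(cols[4])
        if g_end < start or g_start > end:
            continue  # no overlap with the window
        name = _attr(cols[8], "gene_name") or _attr(cols[8], "gene_id")
        genes.append(Gene(cols[0], g_start, g_end, cols[6], name))
    return genes


def open_gtf(path: str):
    """Open a plain or gzip-compressed GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

To use it on a real annotation, pass a file handle: `with open_gtf('gencode.v48.gtf.gz') as fh: genes = genes_in_region(fh, 'chrX', 48780000, 48790000)`.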
Enformer has asymmetric input/output windows:
- Input: 393,216 bp
- Output: 114,688 bp (centered)
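The window arithmetic follows from the two sizes above. 114,688 bp is 896 bins of 128 bp, and (393,216 - 114,688) / 2 = 139,264 bp is trimmed from each side of the input. The sketch below computes both windows, assuming a window centered with integer division. The names are hypothetical, and Chorus's get_output_window_coords may round differently.

```python
ENFORMER_INPUT_BP = 393_216
ENFORMER_BIN_SIZE = 128
ENFORMER_OUTPUT_BINS = 896
ENFORMER_OUTPUT_BP = ENFORMER_OUTPUT_BINS * ENFORMER_BIN_SIZE  # 114,688 bp

# Sequence trimmed from each side of the input before the output window starts
FLANK_BP = (ENFORMER_INPUT_BP - ENFORMER_OUTPUT_BP) // 2  # 139,264 bp


def enformer_windows(center: int):
    """Return ((input_start, input_end), (output_start, output_end)), half-open."""
    input_start = center - ENFORMER_INPUT_BP // 2
    output_start = input_start + FLANK_BP
    return ((input_start, input_start + ENFORMER_INPUT_BP),
            (output_start, output_start + ENFORMER_OUTPUT_BP))


def bin_start(output_start: int, bin_index: int) -> int:
    """Genomic left edge of prediction bin `bin_index`."""
    return output_start + bin_index * ENFORMER_BIN_SIZE


inp, out = enformer_windows(1_000_000)
print(inp, out)  # (803392, 1196608) (942656, 1057344)
```

The output start, not the input start, is the value to pass as start when plotting, because bin 0 of the predictions begins there.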
Important: When visualizing Enformer predictions, use the output window coordinates:
# Get output window coordinates
region_center = (start + end) // 2
output_start, output_end = oracle.get_output_window_coords(region_center)
# Visualize using output coordinates
visualize_chorus_predictions(
    predictions=predictions,
    chrom='chrX',
    start=output_start,  # NOT the original start
    track_ids=track_ids
)

- Clean white background
- Colored tracks with transparency
- Grid lines for reference
- Track statistics overlay
- Reduced visual elements
- No spines or ticks
- Focus on data
- High contrast
- Suitable for journals
- No decorative elements
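The "no spines or ticks, focus on data" look described above can be approximated in plain matplotlib. This is a minimal sketch with a hypothetical helper, apply_minimal_style, and synthetic data. It is not the code behind Chorus's style option.

```python
import matplotlib.pyplot as plt
import numpy as np


def apply_minimal_style(ax) -> None:
    """Hypothetical helper: hide spines, ticks and grid so only the data remains."""
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.grid(False)


# Synthetic signal, for illustration only
fig, ax = plt.subplots(figsize=(8, 1.5))
signal = np.abs(np.sin(np.linspace(0, 10, 200)))
ax.fill_between(np.arange(signal.size), signal, color="#1f77b4", alpha=0.6)
apply_minimal_style(ax)
```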
- PNG: Best for presentations and web
- PDF: Vector format for publications
- SVG: Editable vector format
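matplotlib infers the output format from the file suffix, so one figure can be exported in all three formats above. The sketch below uses a hypothetical helper, save_all_formats. The dpi value only matters for raster output and for raster images embedded in vector files.

```python
from pathlib import Path

import matplotlib.pyplot as plt


def save_all_formats(fig, stem: str, dpi: int = 300) -> list:
    """Hypothetical helper: save `fig` as PNG, PDF and SVG next to each other."""
    paths = []
    for suffix in (".png", ".pdf", ".svg"):
        # with_suffix replaces any existing suffix on `stem`
        path = Path(stem).with_suffix(suffix)
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths
```

For example, save_all_formats(fig, 'figure_2a') writes figure_2a.png, figure_2a.pdf and figure_2a.svg.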
# High resolution for print
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
# Screen resolution
plt.savefig('figure.png', dpi=100)

- Use PyGenomeTracks for complex figures:
  pip install pyGenomeTracks
- Consistent scales across conditions:
  max_val = max(np.max(wt), np.max(mutant))
  scales = [(0, max_val), (0, max_val)]
- Color accessibility:
  - Use colorblind-friendly palettes
  - Avoid red-green combinations
  - Test with grayscale printing
- Font sizes:
  plt.rcParams['font.size'] = 12
  plt.rcParams['axes.labelsize'] = 14
  plt.rcParams['axes.titlesize'] = 16

For exploring data interactively:
- Jupyter notebooks: inline plots (zoom/pan requires an interactive backend such as %matplotlib widget from ipympl)
- Genome browsers: Export BedGraph → IGV/UCSC
- Plotly (future support): Interactive web plots
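Exporting to IGV or UCSC requires predictions written as BedGraph: one line per bin giving chrom, start, end and value, with 0-based half-open coordinates. The sketch below uses a hypothetical helper, write_bedgraph. It assumes one track's predictions are a 1-D array of per-bin values starting at the output window start. Chorus may provide its own exporter.

```python
from typing import Optional, TextIO

import numpy as np


def write_bedgraph(values, chrom: str, start: int, bin_size: int,
                   out: TextIO, name: Optional[str] = None) -> None:
    """Hypothetical helper: write one value per bin as BedGraph (0-based, half-open)."""
    if name:
        # Optional track line understood by the UCSC browser and IGV
        out.write(f'track type=bedGraph name="{name}"\n')
    values = np.asarray(values, dtype=float)
    for i, v in enumerate(values):
        bin_start = start + i * bin_size
        out.write(f"{chrom}\t{bin_start}\t{bin_start + bin_size}\t{v:.4g}\n")
```

For example, with open('wt_ENCFF413AHU.bedgraph', 'w') as fh: write_bedgraph(track_values, 'chrX', output_start, 128, fh), where track_values and output_start are placeholders for your own data.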
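# Ensure the inline matplotlib backend is active (Jupyter).
# Keep comments off the magic line itself: text after %matplotlib is parsed as arguments.
%matplotlib inline
# Or display the figure explicitly
from IPython.display import display
display(fig)

# Install separately
pip install pyGenomeTracks
# Or use the matplotlib fallback
use_pygenometracks=False  # argument to visualize_chorus_predictions

- Downsample data for visualization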
- Use specific genomic regions
- Save to file instead of displaying
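Downsampling can be as simple as averaging consecutive bins. The sketch below uses a hypothetical helper, downsample_mean. Averaging flattens narrow peaks, so replacing mean with max keeps peak heights instead when those matter more than overall shape.

```python
import numpy as np


def downsample_mean(values, factor: int) -> np.ndarray:
    """Hypothetical helper: average groups of `factor` bins; a partial last group is averaged too."""
    values = np.asarray(values, dtype=float)
    n_full = values.size // factor
    out = values[: n_full * factor].reshape(n_full, factor).mean(axis=1)
    if values.size % factor:
        # Average the leftover bins into one final value
        tail = values[n_full * factor:].mean(keepdims=True)
        out = np.concatenate([out, tail])
    return out


print(downsample_mean(np.arange(10), 4))  # [1.5 5.5 8.5]
```

Remember to multiply bin_size by the same factor when plotting the reduced track.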
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
# Plot wild-type
visualize_predictions_on_ax(wt_predictions, axes[0], 'Wild-type')
# Plot with enhancer
visualize_predictions_on_ax(enhancer_predictions, axes[1], 'With Enhancer')
plt.tight_layout()

regions = ['chr1:1000-2000', 'chr2:3000-4000', 'chr3:5000-6000']
fig, axes = plt.subplots(len(regions), 1, figsize=(12, 3 * len(regions)))
for region, ax in zip(regions, axes):
    predictions = oracle.predict(region, tracks)
    visualize_predictions_on_ax(predictions, ax, region)

# Show effects of all variants in a region
effects = []
positions = []
for pos in range(start, end):
    result = oracle.predict_variant_effect(...)
    max_effect = np.max(np.abs(result['effect_sizes']['alt_1']['ENCFF413AHU']))
    effects.append(max_effect)
    positions.append(pos)

plt.figure(figsize=(12, 4))
plt.bar(positions, effects, width=1)
plt.xlabel('Genomic Position')
plt.ylabel('Max Variant Effect')
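The multi-panel recipes above call visualize_predictions_on_ax, which this section does not define. The stand-in below is hypothetical, not Chorus's implementation. It assumes predictions is a dict mapping track ID to a 1-D array of per-bin values, and it draws each track as a stepped, filled profile on the given Axes. It is enough to try the layouts with synthetic data.

```python
from typing import Dict

import matplotlib.pyplot as plt
import numpy as np


def visualize_predictions_on_ax(predictions: Dict[str, np.ndarray], ax, title: str,
                                bin_size: int = 128, start: int = 0) -> None:
    """Hypothetical stand-in: draw each track as a filled step profile on `ax`."""
    for track_id, values in predictions.items():
        values = np.asarray(values, dtype=float).ravel()
        x = start + np.arange(values.size) * bin_size  # left edge of each bin
        ax.fill_between(x, values, step="post", alpha=0.5, label=track_id)
    ax.set_title(title)
    ax.set_ylabel("Predicted signal")
    ax.legend(loc="upper right", fontsize="small")


# Synthetic example
fig, ax = plt.subplots(figsize=(12, 3))
visualize_predictions_on_ax({"ENCFF413AHU": np.random.rand(896)}, ax, "Wild-type")
```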