Dog genes #4

@bschilder

Hey @igordot, hope all is well with you!

My colleagues and I noticed something odd in the Canis lupus familiaris orthologs in orthogene, and I traced it back to babelgene. See here for the full details: neurogenomics/orthogene#30

I expected the "symbol" column to hold the gene symbol in the format of the origin species (in this case, dog). That seems to hold for other species such as mouse, but not for dog.

source_id <- 9615
orths <- babelgene:::orthologs_df
orths <- subset(orths, taxon_id == source_id)
head(orths, 20)

Screenshot 2023-03-31 at 15 46 18

Delving a bit deeper, I checked how many human gene symbols are identical to their non-human orthologs in each species. This varies quite a bit across species (even within mammals). Is that expected? I'm familiar with the different casing used for mouse genes, but I'm not sure what the convention is for the other species.

counts <- dplyr::group_by(babelgene:::orthologs_df, taxon_id) |>
    dplyr::summarise(identical_symbols=sum(human_symbol==symbol, na.rm = TRUE),
                     percent=sum(human_symbol==symbol, na.rm = TRUE)/dplyr::n()*100) |>
    dplyr::arrange(dplyr::desc(percent))
counts$species_name <- orthogene::map_species(counts$taxon_id, method = "babelgene")
counts

Screenshot 2023-03-31 at 15 55 57

I repeated this using the agr_orthologs_df data:

counts_agr <- dplyr::group_by(babelgene:::agr_orthologs_df, species_name) |>
    dplyr::summarise(identical_symbols=sum(human_symbol==species_symbol),
                     percent=sum(human_symbol==species_symbol)/dplyr::n()*100) |>
    dplyr::arrange(dplyr::desc(percent))
counts_agr

Screenshot 2023-03-31 at 15 55 18
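
As a possible follow-up check, the comparison above could be split into exact matches and matches that differ only in letter case, which would separate casing conventions (as with mouse) from genuinely different symbols. A minimal base R sketch on a made-up toy table: the `toy` data, its placeholder symbols and taxon IDs, and the `pct` helper are all hypothetical and not taken from babelgene; only the column names mirror the ones used above.

```r
# Hypothetical toy table mimicking the columns used above
# (taxon_id, human_symbol, symbol). Symbols are placeholders, not real data.
toy <- data.frame(
  taxon_id     = c(1, 1, 1, 2, 2, 2),
  human_symbol = c("GENEA", "GENEB", "GENEC", "GENEA", "GENEB", "GENEC"),
  symbol       = c("GENEA", "Geneb", "Xyz1",  "GENEA", "GENEB", NA),
  stringsAsFactors = FALSE
)

# Exact match: identical strings, as in the original comparison.
toy$exact <- toy$human_symbol == toy$symbol

# Case-only match: the symbols differ, but only in letter case.
toy$case_only <- !toy$exact &
  toupper(toy$human_symbol) == toupper(toy$symbol)

# Percent of rows per species in each class; NA symbols count as non-matching.
pct <- function(x, g) {
  tapply(x, g, function(v) sum(v, na.rm = TRUE) / length(v) * 100)
}

summary_df <- data.frame(
  taxon_id      = sort(unique(toy$taxon_id)),
  pct_exact     = as.vector(pct(toy$exact, toy$taxon_id)),
  pct_case_only = as.vector(pct(toy$case_only, toy$taxon_id))
)
summary_df
```

Applied to the real tables, a species with a low exact percentage but a high case-only percentage would point to a casing convention rather than a mapping problem.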

Thanks!
Brian

Session info

Using babelgene v22.9

<details>

```
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggtree_3.6.2 orthogene_1.5.1

loaded via a namespace (and not attached):
[1] utf8_1.2.3 devoptera_0.99.0 R.utils_2.12.2
[4] RUnit_0.4.32 tidyselect_1.2.0 RSQLite_2.3.0
[7] AnnotationDbi_1.60.2 htmlwidgets_1.6.2 grid_4.2.1
[10] combinat_0.0-8 devtools_2.4.5 munsell_0.5.0
[13] codetools_0.2-19 DT_0.27 miniUI_0.1.1.1
[16] withr_2.5.0 colorspace_2.1-0 Biobase_2.58.0
[19] filelock_1.0.2 TreeTools_1.9.1 knitr_1.42
[22] rstudioapi_0.14 stats4_4.2.1 ggsignif_0.6.4
[25] MatrixGenerics_1.10.0 Rdpack_2.4 labeling_0.4.2
[28] GenomeInfoDbData_1.2.9 mnormt_2.1.1 optimParallel_1.0-2
[31] topGO_2.50.0 bit64_4.0.5 farver_2.1.1
[34] rprojroot_2.0.3 coda_0.19-4 vctrs_0.6.1
[37] treeio_1.23.1 generics_0.1.3 clusterGeneration_1.3.7
[40] xfun_0.37 BiocFileCache_2.6.1 R6_2.5.1
[43] doParallel_1.0.17 GenomeInfoDb_1.34.9 rsvg_2.4.0
[46] grImport2_0.2-0 bitops_1.0-7 cachem_1.0.7
[49] gridGraphics_0.5-1 DelayedArray_0.24.0 promises_1.2.0.1
[52] scales_1.2.1 gtable_0.3.3 biocViews_1.66.3
[55] processx_3.8.0 phangorn_2.11.1 ontologyPlot_1.6
[58] rlang_1.1.0 scatterplot3d_0.3-43 rstatix_0.7.2
[61] lazyeval_0.2.2 broom_1.0.4 BiocManager_1.30.20
[64] yaml_2.3.7 abind_1.4-5 ggimage_0.3.1.002
[67] backports_1.4.1 httpuv_1.6.9 RBGL_1.74.0
[70] tools_4.2.1 usethis_2.1.6 ggplotify_0.1.0
[73] ggplot2_3.4.1 ellipsis_0.3.2 paintmap_1.0
[76] RColorBrewer_1.1-3 BiocGenerics_0.44.0 sessioninfo_1.2.2
[79] Rcpp_1.0.10 plyr_1.8.8 base64enc_0.1-3
[82] zlibbioc_1.44.0 purrr_1.0.1 RCurl_1.98-1.10
[85] ps_1.7.3 prettyunits_1.1.1 ggpubr_0.6.0
[88] urlchecker_1.0.1 S4Vectors_0.36.2 SummarizedExperiment_1.28.0
[91] grr_0.9.5 fs_1.6.1 here_1.0.1
[94] magrittr_2.0.3 data.table_1.14.8 magick_2.7.4
[97] SparseM_1.81 R.cache_0.16.0 matrixStats_0.63.0
[100] pkgload_1.3.2 patchwork_1.1.2 mime_0.12
[103] evaluate_0.20 xtable_1.8-4 XML_3.99-0.14
[106] jpeg_0.1-10 IRanges_2.32.0 compiler_4.2.1
[109] tibble_3.2.1 maps_3.4.1 crayon_1.5.2
[112] R.oo_1.25.0 htmltools_0.5.4 ggfun_0.0.9
[115] later_1.3.0 tidyr_1.3.0 aplot_0.1.10
[118] expm_0.999-7 ontoProc_1.20.0 DBI_1.1.3
[121] gprofiler2_0.2.1 dbplyr_2.3.2 MASS_7.3-58.3
[124] rappdirs_0.3.3 babelgene_22.9 Matrix_1.5-3
[127] car_3.1-1 piggyback_0.1.4 cli_3.6.1
[130] quadprog_1.5-8 R.methodsS3_1.8.2 rbibutils_2.2.13
[133] parallel_4.2.1 igraph_1.4.1 GenomicRanges_1.50.2
[136] pkgconfig_2.0.3 numDeriv_2016.8-1.1 plotly_4.10.1
[139] foreach_1.5.2 stringdist_0.9.10 XVector_0.38.0
[142] BiocCheck_1.34.3 yulab.utils_0.0.6 stringr_1.5.0
[145] callr_3.7.3 digest_0.6.31 phytools_1.5-1
[148] graph_1.76.0 Biostrings_2.66.0 rmarkdown_2.20.1
[151] fastmatch_1.1-3 tidytree_0.4.2 curl_5.0.0
[154] shiny_1.7.4 lifecycle_1.0.3 nlme_3.1-162
[157] jsonlite_1.8.4 carData_3.0-5 OmaDB_2.14.0
[160] viridisLite_0.4.1 fansi_1.0.4 pillar_1.9.0
[163] ontologyIndex_2.10 lattice_0.20-45 homologene_1.4.68.19.3.27
[166] KEGGREST_1.38.0 fastmap_1.1.1 httr_1.4.5
[169] plotrix_3.8-2 pkgbuild_1.4.0 GO.db_3.16.0
[172] interactiveDisplayBase_1.36.0 glue_1.6.2 remotes_2.4.2
[175] png_0.1-8 iterators_1.0.14 BiocVersion_3.16.0
[178] bit_4.0.5 Rgraphviz_2.42.0 stringi_1.7.12
[181] profvis_0.3.7 blob_1.2.4 AnnotationHub_3.6.0
[184] rphylopic_1.0.0 memoise_2.0.1 dplyr_1.1.1
[187] ape_5.7-1
```

</details>
