enrichKEGG issue:the Description column and ID information are identical in 4.10.0 #742

Open
onmyojiyys opened this issue Nov 19, 2024 · 2 comments


@onmyojiyys

When using version 4.10.0 of clusterProfiler, the output from the enrichKEGG function shows that the Description column and ID information are identical, which did not happen with version 4.6.0. Is this a bug introduced in the newer version? How can it be resolved? The KEGG database used is internally constructed, with the use_internal_data parameter set to TRUE.
x <- enrichKEGG(geneSet, organism = organism, keyType = 'kegg', pvalueCutoff = 0.05, pAdjustMethod = 'BH', minGSSize = 5, maxGSSize = 2000, qvalueCutoff = 0.2, use_internal_data = T)
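A quick way to confirm the symptom programmatically is to compare the two columns of the result table. The sketch below uses a small toy data frame as a stand-in for `as.data.frame(x)`; its values are made up for illustration, and the helper name `description_is_id` is hypothetical, not part of clusterProfiler.

```r
## Toy stand-in for as.data.frame(x); the values are illustrative only
res <- data.frame(
  ID          = c("hsa04110", "hsa04114"),
  Description = c("hsa04110", "hsa04114"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: TRUE only if every Description equals its ID
description_is_id <- function(df) {
  isTRUE(all(as.character(df$Description) == as.character(df$ID)))
}

description_is_id(res)
```

On a real result, `description_is_id(as.data.frame(x))` returning TRUE would indicate that the pathway names were not mapped onto the IDs.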

image

image

@guidohooiveld

guidohooiveld commented Nov 25, 2024

Note that you are using an old version of clusterProfiler (4.10.0; the current release is 4.14.3). Moreover, it seems that your installation mixes packages from different Bioconductor releases. Note that clusterProfiler v4.10.x belongs to Bioconductor v3.18.

Let me give some context: when use_internal_data = T, the KEGG information in the package KEGG.db is used. This is not recommended: because of license issues, re-packaging the KEGG database downloaded from their FTP site has not been allowed for a long time, so the content of KEGG.db has not been updated in many years. That is also the reason the package KEGG.db was finally removed from Bioconductor as of release 3.12. (link)

Since KEGG.db was removed from Bioconductor 3.12, and clusterProfiler 4.10.0 corresponds to a later Bioconductor release (3.18), I conclude you have somehow mixed up your installation.
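To check for such a mix-up, something along these lines may help. It is only a sketch: `has_pkg` and `check_kegg_setup` are hypothetical helper names, while `BiocManager::version()` and `BiocManager::valid()` are existing BiocManager functions (`valid()` needs internet access, so it is only run in an interactive session here).

```r
## TRUE if a package is installed in any library on .libPaths()
has_pkg <- function(pkg) nzchar(system.file(package = pkg))

## Hypothetical helper: collect the facts relevant to this issue
check_kegg_setup <- function() {
  list(
    ## installed clusterProfiler version, or NA if it is not installed
    clusterProfiler = if (has_pkg("clusterProfiler")) {
      as.character(packageVersion("clusterProfiler"))
    } else {
      NA_character_
    },
    ## KEGG.db should not be present alongside a recent Bioconductor release
    KEGG.db = has_pkg("KEGG.db")
  )
}

str(check_kegg_setup())

## In an interactive session with BiocManager installed:
## report the Bioconductor release in use and any out-of-date or too-new packages
if (interactive() && has_pkg("BiocManager")) {
  print(BiocManager::version())
  print(BiocManager::valid())
}
```

If KEGG.db turns out to be installed next to clusterProfiler 4.10.x, that would be consistent with packages from different Bioconductor releases being mixed.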

Yet, using clusterProfiler 4.10.x, when setting use_internal_data = FALSE, you can still query KEGG through its API, and then everything looks fine to me...

> library(clusterProfiler)
> library(org.Hs.eg.db)
> 
> ## load and prepare example data / results
> data(geneList, package="DOSE")
> 
> up <- names(geneList)[abs(geneList) > 2]
> 
> ## run ORA using KEGG pathways
> res.up <- enrichKEGG(gene = up,
+                      organism = "hsa",
+                      keyType = "kegg",
+                      pvalueCutoff = 0.05,
+                      pAdjustMethod = "BH",
+                      minGSSize = 10,
+                      maxGSSize = 500,
+                      qvalueCutoff = 0.2,
+                      use_internal_data = FALSE)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> 
> res.up <- setReadable(res.up, 'org.Hs.eg.db', keyType = "ENTREZID")
> head(res.up)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio       pvalue     p.adjust       qvalue
hsa04110    15/106 158/8865 4.779149e-10 1.022738e-07 1.001106e-07
hsa04114    10/106 139/8865 5.746555e-06 6.148814e-04 6.018761e-04
hsa04218    10/106 157/8865 1.688773e-05 1.204658e-03 1.179178e-03
hsa04061     8/106 100/8865 2.400030e-05 1.284016e-03 1.256858e-03
hsa03320     7/106  76/8865 3.179854e-05 1.360977e-03 1.332191e-03
hsa04814    10/106 197/8865 1.167349e-04 4.163545e-03 4.075482e-03
                                                                                         geneID
hsa04110 CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1
hsa04114                             CDC20/CCNB2/CDK1/MAD2L1/CALML5/AURKA/CCNB1/PTTG1/ITPR1/PGR
hsa04218                          FOXM1/MYBL2/CCNB2/CCNA2/CDK1/CALML5/CHEK1/CCNB1/CACNA1D/ITPR1
hsa04061                                    CXCL10/CXCL13/CXCL11/CXCL9/CCL18/CCL8/CXCL14/CX3CR1
hsa03320                                              MMP1/FADS2/ADIPOQ/PCK1/FABP4/HMGCS2/PLIN1
hsa04814                        KIF23/CENPE/KIF18A/KIF11/KIFC1/KIF18B/KIF20A/KIF4A/MYH11/DNALI1
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10
> 

> packageVersion("clusterProfiler")
[1] ‘4.10.1’
> BiocManager::version()
[1] ‘3.18’
> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] org.Hs.eg.db_3.18.0    AnnotationDbi_1.64.1   IRanges_2.36.0        
[4] S4Vectors_0.40.2       Biobase_2.62.0         BiocGenerics_0.48.1   
[7] clusterProfiler_4.10.1

loaded via a namespace (and not attached):
 [1] DBI_1.2.2               bitops_1.0-7            shadowtext_0.1.3       
 [4] gson_0.1.0              gridExtra_2.3           rlang_1.1.3            
 [7] magrittr_2.0.3          DOSE_3.28.2             compiler_4.3.0         
[10] RSQLite_2.3.6           png_0.1-8               vctrs_0.6.5            
[13] reshape2_1.4.4          stringr_1.5.1           pkgconfig_2.0.3        
[16] crayon_1.5.2            fastmap_1.1.1           XVector_0.42.0         
[19] ggraph_2.2.1            utf8_1.2.4              HDO.db_0.99.1          
[22] enrichplot_1.23.1.992   purrr_1.0.2             bit_4.0.5              
[25] zlibbioc_1.48.2         cachem_1.0.8            aplot_0.2.2            
[28] GenomeInfoDb_1.38.8     jsonlite_1.8.8          blob_1.2.4             
[31] BiocParallel_1.36.0     tweenr_2.0.3            parallel_4.3.0         
[34] R6_2.5.1                stringi_1.8.3           RColorBrewer_1.1-3     
[37] GOSemSim_2.29.1.001     Rcpp_1.0.12             Matrix_1.6-5           
[40] splines_4.3.0           igraph_2.0.3            tidyselect_1.2.1       
[43] qvalue_2.34.0           viridis_0.6.5           codetools_0.2-20       
[46] lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
[49] treeio_1.26.0           withr_3.0.0             KEGGREST_1.42.0        
[52] gridGraphics_0.5-1      scatterpie_0.2.1        polyclip_1.10-6        
[55] Biostrings_2.70.3       BiocManager_1.30.22     pillar_1.9.0           
[58] ggtree_3.10.1           ggfun_0.1.4             generics_0.1.3         
[61] RCurl_1.98-1.14         ggplot2_3.5.0           munsell_0.5.1          
[64] scales_1.3.0            tidytree_0.4.6          glue_1.7.0             
[67] lazyeval_0.2.2          tools_4.3.0             data.table_1.15.4      
[70] fgsea_1.28.0            fs_1.6.3                graphlayouts_1.1.1     
[73] fastmatch_1.1-4         tidygraph_1.3.1         cowplot_1.1.3          
[76] grid_4.3.0              tidyr_1.3.1             ape_5.7-1              
[79] colorspace_2.1-0        nlme_3.1-164            GenomeInfoDbData_1.2.11
[82] patchwork_1.2.0         ggforce_0.4.2           cli_3.6.2              
[85] fansi_1.0.6             viridisLite_0.4.2       dplyr_1.1.4            
[88] gtable_0.3.4            yulab.utils_0.1.4       digest_0.6.35          
[91] ggrepel_0.9.5           ggplotify_0.1.2         farver_2.1.1           
[94] memoise_2.0.1           lifecycle_1.0.4         httr_1.4.7             
[97] GO.db_3.18.0            bit64_4.0.5             MASS_7.3-60.0.1        
> 

@onmyojiyys
Author

Thanks!
