Introducing the Centaur Knowledge Graph
With the cost of sequencing decreasing every year, there is a pressing need for robust tools to visualize the increasingly complex universe of genomics. Gene-phenotype interactions and transcriptomic gene networks are well suited to modern graph databases, as they involve huge numbers of weighted connections that must be filtered down to a meaningful subset. Fauna Bio’s Centaur is a knowledge graph with a Neo4j database backend built to explore these datasets and accelerate our scientific discovery pipeline. Merging our unique data with public biomedical resources was a significant effort, as it required accurate connections between the diverse ontologies of human disease, clinical drug development, molecular biology, and animal genomics. The resulting graph enables our researchers to apply their expertise in the context of current biomedical understanding.
Centaur allows our scientific team to quickly determine whether any of our proprietary gene targets also connect to extreme traits in other species or relate to known human disease pathways. The graphical user interface of Neo4j Bloom has been a highly effective solution, allowing us to efficiently expand our “Emerging Animal Model” program and determine tractable indications for our genetic targets. The addition of UK Biobank data has extended this capacity further, with rare human variant-phenotype connections feeding directly into our functional genomics analyses in mammals. Later in our pipeline, comparing the signatures of compounds with known activity in the clinic allows us to better understand the mechanisms underlying their effects.
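The question of whether a gene target connects to a trait or a disease pathway is a short path search over the graph. The sketch below illustrates the idea in Python on a toy in-memory edge list rather than the Neo4j backend. Every node name, relationship type, and the `connections` helper is a hypothetical placeholder and does not reflect Centaur's actual schema or data.

```python
from collections import deque

# Hypothetical toy edges (source, relationship, target); placeholders only,
# not real biology and not Centaur's schema.
EDGES = [
    ("GENE_A", "ORTHOLOG_OF", "GENE_A_SPECIES2"),
    ("GENE_A_SPECIES2", "ASSOCIATED_WITH", "TRAIT_T"),
    ("GENE_A", "PARTICIPATES_IN", "PATHWAY_X"),
    ("PATHWAY_X", "IMPLICATED_IN", "DISEASE_Y"),
    ("GENE_B", "PARTICIPATES_IN", "PATHWAY_Z"),
]


def build_adjacency(edges):
    """Index edges in both directions so traversal ignores edge direction."""
    adjacency = {}
    for source, rel, target in edges:
        adjacency.setdefault(source, []).append((rel, target))
        adjacency.setdefault(target, []).append((rel, source))
    return adjacency


def connections(adjacency, start, target_prefixes, max_hops=2):
    """Breadth-first search from `start`.

    Returns {node: path} for every node reachable within `max_hops`
    whose name starts with one of `target_prefixes`. Each path
    alternates node names and relationship types.
    """
    found = {}
    visited = {start}
    queue = deque([(start, [start], 0)])
    while queue:
        node, path, hops = queue.popleft()
        if hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in adjacency.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            new_path = path + [rel, neighbor]
            if neighbor.startswith(target_prefixes):
                found[neighbor] = new_path
            queue.append((neighbor, new_path, hops + 1))
    return found


adjacency = build_adjacency(EDGES)
hits = connections(adjacency, "GENE_A", ("TRAIT_", "DISEASE_"))
for node, path in sorted(hits.items()):
    print(node, "via", " -> ".join(path))
```

In a graph database the same question would normally be expressed as a bounded variable-length pattern match; the breadth-first search here only makes the underlying logic explicit.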
Interesting results can be saved as “scenes” in Bloom and easily shared with other team members for further exploration, compounding the power of the team’s broad biological background. Beyond visualization, the underlying database can take full advantage of graph theory via Neo4j’s Graph Data Science library, applying algorithms in real time and using machine learning to predict new relationships. We routinely pull in new data and update connections based on the latest published research, and we have developed automated processes to that end. The team’s use of Centaur has allowed us to explore our analyses in the full context of human and animal genomics, uncovering unexpected connections and guiding us to novel scientific discoveries.
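To give a sense of what predicting new relationships from graph structure can look like, here is a minimal sketch of one classic topological heuristic, the Adamic-Adar score, in plain Python. This technique is swapped in purely for illustration; the text above does not say which algorithms or models are used in Centaur. The edge list and all names (`neighbors_map`, `adamic_adar`, `rank_candidate_links`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy gene-pathway edges; placeholders only.
EDGES = [
    ("GENE_A", "PATHWAY_X"),
    ("GENE_B", "PATHWAY_X"),
    ("GENE_A", "PATHWAY_Y"),
    ("GENE_B", "PATHWAY_Y"),
    ("GENE_C", "PATHWAY_Y"),
    ("GENE_C", "PATHWAY_Z"),
]


def neighbors_map(edges):
    """Build an undirected neighbor set for every node."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def adamic_adar(nbrs, u, v):
    """Sum 1 / log(degree) over the neighbors shared by u and v.

    Shared neighbors with few connections count more than hubs.
    """
    score = 0.0
    for shared in nbrs[u] & nbrs[v]:
        degree = len(nbrs[shared])
        if degree > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(degree)
    return score


def rank_candidate_links(nbrs, nodes):
    """Score every currently unlinked pair of `nodes`, highest first."""
    scored = []
    for u, v in combinations(sorted(nodes), 2):
        if v in nbrs[u]:
            continue  # already connected, nothing to predict
        score = adamic_adar(nbrs, u, v)
        if score > 0:
            scored.append((u, v, score))
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored


nbrs = neighbors_map(EDGES)
for u, v, score in rank_candidate_links(nbrs, ["GENE_A", "GENE_B", "GENE_C"]):
    print(f"{u} - {v}: {score:.3f}")
```

In this toy graph, GENE_A and GENE_B rank first because they share two pathways, one of which has few other members. Scores like this are a starting point for hypothesis generation, and a learned model would typically combine several such features.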
As the world of genomics grows exponentially, tools that allow connections between siloed data sources while also yielding a meaningful signal are of paramount importance. Centaur is a resource that will only increase in value for us in the years to come as our disease interests and sequencing biobank continue to expand.