Project description

Modern DNA sequencing lets us study entire microbial communities, including bacteria and viruses from the human gut, ocean, soil, and clinical samples. The data is often represented as a sequence graph, where nodes are DNA fragments and edges show how those fragments may connect. These graphs can contain important structures such as paths, repeats, bubbles, circular genomes, and connected components. However, finding these patterns often requires custom scripts, manual graph inspection, or specialised knowledge of file formats and graph algorithms.

In this project, you will design a query framework that makes large metagenomic sequence graphs easier to explore programmatically. You will design ways for researchers to ask useful questions about these graphs, and build the algorithms that find the answers efficiently. For example, you may design queries to find circular structures, repeated regions, branching patterns, graph bubbles, or paths connecting regions of interest.

This project gives you the opportunity to contribute to open-source bioinformatics software while addressing a missing layer in current sequence graph analysis: reusable, interpretable graph queries for large and complex biological graphs. By the end of the project, you will have built a prototype query system and evaluated its ability to search large sequence graphs efficiently and flexibly.
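To give a feel for what a graph query might look like, here is a minimal Python sketch. It is an illustration only, not the project's design: the toy graph, the segment names, and the functions `connected_components` and `is_simple_cycle` are all hypothetical, and the sketch ignores strand orientation and overlaps, which real sequence graphs carry. It finds connected components and reports those that form a single directed cycle, a rough stand-in for a candidate circular genome.

```python
from collections import defaultdict

# Hypothetical toy graph: segment name -> DNA fragment, plus directed links.
segments = {"s1": "ACGT", "s2": "GTTA", "s3": "TACG", "s4": "CCGG", "s5": "GGAT"}
links = [("s1", "s2"), ("s2", "s3"), ("s3", "s1"), ("s4", "s5")]


def connected_components(nodes, edges):
    """Group nodes into weakly connected components (edge direction ignored)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from each unvisited node.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def is_simple_cycle(component, edges):
    """True if every node has exactly one incoming and one outgoing link.

    For a connected component, this means the component is one directed cycle.
    """
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for a, b in edges:
        if a in component and b in component:
            out_deg[a] += 1
            in_deg[b] += 1
    return all(out_deg[n] == 1 and in_deg[n] == 1 for n in component)


components = connected_components(segments, links)
circular = [sorted(c) for c in components if is_simple_cycle(c, links)]
print(circular)  # [['s1', 's2', 's3']]
```

A query framework would aim to let researchers express a question like "which components are circular?" directly, without writing traversal code of this kind for each new question.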

Co-supervisors

Prof Robert Edwards, Flinders Accelerator for Microbiome Exploration, College of Science & Engineering

Further information

For more information about our research, check out our GitHub profiles (metagentools, Vini2, linsalrob), the Edwards lab website, and the FAME group's website.

Assumed knowledge

You should have a background in Computer Science, Information Technology, or a related field. A basic understanding of programming in Python, data structures, and graph theory would be awesome. We'll teach you bioinformatics and the related biology.


Note: You need to register interest in projects from different supervisors (not several projects with the same supervisor).
You must also contact each supervisor directly to discuss both the project details and your suitability to undertake the project.