What is a strain in microbiology and why does it matter?

By Prof. Colin Hill, Microbiology Department and APC Microbiome Ireland, University College Cork, Ireland

At the recent ISAPP meeting in Sitges we had an excellent debate on the topic of ‘All probiotic effects must be considered strain-specific’. Notwithstanding which side of the debate prevailed, it does raise the question: what exactly is a strain? As a card-carrying microbiologist I should probably be able to simply define the term and give you a convincing answer, but I find that it is a surprisingly difficult concept to capture. It is unfortunately a little technical as a topic for a light-hearted blog, but here goes. Let me start by saying that the term ‘strain’ is important largely because we like to name things and then use those names when we share information, but that the concept of ‘strain’ may have no logical basis in nature where mutations and changes to a bacterial genome are constantly occurring events.

Let’s suppose I have a culture of Lactobacillus acidophilus growing in a test-tube, grown from a single colony. This clonal population is obviously a single strain that I will name strain Lb. acidophilus ISAPP2022. That was easy! I am aware of course that within this population there will almost certainly be a small number of individual cells with mutations (single nucleotide polymorphisms, or SNPs), cells that may have lost a plasmid, or cells that have undergone small genomic rearrangements. Nonetheless, because this genetic heterogeneity is unavoidable, I still consider this to be a pure strain. If I isolate an antibiotic-resistant version of this strain by plating it on agar containing streptomycin and selecting a resistant colony, I will now have an alternative clonal population all sharing a SNP (almost certainly in a gene encoding a ribosomal subunit). Even though there is a potentially very important genotypic and phenotypic difference, I would not consider this to be a new strain, but rather a variant of Lb. acidophilus ISAPP2022. To help people in the lab or collaborators I might call this variant ISAPP2022SmR, or ISAPP2022-1. In my view, I could continue to make changes to ISAPP2022 and all of those individual clonal populations would still be variants of the original strain. So, the variant concept is that any change in the genome, no matter how small, creates a new variant. When I grow ISAPP2022 in my lab for many years, or share it with others around the world, it is my view that we are all working with the same strain, despite the fact that different variants will inevitably emerge over time and in different labs.

Where the strain concept becomes more difficult is when I isolate a bacterium from a novel source and I want to determine if it is the same strain as ISAPP2022. If the whole genome sequence (WGS) is a perfect match (100% average nucleotide identity or ANI) then both isolates are the same strain and both can be called ISAPP2022. If they differ by only a few SNPs then they are variants of the same strain. If the two isolates share only 95% ANI then they are obviously not the same strain and cannot even be considered members of the same species (I am using a species ANI cut-off of 96% that I adopted from a recent paper in IJSEM).

Where it gets really tricky is when the ANI lies between 96% (so that we know that the isolates are both members of the same species) and 100% (where they are unequivocally the same strain). Where should we place the cut-off to define a strain? At what point is a threshold crossed and an isolate goes from being a variant to becoming a new strain? Should this be a mathematical decision based solely on ANI, or do we have to consider the functionality of the changes? If it is mathematical then we could simply choose a specific value, say 99.95% or 99.99% ANI, and declare that anything below that value is a new strain. Remember that the 2 Mb genome size of Lb. acidophilus would mean that two isolates sharing 99.99% ANI could differ by up to 200 SNPs. This could lead to a situation where an isolate with 199 SNPs compared to ISAPP2022 is considered a variant, but an isolate with 201 SNPs is a new strain (even though it differs from the variant with 199 SNPs by only two additional SNPs). This feels very unsatisfactory. But what about an isolate with only 50 SNPs, but one that has a very different phenotype to ISAPP2022 because the SNPs are located in important genes? Or what about an isolate with an additional plasmid, or missing a plasmid, or with a chromosomal deletion or insertion? I would argue we should not have a hard and fast cut-off based on SNPs alone, but we should continue to call all of these variants, and not define them as new strains.
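
For readers who like to see the arithmetic, the sums behind that 200-SNP figure can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every non-identical position is counted as a SNP, insertions, deletions, plasmids and unaligned regions are ignored, and an isolate sitting exactly on the cut-off is arbitrarily treated as a variant. The function names are hypothetical and not taken from any ANI tool.

```python
def max_snp_differences(genome_size_bp: int, ani_percent: float) -> int:
    """Rough upper bound on SNPs between two genomes of this size and ANI.

    Assumption: every non-identical position is a SNP; indels and
    unaligned regions are ignored.
    """
    return round(genome_size_bp * (100.0 - ani_percent) / 100.0)


def label_by_cutoff(snp_count: int, cutoff_snps: int) -> str:
    """Apply a purely mathematical cut-off (the approach argued against above).

    Assumption: an isolate exactly at the cut-off still counts as a variant.
    """
    return "variant" if snp_count <= cutoff_snps else "new strain"


genome_size = 2_000_000  # approximate Lb. acidophilus genome size (2 Mb)

cutoff = max_snp_differences(genome_size, 99.99)
print(cutoff)                                      # 200
print(max_snp_differences(genome_size, 99.95))     # 1000

# Two isolates that differ from each other by only two SNPs
# end up with different labels.
print(label_by_cutoff(199, cutoff))                # variant
print(label_by_cutoff(201, cutoff))                # new strain
```

The last two lines show the awkward part: the isolates with 199 and 201 SNPs are almost identical to each other, yet a fixed threshold puts them on opposite sides of the line.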

So, by how much do two isolates have to differ before we no longer consider them as variants of one another, but as new strains? I will leave that question to taxonomists and philosophers since for me it falls into the territory of ‘how many angels can dance on the head of a pin?’

All this may seem somewhat esoteric, but there are practical implications. Can we translate the findings from a clinical trial done with a specific variant of a strain to all other variants of the same strain? If Lactobacillus acidophilus ISAPP2022 has been shown to deliver a health benefit (and is therefore a probiotic), can we assume that Lb. acidophilus ISAPP2022-1 or any other variant will have the same effect? What if a variant has only one mutation, but that mutation eliminates an important phenotype required for the functionality of the original strain? I am afraid that at the end of all this verbiage I have simply rephrased the original debate topic from ‘All probiotic effects must be considered strain-specific’ to ‘All probiotic effects must be considered variant-specific’. Looks like we might be heading back to the debate stage in 2023!