DNA quality control standards ratified for the Bennett Lab in Aug 2020:
- All in vitro / PCR-generated DNA aside from vector elements (replication origin, resistance marker) must be sequenced after assembling into a plasmid.
- Plasmids need not be sequenced if assembled by a non-polymerase-based assembly method (e.g. Golden Gate) from sequenced DNAs (e.g. "parts"). However, diagnostic restriction digests are required, which secondarily verify overall plasmid preparation by examination on a gel.
- All plasmids destined for certain use in a publication must be sequenced over elements critical to the claims of the plasmid-encoded functions. This often, but not always, excludes vector elements.
Dideoxynucleotide Chain Terminator "Sanger" Sequencing
Sequencing of 500–1000 ng plasmid or linear DNA in the region 800 bp after a primer that is provided or selected from the service's list.
Invented in 1977, Sanger sequencing is still the most common sequencing method (as of 2022).
Sanger sequencing reactions are generally reliable from 25 bp after the primer up to 800 bp, though good reactions can produce 1000 bp of good read. The first ≈25 bp downstream of the primer are generally of too poor quality to use, as is the sequence toward the end of the read. To cover a longer sequence, primers for multiple reactions must therefore be designed so that the ends of opposing reads overlap (convergent orientation of reads) and/or each downstream primer lies within the good read of the primer upstream of it (tandem orientation of reads). Quality scores are assigned to each nucleotide read of the chromatogram, so programs that read sequencing traces (.ab1 files) automatically grey out the low-quality regions and highlight mismatches and insertions/deletions. The .ab1 files can be downloaded as part of sequencing results and uploaded for alignment to the template in Benchling.
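The tandem tiling described above can be sanity-checked with a few lines of Python. This is only a sketch: it assumes same-strand primers, a usable window from 25 to 800 bp past each primer's 3′ end (the figures quoted above), and the function names are made up for illustration.

```python
def usable_window(primer_3p_end, lead=25, read_len=800):
    """Usable stretch of a forward read as (start, end), inclusive.

    Assumes the read is reliable from `lead` to `read_len` bp past the
    primer's 3' end; both defaults are the rule-of-thumb values above.
    """
    return primer_3p_end + lead, primer_3p_end + read_len


def coverage_gaps(primer_3p_ends, target_start, target_end, lead=25, read_len=800):
    """Return the (start, end) intervals of the target not covered by any read."""
    windows = sorted(usable_window(p, lead, read_len) for p in primer_3p_ends)
    gaps = []
    cursor = target_start  # first position not yet known to be covered
    for start, end in windows:
        if start > cursor:
            gaps.append((cursor, min(start - 1, target_end)))
        cursor = max(cursor, end + 1)
        if cursor > target_end:
            break
    if cursor <= target_end:
        gaps.append((cursor, target_end))
    return gaps


# Primers 700 bp apart tile the target; primers 900 bp apart leave a gap.
print(coverage_gaps([0, 700], 25, 1500))
print(coverage_gaps([0, 900], 25, 1700))
```

An empty list means the reads overlap end to end; any interval returned is a stretch that needs another primer.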
See Lab Orientation page for instructions on Sanger sequencing.
Mechanism: A provided primer binds the sample DNA and is extended by a DNA polymerase in a reaction that contains a small amount of fluorophore-labeled 2′,3′-dideoxynucleotide triphosphates (ddNTPs) in addition to the normal dNTPs. Incorporation of a ddNTP by the polymerase terminates the chain, because without a 3′ hydroxyl on the final ddNTP, the polymerase cannot extend the replicated DNA chain further. Because ddNTP incorporation is random (but still complementary to the template DNA), a replicated chain of each and every length is produced by the polymerase extension reaction, and each length terminates in a complementary ddNTP. Since each of the four ddNTPs has a unique fluorophore, fluorometric capillary electrophoresis of the reaction mixture produces a sequence of fluorescence (a chromatogram) that corresponds to the sequence of the primer extension product from smallest to largest (closest termination from primer to farthest), which gives the sequence of the template DNA.
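A toy simulation can make the chain-termination logic concrete. The sketch below is not a model of real chemistry: the termination probability, molecule count, and function name are arbitrary assumptions, and the template is written in the order the polymerase traverses it.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_sanger(template, n_molecules=5000, p_terminate=0.1, seed=0):
    """Simulate random ddNTP termination and read the 'chromatogram'.

    template: bases just past the primer, in the order the polymerase
    reads them. Each simulated molecule is extended base by base and, with
    probability p_terminate at each step, incorporates a labeled ddNTP and
    stops. Sorting fragments by length (as electrophoresis does) and
    reading each fragment's terminal label recovers the extension product.
    """
    rng = random.Random(seed)
    terminal_base_by_length = {}
    for _ in range(n_molecules):
        for length, base in enumerate(template, start=1):
            if rng.random() < p_terminate:
                # The fluorophore identifies the last (complementary) base.
                terminal_base_by_length[length] = COMPLEMENT[base]
                break
    return "".join(terminal_base_by_length[n] for n in sorted(terminal_base_by_length))


template = "ATGGCCATTGTAATGGGCCG"
print(simulate_sanger(template))
```

With enough molecules, every fragment length is represented, so the read equals the complement of the template, base for base.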
Epoch sequencing
- Ordering and results go through email; no website. Your lab's ordering manager pays by PO through iO for each order (default) or in a monthly lab bill if you tell them you want to set this up for your lab. You can also request a quantity of prepaid sequencing credits and pay by PO through iO.
$3.00 per sample.
- Download the improved submission form file from lab Drive, prefilled with billing info.¹ Here's a blank one for other labs:
Fill the form with your name and email address; save and reuse for each future order.
- Fill form with sample info, and email file to seq@epochlifescience.com between 2–2:30 pm (2 pm preferable for Keck). Write "[lab name] - Keck" in subject line.
- Print 2nd sheet, compact printable form, and submit with samples by 3 pm in Keck 306 dropbox or BRC front desk.
- Samples should be in PCR strips with numbered tubes, extra cut off.
They ask for 80 ng/µL plasmid in 6–10 µL (=480–800 ng), and a final 5 µM primer.
Shyam's formulas are: 0.32 µL 100 µM primer + 6 µL(300–550 ng plasmid + H2O) → 6.32 µL.
Or with diluted primer: 0.65 µL 50 µM primer + 6 µL(300–550 ng plasmid + H2O) → 6.65 µL.
- Results are emailed by ≈2 pm the next weekday.
- ¹ The improved submission form is overall much more compact than the one Epoch made, has a column for notes like volumes, and its print form (2nd page) is auto-filled from inputs from the first page, and its columns and texts are sized so they don't overflow onto more than one page.
- ¹ For monthly, lab-aggregated billing, the form must have lab name and lab or ordering manager's email address for billing.
- ¹ If paying by p-card, be sure to download/forward the collected p-card receipts to biospurch@rice.edu every few weeks if not each time, with biodesign@rice.edu cc'd as a receipt backup and record.
For rapidly calculating volumes: in Wetlab Calculator, go to Sequencing by Mass, where at the top, enter desired DNA mass (e.g. 480 ng), reaction volume (6 µL), and primer volume (0.3 µL). For each sample, you need only enter the mass concentration as measured by Nanodrop, or the molar concentration if normalized. Then add the calculated water volumes to the sequencing tubes, then the common primer volume, and lastly the calculated DNA volumes (this lets you use one tip to quickly add a primer to all the clean water volumes).
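For anyone without the Wetlab Calculator at hand, the same arithmetic can be sketched in Python. This is not the calculator's actual implementation; it assumes, as in Shyam's formulas above, that DNA plus water fill the 6 µL and the primer is added on top. Function names are illustrative.

```python
def sequencing_volumes(conc_ng_per_ul, target_mass_ng=480.0, dna_plus_water_ul=6.0):
    """Return (DNA volume, water volume) in µL for one sample.

    Assumption: DNA + water make up dna_plus_water_ul; primer is extra.
    """
    dna_ul = target_mass_ng / conc_ng_per_ul
    if dna_ul > dna_plus_water_ul:
        raise ValueError("Sample too dilute to reach the target mass in this volume")
    water_ul = dna_plus_water_ul - dna_ul
    return round(dna_ul, 2), round(water_ul, 2)


def final_primer_um(stock_um, primer_ul, other_ul):
    """Final primer concentration (µM) after adding primer to the other volume."""
    return stock_um * primer_ul / (primer_ul + other_ul)


# A 160 ng/µL miniprep: 3 µL DNA + 3 µL water gives 480 ng in 6 µL.
print(sequencing_volumes(160.0))
# Both primer recipes above land close to the requested 5 µM final.
print(round(final_primer_um(100, 0.32, 6.0), 2))
print(round(final_primer_um(50, 0.65, 6.0), 2))
```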
Genewiz Sequencing
$5.50 per sample.
Requested formula: 500, 800, or 1000 ng for <6 kb, 6–10 kb, and >10 kb plasmids. 10, 20, 40, or 60 ng for <0.5, 0.5–1, 1–2, and 2–4 kb PCR products. For larger PCR products, treat like plasmids. Combine with 25 pmol primer in a final 15 µL.
Simple formula: 2.5 µL 10 µM primer, ideally 500 ng DNA (as low as 300 ng), up to 15 µL with water.
In Wetlab Calculator, go to Genewiz Sequencing, where at the top, enter the desired DNA mass (500 ng), reaction volume (15 µL), and primer volume (2.5 µL). Then add the calculated water volumes to the sequencing tubes, then the common primer volume, then the calculated DNA volumes (this lets you use one tip to quickly add a primer to clean water volumes).
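The Genewiz rules above can likewise be written out as a small Python sketch. The handling of exact boundary sizes (e.g. a PCR product of exactly 1 kb) is an assumption, since the stated ranges share their endpoints, and the function names are illustrative rather than anything Genewiz or the Wetlab Calculator provides.

```python
def genewiz_mass_ng(length_kb, is_plasmid):
    """Requested DNA mass (ng) by template type and size, per the list above."""
    if is_plasmid or length_kb > 4:  # PCR products >4 kb are treated like plasmids
        if length_kb < 6:
            return 500
        if length_kb <= 10:
            return 800
        return 1000
    if length_kb < 0.5:
        return 10
    if length_kb < 1:
        return 20
    if length_kb < 2:
        return 40
    return 60


def genewiz_volumes(conc_ng_per_ul, mass_ng, total_ul=15.0, primer_ul=2.5):
    """Return (DNA volume, water volume) in µL; primer volume is inside total_ul."""
    dna_ul = mass_ng / conc_ng_per_ul
    water_ul = total_ul - primer_ul - dna_ul
    if water_ul < 0:
        raise ValueError("Sample too dilute to fit in the reaction volume")
    return round(dna_ul, 2), round(water_ul, 2)


# 2.5 µL of 10 µM primer is 25 pmol (µL x µM = pmol).
print(2.5 * 10)
# A 3 kb plasmid at 100 ng/µL: 5 µL DNA, 7.5 µL water, 2.5 µL primer.
print(genewiz_volumes(100.0, genewiz_mass_ng(3, is_plasmid=True)))
```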
Benchling Primer3Plus Sanger Sequencing Primer Design
Benchling can run Primer3Plus for automated design of primers according to many parameters. Shyam spent a few hours fiddling with the parameters to get it to produce primers like the ones he would design manually to optimally Sanger sequence the middle of long parts. These parameter calculations give primers with similar spacing and read overlap as meticulously, manually designed primers that split the target sequencing region into ≈equal portions covered by individual sequencing reads that overlap sufficiently not to leave any sequencing gaps. Primer3 has the added benefit of template specificity checking. It will generate pairs of divergent forward and reverse primers, but often you'll need only one of each pair to cover the entire target span with reads optimally spaced. –Shyam Bhakta
* Below, this 850 nt value represents your idea of a reliable, good sequencing read length (after the ≈25 nt "junk" lead between the primer 3′ end and the beginning of good sequencing trace). This spacing can be adjusted to the sequencing read length you're comfortable with, but be sure to adjust it in all the other parameters where it's used here, marked with an * .
- Select the target sequencing span; right click on the sequence (not the features), and in the menu click Run Primer3.
- Task: Sequencing
- Tm Parameters:
- Algorithm:
- SantaLucia 1999, default params if Sanger sequencing or using Taq polymerase for a colony PCR using these primers. Or use Q5 params if using with Q5.
- Modified Breslauer 1986, if using Q5 or Phusion for a colony PCR
- Click Set to Primer3 Defaults: DNA: 50 nM; Na⁺/K⁺: 50 mM; Mg²⁺: 1.5 mM; dNTP: 0.6 mM
- Region:
- Target: Start x to End y spanning the desired sequencing interval.
These target indices need adjustment if the 5′ and 3′ ends will be within the coverage of existing sequencing primers you have (e.g. AB17/AB18 in the vector/connectors). Instead of simply omitting ~800 bp of the ends, it works better to include them in the equal portioning of Sanger reads. To do this, select the full target interval (including flanking Golden Gate sites, where applicable), and click Use Selection. Calculate #Results R and Spacing S as below. Add this S to the START index and subtract S from the END index:
x = left boundary index + S
y = right boundary index – S
Before you generate primers, you can reduce #Results by 2, or leave it alone and see an optional primer pair between the last two sequencing spans.
- Primer:
These parameters can likely be adjusted to your liking without issue.
- GC%: min 30% – opt 50% – max 70%
- Tm: min 53° – opt 56° – max 63°
- Size: min 17 nt – opt 20 nt – max 25 nt
- 3′ GC Clamp: 1
- Result Generation:
- # Results: R = ⌈L ÷ 850*⌉.
#Results R = (target length L) ÷ (850* nt reliable read length), rounded up, never down.
Target length L = y – x target sequence indices, or just look at the length of the selection.
- Sequencing:
- Spacing: S = L ÷ R
Spacing S = target seq length L ÷ #Results R. Normally between 575–900 nt. This evenly distributes the number of ideal primer sites across the target length.
- Interval: 40 nt.
If you need both primers in a primer pair that the results give you and they are too close to use (3′ ends <50 bp apart, reads may not overlap), then after saving other selected primer pairs, rerun Primer3 with Interval set to 50 or 60 nt to spread apart that primer pair.
- Lead: 0 nt.
- Accuracy: 20 nt
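The #Results, Spacing, and Target arithmetic above is easy to get wrong by hand, so here is a minimal Python sketch of it. It only computes the numbers to type into Benchling; it does not call Benchling or Primer3. Rounding S to a whole nucleotide and the `ends_covered` flag (for the case where existing vector primers already read into both ends) are assumptions made for this example.

```python
import math


def primer3_sequencing_params(left, right, read_len=850, ends_covered=False):
    """Compute #Results R, Spacing S, and Target Start/End indices.

    left, right: boundary indices of the full interval to be sequenced.
    read_len: the 850* nt reliable read length; change it to taste.
    ends_covered: True if existing primers already cover both ends, in
    which case the target is shrunk by S on each side as described above.
    """
    length = right - left                 # target length L
    results = math.ceil(length / read_len)  # R, always rounded up
    spacing = round(length / results)     # S, rounded to a whole nt (assumption)
    if ends_covered:
        start, end = left + spacing, right - spacing
    else:
        start, end = left, right
    return {"results": results, "spacing": spacing, "start": start, "end": end}


# A 3000 nt interval flanked by existing vector primers:
print(primer3_sequencing_params(1000, 4000, ends_covered=True))
```

For that 3000 nt example, R = ⌈3000 ÷ 850⌉ = 4 and S = 750 nt, so the target becomes 1750 to 3250.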
Whole-DNA Nanopore Sequencing
Sequencing of whole plasmids or linear DNAs generally <25 kb, or up to entire bacterial chromosomes. Requires no primers. Practically insensitive to extreme GC/AT content, structures, or repeats, except sometimes for very long homopolymeric repeats (single base stretch), which may be shortened by one base.
- Plasmidsaurus $15/sample, results within 2 workdays.
- 300 ng in 10 µL. Weekly pickup box in Keck, or FedEx on your own.
- $15 for ≤25 kb; $50 for 25–125 kb; $100 for 125–300 kb.
- Provides consensus pLannotated .gbk and .fasta, .ab1 virtual chromatogram, virtual gel, pLannotate .html map, per-base error classification, multimer and contaminant classification (% genomic contam, %monomer, %dimer), read coverage map, read length histogram, raw .fastq, .tsv.
- Eurofins $15/sample.
Identifying errors in nanopore sequencing data with Epi2Me
Contribution from Kshitij Rai (Caleb Bashor Lab, Rice University, Houston, TX)
Deletion or skipping of regions with multiple repeats is the most prominent error mode in Nanopore sequencing. It doesn't have to do with the secondary structure around terminators, as the electric potential around the molecular motors on the ASIC chip makes it so that pretty much all secondary structure is linearized.
The issue comes from the basecalling models, which work from the electrical current measurements around the pores: they are unable to resolve long stretches of repeats (such as the high A/T content around terminators) and can produce artefacts of this kind.
The best way to currently resolve that is by sequencing much deeper (which is unfortunately out of the user's control if you submit to a third party like Plasmidsaurus), but you can aggregate reads across multiple constructs to get an idea of whether the deletion is real or not. If across the colonies you sequenced, the deletion is in the exact same place and of the exact same length over and over, it's most likely real (and the sequence isn't there). If the deletions are slightly more random (i.e. in slightly different regions, or of variable lengths in different colonies that were sequenced), it's most likely a sequencing/basecalling error.
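The cross-colony comparison can be expressed as a tiny Python check. This is a sketch of the rule of thumb only; the input format (one `(start, length)` tuple per colony, taken from each consensus alignment) and the function name are assumptions.

```python
def judge_deletion(calls):
    """Apply the same-place, same-length heuristic across colonies.

    calls: list of (start, length) tuples, one per sequenced colony, giving
    where the consensus reports the deletion and how long it is.
    """
    if len(calls) < 2:
        return "inconclusive"  # one colony cannot show consistency
    if len(set(calls)) == 1:
        return "likely real deletion"
    return "likely basecalling error"


print(judge_deletion([(2140, 12), (2140, 12), (2140, 12)]))
print(judge_deletion([(2140, 12), (2137, 9), (2143, 15)]))
```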
You can get a better idea of whether it's the latter by doing a reference sequence alignment: download the Epi2Me desktop agent (Nanopore's UI for data analysis), upload your reference file there, and align the .fastq files against it to see the read depth distribution across the whole plasmid. A drop of ~50% in the region where the consensus reports a deletion means it's most likely a basecalling error. A depth lower than 20–30% indicates that the sequence is actually missing in the plasmid.
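Once per-base depth values are in hand, the depth thresholds above can be applied with a short Python sketch. How the depths are exported from the alignment is outside this example. The cutoffs (0.25 and 0.45 of the surrounding depth) are assumptions chosen to approximate the "20–30%" and "~50% drop" rules of thumb, and depths between them are reported as ambiguous.

```python
from statistics import median


def classify_deletion(depths, region_start, region_end,
                      real_below=0.25, error_from=0.45):
    """Compare read depth inside a reported deletion to the rest of the plasmid.

    depths: per-base read depth along the reference (list of numbers).
    region_start, region_end: 0-based, end-exclusive bounds of the region.
    real_below, error_from: assumed cutoffs on the inside/outside depth ratio.
    """
    inside = depths[region_start:region_end]
    outside = depths[:region_start] + depths[region_end:]
    baseline = median(outside)
    if baseline == 0:
        raise ValueError("No coverage outside the region to compare against")
    ratio = median(inside) / baseline
    if ratio < real_below:
        return "likely real deletion"
    if ratio >= error_from:
        return "likely basecalling error"
    return "ambiguous"


# Depth halves across the region: most reads still contain the sequence.
depths = [100] * 50 + [50] * 10 + [100] * 50
print(classify_deletion(depths, 50, 60))
```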
Next-Generation Sequencing (NGS)
A paper that serves as a primer on outsourced NGS for the beginner: https://pubs.acs.org/doi/full/10.1021/acssynbio.1c00592
The focus on a protein variant library here can be generalized to any library.
Feb 2022 advice from Kshitij Rai (Caleb Bashor Lab, Rice University, Houston, TX)
- Commercial - As already mentioned, most commercial companies don't have a service at this scale. There are a few options however that I could find from a two-week sprint of calling companies and explaining what I wanted to do and why "just doing 800 million reads" was not something I wanted to commit to. Below are tables with the prices and turnaround times for all the companies around Houston that I could find. Notably, Genewiz offers a 20% discount on all NGS services for first time users, so I'd make use of that and submit from a lab member's account who has never sent any NGS samples to Genewiz before, if you choose to go this route.
- In house - Having found no "cheap and fast" options above, I ultimately made the switch and decided to sequence in house. I buy the sequencing kits and flow cells from Illumina directly, and then sequence with a lab that has a MiSeq machine (which most PIs are okay with since it's your own sample and kits, just their machine running it). Lots of things to note if you do go this route, particularly that the QC will all have to be done in your hands as well, including making sure you have a single band, the band has the complete Illumina adapter on it, the concentrations are spot on, and you have a decent amount of ΦX174 DNA spiked in to create diversity in your run (I'd be happy to talk more about this, and help design the experiment and libraries if you do choose to do this). Gang Bao's lab has a MiSeq machine, David Zhang's lab had one (though I don't know where it is now since he left), and a number of labs at the med center also have them. Again, most of them would hopefully be happy to let you hop on to their machine when it's free. The options at this scale depend on the kit you buy, but here are some prices for a 300 cycle (150 paired end reads) kit:
- Nano kit v2 - 300 cycles, 2 million reads, for $390
- Micro kit v2 - 300 cycles, 5-6 million reads, $540
- Regular kit v2 - 300 cycles, 15 million reads, $1100
- Regular kit v3 - 500 cycles, 25 million reads, $1200
- Shared runs - If you are apprehensive about diving in and buying a full kit and flow cell in house, you could jump in on someone else's run if they have space. I usually sequence on the MiSeq every 2-3 weeks, and usually have space leftover on my flow cell (cases where I only REALLY need like 7 million reads, but am running a 15 million read kit v2 and taking all the reads because why not). I do have a kit v2 in lab right now, and may be able to incorporate your samples in the next run, so let me know if you are interested in doing this and we can talk logistics.
- Hack - My personal favorite hack is to leverage high scale read setups that commercial companies and cores have for different purposes (usually genome sequencing, or single cell RNA seq/bulk RNA seq). They don't understand targeted sequencing of a defined region with variability in the middle, as is the case with most syn bio projects. However, since companies run these services at scale, they give you really good prices (Genewiz charges ~$800 for 700 million reads). All you need to do is to get on the phone with these companies and tell them that you will be doing the library prep and that they just need to run it on their sequencer and give you the FastQ file outputs for your "genome" sample, and then prep and give them your sample. These services are very, very cheap, and open up a lot more options. Baylor's sequencing core is one I would really recommend for this purpose. I believe they offer 200 million reads for $350 or something ridiculous like that, and will give you results in ~1 week.
Low–Medium Depth (2–10 million Reads)

| Sr. No. | Provider/Company | Read Depth | Cost | Comments |
|---|---|---|---|---|
| 1 | Genewiz | 15 million | $1344 | Quote requested |
| 2 | Wyzer Biosciences | 4 million | $1120 | Single end reads |
| 3 | Beijing Genomics | NA | NA | Do not offer a service at this depth |
| 4 | MD Anderson Sequencing Core | 15 million | $1708 | |
| 5 | DNA Link Sequencing Lab | 50 million | $249 | Paired end reads, 100bp max |
| 6 | LGC Biosciences | 50 million | $1826 | |
| 7 | ABM | NA | NA | Did not respond to calls or emails |

High Depth (200–400 million Reads)

| Sr. No. | Provider/Company | Read Depth | Cost | Comments |
|---|---|---|---|---|
| 1 | Genewiz | 350 million | $1440 | Additional 30% off on first NGS Run |
| 2 | Beijing Genomics | 300 million | $700 | Best price |
| 3 | LGC Biosciences | 200 million | $5843 | Quote requested |
| 4 | MD Anderson Sequencing Core | > 100 million | $3238 | Costs only $1632 for MD Anderson faculty |
| 5 | DNA Link | | | Quote requested |
| 6 | ABM | NA | NA | Did not respond to calls or emails |
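To compare these quotes on equal footing, it can help to normalize them to cost per million reads. The Python sketch below does that for a handful of the prices listed on this page. The selection of entries is arbitrary, and the Micro kit is entered at 5 million reads, the lower end of its stated 5–6 million range.

```python
def cost_per_million(cost_usd, reads_millions):
    """Cost in USD per million reads."""
    return cost_usd / reads_millions


# (cost in USD, reads in millions), copied from the kit list and tables above.
options = {
    "MiSeq Nano kit v2": (390, 2),
    "MiSeq Micro kit v2": (540, 5),  # lower bound of the 5-6 million range
    "MiSeq Regular kit v2": (1100, 15),
    "MiSeq Regular kit v3": (1200, 25),
    "Genewiz (high depth)": (1440, 350),
    "Beijing Genomics (high depth)": (700, 300),
}

# Cheapest per read first.
ranked = sorted(options, key=lambda name: cost_per_million(*options[name]))
for name in ranked:
    print(f"{name}: ${cost_per_million(*options[name]):.2f} per million reads")
```

Cost per read is only part of the decision: a high-depth run is cheap per read but wasteful if the library needs only a few million reads, and turnaround time differs between providers.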