DNA Sequencing

DNA quality control standards ratified for the Bennett Lab in Aug 2020:

All in vitro / PCR-generated DNA aside from vector elements (replication origin, resistance marker) must be sequenced after assembling into a plasmid.
Plasmids need not be sequenced if assembled by a non-polymerase-based assembly method (e.g. Golden Gate) from sequenced DNAs (e.g. "parts"). However, diagnostic restriction digests are required, which secondarily verify overall plasmid preparation by examination on a gel.
All plasmids destined for certain use in a publication must be sequenced over elements critical to the claims of the plasmid-encoded functions. This often, but not always, excludes vector elements.

Dideoxynucleotide Chain Terminator "Sanger" Sequencing

Sequencing of 500–1000 ng plasmid or linear DNA in the region 800 bp after a primer that is provided or selected from the service's list.
Invented in 1977, Sanger sequencing is still the most common sequencing method (as of 2022).

Sanger sequencing reactions are generally reliable from ≈25 bp after the primer up to 800 bp, though good reactions can produce 1000 bp of good read. The first ≈25 bp downstream of the primer are generally of too poor quality to use, as is the sequence toward the end of the read. To cover a larger sequence, primers for multiple reactions must therefore be designed so that the ends of the sequencing reads overlap (convergent orientation of reads) and/or each downstream primer lies within the read from the upstream primer (tandem orientation of reads). Quality scores are assigned to each base call of the chromatogram, so programs that read sequencing traces (.ab1 files) automatically grey out the low-quality regions and highlight mismatches and insertions/deletions. The .ab1 files can be downloaded as part of the sequencing results and uploaded to Benchling for alignment to the template.

See Lab Orientation page for instructions on Sanger sequencing.

Mechanism: A provided primer binds the sample DNA and is extended by a DNA polymerase in a reaction that contains a small amount of fluorophore-labeled 2′,3′-dideoxynucleotide triphosphates (ddNTPs) in addition to the normal dNTPs. Incorporation of a ddNTP by the polymerase terminates the chain: without a 3′ hydroxyl on the final ddNTP, the polymerase cannot extend the replicated DNA chain further. Because ddNTP incorporation is random (but still complementary to the template DNA), replicated chains of every length are produced by the polymerase extension reaction, and each length terminates in a complementary ddNTP. Since each of the four ddNTPs has a unique fluorophore, fluorometric capillary electrophoresis of the reaction mixture produces a sequence of fluorescence signals (a chromatogram) that corresponds to the sequence of the primer extension product from smallest to largest (closest termination from primer to farthest). This gives the sequence of the template DNA, read as the strand complementary to the one the primer anneals to.
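
As an illustration of the mechanism above, here is a minimal Python sketch of an idealized reaction. It assumes exactly one terminated fragment per extension length and perfect size separation, which real reactions only approximate. The function name `sanger_read` and the toy template are hypothetical and not part of any sequencing software.

```python
import random

# Watson-Crick pairing used to pick the terminating ddNTP.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_read(template_3to5: str) -> str:
    """Return idealized base calls for the template downstream of the primer.

    template_3to5 is the template strand written 3'->5', i.e. in the order
    the polymerase copies it while extending the primer 5'->3'.
    """
    # Idealization: one terminated fragment per possible extension length,
    # each ending in the labeled ddNTP complementary to the template base.
    fragments = [
        (length, COMPLEMENT[template_3to5[length - 1]])
        for length in range(1, len(template_3to5) + 1)
    ]
    # The reaction tube holds the fragments as an unordered mixture.
    random.shuffle(fragments)
    # Capillary electrophoresis resolves them from shortest to longest,
    # and the fluorophore on each terminal ddNTP gives the base call.
    return "".join(base for _, base in sorted(fragments))


# Toy template (hypothetical); prints the idealized read.
print(sanger_read("TACGGTCA"))
```

The read comes out as the complement of the copied strand, which is why the chromatogram is aligned against the strand the primer does not anneal to.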

Epoch sequencing

Ordering and results go through email; no website. Your lab's ordering manager pays by PO through iO for each order (default) or in a monthly lab bill if you tell them you want to set this up for your lab. You can also request a quantity of prepaid sequencing credits and pay by PO through iO.
$3.00 per sample.
Download the improved submission form file from lab Drive, prefilled with billing info.¹ Here's a blank one for other labs:

Fill the form with your name and email address; save and reuse for each future order.
Fill the form with sample info, and email the file to seq@epochlifescience.com between 2–2:30 pm (2 pm preferable for Keck). Write "[lab name] - Keck" in the subject line.
Print 2nd sheet, compact printable form, and submit with samples by 3 pm in Keck 306 dropbox or BRC front desk.
Samples should be in PCR strips with numbered tubes, extra cut off.
They ask for 80 ng/µL plasmid in 6–10 µL (=480–800 ng), and a final 5 µM primer.
Shyam's formulas are: 0.37 µL 100 µM primer + 7 µL(300–550 ng plasmid + H₂O) → 7.37 µL.
Or with diluted primer: 0.65 µL 50 µM primer + 6 µL(300–550 ng plasmid + H₂O) → 6.65 µL.
Results are emailed by ≈2 pm the next weekday.
¹ The improved submission form is overall much more compact than the one Epoch made, has a column for notes like volumes, and its print form (2nd page) is auto-filled from inputs from the first page, and its columns and texts are sized so they don't overflow onto more than one page.
¹ For monthly, lab-aggregated billing, the form must have lab name and lab or ordering manager's email address for billing.
¹ If paying by p-card, be sure to download/forward the collected p-card receipts to biospurch@rice.edu every few weeks if not each time, with biodesign@rice.edu cc'd as a receipt backup and record.

For rapidly calculating volumes: in Wetlab Calculator, go to Sequencing by Mass, where at the top, enter desired DNA mass (e.g. 480 ng), reaction volume (6 µL), and primer volume (0.3 µL). For each sample, you need only enter the mass concentration as measured by Nanodrop, or the molar concentration if normalized. Then add the calculated water volumes to the sequencing tubes, then the common primer volume, and lastly the calculated DNA volumes (this lets you use one tip to quickly add a primer to all the clean water volumes).
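
The same arithmetic can be sketched in a few lines of Python for anyone without access to the Wetlab Calculator. This is a rough sketch, not the calculator's actual implementation: the function names are hypothetical, and it assumes the reaction volume includes the primer volume (which matches 0.3 µL of 100 µM primer in 6 µL giving 5 µM).

```python
def sequencing_volumes(conc_ng_per_ul, target_mass_ng=480.0,
                       reaction_ul=6.0, primer_ul=0.3):
    """Return (dna_ul, water_ul) for one sample, rounded to 0.01 µL.

    Assumption: reaction_ul is the final volume, primer included.
    """
    dna_ul = target_mass_ng / conc_ng_per_ul
    water_ul = reaction_ul - primer_ul - dna_ul
    if water_ul < 0:
        # The DNA alone would overfill the tube at this concentration.
        raise ValueError("DNA too dilute to reach the target mass in this volume")
    return round(dna_ul, 2), round(water_ul, 2)


def primer_final_um(primer_ul, primer_stock_um, total_ul):
    """Final primer concentration (µM) after dilution into total_ul."""
    return primer_ul * primer_stock_um / total_ul


# Example Nanodrop readings in ng/µL (made-up values).
for conc in (95.0, 160.0, 310.0):
    dna, water = sequencing_volumes(conc)
    print(f"{conc:6.1f} ng/µL -> {dna} µL DNA + {water} µL water + 0.3 µL primer")

# Check of the two primer formulas listed above (both land near 5 µM).
print(round(primer_final_um(0.37, 100, 7.37), 2))
print(round(primer_final_um(0.65, 50, 6.65), 2))
```

Changing the defaults to 500 ng, 15 µL, and 2.5 µL gives the Genewiz-style reaction described further down.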

Eurofins (uncommon)

$3.75 per sample for tube sequencing. Pickup 3–4:30 pm in Eurofins box outside the lab, Keck 3rd floor, or at the BRC front desk at 3 pm. Samples should be PCR strips with tubes labeled XY1, XY2, etc, following the IDs generated online, with extra tubes cut off. High fee for using non-barcoded 1.5 mL tubes.
Simple formula: 2.5 µL 10 µM primer + 300–750 ng small plasmid or 750–1250 ng large plasmid + water up to 10 µL.
General formula: 10 µL sample containing 10–50 pmol primer (1–5 µL 10 µM) + plasmid (<1 kb: 150–300 ng), (1–6 kb: 300–750 ng), (6–20 kb: 750–1250 ng); or PCR product (0.1–0.3 kb: 50–100 ng), (0.3–1 kb: 100–200 ng), (>1 kb: 200–300 ng).
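
The general formula above amounts to a lookup by template type and size. A minimal sketch in Python follows; the function name is hypothetical, and how sizes exactly on a bin edge (e.g. 6 kb) are assigned is my own choice, since the listed ranges share their endpoints.

```python
def eurofins_dna_mass_ng(kind, size_kb):
    """Return the (min_ng, max_ng) DNA mass range for a 10 µL tube sample.

    kind is "plasmid" or "pcr"; size_kb is the template length in kb.
    Ranges are the ones listed in the general formula; edge sizes fall
    into the lower bin here (an assumption).
    """
    if kind == "plasmid":
        if size_kb < 1:
            return (150, 300)
        if size_kb <= 6:
            return (300, 750)
        if size_kb <= 20:
            return (750, 1250)
        raise ValueError("no range listed for plasmids above 20 kb")
    if kind == "pcr":
        if size_kb < 0.1:
            raise ValueError("no range listed for PCR products below 0.1 kb")
        if size_kb <= 0.3:
            return (50, 100)
        if size_kb <= 1:
            return (100, 200)
        return (200, 300)
    raise ValueError("kind must be 'plasmid' or 'pcr'")


print(eurofins_dna_mass_ng("plasmid", 4.5))
print(eurofins_dna_mass_ng("pcr", 0.5))
```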

$2–3 per sample for 94–63 samples, respectively, in Short-Seq plate sequencing. $3(?) for regular long-seq plates. Short-Seq plates (550 bp reads, though in practice 840 bp) are $188 for a plate of 24–94 samples, though only cost-effective for >63 samples. For the most efficient preparation, find what plasmid volume maximizes the number of samples that end up between 300–750 ng, and make a water + 30–40 pmol primer master mix for the remaining volume.
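
The per-sample plate price is just the flat plate fee divided by the number of samples. The sketch below reproduces the $2–3 figures; it assumes the ≈63-sample threshold comes from comparing against a $3.00 per-sample tube price, which is my reading rather than something stated here. Function names are hypothetical.

```python
import math

PLATE_PRICE_USD = 188.0          # flat Short-Seq plate price quoted above
PLATE_MIN, PLATE_MAX = 24, 94    # allowed samples per plate


def plate_cost_per_sample(n_samples):
    """Effective per-sample price when a plate holds n_samples."""
    if not PLATE_MIN <= n_samples <= PLATE_MAX:
        raise ValueError("a plate takes 24-94 samples")
    return PLATE_PRICE_USD / n_samples


def break_even_samples(alt_price_per_sample):
    """Smallest sample count at which the plate is no dearer than tubes."""
    return math.ceil(PLATE_PRICE_USD / alt_price_per_sample)


for n in (24, 63, 94):
    print(n, round(plate_cost_per_sample(n), 2))
print(break_even_samples(3.00))
```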

Login. Username: ; Password: ; Account ID: .

We do not have a standing PO at Eurofins. Please send your invoice to the ordering manager so that they can generate a PO for your order to get through.

Azenta (Genewiz)

Ask your webmaster to invite you to the lab's Genewiz GeneGroup with your Rice email address, from which you will be directed to make an account. The GeneGroup manager will handle the payment options, which should be based on billing info. Genewiz PO#:

$5.51 per sample
Log into your account and find "Simple Order"/"Original Order" forms under Sanger Sequencing, or the Plasmid/PCR product quick links. Fill out the form and print only sheet 1 (page 2 is useless). You can write your plasmid/water volume notes on this for sample prep.
Samples should be prepared according to their guidelines in PCR (0.2 mL) 8-tube strips, with each tube labeled on the side according to the sequencing order form. 2.5 µL 10 µM primer + 400–500 ng DNA + water up to 15 µL total.
Requested formula: 500, 800, or 1000 ng for <6 kb, 6–10 kb, and >10 kb plasmids. 10, 20, 40, or 60 ng for <0.5, 0.5–1, 1–2, and 2–4 kb PCR products. For larger PCR products, treat like plasmids. Combine with 25 pmol primer in a final 15 µL.
Simple formula: 2.5 µL 10 µM primer + ideally 500 ng DNA (as low as 300 ng) + up to 15 µL with water.
Place the tubes in a small ziplock bag (can be a reused primer bag) and either put the order form (the PDF displayed when an order is placed) inside or staple it to the bag.
Drop off in the Genewiz box in front of Keck 201 or at the BRC front desk. Pickup is 3–3:30 pm every weekday. If no orders from any Keck lab have been placed by 3 pm, they may not come.
Sequencing results are typically available on your account the next weekday afternoon, ≈2 pm.
In Wetlab Calculator, go to Genewiz Sequencing, where at the top, enter the desired DNA mass (500 ng), reaction volume (15 µL), and primer volume (2.5 µL). Then add the calculated water volumes to the sequencing tubes, then the common primer volume, then the calculated DNA volumes (this lets you use one tip to quickly add a primer to clean water volumes).
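
The requested formula above can be written as a small lookup, sketched here in Python. The function names are hypothetical, and the handling of sizes exactly on a boundary (e.g. a 6 kb plasmid) is an assumption, since the source ranges do not say which bin they belong to.

```python
def genewiz_dna_mass_ng(kind, size_kb):
    """Requested DNA mass (ng) for a 15 µL sample, per the ranges above.

    kind is "plasmid" or "pcr"; PCR products above 4 kb are treated
    like plasmids, as the requested formula says.
    """
    if kind not in ("plasmid", "pcr"):
        raise ValueError("kind must be 'plasmid' or 'pcr'")
    if kind == "pcr" and size_kb <= 4:
        if size_kb < 0.5:
            return 10
        if size_kb <= 1:
            return 20
        if size_kb <= 2:
            return 40
        return 60
    # Plasmids, and PCR products larger than 4 kb.
    if size_kb < 6:
        return 500
    if size_kb <= 10:
        return 800
    return 1000


def primer_volume_ul(pmol, stock_um=10.0):
    """Volume of primer stock holding pmol of primer (1 µM = 1 pmol/µL)."""
    return pmol / stock_um


print(genewiz_dna_mass_ng("plasmid", 8), "ng")
print(primer_volume_ul(25), "µL of 10 µM primer")
```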

Benchling Primer3Plus Sanger Sequencing Primer Design

Benchling can run Primer3Plus for automated design of primers according to many parameters. Shyam spent a few hours fiddling with the parameters to get it to produce primers like the ones he would design manually to optimally Sanger sequence the middle of long parts. These parameter calculations give primers with spacing and read overlap similar to those of meticulously, manually designed primers that split the target sequencing region into ≈equal portions covered by individual sequencing reads that overlap sufficiently not to leave any sequencing gaps. Primer3 has the added benefit of template specificity checking. It will generate pairs of divergent forward and reverse primers, but often you'll need only one of each pair to cover the entire target span with reads optimally spaced. –Shyam Bhakta

* Below, this 850 nt value represents your idea of a reliably good sequencing read (after the ≈25 nt "junk" lead between the primer 3′ end and the beginning of good sequencing trace). This spacing can be adjusted to the sequencing read length you're comfortable with, but be sure to adjust it in all the other parameters where it's used here, marked with an * .

Select the target sequencing span; right click on the sequence (not the features), and in the menu click Run Primer3.
Task: Sequencing
T_m Parameters:
- Algorithm:
  - SantaLucia 1998, default params if Sanger sequencing or using Taq polymerase for a colony PCR using these primers. Or use Q5 params if using with Q5.
  - Modified Breslauer 1986, if using Q5 or Phusion for a colony PCR
- Click Set to Primer3 Defaults: DNA: 50 nM; Na⁺/K⁺: 50 mM; Mg²⁺: 1.5 mM; dNTP: 0.6 mM
Region:
- Target: Start x to End y spanning the desired sequencing interval.
  
  These target indices need adjustment if the 5′ and 3′ ends will be within the coverage of existing sequencing primers you have (e.g. AB17/AB18 in the vector/connectors). Instead of simply omitting ~800 bp of the ends, it works better to include them in the equal portioning of Sanger reads. To do this, select the full target interval (including flanking Golden Gate sites, where applicable), and click Use Selection. Calculate #Results R and Spacing S as below. Add this S to the START index and subtract S from the END index:
  x = left boundary index + S
  y = right boundary index – S
  Before you generate primers, you can reduce #Results by 2, or leave it alone and see an optional primer pair between the last two sequencing spans.

Primer:
These parameters can likely be adjusted to your liking without issue.
- GC%: min 30% – opt 50% – max 70%
- T_m: min 53° – opt 56° – max 63°
- Size: min 17 nt – opt 20 nt – max 25 nt
- 3′ GC Clamp: 1

Result Generation:
- # Results: R = ⌈L ÷ 850*⌉.
  #Results R = (target length L) ÷ (850* nt reliable read length), rounded up, never down.
  Target length L = y – x target sequence indices, or just look at the length of the selection.

Sequencing:
- Spacing: S = L ÷ R
  Spacing S = target seq length L ÷ #Results R. Normally between 575–900 nt. This evenly distributes the number of ideal primer sites across the target length.
- Interval: 40 nt.
  If you need both primers in a primer pair that the results give you and they are too close to use (3′ ends <50 bp apart, reads may not overlap), then after saving other selected primer pairs, rerun Primer3 with Interval set to 50 or 60 nt to spread apart that primer pair.
- Lead: 0 nt.
- Accuracy: 20 nt
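
The #Results, Spacing, and Target calculations above can be sketched as a short Python helper. The function and argument names are hypothetical; `ends_covered=True` corresponds to the case where existing flanking primers (e.g. in the vector) already read into both ends, so the target is pulled in by one spacing S on each side.

```python
import math


def primer3_sequencing_params(left, right, read_len=850, ends_covered=False):
    """Compute #Results R, Spacing S, and Target start/end for Primer3.

    left/right are the boundary indices of the full interval to sequence;
    read_len is the reliable read length marked * above.
    """
    length = right - left                    # target length L
    results = math.ceil(length / read_len)   # R, always rounded up
    spacing = round(length / results)        # S = L / R
    if ends_covered:
        # Existing primers cover the first and last span: x = left + S, y = right - S.
        start, end = left + spacing, right - spacing
        if end <= start:
            raise ValueError("existing flanking primers already cover the target")
    else:
        start, end = left, right
    # R is returned unreduced; optionally lower #Results by 2 when ends are covered.
    return {"results": results, "spacing": spacing,
            "target_start": start, "target_end": end}


# Example: a 3000 bp interval flanked by existing vector primers.
print(primer3_sequencing_params(1, 3001, ends_covered=True))
```

For this example interval the helper gives R = 4 and S = 750 nt, which sits inside the usual 575–900 nt spacing range.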

Whole-DNA Nanopore Sequencing

Often called Whole-Plasmid Sequencing (WPS) — sequencing of whole plasmids or linear DNAs generally <25 kb, or up to entire bacterial chromosomes. Requires no primers. Practically insensitive to extreme GC/AT content, structures, or repeats, except sometimes for very long homopolymeric repeats (single base stretch), which may be shortened by one base.

Quintara:
$10 per sample WPS standard. Results early next morning. Dropbox at Keck 207, 306, and BRC front desk, pickup daily ≈3:30 pm.
- Pure plasmid or ≥1 kb PCR products: 5–10 µL 100–200 ng/µL (≥30 ng/µL works fine) = (150)–500–2000 ng in 5–10 µL final. Submitted in numbered tube strips for ≤36 samples.

$5 per sample PlasmidExpress. ColE1 and p15A origin plasmids only(?), selected on form.

- Purified Plasmid/RCA Product: 50–100 ng/µL, ≥10 µL.
- Bacterial Culture/Glycerol Stock: 10 µL
- Bacterial Colony: Pick a colony and suspend it in 30–100 µL water and submit 10 µL; save rest.
- Provides pLannotate .html map, per-base error classification, multimer and contaminant classification (% genomic contam, %monomer, %dimer), read coverage map, read length histogram, raw .fastq, .tsv. Chromatogram .ab1 files are currently faulty.
- Hongli.Zhang@quintarabio.com 415·876·9953 (text for weekend pickup). Operates locally in Sugar Land. Login info: see lab orientation page.
GDEC: $8 per sample, <30 samples. $6.66–$3.33 for 30–60 samples, etc. Results in 2–3 days. Dropoff at BRC 215. We're sending sequencing through one lab member's FOM/GDEC account, currently (name, initials XY).
For ≤9 kb samples, submit 300 ng DNA in 10 µL, in tube strips (if ≤24 samples) labeled XY1, XY2… . Write on a piece of paper YYMMDD - (name) and place it in a bag with the samples.
Plasmidsaurus: $15 per sample that is 0.5–25 kb. Results within 24 hr of receipt. Dropbox by Keck 306 or BRC front desk, pickup daily by ≈3:30 pm.
- For <25 kb samples, submit 10 µL 30 ng/µL DNA, with the proper order letters and tube strip numbers.
- $30 for 25–125 kb. $60 for 125–300 kb. $90 for bacterial genomes up to 12 Mb in 3–5 days.
- Provides consensus pLannotated .gbk and .fasta, .ab1 virtual chromatogram, virtual gel, pLannotate .html map, per-base error classification, multimer and contaminant classification (% genomic contam, %monomer, %dimer), read coverage map, read length histogram, raw .fastq, .tsv.
Eurofins: $15 per sample that is 0.5–25 kb. Results within 24 hr of receipt. Dropbox by Keck 306, pickup 3–4 pm (must call/text number on box), or daily 3 pm at BRC front desk.
For <25 kb samples, submit 10 µL 30 ng/µL DNA, with the proper tube strip labeling.
$30 for 25–125 kb; $60 for 125–300 kb. TBA for bacterial genomes.

Bacterial colony and culture sequencing

Some services can accept resuspended bacterial colonies, cultures, and glycerol stocks, whose plasmids are amplified and nanopore-sequenced.

Kshitij Rai's experience: I've sequenced colony suspensions, overnight cultures, and miniprepped plasmids through Quintara's PlasmidExpress, and the consensus built from the results frequently has artifactual point mutations if you submit colonies or cultures. Mutation frequency follows the order colonies > overnight cultures > minipreps, with minipreps having almost no mutations; this is consistent with the plasmid abundance in these samples, since less starting plasmid allows for proportionally more PCR mutations. I figured I'd use PlasmidExpress to sequence the colonies, and then miniprep only the correct ones based on the results, but it turns out it's still more worthwhile to do the minipreps and sequence those.

Identifying errors in nanopore sequencing data with Epi2Me

Contribution from Kshitij Rai (Caleb Bashor Lab, Rice University, Houston, TX)

Deletion or skipping of regions with multiple repeats is the most prominent error mode in nanopore sequencing. It doesn't have to do with the secondary structure around terminators, as the electric potential around the molecular motors on the ASIC chip makes it so that pretty much all secondary structure is linearized.

The issue comes from the basecalling models, which work from the electrical current measurements around the pores, being unable to resolve long stretches of repeats (such as the high A/T content around terminators); this can produce artefacts of this kind.

The best way to currently resolve that is by sequencing much deeper (which is unfortunately out of the user's control if you submit to a third party like Plasmidsaurus), but you can aggregate reads across multiple constructs to get an idea of whether the deletion is real or not. If, across the colonies you sequenced, the deletion is in the exact same place and of the exact same length over and over, it's most likely real (and the sequence isn't there). If the deletions are slightly more random (i.e. in slightly different regions, or of variable lengths in different colonies that were sequenced), it's most likely a sequencing/basecalling error.

You can get a better idea of whether it's the latter by doing a reference sequence alignment: download the Epi2Me desktop agent (Nanopore's UI for data analysis), upload your reference file there, and align the .fastq files against it to see the read depth distribution across the whole plasmid. A drop of ~50% in the region where the consensus reports a deletion means it's most likely a basecalling error. A depth lower than 20–30% indicates that the sequence is actually missing in the plasmid.
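
If you export per-position read depth from the alignment, the rule of thumb above can be applied with a few lines of Python. This is only a sketch under stated assumptions: the depth figures are taken relative to the median depth of the rest of the plasmid, and the cutoffs of 0.25 and 0.75 are my own placeholders for "below 20–30%" and "a drop of ~50%". The function name and the depth list format are hypothetical, not an Epi2Me output format.

```python
from statistics import median


def classify_deletion(depths, start, end, real_below=0.25, error_below=0.75):
    """Classify a reported deletion from per-position read depth.

    depths: list of read depths, one per reference position.
    start, end: half-open interval [start, end) of the reported deletion.
    Returns (ratio, label), where ratio is the region's median depth
    divided by the median depth everywhere else.
    """
    region = depths[start:end]
    flank = depths[:start] + depths[end:]
    if not region or not flank:
        raise ValueError("need positions both inside and outside the region")
    flank_depth = median(flank)
    if flank_depth == 0:
        raise ValueError("no coverage outside the region")
    ratio = median(region) / flank_depth
    if ratio < real_below:
        label = "likely real deletion"
    elif ratio < error_below:
        label = "likely basecalling error"
    else:
        label = "no notable depth drop"
    return ratio, label


# Made-up coverage: 100x everywhere, dipping to 50x over a 10 nt stretch.
toy_depths = [100] * 50 + [50] * 10 + [100] * 50
print(classify_deletion(toy_depths, 50, 60))
```

Comparing the same region across several colonies, as described above, remains the stronger check; the depth ratio is only supporting evidence.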

Next-Generation Sequencing (NGS)

A paper that serves as a primer on outsourced NGS for the beginner: https://pubs.acs.org/doi/full/10.1021/acssynbio.1c00592
The focus on a protein variant library here can be generalized to any library.

Feb 2022 advice from Kshitij Rai (Caleb Bashor Lab, Rice University, Houston, TX)

You seem to have encountered the "intermediate depth" problem that I stumbled upon back at the end of 2020. The issue is that companies purchase HiSeq and NextSeq kits (that give you upwards of 100 million reads) to be able to offer a discounted price on those, since they run them at scale. Genewiz has a really nice setup on the MiniSeq that lets them offer 50,000 reads for $50, with amazingly no price difference for 150 bp – 500 bp amplicons (even though those require the cheapest (75 paired end) and the most expensive (250 paired end) read kits for the MiniSeq). However, no commercial services seemingly offer read depths between 1–20 million, which would need to run on a MiSeq. MiSeqs, however, are cheap enough machines (cost ~$99k) for labs to be able to purchase them for their own needs. That said, I looked around Houston to find out what the turnaround times and read depths offered from all the major suppliers were, and here are your options:

Commercial - As already mentioned, most commercial companies don't have a service at this scale. There are a few options however that I could find from a two-week sprint of calling companies and explaining what I wanted to do and why "just doing 800 million reads" was not something I wanted to commit to. Below are tables with the prices and turnaround times for all the companies around Houston that I could find. Notably, Genewiz offers a 20% discount on all NGS services for first time users, so I'd make use of that and submit from a lab member's account who has never sent any NGS samples to Genewiz before, if you choose to go this route.
In house - Having found no "cheap and fast" options above, I ultimately made the switch and decided to sequence in house. I buy the sequencing kits and flow cells from Illumina directly, and then sequence with a lab that has a MiSeq machine (which most PIs are okay with since it's your own sample and kits, just their machine running it). Lots of things to note if you do go this route, particularly that the QC will have to all be done in your hands as well, including making sure you have a single band, the band has the complete Illumina adapter on it, the concentrations are spot on perfect, and you have a decent amount of ΦX174 DNA spiked in to create diversity in your run (I'd be happy to talk more about this, and help design the experiment and libraries if you do choose to do this). Gang Bao's lab has a MiSeq machine, David Zhang's lab had one (though I don't know where it is now since he left), and a number of labs at the med center also have them. Again, most of them would hopefully be happy to let you hop on to their machine when it's free. The options at this scale depend on the kit you buy, but here are some prices for a 300 cycle (150 paired end reads) kit:
- Nano kit v2 - 300 cycles, 2 million reads, for $390
- Micro kit v2 - 300 cycles, 5-6 million reads, $540
- Regular kit v2 - 300 cycles, 15 million reads, $1100
- Regular kit v3 - 500 cycles, 25 million reads, $1200
Shared runs - If you are apprehensive about diving in and buying a full kit and flow cell in house, you could jump in on someone else's run if they have space. I usually sequence on the MiSeq every 2-3 weeks, and usually have space leftover on my flow cell (cases where I only REALLY need like 7 million reads, but am running a 15 million read kit v2 and taking all the reads because why not). I do have a kit v2 in lab right now, and may be able to incorporate your samples in the next run, so let me know if you are interested in doing this and we can talk logistics.

Hack - My personal favorite hack is to leverage high scale read setups that commercial companies and cores have for different purposes (usually genome sequencing, or single cell RNA seq/bulk RNA seq). They don't understand targeted sequencing of a defined region with variability in the middle, as is the case with most syn bio projects. However, since companies run these services at scale, they give you really good prices (Genewiz charges ~$800 for 700 million reads). All you need to do is to get on the phone with these companies and tell them that you will be doing the library prep and that they just need to run it on their sequencer and give you the FastQ file outputs for your "genome" sample, and then prep and give them your sample. These services are very, very cheap, and open up a lot more options. Baylor's sequencing core is one I would really recommend for this purpose. I believe they offer 200 million reads for $350 or something ridiculous like that, and will give you results in ~1 week.

Low–Medium Depth (2–10 million Reads)
Sr. No.	Provider/Company	Read Depth	Cost	Comments
1	Genewiz	15 million	$1344	Quote requested
2	Wyzer Biosciences	4 million	$1120	Single end reads
3	Beijing Genomics	NA	NA	Do not offer a service at this depth
4	MD Anderson Sequencing Core	15 million	$1708
5	DNA Link Sequencing Lab	50 million	$249	Paired end reads, 100bp max
6	LGC Biosciences	50 million	$1826
7	ABM	NA	NA	Did not respond to calls or emails

High Depth (200–400 million Reads)
Sr. No.	Provider/Company	Read Depth	Cost	Comments
1	Genewiz	350 million	$1440	Additional 30% off on first NGS Run
2	Beijing Genomics	300 million	$700	Best price
3	LGC Biosciences	200 million	$5843	Quote requested
4	MD Anderson Sequencing Core	> 100 million	$3238	Costs only $1632 for MD Anderson faculty
5	DNA Link			Quote requested
6	ABM	NA	NA	Did not respond to calls or emails
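
To compare options at different depths, it can help to normalize the quoted prices to cost per million reads. The sketch below uses a handful of the figures quoted above (Feb 2022 numbers); the function names are hypothetical, the Micro kit is entered at the lower end of its 5–6 million range, and kit prices exclude machine access, PhiX, and library prep.

```python
def cost_per_million(cost_usd, reads_millions):
    """Price per million reads, rounded to cents."""
    return round(cost_usd / reads_millions, 2)


# (option, quoted cost in USD, quoted read depth in millions)
OPTIONS = [
    ("MiSeq Nano kit v2", 390, 2),
    ("MiSeq Micro kit v2", 540, 5),      # quoted as 5-6 million; lower bound used
    ("MiSeq Regular kit v2", 1100, 15),
    ("MiSeq Regular kit v3", 1200, 25),
    ("Genewiz, low-medium depth", 1344, 15),
    ("Genewiz, high depth", 1440, 350),
    ("Beijing Genomics, high depth", 700, 300),
]


def rank_by_cost(options):
    """Return (option, $ per million reads) pairs, cheapest first."""
    priced = [(name, cost_per_million(cost, reads)) for name, cost, reads in options]
    return sorted(priced, key=lambda pair: pair[1])


for name, price in rank_by_cost(OPTIONS):
    print(f"{name:30s} ${price:>7.2f} per million reads")
```

Cost per read strongly favors the high-depth services, which is the point of the "hack" above; the in-house kits only win when you need few reads quickly.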
