@nanopore I now have a comparison of the same sample on Flongle (cleanup using SFB) and PromethION (cleanup using LFB) flow cells.

Both were called with the new bacterial caller in simplex mode. According to LAST-train, the qScore-adjusted substitution accuracy was q41 for the Flongle and q45 for the PromethION.

#P2Solo #DNAladder

@nanopore hmm... looks like I messed up on the 3kb and 1.5kb ladder bands.

This new kmer-based mapper seems to be doing the right things, though. I'm very pleased.

#DNAladder #P2Solo

@[email protected] That new bacterial @nanopore basecalling model is doing pretty well.

#DNAladder #P2Solo

@nanopore #DNALadder Yep, definitely some work to do on fixing up those reference definitions. Most peaks under 1kb don't look too bad (with 200bp and 500bp being obvious outliers), but I think I'm going to need to redo the longer bands, especially 15kb.

@nanopore #DNALadder I now have an answer for each band. I'm pretty sure that a few of them are the wrong answer, but getting something down is helping me think of how to work out if my initial guesses are correct.

I expect there's going to be a lot of negative filtering and inspecting individual reads before I'm happy with the result.

Stats from LAST-train:

# delOpen: 0.0042914 (q23)
# insOpen: 0.00339294 (q24)
# delExtend: 0.39626
# insExtend: 0.338931

# substitution % id: 99.87 (q28)
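
For anyone wondering where the q values in brackets come from: they are consistent with a Phred-style conversion, q = -10 * log10(p), rounded down, where p is the error probability (for the substitution line, p = 1 - identity). A minimal sketch of that arithmetic is below. The helper name phred_q is made up for illustration and is not part of LAST, and rounding down is an assumption that happens to match the three numbers above.

```shell
# Convert an error probability to a Phred-style q value, rounded down.
# phred_q is a hypothetical helper for illustration, not a LAST tool.
phred_q() {
  awk -v p="$1" 'BEGIN { printf "q%d\n", int(-10 * log(p) / log(10)) }'
}

phred_q 0.0042914     # delOpen  -> q23
phred_q 0.00339294    # insOpen  -> q24

# 99.87% substitution identity means a 0.13% substitution error rate
sub_err=$(awk 'BEGIN { print (100 - 99.87) / 100 }')
phred_q "$sub_err"    # substitution -> q28
```

The extension probabilities (delExtend, insExtend) are not error rates in the same sense, which is presumably why they have no q value attached.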

@nanopore #DNALadder this is how I find the shorter DNA bands (under 1kb)

@nanopore #DNALadder okay, so I *have* found a situation where barcode trimming is actually useful:

$ lastal -P 14 -p trained_q1.mat ~/scripts/dental_db/barcode_full_RBK114.24.fa <(pv reads_400bps_BC02.fq.gz | zcat) | ~/scripts/maf2csv.pl | awk -F ',' '$3 == "+" {print $1":"$5"-"$7}' > adapter_trim_locs.txt

$ samtools fqidx -n 100000 reads_400bps_BC02.fq.gz -r adapter_trim_locs.txt | ~/scripts/fastx-fetch.pl -u -min 20 | bgzip > bcTrimmed_400bps_BC02.fq.gz