
Title:
Assertion error in few processesed files · Issue #369 · marcelm/cutadapt
DiscussionForumPosting:
context:https://schema.org
headline:Assertion error in few processesed files
articleBody:I am running cutadapt as part of a DADA2 pipeline to analyze paired-end MiSeq 16S sequencing data. In addition to the primer sequences, my reads contain a 0-7 base heterogeneity spacer, as described in Fadrosh et al. 2014, so I had to remove both by matching the primer sequences instead of cutting a fixed number of bases from each end.
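To illustrate why a fixed-length cut cannot handle a variable spacer, here is a minimal Python sketch that locates the forward primer wherever it starts within the first few bases. It is only an illustration: it uses exact matching on IUPAC codes, whereas cutadapt performs error-tolerant alignment. The helper names, the `max_offset` parameter, and the example reads are hypothetical and not taken from the issue.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_regex(primer):
    """Compile an exact-match regex for a primer containing IUPAC codes."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer))


def trim_5prime_primer(read, primer, max_offset=7):
    """Drop everything up to and including the primer.

    The read is returned unchanged if the primer is absent or starts
    later than max_offset bases into the read.
    """
    match = primer_regex(primer).search(read)
    if match is None or match.start() > max_offset:
        return read
    return read[match.end():]


FWD = "CCTACGGGNGGCWGCAG"

# Synthetic reads (not from the issue) with spacers of 0, 2 and 4 bases.
amplicon_start = "TGGGGAATATTG"
for spacer in ("", "TA", "ACGT"):
    read = spacer + "CCTACGGGAGGCAGCAG" + amplicon_start
    print(len(spacer), trim_5prime_primer(read, FWD))
```

Each synthetic read comes out starting at the same amplicon position regardless of spacer length, which a fixed cut of, say, 17 bases would not achieve.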
I am relatively new to running things from the command line, but after persistent trial and error I managed to install cutadapt on WSL Ubuntu 18.04.
Since DADA2 is an R package, I am running it on R 3.5.3, the latest version. Python is updated to 3.6.7, and cutadapt is version 2.1.
This is the code I was running in R:
FWD <- "CCTACGGGNGGCWGCAG"
REV <- "GACTACHVGGGTATCTAATCC"

# Remove reads containing ambiguous bases (N)
fnFs.filtN <- file.path(path, "filtN", basename(fnFs)) # put N-filtered files in the filtN/ subdirectory
fnRs.filtN <- file.path(path, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = FALSE)

cutadapt <- "/home/lauri/.local/bin/cutadapt"

path.cut <- file.path(path, "cutadapt")
if(!dir.exists(path.cut)) dir.create(path.cut)
fnFs.cut <- file.path(path.cut, basename(fnFs))
fnRs.cut <- file.path(path.cut, basename(fnRs))

# Reverse complements of the primers
FWD.RC <- dada2:::rc(FWD)
REV.RC <- dada2:::rc(REV)

# R1: FWD at the 5' end, reverse complement of REV at the 3' end; R2: the mirror image
R1.flags <- paste("-g", FWD, "-a", REV.RC)
R2.flags <- paste("-G", REV, "-A", FWD.RC)
for(i in seq_along(fnFs)) {
  system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2,
                             "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files
                             fnFs.filtN[i], fnRs.filtN[i])) # input files
}
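As a cross-check on the flags built above, the following Python sketch reproduces the reverse complements with IUPAC ambiguity codes and assembles the same four adapter options. The function name `reverse_complement` is an illustrative stand-in for `dada2:::rc`, not its actual implementation.

```python
# Complement table covering the IUPAC ambiguity codes
# (e.g. W stays W, B pairs with V, D pairs with H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a sequence with IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"

FWD_RC = reverse_complement(FWD)  # CTGCWGCCNCCCGTAGG
REV_RC = reverse_complement(REV)  # GGATTAGATACCCBDGTAGTC

# Same options as R1.flags and R2.flags in the R code.
r1_flags = ["-g", FWD, "-a", REV_RC]
r2_flags = ["-G", REV, "-A", FWD_RC]
print(" ".join(r1_flags + r2_flags))
```

The two computed strings agree with the `-A` and `-a` values in the command line that cutadapt logs further down in this post.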
It started smoothly, and I got the expected output:
> === Summary ===
>
> Total read pairs processed: 71,065
> Read 1 with adapter: 70,393 (99.1%)
> Read 2 with adapter: 70,102 (98.6%)
> Pairs written (passing filters): 71,065 (100.0%)
>
> Total basepairs processed: 42,922,924 bp
> Read 1: 23,166,984 bp
> Read 2: 19,755,940 bp
> Total written (filtered): 38,799,372 bp (90.4%)
> Read 1: 21,163,432 bp
> Read 2: 17,635,940 bp
>
> === First read: Adapter 1 ===
>
> Sequence: GGATTAGATACCCBDGTAGTC; Type: regular 3'; Length: 21; Trimmed: 3938 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
>
> Bases preceding removed adapters:
> A: 16.5%
> C: 3.2%
> G: 44.7%
> T: 35.6%
> none/other: 0.0%
>
> .... (I skipped the lengthy listing of removed sequences, some of which were suspiciously long)
>
> === First read: Adapter 2 ===
>
> Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 70265 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-16 bp: 1
>
> Overview of removed sequences
> length  count  expect  max.err  error counts
> 3       25     1110.4  0        25
> 11      3      0.0     1        1 2
> 12      1      0.0     1        0 1
> 14      5      0.0     1        4 1
> 15      14     0.0     1        8 6
> 16      282    0.0     1        87 195
> 17      18379  0.0     1        18261 118
> 18      324    0.0     1        62 262
> 19      18035  0.0     1        17885 150
> 20      305    0.0     1        61 244
> 21      15572  0.0     1        15458 114
> 22      575    0.0     1        292 283
> 23      16654  0.0     1        16539 115
> 24      86     0.0     1        29 57
> 25      2      0.0     1        2
> 27      1      0.0     1        1
> 28      1      0.0     1        1
> 67      1      0.0     1        1
.... (I skipped the output for the second read; it was very similar to this one)
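Two columns of the report above can be reproduced with simple arithmetic. The sketch below assumes cutadapt's default maximum error rate of 0.1 (no `-e` option was given), that the number of allowed errors is the match length times that rate, rounded down, that `N` positions do not count toward the adapter's effective length, and that the `expect` column is the read count times 0.25 per matched base. These assumptions are consistent with the figures in the log but are not taken from the cutadapt source; the function names are illustrative.

```python
def allowed_errors(match_length, max_error_rate=0.1):
    """Errors tolerated for a match of the given length (rounded down)."""
    return int(match_length * max_error_rate)


def effective_length(adapter):
    """Adapter length without N wildcards (assumed to be how cutadapt counts)."""
    return len(adapter) - adapter.count("N")


def expected_random_matches(n_reads, match_length):
    """Matches expected by chance, assuming equal base frequencies."""
    return n_reads * 0.25 ** match_length


# Adapter 1 (21 bases): thresholds at 10 and 20 bases, as in the log.
print([allowed_errors(n) for n in (9, 10, 19, 20, 21)])

# Adapter 2 has one N, so its ranges end at 16 rather than 17.
print(effective_length("CCTACGGGNGGCWGCAG"))

# 71,065 read pairs and a 3-base match give the 1110.4 in the expect column.
print(round(expected_random_matches(71065, 3), 1))
```

Under the same reading, the peaks at removed lengths 17, 19, 21 and 23 are what a 17-base primer preceded by spacers of 0, 2, 4 and 6 bases would produce, though that is an inference from the table rather than something the log states.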
Things went well until I got this error message:
> This is cutadapt 2.1 with Python 3.6.7
> Command line parameters: -g CCTACGGGNGGCWGCAG -a GGATTAGATACCCBDGTAGTC -G GACTACHVGGGTATCTAATCC -A CTGCWGCCNCCCGTAGG -n 2 -o /mnt/d/DADA2analyysi/cutadapt/A059-7M-3-CAGTTCCA-TGTCTAAG-Paulamaki-run20190304R_S59_L001_R1_001.fastq.gz -p /mnt/d/DADA2analyysi/cutadapt/A059-7M-3-CAGTTCCA-TGTCTAAG-Paulamaki-run20190304R_S59_L001_R2_001.fastq.gz /mnt/d/DADA2analyysi/filtN/A059-7M-3-CAGTTCCA-TGTCTAAG-Paulamaki-run20190304R_S59_L001_R1_001.fastq.gz /mnt/d/DADA2analyysi/filtN/A059-7M-3-CAGTTCCA-TGTCTAAG-Paulamaki-run20190304R_S59_L001_R2_001.fastq.gz
> Processing reads on 1 core in paired-end mode ...
> [ 8<---------] 00:00:06 30,000 reads @ 209.6 µs/read; 0.29 M reads/minute
> Traceback (most recent call last):
>   File "/home/lauri/.local/bin/cutadapt", line 11, in <module>
>     sys.exit(main())
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 803, in main
>     stats = runner.run()
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 727, in run
>     (n, total1_bp, total2_bp) = self._pipeline.process_reads(progress=self._progress)
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 324, in process_reads
>     read1, read2 = modifier(read1, read2, matches1, matches2)
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/modifiers.py", line 33, in __call__
>     return self._modifier1(read1, matches1), self._modifier2(read2, matches2)
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/modifiers.py", line 179, in __call__
>     match = AdapterCutter.best_match(self.adapters, trimmed_read)
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/modifiers.py", line 107, in best_match
>     match = adapter.match_to(read)
>   File "/home/lauri/.local/lib/python3.6/site-packages/cutadapt/adapters.py", line 841, in match_to
>     alignment = self.aligner.locate(read_seq)
>   File "src/cutadapt/_align.pyx", line 515, in cutadapt._align.Aligner.locate
> AssertionError
The loop kept running after this and went on to process some more files successfully. Only 5 samples out of 30 were affected, and they appear to be missing most of their reads; for example, one sample lost over 70% of its reads. This ended up being a pretty lengthy post, but I wasn't sure what to include.
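The loop carried on because the exit status of each cutadapt call was never inspected: an uncaught Python exception ends the process with a nonzero status, and a run that stops partway would plausibly leave truncated output files, which fits the missing reads. The Python sketch below shows one way to flag failed samples and count the reads that survive. The helper names and the stand-in commands are hypothetical; real use would pass the actual cutadapt argument list for each sample.

```python
import gzip
import subprocess
import sys


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def failed_samples(commands):
    """Run each command and return the names of those that exited nonzero."""
    failed = []
    for name, argv in commands.items():
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failed.append(name)
    return failed


# Stand-in commands so the sketch runs without cutadapt installed:
# one exits cleanly, the other exits with status 1 like a crashed run.
demo_commands = {
    "sample_ok": [sys.executable, "-c", "pass"],
    "sample_crash": [sys.executable, "-c", "raise SystemExit(1)"],
}
print(failed_samples(demo_commands))
```

Comparing `count_fastq_reads` on each input file in `filtN/` with its counterpart in `cutadapt/` would show which samples lost reads and how many.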
author:
url:https://github.com/LauriPa
type:Person
name:LauriPa
datePublished:2019-03-19T12:36:05.000Z
interactionStatistic:
type:InteractionCounter
interactionType:https://schema.org/CommentAction
userInteractionCount:4
url:https://github.com/marcelm/cutadapt/issues/369