#RStats #GenomicsIO #Duckdb

The DuckDB C API and the related extension mechanism are underrated

Starting from this template, I just copied the two headers, converted the Python script to bash, and removed the CMake stuff!
https://github.com/duckdb/extension-template-c/tree/main

A vibe-coded extension for one application: https://github.com/RGenomicsETL/RBCFTools/tree/main/inst/duckdb_bcf_reader_extension

#RStats #GenomicsIO #RGenomicsETL
Just incredible how loosely the average VCF file out there follows the spec.

So to convert them into #nanoarrow streams/IPC, you either enforce some corrections yourself or fall back to all strings.
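
A rough sketch of that trade-off for a single column, in plain Python rather than nanoarrow: try the type declared in the header, and if any value contradicts it, keep the whole column as strings. The function name `coerce_column` and the `PARSERS` table are made up for illustration, and only the `Integer` and `Float` header types are handled here.

```python
# Parsers for the VCF header types "Integer" and "Float".
# Any other declared type is kept as strings.
PARSERS = {"Integer": int, "Float": float}


def coerce_column(raw, declared_type):
    """Return (type_used, values); the VCF missing marker "." becomes None."""
    as_strings = [None if v == "." else v for v in raw]
    parser = PARSERS.get(declared_type)
    if parser is None:
        return "String", as_strings
    try:
        typed = [None if v == "." else parser(v) for v in raw]
    except ValueError:
        # The file contradicts its own header: fall back to strings
        # for the whole column instead of guessing a correction.
        return "String", as_strings
    return declared_type, typed


if __name__ == "__main__":
    print(coerce_column(["1", "2", "."], "Integer"))
    print(coerce_column(["1", "2.5"], "Integer"))
```

The fallback is per column, so one bad value costs the typed representation for that field only, not for the whole file.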

#Rstats #GenomicsIO fraq: A high-throughput extensible toolkit for processing fastq data github.com/traversc/fraq

#htslib #Bioinformatics #GenomicsIO
@yokofakun any idea what the fastest method is to get the nth BCF record using htslib or bcftools.h without explicit loops? (I'm trying out something similar to your project https://github.com/lindenb/rbcf but with ALTREP.)
I guess it is possible to work out which block to (lazily) parse if one knows the block offsets and the number of records per block.
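
The bookkeeping part of that idea can be sketched in a few lines of Python: cumulative record counts plus a binary search. This assumes the per-block record counts have already been collected somehow (for example in one initial pass over the file). It is not an htslib call, and `locate_record` is a hypothetical helper name.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_record(n, records_per_block):
    """Map a 0-based record index n to (block_index, index_within_block)."""
    # ends[i] is the total number of records in blocks 0..i.
    ends = list(accumulate(records_per_block))
    if n < 0 or not ends or n >= ends[-1]:
        raise IndexError("record index out of range")
    # First block whose cumulative count is strictly greater than n.
    block = bisect_right(ends, n)
    start = ends[block] - records_per_block[block]
    return block, n - start


if __name__ == "__main__":
    counts = [3, 2, 4]  # made-up counts for three blocks
    print(locate_record(3, counts))
```

With the block index in hand, one would seek to that block's stored offset and skip `index_within_block` records, so only one block needs decompressing per lookup.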
GitHub - GabrielHoffman/GenomicDataStream: Read genomic data files (VCF, BCF, BGEN, PGEN, BED, H5AD, DelayedArray) into R/Rcpp in chunks