ICYMI, @ihaque just finished sharing a tootorial on our recent work @ Recursion investigating a widespread source of systematic #confounding in #CRISPR-Cas9 screens we dub "proximity bias."
https://www.biorxiv.org/content/10.1101/2023.04.15.537038v1
We believe this confounding affects most CRISPR-based discovery and may have implications for therapy.
Some sections haven't yet made their way off of the bird site, so take a look here: https://twitter.com/ImranSHaque/status/1650911267530629120
Any recommendations for papers extending this type of framework to make sure that embeddings are well aligned with downstream tasks?
It's notably different from something like https://unified-io.allenai.org/, which enforces a single sequence-based representation (basically T5 for vision-language).
For just learning visual representations of text, MS-CLIP explores the impact of parameter sharing https://github.com/Hxyou/MSCLIP
Would love more reading recs!
Went back to BLIP (https://arxiv.org/abs/2201.12086) last night. When I first skimmed it, I focused on the caption bootstrapping, but the "Multimodal mixture of Encoder-Decoder" architecture is pretty cool.
It uses a structured architecture with multiple encoders/decoders, where some components feed others (e.g. similarity scores from the contrastive loss are used to mine hard examples for the image-text matching loss).
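A minimal sketch of that hard-example-mining idea (my own toy version in numpy, not BLIP's actual code): given the contrastive head's image-text similarity matrix, the most confusable non-matching pairs become the negatives for the matching loss.

```python
import numpy as np

def mine_hard_negatives(sim):
    """Pick hard negatives from a contrastive similarity matrix.

    sim[i, j] = similarity of image i and text j; the diagonal holds the
    true (positive) pairs. For each image, return the index of the
    non-matching text with the highest similarity -- these "hard" negatives
    are then fed to the image-text matching loss.
    """
    masked = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(masked, -np.inf)  # exclude the true pair
    return masked.argmax(axis=1)       # hardest negative text per image

# toy 3x3 similarity matrix: image 0 is most confused with text 2, etc.
sim = np.array([[0.9, 0.1, 0.7],
                [0.2, 0.8, 0.6],
                [0.3, 0.5, 0.95]])
hard = mine_hard_negatives(sim)  # array([2, 2, 1])
```

The nice part is that the mining is free: the similarity matrix is already computed for the contrastive loss, so the matching head just reuses it.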
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at https://github.com/salesforce/BLIP.
I suspect a lot of really good ideas get left undeveloped, because they'd take six to twelve months to start showing real potential, and that's way more runway than most industry projects are allotted.
Perhaps I'm too influenced by that ("Impact of research declines since the 1950s") paper which went around, but this morning I was consumed by the thought of how so much effort and money goes into developing "OK" ideas at the expense of really good ones. Anybody else have thoughts on this?
The methods are fun too - some very clever use of CRISPR to truncate single copies of particular chromosome arms.
Enjoying this paper from the Sheltzer lab - it reads a bit like a group of biologists playing detective. They start with the hypothesis that #aneuploidy is necessary for the malignancy of some cancer lines, and then trace this all the way to showing some very tight links to MDM4. They even propose a therapeutic strategy exploiting the differential sensitivity between disomic and trisomic cells.
Is anyone aware of papers training modern capsule networks on larger datasets (e.g. ImageNet or other datasets with >500 classes)? It seems like the ideas from https://www.nature.com/articles/s41598-021-93977-0 and https://papers.nips.cc/paper/2019/hash/e46bc064f8e92ac2c404b9871b2a4ef2-Abstract.html could be used to scale up but I haven't seen anything more than CIFAR-100 / Tiny-ImageNet.
The basic idea:
- self-attention as routing
- use a moderately sized backbone (e.g. ResNet / ViT) followed by a few convolutional capsule layers and some FC capsule layers + squashing function
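The two ingredients above could be sketched roughly like this (my loose numpy reading of the squashing non-linearity and a non-iterative, attention-style routing step; function names and the exact routing formula are my assumptions, not either paper's algorithm):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squashing non-linearity: shrinks the vector norm into
    [0, 1) while preserving direction."""
    norm = np.linalg.norm(s, axis=axis, keepdims=True)
    return (norm**2 / (1.0 + norm**2)) * (s / (norm + eps))

def attention_routing(votes):
    """Self-attention-style routing: instead of iterative dynamic routing,
    coupling weights come from a single softmax over the agreement (scaled
    dot products) between lower capsules' prediction vectors.

    votes: (num_in, num_out, dim) predictions from lower to upper capsules.
    Returns: (num_out, dim) upper-capsule activations.
    """
    d = votes.shape[-1]
    # agreement of each vote with every other vote, per output capsule
    logits = np.einsum('iod,jod->oij', votes, votes) / np.sqrt(d)
    c = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    weights = c.sum(axis=1)                   # total attention per input capsule
    s = np.einsum('oi,iod->od', weights, votes)
    return squash(s)

out = squash(np.array([3.0, 4.0]))            # norm 5 -> norm 25/26, same direction
rng = np.random.default_rng(0)
upper = attention_routing(rng.normal(size=(4, 2, 8)))  # shape (2, 8)
```

The appeal for scaling is that this routing is a fixed, parallelizable tensor op rather than an inner loop, so it slots in after a conventional backbone like any other layer.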
Deep convolutional neural networks, assisted by architectural design strategies, make extensive use of data augmentation techniques and layers with a high number of feature maps to embed object transformations. That is highly inefficient and for large datasets implies a massive redundancy of features detectors. Even though capsules networks are still in their infancy, they constitute a promising solution to extend current convolutional networks and endow artificial visual perception with a process to encode more efficiently all feature affine transformations. Indeed, a properly working capsule network should theoretically achieve higher results with a considerably lower number of parameters count due to intrinsic capability to generalize to novel viewpoints. Nevertheless, little attention has been given to this relevant aspect. In this paper, we investigate the efficiency of capsule networks and, pushing their capacity to the limits with an extreme architecture with barely 160 K parameters, we prove that the proposed architecture is still able to achieve state-of-the-art results on three different datasets with only 2% of the original CapsNet parameters. Moreover, we replace dynamic routing with a novel non-iterative, highly parallelizable routing algorithm that can easily cope with a reduced number of capsules. Extensive experimentation with other capsule implementations has proved the effectiveness of our methodology and the capability of capsule networks to efficiently embed visual representations more prone to generalization.
Randomly ended up taking a look at https://www.nature.com/articles/s41591-022-02116-3 after wondering if some recent symptoms I've been having are #longcovid related.
It's so interesting to look at the difference between this study and how long COVID is discussed on social media (or in https://doi.org/10.1016/j.eclinm.2021.101019). Maybe it's the choice of topic model / clustering or the curation of ICD-10 codes, but I wonder if symptoms like brain fog, generalized anxiety, and ME/CFS often don't make it into EHRs.
Machine learning applied to electronic health records in two US cohorts from the RECOVER initiative identified four Long-COVID subphenotypes that differ in the involvement of organ systems, previous SARS-CoV-2 infection severity and underlying conditions.
How good of a BERT can one get in ONE DAY on ONE GPU?
With all the recent studies about scaling compute up, this paper takes a refreshing turn and does a deep dive into scaling down compute.
It's well written and chock-full of insights. Here is my summary and my opinions.
https://arxiv.org/abs/2212.14034 by @jonasgeiping and @tomgoldstein
🧶 1/N
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.