ICYMI, @ihaque just finished sharing a tootorial on our recent work @ Recursion investigating a widespread source of systematic #confounding in #CRISPR-Cas9 screens we dub "proximity bias."
https://www.biorxiv.org/content/10.1101/2023.04.15.537038v1
We believe this confounding affects most CRISPR-based discovery and may have implications for therapy.
Some sections haven't yet made their way off of the bird site, so take a look here: https://twitter.com/ImranSHaque/status/1650911267530629120
Any recommendations for papers extending this type of framework to make sure that embeddings are well aligned with downstream tasks?
It's notably different from something like https://unified-io.allenai.org/, which enforces a single sequence-based representation (basically T5 for vision-language).
For just learning visual representations of text, MS-CLIP explores the impact of parameter sharing https://github.com/Hxyou/MSCLIP
Would love more reading recs!
Went back to BLIP (https://arxiv.org/abs/2201.12086) last night. When I first skimmed it, I focused on the caption bootstrapping, but the "Multimodal mixture of Encoder-Decoder" architecture is pretty cool.
It uses a structured architecture with multiple encoders/decoders, where some components feed others (e.g. similarity scores from the contrastive loss are used to mine hard examples for the image-text matching loss).
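A minimal sketch of that hard-example-mining idea (my own toy version in numpy, not BLIP's actual code): given the contrastive head's image-text similarity matrix, the most confusable non-matching pairs become the negatives for the matching loss.

```python
import numpy as np

def mine_hard_negatives(sim):
    """Pick hard negatives from a contrastive similarity matrix.

    sim[i, j] = similarity of image i and text j; the diagonal holds the
    true (positive) pairs. For each image, return the index of the
    non-matching text with the highest similarity -- these "hard" negatives
    are then fed to the image-text matching loss.
    """
    masked = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(masked, -np.inf)  # exclude the true pair
    return masked.argmax(axis=1)       # hardest negative text per image

# toy 3x3 similarity matrix: image 0 is most confused with text 2, etc.
sim = np.array([[0.9, 0.1, 0.7],
                [0.2, 0.8, 0.6],
                [0.3, 0.5, 0.95]])
hard = mine_hard_negatives(sim)  # array([2, 2, 1])
```

The nice part is that the mining is free: the similarity matrix is already computed for the contrastive loss, so the matching head just reuses it.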
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at https://github.com/salesforce/BLIP.
I suspect a lot of really good ideas get left undeveloped, because they'd take six to twelve months to start showing real potential, and that's way more runway than most industry projects are allotted.
Perhaps I'm too influenced by that ("Impact of research declines since the 1950s") paper which went around, but this morning I was consumed by the thought of how so much effort and money goes into developing "OK" ideas at the expense of really good ones. Anybody else have thoughts on this?
The methods are fun too - some very clever use of CRISPR to truncate single copies of particular chromosome arms.
Enjoying this paper from the Sheltzer lab - it reads a bit like a group of biologists playing detective. They start with the hypothesis that #aneuploidy is necessary for the malignancy of some cancer lines, and then trace this all the way to showing some very tight links to MDM4. They even propose a therapeutic strategy exploiting the differential sensitivity between disomic and trisomic cells.
Is anyone aware of papers training modern capsule networks on larger datasets (e.g. ImageNet or other datasets with >500 classes)? It seems like the ideas from https://www.nature.com/articles/s41598-021-93977-0 and https://papers.nips.cc/paper/2019/hash/e46bc064f8e92ac2c404b9871b2a4ef2-Abstract.html could be used to scale up but I haven't seen anything more than CIFAR-100 / Tiny-ImageNet.
The basic idea:
- self-attention as routing
- use a moderately sized backbone (e.g. ResNet / ViT) followed by a few convolutional capsule layers and some FC capsule layers + squashing function
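The two ingredients above could be sketched roughly like this (my loose numpy reading of the squashing non-linearity and a non-iterative, attention-style routing step; function names and the exact routing formula are my assumptions, not either paper's algorithm):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squashing non-linearity: shrinks the vector norm into
    [0, 1) while preserving direction."""
    norm = np.linalg.norm(s, axis=axis, keepdims=True)
    return (norm**2 / (1.0 + norm**2)) * (s / (norm + eps))

def attention_routing(votes):
    """Self-attention-style routing: instead of iterative dynamic routing,
    coupling weights come from a single softmax over the agreement (scaled
    dot products) between lower capsules' prediction vectors.

    votes: (num_in, num_out, dim) predictions from lower to upper capsules.
    Returns: (num_out, dim) upper-capsule activations.
    """
    d = votes.shape[-1]
    # agreement of each vote with every other vote, per output capsule
    logits = np.einsum('iod,jod->oij', votes, votes) / np.sqrt(d)
    c = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    weights = c.sum(axis=1)                   # total attention per input capsule
    s = np.einsum('oi,iod->od', weights, votes)
    return squash(s)

out = squash(np.array([3.0, 4.0]))            # norm 5 -> norm 25/26, same direction
rng = np.random.default_rng(0)
upper = attention_routing(rng.normal(size=(4, 2, 8)))  # shape (2, 8)
```

The appeal for scaling is that this routing is a fixed, parallelizable tensor op rather than an inner loop, so it slots in after a conventional backbone like any other layer.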
Deep convolutional neural networks, assisted by architectural design strategies, make extensive use of data augmentation techniques and layers with a high number of feature maps to embed object transformations. That is highly inefficient and for large datasets implies a massive redundancy of features detectors. Even though capsules networks are still in their infancy, they constitute a promising solution to extend current convolutional networks and endow artificial visual perception with a process to encode more efficiently all feature affine transformations. Indeed, a properly working capsule network should theoretically achieve higher results with a considerably lower number of parameters count due to intrinsic capability to generalize to novel viewpoints. Nevertheless, little attention has been given to this relevant aspect. In this paper, we investigate the efficiency of capsule networks and, pushing their capacity to the limits with an extreme architecture with barely 160 K parameters, we prove that the proposed architecture is still able to achieve state-of-the-art results on three different datasets with only 2% of the original CapsNet parameters. Moreover, we replace dynamic routing with a novel non-iterative, highly parallelizable routing algorithm that can easily cope with a reduced number of capsules. Extensive experimentation with other capsule implementations has proved the effectiveness of our methodology and the capability of capsule networks to efficiently embed visual representations more prone to generalization.
Randomly ended up taking a look at https://www.nature.com/articles/s41591-022-02116-3 after wondering if some recent symptoms I've been having are #longcovid related.
It's so interesting to look at the difference between this study and how long COVID is discussed on social media (or in https://doi.org/10.1016/j.eclinm.2021.101019). Maybe it's the choice of topic model / clustering or the curation of ICD-10 codes, but I wonder if symptoms like brain fog, generalized anxiety, and ME/CFS often don't make it into EHRs.
Machine learning applied to electronic health records in two US cohorts from the RECOVER initiative identified four Long-COVID subphenotypes that differ in the involvement of organ systems, previous SARS-CoV-2 infection severity and underlying conditions.
How good of a BERT can one get in ONE DAY on ONE GPU?
With all the recent studies about scaling compute up, this paper takes a refreshing turn and does a deep dive into scaling down compute.
It's well written and chock-full of insights. Here is my summary and my opinions.
https://arxiv.org/abs/2212.14034 by @jonasgeiping and @tomgoldstein
🧶 1/N
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.