Ankit Sharma

98 Followers
493 Following
154 Posts
ML/AI Engineer by day 🐤 | Hackerman by night 🐦 | Weird photographer 📷 | Hobbyist developer
#emacs #macOS #cpp #rust #python #iOS #swift #machinelearning #deeplearning #softwareengineering #mlops #nlp #llm
Website: https://nezubn.com
Twitter: https://twitter.com/nezubn
GitHub: https://github.com/ankitsharma07
LinkedIn: https://www.linkedin.com/in/ankitkumar1107

Welcome to a new browser!

> Ladybird uses a brand new engine based on web standards, without borrowing any code from other browsers. It started as a humble HTML viewer for the SerenityOS hobby project, but since then it's grown into a full cross-platform browser project supporting Linux, macOS, and other Unix-like systems. — https://ladybird.org/announcement.html

This is good for the Web!

Announcing the Ladybird Browser Initiative

We've created a US non-profit to develop Ladybird into a truly independent web browser...

This intro to Linear Algebra is what you really needed but never had.

Link: https://pabloinsente.github.io/intro-linear-algebra

Introduction to Linear Algebra for Applied Machine Learning with Python

the GitHub feed used to be so good.. I used to discover a lot of alpha there

wtf did they change it into.. now I get a couple of weird updates and then nothing; tried filtering but nothing works well

📚 Pretty nervous about this, but here goes. Here's a side project I've been working on for the past few months. It's called Viberary and it's a semantic search engine. It gives you book recommendations based on ✨vibes. ✨ You enter a search query like "funny scifi" and it returns a list of (hopefully!) good recommendations.

https://viberary.pizza/

There is an about page that explains the data, model, etc. It's still pretty early stages but it's been a labor of love for me.

Viberary

Find your book vibe semantically!
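Semantic search engines like this are usually built by embedding books and queries into a shared vector space and ranking by cosine similarity. This is not Viberary's actual code — just a minimal sketch of the idea, with hand-made toy vectors standing in for real sentence embeddings:

```python
import numpy as np

# Toy "embeddings": in a real system these would come from a sentence-encoder
# model; here they are made-up 3-d vectors standing in for book vibes.
books = {
    "The Hitchhiker's Guide to the Galaxy": np.array([0.9, 0.9, 0.1]),
    "Blood Meridian":                        np.array([0.1, 0.0, 0.9]),
    "Project Hail Mary":                     np.array([0.6, 0.7, 0.3]),
}

def cosine(a, b):
    # Cosine similarity: dot product of the vectors over the product of norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec, k=2):
    # Rank every book by similarity to the query embedding, return top-k titles.
    scored = sorted(books.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in scored[:k]]

# A query embedding for something like "funny scifi" would land near the
# humorous sci-fi corner of this toy space.
print(search(np.array([0.9, 0.9, 0.1])))
```

The nice property is that the query never has to share keywords with a book's metadata — "vibes" match as long as the encoder maps them to nearby vectors.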

@zhenyi hey, I was going through the Sessions app you created last year and I was wondering how you pulled the videos in there. Is there an API?
@dansup did you find any?
@xenodium interesting.. got many things which I will try and experiment with
@xenodium are you using Spacemacs or vanilla Emacs?

Fast Distributed Inference Serving for Large Language Models

FastServe improves the average and tail job completion time by up to 5.1x and 6.4x, respectively, compared to the state-of-the-art solution Orca.

https://arxiv.org/abs/2305.05920


Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of these applications demands low latency for LLM inference. Existing LLM serving systems use run-to-completion processing for inference jobs, which suffers from head-of-line blocking and long latency. We present FastServe, a distributed inference serving system for LLMs. FastServe exploits the autoregressive pattern of LLM inference to enable preemption at the granularity of each output token. FastServe uses preemptive scheduling to minimize latency with a novel skip-join Multi-Level Feedback Queue scheduler. Based on the new semi-information-agnostic setting of LLM inference, the scheduler leverages the input length information to assign an appropriate initial queue for each arrival job to join. The higher priority queues than the joined queue are skipped to reduce demotions. We design an efficient GPU memory management mechanism that proactively offloads and uploads intermediate state between GPU memory and host memory for LLM inference. We build a system prototype of FastServe and experimental results show that compared to the state-of-the-art solution vLLM, FastServe improves the throughput by up to 31.4x and 17.9x under the same average and tail latency requirements, respectively.
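The core scheduling idea — preempt at the granularity of each output token, and let a job's input length pick its starting priority level so it skips queues it would be demoted from anyway — can be sketched as a toy simulation. This is my own rough illustration, not the paper's implementation; the cost proxy (`input_len // 100`) and the power-of-two quanta are assumptions for the sake of the example:

```python
from collections import deque

# Toy skip-join MLFQ: level i gets a quantum of 2**i output tokens.
NUM_QUEUES = 4
QUANTA = [2 ** i for i in range(NUM_QUEUES)]  # tokens per turn: 1, 2, 4, 8

class Job:
    def __init__(self, name, input_len, output_len):
        self.name = name
        self.input_len = input_len
        self.remaining = output_len  # output tokens still to generate

def initial_queue(job):
    # Skip-join: a new job joins the first level whose quantum covers a
    # rough, length-proportional cost proxy, skipping higher-priority
    # queues it would immediately be demoted from.
    cost = max(1, job.input_len // 100)
    for i, quantum in enumerate(QUANTA):
        if quantum >= cost:
            return i
    return NUM_QUEUES - 1

def run(jobs):
    queues = [deque() for _ in range(NUM_QUEUES)]
    for job in jobs:
        queues[initial_queue(job)].append(job)
    finished = []
    while any(queues):
        # Always serve the highest-priority non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        job = queues[level].popleft()
        # Generate up to `quantum` tokens; generation is preemptible
        # between tokens, which is what makes per-token scheduling possible.
        steps = min(QUANTA[level], job.remaining)
        job.remaining -= steps
        if job.remaining == 0:
            finished.append(job.name)
        else:
            # Used the full quantum without finishing: demote one level.
            queues[min(level + 1, NUM_QUEUES - 1)].append(job)
    return finished

print(run([Job("long", 800, 10), Job("short", 50, 2)]))
```

In this toy run the short job finishes before the long one even though the long job arrived first, which is exactly the head-of-line blocking that run-to-completion serving cannot avoid.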
