shubh (@shub777)
Mentions the Latent Space Podcast (@latentspacepod), saying he is catching up on recent episodes. He points out that there is too little discussion of genuine AI benchmarks that properly reflect end-to-end, product- and value-oriented work, and stresses the need for benchmarks that can validate workloads beyond software engineering.
We are advertising a postdoc position to work on #generative #models, #structure #induction, and MI #estimation with Michael Gutmann as part of GenAI (@genaihub)!
https://elxw.fa.em3.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1001/job/13930
Get in touch! (#ML #AI)
👉 homepages.inf.ed.ac.uk/snaraya3/
👉 michaelgutmann.github.io
👉 genai.ac.uk
We invite applications for a Postdoctoral Research Associate in machine learning based in the School of Informatics, University of Edinburgh. The postholder will also be formally affiliated with the EPSRC-funded Hub in Generative AI and work with Drs Siddharth N. and Michael Gutmann as part of the Hub. This is an outstanding opportunity to conduct methodological research at the frontier of machine learning and to collaborate across a vibrant national network of leading universities and industry partners.
#DoorDash launched a multimodal #ML system aligning images, text, and user queries in a shared embedding space.
• Trained on 32M query–product pairs
• Uses contrastive learning
• Improves semantic search, ranking, and advertising relevance
More details here ⇨ https://bit.ly/41fhrl3
#SoftwareArchitecture #AI #Rankings #Search #VectorDatabases #EmbeddedDatabases #InfoQ
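The post mentions contrastive learning over query–product pairs but not the exact objective. A minimal numpy-only sketch of a symmetric InfoNCE loss of the kind commonly used for such shared embedding spaces (illustrative only; DoorDash's actual loss and architecture are described in the linked article, not here):

```python
import numpy as np

def info_nce_loss(query_emb, product_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of query-product pairs.
    Matching pairs share a row index; every other row in the batch
    acts as an in-batch negative."""
    # L2-normalise so dot products become cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = product_emb / np.linalg.norm(product_emb, axis=1, keepdims=True)
    logits = q @ p.T / temperature          # (B, B) similarity matrix
    idx = np.arange(len(q))                 # positives sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy against the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average the query->product and product->query directions
    return (xent(logits) + xent(logits.T)) / 2
```

Aligned pairs (identical embeddings) should score a lower loss than mismatched ones, which is the signal that pulls matching queries and products together in the shared space.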
`Understanding Model Calibration -- A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)`

To be considered reliable, a model must be calibrated: the confidence it reports for each decision should closely match how often that decision is actually correct. In this blog post we'll take a look at the most commonly used definition of calibration and then dive into a frequently used evaluation measure for it, the expected calibration error (ECE). We'll then cover some of the drawbacks of this measure and how they surfaced the need for additional notions of calibration, each requiring its own evaluation measure. This post is not intended to be an in-depth dissection of all work on calibration, nor does it focus on how to calibrate models. Instead, it is meant to provide a gentle introduction to the different notions and their evaluation measures, and to re-highlight some issues with a measure that is still widely used to evaluate calibration.
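The standard binned ECE the blurb refers to can be sketched in a few lines: bin predictions by confidence, then take the weighted average gap between mean confidence and accuracy per bin (a minimal illustration, not the blog post's own code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average |mean confidence - accuracy|
    over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # empty bin contributes nothing
        weight = mask.mean()  # fraction of samples in this bin
        ece += weight * abs(confidences[mask].mean() - correct[mask].mean())
    return ece
```

A perfectly calibrated bin (e.g. confidence 0.8 with 80% accuracy) contributes zero; an overconfident bin (confidence 0.9, 50% accuracy) contributes its 0.4 gap, weighted by its size. The binning itself is one of the drawbacks the post discusses: the score depends on the bin edges.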
`This study presents a novel calibration assessment framework for ML models, designed to address the limitations of existing popular metrics, particularly the ECE. Our framework enables a more fine-grained evaluation of calibration by assessing model performance locally, for different confidence regions or classes, providing a comprehensive understanding of the model's behavior.`
https://boa.unimib.it/retrieve/3998da14-0e54-49d1-8a14-4af87f9226c7/Famiglini-2023-ECAI-VoR.pdf
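One simple instance of the "local" evaluation the abstract describes is computing the binned calibration gap separately per predicted class, so a class that is well calibrated cannot mask one that is overconfident. A self-contained sketch (illustrative; the paper's framework is richer than this):

```python
import numpy as np

def classwise_ece(confidences, predictions, labels, n_bins=10):
    """Binned ECE computed separately on the subset of samples
    predicted as each class, returned as {class: ece}."""
    confidences = np.asarray(confidences, dtype=float)
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    out = {}
    for cls in np.unique(predictions):
        sel = predictions == cls
        conf_c = confidences[sel]
        acc_c = (labels[sel] == cls).astype(float)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            m = (conf_c > lo) & (conf_c <= hi)
            if m.any():
                ece += m.mean() * abs(conf_c[m].mean() - acc_c[m].mean())
        out[cls] = ece
    return out
```

A global ECE averages these gaps across all samples; the per-class view is exactly the kind of local breakdown that reveals which classes drive the miscalibration.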
ML endpoints in the wild see messy inputs: noise, typos, adversarial tricks. InferProbe simulates that mess locally. What's the most chaotic real input you've had to test?
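InferProbe's internals aren't shown here, but the kind of local perturbation testing the post describes can be sketched with a character-level noiser that injects drops, duplicates, and swaps into otherwise clean inputs (a hypothetical helper, not InferProbe's API):

```python
import random

def typo_perturb(text, rate=0.1, seed=0):
    """Simulate messy user input by randomly dropping, duplicating,
    or swapping characters at the given per-character rate."""
    rng = random.Random(seed)  # seeded for reproducible test cases
    out = []
    for ch in text:
        r = rng.random()
        if r < rate / 3:
            continue                          # drop the character
        elif r < 2 * rate / 3:
            out.append(ch)
            out.append(ch)                    # duplicate it
        elif r < rate and out:
            out[-1], ch = ch, out[-1]         # swap with previous char
            out.append(ch)
        else:
            out.append(ch)                    # keep it unchanged
    return "".join(out)
```

Feeding such perturbed variants of known-good requests to an endpoint is a cheap way to check that ranking or classification behaviour degrades gracefully rather than catastrophically.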