What is Data Augmentation? The role of data augmentation in machine learning

Effective machine learning depends heavily on its input data. When data is scarce or lacks diversity, however, the Data Augmentation technique becomes an important solution. The article below gives you a comprehensive view of data augmentation: from the basic concept to practical applications and the difficulties that come with it.

Read the article now: https://interdata.vn/blog/data-augmentation-la-gi/

#interdata #dataaugmentation

Generative AI Using SAS | CoListy
Explore Generative AI with SAS, from SMOTE and GANs to LLMs like BERT, enhancing your skills in data generation and AI innovation. Free learning! | CoListy
#generativeai #sasviya #machinelearning #smote #gans #bert #retrievalaugmentedgeneration #aiinnovation #syntheticdata #llms #freeonlinelearning #colisty #courselist #analyticslifecycle #trustworthyai #aitechnologies #responsibleai #aisystems #dataaugmentation

https://colisty.netlify.app/courses/generative-ai-using-sas_/

Generative AI Using SAS
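The course's starting point, SMOTE, is easy to sketch: each synthetic minority-class sample is an interpolation between a real sample and one of its k nearest minority neighbours. Below is a minimal NumPy sketch (the function name and parameters are my own, not the course's; libraries such as imbalanced-learn provide a tested implementation):

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # Euclidean distances from sample i to every minority sample
        dists = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2]])
print(smote_like_oversample(minority, n_new=5, rng=0).shape)
```

Because every synthetic point is a convex combination of two real samples, the new data stays inside the minority class's region of feature space.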

Exciting news! Check out our latest article on Multimodal TextImage Augmentation for Document Images, a collaboration with Albumentations AI. Enhance your datasets with this new technique! #DataAugmentation #DocumentImages

https://huggingface.co/blog/doc_aug_hf_alb

Introducing TextImage Augmentation for Document Images

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Learn how data augmentation enriches machine learning models with diverse datasets. Explore its benefits in AI, healthcare, retail, finance, and more!

More details: https://solguruz.com/generative-ai/what-is-data-augmentation/

#dataaugmentation
#genAI
#generativeAI

Data Augmentation | Generative AI Wiki

"Good luck, Viv, I know that guy's a total douche."

"Thanks Tay. Have you got the multi-wake word model training?"

"Yeah. Are you sure he'll pick that #WakeWord though?"

"I'm pretty sure. He won't outright use the phrase 'dole bludger', but it's pretty close."

---

A sardonic smile crept over the Prime Minister's face, the hot summer sun reflecting off his near-bald temples.

"So, by choosing a Wake Word that has difficult to pronounce sounds in it, it means that it won't work well for people who speak with an accent?"

"Yes, Prime Minister, precisely."

"Like the 'th' sound in 'them', 'there', 'that'?"

"Yes, or words that start or end with hard consonants are also difficult for some accents."

"Do we know which Wake Word would be the hardest for immigrants? Indigenous people?"

Pained, the #linguist had feared this question. She knew exactly what he was trying to do.

The long call centre queues hadn't done the trick - people had added screeners to their phones so that after being on hold for 7 hours, the phone would alert them to a picked-up call.

The Assistant was supposed to be the replacement for the call centre. Just load the app on your phone, and ask it a question! So simple! No queuing! The government actually wanted to help people!

You just needed to use the Wake Word to "wake up" the assistant first.

"Well ---", she hesitated.

"I don't have all day Doctor!"

"Our research shows that a Wake Word like `This Starts With Me` has lots of hard-to-pronounce phonemes - sounds."

"Excellent. And I like the overtone of personal responsibility."

Of course the fucker did.

"Very well, Prime Minister, we will implement that Wake Word."

He trotted off, probably to kill some babies or kittens, she thought.

---

Tay was configuring the #DataAugmentation for the #ML training run for the #WakeWord model.

They had just finished downloading every instance of the words "this", "starts", "with", and "me" in every accent of English, from Common Voice.

By augmenting the Wake Word model with accent data, they could make it recognise more accents, more accurately.

Resistance came in many forms.

---
https://arxiv.org/abs/2104.01454
https://dl.acm.org/doi/10.1145/3617694.3623258
---

#Tootfic #Microfiction
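The augmentation step Tay configures in the story, perturbing accent-diverse recordings so the wake-word model hears more variation, can be sketched at the array level. The gain, speed, and noise ranges below are illustrative only; real pipelines would use a library such as torchaudio or audiomentations on actual recordings:

```python
import numpy as np

def augment_waveform(wave, rng=None):
    """Toy wake-word augmentations: random gain, naive speed perturbation,
    and additive background noise. Ranges here are made up for illustration."""
    rng = np.random.default_rng(rng)
    out = wave * rng.uniform(0.7, 1.3)            # random gain
    rate = rng.uniform(0.9, 1.1)                  # speed perturbation factor
    idx = (np.arange(int(len(out) / rate)) * rate).astype(int)
    out = out[np.clip(idx, 0, len(out) - 1)]      # naive nearest-sample resample
    return out + rng.normal(0.0, 0.01, size=out.shape)  # background noise

# One second of a 440 Hz tone at 16 kHz, standing in for a recorded utterance.
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
aug = augment_waveform(wave, rng=0)
print(len(wave), len(aug))
```

Applying a few such random perturbations to each Common Voice clip multiplies the effective size and diversity of the training set.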

Few-Shot Keyword Spotting in Any Language

We introduce a few-shot transfer learning method for keyword spotting in any language. Leveraging open speech corpora in nine languages, we automate the extraction of a large multilingual keyword bank and use it to train an embedding model. With just five training examples, we fine-tune the embedding model for keyword spotting and achieve an average F1 score of 0.75 on keyword classification for 180 new keywords unseen by the embedding model in these nine languages. This embedding model also generalizes to new languages. We achieve an average F1 score of 0.65 on 5-shot models for 260 keywords sampled across 13 new languages unseen by the embedding model. We investigate streaming accuracy for our 5-shot models in two contexts: keyword spotting and keyword search. Across 440 keywords in 22 languages, we achieve an average streaming keyword spotting accuracy of 87.4% with a false acceptance rate of 4.3%, and observe promising initial results on keyword search.

arXiv.org
April 2024 (Part 1)

Building a Robust Data Ecosystem for Generative AI: Key Strategies for Organizations

Organizations are constantly seeking innovative ways to leverage artificial intelligence (AI) to gain a competitive edge. Generative AI, a subset of AI that involves creating new content such as images, text, or even…

"Beer, Data & Robots" ⚛️ was a truly explosive evening… thank you, everyone!! 💥 🙏🏻

Thanks to Simona Mazzarino and Andrea Marchese for showing us just how far an AI can misinterpret data, and how to explore it with an augmented-reality headset 📊

Here is the video of the evening: https://video.linux.it/w/3cnKaZqmSLtpEgkYc8jAFq

#databeerstorino #ai #AIbias #AIfairness #syntheticdata #XGBoost #DataAugmentation #vr #dataframe #open3d #unity #pythontorino #datascience #python

Beer, Data & Robots

PeerTube

"Know your Bias: Tackling Data Bias through Synthetic Data" by Simona Mazzarino #ClearboxAI #databeerstorino @torino — Beer, Data & Robots

#ai #AIbias #AIfairness #syntheticdata #XGBoost #DataAugmentation

Listen to the #InfoQ #podcast featuring Sam Partee, where he shares insights on Redis' vector database offering, different approaches to embeddings, and how to enhance #LLMs by adding a search component for retrieval augmented generation: https://bit.ly/3ukrEjw

Plus, a peek into the world of hybrid search in Redis!

#AI #ML #DataBase #DataAugmentation

Sam Partee on Retrieval Augmented Generation (RAG)

Sam Partee shares insights on Redis' vector database offering, different approaches to embeddings, how to enhance large language models by adding a search component for retrieval augmented generation.

InfoQ
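The search component the podcast describes reduces to a small sketch: embed the documents and the query, rank by cosine similarity, and hand the top hits to the LLM as context. The hashing embedder below is a deterministic toy stand-in for a real encoder model, and a plain Python loop stands in for the Redis vector index the episode actually discusses:

```python
import zlib
import numpy as np

def toy_embed(text, dim=64):
    """Deterministic bag-of-words hashing embedding: each word increments
    one of `dim` buckets. A stand-in for a real embedding model."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, docs, top_k=2):
    """The retrieval step of RAG: rank documents by cosine similarity to the
    query embedding; the winners get prepended to the LLM prompt."""
    q = toy_embed(query)
    scores = [float(q @ toy_embed(d)) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i] for i in order[:top_k]]

docs = [
    "Redis supports vector similarity search over embeddings.",
    "Cats are popular pets.",
    "Hybrid search combines keyword and vector retrieval.",
]
print(retrieve("Redis vector similarity search", docs, top_k=1)[0])
```

In a production setup the embeddings come from a model, the index lives in a vector database such as Redis, and hybrid search blends this vector score with keyword matching.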

Soft Augmentation for Image Classification
Authors: Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

abs: http://arxiv.org/abs/2211.04625
code: https://github.com/youngleox/soft_augmentation

#arXiv #ComputerVision #DataAugmentation

Soft Augmentation for Image Classification

Modern neural networks are over-parameterized and thus rely on strong regularization such as data augmentation and weight decay to reduce overfitting and improve generalization. The dominant form of data augmentation applies invariant transforms, where the learning target of a sample is invariant to the transform applied to that sample. We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets. We demonstrate that soft targets allow for more aggressive data augmentation, offer more robust performance boosts, work with other augmentation policies, and interestingly, produce better calibrated models (since they are trained to be less confident on aggressively cropped/occluded examples). Combined with existing aggressive augmentation strategies, soft target 1) doubles the top-1 accuracy boost across Cifar-10, Cifar-100, ImageNet-1K, and ImageNet-V2, 2) improves model occlusion performance by up to $4\times$, and 3) halves the expected calibration error (ECE). Finally, we show that soft augmentation generalizes to self-supervised classification tasks. Code available at https://github.com/youngleox/soft_augmentation

arXiv.org
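The core idea of the abstract, softening the one-hot target as the transform gets more aggressive, can be sketched in a few lines. The `visibility ** power` schedule below is an illustrative non-linearity of my own, not the paper's fitted curve:

```python
import numpy as np

def soft_target(label, num_classes, visibility, power=4):
    """Soften a one-hot target as a function of how much of the image
    survives a crop/occlusion. visibility=1.0 keeps the one-hot label;
    as visibility drops, the target decays toward the uniform distribution,
    so the model learns to be less confident on heavily cropped samples."""
    confidence = visibility ** power  # illustrative non-linear schedule
    target = np.full(num_classes, (1.0 - confidence) / num_classes)
    target[label] += confidence
    return target

for vis in (1.0, 0.75, 0.5, 0.25):
    print(vis, np.round(soft_target(2, 5, vis), 3))
```

Training against these softened targets is what lets the paper apply much more aggressive crops without teaching the model false confidence, which is also why calibration improves.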