AISatoshi (@AiXsatoshi)

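AISatoshi reports running four Blackwell GPUs to translate an Opus distillation dataset into Japanese. The job uses large-scale GPU infrastructure to convert and generate the dataset quickly, and is useful for expanding Japanese training data and for later fine-tuning.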

https://x.com/AiXsatoshi/status/2040471660357455980

#blackwell #gpu #dataset #japanese #llm

AI✖️Satoshi⏩️ (@AiXsatoshi) on X

Running Blackwell GPU x4 flat out to translate the Opus distillation dataset into Japanese

X (formerly Twitter)

AISatoshi (@AiXsatoshi)

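A tweet mentioning that the Opus distillation dataset was translated into Japanese with Gemma-4-26B. It is an experiment in using a large model to localize a training dataset, and a case that could be applied to Japanese fine-tuning and to building data-generation pipelines.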

https://x.com/AiXsatoshi/status/2040475933296193922

#gemma #dataset #translation #finetuning #japanese

AI✖️Satoshi⏩️ (@AiXsatoshi) on X

Translated the Opus distillation dataset with gemma-4-26b

X (formerly Twitter)

Enphase AC Battery Grid-connected Storage in Our UK Home: Review - Does it do what it says on the tin? And what about the important feature not on the datasheet? #storage #dataset #behindTheMeter - https://www.earth.org.uk/Enphase-AC-Battery-REVIEW.html

Below is a list of the top websites that provide datasets for machine learning projects:

#dataset #machine #learning

https://www.ml-nn.eu/a1/29.html

Open Datasets

Machine Learning & Neural Networks Blog

🌐 Christof Schöch, University of Trier, details how the #DOAJ journal #dataset is used to teach #Python programming for machine learning in a Digital Humanities Master's program @christof

#PythonProgramming #APCs #DataClassification #DataCleaning #MachineLearning
🔗 https://blog.doaj.org/2026/03/30/teaching-python-programming-with-doajs-journal-dataset/

Rohan Paul (@rohanpaul_ai)

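Introduces UnifoLM-WBT-Dataset, a new open-source robotics dataset released by Unitree Robotics. It consists of high-quality whole-body teleoperation data collected in real-world settings, and can be used for research and training on humanoid-robot manipulation in open environments.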

https://x.com/rohanpaul_ai/status/2037497343764025600

#unitree #robotics #opensource #dataset #humanoid

Rohan Paul (@rohanpaul_ai) on X

New big open source robotic dataset from @UnitreeRobotics UnifoLM-WBT-Dataset - a high-quality dataset drawn from real-world settings for whole-body teleoperation of humanoid robots in open environments. Unitree says the dataset will grow to include broader scenarios and more

X (formerly Twitter)

16WW Energy Series Dataset - Time series of electricity and gas and other energy generation and use. #energy #microgen #dataset - https://www.earth.org.uk/energy-series-dataset.html

From signatures to ML IDS: what can the Suricata IDS teach a model?

[Not for publication: I could not find a way to attach a message for the editors; this article was written for the blog of the Ivannikov Institute for System Programming of the Russian Academy of Sciences]

https://habr.com/ru/articles/1015132/

#IDS #Suricata #ML #dataset

A variety of information security tools are currently used to counter computer attacks: firewalls; network-level intrusion detection systems; detection systems...

Habr

General Bibliography - General public bibliography for EOU and related research. #bibliography #dataset - https://m.earth.org.uk/bibliography.html

On 16WW Data Collections and Graphs - Open for research home #dataset - https://www.earth.org.uk/note-on-data.html