Title: P5: I have been reading the DVC documentation [2023-11-15 Wed]
- metrics :: a feature of 'experiments' that allows comparing results.
- cache :: hidden storage in .dvc/cache
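
A minimal sketch of the idea behind a content-addressed cache, not DVC's actual code. The helper names and the directory layout (first two hex characters of the hash as a subdirectory) are assumptions for illustration; the real layout under .dvc/cache depends on the DVC version.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's content in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> Path:
    """Copy `path` into the cache under a name derived from its content hash.

    Assumed layout (illustrative only): <cache_dir>/<first 2 hex chars>/<rest>.
    """
    md5 = file_md5(path)
    target = cache_dir / md5[:2] / md5[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return target
```

Two files with the same content map to the same cached path, which is why a cache like this can be shared between projects without duplicating data.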

😶
#supported #datascience #ds #ml #machinelearning #dvc #data
Title: P4: I have been reading the DVC documentation [2023-11-15 Wed]
(considered outdated) when any of their dependencies change.
- https://dvc.org/doc/user-guide/data-management/remote-storage#supported-storage-types
- output :: the result of a stage, tracked by DVC.
- parameters :: granular dependencies of a stage, such as 'batch size'. DVC can track any key/value pair in a supported
parameters file (params.yaml by default).
#supported #datascience #ds #ml #machinelearning #dvc #data
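
A small sketch of why parameters count as "granular" dependencies: a stage becomes outdated only when one of the keys it tracks changes, not when anything else in the file changes. The dictionaries stand in for two parsed versions of params.yaml; the stage names, keys, and helper functions are hypothetical.

```python
from typing import Any


def get_param(params: dict[str, Any], dotted_key: str) -> Any:
    """Look up a nested value such as 'train.batch_size' in a parsed params file."""
    value: Any = params
    for part in dotted_key.split("."):
        value = value[part]
    return value


def changed_params(old: dict, new: dict, tracked: list[str]) -> list[str]:
    """Return the tracked keys whose values differ between two versions."""
    return [key for key in tracked if get_param(old, key) != get_param(new, key)]


old = {"train": {"batch_size": 32, "epochs": 10}, "prepare": {"split": 0.2}}
new = {"train": {"batch_size": 64, "epochs": 10}, "prepare": {"split": 0.2}}

# A hypothetical 'train' stage tracks only these two keys.
print(changed_params(old, new, ["train.batch_size", "train.epochs"]))  # ['train.batch_size']
# A hypothetical 'prepare' stage tracks a different key and is unaffected.
print(changed_params(old, new, ["prepare.split"]))  # []
```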
Title: P3: I have been reading the DVC documentation [2023-11-15 Wed]
- DVC remotes :: similar to Git remotes; used with the /dvc push/ and /dvc pull/ commands. To add one, run /dvc remote add/, which records it in .dvc/config.
- stage :: a processing step of a pipeline. Stages connect code to its corresponding data inputs (dependencies) and outputs.
- dependencies :: inputs for a stage, specified as paths in the deps field of dvc.yaml or a '.dvc' file. Stages are invalidated
#supported #datascience #ds #ml #machinelearning #dvc #data
Title: P2: I have been reading the DVC documentation [2023-11-15 Wed]
- Model validation: for example, input/output and performance validation — all dependencies present for
inference to run, and model scores within thresholds.
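
The "model scores within thresholds" check could look roughly like this. It is only a sketch: the metric names and threshold values are made up, and it says nothing about how DVC or a CI job would actually call it.

```python
def validate_model(metrics: dict[str, float], minimums: dict[str, float]) -> list[str]:
    """Return human-readable failures; an empty list means the model passes."""
    failures = []
    for name, minimum in minimums.items():
        if name not in metrics:
            failures.append(f"missing metric: {name}")
        elif metrics[name] < minimum:
            failures.append(f"{name}={metrics[name]} is below threshold {minimum}")
    return failures


# Hypothetical thresholds for illustration.
thresholds = {"accuracy": 0.9, "f1": 0.85}
print(validate_model({"accuracy": 0.93, "f1": 0.88}, thresholds))  # []
```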
✧ ❂ ❉ ✯ ✵
Terms:
- data registry :: a Git + DVC repository for versioning data and model files. The data itself is stored in
one or more /DVC remotes/.
#supported #datascience #ds #ml #machinelearning #dvc #data

Title: P1: I have been reading the DVC documentation [2023-11-15 Wed]
- allows creating pipelines with fixed inputs and outputs, which avoids unnecessary reruns.
- DVCLive: a tool for experiment tracking.
- allows setting up a development server with shared, cached data; cached data may be shared between projects.
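
The "avoid reruns" item above can be sketched as follows: hash the inputs, compare them with the hashes recorded after the previous run, and skip the step when nothing changed. This is a toy model under stated assumptions, not DVC's implementation; the JSON lock file and the function names are hypothetical (DVC itself records hashes in dvc.lock).

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def fingerprint(paths: list[Path]) -> dict[str, str]:
    """Map each dependency path to the MD5 hash of its content."""
    return {str(p): hashlib.md5(p.read_bytes()).hexdigest() for p in paths}


def run_if_changed(deps: list[Path], lock_file: Path, action: Callable[[], None]) -> bool:
    """Run `action` only if a dependency changed since the last recorded run.

    Returns True if the action ran, False if it was skipped.
    """
    current = fingerprint(deps)
    previous = json.loads(lock_file.read_text()) if lock_file.exists() else None
    if current == previous:
        return False  # up to date: skip the rerun
    action()
    lock_file.write_text(json.dumps(current, indent=2))  # record the new state
    return True
```

Calling run_if_changed twice in a row on unchanged inputs runs the action once; editing any input file makes the next call run it again.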

allows:
- Data validation: for example, validation against a schema or verifying pipeline consistency (correct
shapes, data types, etc.).
#supported #datascience #ds #ml #machinelearning #dvc #data

Title: P0: I have been reading the DVC documentation [2023-11-15 Wed]
Main features and terms of DVC from my notes:

DVC fetches data from external storage, codifies data and models, and provides reproducible pipelines.

features:
- allows downloading data from supported sources and keeps a hash of each file.
- versioning through codification: metafiles describe the datasets, ML artifacts, etc. to track.
#supported #datascience #ds #ml #machinelearning #dvc #data
