Data Flow Facilitator for Machine Learning (dffml) v0.4.0 Release Notes
Release Date: 2021-02-18

### Added

- New model for Anomaly Detection
- Ability to specify maximum number of contexts running at a time
- CLI and Python example usage of Custom Neural Network
- PyTorch loss function entrypoint style loading
- Custom Neural Network, last layer support for pre-trained models
- Example usage of sklearn operations
- Example Flower17 species image classification
- Config loading ability from the CLI using "@" before the filename
- Docstrings and doctestable example for `DataFlowPreprocessSource`
- XGBoost Regression Model
- Pre-Trained PyTorch torchvision Models
- Spacy model for NER
- Ability to rename outputs using GetSingle
- Tutorial for using NLP operations with models
- Operations plugin for NLP wrapping spacy and scikit functions
- Support for default value in a `Definition`
- Source for reading images in directories
- Operations plugin for image preprocessing
- `-pretty` flag to `list records` and `predict` commands
- daal4py based linear regression model
- DataFlowPreprocessSource can take a config file as dataflow via the CLI.
- Support for link on conditions in dataflow diagrams
- `edit all` command to edit records in bulk
- Support for Tensorflow 2.2
- Vowpal Wabbit Models
- Python 3.8 support
- binsec branch to `operations/binsec`
- Doctestable example for `model_predict` operation.
- Doctestable examples to `operation/mapping.py`
- shouldi got an operation to run Dependency-check on Java code.
- `load` and `run` functions in high level API
- Doctestable examples to `db` operations.
- Source for parsing `.ini` file formats
- Tests for noasync high level API.
- Tests for load and save functions in high level API.
- `Operation` inputs and outputs default to empty `dict` if not given.
- Ability to export any object with `dffml service dev export`
- Complete example for the `dataflow run` CLI command
- Tests for default configs instantiation.
- Example ffmpeg operation.
- Operations to deploy docker container on receiving GitHub webhook.
- New use case `Redeploying dataflow on webhook` in docs.
- Documentation for creating a Source for new file types, taking `.ini` as an example.
- New input modes, output modes for HTTP API dataflow registration.
- Usage example for tfhub text classifier.
- `AssociateDefinition` output operation to map definition names to values produced as a result of passing Inputs with those definitions to operations.
- DataFlows now have a syntax for providing a set of definitions that will override the operation's default definition for a given input.
- Source which modifies record features as they are read from another source. Useful for modifying datasets as they are used with ML commands or editing in bulk.
- Auto create Definition for the `op` when they might have a spec, subspec.
- `shouldi use` command which detects the language of the codebase given via path to directory or Git repo URL and runs the appropriate static analyzers.
- Support for entrypoint style loading of operations and seed inputs in `dataflow create`.
- Definition for output of the function that `op` wraps.
- Expose high level load, run and save functions to noasync.
- Operation to verify secret for GitHub webhook.
- Option to modify flow and add config in `dataflow create`.
- Ability to use a function as a data source via the `op` source
- Make every model's directory property required
- New model AutoClassifierModel based on `AutoSklearn`.
- New model AutoSklearnRegressorModel based on `AutoSklearn`.
- Example showing usage of locks in dataflow.
- `-skip` flag to `service dev install` command to let users not install certain core plugins
- HTTP service got a `-redirect` flag which allows for URL redirection via a HTTP 307 response
- Support for immediate response in HTTP service
- Daal4py example usage.
- Gitter chatbot tutorial.
- Option to run dataflow without sources from cli.
- Sphinx extension for automated testing of tutorials (consoletest)
- Example of software portal using DataFlows and HTTP service
- Retry parameter to `Operation`. Allows for setting the number of times an operation should be retried before its exception is raised (sketched below).
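As a rough illustration of the retry parameter described in the last entry, the sketch below passes a retry count through the `op` decorator. This is not taken from the DFFML docs: the keyword name `retry`, its forwarding to `Operation`, and the `flaky_reading` operation itself are assumptions for illustration only.

```python
# Minimal sketch, assuming the op decorator forwards a `retry` keyword to the
# underlying Operation as described in the entry above. `flaky_reading` is a
# made-up operation that fails at random so the retries have something to do.
import random

from dffml import op


@op(retry=3)  # assumed keyword: re-run the operation up to 3 times before raising
async def flaky_reading(sensor: str) -> float:
    if random.random() < 0.5:
        raise ConnectionError(f"lost contact with {sensor}")
    return random.random()
```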
### Changed

- Renamed `-seed` to `-inputs` in the `dataflow create` command
- Renamed configloader/png to configloader/image and added support for loading JPEG and TIFF file formats
- Update record `__str__` method to output in tabular format
- Update MNIST use case to normalize image arrays.
- `arg_` notation replaced with `CONFIG = ExampleConfig` style syntax for parsing all command line arguments.
- Moved usage/io.rst to docs/tutorials/dataflows/io.rst
- `edit` command substituted with `edit record`
- `Edit on Github` button now hidden for plugins.
- Doctests now run via unittests
- Every class and function can now be imported from the top level module
- `op` attempts to create `Definition`s for each argument if `inputs` are not given (see the sketch after this list).
- Classes now use `CONFIG` if it has a default for every field and `config` is `None`
- Models now dynamically import third party modules.
- Memory dataflow classes now use auto args and config infrastructure
- `dffml list records` command prints Records as JSON using `.export()`
- Feature class in `dffml/feature/feature.py` initializes a feature object
- All DefFeatures() functions are substituted with Features()
- All feature.type() and feature.length() are substituted with feature.type and feature.length
- FileSource takes pathlib.Path as filename
- Tensorflow tests re-run themselves up to 6 times to stop them from failing the CI due to their randomly initialized weights making them fail ~2% of the time
- Any plugin can now be loaded via its entrypoint style path
- `with_features` now raises a helpful error message if no records with matching features were found
- Split out model tutorial into writing the model, and another tutorial for packaging the model.
- IntegrationCLITestCase creates a new directory and chdirs into it for each test
- Automated testing of the Automating Classification tutorial
- `dffml version` command now prints the git repo hash and whether the repo is dirty
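A small sketch of the changed `op` behaviour noted above: when no explicit `inputs`/`outputs` are given, the decorator derives `Definition`s from the type annotations, and `op` itself is importable from the top level module. The `multiply` operation and the bare decorator form are assumptions for illustration, not code copied from the DFFML docs.

```python
# Sketch only: no inputs/outputs mapping is passed, so Definitions are derived
# from the annotations (the operation itself is invented for this example).
from dffml import op


@op
async def multiply(multiplicand: int, multiplier: int) -> int:
    return multiplicand * multiplier
```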
### Fixed

- `export_value` now converts numpy array to JSON serializable datatype
- CSV source overwriting configloaded data to every row
- Race condition in `MemoryRedundancyChecker` when more than 4 possible parameter sets for an operation
- Typing of config values for numpy parsed docstrings where type should be tuple or list
- Model predict methods now use `SourcesContext.with_features`

### Removed

- Monitor class and associated tests (unused)
- DefinedFeature class in `dffml/feature/feature.py`
- DefFeature function in `dffml/feature/feature.py` (see the sketch below for the `Feature` replacement)
- load_def function in Feature class in `dffml/feature/feature.py`
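With `DefFeature` removed and the `Feature` class now initializing feature objects directly (see the Changed section above), feature definitions look roughly like the sketch below; the feature names and types are invented for illustration.

```python
# Features are constructed directly from the Feature class exported by the
# top level module; names, dtypes and lengths here are only illustrative.
from dffml import Feature, Features

features = Features(
    Feature("Years", int, 1),
    Feature("Salary", int, 1),
)
```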
## Previous changes from v0.3.7

[0.3.7] - 2020-04-14

### Added
- IO operations demo and `literal_eval` operation.
- Python prompts `>>>` can now be enabled or disabled for easy copying of code into interactive sessions.
- Whitespace check now checks .rst and .md files too.
- `GetMulti` operation which gets all Inputs of a given definition
- Python usage example for LogisticRegression and its related tests.
- Support for async generator operations
- Example CLI commands and Python code for `SLRModel`
- `save` function in high level API to quickly save all given records to a `source` (see the sketch after this list)
- Ability to configure sources and models for HTTP API from command line when starting server
- Documentation page for command line usage of HTTP API
- Usage of the HTTP API in the quickstart to use the trained model
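A minimal sketch of the high level `save` helper added here, written with the 0.4.0 top level imports; the `CSVSource` keyword arguments, the file name, and the feature values are assumptions for illustration rather than a copy of the documented example.

```python
# Rough sketch: save a single Record to a CSV file via the high level API.
# The CSVSource kwargs (allowempty, readwrite) and the record contents are
# assumed here purely for illustration.
import asyncio

from dffml import CSVSource, Record, save


async def main():
    await save(
        CSVSource(filename="data.csv", allowempty=True, readwrite=True),
        Record("row0", data={"features": {"Years": 1, "Salary": 20}}),
    )


asyncio.run(main())
```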
### Changed

- Renamed `"arg"` to `"plugin"`.
- CSV source sorts feature names within headers when saving
- Moved HTTP service testing code to HTTP service `util.testing`
### Fixed

- Exporting plugins
- Issue parsing string values when using the `dataflow run` command and specifying extra inputs.
### Removed
- Unused imports