ScraperFlow is a framework that enables declarative, flow-based programming. It is built on two main components: the core, which translates the declarative description (JSON or YAML) into a format the framework understands, and the nodes, which are used to construct workflows. The architecture is plugin-based, so nodes can be implemented independently and provided to the framework.
The main goal of this framework is to facilitate code reuse (nodes) and to simplify control-flow management through declarative workflow specification.
The workflow specification is statically checked to ensure that the configuration is well-typed against the composition of nodes.
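For illustration, here is a hypothetical two-node workflow in this style. Only the `log` node and its `f`/`log` keys are taken from the quickstart example below; the flow name and messages are placeholders:

```yaml
# Hypothetical workflow sketch: a named flow ("start") is a list of node
# configurations, where `f` selects the node type and the remaining keys
# configure that node. A misspelled key or a wrongly typed value here
# would be rejected by the static type check before the workflow runs.
start:
  - {f: log, log: step one}
  - {f: log, log: step two}
```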
- ScraperFlow Node Documentation
- The documentation covers all nodes, including extra nodes, not only those in the core framework
- ScraperFlow Wiki
- ScraperFlow Editor (prototype, deprecated)
- Example Workflows
The documentation can be found at the ScraperFlow Wiki.
A minimal specification that can be used for any of the quickstart sections:
start:
- {f: log, log: hello world}

ScraperFlow is deployed to Docker Hub.
To run a ScraperFlow container once, use
docker run -v "$PWD":/rt -v "$PWD":/nodes -v "$PWD":/plugins -v "$PWD":/runtime-nodes --rm albsch/scraperflow:latest help
and place your workflow in the current working directory. '$PWD' can be changed to another working directory if needed. To supply custom nodes or plugins (such as dev-nodes), place the jar(s) in the current working directory (or change '$PWD') as well.
ScraperFlow is fully modularized.
Get the latest modular jar bundle and any plugin or additional node jars you like.
Place the additional plugin and node Java modules in a var folder next to the run script.
Use the provided run script to run ScraperFlow.
ScraperFlow will look for workflows relative to the working directory.
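The layout described above can be sketched as follows; the top-level directory and jar names are placeholders, not names from the ScraperFlow distribution:

```shell
# Sketch of the expected layout: the run script at the top level,
# plugin and node jars in a var folder beside it.
# All names below are placeholders.
mkdir -p scraperflow/var
touch scraperflow/var/extra-nodes.jar    # a downloaded node jar
touch scraperflow/var/some-plugin.jar    # a downloaded plugin jar
ls scraperflow/var
```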
Execute ./gradlew installDist. This will install ScraperFlow in your home directory at ~/opt/scraperflow.
The scraperflow start script can then be executed via
~/opt/scraperflow/scraperflow.
Additional plugin jars can be put into ~/opt/scraperflow/var.
Using
gradle clean build codeCov
will
- compile the project
- test the project
- package the project at application/build/distributions
- generate a code coverage report at build/reports/jacoco/codeCoverageReport/html/index.html
Specification parsers are plugins and need to be provided on the module path. Executing ScraperFlow in an IDE requires the module path to be extended with the following JVM parameter:
--add-modules ALL-MODULE-PATH