scraperflow/scraperflow

ScraperFlow - A Composable Workflow Framework


ScraperFlow is a framework which enables flow-based programming in a declarative way. It is based on two main components: the core which translates the declarative description (JSON or YAML) into a format that is understood by the framework, and the actual nodes which can be used to construct a workflow. The architecture is plugin-based, so nodes can be implemented on their own and provided to the framework.

The main goal of this framework is to facilitate the reuse of code (nodes) and to make the control flow of programs easy to manage through a declarative workflow specification.

The workflow specification is statically checked to ensure that the configuration is well-typed against the composition of nodes.
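As a rough illustration of what checking a specification before execution can look like, the sketch below validates a list of steps against a table of known node types and their required keys. Everything in it is hypothetical: the class `SpecCheckSketch`, the `REQUIRED_KEYS` table and the `check` method are invented for this example and are not ScraperFlow's actual type checker, which is likely far richer. Only the step shape (`f` naming the node, plus node-specific keys) is taken from the quickstart specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SpecCheckSketch {
    // Hypothetical schema: node type name -> configuration keys that node requires.
    static final Map<String, Set<String>> REQUIRED_KEYS = Map.of("log", Set.of("log"));

    // Collects problems without executing any node.
    static List<String> check(List<Map<String, String>> flow) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < flow.size(); i++) {
            Map<String, String> step = flow.get(i);
            String type = step.get("f");
            if (type == null) {
                problems.add("step " + i + ": missing key 'f'");
                continue;
            }
            Set<String> required = REQUIRED_KEYS.get(type);
            if (required == null) {
                problems.add("step " + i + ": unknown node type '" + type + "'");
                continue;
            }
            for (String key : required) {
                if (!step.containsKey(key)) {
                    problems.add("step " + i + ": node '" + type + "' requires key '" + key + "'");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // One valid step followed by one step that names an unknown node type.
        List<Map<String, String>> flow = List.of(
                Map.of("f", "log", "log", "hello world"),
                Map.of("f", "nope"));
        check(flow).forEach(System.out::println);
    }
}
```

The point of the sketch is only the ordering: all problems are reported from the description alone, before any node runs.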

Links

Documentation

The documentation can be found at the ScraperFlow Wiki.

Quickstart - Specification

A minimal specification that can be used for any of the quickstart sections:

start:
 - {f: log, log: hello world}
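To make the shape of this specification concrete, here is a minimal Java sketch of how a step such as `{f: log, log: hello world}` could be dispatched to a node implementation. It assumes that `f` selects the node type and that the remaining keys are that node's configuration, which is how the example reads. The `Node` interface, the `FlowSketch` class and its `register`/`run` methods are hypothetical names invented for illustration, not ScraperFlow's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlowSketch {
    // Hypothetical node contract: a node reads its own step configuration and appends output.
    interface Node {
        void accept(Map<String, String> config, List<String> out);
    }

    private final Map<String, Node> registry = new HashMap<>();

    // Nodes are registered by name, mirroring the idea that nodes are supplied as plugins.
    void register(String name, Node node) {
        registry.put(name, node);
    }

    // Runs the steps in order, looking up each node by the value of its 'f' key.
    List<String> run(List<Map<String, String>> flow) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> step : flow) {
            String type = step.get("f");
            Node node = registry.get(type);
            if (node == null) {
                throw new IllegalArgumentException("unknown node type: " + type);
            }
            node.accept(step, out);
        }
        return out;
    }

    public static void main(String[] args) {
        FlowSketch core = new FlowSketch();
        // A stand-in 'log' node that records the message given under its 'log' key.
        core.register("log", (config, out) -> out.add(config.get("log")));

        // In-memory equivalent of:  start: [ {f: log, log: hello world} ]
        List<Map<String, String>> start = List.of(Map.of("f", "log", "log", "hello world"));
        core.run(start).forEach(System.out::println);
    }
}
```

The registry lookup is the part that matters here: the specification names a node, and the implementation comes from whatever has been registered.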

Quickstart - Docker

ScraperFlow is published on Docker Hub.

To use a ScraperFlow container once, use

docker run -v "$PWD":/rt -v "$PWD":/nodes -v "$PWD":/plugins -v "$PWD":/runtime-nodes --rm albsch/scraperflow:latest help

and place your workflow in the current working directory. '$PWD' can be changed to another directory if needed. If custom nodes or plugins are to be supplied (such as dev-nodes), place the jar(s) in the current working directory as well (or in whichever directory replaces '$PWD').

Quickstart - Java

ScraperFlow is fully modularized.

Get the latest modular jar bundle and any plugin jar or additional node jars you like.

Place the additional plugin and node Java modules in a var folder next to the run script, then use the provided run script to start ScraperFlow.

ScraperFlow will look for workflows relative to the working directory.

Quickstart - Java Native

Execute ./gradlew installDist. This installs ScraperFlow in your home directory at ~/opt/scraperflow. The scraperflow start script can then be run via ~/opt/scraperflow/scraperflow. Additional plugin jars can be placed in ~/opt/scraperflow/var.

Quickstart - Development

Using

  gradle clean build codeCov

will

  • compile the project
  • test the project
  • package the project at application/build/distributions
  • generate code coverage report at build/reports/jacoco/codeCoverageReport/html/index.html

Specification parsers are plugins and need to be provided on the module path. Running ScraperFlow from an IDE requires the module path to be extended with the following JVM parameter:

--add-modules ALL-MODULE-PATH
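For context, `--add-modules ALL-MODULE-PATH` tells the JVM to resolve every module found on the module path, so that plugin modules are part of the module graph even if nothing requires them directly. The sketch below shows the JDK's standard plugin lookup, `java.util.ServiceLoader`, which only finds providers in modules that have been resolved. Whether ScraperFlow discovers its parsers this way is an assumption, and the `SpecParser` interface is a hypothetical stand-in, not the framework's real parser type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class PluginDiscoverySketch {
    // Hypothetical service interface; a real plugin module would declare
    // "provides ... with ..." for it in its module-info.java.
    public interface SpecParser {
        String format();
    }

    // Lists the formats of all parser implementations visible to ServiceLoader.
    static List<String> discoveredFormats() {
        List<String> formats = new ArrayList<>();
        for (SpecParser parser : ServiceLoader.load(SpecParser.class)) {
            formats.add(parser.format());
        }
        return formats;
    }

    public static void main(String[] args) {
        // With no provider modules resolved, the list is simply empty.
        System.out.println("parsers found: " + discoveredFormats());
    }
}
```

Run as-is, the sketch finds no parsers because no provider is registered, which mirrors what happens when plugin modules are present on disk but not resolved into the module graph.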

About

Data flow based java framework
