data-engineering

Use Case

Please provide a use case to help us understand your request in context
The Kubernetes Job tasks in our task library mimic the Kubernetes API, but an expected 'normal' use case of them is composed of several steps, namely creating a namespaced job, polling for it to complete, and deleting the job at the end. Right now no task in the task library knows how to poll for job status, an
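The create, poll, delete sequence described in this use case could be sketched roughly as follows. This is only an illustration, not the task library's API: `run_job_to_completion`, the three callables, and the `"succeeded"` / `"failed"` status strings are hypothetical stand-ins for whatever wraps the real Kubernetes calls.

```python
import time
from typing import Callable


def run_job_to_completion(
    create_job: Callable[[], str],
    get_status: Callable[[str], str],
    delete_job: Callable[[str], None],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a job, poll until it finishes, and always delete it afterwards.

    All three callables are hypothetical; in practice they would wrap the
    Kubernetes API calls for creating, reading, and deleting a namespaced job.
    """
    job_name = create_job()
    deadline = time.monotonic() + timeout
    try:
        while True:
            status = get_status(job_name)
            # Assumed terminal states; a real wrapper would map API fields to these.
            if status in ("succeeded", "failed"):
                return status
            if time.monotonic() >= deadline:
                raise TimeoutError(f"job {job_name!r} did not finish within {timeout}s")
            sleep(poll_interval)
    finally:
        # Clean up whether the job succeeded, failed, or timed out.
        delete_job(job_name)
```

Passing the three steps in as callables keeps the polling logic testable without a cluster; the `finally` block means the job is deleted even when polling times out.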

Situation

When creating a package:

import quilt3
quilt3.config(default_remote_registry='s3://your-bucket')
p = quilt3.Package()
p.push("username/packagename")

The package name can be any string. In particular it may be e.g. fashion-mnist.

Why is it wrong?

I would like

janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the fasta header.

strawman implementation below:

import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO

@pf.register_dataframe_method
def to_fasta(d

In this example the generated table of contents doesn't link to the sections on the page, because the headers have anchor tags in them. These should be sanitized out.
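One way the sanitizing step might look is sketched below. It assumes the header text arrives as a string containing inline `<a>` tags; `sanitize_header` and `slugify` are hypothetical helper names, and the slug scheme (lowercase, hyphen-separated) is an assumption rather than the project's actual convention.

```python
import re

# Matches an opening or closing <a> tag, with or without attributes.
_ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def sanitize_header(text: str) -> str:
    """Remove <a ...> and </a> tags, keeping any text between them."""
    return _ANCHOR_TAG.sub("", text).strip()


def slugify(text: str) -> str:
    """Build a link target from a header (assumed lowercase-hyphen scheme)."""
    cleaned = sanitize_header(text).lower()
    cleaned = re.sub(r"[^\w\s-]", "", cleaned)  # drop punctuation
    return re.sub(r"\s+", "-", cleaned)         # whitespace runs become hyphens
```

With this, a header such as `<a name="intro"></a>Getting Started!` would yield the link text `Getting Started!` and the slug `getting-started`.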

Follow-on from #2324, targeted at the 1.4 release: remove the unnecessary exclusions

We should add some stuff to contributors.md. Something like:

when opening a PR, feel free to immediately request a review, probably from @BenBirt or @lewish
one reviewer is fine, add two or more though if you want to get something in faster / want more eyes reviewing
after resolving a round of PR comments, hit the "re-request review" button
once the PR is approved & you have resolved any

If you make a bokeh plot, it will not show the bottom portion or the x-axis unless you put the window in fullscreen mode. Width is okay.

Ubuntu 16.04, Ansible 2.3.0
As per the readme, a directory should be created at /etc/ansible/hosts. However, the default Ansible inventory location is /etc/ansible/hosts. This means the default inventory location specified in /etc/ansible/ansible.cfg must be changed to some other location. It would be good to specify it in the readme to not confuse Ansible newcomers. Thanks!

Expected Behavior

Consistent use of logging levels in Waimak. Minimal use of INFO level.

Actual Behavior

Many messages (especially in storage) are logged at INFO when DEBUG should be used.

Specifications

Spark Version: 2.2
Operating System: Linux
Waimak Module: waimak-core, waimak-storage...

Update elements of GettingStarted.md based on snags and issues during development and testing

Currently, there are some examples in README.md with Elasticsearch queries and corresponding uptasticsearch code. That code is effectively pseudocode right now, as it references a fictional Elasticsearch cluster.

I think this could lead to a bad experience with the docs and lead people to walk away from the project and not come back.

I would love if someone changed those examples to be r


Here are 405 public repositories matching this topic...

PrefectHQ / prefect

kantord / just-dashboard

adilkhash / Data-Engineering-HowTo

quiltdata / quilt

awslabs / aws-data-wrangler

GoogleCloudPlatform / data-science-on-gcp

san089 / goodreads_etl_pipeline

ericmjl / pyjanitor

AlexIoannides / pyspark-example-project

kevintpeng / Learn-Something-Every-Day

Cascading / cascading

alexklibisz / elastik-nearest-neighbors

san089 / Udacity-Data-Engineering-Projects

sderosiaux / every-single-day-i-tldr

odpi / egeria

dataform-co / dataform

aiguofer / gspread-pandas

gunnarmorling / awesome-opensource-data-engineering

LGE-ARC-AdvancedAI / auptimizer

d6t / d6t-python

Leverege / gcp-data-engineer-exam

Flor91 / Data-engineering-nanodegree

swoop-inc / spark-alchemy

sernst / cauldron

InsightDataScience / ansible-playbook

polakowo / datadocs

Minyus / pipelinex

CoxAutomotiveDataSolutions / waimak

finos / datahelix

uptake / uptasticsearch
