data-engineering
Here are 405 public repositories matching this topic...
Situation
When creating a package:
import quilt3
quilt3.config(default_remote_registry='s3://your-bucket')
p = quilt3.Package()
p.push("username/packagename")
The package name can be any string; for example, it could be fashion-mnist.
Why is it wrong?
I would like
janitor.biology could do with a to_fasta function, I think. The intent would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the FASTA header.
Strawman implementation below:
import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO
@pf.register_dataframe_method
def to_fasta(d
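The strawman above is cut off, so the following is only a rough, dependency-free sketch of the same idea, not the proposed Biopython-based implementation. The function name records_to_fasta and its parameters (header_key, sequence_key, line_width) are hypothetical; with pandas, the rows could come from something like df.to_dict("records").

```python
from typing import Iterable, Mapping


def records_to_fasta(
    records: Iterable[Mapping[str, object]],
    header_key: str,
    sequence_key: str,
    line_width: int = 60,
) -> str:
    """Render rows as FASTA text: a '>' header line, then wrapped sequence lines.

    Hypothetical helper for illustration; it is not part of pyjanitor.
    """
    if line_width < 1:
        raise ValueError("line_width must be positive")
    lines = []
    for row in records:
        sequence = str(row[sequence_key])
        lines.append(f">{row[header_key]}")
        # Wrap long sequences so no line exceeds line_width characters.
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n" if lines else ""


# Example rows standing in for a dataframe with "name" and "sequence" columns.
rows = [
    {"name": "seq1", "sequence": "ACGTACGTAC"},
    {"name": "seq2", "sequence": "GGCC"},
]
fasta_text = records_to_fasta(rows, "name", "sequence", line_width=4)
print(fasta_text, end="")
```

Returning a string rather than writing a file keeps the sketch easy to test; a real dataframe method would more likely take a filename, as the issue suggests.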
In this example the generated table of contents doesn't link to the sections on the page, because the headers have anchor tags in them. These should be sanitized out.
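One way to picture the sanitizing step is sketched below, using only the Python standard library. The helper names (strip_tags, slugify), the sample heading, and the Markdown-style link format of the TOC entry are all assumptions made for illustration; the project's actual TOC generator and slug rules may differ.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only text nodes, discarding every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(heading_html: str) -> str:
    """Return the visible text of a heading, with anchors and other tags removed."""
    parser = _TextOnly()
    parser.feed(heading_html)
    parser.close()
    return "".join(parser.parts).strip()


def slugify(text: str) -> str:
    """Build a simple anchor slug (assumed rule: lowercase words joined by hyphens)."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"[\s_]+", "-", text).strip("-")


# Hypothetical heading that contains an anchor tag, as described in the issue.
heading = '<a name="install"></a>Installing the <code>CLI</code>'
title = strip_tags(heading)
toc_entry = f"[{title}](#{slugify(title)})"
print(toc_entry)
```

Stripping tags before computing the slug means the TOC link is derived from the heading text alone, so embedded anchors no longer leak into the link target.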
Follow-on from #2324, but targeted for the 1.4 release: remove the unnecessary exclusions.
We should add some stuff to contributors.md. Something like:
- when opening a PR, feel free to immediately request a review, probably from @BenBirt or @lewish
- one reviewer is fine, add two or more though if you want to get something in faster / want more eyes reviewing
- after resolving a round of PR comments, hit the "re-request review" button
- once the PR is approved & you have resolved any
Ubuntu 16.04, Ansible 2.3.0
As per the readme, a directory should be created at /etc/ansible/hosts. However, /etc/ansible/hosts is also the default Ansible inventory location. This means the default inventory location specified in /etc/ansible/ansible.cfg must be changed to some other location. It would be good to mention this in the readme so that Ansible newcomers are not confused. Thanks!
Expected Behavior
Consistent use of logging levels in Waimak. Minimal use of INFO level.
Actual Behavior
Many messages (especially in storage) are logged at INFO when DEBUG should be used.
Specifications
- Spark Version: 2.2
- Operating System: Linux
- Waimak Module: waimak-core, waimak-storage...
Update elements of GettingStarted.md based on snags and issues during development and testing
Currently, there are some examples in README.md with Elasticsearch queries and corresponding uptasticsearch code. That code is effectively pseudocode right now, as it references a fictional Elasticsearch cluster.
I think this could lead to a bad experience with the docs and lead people to walk away from the project and not come back.
I would love if someone changed those examples to be r
Use Case
The Kubernetes Job tasks in our task library mimic the Kubernetes API, but an expected 'normal' use case of them is composed of several steps, namely creating a namespaced job, polling for it to complete, and deleting the job at the end. Right now no task in the task library knows how to poll for job status, an
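The three-step flow described in this use case (create the job, poll until it finishes, delete it) could be sketched roughly as follows. To stay library-agnostic, the sketch takes the three operations as injected callables; the function name run_job_to_completion, the status strings "succeeded" and "failed", and the timeout parameters are hypothetical and are not taken from the task library or the Kubernetes API.

```python
import time
from typing import Callable


def run_job_to_completion(
    create_job: Callable[[], None],
    get_status: Callable[[], str],
    delete_job: Callable[[], None],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a job, poll until it succeeds or fails, and always delete it."""
    # If creation itself raises, there is nothing to clean up, so it stays
    # outside the try/finally block.
    create_job()
    try:
        deadline = time.monotonic() + timeout
        while True:
            status = get_status()
            if status in ("succeeded", "failed"):
                return status
            if time.monotonic() >= deadline:
                raise TimeoutError("job did not finish before the timeout")
            sleep(poll_interval)
    finally:
        # Runs on success, failure, and timeout alike.
        delete_job()


# Usage with stand-in callables instead of a real cluster.
events = []
statuses = iter(["pending", "running", "succeeded"])
result = run_job_to_completion(
    create_job=lambda: events.append("create"),
    get_status=lambda: next(statuses),
    delete_job=lambda: events.append("delete"),
    sleep=lambda seconds: None,  # skip real waiting in the example
)
print(result, events)
```

Putting the delete step in a finally block is one possible design choice: it keeps the namespace clean even when polling times out, at the cost of removing a job that someone might have wanted to inspect.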