Built a Scalable Data Platform on Azure & Databricks: DS4Earth

I’ve developed a cloud-native data platform focused on performance, scalability, and cost optimization.

Tech Stack:
API
Databricks Medallion Architecture (PySpark)
Delta Lake
Databricks notebooks and pipelines
Tableau

Key Features:

Optimized data pipelines (30% faster)
Scalable architecture design
End-to-end analytics workflow

Repo: https://github.com/errajeshcs-pixel/DS4Earth

Would love feedback and suggestions!

Replies: 1 comment

Hi, thanks for sharing DS4Earth—this is a really interesting and well-thought-out platform. I went through the overview and architecture, and here are some detailed thoughts that might help strengthen it further:

👍 What stands out
The problem statement is very relevant—fragmentation + lack of real-time processing is a real issue in climate data systems
Good use of modern data stack (Databricks Medallion, Azure, APIs)
Clear focus on scalability and cloud-native design
Including quantitative impact metrics is a strong plus
The end-to-end ownership (data → models → dashboards) is impressive
🔧 Suggestions / Areas to Improve

  1. Make use cases more concrete
    Right now, the platform is described at a high level (“climate intelligence”, “risk modeling”). It would be stronger if you explicitly show 2–3 real-world scenarios, for example:

Flood prediction for a specific region
Heatwave alerts for urban areas
Water quality or environmental anomaly detection

This helps readers quickly understand practical value.

  2. Clarify model performance metrics
    The statement:

“Model accuracy improved by 20% (7% to 95%)”

is a bit confusing and may raise questions. It would help to:

Specify the metric (accuracy, F1-score, RMSE, etc.)
Explain the baseline vs improved model
Use more realistic/traceable comparisons
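
To illustrate why both the metric and the baseline need to be stated, here is a minimal Python sketch. The labels and predictions are made-up toy values, not DS4Earth results, and all function names are hypothetical. It computes accuracy, F1 and RMSE by hand, then shows that "improved by X%" can mean either percentage points or a relative change, which are very different numbers.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root mean squared error, for regression-style outputs."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def improvement(baseline, improved):
    """Return (change in percentage points, relative change in percent)."""
    absolute_pp = (improved - baseline) * 100
    relative_pct = (improved - baseline) / baseline * 100
    return absolute_pp, relative_pct

# Toy labels and predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
baseline_pred = [1, 0, 0, 0, 0, 1, 1, 0]
improved_pred = [1, 0, 1, 1, 0, 1, 1, 0]

base_acc = accuracy(y_true, baseline_pred)
new_acc = accuracy(y_true, improved_pred)
pp, rel = improvement(base_acc, new_acc)
print(f"accuracy: {base_acc:.3f} -> {new_acc:.3f}")
print(f"change: {pp:.1f} percentage points, {rel:.1f}% relative")
print(f"F1: {f1_score(y_true, baseline_pred):.3f} -> {f1_score(y_true, improved_pred):.3f}")
```

Reporting the before and after values together with the metric name, as the last three lines do, removes the ambiguity.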

  3. Add more depth to system architecture
    The current pipeline (Data → ETL → Medallion → AI → Dashboard) is solid but quite high-level.

You could strengthen it by briefly mentioning:

Streaming vs batch processing (e.g., real-time ingestion approach)
Handling of geospatial data (indexing, tiling, etc.)
Data quality / missing data strategies

Even a short paragraph here would add a lot of technical credibility.
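
As a rough idea of the level of detail I mean, here is a small Python sketch of two of these points. It is not taken from the DS4Earth repo: `grid_cell` and `fill_gaps` are hypothetical helpers, the 0.5 degree cell size and the `max_gap` limit are arbitrary assumptions, and a plain regular grid stands in for a dedicated spatial index such as geohash or H3.

```python
from math import floor

def grid_cell(lat, lon, cell_deg=0.5):
    """Map a coordinate to a (row, col) cell on a regular lat/lon grid.

    Rows count up from the south pole, columns from the antimeridian.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("lat/lon out of range")
    row = floor((lat + 90) / cell_deg)
    col = floor((lon + 180) / cell_deg)
    return row, col

def fill_gaps(values, max_gap=2):
    """Linearly interpolate runs of None that are at most max_gap long.

    Longer runs, and runs touching either end of the series, stay None
    so that downstream steps can see the data is genuinely missing.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled

print(grid_cell(52.52, 13.405, cell_deg=1.0))
print(fill_gaps([1.0, None, 3.0, None, None, None, 7.0]))
```

Grouping readings by cell makes spatial joins and aggregation cheap, and capping the interpolated gap length keeps long sensor outages visible instead of silently smoothed over.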

  4. Make the product aspect more tangible
    It currently reads slightly more like a research/system design than a product.

Consider adding:

Sample API endpoint (e.g., /climate-risk?lat=...)
Dashboard screenshots or example outputs
A simple user flow (what a user actually does on the platform)
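
For the sample endpoint, even a short sketch like the following would make the product surface concrete. Everything here is an assumption for illustration: the `lon` parameter, the `risk_score` field, the error messages, and the `risk_lookup` callback that stands in for whatever model or table the platform actually queries.

```python
import json
from urllib.parse import parse_qs

def handle_climate_risk(query_string, risk_lookup):
    """Handle a hypothetical GET /climate-risk?lat=..&lon=.. request.

    Returns (http_status, json_body). risk_lookup(lat, lon) is a
    placeholder for the real scoring backend.
    """
    params = parse_qs(query_string)
    try:
        lat = float(params["lat"][0])
        lon = float(params["lon"][0])
    except (KeyError, ValueError):
        return 400, json.dumps({"error": "lat and lon must be numeric"})
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return 400, json.dumps({"error": "lat/lon out of range"})
    score = risk_lookup(lat, lon)
    return 200, json.dumps({"lat": lat, "lon": lon, "risk_score": score})

# Usage with a dummy lookup that always returns the same placeholder score.
status, body = handle_climate_risk("lat=12.97&lon=77.59", lambda lat, lon: 0.5)
print(status, body)
```

Showing one request and one response like this in the README tells a reader what the platform returns without having to run it.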

  5. Strengthen impact metrics
    You already mention:

200K+ data points/day
40% reduction in analysis time

You could make this even stronger by adding:

Prediction lead time improvement
Reduction in false positives
Any real-world decision impact (if applicable)
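
If it helps, these two metrics are simple to compute once alerts and observed events are logged side by side. The sketch below uses toy values only, and the function names and input shapes are my assumptions, not anything from the repo.

```python
from datetime import datetime

def false_positive_rate(alerts, events):
    """Share of event-free periods in which an alert was still raised.

    alerts and events are parallel lists of booleans, one per period.
    """
    pairs = list(zip(alerts, events))
    fp = sum(a and not e for a, e in pairs)
    tn = sum(not a and not e for a, e in pairs)
    return fp / (fp + tn) if (fp + tn) else 0.0

def mean_lead_time_hours(pairs):
    """Average hours between an alert and the event it predicted.

    pairs is a list of (alert_time, event_time) datetimes.
    """
    total_seconds = sum((event - alert).total_seconds() for alert, event in pairs)
    return total_seconds / 3600 / len(pairs)

# Toy values for illustration only.
alerts = [True, True, False, False]
events = [True, False, False, False]
print(f"false positive rate: {false_positive_rate(alerts, events):.2f}")

lead_pairs = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 6, 0)),
    (datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 12, 0)),
]
print(f"mean lead time: {mean_lead_time_hours(lead_pairs):.1f} h")
```

Computing both for the baseline and the improved pipeline gives a before/after pair that is easy to verify.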

  6. Minor cleanup

There are a few repetitions (e.g., “AI-powered climate intelligence”)
“test test” at the end can be removed
Formatting of sections like “Research & Academic Validation” could be cleaner
