93% Off Udemy Coupon - CoursesWyn

Databricks Certified Data Engineer Professional -Preparation

UPDATED: 2026 | Preparation course for Databricks Data Engineer Professional certification exam with hands-on training

$11.99 (93% OFF)
Get Course Now

About This Course

<div>If you are interested in becoming a Certified Data Engineer Professional from Databricks, you have come to the right place! This study guide will help you prepare for the certification exam.</div><div><br></div><div>By the end of this course, you should be able to:</div><div><br></div><div>1- Develop Code for Data Processing using Python and SQL</div><div><br></div><div>Using Python and Tools for development</div><div><ul><li><span style="font-size: 1rem;">Design and implement a scalable Python project structure optimized for Databricks Asset Bundles (DABs), enabling modular development, deployment automation, and CI/CD integration.</span></li><li><span style="font-size: 1rem;">Manage and troubleshoot external third-party library installations and dependencies in Databricks, including PyPI packages, local wheels, and source archives.</span></li><li><span style="font-size: 1rem;">Develop User-Defined Functions (UDFs) using Pandas and Python UDFs.</span></li></ul></div><div><br></div><div>Building and Testing an ETL pipeline with Lakeflow Declarative Pipelines, SQL, and Apache Spark on the Databricks platform</div><div><ul><li><span style="font-size: 1rem;">Build and manage reliable, production-ready data pipelines for batch and streaming data using Lakeflow Declarative Pipelines and Auto Loader.</span></li><li><span style="font-size: 1rem;">Create and automate ETL workloads using Jobs via the UI, APIs, and CLI.</span></li><li><span style="font-size: 1rem;">Explain the advantages and disadvantages of streaming tables compared to materialized views.</span></li><li><span style="font-size: 1rem;">Use APPLY CHANGES APIs to simplify CDC in Lakeflow Declarative Pipelines.</span></li><li><span style="font-size: 1rem;">Compare Spark Structured Streaming and Lakeflow Declarative Pipelines to determine the optimal approach for building scalable ETL pipelines.</span></li><li><span style="font-size: 1rem;">Create a pipeline component that uses control flow operators (e.g. if/else, foreach, etc.).</span></li><li><span style="font-size: 1rem;">Choose the appropriate configs for environments and dependencies, high memory for notebook tasks, and auto-optimization to disallow retries.</span></li><li><span style="font-size: 1rem;">Develop unit and integration tests using assertDataFrameEqual, assertSchemaEqual, DataFrame.transform, and testing frameworks to ensure code correctness, including the use of a built-in debugger.</span></li></ul></div><div><br></div><div>2- Data Ingestion &amp; Acquisition</div><div><ul><li><span style="font-size: 1rem;">Design and implement data ingestion pipelines to efficiently ingest a variety of data formats, including Delta Lake, Parquet, ORC, AVRO, JSON, CSV, XML, text, and binary, from diverse sources such as message buses and cloud storage.</span></li><li><span style="font-size: 1rem;">Create an append-only data pipeline capable of handling both batch and streaming data using Delta.</span></li></ul></div><div><br></div><div>3- Data Transformation, Cleansing, and Quality</div><div><ul><li><span style="font-size: 1rem;">Write efficient Spark SQL and PySpark code to apply advanced data transformations, including window functions, joins, and aggregations, to manipulate and analyze large datasets.</span></li><li><span style="font-size: 1rem;">Develop a quarantining process for bad data with Lakeflow Declarative Pipelines or Auto Loader in classic jobs.</span></li></ul></div><div><br></div><div>4- Data Sharing and Federation</div><div><ul><li><span style="font-size: 1rem;">Demonstrate secure Delta Sharing between Databricks deployments using Databricks-to-Databricks sharing (D2D) or to external platforms using the open sharing protocol (D2O).</span></li><li><span style="font-size: 1rem;">Configure Lakehouse Federation with proper governance across supported source systems.</span></li><li><span style="font-size: 1rem;">Use Delta Sharing to share live data from the Lakehouse to any computing platform.</span></li></ul></div><div><br></div><div>5- Monitoring and Alerting</div><div><ul><li><span style="font-size: 1rem;">Monitoring</span></li><li><span style="font-size: 1rem;">Use system tables for observability over resource utilization, cost, auditing, and workload monitoring.</span></li><li><span style="font-size: 1rem;">Use the Query Profiler UI and Spark UI to monitor workloads.</span></li><li><span style="font-size: 1rem;">Use the Databricks REST APIs and Databricks CLI to monitor jobs and pipelines.</span></li><li><span style="font-size: 1rem;">Use Lakeflow Declarative Pipelines event logs to monitor pipelines.</span></li><li><span style="font-size: 1rem;">Alerting</span></li><li><span style="font-size: 1rem;">Use SQL Alerts to monitor data quality.</span></li><li><span style="font-size: 1rem;">Use the Workflows UI and Jobs API to set up notifications for job status and performance issues.</span></li></ul></div><div><br></div><div>6- Cost &amp; Performance Optimization</div><div><ul><li><span style="font-size: 1rem;">Understand how and why using Unity Catalog managed tables reduces operational overhead and maintenance burden.</span></li><li><span style="font-size: 1rem;">Understand Delta optimization techniques, such as deletion vectors and liquid clustering.</span></li><li><span style="font-size: 1rem;">Understand the optimization techniques used by Databricks to ensure the performance of queries on large datasets (data skipping, file pruning, etc.).</span></li><li><span style="font-size: 1rem;">Apply Change Data Feed (CDF) to address specific limitations of streaming tables and reduce latency.</span></li><li><span style="font-size: 1rem;">Use the query profile to analyze a query and identify bottlenecks, such as poor data skipping, inefficient join types, and data shuffling.</span></li></ul></div><div><br></div><div>7- Ensuring Data Security and Compliance</div><div><ul><li><span style="font-size: 1rem;">Applying Data Security Mechanisms</span></li><li><span style="font-size: 1rem;">Use ACLs to secure workspace objects, enforcing the principle of least privilege and consistent policy enforcement.</span></li><li><span style="font-size: 1rem;">Use row filters and column masks to filter and mask sensitive table data.</span></li><li><span style="font-size: 1rem;">Apply anonymization and pseudonymization methods such as hashing, tokenization, suppression, and generalization to confidential data.</span></li><li><span style="font-size: 1rem;">Ensuring Compliance</span></li><li><span style="font-size: 1rem;">Implement a compliant batch &amp; streaming pipeline that detects and masks PII to ensure data privacy.</span></li><li><span style="font-size: 1rem;">Develop a data purging solution ensuring compliance with data retention policies.</span></li></ul></div><div><br></div><div>8- Data Governance</div><div><ul><li><span style="font-size: 1rem;">Create and add descriptions/metadata about enterprise data to make it more discoverable.</span></li><li><span style="font-size: 1rem;">Demonstrate understanding of the Unity Catalog permission inheritance model.</span></li></ul></div><div><br></div><div>9- Debugging and Deploying</div><div><ul><li><span style="font-size: 1rem;">Debugging and Troubleshooting</span></li><li><span style="font-size: 1rem;">Identify pertinent diagnostic information using the Spark UI, cluster logs, system tables, and query profiles to troubleshoot errors.</span></li><li><span style="font-size: 1rem;">Analyze errors and remediate failed job runs with job repairs and parameter overrides.</span></li><li><span style="font-size: 1rem;">Use Lakeflow Declarative Pipelines event logs &amp; the Spark UI to debug Lakeflow Declarative Pipelines and Spark pipelines.</span></li><li><span style="font-size: 1rem;">Deploying CI/CD</span></li><li><span style="font-size: 1rem;">Build and deploy Databricks resources using Databricks Asset Bundles.</span></li><li><span style="font-size: 1rem;">Configure and integrate with Git-based CI/CD workflows using Databricks Git Folders for notebook and code deployment.</span></li></ul></div><div><br></div><div>10- Data Modelling</div><div><ul><li><span style="font-size: 1rem;">Design and implement scalable data models using Delta Lake to manage large datasets.</span></li><li><span style="font-size: 1rem;">Simplify data layout decisions and optimize query performance using Liquid Clustering.</span></li><li><span style="font-size: 1rem;">Identify the benefits of using Liquid Clustering over partitioning and Z-Ordering.</span></li><li><span style="font-size: 1rem;">Design dimensional models for analytical workloads, ensuring efficient querying and aggregation.</span></li></ul></div><div><span style="font-size: 1rem;">With the knowledge you gain during this course, you will be ready to take the certification exam.</span></div><div><br></div><div>I am looking forward to meeting you!</div>
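To give a flavor of the pseudonymization techniques mentioned in section 7, here is a minimal sketch in plain Python using keyed hashing. The column name, records, and salt are illustrative and not taken from the course; in a real Databricks pipeline, logic like this would typically run inside a PySpark UDF over the sensitive column.

```python
import hashlib
import hmac

def pseudonymize(value: str, salt: str) -> str:
    """Replace a sensitive value with a keyed SHA-256 digest.

    Using HMAC with a secret salt (rather than a bare hash) resists
    rainbow-table attacks, while staying deterministic: the same input
    always maps to the same token, so joins on the pseudonymized
    column still work across tables.
    """
    return hmac.new(salt.encode(), value.encode(), hashlib.sha256).hexdigest()

# Illustrative records (not from the course).
rows = [{"email": "alice@example.com"}, {"email": "bob@example.com"}]
masked = [{"email": pseudonymize(r["email"], salt="s3cret")} for r in rows]
```

Because the mapping is deterministic per salt, referential integrity between pseudonymized tables is preserved; rotating the salt effectively re-keys the whole dataset.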

What you'll learn:

  • Learn how to model data management solutions on Databricks Lakehouse
  • Build data processing pipelines using the Spark and Delta Lake APIs
  • Understand how to use the Databricks platform and its tools, and the benefits they provide
  • Build production pipelines using best practices around security and governance
  • Learn how to monitor and log production jobs
  • Follow best practices for deploying code on Databricks
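The deployment practices above center on Databricks Asset Bundles, which are driven by a databricks.yml file. A minimal config sketch follows; the bundle name, workspace host, job name, and notebook path are placeholders, not taken from the course:

```yaml
# Minimal Databricks Asset Bundle config (illustrative names and paths).
bundle:
  name: demo_etl_project

targets:
  dev:
    mode: development          # dev mode prefixes deployed resources per user
    default: true
    workspace:
      host: https://example.cloud.databricks.com

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py
```

A bundle like this is typically validated and deployed from the CLI with `databricks bundle validate` and `databricks bundle deploy -t dev`.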