
Databricks Certified Professional Data Engineer - Databricks-Certified-Professional-Data-Engineer Exam Questions
QUESTION NO: 1
The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data for their pipeline runs in 10 minutes. Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?
The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data for their pipeline runs in 10 minutes. Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?
Correct Answer: D
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 2
A data pipeline uses Structured Streaming to ingest data from kafka to Delta Lake. Data is being stored in a bronze table, and includes the Kafka_generated timesamp, key, and value. Three months after the pipeline is deployed the data engineering team has noticed some latency issued during certain times of the day.
A senior data engineer updates the Delta Table ' s schema and ingestion logic to include the current timestamp (as recoded by Apache Spark) as well the Kafka topic and partition. The team plans to use the additional metadata fields to diagnose the transient processing delays:
Which limitation will the team face while diagnosing this problem?
A data pipeline uses Structured Streaming to ingest data from kafka to Delta Lake. Data is being stored in a bronze table, and includes the Kafka_generated timesamp, key, and value. Three months after the pipeline is deployed the data engineering team has noticed some latency issued during certain times of the day.
A senior data engineer updates the Delta Table ' s schema and ingestion logic to include the current timestamp (as recoded by Apache Spark) as well the Kafka topic and partition. The team plans to use the additional metadata fields to diagnose the transient processing delays:
Which limitation will the team face while diagnosing this problem?
Correct Answer: A
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 3
A workspace admin has created a new catalog called finance_data and wants to delegate permission management to a finance team lead without giving them full admin rights.
Which privilege should be granted to the finance team lead?
A workspace admin has created a new catalog called finance_data and wants to delegate permission management to a finance team lead without giving them full admin rights.
Which privilege should be granted to the finance team lead?
Correct Answer: D
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 4
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database.
After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code to the following (leaving all other variables unchanged).

Which statement describes what will happen when the above code is executed?
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database.
After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code to the following (leaving all other variables unchanged).

Which statement describes what will happen when the above code is executed?
Correct Answer: E
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 5
A data engineering team uses Databricks Lakehouse Monitoring to track the percent_null metric for a critical column in their Delta table. The profile metrics table ( prod_catalog.prod_schema.
customer_data_profile_metrics ) stores hourly percent_null values. The team wants to trigger an alert when the daily average of percent_null exceeds 5% for three consecutive days, while ensuring notifications are not spammed during sustained issues. Which SQL alert configuration achieves this goal while minimizing false positives and redundant notifications?
A data engineering team uses Databricks Lakehouse Monitoring to track the percent_null metric for a critical column in their Delta table. The profile metrics table ( prod_catalog.prod_schema.
customer_data_profile_metrics ) stores hourly percent_null values. The team wants to trigger an alert when the daily average of percent_null exceeds 5% for three consecutive days, while ensuring notifications are not spammed during sustained issues. Which SQL alert configuration achieves this goal while minimizing false positives and redundant notifications?
Correct Answer: D
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 6
A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.
Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales .

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?
A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.
Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales .

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?
Correct Answer: B
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 7
A data engineer wants to create a cluster using the Databricks CLI for a big ETL pipeline. The cluster should have five workers , one driver of type i3.xlarge, and should use the ' 14.3.x-scala2.12 ' runtime.
Which command should the data engineer use?
A data engineer wants to create a cluster using the Databricks CLI for a big ETL pipeline. The cluster should have five workers , one driver of type i3.xlarge, and should use the ' 14.3.x-scala2.12 ' runtime.
Which command should the data engineer use?
Correct Answer: A
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 8
Which statement characterizes the general programming model used by Spark Structured Streaming?
Which statement characterizes the general programming model used by Spark Structured Streaming?
Correct Answer: E
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 9
A data engineer is designing a pipeline in Databricks that processes records from a Kafka stream where late- arriving data is common.
Which approach should the data engineer use?
A data engineer is designing a pipeline in Databricks that processes records from a Kafka stream where late- arriving data is common.
Which approach should the data engineer use?
Correct Answer: D
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 10
The data science team has created and logged a production using MLFlow. The model accepts a list of column names and returns a new column of type DOUBLE.
The following code correctly imports the production model, load the customer table containing the customer_id key column into a Dataframe, and defines the feature columns needed for the model.

Which code block will output DataFrame with the schema ' ' customer_id LONG, predictions DOUBLE ' ' ?
The data science team has created and logged a production using MLFlow. The model accepts a list of column names and returns a new column of type DOUBLE.
The following code correctly imports the production model, load the customer table containing the customer_id key column into a Dataframe, and defines the feature columns needed for the model.

Which code block will output DataFrame with the schema ' ' customer_id LONG, predictions DOUBLE ' ' ?
Correct Answer: B
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).
QUESTION NO: 11
A data engineer wants to automate job monitoring and recovery in Databricks using the Jobs API. They need to list all jobs, identify a failed job, and rerun it.
Which sequence of API actions should the data engineer perform?
A data engineer wants to automate job monitoring and recovery in Databricks using the Jobs API. They need to list all jobs, identify a failed job, and rerun it.
Which sequence of API actions should the data engineer perform?
Correct Answer: B
Explanation: Only visible for Pass4Test members. You can sign-up / login (it's free).




