
Snowflake SnowPro Advanced: Data Engineer (DEA-C02) - DEA-C02 Exam Questions
QUESTION NO: 1
You have a table named 'TRANSACTIONS', which is frequently queried by 'TRANSACTION_DATE' and 'CUSTOMER_ID'. You want to define a clustering strategy for this table. You are aware that defining multiple clustering keys is possible. Given the following considerations, which of the following clustering strategies would provide the BEST performance AND minimize reclustering costs, assuming both columns have similar cardinality and are equally used in WHERE clauses? (Assume cost optimization is the most critical factor if the performance difference is minimal.)
Correct Answer: A,B
QUESTION NO: 2
A data engineering team is using Snowflake's data lineage features, and they need to audit changes to data masking policies applied to a table named 'EMPLOYEES'. They want to identify when a masking policy was added, modified, or removed from specific columns.
Which Snowflake features or audit logs are recommended for the data engineering team to meet these requirements?
Correct Answer: A
QUESTION NO: 3
Snowpark DataFrame 'employee_df' contains employee data, including 'employee_id', 'department', and 'salary'. You need to calculate the average salary for each department and also retrieve all the employee details along with the department average salary.
Which of the following approaches is the MOST efficient way to achieve this?
Correct Answer: C
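The answer options are not reproduced here, so the following is only an illustration of the general technique the question describes: attaching a per-department average to every employee row with a window function, rather than aggregating and joining back. It is a minimal sketch using Python's built-in sqlite3 module instead of Snowpark; the table name 'employee' and the sample rows are made up. In Snowpark, the analogous expression is typically built with 'Window.partition_by' and 'avg(...).over(...)'.

```python
import sqlite3


def with_department_average(rows):
    """Return each employee row plus the average salary of its department."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (employee_id INTEGER, department TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    # The window function keeps every row and adds the group average beside it.
    result = conn.execute(
        """
        SELECT employee_id, department, salary,
               AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
        FROM employee
        ORDER BY employee_id
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data, for illustration only.
sample_rows = [(1, "eng", 100.0), (2, "eng", 200.0), (3, "ops", 50.0)]
result = with_department_average(sample_rows)
```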
QUESTION NO: 4
You are designing a Snowflake data pipeline that continuously ingests clickstream data. You need to monitor the pipeline for latency and throughput, and trigger notifications if these metrics fall outside acceptable ranges. Which of the following combinations of Snowflake features and techniques would be MOST effective for achieving this goal?
Correct Answer: C,E
QUESTION NO: 5
A company is using Snowflake's web app interface to manage its data. A data engineer needs to create a new table, load data into it from a CSV file stored in an internal stage, and then grant SELECT privileges on the table to a specific role using the web app. Which sequence of actions within the Snowflake web app represents the most efficient and secure way to accomplish this task?
Correct Answer: C
QUESTION NO: 6
You have a base table 'ORDERS' with columns 'ORDER_ID', 'CUSTOMER_ID', 'ORDER_DATE', and 'ORDER_AMOUNT'. You need to create a view that aggregates the total order amount per customer per month. However, for data governance purposes, you need to ensure that the view only shows data for the last 3 months. What is the MOST efficient and secure way to create this view in Snowflake?
Correct Answer: B
QUESTION NO: 7
You are using Snowpipe to ingest data from Azure Blob Storage into a Snowflake table. You have successfully set up the pipe and configured the event notifications. However, you notice that duplicate records are appearing in your target table. After reviewing the logs, you determine that the same file is being processed multiple times by Snowpipe. Which of the following strategies can you implement to prevent duplicate data ingestion, assuming you cannot modify the source data in Azure Blob Storage to include a unique ID or timestamp?
Correct Answer: E
QUESTION NO: 8
A data engineer is tasked with creating an external table that points to a directory in AWS S3 containing CSV files. The files have a header row and are comma-delimited. The engineer executes the following DDL statement:
Correct Answer: C
QUESTION NO: 9
You have implemented a Snowpipe using auto-ingest to load data from an AWS S3 bucket. The pipe is configured to load data into a table with a 'DATE' column ('TRANSACTION_DATE'). The data files in S3 contain a date field in the format 'YYYYMMDD'. Occasionally, you observe data loading failures in Snowpipe, with an error message indicating a problem converting the string to a date. The 'FILE_FORMAT' definition includes 'DATE_FORMAT = 'YYYYMMDD''. Furthermore, you notice that, after a while, some files are not being ingested even though they are present in the S3 bucket. How can you effectively diagnose and resolve these issues?
Correct Answer: A,C
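The answer options are not reproduced here. As a side illustration of the first symptom only, the string-to-date conversion failures, the sketch below shows one way to pre-check which values in a sample of a source file would not parse as 'YYYYMMDD'. It is plain Python, not Snowflake functionality; the function name and sample values are hypothetical.

```python
from datetime import datetime


def find_bad_dates(values):
    """Return the values that are not valid dates in strict YYYYMMDD form."""
    bad = []
    for value in values:
        # Require exactly eight digits first: strptime alone would also accept
        # some shorter strings such as "2024115".
        if len(value) != 8 or not value.isdigit():
            bad.append(value)
            continue
        try:
            datetime.strptime(value, "%Y%m%d")
        except ValueError:
            # Eight digits, but not a real calendar date (e.g. February 30).
            bad.append(value)
    return bad


# Hypothetical sample of date strings taken from a data file.
sample_values = ["20240131", "20240230", "2024-01-31", ""]
bad_values = find_bad_dates(sample_values)
```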
QUESTION NO: 10
You are responsible for monitoring data quality in a Snowflake data warehouse. Your team has identified a critical table, 'CUSTOMER_DATA', where the 'EMAIL' column is frequently missing or contains invalid entries. You need to implement a solution that automatically detects and flags these anomalies. Which of the following approaches, or combination of approaches, would be MOST effective in proactively monitoring the data quality of the 'EMAIL' column?
Correct Answer: A,C,E
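The answer options are not reproduced here. To make the kind of check the question describes concrete, the sketch below classifies email values as missing, invalid, or valid and counts each category. It is plain Python rather than a Snowflake feature, and the regular expression is a deliberately simple assumption (something@something.something), not a full address validator; all names and sample values are hypothetical.

```python
import re

# Simplified pattern (an assumption): one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_email(value):
    """Label a single value as 'missing', 'valid', or 'invalid'."""
    if value is None or value.strip() == "":
        return "missing"
    if EMAIL_PATTERN.match(value.strip()):
        return "valid"
    return "invalid"


def summarize(emails):
    """Count how many values fall into each category."""
    counts = {"missing": 0, "invalid": 0, "valid": 0}
    for email in emails:
        counts[classify_email(email)] += 1
    return counts


# Hypothetical sample of EMAIL column values.
summary = summarize(["a@example.com", None, "  ", "not-an-email", "b@example"])
```

Counts like these could then be compared against a threshold to decide whether to raise a notification.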
QUESTION NO: 11
You are tasked with building a data pipeline to process image metadata stored in JSON format from a series of URLs. The JSON structure contains fields such as 'image_url', 'resolution', 'camera_model', and 'location' (latitude and longitude). Your goal is to create a Snowflake table that stores this metadata along with a thumbnail of each image. Given the constraints that you want to avoid downloading and storing the images directly in Snowflake, and that Snowflake's native functions for image processing are limited, which of the following approaches would be most efficient and scalable?
Correct Answer: A,E
QUESTION NO: 12
A data engineer is tasked with creating a Listing to share a large dataset stored in Snowflake. The dataset contains sensitive Personally Identifiable Information (PII) that must be masked for certain consumer roles. The data engineer wants to use Snowflake's dynamic data masking policies within the Listing to achieve this. Which of the following approaches is the MOST secure and maintainable way to implement this requirement, assuming that the consumer roles are pre-defined and known?
Correct Answer: A
QUESTION NO: 13
You are building a data pipeline that extracts data from a REST API, transforms it using Pandas DataFrames, and loads it into Snowflake. You need to implement error handling to gracefully handle network issues and API rate limits. Which of the following code snippets demonstrates the most robust approach to handle potential errors during data loading into Snowflake using the Python connector?

Correct Answer: B
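The code snippets offered as options are not reproduced here. As a general illustration of handling transient failures such as network issues or rate limits, the sketch below retries a load step with exponential backoff and re-raises once the attempts are exhausted. It does not use the Snowflake Python connector: 'TransientLoadError', 'load_with_retry', and 'flaky_load' are hypothetical names, and a real pipeline would catch the specific exception types raised by its HTTP client and connector instead.

```python
import time


class TransientLoadError(Exception):
    """Hypothetical stand-in for a retryable failure (network issue, rate limit)."""


def load_with_retry(load_fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call load_fn, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return load_fn()
        except TransientLoadError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a simulated load that fails twice, then succeeds.
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientLoadError("simulated transient failure")
    return "loaded"


delays = []  # record the delays instead of actually sleeping
outcome = load_with_retry(flaky_load, sleep=delays.append)
```

Passing 'sleep' as a parameter keeps the demonstration instantaneous; non-transient errors are not caught and propagate immediately.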