
Snowflake SnowPro Advanced: Data Engineer (DEA-C02) - DEA-C02 Exam Questions

QUESTION NO: 1
You have a table named 'TRANSACTIONS', which is frequently queried by 'TRANSACTION_DATE' and 'CUSTOMER_ID'. You want to define a clustering strategy for this table. You are aware that defining multiple clustering keys is possible. Given the following considerations, which of the following clustering strategies would provide the BEST performance AND minimize reclustering costs, assuming both columns have similar cardinality and are equally used in WHERE clauses? (Assume cost optimization is the most critical factor if the performance difference is minimal.)

A.

B.

C. Create two separate tables: one clustered by 'TRANSACTION_DATE' and another clustered by 'CUSTOMER_ID', and use appropriate views to redirect queries to the correct table.

D.

E.

Correct Answer: A,B


QUESTION NO: 2
A data engineering team is using Snowflake's data lineage features, and they need to audit changes to data masking policies applied to a table named 'EMPLOYEES'. They want to identify when a masking policy was added, modified, or removed from specific columns.
Which Snowflake features or audit logs are recommended for meeting these requirements?

A. The Account Usage view 'POLICY_REFERENCES' coupled with 'QUERY_HISTORY', filtering for 'ALTER TABLE MODIFY COLUMN SET MASKING POLICY' statements and also comparing snapshots of the 'POLICY_REFERENCES' view over time.

B. Snowflake's native Data Lineage feature automatically captures all changes to data masking policies without any additional configuration, and those changes are then available to the data steward through the user interface.

C. The 'INFORMATION_SCHEMA.POLICY_REFERENCES' view to determine what masking policies are currently in place. Then, combine that with the use of Snowflake's Alerting framework to get notified on the creation/removal of tables, and also on changes to the masking policies via the SYSTEM$GET_PRIVILEGES() function.

D. Snowflake event tables provide complete audit trail capabilities. These tables capture all the events including policies.

E. The 'OBJECT_DEPENDENCIES' view in the ACCOUNT_USAGE schema will directly track changes related to masking policies applied to tables since that is the best place for lineage information.

Correct Answer: A
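
The snapshot comparison mentioned in option A can be sketched in plain Python. This is only an illustration of the diffing idea, assuming each snapshot has already been reduced to a mapping from column name to policy name; the function name, column names, and policy names are all hypothetical.

```python
from typing import Dict, Tuple

Snapshot = Dict[str, str]  # column name -> masking policy name


def diff_policy_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, dict]:
    """Compare two snapshots and report added, removed, and changed policies."""
    added = {col: pol for col, pol in after.items() if col not in before}
    removed = {col: pol for col, pol in before.items() if col not in after}
    changed: Dict[str, Tuple[str, str]] = {
        col: (before[col], after[col])
        for col in before.keys() & after.keys()
        if before[col] != after[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Column and policy names below are made up for illustration.
monday = {"SSN": "mask_ssn_v1", "EMAIL": "mask_email"}
friday = {"SSN": "mask_ssn_v2", "SALARY": "mask_salary"}
changes = diff_policy_snapshots(monday, friday)
```

A diff like this only shows that something changed between two points in time; the query history is still needed to see who ran which statement and when.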


QUESTION NO: 3
Snowpark DataFrame 'employee_df' contains employee data, including 'employee_id', 'department', and 'salary'. You need to calculate the average salary for each department and also retrieve all the employee details along with the department average salary.
Which of the following approaches is the MOST efficient way to achieve this?

A. Use 'groupBy' to get a DataFrame containing the average salary by department, and then use a Python UDF to iterate through 'employee_df' and add the value to each row.

B. Use a correlated subquery within the SELECT statement to calculate the average salary for each department for each employee.

C. Use the 'window' function with 'avg' to compute the average salary per department and include it as a new column in the original DataFrame.

D. Create a separate DataFrame with average salaries per department, then join it back to the original DataFrame.

E. Create a temporary table with average salaries per department, then join it back to the original DataFrame.

Correct Answer: C
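
To illustrate the window-function approach in option C, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption made so the example runs anywhere; it needs SQLite 3.25 or later). The table contents are invented. In Snowpark Python the same idea would look roughly like `avg("salary").over(Window.partition_by("department"))`.

```python
import sqlite3

# SQLite stands in for Snowflake here; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "ENG", 100.0), (2, "ENG", 200.0), (3, "OPS", 90.0)],
)

# AVG(...) OVER (PARTITION BY ...) keeps every employee row and attaches the
# department average to it, so no separate aggregate-then-join step is needed.
rows = conn.execute(
    """
    SELECT employee_id, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
    FROM employee
    ORDER BY employee_id
    """
).fetchall()
conn.close()
```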


QUESTION NO: 4
You are designing a Snowflake data pipeline that continuously ingests clickstream data. You need to monitor the pipeline for latency and throughput, and trigger notifications if these metrics fall outside acceptable ranges. Which of the following combinations of Snowflake features and techniques would be MOST effective for achieving this goal?

A. Use Snowflake's 'QUERY_HISTORY' view to track query execution times and implement a scheduled task that queries this view, calculates latency and throughput, and sends email notifications using Snowflake's built-in email integration if thresholds are exceeded.

B. Create a custom dashboard using a BI tool that connects to Snowflake via JDBC/ODBC and visualizes data ingestion and processing metrics. Manually monitor the dashboard for anomalies.

C. Use Snowflake's Event Tables and Event Notifications to capture events related to data ingestion and processing. Configure alerts based on event patterns that indicate latency or throughput issues.

D. Rely on Snowflake's default resource monitors to track warehouse usage. If warehouse usage exceeds a certain threshold, assume there are performance issues and send a notification.

E. Implement a combination of Snowflake Streams, Tasks, and external functions. Streams capture changes, Tasks process the changes, and external functions send notifications to a monitoring service when latency or throughput issues are detected.

Correct Answer: C,E
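
Whichever mechanism delivers the notification, the check itself reduces to comparing two metrics against thresholds. The sketch below shows that logic in plain Python under assumed definitions: latency is the gap between event time and load time, and throughput is rows per second over a window. The `Batch` type, `check_pipeline`, and all threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    event_ts: float   # when the clicks occurred (epoch seconds)
    loaded_ts: float  # when the rows became queryable
    rows: int


def check_pipeline(batches: List[Batch], window_seconds: float,
                   max_latency: float, min_rows_per_sec: float) -> List[str]:
    """Return the names of the metrics that breached their thresholds."""
    worst_latency = max((b.loaded_ts - b.event_ts for b in batches), default=0.0)
    throughput = sum(b.rows for b in batches) / window_seconds
    alerts = []
    if worst_latency > max_latency:
        alerts.append("latency")
    if throughput < min_rows_per_sec:
        alerts.append("throughput")
    return alerts


# Two batches observed in a 300-second window (made-up numbers).
observed = [Batch(0.0, 30.0, 600), Batch(60.0, 200.0, 300)]
alerts = check_pipeline(observed, window_seconds=300.0,
                        max_latency=120.0, min_rows_per_sec=5.0)
```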


QUESTION NO: 5
A company is using Snowflake's web app interface to manage its data. A data engineer needs to create a new table, load data into it from a CSV file stored in an internal stage, and then grant SELECT privileges on the table to a specific role using the web app. Which sequence of actions within the Snowflake web app represents the most efficient and secure way to accomplish this task?

A. 1. Use the Database -> Tables interface to create the new table using the table editor. 2. Upload the CSV file directly to the table using the 'Load Data' option. 3. Use the SQL worksheet to execute a GRANT SELECT ON TABLE statement.

B. 1. Use the Database -> Tables interface to create the new table using the table editor. 2. Use the Data -> Load Data wizard to load the CSV file. 3. Use the Database -> Tables interface, select the table, and use the 'Privileges' tab to grant the SELECT privilege to the role.

C. 1. Use the Database -> Tables interface to create the new table using the table editor. 2. Use the Data -> Load Data wizard to load the CSV file. 3. Use the SQL worksheet to execute a GRANT SELECT ON TABLE statement.

D. 1. Use the SQL worksheet to execute a CREATE TABLE statement. 2. Use the Data -> Load Data wizard to load the CSV file. 3. Use the SQL worksheet to execute a GRANT SELECT ON TABLE statement.

E. 1. Use the SQL worksheet to execute a CREATE TABLE statement. 2. Use the Database -> Tables interface, select the table, and use the 'Load Data' option to load the CSV file. 3. Use the Database -> Tables interface, select the table, and use the 'Privileges' tab to grant the SELECT privilege to the role.

Correct Answer: C


QUESTION NO: 6
You have a base table 'ORDERS' with columns 'ORDER_ID', 'CUSTOMER_ID', 'ORDER_DATE', and 'ORDER_AMOUNT'. You need to create a view that aggregates the total order amount per customer per month. However, for data governance purposes, you need to ensure that the view only shows data for the last 3 months. What is the MOST efficient and secure way to create this view in Snowflake?

A. Option A

B. Option C

C. Option E

D. Option D

E. Option B

Correct Answer: B


QUESTION NO: 7
You are using Snowpipe to ingest data from Azure Blob Storage into a Snowflake table. You have successfully set up the pipe and configured the event notifications. However, you notice that duplicate records are appearing in your target table. After reviewing the logs, you determine that the same file is being processed multiple times by Snowpipe. Which of the following strategies can you implement to prevent duplicate data ingestion, assuming you cannot modify the source data in Azure Blob Storage to include a unique ID or timestamp?

A. Modify the Azure Event Grid subscription configuration to filter events based on file size or creation time to avoid resending events for already processed files.

B. Implement idempotent logic within a Snowflake stored procedure that is triggered by a task after the data is loaded by Snowpipe. The stored procedure should identify and remove duplicate rows based on all other columns in the table.

C. Configure the Snowpipe definition with the 'PURGE = TRUE' parameter. This will ensure that each file is only processed once.

D. Use a data masking policy with the 'MASK' function to obfuscate duplicate records based on their similarity, making them effectively invisible to downstream queries.

E. Create a Snowflake stream on the target table and use it to incrementally load data into a separate, deduplicated table using a merge statement with conditional logic to insert or update records based on a combination of columns.

Correct Answer: E
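
The merge step in option E is idempotent: replaying the same rows leaves the deduplicated table unchanged. The following sketch mimics that insert-or-update behavior with a Python dictionary rather than an actual Snowflake MERGE; `merge_batch`, the key columns, and the sample rows are assumptions for illustration only.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def merge_batch(target: Dict[Tuple, Row], batch: Iterable[Row],
                key_cols: Tuple[str, ...]) -> int:
    """Upsert rows into target keyed by key_cols; return how many keys were new."""
    inserted = 0
    for row in batch:
        key = tuple(row[c] for c in key_cols)
        if key not in target:
            inserted += 1  # corresponds to WHEN NOT MATCHED THEN INSERT
        target[key] = row  # an existing key is overwritten (WHEN MATCHED THEN UPDATE)
    return inserted


deduped: Dict[Tuple, Row] = {}
file_rows = [
    {"order_id": 1, "customer": "a", "amount": 10},
    {"order_id": 2, "customer": "b", "amount": 20},
]
first_load = merge_batch(deduped, file_rows, ("order_id", "customer"))
# The same file delivered a second time adds no new rows.
second_load = merge_batch(deduped, file_rows, ("order_id", "customer"))
```

The choice of key columns matters: if the combination does not identify a record uniquely, distinct rows will be collapsed into one.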


QUESTION NO: 8
A data engineer is tasked with creating an external table that points to a directory in AWS S3 containing CSV files. The files have a header row and are comma-delimited. The engineer executes the following DDL statement:

A. The statement will fail because external tables cannot be created directly from CSV files without defining a stage.

B. The statement will fail because the 'LOCATION' is missing the protocol (e.g., 's3://').

C. The statement will fail because the 'SKIP_HEADER' property is invalid and the 'skip_header' parameter should be used in the 'CREATE FILE FORMAT' statement.

D. The statement will fail because the 'FIELD_DELIMITER' property should be 'DELIMITER'.

E. The statement will succeed if a file format named 'csv_format' with the specified properties already exists.

Correct Answer: C


QUESTION NO: 9
You have implemented a Snowpipe using auto-ingest to load data from an AWS S3 bucket. The pipe is configured to load data into a table with a 'DATE' column ('TRANSACTION_DATE'). The data files in S3 contain a date field in the format 'YYYYMMDD'. Occasionally, you observe data loading failures in Snowpipe with an error message indicating an issue converting the string to a date. The 'FILE_FORMAT' definition includes 'DATE_FORMAT = 'YYYYMMDD''. Furthermore, you also notice that after a while, some files are not being ingested even though they are present in the S3 bucket. How can you effectively diagnose and resolve these issues?

A. The 'DATE_FORMAT' parameter is case-sensitive. Ensure it matches the case of the incoming data. Also, check the 'VALIDATION_MODE' and ERROR parameters to ensure error handling is appropriately configured for files with date format errors. For the files that are not ingested, use 'SYSTEM$PIPE' to find the cause of the issue.

B. Snowflake's auto-ingest feature has limitations and may not be suitable for inconsistent data formats. Consider using the Snowpipe REST API to implement custom error handling and data validation logic. Monitor the Snowflake event queue to ensure events are being received.

C. Verify that the 'DATE_FORMAT' is correct and that all files consistently adhere to this format. Check for corrupted files in S3 that may be preventing Snowpipe from processing subsequent files. Additionally, review the Snowpipe error notifications in Snowflake to identify the root cause of ingestion failures. Use 'SYSTEM$PIPE' to troubleshoot the files not ingested.

D. The error could be due to invalid characters in the source data files. Implement data cleansing steps to remove invalid characters from the date fields before uploading to S3. For files not being ingested, check S3 event notifications for missing or failed events.

E. The issue may arise if the time zone of the Snowflake account does not match the time zone of your data in AWS S3. Try setting the 'TIMEZONE' parameter in the FILE FORMAT definition. For files that are not being ingested, manually refresh the Snowpipe with 'ALTER PIPE ... REFRESH'.

Correct Answer: A,C
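
Verifying that every file adheres to the expected date format (option C) can be done before upload. Here is a small sketch, assuming Python's `%Y%m%d` as a stand-in for the 'YYYYMMDD' format; `find_bad_dates` and the sample values are hypothetical. The explicit length and digit check is there because `strptime` alone accepts some shorter strings.

```python
from datetime import datetime
from typing import List, Tuple


def find_bad_dates(values: List[str], fmt: str = "%Y%m%d") -> List[Tuple[int, str]]:
    """Return (row index, raw value) for every value that is not a valid date."""
    bad = []
    for i, raw in enumerate(values):
        try:
            # strptime tolerates single-digit months and days, so require
            # exactly eight digits before parsing.
            if len(raw) != 8 or not raw.isdigit():
                raise ValueError("expected exactly 8 digits")
            datetime.strptime(raw, fmt)
        except ValueError:
            bad.append((i, raw))
    return bad


# One valid value, one with separators, one impossible date, one empty string.
sample = ["20240131", "2024-01-31", "20240230", ""]
bad_rows = find_bad_dates(sample)
```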


QUESTION NO: 10
You are responsible for monitoring data quality in a Snowflake data warehouse. Your team has identified a critical table, 'CUSTOMER_DATA', where the 'EMAIL' column is frequently missing or contains invalid entries. You need to implement a solution that automatically detects and flags these anomalies. Which of the following approaches, or combination of approaches, would be MOST effective in proactively monitoring the data quality of the 'EMAIL' column?

A. Use Snowflake's Data Quality features (if available) to define data quality rules for the 'EMAIL' column, specifying acceptable formats and thresholds for missing values. Configure alerts to be triggered when these rules are violated.

B. Implement a Streamlit application connected to Snowflake that visualizes the percentage of NULL and invalid 'EMAIL' values over time, allowing the team to manually monitor trends.

C. Create a Snowflake Task that executes a SQL query to count NULL 'EMAIL' values and invalid 'EMAIL' formats (using regular expressions). The task logs the results to a separate monitoring table and alerts the team if the count exceeds a predefined threshold.

D. Schedule a daily full refresh of the 'CUSTOMER_DATA' table from the source system, overwriting any potentially corrupted data.

E. Utilize an external data quality tool (e.g., Great Expectations, Deequ) to define and run data quality checks on the 'CUSTOMER_DATA' table, integrating the results back into Snowflake for reporting and alerting.

Correct Answer: A,C,E
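
The check described in option C (count NULL and malformed values, then compare against a threshold) can be sketched outside Snowflake as follows. The regular expression is deliberately simple and is an assumption for illustration, not a complete email validator; `email_quality` and the threshold are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Simplified pattern: something@something.something with no spaces or extra '@'.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_quality(emails: List[Optional[str]], max_bad_ratio: float) -> Dict[str, object]:
    """Count missing and malformed emails and flag when the bad ratio is too high."""
    nulls = sum(1 for e in emails if e is None)
    invalid = sum(1 for e in emails if e is not None and not EMAIL_RE.match(e))
    total = len(emails)
    bad_ratio = (nulls + invalid) / total if total else 0.0
    return {
        "null_count": nulls,
        "invalid_count": invalid,
        "bad_ratio": bad_ratio,
        "alert": bad_ratio > max_bad_ratio,
    }


report = email_quality(
    ["a@example.com", None, "not-an-email", "b@example.org"],
    max_bad_ratio=0.25,
)
```

In a scheduled job, a report like this would be appended to a monitoring table on each run, so that trends stay visible even when no alert fires.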


QUESTION NO: 11
You are tasked with building a data pipeline to process image metadata stored in JSON format from a series of URLs. The JSON structure contains fields such as 'image_url', 'resolution', 'camera_model', and 'location' (latitude and longitude). Your goal is to create a Snowflake table that stores this metadata along with a thumbnail of each image. Given the constraints that you want to avoid downloading and storing the images directly in Snowflake, and that Snowflake's native functions for image processing are limited, which of the following approaches would be most efficient and scalable?

A. Create a Python-based external function that fetches the JSON metadata and image from their respective URLs. The external function uses libraries like PIL (Pillow) to generate a thumbnail of the image and returns the metadata along with the thumbnail's Base64-encoded string within a JSON object.

B. Create a Snowflake stored procedure that iterates through each URL, downloads the JSON metadata using 'SYSTEM$URL_GET', extracts the image URL from the metadata, downloads the image using 'SYSTEM$URL_GET', generates a thumbnail using SQL scalar functions, and stores the metadata and thumbnail in a Snowflake table.

C. Create a Snowflake view that selects from a table containing the metadata URLs, using 'SYSTEM$URL_GET' to fetch the metadata. For each image URL found in the metadata, use a JavaScript UDF to generate a thumbnail. Embed the thumbnail into a VARCHAR column as a Base64-encoded string.

D. Create a Snowflake external table that points to an external stage which holds the JSON metadata files. Develop a Spark process to fetch the image URLs, create thumbnails, and store them as Base64-encoded strings in an external stage. Create a view using the external table and the generated thumbnail data.

E. Store just the 'image_url' in Snowflake. Develop a separate application using any programming language to pre-generate the thumbnails and host those at publicly accessible URLs. Within Snowflake, create a view to generate the links for the image and thumbnail using 'CONCAT'.

Correct Answer: A,E


QUESTION NO: 12
A data engineer is tasked with creating a Listing to share a large dataset stored in Snowflake. The dataset contains sensitive Personally Identifiable Information (PII) that must be masked for certain consumer roles. The data engineer wants to use Snowflake's dynamic data masking policies within the Listing to achieve this. Which of the following approaches is the MOST secure and maintainable way to implement this requirement, assuming that the consumer roles are pre-defined and known?

A. Apply dynamic data masking policies directly to the base tables containing the PII and share these tables in the Listing. Policies should use the function to determine when to mask the data.

B. Create multiple versions of the shared tables, each with different masking applied. The data engineer must manually manage which version each consumer can access.

C. Create a view that applies conditional masking using 'CASE' statements based on the function and share the view in the Listing.

D. Implement an external function that masks the data based on the consumer's role and share this function in the Listing. Use this external function in a view shared through the Listing.

Correct Answer: A


QUESTION NO: 13
You are building a data pipeline that extracts data from a REST API, transforms it using Pandas DataFrames, and loads it into Snowflake. You need to implement error handling to gracefully handle network issues and API rate limits. Which of the following code snippets demonstrates the most robust approach to handle potential errors during data loading into Snowflake using the Python connector?

A. Option A

B. Option C

C. Option E

D. Option D

E. Option B

Correct Answer: B
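
The answer options for this question are not reproduced here, so the following is not the exam's code. It is a generic sketch of one common pattern for the scenario described: retrying a load step on transient errors with exponential backoff and re-raising once attempts run out. `with_retries`, `flaky_load`, and the delay values are hypothetical; in a real pipeline the wrapped action might be a connector call such as `write_pandas`.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T],
                 retryable: Tuple[Type[BaseException], ...],
                 max_attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action, retrying on the given exception types with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_load() -> str:
    """Stand-in for a load step that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "loaded"


delays: List[float] = []  # record the backoff delays instead of really sleeping
result = with_retries(flaky_load, (ConnectionError,), max_attempts=4,
                      sleep=delays.append)
```

Only errors listed in `retryable` are retried; anything else (for example, a malformed SQL statement) propagates immediately, which avoids looping on failures that a retry cannot fix.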
