Google Cloud Data Engineer
Cloud Composer
Uses DAGs (directed acyclic graphs) to define pipelines, for example running one pipeline step after another.
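A minimal sketch of the idea behind a DAG, using only the Python standard library rather than the Airflow API that Cloud Composer runs. The task names are hypothetical; the point is that each task declares what it depends on, and the run order is derived from those dependencies.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
# Task names are hypothetical, chosen only for illustration.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the graph must be acyclic, a dependency loop (for example, "extract" depending on "report") makes `TopologicalSorter` raise `graphlib.CycleError` instead of returning an order.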
Share resources within a company
Imagine a company with many analysts, each with their own cloud project, who all need access to common resources such as logging data stored in BigQuery and GCS. How should this be managed?
Also, if the admin wants to see the queries run by the analysts on BigQuery, but doesn’t want the analysts to see each other’s query history, how should this be managed?
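For the first question, one possible approach (an assumption, not an answer given in these notes) is to keep the shared data in a single dataset and grant a group of analysts read access to it. The sketch below only edits a dataset resource dictionary in the shape used by the BigQuery REST API (`access` entries with `role` and `groupByEmail`). It does not call any Google Cloud service, and the project, dataset, and e-mail names are hypothetical.

```python
import json


def add_reader_group(dataset_resource, group_email):
    """Return a copy of the dataset resource with READER access for a group."""
    updated = dict(dataset_resource)
    access = list(dataset_resource.get("access", []))
    entry = {"role": "READER", "groupByEmail": group_email}
    # Avoid adding the same entry twice.
    if entry not in access:
        access.append(entry)
    updated["access"] = access
    return updated


# Hypothetical shared dataset holding the logging data.
shared_logs = {
    "datasetReference": {"projectId": "shared-logging", "datasetId": "logs"},
    "access": [{"role": "OWNER", "userByEmail": "admin@example.com"}],
}

patched = add_reader_group(shared_logs, "analysts@example.com")
print(json.dumps(patched["access"], indent=2))
```

Granting access to a group rather than to individual users means a new analyst only has to be added to the group.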
Dataflow x Dataproc
When using Dataproc with the data stored in GCS, one of the instances is I/O heavy. What should we do? (Copy the data to local storage on the cluster.)
What if one of the instances is compute heavy? (Increase RAM/CPU.)
Use SSDs for the compute instances.
BigQuery ML
A data scientist has built a model using BigQuery ML. How should we store and provide the prediction data so that prediction results can be served in real time?
ML Engine / AutoML
Used to serve real-time predictions in a serverless way.
BigQuery View
Historically, you can’t share a view or a table at the table level, only at the dataset level. (Newer BigQuery releases add table-level access controls.)
Resource hierarchy
What should you do when your company is running out of resources (so many queries that they exceed the quota)? Set up a resource hierarchy with priority levels, and change the pricing plan (presumably to flat-rate).