Google Cloud Data Engineer

Yiling Liu
Nov 4, 2019

Cloud Composer

Cloud Composer (managed Apache Airflow) uses DAGs to build pipelines, for example running one pipeline after another.
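To make the "one pipeline after another" idea concrete, here is a minimal sketch of dependency ordering in plain Python. It does not use the Airflow API; it only illustrates how a DAG fixes the run order. The pipeline names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipelines: each key runs only after every pipeline in its set.
dependencies = {
    "load_to_bigquery": {"extract_logs"},
    "build_report": {"load_to_bigquery"},
}


def run_order(deps):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In an actual Composer environment the same dependencies would be declared between tasks inside an Airflow DAG file, and the scheduler would work out the order.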

Share resources within a company

Imagine a company with lots of analysts, each with their own cloud project, who all need access to common resources such as logging data stored in BigQuery and GCS. How should this be managed?

Also, if the admin wants to see the queries the analysts run on BigQuery, but doesn’t want the analysts to see each other’s query history, how should this be managed?

Dataflow x Dataproc

When using Dataproc with data stored in GCS, what should we do if one of the instances is I/O heavy? (Store the data locally.)

What if one of the instances is compute heavy? (Increase RAM/CPU.)

Use SSDs for the compute instances.

BigQuery ML

A data scientist has built a model using BigQuery ML. How should we store and provide the prediction data so that prediction results can be served in real time?

ML Engine / AutoML

Used to serve real-time predictions in a serverless way.

BigQuery View

As of this writing, you can’t share a view or a table at the table level, only at the dataset level.
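Sharing at the dataset level comes down to adding an entry to the dataset's access list. The sketch below only models that list as plain Python dictionaries, using the `role` and `groupByEmail` / `userByEmail` fields of the BigQuery dataset resource. The email addresses and the `grant_dataset_reader` helper are hypothetical, and actually applying the change (through the console, API, or a client library) is not shown.

```python
import copy


def grant_dataset_reader(access_entries, group_email):
    """Return a copy of the access list with a READER entry for the group."""
    new_entry = {"role": "READER", "groupByEmail": group_email}
    updated = copy.deepcopy(access_entries)  # leave the caller's list untouched
    if new_entry not in updated:  # avoid adding a duplicate grant
        updated.append(new_entry)
    return updated


# Hypothetical starting point: only the admin has access to the dataset.
current_access = [{"role": "OWNER", "userByEmail": "admin@example.com"}]
updated_access = grant_dataset_reader(current_access, "analysts@example.com")
print(updated_access)
```

Granting the role to a group rather than to individual analysts keeps the list short as people join or leave.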

Resource hierarchy

What should you do when your company is running out of resources (for example, so many queries are run that they exceed the quota)? Set up a resource hierarchy with priority levels, and change your pricing plan to flat rate.

Written by Yiling Liu

Creative Technologist, ex-googler
