GCP: Own questions

Open-source distributed file system that provides high throughput access to application data by partitioning data across many machines. (abbreviation) HDFS

Framework for job scheduling and cluster resource management (task coordination) YARN

MapReduce; Operation performed in parallel on a small portion of the dataset, output of that operation, operation that combines the results Map, key-value pair, reduce
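Below is a minimal single-process sketch of the three steps as a word count. Plain Python stands in for a cluster here. The function names are hypothetical and this is not Hadoop's API. The sketch only shows map emitting key-value pairs, a shuffle grouping them by key, and reduce combining each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(chunk: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every value that shares the same key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # On a real cluster each chunk would be mapped in parallel on different machines.
    pairs = (pair for chunk in chunks for pair in map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the cat", "the dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```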

Apache ecosystem; Data warehouse built on HDFS storage that enables SQL-like queries over MapReduce abstractions Hive

Apache ecosystem; High-level scripting language for ETL workloads Pig

Apache ecosystem; Framework for writing fast distributed programs for data processing and analysis, such as MapReduce-style jobs, using a fast in-memory approach. Spark

Apache ecosystem; Stream processing framework for bounded and unbounded data sets, and distributed message queue (event streaming platform) Flink, Kafka

Apache ecosystem; Programming model to define and execute data processing pipelines with ETL, batch and stream processing. Beam

IAM member types; single person, non-person (application), multiple people Google account, service account, Google group

IAM roles; 1. Owner, Editor, Viewer 2. Finer-grained roles managed by GCP 3. Finer-grained, user-defined combination of permissions primitive roles, predefined roles, custom roles

IAM best practice; Use XX roles when they exist over YY roles. predefined, primitive

GCP monitoring, logging and diagnostics solution; main functions (D, E, M, A, T, L) stackdriver, debugger, error reporting, monitoring, alerting, tracing, logging

Concept; Primary objective is data analysis for large volumes of data, complex queries and uses data warehouses OLAP

Concept; Primary objective is data processing, manage databases and modifying data using simple queries OLTP

Concept; Stores data by row Row format

Concept; Stores data by column column format
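This toy sketch shows the two layouts using three hypothetical in-memory records. Real columnar formats such as Parquet add encoding and compression on top, but the access pattern is the same. Summing one column reads a single list in the column layout. The row layout has to walk every full record to do the same thing.

```python
# Hypothetical sample records; each dict is one row.
rows = [
    {"id": 1, "country": "SE", "amount": 10.0},
    {"id": 2, "country": "NO", "amount": 25.5},
    {"id": 3, "country": "SE", "amount": 7.25},
]

# Row format: all values of one record are stored together (good for whole-record reads/writes).
row_store = [tuple(r.values()) for r in rows]

# Column format: all values of one column are stored together (good for scanning few columns).
column_store = {name: [r[name] for r in rows] for name in rows[0]}

# Summing one column touches only that column's values in the column store...
total_columnar = sum(column_store["amount"])
# ...while the row store walks every full record to pick out the same field.
total_row = sum(record[2] for record in row_store)

print(total_columnar, total_row)  # 42.75 42.75
```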

Concept; Gives you infrastructure pieces such as VMs, but you have to maintain them (abbreviation), GCP option IaaS, Compute engine

Concept; Gives you infrastructure pieced together so you can just deploy your code on the platform (abbreviation), GCP option PaaS, App engine

Compute choice, mainly used for; Websites, mobile apps, gaming backends, RESTful APIs, IoT apps app engine

Compute choice, mainly used for; containerised workloads, cloud-native distributed systems and hybrid applications kubernetes engine

Compute choice, mainly used for; Currently deployed and on-premise software that you want to run on the cloud, any workload requiring a specific OS or configuration compute engine

Preemptible VMs are around XX% cheaper and terminate after YY hours 80, 24

Fully managed block storage (SSD/HDD) suitable for VM/containers. persistent disk

Affordable object/blob storage suitable for e.g images, videos cloud storage

Storage class; High performance with no minimum storage duration standard

Storage class; Data accessed about once per month or less. 30-day minimum storage duration nearline

Storage class; Infrequently accessed data (about once per quarter). 90-day minimum storage duration coldline

Storage class; Lowest cost, for backup and disaster recovery. 365-day minimum storage duration archive
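This rough heuristic sketch maps an expected interval between reads to a storage class, using the minimum storage durations above. The function name is hypothetical. It deliberately ignores retrieval fees, data volume and bucket location, so treat it as a study aid rather than a sizing rule.

```python
# Minimum storage durations in days per class, taken from the cards above.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def pick_storage_class(days_between_reads: float) -> str:
    """Return the coldest class whose minimum storage duration the access interval meets.

    Heuristic only: ignores retrieval fees, data volume and bucket location.
    """
    chosen = "STANDARD"
    for storage_class, min_days in MIN_STORAGE_DAYS.items():  # ordered warm -> cold
        if days_between_reads >= min_days:
            chosen = storage_class
    return chosen


for days in (1, 45, 120, 400):
    print(days, pick_storage_class(days))
```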

GCP service; Fully managed relational database (MySQL, PostgreSQL, SQL Server); scales vertically rather than horizontally and fits GB-to-TB-scale data. Good for YY workloads. cloud SQL, OLTP

GCP service; Mission-critical relational database. Combines benefits of relational and non-relational databases. Supports YY. cloud spanner, horizontal scaling

GCP service; Wide-column NoSQL database for high throughput and low latency, e.g. IoT, user analytics and time-series, for non-structured key/value data. Has a row index known as a YY. Ideal for handling large amounts of data over a long period of time bigtable, row key

True/False; BigTable supports SQL queries? False

Bucket names must be globally unique

GCP service; Highly scalable NoSQL database for structured data for web and mobile applications cloud datastore

Storage choice; your data is unstructured cloud storage

Storage choice; your data is structured and you're doing transactional workload using NoSQL cloud datastore

Storage choice; your data is structured and you're doing transactional workload using SQL [One database, horizontal scalability] cloud SQL, cloud spanner

Storage choice; your data is structured and you're doing analytics workload [ms latency, s latency] cloud bigtable, bigquery

Dataproc provides XX using hadoop YY metric autoscaling, YARN

BQ can submit jobs by; [W, C, R] web UI (Cloud Console), command-line tool (bq), REST API

GCP service; Clean and transform data with UI cloud dataprep

IAM project role; Permissions for read-only actions that do not affect state, such as viewing (but not modifying) existing resources or data. viewer

IAM project role; Viewing the data and permissions for actions that modify state, such as changing existing resources. editor

IAM project role; View data, modify and change existing resources, manage roles and set up billing for the project owner

Default BigQuery encoding UTF-8

GCP service to use: Migrate hadoop job to google cloud; without rewrite, with rewrite cloud dataproc, maybe cloud dataflow

BigQuery streaming restrictions; max row size, max throughput in records/s per project 1 MB, 100000

In BigQuery, use XX[S] and YY[A] to consolidate the data instead of splitting it up into smaller tables structs, arrays

ML model; Forecast a number linear regression

ML model; Classify with binary or multiclass options logistic regression

ML model; Recommend something matrix factorization

ML model; Explore data clustering

IoT data streaming expects [some/no] delays no

[True/False] You can write your own connectors with Apache Beam? And you can write pipelines using Java, Python and Go? True

[True/False] Dataflow can autoscale workers? True

[True/False] You can't connect google sheets data together with BigQuery data in data studio? False

A data lake is usually in a cloud storage bucket

Cleaned data is typically stored in the data warehouse

BigQuery can be seen as a serverless data warehouse

The ETL is usually done between the data XX and the data YY lake, warehouse

[True/False] On federated (external) queries on BigQuery, you get caching False

GCP service; Scales to GB and TB. Ideal for back-end database. Record based storage cloud SQL

GCP service; Scales to PB. Easy to connect to external data sources for ingestion. Column based storage BigQuery

[True/False] Cloud SQL can be considered as RDBMS True

Fast in-memory analysis in BigQuery BI engine

GCP service; A fully managed and highly scalable data discovery and metadata management service. cloud data catalog

GCP service; Fully managed service designed to help you discover, classify, and protect your most sensitive data Cloud data loss prevention

GCP service; A fully managed scalable workflow orchestration service. Can automate pipelines cloud composer

Cloud composer automated pipelines are written in python

Data storage and ETL options on GCP; your data is relational [S, SP] Cloud SQL, Cloud Spanner

Data storage and ETL options on GCP; your data is NoSQL [F, B] Cloud firestore, cloud bigtable

[True/False] A cloud bucket is associated with a certain region True

Storage; XX names can be set to private but YY names can't and should never be sensitive object, bucket

Cloud storage; Google handles the encryption using envelope encryption, where the data encryption keys are themselves encrypted by key encryption keys. You control the top-level key encryption key (in Cloud KMS) CMEK

Cloud storage; Customer-supplied encryption keys CSEK

Bucket access control; allows you to use IAM alone to manage permissions. IAM applies permissions to all the objects contained inside the bucket or to groups of objects with common name prefixes. [is/not recommended] uniform, is

Bucket access control; enables you to use IAM and ACLs together to manage permissions. [is/not recommended] fine-grained, not

Are a legacy access control system for Cloud Storage designed for interoperability with Amazon S3. [is/not recommended] ACLs, not

Workload type; Fast, reveals a snapshot, simple queries, 80% writes and 20% reads transactional

Workload type; Reads the whole dataset, complex queries, 20% writes and 80% reads analytical

GCP service; NoSQL database built for global apps that lets you easily store, sync, and query data for your mobile and web apps - at global scale. Cloud firestore

GCP service; most cost effective to store relational data cloud SQL

GCP service; Large relational database that needs to be globally distributed cloud spanner

GCP service; Requires high throughput and ultra-low latency for large-scale NoSQL (non-relational) data cloud bigtable

Which GCP service is the most cost effective for relational data; BigQuery vs Bigtable BigQuery

GCP service; Good as a relational data lake since it can run third-party RDBMS engines: MySQL, PostgreSQL, MS SQL Server cloud SQL

You want XX data warehouse that can scale one

In BigQuery, permissions are set at the XX level dataset

[True/False] You can authenticate with IAM using G Suite (Google Workspace) and Gmail accounts? True

Concept; Defined by a SQL query and looks like a read-only table; gives finer-grained control so you can share specific tables or results instead of the whole dataset. view

[True/False] You can't run a BigQuery job that exports data from a view True

Precomputed views that periodically cache the results of a query for increased performance and efficiency materialized views

[True/False] Cached queries on BigQuery are charged False

BigQuery can XX schemas, but the result is not guaranteed to be correct autodetect

BigQuery service which provides connectors and pre-built load jobs. Good for EL jobs [abbreviation] DTS

BigQuery service; Can handle late data data backfill

[True/False] BigQuery supports user-defined functions in SQL and Javascript True

A BigQuery user-defined function (UDF) is stored as an XX and YY[can/can't] be shared object, can

Schema design; Stores data efficiently and saves space normalized

Schema design; Allows duplicate field values in a column. Is fast to query and can easily be parallelized denormalized

Schema design; is not efficient for GROUP BY filtering denormalized

For RDBMS, JOINS are [expensive/inexpensive] expensive

A struct has the type XX in BigQuery RECORD

An array has the mode XX in BigQuery REPEATED

Having super-XX schemas improves performance since BigQuery is YY-based wide, column

Function; Unpacks an array field into one row per element UNNEST

Function; Aggregates values into an array ARRAY_AGG
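Here is a plain-Python analogy of a nested and repeated field (an ARRAY of STRUCTs, i.e. a REPEATED RECORD) and of what UNNEST and ARRAY_AGG do to it. The table and field names are hypothetical. In BigQuery the flattening would be a query like `SELECT order_id, item.sku FROM orders, UNNEST(items) AS item`. As in that comma (cross) join, rows with an empty array produce no output here.

```python
# Hypothetical denormalized rows: one order with a repeated record of line items,
# the shape BigQuery would store as ARRAY<STRUCT<sku STRING, qty INT64>>.
orders = [
    {"order_id": "A1", "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
    {"order_id": "B2", "items": [{"sku": "pad", "qty": 5}]},
]


def unnest(rows, array_field):
    """Like FROM t, UNNEST(array_field): one output row per array element."""
    for row in rows:
        for element in row[array_field]:
            flat = {k: v for k, v in row.items() if k != array_field}
            flat.update(element)
            yield flat


def array_agg(rows, key, value):
    """Like ARRAY_AGG(value) ... GROUP BY key: collect values back into arrays."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[value])
    return grouped


flat_rows = list(unnest(orders, "items"))
print(array_agg(flat_rows, "order_id", "sku"))  # {'A1': ['pen', 'ink'], 'B2': ['pad']}
```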

BigQuery's ways of partitioning tables [T C, I T, I R] time column, ingestion time, integer range

[True/False] BigQuery does not support JSON parsing False

Use the SQL function XX to format values to the same format cast

GCP service; Fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines cloud data fusion

Which GCP service is considered easiest to use since it has a simple GUI; cloud dataproc, cloud data fusion, cloud dataflow cloud data fusion

GCP services, ETL to solve data quality issues; recommended, latency/throughput issues, reusing spark pipelines, need for visual pipeline building [no cloud prefix] BigQuery/Dataflow, bigtable, dataproc, data fusion

Labels can be used on [D, T] datasets, tables

Cloud dataproc can [automatically/manually] scale your cluster automatically

GCP Service; Cloud dataproc can write with petabit bisection bandwidth to [C S, B, C B] cloud storage, bigquery, cloud bigtable

For optimising dataproc; make sure that the cluster region and storage region are close

For optimizing dataproc; do not use more than about XX input files 10000

Storage option; Object store. Unstructured data cloud storage

Storage option; Large amount of sparse data. HBase-compliant. Low latency and high scalability cloud bigtable

Storage option; Data warehouse. Storage API BigQuery

Since autoscaling exists, start with a XX cluster and it will YY if needed small, upscale

Data fusion component; used for handling connections, transforms and data quality wrangler

Cloud data fusion is made for XX[streaming/batch] data batch

[True/False] Cloud data fusion wrangler can access data from other providers than GCP True

Color that indicates in Airflow that a task has not been run because an upstream task failed (upstream_failed) orange

[True/False] A PCollection can both represent streaming and batch data? True

Dataflow handles late arriving data using "smart" data watermarking

Cloud dataflow; XX handles parallel intermediate steps such as filtering, formatting and extracting data, and you need to provide a YY ParDo, DoFn

XXByKey is more efficient than GroupByKey since Dataflow knows how to parallelize it combine
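This single-process simulation shows why combining beats grouping. It assumes two hypothetical workers. With a combiner, each worker sends one partial sum per key instead of every record. In the Beam Python SDK the corresponding transforms are `beam.CombinePerKey` and `beam.GroupByKey`. The functions below are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical (key, value) pairs already split across two workers.
worker_inputs = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]


def group_by_key_then_sum(partitions):
    """GroupByKey style: every single pair is shuffled, then summed per key."""
    shuffled = defaultdict(list)
    records_sent = 0
    for part in partitions:
        for key, value in part:
            shuffled[key].append(value)
            records_sent += 1
    return {k: sum(v) for k, v in shuffled.items()}, records_sent


def combine_per_key(partitions):
    """Combine style: each worker pre-aggregates locally; only partial sums are shuffled."""
    merged = Counter()
    records_sent = 0
    for part in partitions:
        partial = Counter()
        for key, value in part:
            partial[key] += value  # local (combiner) step on the worker
        records_sent += len(partial)  # one record per key leaves the worker
        merged.update(partial)  # final merge after the shuffle
    return dict(merged), records_sent


print(group_by_key_then_sum(worker_inputs))  # ({'a': 4, 'b': 3}, 7)
print(combine_per_key(worker_inputs))  # ({'a': 4, 'b': 3}, 4)
```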

Additional input to a ParDo transform, e.g. to inject additional data at runtime side input

Dataflow SQL integrates with Apache Beam SQL and supports XX syntax ZetaSQL

Processing streaming data; Pub/sub -> XX -> YY or ZZ [No prefix] dataflow, bigquery, bigtable

Pub/Sub is durable since it stores messages in multiple XX for up to YY days by default locations, 7

Apache beam window type; Non-overlapping intervals, Dataflow name fixed-time window, tumbling window

Apache beam window type; Overlapping windows, e.g. for computing moving averages; a new window starts every period, GCP name sliding window, hopping window

Apache beam window type; Minimum gap duration between windows, e.g. a website visit with bursty data session window
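This sketch shows how each window type assigns integer event timestamps (say, seconds). It assumes a window offset of 0 and non-negative timestamps. Beam's own implementations are `beam.window.FixedWindows`, `SlidingWindows` and `Sessions`. The helpers below are hypothetical and only mimic their assignment rules.

```python
def fixed_window(ts: int, size: int) -> tuple[int, int]:
    """Fixed/tumbling: exactly one non-overlapping [start, end) window per timestamp."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> list[tuple[int, int]]:
    """Sliding/hopping: a window starts every `period`, so one timestamp can be in several."""
    windows = []
    start = ts - ts % period  # latest window start that is <= ts
    while start > ts - size:  # keep every window whose end is still after ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Session: events closer than `gap` merge; each session ends `gap` after its last event."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] < gap:
            sessions[-1][1] = ts  # extend the current session
        else:
            sessions.append([ts, ts])  # gap reached: open a new session
    return [(start, last + gap) for start, last in sessions]


print(fixed_window(7, 5))  # (5, 10)
print(sliding_windows(7, size=10, period=5))  # [(0, 10), (5, 15)]
print(session_windows([1, 2, 10, 11, 30], gap=5))  # [(1, 7), (10, 16), (30, 35)]
```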

Cloud dataflow trigger; Triggers based on the element's timestamp Event time

Cloud dataflow trigger; Triggers on the time an element is processed processing time

Cloud dataflow trigger; Condition of data contained in the element data-driven

Cloud dataflow trigger; Mix of different triggers composite

Acceptable time for data insight with BigQuery, Cloud Bigtable s, ms

For cloud Bigtable, you need the data to be pre-XX sorted

Optimizing cloud Bigtable; XX related data, YY data evenly, place ZZ values in the same row group, distribute, identical

Too XX data results in worse performance for Bigtable; it should be YY [data quantity] little, >300 GB
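This sketch shows a common Bigtable row key pattern that follows the cards above. Related fields are promoted into the key so they sit together. The key does not lead with a timestamp, so writes are distributed. The timestamp is reversed so the newest reading sorts first. The key layout, `#` separator and `MAX_TS` constant are assumptions for illustration.

```python
MAX_TS = 2**63 - 1  # assumed upper bound used to reverse millisecond timestamps


def row_key(device_id: str, metric: str, epoch_ms: int) -> bytes:
    """Build a Bigtable-style row key: related rows share a prefix, newest first.

    - device_id first: writes spread over many devices instead of one hot range.
    - metric next: related data is grouped so one prefix scan returns it together.
    - reversed, zero-padded timestamp last: the latest reading sorts first.
    """
    reversed_ts = MAX_TS - epoch_ms
    return f"{device_id}#{metric}#{reversed_ts:019d}".encode()


print(row_key("dev-42", "temp", 1_700_000_000_000))
```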

Which is the most performance heavy work for BigQuery? I/O[number of columns] or Computing[function uses] I/O

BigQuery function to get the value from a previous row LAG

BigQuery cached queries are stored for 24 hours

[True/False] WITH clauses inhibit BigQuery caching True

Self-join i.e join a table with itself is [good/bad] in BigQuery bad

Ordering i.e ORDER BY, has to be performed on XX worker a single

Approximate functions should be used if an error of around XX is tolerable 1%

Approximate function to find the top elements APPROX_TOP_COUNT

Ordering should always be the [first/last] thing you do last

To optimize BigQuery, put big tables on the [right/left] left

Methodology; Looking back at data to gain insight [abbreviation] BI

GCP service; Build, deploy, and scale ML models faster, with pre-trained and custom tooling within a unified AI platform vertex AI

GCP service; Train high-quality custom machine learning models with minimal effort and machine learning expertise AutoML

ML methodology; Breaks text down into tokens (words or sentences) and labels them syntactic analysis

ML methodology; Classifies text as negative, positive or neutral, together with a score of how strongly it's expressed. sentiment analysis

[True/False] In a notebook, you can query BigQuery into a pandas DataFrame using the %%bigquery df cell magic True

GCP service; An intelligent cloud data service to visually explore, clean, and prepare data for analysis and machine learning. Dataprep

GCP services for the "prepare" phase of a ML project [.p, .w, .c] (without prefix) dataprep, dataflow, dataproc

GCP services for the "preprocess" phase of a ML project [.w, .c, .y] (without prefix) dataflow, dataproc, BigQuery

GCP service; Lets you work with human labelers to generate highly accurate labels for a collection of data that you can use in machine learning models. data labeling service

GCP service; Enable using machine learning pipelines to orchestrate complicated workflows running on Kubernetes. kubeflow

GCP service; Lets you create and execute machine learning models in BigQuery using standard SQL queries. Can iterate on models BigQuery ML

GCP service; Repository for ML components AI hub

You can import models from XX to BigQuery ML tensorflow

Pre-trained models only yield a good result if the data applied is [common/uncommon] common

[True/False] AutoML can train from zip files True

For AutoML, labels [can/can't] contain _ and [can/can't] contain special characters can, can't

For AutoML, custom models are [permanent/temporary] temporary

For an AutoML model, one can predict using an XX command and YY file curl, JSON

It's recommended to use [a single/multiple] ML models to solve a complicated problem multiple

For Auto ML vision; Images need to be encoded in XX, and the maximum file size is YY base64, 30 MB

Auto ML vision can handle XX to YY labels 0, 20

Auto ML vision models work best if the most common label has at most XX times more items than the least common 100

For Auto ML NLP; model will be deleted after XX if not used and after YY if used 60 days, 6 months

Auto ML NLP is for [structured/unstructured] text data while auto ML tables is for [structured/unstructured] tabular data unstructured, structured

For auto ML tables; the data can be between AA and BB million rows, CC and DD columns and must be <EE 1000, 100, 2, 1000, 100 GB

For auto ML tables prediction; the maximum input size for a BQ table or multiple CSV files is XX, and for a single CSV file the maximum size is YY 100 GB, 10 GB

ML; Remove features with XX or more null values 50%

ML; Decrease regularization => [increase/decrease] in overfitting increase

Dataflow is for [known/unknown] data sizes unknown

Dataproc is for [known/unknown] data sizes known

HDFS [does/does not] scale well does not

Using persistent disk and an HDFS cluster means the data is XX when the cluster is deleted lost

GCP service; recommended for time-series data bigtable

Stackdriver is now called Google Cloud's operations suite

[True/False] Firestore supports flexible schemas True

VM instances with additional security control Shielded VMs

In cloud firestore, having multiple indexes [does/does not] lead to a bigger storage size does

GCP service; Is a good replacement for Java ELT pipelines cloud dataflow

GCP service; Is a good replacement for MongoDB cloud firestore

GCP service; Is a Jupyter + VM service cloud datalab

A XX can be used to remove many forms of sensitive data such as government identifiers data loss prevention API

Transactional databases have a [fixed/non-fixed] schema fixed

XX databases are structured data stores analytical

You can use XX to access Cloud Bigtable from HBase clients HBase API

Custom file formats should always be stored in XX cloud storage

A document provides for indexing on XX rather than single keys columns

XX models are designed to support drilling down and slicing and dicing OLAP

GCP service; is the only globally available OLTP database in GCP cloud spanner

Cloud Spanner has a row limit of [size] 4 GB

Cloud Spanner uses XX as export connector and can export to Apache YY and ZZ format dataflow, AVRO, CSV

Cloud Bigtable uses XX for import and export and can export to YY[C], Apache ZZ and AA[S] dataflow, cloud storage, Avro, SequenceFile

Cloud firestore indexes; Created by default for each property, indexes multiple values for an entity built-in, composite

Cloud firestore (in Datastore mode) uses a SQL-like language called GQL

For export from firestore to BQ, entities in the export have to have a consistent schema, and property values larger than XX are truncated to XX 64 KB

Cloud firestore [is/is not] suitable for applications requiring low-latency writes (< 10 ms) is not

XX SQL dialect is preferred for BigQuery standard

BigQuery can use CSV, AA[J], BB[A], CC[O], DD[P] JSON, Avro, ORC, Parquet

If a CSV file is not in XX, BigQuery will try to convert it but it might not be correct UTF-8

XX is the preferred format for loading data since the data blocks can be read in parallel. Avro

You can define an XX with streaming inserts to BigQuery to detect duplicates. However, this will reduce YY insertID, throughput

In BQ, XX tables rather than joining tables denormalize

Fully managed serverless platform for running containers (built on Knative) Cloud run

Managed Redis service used for caching Cloud Memorystore

XX is the successor to SSL, which is now legacy TLS

Cloud spanner, maximum CPU utilization to target; for regional, for multiregional 65%, 45%

A Cloud Spanner node can handle XX of data [volume] 2 TB

Cloud Memorystore; expiration time for keys, given in seconds TTL

Lifecycle policies can change an object's storage class or delete it, whereas a XX policy prevents objects from being deleted or overwritten before they reach a minimum age data retention

[True/False] Uploading data with sequential filenames, e.g. timestamps, can cause hotspots in cloud storage True
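Below is a minimal sketch of one common mitigation, assuming object names are generated by your own code. It prepends a short hash of the name so sequentially named uploads spread across the key range. The function name and prefix length are hypothetical choices.

```python
import hashlib


def spread_object_name(name: str, prefix_len: int = 6) -> str:
    """Prefix an object name with a short, deterministic hash of itself.

    Sequential names (e.g. timestamps) then land in different parts of the
    lexicographic key range instead of all hitting the same one.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{name}"


for second in range(3):
    print(spread_object_name(f"logs/2024-01-01T00:00:{second:02d}.json"))
```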

[CPU/Storage] When scaling nodes in cloud spanner, XX utilization is more important than YY utilization. CPU, Storage

BigQuery streaming inserts only support [file format] JSON

Is a metric used to measure how well a service-level objective is being met [Abbreviation] SLI

Is a U.S financial reporting regulation governing publicly traded companies [Abbreviation] SOX

Is a U.S healthcare regulation with data access and privacy rules [Abbreviation] HIPAA

Defines responsibilities for delivering a service and the consequences when they're not met. SLA

Is the proportion of actual positive cases that a model correctly identifies. Recall
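This small sketch computes recall and precision from binary labels. The function name and sample labels are hypothetical. Recall divides true positives by all actual positives. Precision divides them by all predicted positives.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Binary labels (1 = positive). Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 4 actual positives, 2 of them found, plus 1 false alarm.
print(precision_recall([1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0]))  # (0.6666666666666666, 0.5)
```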

Is a GCP ML tool to make chatbots Dialogflow

Is a fully managed service for securely connecting and managing IoT devices, from a few to millions. Can ingest data to other GCP services. IoT Core

N4 instance has [higher/lower] IOPS than an N1 higher

[True/False] Cloud dataflow does not support Python False

Repeated messages in Cloud Pub/Sub can be a sign of no message acknowledgment

A data pipeline that captures each change in a source system and stores it in a data store. Change data capture

Is a serverless managed compute service for running code in response to events that occur in the cloud Cloud functions

Is a distributed architecture that is driven by business goals. Microservices are a variation of it [Abbreviation] SOA

[True/False] Cloud functions can run python scripts True

A XX can distribute work across regions if that's not supported by the service (or specified in the question) global load balancer

Nested and repeated fields can be used to reduce the amount of XX in BigQuery JOINS

Consists of groups of identically configured VMs and should only be used when migrating legacy clusters from on-prem. [Abbreviation] MIGs

Is kubernetes' way of representing storage provisioned for use by a pod PersistentVolumes

The XX are used to designate kubernetes pods with unique identifiers StatefulSets

An XX is an object that controls external access to services running in kubernetes cluster ingress

GCP serverless services do not require conventional infrastructure provisioning but can be configured using .XX files in app engine yaml

Cloud functions are for XX processing. Not continually monitoring metrics event-driven

Is the command line interface for cloud Bigtable cbt

To start MIGs, the minimum and maximum number of XX together with an instance YY are required. instances, template

Is Kubernetes' command line interface kubectl

For Kubernetes engine, the CPU utilization for the whole XX (not cluster) is used to scale the deployment deployment

For App Engine, file that; configures the runtime e.g python version app.yaml

For App Engine, file that; is used to configure task queues queue.yaml

For App Engine, file that; is used to override routing rules dispatch.yaml

For App Engine, file that; is used to schedule tasks cron.yaml

In stackdriver (monitoring), the retention time in days for; Admin activity audit logs, System event audit logs, Access transparency logs, Data access audit logs 400, 400, 400, 30

Stackdriver service; is used to collect information about time to execute functions in a call stack Stackdriver Trace

Is Google's own transport protocol with built-in encryption for data in transit QUIC

Is a U.S. act restricting online collection of information from children under the age of 13 [Abbreviation] COPPA

Is a U.S. program to promote a standard approach to assessment, authorisation and monitoring of cloud services. [Abbreviation] FedRAMP

To run BigQuery queries, you need the role; roles/bigquery.XX jobUser

Entity & Kind => [GCP service] datastore, firestore

[True/False] Jobs with HDFS have to be rewritten to work on cloud Dataflow True

Dataflow; 1:1 relationship between input and output in python dataflow map

Dataflow; Non 1:1 relationship between input and output in python dataflow flatmap

Is BigQuery's command-line tool bq

Use XX to transfer data from on-prem to cloud [CLI tool] gsutil

Use storage XX service when transferring data from another cloud transfer

AI Platform Notebooks are now part of Vertex AI XX workbench

Pub/Sub + dataflow provides [in/out of] order processing in

The XX file system is where the actual data is stored in cloud Bigtable colossus

Keep names [long/short] in Bigtable => reduces metadata short

In BigQuery XX is ordering of data in stored format, and is only supported on partitioned tables clustering

In BigQuery, XX queries are queued to run when resources are available. They do not count toward the concurrent query limit batch

The only way to achieve strong consistency in cloud Bigtable is to route all reads and writes to one replica while the other replica is used only for XX. failover

In cloud spanner, XX on key can decrease hotspots hashing

[True/False] A UUID generator usually creates the UUID based on sequential data, e.g. time, which can create hotspots in cloud spanner True

Universally unique identifiers = UUID
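This Python sketch shows two common ways to avoid monotonically increasing keys. One is a random version 4 UUID. The other is bit-reversing a sequential integer so consecutive values land far apart in the key space. The helper names are hypothetical. Reversing 63 bits, rather than 64, is a choice made here so the result still fits a signed INT64 column.

```python
import uuid


def new_row_id() -> str:
    """Random (version 4) UUID: no time component, so new keys scatter across the key space."""
    return str(uuid.uuid4())


def bit_reverse_63(n: int) -> int:
    """Reverse the low 63 bits of a sequential integer so consecutive values end up far apart."""
    if not 0 <= n < 1 << 63:
        raise ValueError("n must fit in 63 bits")
    return int(f"{n:063b}"[::-1], 2)


print(new_row_id())
print([bit_reverse_63(i) for i in (1, 2, 3)])
# [4611686018427387904, 2305843009213693952, 6917529027641081856]
```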

In Bigtable; you want [many/a few] tall and [wide/narrow] tables. a few, narrow

No partitioning column specified => XX partitioning ingestion time

[True/False] Clustering columns do not need to be integers or timestamps; they can be DATE, BOOL, GEOGRAPHY, INT64, NUMERIC, STRING, TIMESTAMP True

Parquet [is/is not] supported in drive, but [is/is not] in cloud storage is not, is

Cloud functions [is/is not] a good compute service for event processing is

GCP service; provides storage with a filesystem accessed from compute engine and kubernetes engine Cloud Filestore

Cloud dataflow can transmit summaries every [time] minute

[True/False] A subscription can receive information from more than one topic. False

Pub/Sub pull requires XX for the endpoint to pull via the API authorized credentials

Pub/sub push requires endpoint to be reachable via XX and have YY certificate installed DNS, SSL

[True/False] Cloud Dataprep can be used to gain BI insight and see missing and misconfigured data True

Are the only formats supported for export in cloud Dataprep [C, J] CSV, JSON

In Datastudio, XX connectors are designed to query data from up to 5 sources blended

[True/False] Better to conda/pip install in jupyter notebook than cloud shell True

ML; Evaluates a model by splitting the data into K-segments k-fold validation
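This minimal sketch shows how k-fold splitting assigns indices. It assumes contiguous folds without shuffling, while real workflows usually shuffle first. Each fold serves as the validation set exactly once while the others form the training set. The function name is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n_samples-1 into k folds; return (train, validation) per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


print(k_fold_indices(5, 2))  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
```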

Models with high bias tend to oversimplify, i.e. underfit

Models with high variance tend to [underfit/overfit] overfit

You can use cloud dataproc + spark XX for machine learning MLlib

Feature engineering can also reduce the XX to train besides improving accuracy time

Spark MLlib includes XX for frequent pattern mining. BQ ML and AutoML do not! association rules

In CloudSQL, an external replica is more for XX purposes and doesn't add throughput backup

[True/False] Datasets in BigQuery are immutable so the location can't be updated True

You need to use XX as an intermediary to send BigQuery data to different regions cloud storage

What is most important for upscaling/downscaling; CPU or storage utilisation? CPU

Cloud IoT core is mainly for XX while Pub/Sub + Dataflow and Vertex AI handle the rest device management

Edge computing consists of edge device, XX device and cloud platform gateway

From an IoT device, data can also be sent by IoT core [M, S] MQTT, stackdriver

For high-precision arithmetic, use a GPU

Distributed training; enables synchronous distributed training on multiple GPUs on one machine. Each variable is mirrored across all GPUs MirroredStrategy

Distributed training; enables synchronous training in which variables are not mirrored but placed on the CPU, while computation is replicated across the GPUs. CentralStorageStrategy

Distributed training; enables synchronous distributed training on multiple GPUs on multiple machines. Each variable is mirrored across all GPUs MultiWorkerMirroredStrategy

A group of TPUs working together is a TPU pod

App engine is used for XX and should not be used for training machine learning models web applications

Anomaly detection is classified as [supervised/unsupervised] learning unsupervised

ML; is an algorithm to train binary classifiers based on artificial neurones perceptron

Principal component analysis is for XX dimension reduction

LXX regularisation should be chosen over LYY when you want less relevant features to have weights of exactly 0. 1, 2

ML; If you have an unbalanced dataset, use undersampling

ML; Area under the (ROC) curve = AUC
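AUC for the ROC curve equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. Ties count as half. Below is a brute-force sketch of that definition, with a hypothetical function name:

```python
def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5.

    O(P * N) pairwise comparison, fine for small illustrative inputs.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```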

ML bias; human bias in favour of decisions made by machines over those made by humans automation bias

ML bias; when the dataset does not accurately reflect the state of the world. Reporting bias

ML bias; generalizes characteristics of an individual to the whole group. group attribution bias

GCP ML API; provides real-time analysis of time-series data and can provide anomaly detection cloud inference API

The cloud vision API supports a maximum XX images per batch 2000

For dialogflow; XX categorize a speaker's intention for a single statement intents

For dialogflow; XX are nouns extracted from dialogs entities

For dialogflow; XX are used to connect a service to an integration fulfilments

For dialogflow; XX are applications that process end-user interactions such as deciding what to recommend. integrations

Google recommends a minimum sampling rate of XX Hz for speech-to-text 16000

[True/False] The gRPC API is only available with the Advanced version of the translation API True

[True/False] For GCP translation API, there's a need to pass parameter into API when there's a special function call for translation e.g when importing the library to Python False

GCP service equivalent; HBase Cloud Bigtable

GCP service equivalent; redis Cloud Memorystore

GCP service equivalent; Apache Beam, Apache Pig Cloud Dataflow

GCP service equivalent; Apache airflow Cloud Composer

GCP service equivalent; MongoDB Cloud Firestore

GCP service equivalent; Apache Flink Cloud Dataflow

GCP service equivalent; Cassandra Cloud Bigtable

GCP service equivalent; Apache Kafka Pub/Sub

In stackdriver monitoring, only the XX log has a 30-day retention period compared to 400 days for the other logs Data access audit
