GCP: Own questions

The exercise was created 2021-10-19 by Pontusnord. Question count: 340.





  • Open-source distributed file system that provides high throughput access to application data by partitioning data across many machines. (abbreviation) HDFS
  • Framework for job scheduling and cluster resource management (task coordination) YARN
  • MapReduce; Operation to be performed in parallel on small portion of dataset, output of operation, operation to combine the results Map, key-value pair, reduce
  • Apache ecosystem; Data warehouse on HDFS storage that enables SQL-like queries and MapReduce abstractions Hive
  • Apache ecosystem; High-level scripting language for ETL workloads Pig
  • Apache ecosystem; Framework for writing fast distributed programs for data processing and analysis such as MapReduce jobs with fast in-memory approach. Spark
  • Apache ecosystem; Stream processing framework for bounded and unbounded data sets, often paired with a message queue Flink, Kafka
  • Apache ecosystem; Programming model to define and execute data processing pipelines with ETL, batch and stream processing. Beam
  • IAM member types; single person, non-person (application), multiple people google account, service account, google group
  • IAM roles; 1. Owner, Editor, Viewer 2. Finer-grained control managed by GCP 3. Finer-grained control through a user-defined combination of permissions primitive roles, predefined roles, custom roles
  • IAM best practice; Use XX roles when they exist over YY roles. predefined, primitive
  • 1. GCP monitoring, logging and diagnostics solution, Main functions; (D, E, M, A, T, L) stackdriver, debugger, error reporting, monitoring, alerting, tracing, logging
  • Concept; Primary objective is data analysis for large volumes of data, complex queries and uses data warehouses OLAP
  • Concept; Primary objective is data processing, manage databases and modifying data using simple queries OLTP
  • Concept; Stores data by row Row format
  • Concept; Stores data by column column format
  • Concept; Gives you infrastructure pieces such as VMs but you have to maintain them (abbreviation), GC option IaaS, Compute engine
  • Concept; Gives you infrastructure pieced together so you can just deploy your code on the platform (abbreviation), GC option PaaS, App engine
  • Compute choice, mainly used for; Websites, mobile apps, gaming backends, RESTful APIs, IoT apps app engine
  • Compute choice, mainly used for; containerised workloads, cloud-native distributed systems and hybrid applications kubernetes engine
  • Compute choice, mainly used for; Currently deployed and on-premise software that you want to run on the cloud, any workload requiring a specific OS or configuration compute engine
  • Preemptible VMs are around XX% cheaper and terminate after YY hours 80, 24
  • Fully managed block storage (SSD/HDD) suitable for VM/containers. persistent disk
  • Affordable object/blob storage suitable for e.g images, videos cloud storage
  • Storage class; High performance with no minimum storage duration standard
  • Storage class; Access once per month or less. 30-day minimum storage duration nearline
  • Storage class; Infrequently accessed data. 90-day minimum storage duration coldline
  • Storage class; Lowest cost for backup and disaster recovery. 365-day minimum storage duration archive
  • GCP service; Fully managed relational database for SQL (MySQL, PostgreSQL) with limited scalability, fits data in the GB range. Good for YY workloads. cloud SQL, OLTP
  • GCP service; Mission-critical relational database. Combines benefits of relational and non-relational databases. Supports YY. cloud spanner, horizontal scaling
  • GCP service; Columnar database for high throughput and low latency e.g IoT, user analytics, time-series for non-structured key/value data. Has a row index known as a YY. Ideal for handling large amounts of data for a long period of time bigtable, row key
  • True/False; BigTable supports SQL queries? False
  • Bucket names must be globally unique
  • GCP service; Highly scalable NoSQL database for structured data for web and mobile applications cloud datastore
  • Storage choice; your data is unstructured cloud storage
  • Storage choice; your data is structured and you're doing transactional workload using NoSQL cloud datastore
  • Storage choice; your data is structured and you're doing transactional workload using SQL [One database, horizontal scalability] cloud SQL, cloud spanner
  • Storage choice; your data is structured and you're doing analytics workload [ms latency, s latency] cloud bigtable, bigquery
  • Dataproc provides XX based on Hadoop YY metrics autoscaling, YARN
  • Jobs can be submitted to BQ through; [W, C, R] web UI (console), command-line tool, REST API
  • GCP service; Clean and transform data with UI cloud dataprep
  • IAM project role; Permissions for read-only actions that do not affect state, such as viewing (but not modifying) existing resources or data. viewer
  • IAM project role; Viewing the data and permissions for actions that modify state, such as changing existing resources. editor
  • IAM project role; Viewing the data, modify and change existing resources. Manage roles and set up billing for project owner
  • Default BigQuery encoding UTF-8
  • GCP service to use: Migrate hadoop job to google cloud; without rewrite, with rewrite cloud dataproc, maybe cloud dataflow
  • BigQuery streaming restrictions; max row size, max throughput in records/s per project 1 MB, 100000
  • In BigQuery, use XX[S] and YY[A] to consolidate the data instead of splitting it up into smaller tables structs, arrays
  • ML model; Forecast a number linear regression
  • ML model; Classify with binary or multiclass options logistic regression
  • ML model; Recommend something matrix factorization
  • ML model; Explore data clustering
  • IoT data streaming expects [some/no] delays no
  • [True/False] You can write your own connectors with apache beam? And you can build pipelines using Java, Python and Go? True
  • [True/False] Dataflow can autoscale workers? True
  • [True/False] You can't connect google sheets data together with BigQuery data in data studio? False
  • A data lake is usually in a cloud storage bucket
  • Cleaned data is typically stored in the data warehouse
  • BigQuery can be seen as a serverless data warehouse
  • The ETL is usually done between the data XX and the data YY lake, warehouse
  • [True/False] On federated (external) queries on BigQuery, you get caching False
  • GCP service; Scales to GB and TB. Ideal for back-end database. Record based storage cloud SQL
  • GCP service; Scales to PB. Easy to connect external data sources for ingestion. Column based storage BigQuery
  • [True/False] Cloud SQL can be considered as RDBMS True
  • Fast in memory analysis in BigQuery BI engine
  • GCP service; A fully managed and highly scalable data discovery and metadata management service. cloud data catalog
  • GCP service; Fully managed service designed to help you discover, classify, and protect your most sensitive data Cloud data loss prevention
  • GCP service; A fully managed scalable workflow orchestration service. Can automate pipelines cloud composer
  • Cloud composer automated pipelines are written in python
  • Data storage and ETL options on GCP; your data is relational [S, SP] Cloud SQL, Cloud Spanner
  • Data storage and ETL options on GCP; your data is NoSQL [F, B] Cloud firestore, cloud bigtable
  • [True/False] A cloud bucket is associated with a certain region True
  • Storage; XX names can be set to private but YY names can't and should never be sensitive object, bucket
  • Cloud storage; Google handles the encryption and wraps the data encryption keys with key encryption keys. You control what the top-level encryption key is CMEK
  • Cloud storage; Customer-supplied encryption keys CSEK
  • Bucket access control; allows you to use IAM alone to manage permissions. IAM applies permissions to all the objects contained inside the bucket or groups of objects with common name prefixes. [is/not recommended] uniform, is
  • Bucket access control; enables you to use IAM and ACLs together to manage permissions. [is/not recommended] fine-grained, not
  • Are a legacy access control system for Cloud Storage designed for interoperability with Amazon S3. [is/not recommended] ACLs, not
  • Workload type; Fast, reveal snapshot, simple query, 80% writes and 20% reads transactional
  • Workload type; Read the whole dataset, complex query, 20% writes and 80% read analytical
  • GCP service; NoSQL database built for global apps that lets you easily store, sync, and query data for your mobile and web apps - at global scale. Cloud firestore
  • GCP service; most cost effective to store relational data cloud SQL
  • GCP service; Big database with relational data requiring to be globally distributed cloud spanner
  • GCP service; Requires high throughput and ultra-low latency for non-relational data cloud bigtable
  • Which GCP service is the most cost effective for relational data; BigQuery vs Bigtable BigQuery
  • GCP service; Good as a relational data lake since it can handle the 3rd-party RDBMS; MySQL, PostgreSQL, MS SQL server cloud SQL
  • You want XX data warehouse that can scale one
  • In BigQuery, permissions are set at XX level dataset
  • [True/False] You can authenticate IAM using G Suite and Gmail? True
  • Concept; A SQL query that looks like a read-only table and gives finer-grained control, sharing only selected data rather than the whole dataset. view
  • [True/False] You can't run a BigQuery job that exports data from a view True
  • Precomputed views that periodically cache the results of a query for increased performance and efficiency materialized views
  • [True/False] Cached queries on BigQuery are charged False
  • BigQuery can XX schemas but it's not 100% sure to work autodetect
  • BigQuery service which provides connectors and pre-built load jobs. Good for EL jobs [abbreviation] DTS
  • BigQuery service; Can handle late data data backfill
  • [True/False] BigQuery supports user-defined functions in SQL and JavaScript True
  • A BigQuery user-defined function (UDF) is stored as an XX and YY[can/can't] be shared object, can
  • Schema design; Stores data efficient and saves space normalized
  • Schema design; Allows duplicate field values in a column. Is fast to query and can be easily parallelised denormalized
  • Schema design; is not efficient for GROUP BY filtering denormalized
  • For RDBMS, JOINS are [expensive/inexpensive] expensive
  • A struct has the type XX in BigQuery RECORD
  • An array has the type XX in BigQuery REPEATED
  • Having super XX schemas improve performance since BigQuery is YY based wide, column
  • Function; Unpacks an array field into one row per element UNNEST
  • Function; Aggregates field to array ARRAY_AGG
  • BigQuery's ways of partitioning tables [T C, I T, I R] time column, ingestion time, integer range
  • [True/False] BigQuery does not support JSON parsing False
  • Use the SQL function XX to format values to the same format cast
  • GCP service; Fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines cloud data fusion
  • Which GCP service is considered easiest to use since it has a simple GUI; cloud dataproc, cloud data fusion, cloud dataflow cloud data fusion
  • GCP services, ETL to solve data quality issues; recommended, latency/throughput issues, reusing spark pipelines, need for visual pipeline building [no cloud prefix] BigQuery/Dataflow, bigtable, dataproc, data fusion
  • Labels can be used on [D, T] datasets, tables
  • Cloud dataproc can [automatically/manually] scale your cluster automatically
  • GCP Service; Cloud dataproc can write with petabit bisection bandwidth to [C S, B, C B] cloud storage, bigquery, cloud bigtable
  • For optimising dataproc; make sure that the cluster region and storage region are close
  • For optimizing dataproc; do not use more than XX input files 10000
  • Storage option; Datastore. Unstructured data cloud storage
  • Storage option; Large amount of sparse data. HBase-compliant. Low latency and high scalability cloud bigtable
  • Storage option; Data warehouse. Storage API BigQuery
  • Since autoscaling exists, start with a XX cluster and it will YY if needed small, upscale
  • Data fusion component; used for handling connections, transforms and data quality wrangler
  • Cloud data fusion is made for XX[streaming/batch] data batch
  • [True/False] Cloud data fusion wrangler can access data from other providers than GCP True
  • Color to indicate in airflow that a DAG has not been run since a previous DAG failed pink
  • [True/False] A PCollection can both represent streaming and batch data? True
  • Dataflow handles late arriving data using "smart" data watermarking
  • Cloud dataflow; Handles parallel intermediate jobs such as filtering data, formatting, and extracting. And you will need to provide a YY ParDo, DoFn
  • XXByKey is more effective than GroupByKey since dataflow knows how to parallelise it combine
  • Additional input to a ParDo transform e.g inject additional data at runtime side input
  • Dataflow SQL integrates with Apache Beam SQL and supports XX syntax ZetaSQL
  • Processing streaming data; Pub/sub -> XX -> YY or ZZ [No prefix] dataflow, bigquery, bigtable
  • Pub/Sub is fast since it stores messages in multiple XX for up to YY days by default locations, 7
  • Apache beam window type; Non-overlapping intervals, Dataflow name fixed-time window, tumbling window
  • Apache beam window type; Overlapping windows used for running computations, i.e. a new time window starts every interval, GCP name sliding window, hopping window
  • Apache beam window type; Minimum gap duration between windows e.g a website visit with bursting data session window
  • Cloud dataflow trigger; datetime stamp Event time
  • Cloud dataflow trigger; Triggers on the time an element is processed processing time
  • Cloud dataflow trigger; Condition of data contained in the element data-driven
  • Cloud dataflow trigger; Mix of different triggers composite
  • Acceptable time for data insight with BigQuery, Cloud Bigtable s, ms
  • For cloud Bigtable, you need the data to be pre-XX sorted
  • Optimizing cloud Bigtable; XX related data, YY data evenly, place ZZ values in the same row group, distribute, identical
  • Too XX data results in worse performance for Bigtable, it should be YY [data quantity] little, >300 GB
  • Which is the most performance heavy work for BigQuery? I/O[number of columns] or Computing[function uses] I/O
  • BigQuery function to get back previous values LAG
  • BigQuery cached queries are stored for 24 hours
  • [True/False] WITH clauses inhibit BigQuery caching True
  • Self-join i.e join a table with itself is [good/bad] in BigQuery bad
  • Ordering i.e ORDER BY, has to be performed on XX worker a single
  • Approximate functions should be used if an error of around XX is tolerable 1%
  • Approximate function to find the top element APPROX_TOP_COUNT
  • Ordering should always be the [first/last] thing you do last
  • To optimize BigQuery JOINs, put the biggest table on the [right/left] left
  • Methodology; Looking back at data to gain insight [abbreviation] BI
  • GCP service; Build, deploy, and scale ML models faster, with pre-trained and custom tooling within a unified AI platform vertex AI
  • GCP service; Train high-quality custom machine learning models with minimal effort and machine learning expertise AutoML
  • ML methodology; Breaks down text into tokens (words or sentences) and labels them syntactic analysis
  • ML methodology; Group text into negative, positive and neutral together with a score of how much it's expressed. sentiment analysis
  • [True/False] In a notebook, you can load BigQuery query results into a pandas DataFrame using %%bigquery df True
  • GCP service; An intelligent cloud data service to visually explore, clean, and prepare data for analysis and machine learning. Dataprep
  • GCP services for the "prepare" phase of a ML project [.p .w, .c] (without prefix) dataprep, dataflow, dataproc
  • GCP services for the "preprocess" phase of a ML project [.w, .c, .y] (without prefix) dataflow, dataproc, BigQuery
  • GCP service; Lets you work with human labelers to generate highly accurate labels for a collection of data that you can use in machine learning models. data labeling service
  • GCP service; Enable using machine learning pipelines to orchestrate complicated workflows running on Kubernetes. kubeflow
  • GCP service; Lets you create and execute machine learning models in BigQuery using standard SQL queries. Can iterate on models BigQuery ML
  • GCP service; Repository for ML components AI hub
  • You can import models from XX to BigQuery ML tensorflow
  • Pre-trained models only yield a good result if the data applied is [common/uncommon] common
  • [True/False] AutoML can train from zip files True
  • For AutoML, labels [can/can't] contain _ and [can/can't] contain special characters can, can't
  • For AutoML, custom models are [permanent/temporary] temporary
  • For an AutoML model, one can predict using an XX command and YY file curl, JSON
  • It's recommended to use [a single/multiple] ML models to solve a complicated problem multiple
  • For Auto ML vision; Images need to be encoded in XX, with a maximum file size of YY base64, 30 MB
  • Auto ML vision can handle XX to YY labels 0, 20
  • Auto ML vision models work best if there are at most XX times more items of the most common label than of the least common 100
  • For Auto ML NLP; model will be deleted after XX if not used and after YY if used 60 days, 6 months
  • Auto ML NLP is for [structured/unstructured] text data while auto ML tables is for [structured/unstructured] data unstructured, structured
  • For auto ML tables; the data can be between AA and BB million rows, CC and DD columns and must be <EE 1000, 100, 2, 1000, 100 GB
  • For auto ML tables prediction; the maximum input size for BQ table or multiple CSV files is XX and for a single CSV file, the maximum size is YY 100 GB, 10 GB
  • ML; Remove features with XX or more null values 50%
  • ML; Decrease regularization => [increase/decrease] in overfitting increase
  • Dataflow is for [known/unknown] data sizes unknown
  • Dataproc is for [known/unknown] data sizes known
  • HDFS [does/does not] scale well does not
  • Using persistent disk and HDFS cluster means the data is XX when the cluster is over lost
  • GCP service; recommended for time-series data bigtable
  • Stackdriver is now called google cloud's operations suite
  • [True/False] Firestore supports flexible schemas True
  • VM instances with additional security control Shielded VMs
  • In cloud firestore, having multiple indexes [do/do not] lead to bigger filesize do
  • GCP service; Is a good replacement for Java ELT pipelines cloud dataflow
  • GCP service; Is a good replacement for MongoDB cloud firestore
  • GCP service; Is a Jupyter + VM service cloud datalab
  • A XX can be used to remove many forms of sensitive data such as government identifiers data loss prevention API
  • Transactional databases have [fixed/non fixed] schema fixed
  • XX databases are structured data stores analytical
  • You can use XX to access HBase in cloud Bigtable HBase API
  • Custom file formats should always be stored in XX cloud storage
  • A document database provides for indexing on XX rather than only on single keys columns
  • XX models are designed to support drilling down and slicing and dicing OLAP
  • GCP service; is the only globally available OLTP database in GCP cloud spanner
  • Cloud Spanner has a row limit of [size] 4 GB
  • Cloud Spanner uses XX as export connector and can export to Apache YY and ZZ format dataflow, Avro, CSV
  • Cloud Bigtable uses XX for import and export and can export to YY[C], Apache ZZ and AA[S] dataflow, cloud storage, Avro, SequenceFile
  • Cloud firestore indexes; Created by default for each property, indexes multiple values for an entity built-in, composite
  • Cloud firestore uses a SQL like language called GQL
  • For export from firestore to BQ, entities in the export have to have a consistent schema, property values larger than XX are truncated to XX 64 KB
  • Cloud firestore [is/is not] suitable for applications requiring low-latency writes (< 10 ms) is not
  • XX SQL dialect is preferred for BigQuery standard
  • BigQuery can use CSV, AA[J], BB[A], CC[O], DD[P] JSON, Avro, ORC, Parquet
  • If a CSV file is not in XX, BigQuery will try to convert it but it might not be correct UTF-8
  • XX is the preferred format for loading data since the datablocks can be read in parallel. Avro
  • You can define an XX with streaming insert to BigQuery to detect duplicates. However, this will reduce YY insertId, throughput
  • In BQ, XX tables rather than joining tables denormalize
  • Fully managed serverless platform for running containers Cloud Run
  • Managed Redis service used for caching Cloud Memorystore
  • XX is the successor to SSL, which is now legacy TLS
  • Cloud spanner, maximum CPU utilization to target; for regional, for multiregional 65%, 45%
  • A Cloud Spanner node can handle XX of data [volume] 2 TB
  • Cloud Memorystore; expiration time for memory given in seconds TTL
  • A lifecycle policy can change the storage class of an object or delete it once its conditions are met. A XX policy instead prevents deletion until a minimum age is reached. data retention
  • [True/False] Loading data with similar filenames, e.g timestamps, can cause hotspot for cloud storage True
  • [CPU/Storage] When scaling nodes in cloud spanner, XX utilization is more important than YY utilization. CPU, Storage
  • BigQuery insert only supports [Fileformat] JSON
  • Is a metric of how well a service-level objective is being met [Abbreviation] SLI
  • Is a U.S financial reporting regulation governing publicly traded companies [Abbreviation] SOX
  • Is a U.S healthcare regulation with data access and privacy rules [Abbreviation] HIPAA
  • Defines responsibilities for delivering a service and consequences when they're not met. SLA
  • Is the proportion of actual positive cases that were correctly identified by an ML model. Recall
  • Is a GCP ML tool to make chatbots Dialogflow
  • Is a fully managed service for securely connecting and managing IoT devices, from a few to millions. Can ingest data to other GCP services. IoT Core
  • N4 instance has [higher/lower] IOPS than an N1 higher
  • [True/False] Cloud dataflow does not support Python False
  • Repeated messages in Cloud Pub/Sub can be a sign of no message acknowledgment
  • A data pipeline captures each change in a source system and stores it in a data store. Change data capture
  • Is a serverless managed compute service for running code in response to events that occur in the cloud Cloud functions
  • Is a distributed architecture that is driven by business goals. Microservices are a variation of it [Abbreviation] SOA
  • [True/False] Cloud functions can run python scripts True
  • A XX can distribute work across region if it's not supported by the service (or specified in the question) global load balancer
  • Nested and repeated fields can be used to reduce the amount of XX in BigQuery JOINS
  • Consist of groups of identically configured VMs and shall only be used when migrating a legacy cluster from on-prem. [Abbreviation] MIGs
  • Is Kubernetes' way of representing storage allocated or provisioned for use by a pod PersistentVolumes
  • The XX are used to designate kubernetes pods with unique identifiers StatefulSets
  • An XX is an object that controls external access to services running in kubernetes cluster ingress
  • GCP serverless services do not require conventional infrastructure provisioning but can be configured using .XX files in app engine yaml
  • Cloud functions are for XX processing. Not continually monitoring metrics event-driven
  • Is the command line interface for cloud Bigtable cbt
  • To start MIGs, the minimum and maximum number of XX together with an instance YY is required. instances, template
  • Is Kubernetes' command line interface kubectl
  • For Kubernetes engine, the CPU utilization for the whole XX (not cluster) is used to scale the deployment deployment
  • For App Engine, file that; configures the runtime e.g python version app.yaml
  • For App Engine, file that; is used to configure task queues queue.yaml
  • For App Engine, file that; is used to override routing rules dispatch.yaml
  • For App Engine, file that; is used to schedule tasks cron.yaml
  • In stackdriver (monitoring), the retention time in days for; Admin activity audit logs, System event audit logs, Access transparency logs, Data access audit logs 400, 400, 400, 30
  • Stackdriver service; is used to collect information about time to execute functions in a call stack Stackdriver Trace
  • Is google's own encryption protocol for data in transit QUIC
  • Is a U.S act restricting the online collection of information from children under the age of 13 [Abbreviation] COPPA
  • Is a U.S program to promote a standard approach to assessment, authorisation and monitoring of cloud resource. [Abbreviation] FedRAMP
  • To run BigQuery queries, you need the role; roles/bigquery.XX jobUser
  • Entity & Kind => [GCP service] datastore, firestore
  • [True/False] Jobs with HDFS have to be rewritten to work on cloud Dataflow True
  • Dataflow; 1:1 relationship between input and output in Python Dataflow Map
  • Dataflow; Non 1:1 relationship between input and output in Python Dataflow FlatMap
  • Is BigQuery's command-line tool bq
  • Use XX to transfer data from on-prem to cloud [CLI tool] gsutil
  • Use storage XX service when transferring data from another cloud transfer
  • AI Platform Notebooks are now in Vertex AI XX as well workbench
  • Pub/Sub + dataflow provides [in/out of] order processing in
  • The XX file system is where the actual data is stored in cloud Bigtable Colossus
  • Keep names [long/short] in Bigtable => reduces metadata short
  • In BigQuery XX is ordering of data in stored format, and is only supported on partitioned tables clustering
  • In BigQuery, XX queries are queries queued to run when the resources are available. They do not count toward the concurrency limit batch
  • The only way to achieve strong consistency in cloud Bigtable is to have one replica solely for reads while the other replica is for XX. failover
  • In cloud spanner, XX on key can decrease hotspots hashing
  • [True/False] A UUID generator usually creates the UUID based on sequential data e.g time which can create hotspots in cloud spanner True
  • Universally unique identifiers = UUID
  • In Bigtable; you want [many/a few] tall and [wide/narrow] tables. a few, narrow
  • No specified ingestion => XX partitioning ingestion time
  • [True/False] Clustering keys do not need to be integers or timestamps, they can be date, bool, geography, INT64, numeric, string, timestamp True
  • Parquet [is/is not] supported in drive, but [is/is not] in cloud storage is not, is
  • Cloud functions [is/is not] a good compute service for event processing is
  • GCP service; provides storage with a filesystem accessed from compute engine and kubernetes engine Cloud Filestore
  • Cloud dataflow can transmit summaries every [time] minute
  • [True/False] A subscription can receive information from more than one topic. False
  • Pub/Sub pull requires XX for endpoint to pull via the API authorized credentials
  • Pub/sub push requires endpoint to be reachable via XX and have YY certificate installed DNS, SSL
  • [True/False] Cloud Dataprep can be used to gain BI insight and see missing and misconfigured data True
  • Are the only formats supported for export in cloud Dataprep [C, J] CSV, JSON
  • In Data Studio, XX connectors are designed to query data from up to 5 sources blended
  • [True/False] Better to conda/pip install in jupyter notebook than cloud shell True
  • ML; Evaluates a model by splitting the data into K-segments k-fold validation
  • Models with high bias tend to oversimplify, i.e. [underfit/overfit] underfit
  • Models with high variance tend to [underfit/overfit] overfit
  • You can use cloud dataproc + spark XX for machine learning MLlib
  • Feature engineering can also reduce the XX to train besides improving accuracy time
  • Spark MLlib includes XX for frequent pattern mining. BQ ML and AutoML do not! association rules
  • In CloudSQL, an external replica is more for XX purposes and doesn't add throughput backup
  • [True/False] Datasets in BigQuery are immutable so the location can't be updated True
  • You need to use XX as an intermediary to send BigQuery data to different regions cloud storage
  • What is most important for upscaling/downscaling; CPU or storage utilisation? CPU
  • Cloud IoT core is mainly for XX while Pub/sub + Dataflow and Vertex AI handle the rest device management
  • Edge computing consists of edge device, XX device and cloud platform gateway
  • From an IoT device, data can also be sent by IoT core [M, S] MQTT, stackdriver
  • For high-precision arithmetics, use a GPU
  • Distributed training; enables synchronous distributed training on multiple GPUs on one machine. Each variable is mirrored across all GPUs MirroredStrategy
  • Distributed training; enables synchronous training in which variables are not mirrored and both GPU and CPU are used. CentralStorageStrategy
  • Distributed training; enables synchronous distributed training on multiple GPUs on multiple machines. Each variable is mirrored across all GPUs MultiWorkerMirroredStrategy
  • A group of TPUs working together is a TPU pod
  • App engine is used for XX and should not be used for training machine learning models web applications
  • Anomaly detection is classified as [supervised/unsupervised] learning unsupervised
  • ML; is an algorithm to train binary classifiers based on artificial neurones perceptron
  • Principal component analysis is for XX dimension reduction
  • LXX regularisation should be chosen over LYY when you want less relevant features to have weights close to 0. 1, 2
  • ML; If you have an unbalanced dataset, use undersampling
  • ML; Area under curve = AUC
  • ML bias; human bias in favour of decisions made by machines over those made by humans automation bias
  • ML bias; when the dataset does not accurately reflect the state of the world. Reporting bias
  • ML bias; generalizes characteristic of an individual for the whole group. group attribution bias
  • GCP ML API; provides real-time analysis of time-series data and can provide anomaly detection Cloud Inference API
  • The cloud vision API supports a maximum XX images per batch 2000
  • For dialogflow; XX categorise a speaker's intention for a single statement intents
  • For dialogflow; XX are nouns extracted from dialogs entities
  • For dialogflow; XX are used to connect a service to an integration fulfilments
  • For dialogflow; XX are applications that process end-user interactions such as deciding what to recommend. integrations
  • Google recommends a minimum sampling rate of XX Hz for speech-to-text 16000
  • [True/False] The gRPC API is only available with the Advanced edition of the Translation API True
  • [True/False] For the GCP Translation API, you need to pass parameters to the API yourself even when the client library offers a dedicated translation function, e.g. after importing the library in Python False
  • GCP service equivalent; HBase Cloud Bigtable
  • GCP service equivalent; Redis Cloud Memorystore
  • GCP service equivalent; Apache Beam, Apache Pig Cloud Dataflow
  • GCP service equivalent; Apache Airflow Cloud Composer
  • GCP service equivalent; MongoDB Cloud Firestore
  • GCP service equivalent; Apache Flink Cloud Dataflow
  • GCP service equivalent; Cassandra Cloud Bigtable
  • GCP service equivalent; Apache Kafka Pub/Sub
  • In stackdriver monitoring, only the XX log has a 30-day retention period compared to 400 days for the other logs Data access audit
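
The three Apache Beam window types from the cards above (fixed/tumbling, sliding/hopping, session) can be illustrated with a small plain-Python sketch. This is not the Beam or Dataflow API: the function names are hypothetical, timestamps are assumed to be integer seconds, and windows are half-open intervals [start, end).

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end)


def fixed_window(ts: int, size: int) -> Window:
    """Fixed (tumbling) window: each timestamp falls in exactly one window."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> List[Window]:
    """Sliding (hopping) windows: a new window of length `size` starts every
    `period`, so one timestamp can belong to several overlapping windows."""
    last_start = ts - ts % period
    # Keep every window start that is still greater than ts - size.
    starts = range(last_start, ts - size, -period)
    return sorted((s, s + size) for s in starts)


def session_windows(timestamps: Iterable[int], gap: int) -> List[Window]:
    """Session windows: elements closer together than `gap` share a session."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] < gap:
            sessions[-1][1] = ts  # extend the current session
        else:
            sessions.append([ts, ts])  # start a new session
    return [(first, last + gap) for first, last in sessions]


print(fixed_window(65, 60))
print(sliding_windows(65, 60, 30))
print(session_windows([1, 2, 3, 20, 22], 5))
```

With a 60 s window and a 30 s period, the timestamp 65 lands in one fixed window but in two sliding windows, which is the practical difference between the two types.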

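The cards on hotspots (sequential keys in cloud spanner, timestamp-like filenames in cloud storage) rest on one idea: prefixing a key with a hash spreads neighbouring keys across the key space. A minimal sketch using only the Python standard library follows; `hashed_key` and its 4-character prefix are assumptions made for illustration, not part of any GCP client library.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prefix a key with the first hex characters of its SHA-256 digest.

    The prefix is deterministic, so the same key always maps to the same
    stored key, but sequential inputs no longer sort next to each other.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{key}"


# Sequential, timestamp-like keys (hypothetical example data).
keys = [f"2021-10-19T00:00:{second:02d}" for second in range(5)]
for key in keys:
    print(hashed_key(key))
```

The trade-off is that range scans over the original key order are no longer possible, so this fits lookups by exact key better than time-range queries.
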

Shared exercise

https://spellic.com/eng/exercise/gcp-own-questions.10640811.html
