GCP MLE

The exercise was created on 2023-01-11 by Pontusnord. Number of questions: 446.





  • Minimum required datapoints to create a dataset in vertex AI 1000
  • [True/False] Feature definitions should change over time True
  • [True/False] Do not build monolithic models, make them small and simple True
  • [with/without] Make a prediction API for an ML model XX many inputs without
  • [V., S., J., T., N., V I.] Google's pre-trained ML models vision API, speech API, jobs API, translation API, natural language API, video intelligence API
  • [better/worse] A simple ML model with lots of data to train is XX than a complex fancy model with little data to train with better
  • [B., S.] An ML pipeline should handle both XX and YY data. Not doing so is a common cause of model failures in practice batch, streaming
  • [D. C., B. I.] Steps in the ML workflow which usually take the most time data collection, building infrastructure
  • Component of Vertex AI: Used to host data managed datasets
  • Component of Vertex AI: Used as a repository of features feature store
  • Component of Vertex AI: Used to have humans label your data data labeling
  • Component of Vertex AI: Used to host jupyter lab instances workbench
  • Component of Vertex AI: Automate, monitor, and govern ML systems pipelines
  • Component of Vertex AI: Can do both AutoML and custom training training
  • Component of Vertex AI: During experimentation, can be used as a black-box tool to tune hyperparameter for a model. Can use tensorpanel to compare vizier
  • Component of Vertex AI: Deploy a trained model. endpoints
  • [True/False] To host a model on a Vertex AI endpoint, it has to have been trained on Vertex AI False
  • Component of Vertex AI: Hosts ML metadata and artifacts such as evaluation metrics ML metadata
  • For a deployed model endpoint, A/B testing can be conducted by tweaking the XX traffic split
  • According to Google, this type of model can be used for reinforcement learning, pattern recognition, self-driving cars, and cyber security GAN
  • [U., G.] On Vertex AI Notebooks, there are 2 types of notebook; XX managed and YY managed notebooks. user, google
  • Flow to train custom container models: XX -> YY -> ZZ -> Vertex Training dockerfile, cloud build, container registry
  • Store unstructured data such as image, video, and audio in XX and use YY cloud storage, data labeling
  • Best practice service to use for preprocessing tabular data bigquery
  • Best practice service to use for preprocessing unstructured data dataflow
  • Dataflow can convert data into binary data formats like TFRecord
  • Use the security blueprint in workbench notebooks to secure pii data
  • Human XX lead to XX in ML models since you choose the data to train with biases
  • Type of bias: You interact with the dataset which creates a bias in the data interaction bias
  • Type of bias: You have a class-skewed dataset, e.g. more men are firefighters => almost only men in the dataset latent bias
  • Type of bias: You fiddle with the dataset and remove specific data selection bias
  • Type of bias: Some of a class goes unreported compared to how it actually is reporting bias
  • Type of bias: In human labeling, when we label according to how we perceive the world confirmation bias
  • Biases can occur XX in the ML pipeline everywhere
  • [W. T., T.] Tools in GCP which can show fairness performance what-if tool, tensorboard
  • [Can, Can not] A confusion matrix XX be used to identify model biases can
  • Google confusion matrix; X-axis:, y-axis model predictions, true values
  • Confusion matrix component [abbreviation]: Predicted correct for the correct label TP
  • Confusion matrix component [abbreviation]: Predicted correct for the false label FP
  • Confusion matrix component [abbreviation]: Predicted false for the correct label FN
  • Confusion matrix component [abbreviation]: Predicted false for the false label TN
  • Precision = TP/(TP+FP)
  • Recall = TP/(TP+FN)
  • Visualize and gain insights such as class imbalance, summary statistics, and missing values for large datasets facets
  • GCP service: IaaS, raw compute, storage and network compute engine
  • GCP service: Containerized applications in a cluster Google kubernetes engine
  • GCP service: PaaS, bind code to libraries app engine
  • GCP service: Execute code in a response to events. Serverless cloud functions
  • Cloud storage class: For hot data used commonly standard
  • Cloud storage class: For data accessed once per month nearline
  • Cloud storage class: For data accessed every 90 days coldline
  • Cloud storage class: For data accessed once a year archive
  • Storage service: Unstructured data, blob storage cloud storage
  • Storage service: Structured data, transactional workloads, SQL, local scalable cloud SQL
  • Storage service: Structured data, transactional workloads, SQL, globally scalable cloud spanner
  • Storage service: Structured data, transactional workloads, NoSQL firestore
  • Storage service: Structured data, analytical workloads, NoSQL cloud bigtable
  • Ingestion and process service: No code, GUI solution datafusion
  • GCP AI service: Pre-trained models to extract structured data from unstructured data document ai
  • Document AI component: Google's general model to analyse the data general
  • Document AI component: Google's specialized model to identify special data such as receipts, driver's licenses specialized
  • Document AI component: You create a model to extract data from an unstructured data source custom
  • GCP AI service: AI-powered contact center experience contact center ai
  • GCP AI service: Rapidly generate healthcare insights and analytics with one end-to-end solution healthcare data engine
  • Dataflow can XX to meet a high demand in the pipeline autoscale
  • When building a data pipeline, it is important to factor in whether it is streaming and/or batch data, and what to do with XX data coming in. late
  • [J., P., G.] Languages with pipeline templates for Apache Beam/Dataflow java, python, go
  • [True/False] Dataflow is serverless and NoOps True
  • With BigQuery, you can pay as you go and use BQ flatrate
  • [True/False] In BigQuery, data is not encrypted at rest by default False
  • [C. M., F. M] Steps/commands required to create a BQML model create model, from ml.predict
  • General model class to choose: Supervised, classify data. E.g is email a spam? logistic regression
  • General model class to choose: Supervised, predict a number. E.g shoe sales next month? linear regression
  • General model class to choose: Unsupervised, identify patterns and clusters. E.g grouping photos cluster analysis
  • BQML does automatically XX of categorical data and automatically YY the dataset into training and evaluation one-hot encoding, splits
  • [True/False] It is not mandatory to specify the model_type in BQML False
  • [B., P. A., A., C. T.] Ways to build ML models on GCP BQML, pre-built APIs, autoML, custom training
  • Compared to other GCP low-code solutions, BQML only supports this datatype tabular
  • Google ML solution: No training data pre-built APIs
  • Google ML solution: small to medium amount of data autoML
  • [B., C.] Google ML solution: Medium to large amount of data BQML, custom
  • [P. A., A.] Google ML solution: No options to choose hyperparameters pre-built APIs, autoML
  • Google ML solution: Medium options to choose hyperparameters BQML
  • Google ML solution: Lots of options to choose hyperparameters custom
  • Google ML solution: No time to train models pre-built APIs
  • [B., A.] Google ML solution: Medium time required to train BQML, autoML
  • Google ML solution: Long time required to train custom
  • Google ML solution: Familiar with SQL and have data in BigQuery BQML
  • Google ML solution: Have little ML expertise pre-built APIs
  • Google ML solution: Want to build a custom model with own training data with minimal code autoML
  • Google ML solution: Want to build a custom model with own training data and full control custom
  • To get feature importance in BQML, one can inspect the XX e.g using SELECT * FROM ML.XX(MODEL `mydataset.mymodel`, (<query>)) weights
  • Pre-built APIs: Converts audio to text for data processing speech-to-text API
  • Pre-built APIs: Recognizes parts of speech called entities and sentiment natural language API
  • Pre-built APIs: Convert text from one language to another translation API
  • Pre-built APIs: Convert text into high-quality voice audio text-to-speech API
  • Pre-built APIs: Recognizes content in static images vision API
  • Pre-built APIs: Recognizes motion and action in videos video intelligence API
  • [S., M., C.] Common production challenges for ML models; scalability, monitoring, CI/CD
  • Google ML solution: Gives retailers the ability to provide google search quality recommendations retail product discovery
  • [A., C. T., F. S., V., E. A., P.] Vertex AI provides; To train, feature repository, tune hyperparameters, interpreting the data, monitor the ML pipelines autoML, custom training, feature store, vizier, explainable AI, pipelines
  • [MATLAB notation] Confusion matrix setup [TP, FN; FP, TN]
  • [Precision/Recall] Prioritise catching a lot of spam emails -> high recall
  • [Precision/Recall] Prioritise catching only spam emails -> high precision
  • GCP Feature Store is a centralised feature repository to serve features at scale with low XX latency
  • A collection of semantically related features entity type
  • In the feature store, each entity must have a unique XX and must be of type YY id, string
  • Is the process of importing feature values computed by feature engineering jobs into a featurestore feature ingestion
  • XX is the process of exporting features stored for training or inference. YY XX for high throughput and serving large volumes of data for offline processing. ZZ XX for low-latency data retrieval of small batches of data for real-time process. feature serving, batch, online
  • [S., R.] The feature store solves the common pain point of being hard to XX and YY features. share, reuse
  • For source data in the feature store, column name must be of type string
  • [C., A., B.] For source data in the feature store, the supported file formats/sources are; CSV, Avro, Bigquery
  • It is important to XX the data before doing any feature engineering cleaning
  • XX is the process of creating new improved features by combining different features feature engineering
  • [N., Ca., B., Cr., E., H.] Feature types; numerical, categorical, bucketized, crossed, embedding, hashed
  • XX can be used to do automatic feature extraction since manual extraction can be time-consuming PCA
  • By using PCA to reduce the dimensions of your feature space, the model will be less likely to XX overfit
  • [Lo., La.] If possible, map raw data to numerical features e.g instead of using a street name, obtain and use the XX and YY longitude, latitude
  • It is essential that a feature is known at XX time prediction
  • It is important that a numerical feature has meaningful meaning with its magnitude
  • [Should/Should not] Feature definitions XX change over time should
  • It is important to consider if there is a time XX for the feature, i.e if the data comes after 3 days, the model is for 3 days back delay
  • Words feature should be a XX so it can hold relationships to other words word vector
  • Rule of thumb: Each feature should have XX unique examples 5
  • It is important to consider if a XX feature should be one-hot encoded or left as is. E.g what to do if ratings are 1-5 and a user gives no rating? numeric
  • [ML/Statistics] Mindset: Let's collect more on the outliers, create a separate model for them ML
  • [ML/Statistics] Mindset: Let's exclude the outliers in our model statistics
  • [ML/Statistics] XX is usually best to use when you have a small amount of data statistics
  • BQML has two types of feature preprocessing; Occurs during training, Uses the TRANSFORM clause to define the preprocessing automatic, manual
  • [True/False] BQML handles the data split True
  • BQML by default assumes that Numbers are XX features and String are YY features numerical, categorical
  • XX is a synthetic feature formed by multiplying two or more features. This also reduces the number of features reducing the risk of YY. feature crosses, overfitting
  • Feature type: Depends on space e.g distance spatial
  • Feature type: Depends on time e.g pickup time temporal
  • For temporal features, it is important to XX them, e.g using normalization scale
  • In Dataflow, each DAG inputs and outputs a Pcollection
  • [True/False] A PCollection does not store all of its data in memory; it can be distributed over multiple servers where the data is stored. True
  • Since Dataflow is distributed, only save locally when you have a XX-node cluster one
  • [G., T.] TensorFlow is efficient since it enables XX and YY acceleration GPU, TPU
  • XX is the most efficient format for data in TensorFlow TFRecords
  • Finish ML preparation pipeline: Raw Data -> XX -> YY -> ZZ -> Model Training data extraction, data analysis, data preparation
  • Meaning of EDA and CDA exploratory data analysis, classical data analysis
  • Flow for CDA: Problem -> Data -> XX -> YY -> Conclusion model, analysis
  • Flow for EDA: Problem -> Data -> XX -> YY -> Conclusion analysis, model
  • The purpose of Bayesian statistics is to determine XX probabilities based on YY probabilities and new information posterior, prior
  • EDA type: Simplest for analyzing data. For one variable univariate
  • EDA type: Used to find out if there are relationship between two variables bivariate
  • Regression evaluation metric: SUM(|y-y*|) / N = MAE
  • Regression evaluation metric: SUM((y-y*)^2) / N = MSE
  • Regression evaluation metric: SQRT(SUM((y-y*)^2) / N) = RMSE
  • Transform a linear regression model by adding a XX activation function to output between 0 and 1 to use as a logistic regression model sigmoid
  • By adding a XX term to the loss, overfitting can be combated regularization
  • By adding XX to the model training, overfitting can be combated early stopping
  • XX regularization will keep the weight values smaller L2
  • XX regularization will keep the model sparser L1
  • [True/False] You can not use L1 and L2 regularization at the same time False
  • XX can be seen as an equivalent replacement for YY regularization, and can therefore be used instead since it is computationally cheaper. early stopping, L2
  • For logistic regression, use a XX plot of bucketed bias to find slices where your model performs poorly calibration
  • [I., Ta., Te., V.] Raw data datatypes supported by AutoML image, tabular, text, video
  • AutoML default data split in %: training, validation, test 80, 10, 10
  • XX is the square of the Pearson correlation coefficient between the observed and predicted values. A higher value indicates a YY quality model. R^2, higher
  • Classification Metric: Area under the precision recall curve PR AUC
  • Classification Metric: Area under the receiver operating characteristic curve ROC AUC
  • Classification Metric: Cross entropy between the model predictions and the target values log loss
  • Classification Metric: Harmonic mean of precision and recall F1 score
  • In Vertex AI, batch prediction is XX, meaning that the model will not return a result until it has processed all prediction requests. asynchronous
  • In Vertex AI, online prediction is XX, meaning that the model will quickly return a prediction, but only accepts one prediction request per API call. synchronous
  • AutoML tabular minimum requirements; Number of columns, rows 2, 1000
  • AutoML tabular maximum requirements; Number of columns, rows [millions], data size 1000, 100, 100 GB
  • In BQML, XX is very computationally expensive and can therefore only be done on a flat-rate plan matrix factorization
  • RMSE is bad for categorical data, use XX instead cross-entropy
  • XX is the process of taking a small subset of the data for each step. This reduces memory usage and is easier to parallelise. mini-batching
  • [True/False] Batch gradient descent uses a mini-batch and not the full data False
  • [Directly, Indirectly] Performance metrics should be XX connected to business goals while loss functions can be YY connected to the business goals. directly, indirectly
  • Other notation for False positive (FP) = XX error type I
  • Other notation for False negative(FN) = XX error type II
  • [Overfitting/Underfitting] Be aware of XX as you increase model complexity overfitting
  • ML technique: Iteratively takes a different subset of the data for validation until all data has been used, then averages the performance results. cross validation
  • Strategy for splitting data: Use when you have a lot of data fixed splitting
  • Strategy for splitting data: Use when you have a small amount of data cross validation
  • You want a XX dataset when you build a model so you can quickly test the whole development pipeline small
  • Careful with what field you split your data on, it might become an unusable XX target
  • Create a repeatable 80% dataset in BQ. WHERE XX( YY( ZZ(date)), AA) < BB MOD, ABS, FARM_FINGERPRINT, 10, 8
  • Tensorflow is an open-source high-performance library for numerical computations that uses XX directed graphs
  • Tensorflow uses directed graphs since it makes it more XX and can be easily adapted to another device or language. portable
  • The lowest level of Tensorflow is called Core Tensorflow (XX), and you [can/can not] add your own code here C++, can
  • [S., D., C., V.] When you create a tensorflow tensor, you specify a XX, the YY, and if it is ZZ or a AA shape, data, constant, variable
  • Tensorflow: Records operations for automatic differentiation gradient tape
  • tf.data.Dataset allows you to; Create XX from in-memory dicts and list of YY and out-of-memory ZZ data files. data pipelines, tensors, sharded
  • tf.data.Dataset allows you to; Preprocess data in XX and YY results of costly operations parallel, cache
  • TF dataset consisting of; contains one or more text files, contains TFRecords, one or more binary file TextLineDataset, TFRecordDataset, FixedLengthRecordDataset
  • [<shortest>, <longest>] In tensorflow to create a dataset from in-memory tensors, use tf.data.Dataset.XX (for one element in dataset) or tf.data.Dataset.YY (for many elements in the dataset) from_tensors, from_tensor_slices
  • With XX + multithreaded loading, the CPU threads keep preparing the data for the next batch while the GPU works prefetching
  • Feature column API take care of packing the inputs into the input vector of the model e.g by automatic XX categorical input values one-hot encoding
  • Feature column API function to create a categorical column: If you know the keys beforehand -> tf.feature_column.categorical_column_with_XX vocabulary_list
  • Feature column API function to create a categorical column: If your data is already indexed -> tf.feature_column.categorical_column_with_XX identity
  • Feature column API function to create a categorical column: If you do not have a vocabulary of all possible values -> tf.feature_column.categorical_column_with_XX hash_bucket
  • Tensorflow can directly operate on sparse tensors -> saves XX and YY time memory, computation
  • [Sparse/Dense] tf.feature_column.embedding_column represents data as a lower-dimensional XX vector dense
  • tf.keras.layers.Discretization turns continuous numerical features into XX data with discrete ranges. bucket
  • tf.keras.layers.XX turns String categorical values into an encoded representation that can be read by an embedding layer or Dense layer StringLookup
  • tf.keras.layers.XX turns Integer categorical values into an encoded representation that can be read by an embedding layer or Dense layer IntegerLookup
  • [In/Outside of, N., R.] When running on a TPU, you should always place preprocessing XX the tf.data pipeline, except for the YY and ZZ operations, which run well on a TPU and are common in the first layer of an image model. in, normalization, rescaling
  • Activation function: f(x) = max(0, x), popular since 10 times faster than sigmoid ReLU
  • A problem with the normal ReLU activation function is that a layer can die if it only gets inputs XX 0 <
  • Activation function: Combines sigmoid and ReLU to make it smooth softplus
  • Activation function: ReLU but lets in some when the input is <0 leaky ReLU
  • Activation function: ReLU but has a parameter that controls how much gets in when the input is <0 parametric ReLU
  • A Keras sequential model consists of XX layers and has YY input and ZZ output stacked, one, one
  • An example of a non-sequential DNN is a model with XX connections or a model with YY branches residual, multi
  • [Overfitting/Underfitting] The deeper the DNN network is, the more prone it is to XX overfitting
  • The ADAM optimizer is famous for being computationally efficient, giving it low XX requirements memory
  • [N., S.] The ADAM optimizer has problems with XX or YY gradients noisy, sparse
  • To train a keras.model, use history = model. fit
  • Keras: Datatype of predictions where, predictions = model.predict(input_samples, steps=1) numpy array
  • [Sparse/Dense, Correlated/Independent] Linear models are good for XX and YY features. DNNs are good for ZZ and AA features. sparse, independent, dense, correlated
  • In Keras, the XX API is more flexible than the sequential API since it can handle non-linear topologies functional
  • If a layer is XX, both models' training data will help to train the layer, i.e. less data is required shared
  • [Lower/Higher] L1 regularization has a XX chance of making weights 0 higher
  • Tensorflow in Vertex AI can be used to do XX training distributed
  • For submitting a Vertex AI custom job using CLI, the python-XX flag is a comma-separated list of Cloud Storage URIs specifying the Python package files used to set up models. The maximum number of Cloud Storage URIs is YY package-uris, 100
  • For submitting a Vertex AI custom job using CLI for distributed training, specify multiple XX flags in the call compared to one for non-distributed. worker-pool-spec
  • For submitting a Vertex AI custom job using CLI, command-line arguments XX the commands in the config.yaml override
  • [True/False] ML systems can easily build up technical debt True
  • MLOps level of automation: Build and deploy manually 0
  • MLOps level of automation: Automate the training phase 1
  • MLOps level of automation: Automate training, validation and deployment 2
  • Build your own container using XX and retrieve code directly from GitHub, Cloud Source Repositories and Artifact Registry cloud build
  • Orchestrates multiple containers, handles load balancing and adapts to declared state kubernetes
  • [True/False] Kubernetes supports both stateless applications such as nginx & Apache web servers, and stateful applications which store session data persistently. True
  • Managed service for Kubernetes within Google Cloud Google kubernetes engine
  • [U., R.] Google Kubernetes Engine can XX and YY nodes automatically. upgrade, repair
  • GCP Service: Fully fledged VM on GCP compute engine
  • GCP Service: Enables stateless containers via web request cloud run
  • A problem with autoscaling, called XX, is that the number of deployed replicas fluctuates since the metric used to spawn them is fluctuating. This can be combated using a YY for the deployment. thrashing, cooldown
  • Deployment strategy: Like blue green but it is rolled out gradually canary deployment
  • Deployment strategy: Two versions, but only one towards the users while their interactions are mirrored in the other version shadow testing
  • [True/False] Kubeflow/TFX can only be used with the Tensorflow ML framework False
  • Pre-built Kubeflow pipelines or pipeline components can be found and shared at the public XX AI hub
  • Compared to GKE, using XX, only one click is required to setup a ML pipeline vertex AI pipelines
  • MLOps framework: Lower-level, direct control of Kubernetes resources control kubeflow
  • MLOps framework: Higher-level abstractions. Prescriptive but customisable components with pre-defined ML types TFX
  • TFX brings Google's best practices for robust and XX ML workloads scalable
  • Using TFX XX, you can answer what data was used to train a specific model and what statistics the model has. lineage tracking
  • The top 3 features of Vertex AI pipelines are; 1) XX orchestration, 2) Rapid, reliable, repeatable YY 3) ZZ and re-use components workflow, experimentation, share
  • Vertex AI pipelines provides a visual XX interface where each block is an ML task that can be done in sequence and in YY. graphical, parallel
  • Google's cloud ML Python package to capture metrics for different hyperparameters, good for hyperparameter tuning. Import XX hypertune
  • For making a hyperparameter job, it is important to have the hyperparameters as XX to the model training. input parameters
  • When pushing a trained model to Vertex AI, it is important to first create a model XX, then a model YY, and lastly you can get predictions from it. object, version
  • [Dw., Dc.] Kubeflow pipelines containerizes implementations of ML tasks which can invoke services such as XX and YY. dataflow, dataproc
  • The steps in a Kubeflow pipeline can programmatically be specified via the XX SDK Python
  • In an ML pipeline, it is common to have a metric XX which allows the model to be deployed to production threshold
  • Kubeflow component type: Just load the component from its description and compose pre-built
  • Kubeflow component type: The containerization is done for you, but you write the code in Python lightweight
  • Kubeflow component type: Write the component code and package it into a Docker container custom
  • Kubeflow Python function to wrap a function func into a prebuilt Docker container to make a lightweight Kubeflow component. kfp.components.XX(func, base_image=BASE_IMAGE) func_to_container_op
  • [True/False] To make a custom Kubeflow component, you can only use Python to create the model False
  • TFX is designed to make ML workflows XX between different environments portable
  • TFX deployment target: High performance servers for batch and streaming inference TF serving
  • TFX deployment target: For inference on IoT and mobile devices TF lite
  • TFX deployment target: For deployment to low latency web applications TF JS
  • TFX deployment target: For sharing models and transfer learning TF hub
  • A TFX component is an implemented ML XX task
  • A TFX component produces and consumes XX artifacts
  • A data aware TFX pipeline can speed up model retraining by checking if XX is necessary between runs, or if results can be fetched from cache. recomputation
  • TFX standard component: Entry point for data ingest. Support splitting and partitioning. Inputs CSV, TF Records, Avro, Parquet. Outputs: TF YY and TF sequence YY example gen, examples
  • TFX standard component: Performs complete pass on data to gather summary statistics, e.g per feature. This includes; Mean, Standard deviation, Quantile ranges, null counts statistics gen
  • TFX standard component: Automatically generates schemas for TFX. Can also be used to create protobuffers schema gen
  • TFX standard component: Identifies anomalies in data and visualizes them. For example, it can detect YY by comparing training and serving data and detect ZZ by looking at series of data for different data splits. example validator, train-serving skew, data drift
  • TFX standard component: Does preprocessing such as normalization, feature engineering, tf.Transform operations. transform
  • By bringing in XX to your TF graph, you can reduce train-serving skew from differences in feature engineering, which is one of the largest sources of error in production ML systems. feature engineering
  • TFX standard component: Trains a TF model. Produces at least one saved TF model which can be shared. trainer
  • XX is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search for a Keras model keras tuner
  • TFX standard component: Uses Keras tuner API to tune hyperparameters. Outputs a hyperparameter artifact. Typically, you only run it one time. tuner
  • TFX standard component: Visualizes model evaluation. Outputs evaluation metrics and a model blessing to show if it is fit for production. evaluator
  • TFX standard component: Validates the model in the model infrastructure. Prevents bad models from being pushed. Output a model blessing if it is fit for production. infraValidator
  • TFX standard component: Pushes the model to production. Inputs a model blessing. Can deploy to different target e.g TF lite, TF JS, TF serving pusher
  • TFX standard component: Does batch inference on TFRecords on an exported model. Can do it remote in the cloud or local in-memory bulkinferer
  • [First/Latest] If no model has been blessed in the TFX pipeline, the Resolver node will make the XX model blessed first
  • TFX library: Monitor ML development at scale. See summary statistics, data distributions, compare different datasets, do anomaly detection and automatically generate schemas. tensorflow data validation
  • TFX library: Preprocess and do feature engineering with TF. Useful for distributed compute and can use Apache Beam. Will automatically adapt to TF/ML YY practices. tensorflow transform, best
  • TFX library: Visualize data about ML experimentation. Graph of metrics, view of weights, and can easily be shared. tensorboard
  • TFX library: Do model evaluation. Can incorporate Fairness indicators for responsible AI development. Can view common AI fairness metrics. tensorflow model analysis
  • TFX evaluation libraries: XX during training, YY after training and is [less/more] granular tensorboard, tensorflow model analysis, more
  • ML experiments typically starts in a XX instance notebook
  • TFX ML orchestrators are XX and therefore, the pipelines can run both on-prem and on GCP portable
  • No matter what TFX orchestrator you choose, TFX will produce the same standard XX for the graph. DAGs
  • [A. A., K. P., A. B.] TFX supports the following orchestrators; apache airflow, kubeflow pipelines, apache beam
  • GCP fully managed implementation of Apache airflow cloud composer
  • On GCP, TFX runs on XX pipeline which in turn runs on YY kubeflow, google kubernetes engine
  • A TFX custom component can be created by making a: XX function, YY, or by ZZ existing component classes. python, container, extending
  • To create a TFX custom component from a Python function, the XX decorator and input/output YY are needed. @component, type hints
  • By using TFX custom components from XX, languages other than Python can be used to define them. containers
  • [True/False] You can extend TFX components to work with other database systems such as presto, hive, snowflake or Oracle. True
  • TFX part which contains: trace on what data was used per model. Caches outputs of components so they do not have to be rerun. Enables retraining from checkpoint. metadata store
  • XX is the process when the weights are not set to random values, but instead taken from a previous trained model. Is supported in TFX thanks to the metadata store. warm starting
  • You always want to XX your training applications so you do not have to worry about dependencies, can use them in Kubeflow, and make them portable between runtime environments. containerize
  • Process of containerizing a PyTorch, Scikit, and XGBoost application: 1) Create model XX script, 2) Create YY, 3) Build the image and push to ZZ training, Dockerfile, container registry
  • [More/Less] For continuous training, if deterioration is fast we should retrain XX frequently. more
  • A challenge for continuous training is to find the retraining interval which achieves the XX but keeps the YY down. business requirements, cost
  • A downside of Apache Airflow is that setup, logging, management can be tedious and XX. This is something that GCP's YY tries to combat. time consuming, cloud composer
  • Apache Airflow component: Is represented by a node in your DAG. Is an implementation of an operator. task
  • Apache Airflow component: Performs an action or tells another system to perform an action. Can also be set up as a sensor that keeps running until a criterion is met. operator
  • An operator in an Apache Airflow task can do the common operations of executing XX commands, calling arbitrary YY functions, or calling other ZZ services. bash, python, GCP
  • Apache Airflow: Operators that do nothing but show up on the graph for completeness dummy operators
  • Apache Airflow: Operator which checks that the values of metrics given as a SQL expression are within a certain tolerance of values in BigQuery. BigQueryXXOperator IntervalCheck
  • Apache Airflow: Operator which checks that the result of a query is within a certain tolerance of an expected pass value. BigQueryXXOperator ValueCheck
  • In Apache Airflow, if an operation fails, you can XX the whole operation or just send a message to YY fail, pub/sub
  • Open source framework from Databricks to standardize the data prep/training/deploy loop MLflow
  • [Coupled/Decoupled] In GCP, compute and storage are XX decoupled
  • A regression model that uses L1 regularization techniques is called a XX regression lasso
  • A regression model that uses L2 regularization techniques is called a YY regression ridge
  • Main advantage of using TFRecords: Fast XX since it is a sequence of bytes, ease of YY the data, which makes it good to ZZ the dataset over multiple workers. loading, shuffling, distribute
  • XX is the process when the data is read at the same time as the training. parallel interleave
  • Keras layer which stacks the input to a 1-dimensional vector. flatten
  • TF metric which tells you how often the predictions are equal to the labels. tf.keras.metrics.XX Accuracy
  • TF metric which approximates the area under the ROC or Precision/Recall curve. It measures the quality of a binary classifier. tf.keras.metrics.XX AUC
  • [True/False] Although Google is trying to migrate from AI Platform to Vertex AI, some CLI commands still say ai-platform True
  • Worker mode: Every worker can work by themselves and do not have to synchronize. Is not good if the workers [are/are not] equal in performance. async, are not
  • Worker mode: Every worker is in sync with each other. This [is/is not] the recommended architecture. sync allreduce, is
  • TF distributed training strategy: Synchronize one machine with many accelerators. Creates a replica of the model on each GPU. Data distribution and gradient updates are handled automatically. mirrored
  • TF distributed training strategy: Synchronize one machine with many TPU cores. Creates a replica of the model on each TPU core. Data distribution and gradient updates are handled automatically. TPU
  • [Can/Can not] You XX train with current data and predict with stale data can not
  • If using GCP products for BQ, try to use XX connectors rather than building your own pipeline. pre-built
  • [RMSE/MAE] XX is more sensitive to outliers than YY RMSE, MAE
  • In AutoML, the confusion matrix is only available for classification models with XX or fewer values for the target column. 10
  • For a recommendation system, XX feedback is feedback the users give indirectly, such as time on website. implicit
  • For a recommendation system, XX feedback is feedback the users give directly, such as giving ratings on a product explicit
  • [The same/Differently] A recommendation system is trained XX depending on if it uses explicit or implicit feedback differently
  • Average rank aka XX is the most common metric for implicit matrix factorization (recommendation system) mean percentile rank
  • [Production/Test] It is better to fail at the XX stage than to fail at the YY stage. test, production
  • You should avoid the Keras XX() function when working with lookup layers with very large vocabularies. This function sets the state (trainable/nontrainable) of a preprocessing layer. adapt
  • For the Keras Functional API, unlike the Keras Sequential API, we have to provide the XX of the input to the model. shape
  • [Fastest/Slowest] For submitting a job with gcloud ai custom-jobs, you will get the XX performance using a single-region bucket in the same location, compared to the default of a multi-region fastest
  • XX is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them, being secure. federated learning
  • [Words/Sentences] Do sentiment analysis on XX rather than YY. sentences, words
  • GCP no-code solution for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning dataprep
  • The XX curve is an appropriate metric for imbalanced classification when the output can be set using different thresholds precision-recall
  • Cloud Composer is not a cost-efficient solution for one pipeline because its environment is always XX active
  • [Accelerator type] Require high-precision arithmetic -> always XX GPU
  • MirroredStrategy uses the memory of one instance, while MultiWorkerMirroredStrategy has its own memory per worker -> more XX to hold the dataset memory
  • TPU nodes are not recommended unless XX by the application. required
  • Cloud Data Loss Prevention API can be used to detect XX in a dataset PII
  • Hashing is an irreversible transformation that ensures XX and [does/ does not] lead to an expected drop in model performance because you keep the same feature set while enforcing referential integrity anonymization, does not
  • A large increase in loss is typically caused by anomalous values in the input data that cause XX traps or YY gradients NaN, exploding
  • [Does/Does not] A learning rate schedule that is not tuned typically shows a loss that starts oscillating after some steps but XX jump back to the top. does not
  • Regularization reduces XX requirements by pushing the weights for meaningless features to 0. Also, regularization tends to cause the training error to YY RAM, increase
  • PCA is a valid feature selection method only if the most important variables are the ones that have the most XX variation
  • XX can be used to identify features which are not highly correlated to the target, which can be removed to [increase/reduce] model complexity correlation analysis
  • Ensuring that categorical features are one-hot encoded and that continuous variables are binned, and creating feature crosses for a subset of relevant features, will make the model converge [faster/slower] but increases model YY requirements, and it [is/is not] expected to boost model performance because neural networks inherently learn AA. faster, RAM, is not, feature crosses
  • [Should/Should not] Vertex AI Vizier XX be used for systems that do not have a known objective function or are too costly to evaluate using the objective function. should
  • [C., T. T.] Vizier requires sequential trials and does not optimize for XX or YY. cost, tuning time
  • [True/False] Running tuning locally does not optimize for reproducibility and scalability True
  • [effective/inefficient] Grid Search is XX for high-dimensional spaces in time, cost, and computing power. inefficient
  • Grid Search is a brute-force approach and it is not feasible to fully XX parallelize
  • Hyperparameter search method: Can limit the search iterations on time and parallelize all trials random search
  • [Does/Does not] RMSE XX penalize high variance as much as MSE because the root operation reduces the importance of higher values. does not
  • A XX approach means that the model is split between workers. You can use TensorFlow YY to implement this. model-parallel, mesh
  • [GCP service] If you need a low latency feature extraction from BQ, export it to XX memorystore
  • [True/False] Memorystore is a fully managed service True
  • Vertex AI Model Monitoring is a fully managed solution for monitoring XX that, by definition, requires minimal YY. training-serving skew, maintenance
  • [True/False] Model retraining fixes training-serving skew. False
  • Post-training XX is the recommended option for reducing model latency when re-training is not possible. quantization
  • Pruning helps in compressing model size, but it is expected to provide less XX improvements than quantization. latency
  • Clustering helps in compressing model XX, but it [does/does not] reduce latency. size, does not
  • XX with YY improves performance on the minority class while speeding up convergence and keeping the predictions calibrated. downsampling, upweighting
  • [Increasing/Decreasing] XX the model’s complexity boosts the predictive ability of the model, which is expected to optimize loss convergence when underfitting. increasing
  • [May/May not] Canary deployments XX affect user experience. may
  • Multi-armed bandit deployment approach may affect user experience, even if on a small subset of users. This approach could cause XX when moving between services. downtime
  • Duplicating the preprocessing adds unnecessary dependencies between the training and serving code and could cause XX. training-serving skew
  • [True/False] For data type input error, it is better to combine all preprocessing steps in a function, and update the default serving signature to accept the input data type wrapped into the preprocessing function call. True
  • [Increases/Decreases] Self-managed -> XX running cost decreases
  • [Overfitting/Underfitting] Training loss down but validation loss is going up -> XX overfitting
  • XX-learning is a model-free reinforcement learning algorithm Q
  • [Q, D, D P G] The main reinforcement algorithms are X and YY Q-learning, deep deterministic policy gradient
  • [Supervised/Unsupervised] K-Nearest Neighbors (KNN) is a XX ML method supervised
  • [True/False] BQML ARIMA_PLUS can manage time-series forecasts and automatically handle anomalies and seasonality. True
  • [True/False] Linear regression can not cut of seasonality False
  • Not all frauds are caused by strange movements (outliers) -> use XX models to decipher them, e.g. XGBoost complex
  • For unsatisfactory medical models, it is better to deploy a XX model with a classification threshold rather than to try to reduce overfitting in a DNN. logistic regression
  • If you already have an ML workflow consisting of containers, it is better to use XX rather than cloud composer kubeflow pipelines
  • If you do not have a fairly uniform distribution, you can use XX scaling which is able to compress data ranges into log(x) log
  • XX is similar to scaling but uses the number of standard deviations each value lies from the mean to scale. z-score
  • Is a model trained on outputs of many different models for the same training data metamodel
  • XX, create many different models for the same data and use the combined output. ensemble
  • Embeddings are used for XX data categorical
  • When triggering on uploads of data, it is better to use a Cloud Storage trigger for XX than to use Pub/Sub cloud functions
  • A XX is a deep learning model that can give a different importance to each part of the input data. transformer
  • XX Cloud TPUs are approximately 70% cheaper compared to normal cloud TPUs preemptible
  • [Correlated/Uncorrelated] Partial least squares creates new variables that are XX uncorrelated
  • Maximum Likelihood estimator requires XX for variables independence
  • [Does/Does not] Scale-tiers XX require the application to be containerized does not
  • XX is a tool to check the performance of TF models, helping to obtain an optimized version. TF Profiler
  • [True/False] k-anonymity anonymizes the data in such a way that it is impossible to identify person-specific information but you maintain all the information contained in the record True
  • Bagging and boosting are examples of XX ensemble learning
  • [Dataprep/Data fusion] Build visual data pipelines for integrating data data fusion
  • [Dataprep/Data fusion] Interact with the content of data to iteratively refine and combine it. dataprep
  • XX is a lifelike conversational AI with state-of-the-art virtual agents. It has two versions; XX YY (advance) and XX ZZ (standard) dialogflow, CX, ES
  • XX is Google Cloud's multi/hybrid cloud solution anthos
  • [TFX/Kubeflow] XX gives you more control over the whole dev to prod life-cycle compared to YY TFX, Kubeflow
  • Decision trees are explainable as they are and do not need to use Vertex XX explainable AI
  • [Parametric/Nonparametric] K-nearest neighbours and Decision trees are examples of XX algorithms nonparametric
  • XX is a service which provides engineer-to-engineer assistance for both GCP and Tensorflow and is free for big enterprises using GCP. tensorflow enterprise
  • [True/False] Tensorflow I/O can directly read some file formats, such as Parquet into a TF model True
  • Naive Bayes and K-Nearest Neighbours are examples of XX learning lazy
  • Tensorflow XX is a Python library for statistical analysis and probability which can be processed on TPU and GPUs probability
  • [True/False] CNNs are supported by BQML since it can store image data False
  • [True/False] TFX and Kubeflow are managed services False
  • [E., S., P., G] You can save cost by; use notebooks as XX instances, setup an automatic YY routine, use ZZ VMs, get monitoring alerts about AA usage. ephemeral, shutdown, preemptible, GPU
  • The XX tool is an open source tool that can show you which features affect your model the most. It also lets you interactively try new inferences what-if
  • Language Interpretability Tool (LIT) is an open source tool developed specifically for the explanation and visualization of XX processing models NLP
  • XX is an explainability technique for deep neural networks which gives info about what contributes to the model’s prediction. integrated gradients
  • What-If Tool is for structured data, not XX images
  • Array and Struct transformations are not available in AutoML but are in XX BQML
  • XX is for multi-class classification what Sigmoid is for logistic regression softmax
  • BigQuery I/O connector is the way to connect directly to BigQuery from XX dataflow
  • [True/False] You can do canary deployment with solely cloud build False
  • [B, C. S] Avoid storing ML data in block storage like filestore, use XX and YY instead BigQuery, Cloud Storage
  • In Vertex AI, there are two types of logs; XX logging which logs data connected to the container, YY logging which logs access and latency information container, access
  • [On/Off] It is important to turn XX eager mode, which lets you execute operations one by one for a Tensorflow model, before deploying to production. off
  • Vertex AI datasets manages CSV files automatically, but you need to have a header with only XX characters, blank spaces written as YY, and ZZ as the delimiter. alphanumeric, underscore, comma
  • [True/False] XRAI is an optimization of the integrated gradients method True
  • XX can be calculated and used for an affinity system trained with a small amount of data cosine similarity
  • If you export a Vertex AI dataset, no additional copies of the data are generated; only a XX file with the cloud storage YY is given. JSON, URIs
  • Cloud composer is for XX, not transformation orchestration
  • GCP service: XX is good to use for cleaning a dataset dataprep
  • [True/False] You can import a tensorflow model to BigQuery if the model type is supported by BigQuery. True
  • To send out messages for predictions to users, build a notification system on XX and use the XX Cloud Messaging server to send the notifications firebase
  • Better to store metadata information about BQ tables in the XX data catalog
  • [Is/Is not] If security is important, employing and deploying directly from AI platform prediction XX an option is not
  • Search the XX before making your own feature feature store
  • [Is/Is not] It XX recommended to always train with checkpoints and save them in cloud storage with a folder for each experiment is
  • If using Vertex AI pre-built containers, ensure that the model artifact has exactly the following filename: TF, Scikit-learn [YY or ZZ], XGBoost, PyTorch saved_model.pb, model.joblib or model.pkl, model.bst, model.pth
  • If possible, recommended to use XX pipelines to orchestrate the ML workflow vertex AI
  • Do train-serving skew detection by setting up a model XX job monitoring
  • Do data drift detection by turning on XX which does not require access to the source data drift detection
  • You can also use XX in Vertex Explainable AI to detect data drift or train-serving skews. feature attributions
  • For big data, use XX for EDA rather than notebooks BigQuery
  • By default, Dataflow assigns both public and private IP addresses to workers. However, the XX IP can be disabled to boost security public
  • [True/False] Simple models might not train faster with GPUs or distributed training since they do not benefit from hardware parallelism True
  • [Does/Does not] Scikit-learn XX support distributed training does not
  • [True/False] For small data sizes, it is better to use a high end machine rather than a distributed set of machines True
  • Asynchronous distributed training with powerful GPUs requires a lot of network XX bandwidth
  • If the request volume fluctuates, it can be a good idea to XX the instance scaling fix
  • Use XX encoding rather than array of floats to encode images base64
  • [True/False] You can not upload a TFX or Kubeflow SDK pipeline to AI platform pipelines False
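
Several of the numeric cards above (RMSE vs MAE, log scaling, z-score, softmax vs sigmoid, cosine similarity) can be checked by hand. The following is a minimal sketch in plain Python, not taken from the exercise itself; all function names and the sample numbers are made up for illustration, and the z-score here assumes the population standard deviation.

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: every error counts linearly.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: squaring makes large errors dominate.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def log_scale(values):
    # Log scaling (base 10 chosen for illustration) compresses wide ranges.
    return [math.log10(v) for v in values]

def z_scores(values):
    # Number of (population) standard deviations each value lies from the mean.
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    y_true = [10.0, 10.0, 10.0, 10.0]
    clean = [11.0, 9.0, 11.0, 9.0]       # all errors have size 1
    outlier = [11.0, 9.0, 11.0, 30.0]    # one error of size 20
    print("clean:   MAE", mae(y_true, clean), "RMSE", rmse(y_true, clean))
    print("outlier: MAE", mae(y_true, outlier), "RMSE", rmse(y_true, outlier))
    print("log scale:", log_scale([1, 10, 100, 1000]))
    print("z-scores:", z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
    # With two classes and logits [x, 0], softmax reduces to sigmoid(x).
    print("softmax vs sigmoid:", softmax([1.5, 0.0])[0], sigmoid(1.5))
    print("cosine:", cosine_similarity([1, 2, 3], [2, 4, 6]))
```

With the single outlier, MAE rises from 1 to 5.75 while RMSE rises from 1 to roughly 10, which is the sense in which RMSE is more sensitive to outliers than MAE.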

