Symbolic Machine Learning
Lecturers: Jiří Kléma, Filip Železný
Teaching assistants: Jáchym Barvínek, Ondřej Hubáček, Petr Ryšavý, Martin Svatoš
Tutorials
Exam
06.06.2018
(5 pnts) Difference between PAC-learning agent and mistake-bound agent.
(10 pnts) Version-space agent. Two agents with different hypothesis spaces are given: the first uses all possible 3-conjunctions (non-negated) of n variables, the second all n-conjunctions of positive and negative literals.
(3 pnts) For each agent: does it learn online?
(3 pnts) For each agent: does it learn efficiently?
(4 pnts) For the first agent: given the first negative observation (0,1,1,1,…,1), what will be the agent's decision on the next observation (0,1,0,1,…)?
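For context on this kind of question, below is a minimal Python sketch of the standard online elimination learner for monotone conjunctions (the flavour of hypothesis space the first agent uses); the function names and the toy data are illustrative, not taken from the exam.

  # Elimination learner for monotone conjunctions, run online: start with
  # the conjunction of all n variables and drop every variable that is 0
  # in a positive example; mistakes are counted along the way.
  def predict(hypothesis, x):
      return all(x[i] == 1 for i in hypothesis)

  def eliminate(stream, n):
      hypothesis = set(range(n))          # most specific monotone conjunction
      mistakes = 0
      for x, y in stream:
          if predict(hypothesis, x) != y:
              mistakes += 1
          if y == 1:                       # only positive examples shrink it
              hypothesis = {i for i in hypothesis if x[i] == 1}
      return hypothesis, mistakes

  # toy usage: target = x0 AND x2 over n = 4 variables
  examples = [((1, 1, 1, 0), 1), ((1, 0, 1, 1), 1), ((0, 1, 1, 1), 0)]
  print(eliminate(examples, 4))            # -> ({0, 2}, number of mistakes)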
(15 pnts) Relative Least General Generalization (rlgg). Given background knowledge B = {half(4,2), half(2,1), int(2), int(1)}. What will be the rlgg of o1 = even(4) and o2 = even(2) relative to the background?
(10 pnts) Apply the algorithm, draw the tables and the theta (substitution) functions.
(5 pnts) Make a reduction step relative to B. Why is it needed?
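As a reminder of the mechanics (a sketch, not the graded solution): rlgg(o1, o2) relative to B is the lgg of the two clauses o1 ← B and o2 ← B, built from Plotkin's lgg of atoms, where each pair of differing subterms is mapped to one shared fresh variable. A minimal Python sketch of that anti-unification step, with an ad hoc term representation:

  # Plotkin's lgg (anti-unification) on atoms; a compound term is a tuple
  # (functor, arg1, ..., argk), constants are plain values. `table` keeps
  # one shared variable per pair of differing subterms.
  def lgg_term(s, t, table):
      if s == t:
          return s
      if isinstance(s, tuple) and isinstance(t, tuple) \
              and s[0] == t[0] and len(s) == len(t):
          return (s[0],) + tuple(lgg_term(a, b, table)
                                 for a, b in zip(s[1:], t[1:]))
      return table.setdefault((s, t), "V%d" % len(table))   # fresh variable

  def lgg_atom(a1, a2, table):
      if a1[0] != a2[0] or len(a1) != len(a2):
          return None                       # incompatible predicates
      return (a1[0],) + tuple(lgg_term(s, t, table)
                              for s, t in zip(a1[1:], a2[1:]))

  # toy usage on atoms of the kind appearing in this exercise
  table = {}
  print(lgg_atom(('even', 4), ('even', 2), table))        # ('even', 'V0')
  print(lgg_atom(('half', 4, 2), ('half', 2, 1), table))  # ('half', 'V0', 'V1')

The reduction step is needed because the raw lgg contains one literal per compatible pair of body literals (up to m·n of them), most of which are redundant relative to B under theta-subsumption.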
(10 pnts) Bayesian networks.
(2 pnts) Find an optimal, efficient, complete network (something like Season → Temperature, with Temperature having two children: Ice Cream Sales and Heart Attack Rate).
(2 pnts) Then compute the CPTs (conditional probability tables).
(3 pnts) Compute Pr(Spring|Good Ice Cream Sales, No Heart Attack)
(3 pnts) Compute Pr(Heart Attack|Winter, Bad Sales).
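Assuming the sketched structure Season → Temperature → {Ice Cream Sales, Heart Attack Rate}, and writing sales for good sales and ha for heart attack, the two queries factor as follows (a reminder of the standard enumeration, not the graded numbers):

  P(\mathit{spring} \mid \mathit{sales}, \neg\mathit{ha})
    \propto \sum_{t} P(\mathit{spring}) \, P(t \mid \mathit{spring}) \, P(\mathit{sales} \mid t) \, P(\neg\mathit{ha} \mid t)

  P(\mathit{ha} \mid \mathit{winter}, \neg\mathit{sales})
    = \frac{\sum_{t} P(t \mid \mathit{winter}) \, P(\neg\mathit{sales} \mid t) \, P(\mathit{ha} \mid t)}
           {\sum_{t} P(t \mid \mathit{winter}) \, P(\neg\mathit{sales} \mid t)}

The first expression is normalized by evaluating it for all four seasons and dividing by their sum.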
(5 pnts) Q-learning. Five short questions; answer True/False and provide your reasoning.
(1 pnt) Can Q-learning be extended to an infinite state or action space? How would it handle this?
(1 pnt) Does Q-learning use on-policy update? What is the difference from off-policy update?
(1 pnt) Does Q-learning always converge? If so, is it conditioned by anything? By what?
(1 pnt) Is Q-learning just an instance of temporal difference learning? If not, what is different?
(1 pnt) What is the difference between Q-learning and direct utility estimation or adaptive dynamic programming? Which is better?
(5 pnts) Q-learning representation.
There is a robot moving in a swimming pool; it can move along any of the 3 dimensions and has exactly one propeller for each dimension. It can also move at two different speeds. There is a treasure at a specific place and depth, and there are mines at some places as well. If the robot hits a mine or a wall, it restarts at a random position.
(3 pnts) Describe the states, actions, and rewards of this specific game. You may provide two different representations.
(2 pnts) Describe the Q-learning representation, the update rule, and the gamma and alpha values. How are the Q values defined?
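A minimal tabular Q-learning sketch for reference; the pool-robot state/action encoding (discretized position, chosen propeller and speed, rewards for treasure/mine/wall) is whatever the answer defines, this only shows the generic epsilon-greedy selection and the update rule with alpha and gamma:

  import random
  from collections import defaultdict

  Q = defaultdict(float)                  # Q[(state, action)] -> estimated value

  def choose_action(state, actions, epsilon=0.1):
      # epsilon-greedy exploration over the available actions
      if random.random() < epsilon:
          return random.choice(actions)
      return max(actions, key=lambda a: Q[(state, a)])

  def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
      # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
      best_next = max(Q[(next_state, a)] for a in actions)
      Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])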
31.5.2021
(5 pnts) Probabilities
(1 pnt) Mathematically describe conditional independence.
(2 pnts) If X and Y are conditionally independent given W, and Y and Z are conditionally independent given W, does that imply that X and Z are conditionally independent given W?
(1 pnt) Can a Bayesian network graph contain a cycle?
(1 pnt) Is it true that if the parents of X are given, X is independent of all other variables in the graph? If not, correct this statement.
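For reference, the textbook definition asked in the first sub-question, plus the corrected form of the last statement (given its parents, X is independent of its non-descendants, not of all other variables):

  X \perp Y \mid Z \iff P(X, Y \mid Z) = P(X \mid Z) \, P(Y \mid Z),
  \text{equivalently } P(X \mid Y, Z) = P(X \mid Z) \text{ whenever } P(Y, Z) > 0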
(5 pnts) Factor computation - given a graph, compute a conditional probability.
(10 pnts) Version-space agent. Two agents with different hypothesis spaces are given: the first uses all possible 3-conjunctions (non-negated) of n variables, the second all n-conjunctions of positive and negative literals.
(3 pnts) For each agent: does it learn online?
(3 pnts) For each agent: does it learn efficiently?
(4 pnts) For the first agent: given the first negative observation (0,1,1,1,…,1), what will be the agent's decision on the next observation (0,1,0,1,…)?
(15 pnts) RLGG relative to B
(10 pnts) Compute RLGG relative to B: RLGG(B→X1, B→X2).
(5 pnts) Make a reduction of previous result.
(5 pnts) Some theoretical question: if a concept-learning agent learns, does the PAC agent learn? Something along those lines…
(10 pnts) Reinforcement learning
(5 × 1 pnt) Short questions about ADP and TD learning: briefly describe them, compare their convergence, how fast they are, how they compute utilities…
(4 pnts) Three identical grids are given with different optimal policies and 2 terminal states, one with reward +1 and one with reward -1; the other states have negative rewards r_a in grid A, r_b in grid B, and r_c in grid C. Based on the policies, determine which of these rewards has the largest and which the smallest absolute value, and justify your answer.
(1 pnt) In the grid the agent behaves according to the optimal policy, yet in two different episodes it passes through different sequences of states. Why is that possible?
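A compact reminder of the two update styles the short questions compare (standard passive-learning forms; lecture notation may differ slightly):

  \text{TD(0):}\quad U(s) \leftarrow U(s) + \alpha \left( r + \gamma \, U(s') - U(s) \right)

  \text{ADP:}\quad U^{\pi}(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s)) \, U^{\pi}(s')

ADP learns the transition model from counts and re-solves the Bellman equations, so it typically needs fewer observations but more computation per step; TD needs no model and updates only the visited state.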
2.6.2023 & 6.6.2023
RL
TD learning/Sarsa/Q-learning: calculate the value function, then decide how V/Q changes with different gamma/alpha parameters
Convergence of TD
Definitions of some hyperparameters and of the value function
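For the update-rule part, the Sarsa and Q-learning targets differ only in the bootstrap term (standard forms); gamma scales how strongly the bootstrap term counts, alpha how far each update moves the current estimate:

  \text{Sarsa (on-policy):}\quad Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \, Q(s', a') - Q(s,a) \right)
  \text{Q-learning (off-policy):}\quad Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s,a) \right)

where a' in Sarsa is the action actually taken in s'.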
BN
Given a network, count how many parameters we need
Given a network, determine the independent set R
Definition of conditional independence; proof that a causal chain is conditionally independent
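Two standard facts that cover the first and last items (reminders, not the exact graded wording): a node X with parents Pa(X) contributes (|X| - 1) \prod_{P \in Pa(X)} |P| free parameters, summed over all nodes; and for the causal chain X → Y → Z the proof is one line of algebra:

  P(x, z \mid y) = \frac{P(x, y, z)}{P(y)} = \frac{P(x) \, P(y \mid x) \, P(z \mid y)}{P(y)} = P(x \mid y) \, P(z \mid y)

so X ⊥ Z | Y.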
NLP
Naive Bayes, bag of words (10 pnts)
Word embeddings - what they are, how to obtain them, etc. (4 pnts)
TF-IDF - formula (see the reminders after this list)
Transformers - main idea, diagram
Co-occurrence matrix, PPMI, truncated SVD (10 pnts)
Perplexity - formula
LDA, Dirichlet dist.
RNN diagram
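Reminders of the formula items above (TF-IDF, PPMI, perplexity); the lecture variants may differ in log base and smoothing:

  \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}

  \mathrm{PPMI}(w, c) = \max\left( 0, \; \log \frac{P(w, c)}{P(w) \, P(c)} \right)

  \mathrm{PP}(w_1 \ldots w_N) = P(w_1, \ldots, w_N)^{-1/N} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_{<i}) \right)

where N in tf-idf is the number of documents and df(t) the number of documents containing t, while in perplexity N is the length of the test sequence.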
CoLT
Learning conjunctions, proof that the mistake bound is n+1, describe the algorithm for k-term DNF
Decide whether something is efficiently/properly PAC-learnable if it is a subset of something that is efficiently PAC-learnable
Adapt Winnow/Halving, define |H|, define the mistake bound
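Sketches of the two bounds usually asked here (standard arguments, stated from memory): the elimination algorithm for conjunctions over n Boolean variables starts with all 2n literals, never removes a literal of the target, only errs on positive examples, removes at least n literals on the first mistake and at least one on every later mistake; the Halving algorithm predicts by majority vote over the consistent hypotheses, so every mistake removes at least half of them:

  M_{\mathrm{eliminate}} \le n + 1, \qquad M_{\mathrm{Halving}} \le \log_2 |H|

For k-term DNF the usual trick is to learn it improperly as a k-CNF, i.e. as a conjunction over the O((2n)^k) clauses of at most k literals, to which the elimination learner applies directly.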
23.6.2023
(~10-12 pnts) RL
(2 pnts) True/False. Two policies are given: 1) a not necessarily greedy policy, 2) a greedy policy. Q1 are the Q values of the 1st policy at time t, Q2 the values of the 2nd policy at time t, and Q3 the values of the 2nd policy at time t+1. What is the relationship between V1, V2, and V3?
(~3-4 pnts) Describe Monte Carlo methods and the differences between First-Visit and Every-Visit.
(1 pnt) Describe GLIE.
(~4-5 pnts) Something about bandits.
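Two reminders that usually cover these items: GLIE means every state-action pair is tried infinitely often while the policy becomes greedy in the limit (e.g. epsilon-greedy with epsilon_k = 1/k), and for bandits a typical selection rule is the UCB1 index (constants may differ in the slides):

  a_t = \arg\max_a \left( \bar{x}_a + \sqrt{\frac{2 \ln t}{n_a}} \right)

where \bar{x}_a is the empirical mean reward of arm a and n_a the number of times it has been played.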
(5 pnts) BN
(~16-20 pnts) NLP
(10 pnts) Four sentences are given (as in the sample exam); each sentence is a document.
(4 pnts) Something about softmax and negative log-likelihood (see the formulas after this list).
(4 pnts) Something about CNNs; draw a diagram.
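For the softmax / negative log-likelihood item, the standard forms are:

  \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
  \mathrm{NLL} = -\sum_{(x, y)} \log \, \mathrm{softmax}(f(x))_{y}

where f(x) are the logits and y the true class; minimizing the NLL is the same as maximizing the likelihood, i.e. minimizing cross-entropy against the one-hot target.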
(14 pnts) CoLT
(5 pnts) Define VC dimension, determine VC dimension of given hypothesis class
(8 pnts) Some decision tree was given with depth = 4.
Express the tree as a 4-DNF.
Express the tree as a 3-CNF.
How can we learn k-decision trees in the PAC learning model?
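Reminders for this block (standard definitions, as a study note): the VC dimension of a hypothesis class H is the size of the largest point set that H shatters, i.e.

  \mathrm{VCdim}(H) = \max \{ m : \exists \, x_1, \ldots, x_m \text{ with } |\{ (h(x_1), \ldots, h(x_m)) : h \in H \}| = 2^m \}

For the tree: each root-to-leaf path ending in a 1-leaf is a conjunction of at most depth-many literals, and the disjunction of these terms gives the DNF; the 0-leaf paths dually yield a CNF. Since a depth-k tree is therefore both a k-DNF and a k-CNF, it can be PAC-learned improperly via the standard k-CNF reduction to conjunction learning over the clauses of size at most k.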
Sample problems from 2019 + solutions
Sample problems from 2023