User:IssaRice/Logical inductor construction


Notes from the Logical Induction paper as I walk through the construction of LIA in section 5.

Lemma 5.1.1 (Fixed Point Lemma)

"Observe that is equal to the natural inclusion of the finite-dimensional cube in the space of all valuations ." -- I think what this is saying is that since , we can think of as being sort of a subset of . Except it's not strictly speaking a subset, since the functions in and have different domains. How can we make it a subset? The "natural" way to do this is to set everything outside of to zero. But that's exactly what is. One thing I'm still not sure about is the "finite-dimensional" part; doesn't having make the cube infinite-dimensional? -- it's finite-dimensional, since if is a finite set, we can think of as being basically , where .

Definition of fix: I found it helpful to look at the graph of $x \mapsto \max(0, \min(1, x))$; this looks like the identity function in the interval $[0,1]$, but then becomes constant once it hits either of the endpoints. If you've already thought about the definition of continuous threshold indicator (definition 4.3.2), then you will recognize that $\max(0, \min(1, x)) = \operatorname{Ind}_1(x > 0)$.
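To see the shape of the graph without plotting it, here is a minimal sketch that evaluates the clamp at a few points and compares it with the threshold indicator. It assumes the function being graphed is $x \mapsto \max(0, \min(1, x))$ and that the indicator has the form $\max(0, \min(1, (x - y)/\delta))$; the names `clamp01` and `ind` are mine, not the paper's.

```python
def clamp01(x: float) -> float:
    """max(0, min(1, x)): the identity on [0, 1], constant outside it."""
    return max(0.0, min(1.0, x))


def ind(delta: float, x: float, y: float) -> float:
    """Continuous threshold indicator Ind_delta(x > y) (assumed form)."""
    return max(0.0, min(1.0, (x - y) / delta))


# Sample points to the left of, inside, and to the right of [0, 1].
samples = [-0.5, 0.0, 0.25, 1.0, 1.5]
values = [clamp01(x) for x in samples]

# With delta = 1 and y = 0 the indicator agrees with the clamp at every sample.
agrees = all(clamp01(x) == ind(1.0, x, 0.0) for x in samples)
```

Under these assumptions `values` is flat at 0 on the left, follows the identity in the middle, and is flat at 1 on the right.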

"the compact, convex space " -- this intuitively makes sense, since basically "looks like" a cube. But I'm not sure how to verify this. -- Here's what I eventually came up with: in order to talk about a set being "compact" or "convex", we need some kind of structured space. But what kind of structured space? we can have metric spaces, normed spaces, inner product spaces, vector spaces, topological spaces, and on and on. The paper doesn't tell us what kind of space, so we have to figure it out on our own. Ok. But knowing the words "compact" and "convex", we can restrict what kind of space it can be. In particular, one way to go about this is to take the most general kind of structured space that each adjective ("compact" or "convex") can apply to, and then take the less general of the two, which gives us the most general kind of space that can take both adjectives. Now, theoretically, there can be a problem here, where when we do that, we get two spaces that are not comparable (formally: we have a partial order of structured spaces, and we are finding subsets of this partial order that can take each adjective, and then finding the maximum of the union, but because we're working on a partial order, a maximum might not exist -- we might just have a bunch of maximal elements). Luckily, in this case we don't need to worry about this problem... Anyway, compactness makes sense for topological spaces, which are very general. And convexity requires us to be able to add elements and scale them (since we need to form the expression for ). So maybe it's a topological vector space, but actually we don't need to be so general. I think we can just think of our space as being the subset of the Euclidean space . I'm not entirely confident though; the paper talks about "the space of all valuations ", so maybe that's supposed to be the ambient space.

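For the fixed point reasoning: we don't actually have a fixed point of $x \mapsto \max(0, \min(1, x))$; instead, it's a fixed point of $x \mapsto \max(0, \min(1, x + t))$, where $x = \mathbb{V}(\phi)$ and $t = T_n(\mathbb{V})[\phi]$. If $t > 0$, then the graph of $x \mapsto \max(0, \min(1, x + t))$ is just the graph of $x \mapsto \max(0, \min(1, x))$ but shifted to the left. You will see that this intersects the graph of the identity function at $x = 1$; this is the fixed point. On the other hand, if $t < 0$, then we shift the graph of $x \mapsto \max(0, \min(1, x))$ to the right, and now the fixed point is at $x = 0$.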
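A quick numerical check of the shifted-graph picture, treating the trade amount as a constant `t` (a simplification; in the lemma it depends on the price). The function names are hypothetical.

```python
def fix(x: float, t: float) -> float:
    """Clamp the shifted price x + t back into [0, 1]."""
    return max(0.0, min(1.0, x + t))


def iterate(x: float, t: float, steps: int = 50) -> float:
    """Apply fix repeatedly with a constant trade amount t."""
    for _ in range(steps):
        x = fix(x, t)
    return x


# A constant buy (t > 0) pushes the price up until it sticks at 1.
limit_when_buying = iterate(0.3, 0.2)
# A constant sell (t < 0) pushes the price down until it sticks at 0.
limit_when_selling = iterate(0.3, -0.2)
```

In both cases the endpoint is a genuine fixed point: `fix(1.0, 0.2)` stays at 1 and `fix(0.0, -0.2)` stays at 0.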

The key property of $\mathbb{V}^{\text{fix}}$ that we use in the proof:

  • If $T_n$ buys a share of $\phi$ on day $n$, then the price of $\phi$ on day $n$ is 1 (the maximum possible).
  • If $T_n$ sells a share of $\phi$ on day $n$, then the price of $\phi$ on day $n$ is 0 (the minimum possible).

One question to ask is, couldn't we just avoid using Brouwer's fixed point theorem by just setting the prices to obey the above property? There are two problems with this. One is that the definition of the $n$th day prices would depend on $T_n$'s behavior, which depends on the $n$th day prices! So the definition would be circular. The other problem is that we can't guarantee that the map would be continuous if we just magically set it to obey some property.

Something else I got confused about: I was thinking that the above key property only talks about day $n$. Couldn't the trader make a lot of money on previous days, and then just make no money on day $n$, so that it would still be making lots of money? The answer is that we are only dealing with a trading strategy in this lemma, not a full trader. Later, in lemma 5.1.3, we recursively use this idea to deal with a full trader.

How do we come up with the definition of fix? Here's one way to think about it. Let $x$ be the market price (of some sentence on day $n$) and $t$ be the trading volume of that sentence on the same day. We want to return some quantity that fixes up the price, after the fact, so that the trader would have earned less money. Why after the fact? Because doing it in real time is confusing! The trader depends on the prices for day $n$, which is what we want to decide. But we want to know what the trader does on day $n$ in order to determine the prices. So there is a circular dependency. This sort of thing happens in game theory too, where everything does eventually take place in real time, but we can think iteratively by "pressing pause" and considering counterfactuals and going down different paths in the game tree.

Here is another way to think about it. We get an infinite number of "retries". We first set the price to something arbitrary. Then we get to see what the trader would do in that situation. Ah, it buys some shares, making money in expectation. We want to prevent that, so we pretend that wasn't the real market price, and see what the trader does at the new, adjusted price. It's also a hint that fixed points are going to appear. Ideally we would like to say something like:

$$\text{fix}(x, t) = \begin{cases} 1 & \text{if } t > 0 \\ 0 & \text{if } t < 0 \\ x & \text{if } t = 0 \end{cases}$$

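This will work, but it's after the fact. We eventually would like to fix the prices in real time. So one way to think about this is to adjust the prices a little at a time, querying the trading strategy each time to see if we should nudge the prices up or down. Here is one idea:

$$\text{fix}(x, t) = \begin{cases} x + \varepsilon & \text{if } t > 0 \\ x - \varepsilon & \text{if } t < 0 \\ x & \text{if } t = 0 \end{cases}$$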

where $\varepsilon$ is some small adjustment. In other words, if a trader buys at the current price, then nudge it up a little to make it a slightly worse deal for the trader. And if the trader sells at the current price, then nudge it down. Now, there are three problems with this map. As a function of $t$, it's still discontinuous (ugly). Also, we don't check whether the values we return stay within the allowed interval $[0,1]$ of valuations. But most importantly, if we only adjust the prices a little, that means the trader can still make money from us. So what can we do? Well, we can apply the map many times: $x_0 \mapsto x_1 \mapsto x_2 \mapsto \cdots$ (actually, maybe we should allow the $t$s to change as well, to react in response). Now it's like a turn-based game: the trader says he's going to buy, so we adjust up; the trader still says he's buying, so we adjust up again; the trader now says he's going to sell, so we adjust down (oops, now we're in exactly the same spot as one step earlier, leading to an infinite loop...).
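The infinite loop is easy to reproduce. The sketch below assumes a fixed step size and a made-up trader that buys below a price of 1/2 and sells above it; both the trader and the starting numbers are hypothetical, chosen only to trigger the oscillation. Exact fractions are used so the repetition is not blurred by floating-point error.

```python
from fractions import Fraction


def trader(x: Fraction) -> Fraction:
    """Hypothetical trader: buys when the price is below 1/2, sells when above."""
    return Fraction(1, 2) - x


def nudge(x: Fraction, t: Fraction, eps: Fraction) -> Fraction:
    """Move the price by a fixed step in the direction the trader trades."""
    if t > 0:
        return x + eps
    if t < 0:
        return x - eps
    return x


x = Fraction(2, 5)
eps = Fraction(1, 5)
prices = [x]
for _ in range(6):
    x = nudge(x, trader(x), eps)
    prices.append(x)
# The prices alternate between 2/5 and 3/5 and never settle at 1/2.
```

With these numbers the step is too coarse to land on the price where the trader stops trading, so the sequence just bounces back and forth across it.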

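We can fix the second problem above with the following:

$$\text{fix}(x, t) = \begin{cases} \min(1, x + \varepsilon) & \text{if } t > 0 \\ \max(0, x - \varepsilon) & \text{if } t < 0 \\ x & \text{if } t = 0 \end{cases}$$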

Now if the prices go out of bounds, they get adjusted back to the bound, but otherwise they are unaffected.

Let $T$ be a trading strategy that takes the price of a sentence and outputs how much it would trade at that price. So then we start with some price $x_0$, which gets fed into $T$ to get $t_0 = T(x_0)$. Then we have the next price $x_1 = \text{fix}(x_0, t_0)$. And the next trade volume is $t_1 = T(x_1)$ and so on. So we get the sequence $x_0, t_0, x_1, t_1, x_2, t_2, \ldots$.

So what can we do to avoid the infinite loop? One idea might be to decrease the step size $\varepsilon$ with time (the number of iterations). This sort of thing is reminiscent of some versions of gradient descent. Another idea is to make the step size a function of $t$: if the trader trades a lot, then we adjust a lot, but if the trader trades only a little, it signals that we're getting close to ... a fixed point!

$$\text{fix}(x, t) = \begin{cases} \min(1, x + t) & \text{if } t > 0 \\ \max(0, x + t) & \text{if } t < 0 \\ x & \text{if } t = 0 \end{cases}$$

This turns out to be equivalent to $\max(0, \min(1, x + t))$, which is the form of fix given in the paper. To see this, just split into cases based on $t > 0$, $t < 0$, or $t = 0$ (remembering that $x \in [0,1]$). The version in the paper makes it obvious that the function is continuous, but it is more difficult to see what is going on.

We want some $x$ such that $\text{fix}(x, T(x)) = x$, i.e. we are satisfied with the price so that we don't try to change it. Thanks to both $\text{fix}$ and $T$ being continuous, this is guaranteed by Brouwer's fixed point theorem.
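Here is a small sketch of what such a fixed point looks like for two made-up continuous strategies. The strategies and names are hypothetical, and the fact that a single application of the map lands on the fixed point is an accident of these examples: Brouwer's theorem only guarantees that a fixed point exists, not that iterating the map finds it.

```python
from fractions import Fraction


def clamp01(x: Fraction) -> Fraction:
    """Clamp a price into [0, 1]."""
    return max(Fraction(0), min(Fraction(1), x))


def fix(x: Fraction, strategy) -> Fraction:
    """Adjust the price by the amount the strategy trades at that price."""
    return clamp01(x + strategy(x))


def contrarian(x: Fraction) -> Fraction:
    """Hypothetical strategy: buys below 1/2, sells above, trades nothing at 1/2."""
    return Fraction(1, 2) - x


def always_buy(x: Fraction) -> Fraction:
    """Hypothetical strategy: buys one share at every price."""
    return Fraction(1)


x_star = fix(Fraction(2, 5), contrarian)  # price where the contrarian stops trading
x_buy = fix(Fraction(2, 5), always_buy)   # price pinned at the maximum
```

The two results illustrate the two ways a price can be "satisfying": either the trader does not trade at it, or the trader buys and the price is already 1, so buying cannot make money.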

We imagined that we were working with just one sentence, but it turns out that (for some reason) basically the same thing works when dealing with all sentences.

Another thing to wonder about is that the rate of adjustment is just $t$. What if we used a rescaled or otherwise transformed version of $t$ instead? At first it seems like this would change the slope of the graph of fix, which means we might introduce more fixed points (just graph the identity function with the new functions); one of the nice things about slope 1 is that the graph is parallel to the identity function, so we get exactly one fixed point, or infinitely many. Actually, the scalar on $x$ is still 1, so this probably doesn't matter.

The following is used in the Fixed Point Lemma (5.1.1):

Writing the -strategy as

we have

But so the two sums cancel to obtain .

I think here is a simpler way to do the summation at the end: Write as . Then we have

where the first equality follows from the linearity of , the second because , and the third because we can change the summation from to and by the definition of .
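For reference, here is how I would write out the cancellation. This is a sketch in notation that I am assuming rather than quoting: $T$ is the trade made at the fixed-point prices $\mathbb{V}$, $T[1]$ is its cash term, $T[\phi]$ is the number of shares of $\phi$ traded, $S'$ is the support, and $\mathbb{W}$ is any world. The only fact used is that the cash term pays for the shares at the current prices, i.e. $T[1] = -\sum_{\phi \in S'} T[\phi]\,\mathbb{V}(\phi)$.

```latex
\begin{align*}
\mathbb{W}(T)
  &= T[1] + \sum_{\phi \in S'} T[\phi]\,\mathbb{W}(\phi) \\
  &= -\sum_{\phi \in S'} T[\phi]\,\mathbb{V}(\phi)
     + \sum_{\phi \in S'} T[\phi]\,\mathbb{W}(\phi) \\
  &= \sum_{\phi \in S'} T[\phi]\,\bigl(\mathbb{W}(\phi) - \mathbb{V}(\phi)\bigr)
  \;\le\; 0.
\end{align*}
```

The final inequality holds term by term: if $T[\phi] > 0$ then $\mathbb{V}(\phi) = 1 \ge \mathbb{W}(\phi)$, and if $T[\phi] < 0$ then $\mathbb{V}(\phi) = 0 \le \mathbb{W}(\phi)$, so every summand is at most zero.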

Definition/Proposition 5.1.2 (MarketMaker)

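Define $f : \mathcal{V}' \to \mathbb{R}$ by $f(\mathbb{V}) = \max_{\mathbb{W} \in \mathcal{W}'} \mathbb{W}(T_n(\mathbb{V}))$ (this is the same map given in the proof).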

Let's see what $f$ does to $\mathbb{V}^{\text{fix}}$. By the Fixed Point Lemma, for any $\mathbb{W}$ we have $\mathbb{W}(T_n(\mathbb{V}^{\text{fix}})) \le 0$. This means that as we max over these worlds, we will never exceed 0. So $f(\mathbb{V}^{\text{fix}}) \le 0$. In particular, $f(\mathbb{V}^{\text{fix}}) \in (-\infty, 2^{-n})$. (I had two confusions here. First, for some reason I mistakenly thought $2^{-n}$ was negative for a while, so I got confused about why $f(\mathbb{V}^{\text{fix}})$ was always contained in the interval. Second, I wondered why the left endpoint was minus infinity; couldn't any negative number also work? It couldn't, because we don't know how negative the trader's value could be; in other words, the trader could perform arbitrarily badly and we don't have a lower bound on that.)

What I'm not sure about is why we need to do this max thing. Couldn't we fix some $\mathbb{W}$ at the start, and consider the map $\mathbb{V} \mapsto \mathbb{W}(T_n(\mathbb{V}))$? Then by the same reasoning, $\mathbb{V}^{\text{fix}}$ maps into $(-\infty, 2^{-n})$, so we have a neighborhood in $\mathcal{V}'$ that maps into this open set. It seems like we didn't need to form the max over all the worlds. Of course, doing this stuff becomes important when doing the brute force search later in the proof, but if we're merely trying to verify the pricing exists, it doesn't seem like we need to do this finite support stuff.

"Hence there is some neighborhood in around with image in ." -- This uses one of the equivalent definitions of continuity (often called the topological definition of continuity), namely, is continuous iff for every open set such that , there exists an open set such that and . Here we know that is open and . So there exists an open set such that and .

One thing I'm unclear on is, are we using a metric on $\mathcal{V}'$? If so, what is the metric?

Lemma 5.1.3 (MarketMaker Inexploitability)

This proof is pretty simple.

We use the geometric series $\sum_{n=0}^{\infty} 2^{-n} = 2$. (Then subtract $1$ since the index starts at 1 rather than 0.)
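Written out, the arithmetic is as follows (this only assumes that the per-day bound being summed is $2^{-n}$ for $n \ge 1$):

```latex
\[
\sum_{n=1}^{\infty} 2^{-n}
  \;=\; \sum_{n=0}^{\infty} 2^{-n} \;-\; 2^{0}
  \;=\; \frac{1}{1 - \tfrac{1}{2}} - 1
  \;=\; 2 - 1
  \;=\; 1 .
\]
```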

Definition/Proposition 5.2.1 (Budgeter)

Why does Budgeter get all of ? It seems to use only up to .

What's up with the then-clause? Does it even trigger? I think it's just there to make sure Budgeter is defined in all cases. The else-clause doesn't make sense for some values of .

Another question I had was why the budget $b$ is a positive integer, rather than a positive rational. After all, our markets run on rational prices. I think this is because in the way TradingFirm is constructed, all we care about is that we can increase the budget as much as we want, to let the good traders do what they want. The key property is Lemma 5.2.2.3. So basically we just need a subset of the rationals that is not bounded above. I think $\mathbb{N}^+$ also works well because it's very easy to enumerate (maybe also it's important that it's easy to enumerate in increasing order).

For the inf expression in the else-clause: the fact that we're in the else-clause means that the "if" part is false, so for $m = n-1$ (which is less than $n$) in particular, the sum $\mathbb{W}\left(\sum_{i \le n-1} T_i\right) > -b$. So the denominator $b + \mathbb{W}\left(\sum_{i \le n-1} T_i\right) > 0$. Now suppose that according to $\mathbb{W}$, the trader wins money on day $n$. Then $-\mathbb{W}(T_n)$ is negative, so that the whole fraction is negative. Thus the output of max is 1, so that Budgeter returns $T_n$. In other words, if the trader is going to win money anyway, then Budgeter does not interfere with what the trader is doing.

But now suppose that according to $\mathbb{W}$, the trader loses money on day $n$. Thus $-\mathbb{W}(T_n)$ is positive. As long as the amount of money lost is within the wealth of the trader (which is $b + \mathbb{W}\left(\sum_{i \le n-1} T_i\right)$, the expression in the denominator), the fraction is at most 1, so max outputs 1. In other words, Budgeter does nothing because the trader "stays within budget". However, if the trader loses a lot of money on day $n$, so that it loses more than the wealth it has at that point, then Budgeter kicks in. Suppose I have \$10 and am about to lose \$30. To stay within my budget I would have to spend at most \$10, i.e. I would need to scale my spending by $10/30 = 1/3$. The same idea applies to the case of Budgeter, where $-\mathbb{W}(T_n)$ is the amount I would have spent (\$30) and $b + \mathbb{W}\left(\sum_{i \le n-1} T_i\right)$ is the amount I actually have (\$10).

Why do we take the inf over possible worlds? Because we want to defend against the worst case scenario. Suppose I have \$10 (for certain), but now Alice says I am about to lose \$30 and Bob says I am about to lose \$50. Now, I don't know which of Alice or Bob is right, but I know that at least one of them is right. In order to guarantee that I won't overspend, I must scale my spending by $10/50$, because that is the more pessimistic of the scenarios: if Bob is right, then I lose \$50 * (10/50) = \$10, and if Alice is right I lose \$30 * (10/50) = \$6, so in either case I stay within my budget. And indeed we have $\min(10/30, 10/50) = 10/50$.
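The Alice and Bob arithmetic can be checked with a few lines. This is a simplified sketch, not Budgeter itself: it assumes the wealth is the same positive number in every world and represents each world only by the loss it predicts. The function name `scale_factor` is made up.

```python
from fractions import Fraction


def scale_factor(wealth: Fraction, losses: list) -> Fraction:
    """Worst case over worlds of 1 / max(1, loss / wealth).

    Assumes wealth > 0. A negative loss means the trader gains in that world.
    """
    return min(1 / max(Fraction(1), Fraction(loss) / wealth) for loss in losses)


wealth = Fraction(10)

# Alice predicts a loss of 30 dollars, Bob predicts a loss of 50 dollars.
factor = scale_factor(wealth, [30, 50])
scaled_losses = [loss * factor for loss in [30, 50]]

# If every world predicts a gain or an affordable loss, nothing is scaled down.
untouched = scale_factor(wealth, [-5, 7])
```

Taking the minimum over worlds is what makes the scaled loss affordable in every scenario, not just the optimistic one.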

Lemma 5.2.2 (Properties of Budgeter)

Part 1

If the trader never overspends, then Budgeter does nothing. The proof of this is basically the same as the reasoning in the previous section.

Part 2

If we inductively alter the trader on every single day, then we never overspend.

It seems like a simple inductive proof could work here, but instead we get a horrid thing. Actually it seems like the same proof.

"(since )" -- it took me a while to understand what they were doing here. What they're saying is that normally when you take off the "inf" and instantiate it to a specific thing in the class, the expression should go up but instead here it goes down, and the reason it does that is that the at the front of the expression is negative, so it reverses things. I had mistakenly been staring at the inside the max, so was confused about this.

Part 3

If the trader can exploit the market, then we can give it the budget it needs so that it can do its thing.

Proposition 5.3.1 (Redundant Enumeration of e.c. Traders)

I think the key point of this proof is that the $n$ in $T^k_n$ is the same as the $n$ in $p(n)$, so that the input to the trader computation (which says what day it is) controls how long the computation can run.

Suppose we tried skipping the enumeration of the polynomials. Given $k$, we could decode it into a pair of integers $(i, j)$, and run the $i$th machine for $j$ steps. This would find all e.c. traders, but it would also find traders who aren't e.c., because the runtime isn't connected to the input to the trader.

But given an -strategy, shouldn't we be able to tell whether it is e.c. or not?

Definition/Proposition 5.3.2 (TradingFirm)

Why do we want to blank out the traders by defining $S^k_n = 0$ for $n < k$? Shouldn't giving the later traders less weight be sufficient? This assumption is used to make the sum $\sum_{k \in \mathbb{N}^+}$ into $\sum_{k \le n}$. In other words, on any day we are only dealing with a finite number of traders.

Lemma 5.3.3 (Trading Firm Dominance)
