Eliciting Continuous Scalars
Posted: 2018-03-16.
Continuing the series on elicitation, we'll take a look at a special class of properties or statistics of distributions: those that are scalar real numbers (as opposed to vectors) and whose value is a continuous function of the distribution.
Our problem was introduced in Lambert et al 2008, but to understand it, we'll start back in the 1970s and 1980s, exploring Leonard Savage's advice on how to sell gold and connections to Myerson's key lemma in single-parameter mechanism design.
Selling Gold
Imagine, Savage (1971) says, selling gold to someone with an unknown value $r$/ounce. We can elicit $r$ by first offering to sell one ounce for $0$, then an ounce for $0.1$, then for $0.2$, and so on. She will continue agreeing to purchase the gold until the price reaches $r$. This process "truthfully" reveals her true value: She will not want to stop early, because this leaves out some gold that she can purchase for less than her value; and she will not want to continue past $r$ because then she will pay more for some gold than it is worth to her.
Now, Savage points out, we can first of all generalize this to a continuous increase in price for an infinitesimal amount of gold, so that the total amount of gold sold and total price paid are \begin{align} \text{amount } &= \int_{z=0}^r 1 dz , \\ \text{price } &= \int_{z=0}^r z dz . \end{align} and secondly, that the amount of gold sold at each price need not necessarily be constant; one can modify it to any amount, characterized by a nonnegative function $f(z)$, so that the amount and price are \begin{align} \text{amount } &= \int_{z=0}^r f(z) dz , \\ \text{price } &= \int_{z=0}^r z f(z) dz . \end{align} Savage goes on to show that every truthful mechanism for selling gold is of this form, modulo mild assumptions on $f$ and an additional fixed gift (irrespective of the purchaser's report) of some constant amount of gold and constant amount of cash.
Savage's proof merely notes that for any truthful method of assigning a total amount of gold and cash transfer, the expected utility must be convex as a function of $r$, the purchaser's true valuation per ounce; and the allocation rule for a fixed report $r'$ is necessarily linear in the true valuation $r$, hence must be a subtangent of this convex function.
If you don't know where that argument is heading already, you might like to go back and read up on proper scoring rules. In short, Savage extends precisely this reasoning to characterize all proper scoring rules as subtangents of some convex "expected utility function".
Now, Savage's story turns to probability as follows (phrased in more modern terms). Instead of ounces of gold, imagine that we sell "shares" of a "security" where one share pays off one dollar if th event occurs. Then one can view a belief of probability $p$ that an event will happen as a valuation of $p$ dollars per share, analogous to a valuation of $p$ dollars per ounce of gold. (For example, if one owns 3.5 shares, and the event happens, one will get rewarded $\$$3.5; so a belief of $0.5$ chance of the event happening leads to a valuation of $\$$1.75 for those shares.) Therefore, Savage points out, we can apply the above theory to eliciting probabilities and obtain the scoring rule characterization discussed in my previous posts.
Aside on Mechanism Design
The equations above may look familiar from another context; they're essentially the same as in Myerson's Lemma, part of Myerson's work on revenue-maximizing auctions for a single item. The lemma characterizes all truthful "interim" allocation and pricing functions that map a single bidder's reported valuation for the item, $v$, to a probability of allocation $p(v)$ and a price $\pi(v)$. Here "interim" means the induced functions after fixing the reports of all other bidders.
To apply Savage's argument to get Myerson's Lemma (and this connection seems to first appear in Frongillo and Kash 2014, though perhaps not explicitly), imagine that instead of selling some unknown number of ounces of gold, we have a single item for sale. Now instead of selling ounces, we will sell units of probability of obtaining the item. A quasilinear bidder with private valuation $v$ for the item will be willing to pay up to $v$ per unit of probability of obtaining the item, and Savage's argument goes through. Namely, Myerson's lemma states that truthful mechanisms are exactly those that can be written as follows: for a nonnegative $f$ and report $v$, \begin{align} \text{probability of allocation } p(v) &= \int_{z=0}^v f(z) dz , \\ \text{payment } \pi(v) &= \int_{z=0}^v z f(z) dz , \end{align} possibly plus some constant probability of allocation and/or constant payment. One sometimes sees Myerson's Lemma stated instead with a payment rule of $\pi(v) = v p(v) - \int_{z=0}^v p(z)dz$, which follows from integration by parts (exercise!).
Scalar Properties
Back to elicitation. As in previous posts in the series, suppose that we want to elicit a property (statistic) of a distribution that is known to an expert; in this case, where $\mathcal{P}$ is the class of distributions on a set of possible outcomes $\mathcal{Y}$, the property is a continuous real-valued function $\Gamma: \mathcal{P} \to \mathbb{R}$. This property is elicitable if there exists a scoring rule $S: \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$ where, letting $S(r;p) := \mathbb{E}_p S(r,Y)$ (the expected score for report $r$ under belief $p$ on random variable $Y$), $$ \Gamma(p) = \arg\max_r S(r;p) $$ for all $p \in \mathcal{P}$.
Lambert, Pennock, and Shoham (2008), which introduced and formalized property elicitation, considered the question of characterizing all $\Gamma$ of the above form. They showed, in essence, that a clever extension of Savage's gold argument still applies in this setting. Namely, if $\Gamma$ is continuous and real-valued, then one can design a scoring rule that pays "per unit" of the property just as Savage paid "per ounce" of gold, up to the buyer's private-knowledge price.
To state their characterization, it is helpful in hindsight to use a term introduced by Osband (1985): identifiability of a property.
Identifiability
Say we have a property $\Gamma: \mathcal{P} \to \mathbb{R}$ that is elicited by a scoring rule $S$. If $S$ is differentiable, then notice that for any $r \in \Gamma(p)$ -- recall, this means that an agent with belief $p$ is maximally happy by reporting $r$ -- we must have \[ 0 = \frac{d}{dr} S(r;p) = \mathbb{E} \frac{d}{dr} S(r,y) . \] This is just the "first-order condition" of optimality of the report $r$. If the derivative weren't zero, then a local change in $r$ would give higher expected score. So for each report $r$, we have a function $v_r(y) = \frac{d}{dr}S(r,y)$ such that $\mathbb{E}_{y\sim p} v_r(y) = 0$ exactly for the distributions $p$ with property $r$, i.e. $\Gamma(p) = r$.
Now, let's generalize this idea. Let $v$ be a collection of functions $\{v_r: r \in \mathcal{R}\}$, where $\mathcal{R}$ is the report space of the property. The functions are of the form $v_r: \mathcal{Y} \to \mathbb{R}$, and if for all $r$, we have \[ \mathbb{E}_p v_r(Y) = 0 ~~ \iff ~~ r \in \Gamma(p) , \] then we say $v$ is an identification function for $\Gamma$. (Notice we could think of $v$ as a function of two variables, $v(r,y) = v_r(y)$, but I find that approach harder to grok.) In this case, we say $\Gamma$ is an identifiable property.
Note also that $\alpha v$ would also be an identification function for $\Gamma$ for any positive $\alpha$, so from now on, let's assume without loss of generality that $\sum_y v_r(y) = 1$.
Fact 1. Let $\Gamma: \mathcal{P} \to \mathcal{R}$ be a property with report space $\mathcal{R}$ an open subset of $\mathcal{R}^d$. If $\Gamma$ is elicited by a differentiable (with respect to $r$) scoring rule $S(r,y)$, then $\Gamma$ is identifiable, in particular is identified by \[ v_r(y) = \nabla_r S(r, y) . \] What I mean by this notation is that we fix $y$ and consider the function $S(\cdot, y)$; now take its gradient at $r$.
Proof. Excercise (generalizes the argument above).
In particular, this applies to scoring rules that are concave and differentiable as a function of $r$. These are of interest because, intuitively, they're easy for a strategic agent to maximize.
Note we can think of identifiability as saying that level sets (recall: these are sets of $p$ that share an optimal report $r$) satisfy linear equalities. Namely, for a given report $r$, all the $p$ that are optimal for a given $r$ satisfy \begin{align*} 0 &= \mathbb{E}_p v_r(Y) \\ &= \langle p, v_r \rangle \\ &= \sum_y p(y) v_r(y) . \end{align*}
As observed e.g. by Lambert et al., this implies that the level sets are very nice: If $p_1,p_2 \in \Gamma(r)$, then $\alpha_1 p_1 + \alpha_2 p_2 \in \Gamma(r)$ for any constants $\alpha_1, \alpha_2$ where $\alpha_1 p_1 + \alpha_2 p_2 \in \mathcal{P}$.
Fact 2. If $\Gamma$ is identifiable, then all of its level sets are "affine maximal", i.e. intersections of $\mathcal{P}$ with lower-dimensional hyperplanes.
Proof. Exercise.
The practical upshot is that if you draw the level sets of identifiable properties, you get points, lines, planes, and so on. For example, finite properties aren't identifiable, because their level sets look like Voronoi diagrams or power diagrams.
Characterization
Now we can set up Lambert et al.'s characterization for continuous, one-dimensional (scalar) properties. First, they use a bit of topology related to continuous functions to show:
Theorem 1. If $\Gamma: \mathcal{P} \to \mathbb{R}$ is continuous and elicitable, then it is also identifiable.
(Proof omitted.)
Now we're finally ready for the main result!
Theorem 2. If $\Gamma: \mathcal{P} \to \mathbb{R}$ is continuous and elicitable, then all scoring rules eliciting it are of the following form, possibly plus a constant: \[ S(r,y) = \int_{z=a}^r f(z) v_z(y) dz \] where $v$ is the identification function for $\Gamma$, and $f(z)$ is nonnegative.
(Proof omitted).
References
(1971) Leonard J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association.(1981) Roger B. Myerson. Optimal auction design. Mathematics of Operations Research.
(1985) Kent Osband. Providing incentives for better cost forecasting. PhD thesis.
(2008) Nicolas S Lambert, David M Pennock, and Yoav Shoham. Eliciting properties of probability distributions. Proceedings of the 9th ACM Conference on Electronic Commerce.
(2014) Rafael Frongillo and Ian Kash. General truthfulness characterizations via convex analysis. Proceedings of the 10th Conference on Web and Internet Economics.
 
