# How can SVM maximize the margin of the decision boundary by minimizing $\sum_{j=1}^d\theta_j^2$?

One thing to notice is that this minimization is attempted under the constraints:

- $y_i=1\implies \theta^\text{T}x_i\ge1$ ("predictions of positive examples should give a value $\ge1$"), and
- $y_i=-1\implies \theta^\text{T}x_i\le-1$ ("predictions of negative examples should give a value $\le-1$").

This means that the prediction should provide a sufficient **quantity** for each training example. This quantity $\theta^\text{T}x_i$, geometrically, equals $p_i\,\lVert\theta\rVert$, where $p_i$ is the **signed length of the projection of $x_i$ onto the "parameter vector" $\theta$**. The length of that projection is determined by:

- the lengths of the participating vectors,
- the angle they form.

Now that the length of one participating vector, $x_i$, is fixed (i.e. determined by the input training data), to make $\theta^\text{T}x_i$ meet the required quantity, we can either:

- make sure $\lVert\theta\rVert$ is big, or
- make sure the angle is right, so that the projection $p_i$ is large.

The first idea is not graceful: it basically looks as if the model is standing in front of an audience and shouting out "YES YES YES! THIS $x_i$ WORKS!!" Instead, we want the angle to be better positioned, i.e. we want the projections $p_i$ to be large, which is exactly a large margin. Thus, we seek to minimize $\lVert\theta\rVert$.
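
To make the geometric argument concrete, here is a minimal numpy sketch (the vectors `theta` and `x_i` below are made-up values, not from any dataset). It verifies that $\theta^\text{T}x_i = p_i\,\lVert\theta\rVert$, where $p_i$ is the signed projection of $x_i$ onto $\theta$, and shows that the constraint $\theta^\text{T}x_i\ge1$ forces $p_i\ge 1/\lVert\theta\rVert$: the smaller $\lVert\theta\rVert$ is, the larger the projection (and hence the margin) has to be.

```python
import numpy as np

# A toy "parameter vector" theta and one positive training example x_i.
# (Made-up numbers, just to illustrate the decomposition.)
theta = np.array([2.0, 1.0])
x_i = np.array([0.8, 0.4])

# Signed length of the projection of x_i onto theta: p_i = theta.x_i / ||theta||
p_i = (theta @ x_i) / np.linalg.norm(theta)

# theta^T x_i factors as (projection length) * (length of theta)
lhs = theta @ x_i
rhs = p_i * np.linalg.norm(theta)
print(lhs, rhs)  # identical up to floating-point error

# If we scale theta down but still insist that theta^T x_i >= 1,
# the required projection grows: p_i >= 1 / ||theta||.
for scale in [1.0, 0.5, 0.25]:
    small_theta = scale * theta
    print(f"scale={scale}: need p_i >= {1.0 / np.linalg.norm(small_theta):.3f}")
```

Running this, the required lower bound on $p_i$ doubles each time $\lVert\theta\rVert$ is halved, which is the margin-maximizing effect of keeping $\lVert\theta\rVert$ small.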