You might get better answers to this question over at Math Stack Exchange, but let me give it a try.
First of all, you should try to decide what the property that you want your rectangle to maximize actually is. I'll take a wild guess and say that it might be the number of red pixels minus the number of blue pixels inside the rectangle.
If my guess is right (or at least a good approximation), then the first thing you probably want to do is integrate your data. That is, let $f(x,y)$ be $1$ if the pixel at $(x,y)$ is red, and $-1$ if it's blue. Then, for each point $(x,y)$, you want to calculate
$$ F(x,y) = \sum_{i=0}^x \sum_{j=0}^y f(i,j) \;. $$
This is easy to calculate in two passes over the data, e.g. in pseudocode:
// first pass: integrate over x for each y
for y from 0 to y_max:
sum = 0;
for x from 0 to x_max:
sum = sum + data[x,y];
out[x,y] = sum;
// second pass: integrate over y for each x
for x from 0 to x_max:
sum = 0;
for y from 0 to y_max:
sum = sum + out[x,y];
out[x,y] = sum;
or, indeed, even in just one pass:
// do first row:
sum = 0;
for x from 0 to x_max:
sum = sum + data[x,0];
out[x,0] = sum;
// do other rows:
for y from 1 to y_max:
sum = 0;
for x from 0 to x_max:
sum = sum + data[x,y];
out[x,y] = sum + out[x,y-1];
(The careful reader will also note that both of these code examples can be made to work in-place just by replacing all instances of out with data.)
Now, if one corner of your rectangle were fixed at $(0,0)$, all you'd need to do to optimally place the other corner would be to find the maximum of $F$. (This can be done easily and efficiently just by looping over all points and keeping track of the maximum value seen so far and its location.)
However, if both corners can be at arbitrary locations, then things get a bit more complicated. Essentially, what you want to maximize is
$$\begin{aligned}
G(x_1,y_1; x_2,y_2)
&= \sum_{i=x_1}^{x_2} \sum_{j=y_1}^{y_2} f(i,j) \\
&= F(x_1,y_1) - F(x_1,y_2) - F(x_2,y_1) + F(x_2,y_2)
\end{aligned}$$
where the latter formula can be derived using the inclusion–exclusion principle.
Unfortunately, while the analogous problem in one dimension is relatively easy to solve, in two dimensions it gets trickier. If you don't have too many points, you could just loop over every possible pair of points and select the one that maximizes $G$, but the time that takes scales with the square of the number of points (and thus with the fourth power of the data resolution).
Instead, you may be better off exploiting the relative smoothness of the integrated data by first sampling $G$ at a lower resolution to find an approximate maximum, and then refining it, either just by sampling the vicinity of the approximate maximum more finely, or perhaps using simple hill cimbing. Or you could try starting a simple hill climbing process from each low-resolution sample point to find the nearby local maximum, and then take the highest of these maxima.
(You could even consider more advanced optimization techniques, such as simulated annealing, but these may well be overkill for your purposes, and might even be less efficient than the simpler methods described above.)