One possible approach using built-in functions is the following.
- Using ImageCorrespondingPoints we compare the first image i1 in a list of screenshots imgs to all other images.
- We select i2 as the image with the maximum number of corresponding points with i1.
- We compute the differences between the positions (in i1 and i2) of all corresponding points and take their TrimmedMean with a given trimming fraction th to obtain a shift vector delta between i1 and i2 (see the two-image sketch below the list).
- We use ImageCompose with the shift vector delta to merge i1 and i2. We remove i1 and i2 from the list of images imgs and prepend the merged image to it.
- We repeat the process until the list imgs has length one, meaning that only one merged image is left.
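For just two images the first three steps boil down to a couple of lines; this is only a minimal sketch with a hard-coded trimming fraction of 0.2:

{pts1, pts2} = ImageCorrespondingPoints[i1, i2];
delta = TrimmedMean[pts1 - pts2, 0.2] (* trimmed mean of the per-point offsets *)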
Here are two functions that implement this approach: ImageJoinList performs all of the above steps and calls ImageComposeAdd to merge two images based on a shift vector.
ImageComposeAdd[i1_, i2_, delta_] := Module[{d1, d2, ci1, ci2, cci, cpad},
  d1 = ImageDimensions[i1];
  d2 = ImageDimensions[i2];
  (* corner coordinates of i1, and of i2 shifted by delta *)
  ci1 = d1 # & /@ {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
  ci2 = d2 # + delta & /@ {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
  (* corners of the bounding box enclosing both images *)
  cci = {{Min[#[[All, 1]]], Min[#[[All, 2]]]}, {Max[#[[All, 1]]], Min[#[[All, 2]]]}, {Min[#[[All, 1]]], Max[#[[All, 2]]]}, {Max[#[[All, 1]]], Max[#[[All, 2]]]}} &@Join[ci1, ci2];
  (* padding needed to grow i1 to that bounding box *)
  cpad = cci - ci1;
  ImageCompose[ImagePad[i1, {{-cpad[[1, 1]], cpad[[-1, 1]]}, {-cpad[[1, 2]], cpad[[-1, 2]]}}, Blue], i2, delta - cpad[[1]], {0, 0}]
  ]
ImageJoinList[imgs_List, th_: 0.2, printQ_: False] /; Length[imgs] > 1 := Module[{i1, i2, i2idx, iRest, IRestCPs, delta},
  i1 = imgs[[1]];
  iRest = imgs[[2 ;;]];
  (* corresponding points between i1 and every remaining image *)
  IRestCPs = ParallelMap[ImageCorrespondingPoints[i1, #] &, iRest, DistributedContexts -> All];
  (* index of the image sharing the most corresponding points with i1 *)
  i2idx = Last@Last@SortBy[MapIndexed[{Length[#1[[1]]], First@#2} &, IRestCPs], First];
  i2 = iRest[[i2idx]];
  iRest = Drop[iRest, {i2idx}];
  (* trimmed mean of the point-wise position differences gives the shift vector *)
  delta = TrimmedMean[First[-Differences[IRestCPs[[i2idx]]]], th];
  If[printQ, Print[MapThread[HighlightImage[#1, #2] &, {{i1, i2}, IRestCPs[[i2idx]]}]~Join~{delta}]];
  (* merge i1 and i2, put the result back in front of the list, and recurse *)
  ImageJoinList[{ImageComposeAdd[i1, i2, delta]}~Join~iRest, th, printQ]
  ]
ImageJoinList[imgs_List, th_, printQ_] /; Length[imgs] == 1 := First@imgs
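ImageComposeAdd can be sanity-checked on its own by cutting a test image into two overlapping horizontal strips and recombining them with the known shift. A hypothetical check (any of the built-in test images will do):

img = ExampleData[{"TestImage", "House"}];
{w, h} = ImageDimensions[img];
lower = ImageTake[img, {Floor[h/2], h}]; (* bottom half, its origin sits at y = 0 *)
upper = ImageTake[img, {1, Floor[2 h/3]}]; (* top two thirds, its origin sits at y = h - Floor[2 h/3] *)
ImageComposeAdd[lower, upper, {0, h - Floor[2 h/3]}] (* should reproduce img *)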
Here is an example using a data set of 10 screenshots (generated with the code from the question), which is fed into ImageJoinList in random order:
screenshots = takeNext[10]
ImageJoinList[RandomSample[screenshots], 0.2, True]
The result of each stage of the iteration is printed. Here is one example

where the corresponding points are plotted over the images in pink and the shift vector is computed to be {0., 1254.}. Note that for this stage ImageCorrespondingPoints was able to find a significant number of points despite the fact that the overlapping region is just text. The result

looks pretty good upon first inspection.
The implementation is pretty rough around the edges and it is not very fast (even when running the map over ImageCorrespondingPoints in parallel). One might be able to improve the reliability and performance by playing with the options of ImageCorrespondingPoints. I think both ImageAlign and ImageStitch use this method under the hood.
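For comparison, on versions that have ImageStitch (13.0 or later, if I remember correctly) one can try the built-in function directly on the same shuffled list; I have not benchmarked it against the code above:

ImageStitch[RandomSample[screenshots]]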
ImageCorrespondingPoints is arguably overkill if one really assumes that all screenshots have the same size and it is just a matter of finding the right order and vertical shift. I tried writing something myself by overlaying the screenshots with a variable shift and minimizing the difference, but I was not able to get it to run because the image overlay, trim, and pad functions of Mathematica are horrible to use (or at least I am not familiar enough with them to get even these rather simple tasks running properly).
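For what it is worth, here is a rough sketch of that simpler idea which works on the pixel arrays directly and sidesteps the composition functions. It assumes all screenshots have the same width (and channel count), that the overlap is purely vertical, and that the order of the two images is already known (i1 above i2); it simply brute-forces the overlap size, and the helper names are made up:

findVerticalOverlap[i1_Image, i2_Image, minOverlap_: 20] := Module[{d1 = ImageData[i1], d2 = ImageData[i2], h1, h2, scores},
  h1 = Length[d1]; h2 = Length[d2];
  (* score every candidate overlap k by the mean absolute pixel difference between the bottom k rows of i1 and the top k rows of i2 *)
  scores = Table[{k, Mean@Flatten@Abs[d1[[-k ;;]] - d2[[;; k]]]}, {k, minOverlap, Min[h1, h2]}];
  First@First@MinimalBy[scores, Last]
  ]
joinVertically[i1_Image, i2_Image] := With[{k = findVerticalOverlap[i1, i2]},
  (* drop the overlapping rows from i1 and stack i2 below it *)
  ImageAssemble[{{ImageTake[i1, Last@ImageDimensions[i1] - k]}, {i2}}]
  ]

This is of course slow (one full array comparison per candidate offset), so working on downsampled or grayscale copies, or replacing the loop with a correlation, would be the obvious next step, but it avoids the padding and composition functions entirely.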