5

Today I read a nice web page about love that contains many beautiful pictures. Downloading every picture by hand is tedious, so I'd like to use Mathematica to do it automatically.

I searched the documentation and found that Import is a related function.

Trial:

Import["http://mp.weixin.qq.com/s?__biz=MzA5NDY3OTAyNA==&mid=205311623&idx=1&sn=7b565a6ed5789732f698d5a6b4c5c652&scene=2&from=timeline&isappinstalled=0#rd", "Images"]


Obviously, this trial failed.

Question

Is it possible to download the pictures from a web page automatically? Any suggestions or hints?

xyz
  • 2
    Many (but by no means all) modern web sites rely heavily on mechanisms such as JavaScript. The HTML parser that Import uses to find images on a web page doesn't understand these mechanisms and can find only those images linked directly from the HTML document. In this case, those are the pen and the progress indicator; the rest of the images are delivered through other means. I don't believe current Mathematica has a built-in way to emulate a whole browser. – kirma Mar 08 '15 at 06:28
  • 1
    http://www.kylen314.com/archives/1647 This article may be helpful. – wuyingddg Mar 08 '15 at 06:39
  • @kirma It's also good to note that since Javascript is just a scripting language like Mathematica, it can't do anything Mathematica can't. If we download the source code we have access to the same data that the Javascript script has, and if it manages to find image URLs in the source code then so can we (as wuyingddg has shown). – C. E. Mar 08 '15 at 07:43
  • 2
    @Pickett This is of course true, but unless we want to be overly abstract, it has to be recognized that collecting data (for instance, all the images a user might consider to be present on a web page while she reads it) requires emulating a browser (which is computationally feasible for Mma) and might even require emulating user interaction, such as scrolling the page. The solution by wuyingddg is valid in this specific situation, but it is by no means a universal solution to a general, unfortunate problem that arises on the modern Web. – kirma Mar 08 '15 at 08:45
  • 1
    @kirma For most websites the case is that the logic Javascript carries out is quite simple and what you really want to get your hands on is the input data (e.g. image URLs.) This is (for reusability and ease-of-use reasons) stored in the HTML itself, and usually in well known places such as data attributes. With jSoupLink we can extract the input data with the same syntax that you would use in Javascript, so it's not harder or easier in Mathematica than in Javascript. – C. E. Mar 08 '15 at 09:21
  • 1
    @kirma There are some websites that are very complicated with a lot of non-trivial Javascript, but I classify those as web applications (think Google Docs). In those cases it can be hard to locate what you're looking for. But the vast majority of websites are not like that. – C. E. Mar 08 '15 at 09:23
  • I agree with @kirma and think the title should be modified to restrict focus on img elements specifically. There are so many other ways of putting "pictures" on web pages (especially in HTML5, e.g., svg, canvas, and yes: plugins like CDF and Flash, etc. etc.) that the question seems too broad. – Jens Mar 08 '15 at 18:54

1 Answer

12

Based on this article: kylen314.com/archives/1647

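(* Parse the page into a symbolic XML tree *)
st = Import[
   "http://mp.weixin.qq.com/s?__biz=MzA5NDY3OTAyNA==&mid=205311623&\
idx=1&sn=7b565a6ed5789732f698d5a6b4c5c652&scene=2&from=timeline&\
isappinstalled=0#rd", "XMLObject"];
(* The page keeps each real image URL in the img tag's "data-src" attribute; *)
(* match it at any position in the attribute list *)
PicAddress = 
  Cases[st, 
   XMLElement["img", {___, "data-src" -> src_, ___}, _] :> src, {0, 
    Infinity}];
(* Download every image *)
Import /@ PicAddress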

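If the goal is to keep the pictures on disk rather than only display them in the notebook, the same idea can be wrapped in two small helpers. This is only a sketch: the names imageURLs and saveImages, the "picture-n.png" file naming, and the example.com URLs are made up for illustration, and it assumes that every URL imports as a single image (an animated GIF, for example, may import as a list of frames and would need separate handling).

```mathematica
(* Hypothetical helper: collect image URLs from a parsed XML tree. *)
(* Prefers the lazy-loading "data-src" attribute and falls back to "src"; *)
(* img elements with neither attribute are dropped. *)
imageURLs[xml_] := DeleteDuplicates @ DeleteMissing @ Cases[xml,
     XMLElement["img", attrs_List, _] :>
      Lookup[Association[attrs], "data-src",
       Lookup[Association[attrs], "src", Missing["NoSource"]]],
     {0, Infinity}];

(* Hypothetical helper: import each URL and export it as a numbered PNG. *)
(* Returns the list of file names written by Export. *)
saveImages[urls_List, dir_String] := (
   If[! DirectoryQ[dir], CreateDirectory[dir]];
   MapIndexed[
    Export[
      FileNameJoin[{dir, "picture-" <> ToString[First[#2]] <> ".png"}],
      Import[#1]] &,
    urls]);

(* Offline check on a hand-built tree (made-up URLs, no network needed) *)
sample = XMLElement["div", {}, {
    XMLElement["img", {"class" -> "a",
      "data-src" -> "http://example.com/1.jpg"}, {}],
    XMLElement["img", {"src" -> "http://example.com/2.png"}, {}]}];
imageURLs[sample]

(* With the page from the answer, usage would look like this: *)
(* saveImages[imageURLs[st], FileNameJoin[{$TemporaryDirectory, "pictures"}]] *)
```

Going through Import and Export re-encodes every picture as PNG. That keeps the sketch short, but it does not preserve the original files byte for byte.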

wuyingddg