New official screenshot 'scraping' available to all! (+ new proof.)
Simon Carless (1833) on 10/29/2016 2:42 PM · edited · Permalink · Report
We're happy to announce that MobyGames' new screenshot grabbing/'scraping' functionality is now available to all users. This allows you to easily add Steam, PlayStation Store, Nintendo eShop, iOS, or Android (Google Play) screenshots to our 'promo images' sections.
As an example, you can see on the 'promo images' page for Mechanic Master that there's a new dialog box. Just hit the 'Contribute' button on any game page and add images: they will be [EDIT!] added automatically, but you'll still need to click through the wizard to submit them.
There are a few issues/exceptions - particularly around differences in regional Steam stores & with age gates for some of the stores, and we only grab from profiles with images, but it works in general. (Approvers working on the system early have submitted a LOT of the Steam releases in the DB already, but other stores - especially the Nintendo eShop - have had fewer submissions, so check those out!)
[One other long-needed change that rolled out with this code update - you are now required to provide another URL as proof when you submit most item types that aren't 'scraped'. This helps us make sure the info is accurate, thanks! And thanks to Tracy Poff for all this coding work as per usual.]
Tracy Poff (2095) on 10/29/2016 3:13 PM · edited · Permalink · Report
[Q --start Simon Carless wrote--]they will be queued for approval automatically[/Q --end Simon Carless wrote--]
Rather, the images will be added automatically, but you still need to submit them by clicking through the wizard. This gives you a chance to edit the types of any images as needed.
Simon Carless (1833) on 10/29/2016 3:23 PM · Permalink · Report
Ah, thanks for the clarification Tracy, edited :)
Tracy Poff (2095) on 11/10/2016 11:07 PM · Permalink · Report
Someone must still select which game on a store matches which game in our DB--we often have several different versions of a game as separate entries (e.g. GOTY edition vs. standard), so that can't really be done automatically.
GTramp (81955) on 11/11/2016 12:26 PM · edited · Permalink · Report
There seems to be an issue with scraping from certain sources. Google Play - images are most often scraped in two or three identical sets, differing only in resolution (and sometimes not even that - entirely identical sets). Some contributors do notice it and clean it up (which is quite tedious), but some guys don't notice or don't care, so we have cases like this: http://www.mobygames.com/game/sonic-the-hedgehog/promo
Almost the same with the App Store. Images come in two identical sets, one for iPad, the other one for iPhone.
I clean up my promo art submissions but I come across multiple entries with repeating promo shots, especially from Google Play. This is kind of sad to see and I can't think of a solution right now -- you can't make people go check what they've submitted if it's automated.
P.S. If clean-up is required anyway, is it possible to implement a mechanism for deleting shots in batches? Checkboxes, say; or as in the first stage of the contribute credits wizard, where images are deleted instantly, without reloading the page.
Karsa Orlong (151750) on 11/11/2016 1:26 PM · Permalink · Report
[Q --start GTramp wrote--]Almost the same with the App Store. Images come in two identical sets, one for iPad, the other one for iPhone.[/Q --end GTramp wrote--]
We have two platforms here with different resolution, so it's correct to add two sets in this particular case.
Karsa Orlong (151750) on 11/11/2016 1:45 PM · Permalink · Report
It's not higher, it's different!
Karsa Orlong (151750) on 11/11/2016 6:27 PM · Permalink · Report
That's right. Same screens for the same platform - choose the best resolution, delete the others. Already settled some time ago, no discussion here. And yes, nobody cares, except me and you. Have fun with these ;)
Kennyannydenny (128068) on 11/11/2016 8:02 PM · Permalink · Report
More people care than just you two. I have discussed this in the past: duplicate images are just completely useless. I was told that since they were different resolutions both were being accepted (this was even before the auto approve). If they're completely the same, I just don't get why that's being accepted. Often the exact same promo art is pulled from the official site, xbox.com and then also Steam.
Three identical sets from the same source (Google Play Store) is completely nuts. I just don't see how we would ever benefit from that.
Simon Carless (1833) on 11/11/2016 9:28 PM · Permalink · Report
We would prefer that people don't grab e.g. from the Xbox Store if the same images exist in the PSN store, yes. I think in most/many cases people are NOT doing that, so that's the good news :)
Simon Carless (1833) on 11/11/2016 9:27 PM · edited · Permalink · Report
I've added a few Google Play images also via scraper, and here are my preferred rules: if some of the images are identical (same content and resolution), please delete the dupes. If the aspect ratios are different between the different versions, please retain. If it's the same aspect ratio but different resolutions, it's somewhat up to you but I would prefer that you delete the others and keep the highest resolution.
I agree that it's annoying that Google Play has all these slightly different dupes - I talked to Tracy about only pulling the first set or something but I'm not sure we always know which set is the highest resolution. Have you guys worked that out - is it always the same # of screenshots x3 or does it differ a lot?
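For anyone who wants the preferred rules above spelled out, here is a minimal sketch in Python. It is not MobyGames code: the function name and the (width, height) tuples are made up for illustration, and it assumes a contributor has already judged that the two images show the same shot.

```python
from fractions import Fraction


def decide_same_shot(size_a, size_b):
    """Apply the preferred rules to two images that show the same shot.

    size_a and size_b are hypothetical (width, height) tuples in pixels.
    """
    if size_a == size_b:
        # Same content and same resolution: a plain dupe.
        return "delete duplicate"
    # Compare exact ratios; near-misses such as 1366x768 vs 16:9 count as different.
    if Fraction(*size_a) != Fraction(*size_b):
        return "keep both"
    # Same aspect ratio, different resolution: the stated preference.
    return "keep higher resolution"


print(decide_same_shot((1920, 1080), (1280, 720)))
```

The last case is a preference rather than a hard rule, so a contributor could reasonably keep both there.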
Patrick Bregger (305645) on 11/11/2016 9:41 PM · Permalink · Report
I can't say if this is a rule, but in the corrections I have processed related to the matter it was always: first set high resolution, second set lower resolution (and sometimes small additional differences to the first set), third set identical to second set.
GTramp (81955) on 11/12/2016 9:55 AM · edited · Permalink · Report
My god, there are so many entries with three identical sets of promo images from Google Play -- this is crazy. I just keep looking through game entries and they're everywhere. Something needs to be done about that.
Tracy Poff (2095) on 11/12/2016 2:49 PM · Permalink · Report
Unfortunately, there's not really anything I can do. Sometimes the images aren't duplicates, and anyway Google just gives me a big bucket of images with no indication about some being duplicates of others. You can see this in the Play Store's web interface, too--you can click through the image gallery and see the duplicates there, with no differentiation.
Simon Carless (1833) on 11/12/2016 3:10 PM · Permalink · Report
I've been noticing that they are VERY rarely not duplicates tho - is there an argument to be made that losing the 5% that aren't duplicates is OK compared to what we do now...?
Tracy Poff (2095) on 11/13/2016 11:35 PM · edited · Permalink · Report
No, I mean, Google just gives a list of images, say 1-12, and there is no distinction as to whether this is one set of twelve different images, or two sets of six, three sets of four, four sets of three, six sets of two, or twelve copies of the same image. Or maybe two sets of five and a set of two. It's just a single list of images, with no internal structure. It's not always three sets of images, so I can't even just blindly throw away the final two-thirds of the images in hopes of solving the problem.
That said, if someone will provide me with a couple of links to games that have actually identical (i.e. same content and resolution) images, I'll see if I might be able to do something about that situation.
Pseudo_Intellectual (67148) on 11/14/2016 3:48 AM · Permalink · Report
It seems to me that if we just want to keep the highest-quality shots, the file sizes would probably give it away: assign the largest file a score of 100%, weigh all the other files percentages of that one, and throw away all the ones that are lower than ~75%. That's a totally cocktail napkin calculation, but it seems that some similar weighting would do the sorting for us in most cases.
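To make the napkin calculation concrete, here is a rough Python sketch. The function name, the list of sizes in bytes, and the 0.75 default are all illustrative assumptions, not anything the site actually runs.

```python
def keep_by_relative_size(sizes, threshold=0.75):
    """Return the indices of files at least `threshold` times the largest size.

    sizes is a hypothetical list of file sizes in bytes, in scrape order.
    """
    if not sizes:
        return []
    largest = max(sizes)
    return [i for i, size in enumerate(sizes) if size >= threshold * largest]


# Made-up sizes: two large shots, then two much smaller ones.
print(keep_by_relative_size([400_000, 390_000, 120_000, 118_000]))
```

Note that the cut is purely about bytes on disk, so any image that happens to compress well falls below the line regardless of its content.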
Tracy Poff (2095) on 11/14/2016 4:30 AM · Permalink · Report
A clever idea, but it'd have too many false positives. For example, 75% would throw away all but one of these images, and almost any percentage would be likely to throw away at least some desirable images (say, title screens, which might be likely to compress better, or phone shots for a game with big tablet shots). It doesn't fix the problem of images with identical content and resolution, either, which I think is a bigger problem than identical-content-but-different-resolution.
Pseudo_Intellectual (67148) on 11/15/2016 2:17 AM · Permalink · Report
It seems to me that identical files would give themselves away with identical file sizes, no?
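A small Python sketch of that idea, with the caveat that equal size is only a necessary condition: identical files always have the same size, but two different files can too. The function names and the sample byte strings are invented for illustration.

```python
from collections import defaultdict


def same_size_groups(blobs):
    """Group indices of images whose byte length matches (cheap first pass)."""
    by_size = defaultdict(list)
    for index, data in enumerate(blobs):
        by_size[len(data)].append(index)
    return [group for group in by_size.values() if len(group) > 1]


def confirmed_duplicates(blobs):
    """Within each same-size group, keep only pairs whose bytes really match."""
    pairs = []
    for group in same_size_groups(blobs):
        for pos, i in enumerate(group):
            for j in group[pos + 1:]:
                if blobs[i] == blobs[j]:
                    pairs.append((i, j))
    return pairs


# Stand-in data: items 0 and 1 are identical, item 2 has the same length only.
blobs = [b"abcd", b"abcd", b"abce", b"xy"]
print(same_size_groups(blobs), confirmed_duplicates(blobs))
```

So file size works as a quick filter for candidates, but a byte-level comparison (or a hash) is still needed to confirm them.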
GTramp (81955) on 11/14/2016 8:18 AM · edited · Permalink · Report
Well, here's one (source). Just added it today and I'm not cleaning it up so that you can study it.
Tracy Poff (2095) on 11/15/2016 2:18 PM · edited · Permalink · Report
Okay, I should be able to work with this. Those duplicates do have identical hashes, so I should be able to prevent the duplicates from being added. Thanks for the links.
Edit: this change is now live; exact duplicates should no longer be added.
Edit redux: this only applies to images scraped all at once; it doesn't protect against duplicates from different stores or manually uploaded images.
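For the curious, the general shape of a hash-based check like this can be sketched in Python as follows. This is an illustration, not the site's code: the function name is invented, and SHA-256 is an assumption, since any hash that gives identical digests for identical bytes would do.

```python
import hashlib


def drop_exact_duplicates(batch):
    """Return one scrape batch with byte-identical repeats removed, order kept.

    batch is a hypothetical list of raw image bytes from a single scrape.
    """
    seen = set()  # created per call, so nothing is remembered between batches
    unique = []
    for data in batch:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique


# Three identical "sets" of two stand-in images collapse to a single set.
batch = [b"shot-1", b"shot-2"] * 3
print(drop_exact_duplicates(batch))
```

Because the set of seen hashes lives only for one call, this mirrors the limitation noted above: images from another store or a manual upload are never compared. Images that differ by even one byte get different hashes and are both kept.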
GTramp (81955) on 11/17/2016 4:12 AM · edited · Permalink · Report
OK, so it works in the case of 3 identical sets of images. But what if there are two identical sets and one different? The same images still seem to slip through. Please take a look at this.
Tracy Poff (2095) on 11/17/2016 6:16 AM · Permalink · Report
The images look extremely similar, but are not quite identical.
Simon Carless (1833) on 11/17/2016 4:34 PM · Permalink · Report
Yeah, some of the differences are INSANELY small. Maybe it's a spot the difference game in itself? :P
chirinea (47526) on 11/12/2016 3:01 PM · Permalink · Report
When I began contributing those, I was taking care of removing the duplicates. I was told to leave different resolutions on file even if they did happen to show the same thing, as they could've been rendered differently. I understand that both examples you gave are from contributions I made, so I'll go back to all my Google Play contributions and clean them. Sorry for the inconvenience.
Simon Carless (1833) on 11/12/2016 3:10 PM · Permalink · Report
Reminder - if they're absolutely identical it's good to remove them, but if they are different aspect ratios it's arguably good to keep them. (And I am personally OK with different resolutions at same aspect ratio - we do it for wallpapers sometimes.) Sorry if we/I confused you on this.
chirinea (47526) on 11/12/2016 3:20 PM · Permalink · Report
No problem, Simon! I'm keeping different resolutions, like this. Here we have 3 sets with the same content but different resolutions. I'm deleting the ones with the same resolution/content (for those I wish we could have that checksum I talked about with Tracy). Please, let me know if you decide I should also remove different resolutions at the same aspect ratio.
Simon Carless (1833) on 11/12/2016 3:30 PM · Permalink · Report
We are not going to be strict about different resolutions with same aspect ratio.
BTW, what Tracy is referring to with 'there might be different images' is that when we were testing, we found some images that looked ALMOST identical but were actually ever so slightly different images. Not sure if that was a mistake/rare tho cos I've rarely seen it since.
chirinea (47526) on 11/12/2016 10:41 PM · Permalink · Report
[Q --start GTramp wrote--]Chirinea, no offence intended man. There are similar contributions by other people, plenty of them. Please understand that I didn't choose yours specifically.[/Q --end GTramp wrote--]
None taken! I too think duplicate shots look terrible and you didn't do anything wrong pointing out my mistakes.
Evolyzer (21909) on 11/13/2016 9:18 AM · edited · Permalink · Report
a terrible example (edit: 14.11.16 I cleaned up the mess)
Patrick Bregger (305645) on 11/13/2016 10:08 AM · Permalink · Report
Well, those are three identical sets. It does not really disprove the existence of a pattern, it just shows that the sequences are not always different. At least two of those pictures are not screenshots, by the way.
Karsa Orlong (151750) on 11/13/2016 12:55 PM · Permalink · Report
I guess there is only one reasonable solution here - removing Google Play from automatic scraping. Otherwise we will have to check all contributions from this source :/
Simon Carless (1833) on 11/13/2016 1:45 PM · Permalink · Report
Well, from an archival point of view, you can definitely argue that Google Play separately stores each of the images, so it is 'archivally correct' to keep everything. It is, after all, actually the data that the site returns to us. (But from a user point of view we look a bit silly publicly displaying everything which is why I'd prefer not to.)
My impression is that a lot of people adding on Google Play have been doing the screenshot culling anyhow.