New official screenshot 'scraping' available to all! (+ new proof.)

Simon Carless (1834) on 10/29/2016 2:42 PM · edited

We're happy to announce that MobyGames' new screenshot grabbing/'scraping' functionality is now available to all users. This allows you to easily add Steam, PlayStation Store, Nintendo eShop, iOS, or Android (Google Play) screenshots to our 'promo images' sections.

As an example, you can see on the 'promo images' page for Mechanic Master that there's a new dialog box. Just hit the 'Contribute' button on any game page and add images; they will be [EDIT!] added automatically, but you'll still need to click through to submit them.

There are a few issues/exceptions - particularly around differences in regional Steam stores & with age gates for some of the stores, and we only grab from profiles with images, but it works in general. (Approvers working on the system early have submitted a LOT of the Steam releases in the DB already, but other stores - especially the Nintendo eShop - have been added to less, so check those out!)

[One other long-needed change that rolled out with this code update - you are now required to provide another URL as proof when you submit most item types that aren't 'scraped'. This helps us make sure the info is accurate, thanks! And thanks to Tracy Poff for all this coding work as per usual.]

Tracy Poff (2095) on 10/29/2016 3:13 PM · edited

they will be queued for approval automatically

Rather, the images will be added automatically, but you still need to submit them by clicking through the wizard. This gives you a chance to edit the types of any images as needed.

Simon Carless (1834) on 10/29/2016 3:23 PM

Ah, thanks for the clarification Tracy, edited :)

ZeTomes (36265) on 11/3/2016 4:44 AM

I predict a lot of images will be repeated...

MrFlibble (18361) on 11/10/2016 3:52 PM

This made me think, what about making a script that will automatically harvest screenshots from said sources, without any involvement from contributors?

Tracy Poff (2095) on 11/10/2016 11:07 PM

Someone must still select which game on a store matches which game in our DB--we often have several different versions of a game as separate entries (e.g. GOTY edition vs. standard), so that can't really be done automatically.

GTramp (81961) on 11/11/2016 12:26 PM · edited

There seems to be an issue with scraping from certain sources. Google Play - images are most often scraped in three or two identical sets, differing only in resolution (and sometimes even not that - entirely identical sets). Some contributors do notice it and clean it up (which is quite tedious), but some guys don't notice or don't care, so we have cases like this: http://www.mobygames.com/game/sonic-the-hedgehog/promo

Almost same with App Store. Images come in two identical sets, one for iPad, the other one for iPhone.

I clean up my promo art submissions but I come across multiple entries with repeating promo shots, especially from Google Play. This is kind of sad to see and I can't think of a solution right now -- you can't make people go check what they've submitted if it's automated.

P.S. If clean-up is required anyway, is it possible to implement a mechanism for deleting shots in batches? Checking boxes, or as in the first stage of the contribute credits wizard, where images are deleted instantly without reloading the page.

Karsa Orlong (151775) on 11/11/2016 1:26 PM

GTramp wrote: "Almost same with App Store. Images come in two identical sets, one for iPad, the other one for iPhone."

We have two platforms here with different resolution, so it's correct to add two sets in this particular case.

GTramp (81961) on 11/11/2016 1:36 PM

Yeah, but there's no need in keeping two sets of absolutely same shots. Just keep those with higher resolution and those that differ from promo shots already on file.

Karsa Orlong (151775) on 11/11/2016 1:45 PM

It's not higher, it's different!

GTramp (81961) on 11/11/2016 1:49 PM

Well, maybe in case of iOS, I don't care and don't wanna argue. The aspect ratio is different (though not always). Maybe this can be helpful for someone (anyone?)

But Android shots - those come in sets of 3 which is embarrassing.

Karsa Orlong (151775) on 11/11/2016 6:27 PM

That's right. Same screens for same platform - choose the best resolution, delete the others. Already settled some time ago, no discussion here. And yes, nobody cares, excluding me and You. Have a good fun with these ;)

Kennyannydenny (128154) on 11/11/2016 8:02 PM

More people care than just you two. I have discussed this in the past, that duplicate images are just completely useless. I was told that since they were different resolutions both were being accepted (this was even before the auto approve). If they're completely the same, I just don't get why that's being accepted. Often the exact same promo art is pulled from the official site, xbox.com and then also Steam.

3 exactly the same sets of the same source (Google Play Store) is completely nuts. I just don't see how we would ever benefit from that.

Simon Carless (1834) on 11/11/2016 9:28 PM

We would prefer that people don't grab e.g. from the Xbox Store if the same images exist in the PSN store, yes. I think in most/many cases people are NOT doing that, so that's the good news :)

Simon Carless (1834) on 11/11/2016 9:27 PM · edited

I've added a few Google App Store images also via scraper, and here are my preferred rules: if some of the resolutions are identical, please delete the dupes. If the aspect ratios are different between the different versions, please retain. If it's the same aspect ratio but different resolutions, it's somewhat up to you but I would prefer that you delete the others and keep the highest resolution.

I agree that it's annoying that Google Play has all these slightly different dupes - I talked to Tracy about only pulling the first set or something but I'm not sure we always know which set are the highest resolution. Have you guys worked that out - is it always the same # of screenshots x3 or does it differ a lot?
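
A minimal sketch of the preferred rules above, assuming a person has already decided that a group of shots shows the same picture (the code cannot judge content). The `Shot` type and `cull_same_content` name are hypothetical, not part of MobyGames' code. Shots with different aspect ratios are all retained; within one aspect ratio only the highest resolution survives, which also drops same-resolution dupes.

```python
from fractions import Fraction
from typing import NamedTuple


class Shot(NamedTuple):
    """Hypothetical record for one scraped image."""
    name: str
    width: int
    height: int


def cull_same_content(shots: list[Shot]) -> list[Shot]:
    """Keep one shot per aspect ratio: the one with the most pixels.

    Assumes every shot in `shots` shows the same picture.
    When two shots tie on pixel count, the first one seen is kept.
    """
    best: dict[Fraction, Shot] = {}
    for shot in shots:
        ratio = Fraction(shot.width, shot.height)  # exact, reduced aspect ratio
        current = best.get(ratio)
        if current is None or shot.width * shot.height > current.width * current.height:
            best[ratio] = shot
    return list(best.values())
```

Exact ratio matching is a simplification: two resolutions that are nominally the same shape can differ by a pixel or two after scaling, so a real tool would probably need a small tolerance.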

Patrick Bregger (303492) on 11/11/2016 9:41 PM

I can't say if this is a rule, but in the corrections I have processed related to the matter it was always: first set high resolution, second set lower resolution (and sometimes small additional differences to the first set), third set identical to second set.

GTramp (81961) on 11/11/2016 10:25 PM

From what I noticed, the second and third sets are bigger resolution. I guess the order can vary which is bad.

GTramp (81961) on 11/12/2016 9:55 AM · edited

My god, there are so many entries with triple same sets of promo images from Google Play -- this is crazy. I just keep looking through game entries and they're everywhere. Something needs to be done about that.

Tracy Poff (2095) on 11/12/2016 2:49 PM

Unfortunately, there's not really anything I can do. Sometimes the images aren't duplicates, and anyway google just gives me a big bucket of images with no indication about some being duplicates of others. You can see this in the play store's web interface, too--you can click through the image gallery and see the duplicates there, with no differentiation.

Simon Carless (1834) on 11/12/2016 3:10 PM

I've been noticing that they are VERY rarely not duplicates tho - is there an argument to be made that losing the 5% that aren't duplicates is OK compared to what we do now...?

Tracy Poff (2095) on 11/13/2016 11:35 PM · edited

No, I mean, google just gives a list of images, say 1-12, and there is no distinction as to whether this is one set of twelve different images, or two sets of six, three sets of four, four sets of three, six sets of two, or twelve copies of the same image. Or maybe two sets of five and a set of two. It's just a single list of images, with no internal structure. It's not always three sets of images, so I can't even just blindly throw away the final two-thirds of the images in hopes of solving the problem.

That said, if someone will provide me with a couple of links to games that have actually identical (i.e. same content and resolution) images, I'll see if I might be able to do something about that situation.

Pseudo_Intellectual (66542) on 11/14/2016 3:48 AM

It seems to me that if we just want to keep the highest-quality shots, the file sizes would probably give it away: assign the largest file a score of 100%, weigh all the other files percentages of that one, and throw away all the ones that are lower than ~75%. That's a totally cocktail napkin calculation, but it seems that some similar weighting would do the sorting for us in most cases.

Tracy Poff (2095) on 11/14/2016 4:30 AM

A clever idea, but it'd have too many false positives. For example, 75% would throw away all but one of these images, and almost any percentage would be likely to throw away at least some desirable images (say, title screens, which might be likely to compress better, or phone shots for a game with big tablet shots). It doesn't fix the problem of images with identical content and resolution, either, which I think is a bigger problem than identical-content-but-different-resolution.
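
To make the false-positive concern concrete, here is a minimal sketch of the file-size weighting idea. The function name, file names and byte counts are all hypothetical illustrations, not real scraper data.

```python
def keep_by_relative_size(sizes: dict[str, int], threshold: float = 0.75) -> list[str]:
    """Return the names of files whose size is at least `threshold` of the largest.

    `sizes` maps a file name to its size in bytes.
    """
    if not sizes:
        return []
    largest = max(sizes.values())
    return [name for name, size in sizes.items() if size >= threshold * largest]


# Hypothetical batch: one big tablet shot, a phone shot, and a title
# screen that compresses well. None of them are duplicates.
example = {
    "gameplay_tablet.png": 1_000_000,
    "gameplay_phone.png": 400_000,
    "title_screen.png": 300_000,
}
kept = keep_by_relative_size(example)
```

With these made-up numbers only `gameplay_tablet.png` is kept, so two distinct, desirable images are thrown away, which is the failure mode described above.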

GTramp (81961) on 11/14/2016 7:53 AM

What about simply comparing checksum and flagging images with identical checksum? Or file size.

Pseudo_Intellectual (66542) on 11/15/2016 2:17 AM

It seems to me that identical files would give themselves away with identical file sizes, no?

Rwolf (23400) on 11/15/2016 11:42 AM

Yes, but the reverse does not follow. Identical file size does not mean identical files. (I've got a lot of .png files with the same size, but different content)
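
The two suggestions can be combined: file size works as a cheap first filter, and a checksum confirms which same-sized files really are identical. A minimal sketch, assuming the image bytes are already in memory; `flag_identical` is a hypothetical name and SHA-256 is just one reasonable checksum choice.

```python
import hashlib
from collections import defaultdict


def flag_identical(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of file names whose contents are byte-for-byte identical.

    Files are bucketed by size first; only buckets with more than one
    file are hashed, since files of different sizes cannot be identical.
    """
    by_size: dict[int, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_size[len(data)].append(name)

    groups: list[list[str]] = []
    for names in by_size.values():
        if len(names) < 2:
            continue  # a unique size means a unique file
        by_hash: dict[str, list[str]] = defaultdict(list)
        for name in names:
            by_hash[hashlib.sha256(files[name]).hexdigest()].append(name)
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups
```

Two files that share a size but not a checksum end up in different hash buckets, so they are not flagged.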

GTramp (81961) on 11/14/2016 8:18 AM · edited

Well, here's one (source). Just added it today and not cleaning it up so that you could study it.

Another example (source)

Tracy Poff (2095) on 11/15/2016 2:18 PM · edited

Okay, I should be able to work with this. Those duplicates do have identical hashes, so I should be able to prevent the duplicates from being added. Thanks for the links.

Edit: this change is now live; exact duplicates should no longer be added.

Edit redux: this only applies to images scraped all at once; it doesn't protect against duplicates from different stores or manually uploaded images.
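
One way such a per-batch filter could look is sketched below. This is not the actual MobyGames code: the function name is hypothetical and the thread does not say which hash is used, so SHA-256 is an assumption.

```python
import hashlib


def drop_exact_duplicates(batch: list[bytes]) -> list[bytes]:
    """Return the images of one scraped batch with exact duplicates removed.

    The first copy of each image is kept and the original order is preserved.
    """
    seen: set[str] = set()  # digests seen in this batch only
    unique: list[bytes] = []
    for data in batch:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique
```

Because `seen` starts empty on every call, nothing is remembered between batches, which matches the limitation noted above: images from a different store or a manual upload are never compared against it. Near-identical images with even one differing byte also pass through.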

GTramp (81961) on 11/15/2016 10:39 PM

Great news! Thank you Tracy.

GTramp (81961) on 11/17/2016 4:12 AM · edited

OK, so it works in case of 3 identical sets of images. But what if there are two identical sets and one different? Same images still seem to slip through. Please take a look at this.

Tracy Poff (2095) on 11/17/2016 6:16 AM

The images look extremely similar, but are not quite identical.

GTramp (81961) on 11/17/2016 6:40 AM

You're right, I failed to notice that difference at the first glance. LOL ))) This and that, this and that again )))

Simon Carless (1834) on 11/17/2016 4:34 PM

Yeah, some of the differences are INSANELY small. Maybe it's a spot the difference game in itself? :P

chirinea (47516) on 11/12/2016 3:01 PM

When I began contributing those, I was taking care of removing the duplicates. I was told to leave different resolutions on file even if they did happen to show the same thing, as they could've been rendered differently. I understand that both examples you gave are from contributions I made, so I'll go back to all my Google Play contributions and clean them. Sorry for the inconvenience.

Simon Carless (1834) on 11/12/2016 3:10 PM

Reminder - if they're absolutely identical it's good to remove them, but if they are different aspect ratios it's arguably good to keep them. (And I am personally OK with different resolutions at same aspect ratio - we do it for wallpapers sometimes.) Sorry if we/I confused you on this.

chirinea (47516) on 11/12/2016 3:20 PM

No problem, Simon! I'm keeping different resolutions, like this. Here we have 3 sets with the same content but different resolutions. I'm deleting the ones with the same resolution/content (for those I wish we could have that checksum I talked about with Tracy). Please, let me know if you decide I should also remove different resolutions at the same aspect ratio.

Simon Carless (1834) on 11/12/2016 3:30 PM

We are not going to be strict about different resolutions with same aspect ratio.

BTW, what Tracy is referring to with 'there might be different images' is that when we were testing, we found some images that looked ALMOST identical but were actually ever so slightly different images. Not sure if that was a mistake/rare tho cos I've rarely seen it since.

GTramp (81961) on 11/12/2016 10:19 PM

Chirinea, no offence intended man. There are similar contributions by other people, plenty of them. Please understand that I didn't choose yours specifically.

chirinea (47516) on 11/12/2016 10:41 PM

GTramp wrote: "Chirinea, no offence intended man. There are similar contributions by other people, plenty of them. Please understand that I didn't choose yours specifically."

None taken! I too think duplicate shots look terrible and you didn't do anything wrong pointing out my mistakes.

chirinea (47516) on 11/12/2016 6:45 PM

All the ones I've sent should be fixed now, please, let me know if you find any more from me.

Evolyzer (21862) on 11/13/2016 9:18 AM · edited

a terrible example (edit: 14.11.16 I cleaned up the mess)

Patrick Bregger (303492) on 11/13/2016 10:08 AM

Well, those are three identical sets. It does not really disprove the existence of a pattern, it just shows that the sequences are not always different. At least two of those pictures are not screenshots, by the way.

Karsa Orlong (151775) on 11/13/2016 12:55 PM

I guess there is only one reasonable solution here - removing Google Play from automatic scraping. Otherwise we will have to check all contributions from this source :/

Simon Carless (1834) on 11/13/2016 1:45 PM

Well, from an archival point of view, you can definitely argue that Google Play separately stores each of the images, so it is 'archivally correct' to keep everything. It is, after all, actually the data that the site returns to us. (But from a user point of view we look a bit silly publicly displaying everything which is why I'd prefer not to.)

My impression is that a lot of people adding on Google Play have been doing the screenshot culling anyhow.