Author Topic: The Deduplication Project  (Read 2889 times)


Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #15 on: June 18, 2017 - 09:19:15 »
~
« Last Edit: August 27, 2017 - 20:39:07 by attractivo »

Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #16 on: June 19, 2017 - 19:09:48 »
"Ultimate DATs Collection" and "The Deduplication Project" have now come to an end. I hope you have enjoyed both projects.


Final Stats:

dats-size before deduping:     112.5 GB
dats-size after deduping:        1.6 GB

dats-count before deduping:    69621
dats-count after deduping:      9904

hashes-count before deduping:  uncountable!
hashes-count after deduping:   11,117,841 unique hashes!
« Last Edit: August 27, 2017 - 20:39:43 by attractivo »

Offline cannonwillow

  • Jr. Member
  • **
  • Posts: 85
Re: The Deduplication Project
« Reply #17 on: June 23, 2017 - 18:29:36 »
Quick question

Are the _Old folders within the New folder tree also not needed for a complete deduped collection?

Example: New/No-Intro/_Old/*

thanks in advance and for the amazing work.

cannonwillow


Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #18 on: June 23, 2017 - 19:03:21 »
The 9904 dats inside New are all needed for a complete, deduped, hash-unique collection.
« Last Edit: August 27, 2017 - 20:40:12 by attractivo »

Offline cannonwillow

  • Jr. Member
  • **
  • Posts: 85
Re: The Deduplication Project
« Reply #19 on: June 23, 2017 - 19:27:48 »
thanks for clearing that up.

 8)

Offline cannonwillow

  • Jr. Member
  • **
  • Posts: 85
Re: The Deduplication Project
« Reply #20 on: July 24, 2017 - 08:58:38 »
@tractivo, I have found some undeduped ROMs within the deduped collection (using the 9904-dat collection), such as gdl-0024.chd and gds-0031.chd, as can be seen in the attached picture. One is from \Emulator and the other from \Mame, Mess. I have seen a few others but can't recall where they came from. Maybe this is only .chd related.

And how could this amazing project end up on page 2? It really should be stickied...

cannonwillow
« Last Edit: July 24, 2017 - 09:09:34 by cannonwillow »

Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #21 on: July 24, 2017 - 09:27:53 »
No need for a sticky, as this will not see further updates :) This was a one-time-only collection that should help verify all the files that were normally unverifiable, because no ROM manager is able to load 100 GB of dats at once :) But that aside, thanks for your feedback.
« Last Edit: August 27, 2017 - 20:40:49 by attractivo »

Offline cannonwillow

  • Jr. Member
  • **
  • Posts: 85
Re: The Deduplication Project
« Reply #22 on: July 24, 2017 - 10:12:44 »
I thought that by using the deduped dats (9904) I could minimize the size of my ROM collection, especially the random stuff that may be useful later, instead of deleting it when I give up on it or am not sure what it is. I hate to figure out the how-to and then start the search all over again. A dat can be pulled from the normal UDC collection or anywhere else and shoved into the DatRoot alongside the deduped collection; then uncheck all areas in RomVault that probably do not apply (the sort order is useful here) and hunt down the rest. Terabytes are really getting cheap these days, but nobody wants copies of the same thing except for backup (and obviously not on the same drive), and everybody wants it actually organized.



thanks for the undeduped .chd explanation

Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #23 on: July 24, 2017 - 10:24:41 »
all true. there you go ;)
« Last Edit: August 27, 2017 - 20:41:03 by attractivo »

Offline cannonwillow

  • Jr. Member
  • **
  • Posts: 85
Re: The Deduplication Project
« Reply #24 on: July 24, 2017 - 10:34:45 »
Had to go to the Micro Center store yesterday to buy more memory. When RomVault (using the deduped dats) goes into virtual memory during a scan/fix, you might as well turn the monitors off, drink one last beer, and check it when you wake up the next day, and then again the day after. Extra memory was designed for RomVault :D

Offline NLS

  • Sr. Member
  • ****
  • Posts: 322
Re: The Deduplication Project
« Reply #25 on: July 24, 2017 - 11:53:31 »
Very interesting project and amazing work by @tractivo.

I would prefer a generic solution (nonexistent so far) though, one that automatically uses dats in the order a user decides (because now we all have to agree with @tractivo's order), finds duplicates, and fixes things.

Still, it is interesting, as it might "clean up" the leftovers we all have.
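
For what it's worth, the order-driven dedup asked for here could be sketched roughly as below. This is only a minimal Python sketch under assumptions, not how the actual project was built: the function name `dedupe_in_order`, the (rom name, hash) pair layout, and the choice to drop dats that end up empty are all hypothetical. Parsing real dat files and fixing files on disk are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: one entry is (rom name, hash string).
Entry = Tuple[str, str]


def dedupe_in_order(dats: List[Tuple[str, List[Entry]]]) -> Dict[str, List[Entry]]:
    """Keep each hash only in the first dat (in user-chosen order) that lists it."""
    seen = set()   # hashes already claimed by an earlier dat
    result = {}    # dat name -> entries that survive deduping
    for dat_name, entries in dats:
        kept = []
        for rom_name, digest in entries:
            key = digest.lower()  # hex hashes compare case-insensitively
            if key in seen:
                continue          # duplicate of an entry in an earlier dat
            seen.add(key)
            kept.append((rom_name, digest))
        if kept:                  # assumption: dats left empty are dropped
            result[dat_name] = kept
    return result
```

Putting a dat earlier in the list makes it win every hash it shares with later dats, so reordering the list changes which dat keeps each duplicate.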
---
NLS

Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #26 on: July 24, 2017 - 12:10:40 »
~
« Last Edit: August 27, 2017 - 20:41:20 by attractivo »

Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #27 on: July 24, 2017 - 14:20:29 »
~
« Last Edit: August 27, 2017 - 20:41:31 by attractivo »

Offline cannonwillow

  • Jr. Member
  • **
  • Posts: 85
Re: The Deduplication Project
« Reply #28 on: September 27, 2017 - 11:25:02 »
Not trying to bring up a dead project, but I can't scan these ROMs a second time without them becoming unfixable: Console/[Misc]/Nintendo - Super Nintendo Entertainment System - SuperNESBase (20160518)[@tractivo]/SuperNESBase/Emulators/....(bsnes_v091-32bit, higan_v094-64bit, snes9x-1.53-x32, snes9x-1.53-x64). Just curious as to why RomVault keeps wanting to rescan these and giving errors for them.

Offline attractivo

  • Hero Member
  • *****
  • Posts: 586
Re: The Deduplication Project
« Reply #29 on: September 28, 2017 - 08:40:08 »
RomVault has a bug with superdats that use forcepacking unzip.

Open the dat in a text editor and, in the header, remove the line
<clrmamepro forcepacking="unzip"/>
Save the dat and load it with RomVault. Refix the files for that dat. That's it.
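
If there are many dats to fix, the manual edit above can be scripted. Here is a minimal Python sketch, assuming the element sits in the header as a self-closing tag on its own line and that the dat is UTF-8 text. `strip_forcepacking` and `fix_dat_file` are hypothetical helper names, and the script keeps a `.bak` copy because it overwrites the dat.

```python
import re
from pathlib import Path

# Matches a whole line holding only a self-closing <clrmamepro .../> element
# whose forcepacking attribute is "unzip" (single or double quotes accepted).
FORCEPACKING_LINE = re.compile(
    r"""^[ \t]*<clrmamepro\b[^>]*\bforcepacking\s*=\s*["']unzip["'][^>]*/>[ \t]*\r?\n?""",
    re.MULTILINE,
)


def strip_forcepacking(dat_text: str) -> str:
    """Return the dat text with the forcepacking="unzip" header line removed."""
    return FORCEPACKING_LINE.sub("", dat_text)


def fix_dat_file(path: str) -> bool:
    """Rewrite one dat in place; return True if the file was changed."""
    dat = Path(path)
    original = dat.read_text(encoding="utf-8")  # assumption: UTF-8 dat
    cleaned = strip_forcepacking(original)
    if cleaned == original:
        return False  # nothing to remove, leave the file untouched
    # Keep a backup next to the dat before overwriting it.
    dat.with_name(dat.name + ".bak").write_text(original, encoding="utf-8")
    dat.write_text(cleaned, encoding="utf-8")
    return True
```

Note that the whole line is dropped, as in the manual steps, so any other attributes on that same element would go with it. After running it, load the dat in RomVault and refix as described.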