Zerochan.net is one of the better-known anime/game/CG imageboards, with a strong community and only modest crossposting with other imageboards.
Its tagging system is distinctive - close to e-shuushuu-net - but unlike the mainstream danbooru / safebooru / yande-re / konachan / sankaku.
That’s why Zerochan is a good, distinct source for studying non-photographic images and their metadata.
This release covers posts dated from 01.01.2015 (ID=1820240) to 31.12.2016 (ID=2064142),
the period right before the zerochan-2017 release on the Russian tracker (https://rutracker.org/forum/viewtopic.php?t=5478026), where you can get the magnet link,
and the giant zerochan-2018-2020 release (https://nyaa.iss.one/view/1304539), where most of the description and processing details can be found.
The release contains:
- 61348 images in 270 zipped folders (1820xxx-2064xxx and several addons), partitioned mostly per 1,000 IDs
- filtered by size
~ least(image_height,image_width)>=1080 – at least Full HD wallpaper size
~ image_height*image_width>=1200000 – 1100x1100 is included
~ image_width/image_height between 0.25 and 4 – not too disproportionate
- renamed to “zerochan - id - up_to_3_sources ~ up_to_5_characters (up_to_2_artists).ext”
~ tags are concatenated with “+”, spaces are replaced with underscores
~ maximum file name length is 220 characters; character tags may be truncated if too long
- image format - JPG
- light deduplication applied (only visually identical images were dropped)
- some metadata for every image in “ZERO_POSTS_2016.TSV” in the root folder - 61348 rows
- tag info for Copyright / Characters / Artists in “ZERO_TAGS_2016.TSV” - 1498790 rows
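The size rules listed above can be sketched as a single check. This is only an illustration, not the script actually used for the release: it assumes the three conditions are combined with AND (which matches the remark that 1100x1100 is included), and the function name passes_size_filter is made up for this example.

```python
def passes_size_filter(width: int, height: int) -> bool:
    """Return True if an image satisfies all three size rules at once.

    Assumption: the rules are combined with AND; the release notes
    do not state this explicitly.
    """
    if width <= 0 or height <= 0:
        return False  # guard against broken metadata and division by zero
    short_side_ok = min(width, height) >= 1080   # Full HD as a minimum
    area_ok = width * height >= 1_200_000        # total pixel count
    aspect_ok = 0.25 <= width / height <= 4      # not too disproportionate
    return short_side_ok and area_ok and aspect_ok
```

Under this reading a 1080x1080 image is rejected by the area rule (1,166,400 pixels), while 1100x1100 and 1920x1080 both pass.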
Additional cross-release metadata is also included:
- for every image, “ZERO_POSTS_TORR_2017.TSV” - 172360 rows for zerochan-2017
- tag info “ZERO_TAGS_2017.TSV” - 1028999 rows
- a rename script from the zerochan-2017 naming to the naming used here: zero_rename_2017.bat
~ you cannot run it in one go because of the Windows limitation of 16k commands per batch file
- ZERO_RAW_2015-2020.JSON, the initial data for all 3 releases - 979103 rows
~ NOTE: the post reference in a JSON URL may differ from the ID of the page where it was found; unfortunately, I didn’t save the page ID
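One way to work around the batch size limitation mentioned above is to cut zero_rename_2017.bat into smaller files and run them one by one. The sketch below assumes every line of the script is an independent command, so it can be cut at any line boundary; the chunk size of 10,000 lines and the "_partNNN" file names are arbitrary choices for this example, not part of the release.

```python
from pathlib import Path


def chunk_lines(lines, size):
    """Split a list of lines into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def split_batch(src, size=10_000):
    """Write `src` as numbered part files next to it and return their paths."""
    src = Path(src)
    # Work on bytes so the original encoding and line endings are preserved.
    lines = src.read_bytes().splitlines(keepends=True)
    parts = []
    for n, chunk in enumerate(chunk_lines(lines, size), start=1):
        part = src.with_name(f"{src.stem}_part{n:03d}{src.suffix}")
        part.write_bytes(b"".join(chunk))
        parts.append(part)
    return parts


# Example call (run it in the folder that holds the script):
# split_batch("zero_rename_2017.bat")
```

The resulting zero_rename_2017_part001.bat, zero_rename_2017_part002.bat and so on can then be run in order.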
This torrent is not as large as 2018-2020 because it has fewer images and Zerochan had no PNG files at that time.
The earlier interval (2014 and before) of Zerochan has a big (80%+ ?) overlap with Sankaku (https://nyaa.iss.one/view/750972),
Safebooru (https://nyaa.iss.one/view/719463) and e-shuushuu (https://nyaa.iss.one/view/513582, https://nyaa.iss.one/view/771715),
so I see no sense in going deeper. Not yet. Oops, I did it again!