This is a FURRY-CENTRIC site rip originating from the **e621:net** imageboard, grabbed mostly via **tbib:org** crossposting:
- **TBIB** for interval **04.2016-07.2022** post ID 5.000.000 - 11.000.000
- **E621** only topmost **up to 07.2016** post ID 10.000 - 999.999
**This rip is not intended to be "complete and maximum quality" but rather "a representative best-of",
to help anybody discover the furry world without bumping into yiff (furry hentai, often male/male) and comic stockpiles.**
Another reason is neural network training over art images.
There are [promising results](https://github.com/aperveyev/booru_yolo) for species-specific head classes (dragonhead, ponyhead, Judy Hopps, Nick Wilde, ...), stay tuned.
Manually:
- comics and 4koma, most line art, segmented scans and text-heavy covers were filtered out
- images were cropped when the background was large and plain or dirty; occasionally gamma correction and other nontrivial improvements were made
Also, a lot of ~~handjob~~ manual filtering was done to avoid obviously unsafe art and throttle most furry fetishes.
Although furry is not SFW by definition, (almost) no frontal nudity or evident adult activity is left here, so an R14+ rating seems applicable.
### This release contains:
- **279.736 JPG images**
* renamed to contain **ID - up_to_3_copyrights ~ up_to_5_species_or_characters (up_to_2_artists)**
* PNG >> JPG (94% quality) [converted](https://imagemagick.org), some of them "sampled" to reasonable size / volume
* deduplicated using [AntiDupl](https://github.com/ermig1979/AntiDupl) up to 4% similarity
* split / zipped into folders by ID range, with **Q**uestionable and e**X**tra kept separate (use [MaxView](https://www.faststone.org/FSMaxViewDetail.htm) or unzip to browse)
.
- additional TSV (tab separated text) metadata
* key parameters for every image (from the imageboard and as released), [spreadsheet](https://www.libreoffice.org/download/download-libreoffice)-compatible
* tag-to-image relations - 8.167.078 rows; use some [tool](https://gnuwin32.sourceforge.net/packages/gawk.htm) to process them
.
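
The naming scheme listed above can be split back into its parts with a short regular expression. This is a minimal Python sketch, not the tool used to produce the names: the exact separators (` - `, ` ~ `, parentheses), the optional `.jpg` suffix and the sample file name in the usage note are assumptions for illustration, and `parse_name` / `ParsedName` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    post_id: int      # imageboard post ID
    copyrights: str   # raw text between " - " and " ~ "
    subjects: str     # raw species / characters text
    artists: str      # raw text inside the trailing parentheses


# Assumed layout: ID - copyrights ~ species_or_characters (artists).jpg
# Every part after the ID is treated as optional.
_NAME_RE = re.compile(
    r"^(?P<id>\d+)"
    r"(?:\s*-\s*(?P<copyrights>[^~(]*?))?"
    r"(?:\s*~\s*(?P<subjects>[^(]*?))?"
    r"(?:\s*\((?P<artists>[^)]*)\))?"
    r"\s*(?:\.jpe?g)?$",
    re.IGNORECASE,
)


def parse_name(file_name: str) -> Optional[ParsedName]:
    """Split a release file name into its parts, or return None if it does not fit."""
    match = _NAME_RE.match(file_name)
    if match is None:
        return None
    return ParsedName(
        post_id=int(match.group("id")),
        copyrights=(match.group("copyrights") or "").strip(),
        subjects=(match.group("subjects") or "").strip(),
        artists=(match.group("artists") or "").strip(),
    )
```

For a made-up name such as `5123456 - zootopia ~ judy_hopps nick_wilde (some_artist).jpg`, `parse_name` returns the ID as an integer and the three groups as raw strings. Names whose tags themselves contain parentheses or `~` would need a stricter pattern.
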
### More about sampling
1) [detected](https://exiftool.org) image properties
```
exiftool -filecreatedate -imagesize -filesize# -filetype -JPEGQualityEstimate -csv -r B:\TBIB\ > exif.txt
```
2) [sophisticatedly](https://www.oracle.com/cis/database/technologies/xe-downloads.html) used
```
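-- Build one ImageMagick command line per image that is worth re-sampling.
-- Assumption: iw, ih, px (pixel count) and jq (JPEG quality estimate) are columns
-- derived from the exiftool output loaded into the EXIF table.
select 'magick convert "'||sourcefile||'" '||
-- downscale limit depends on aspect ratio; "^>" is presumably the cmd.exe-escaped ">" (shrink only)
case when iw/ih between 0.8 and 1.2 and px>4000000 then '-resize 1920x1920^>'
when iw/ih<0.8 and px>5000000 then '-resize 2480x2480^>'
when iw/ih>1.2 and px>6000000 then '-resize 2560x2560^>'
else to_char(null) end||' '||
-- re-encode near-lossless JPEGs at quality 94
case when jq>=98 then '-quality 94' else to_char(null) end||' '||
-- blur images with a very high bytes-per-pixel ratio
case when filesize/(iw*ih)>0.7 then '-blur 4' else to_char(null) end||
' "'||replace(sourcefile,'\tbib\','\tbic\')||'"' mm
from exif e
-- candidates: very high JPEG quality or oversized for their aspect ratio ...
where ( jq between 98 and 100
or (iw/ih between 0.8 and 1.2 and px>4000000)
or (iw/ih<0.8 and px>5000000)
or (iw/ih>1.2 and px>6000000) )
-- ... and also heavy enough on disk to be worth the effort
and ((filesize>1600000 and jq>84) or filesize>4000000 or (filesize/(iw*ih)>0.7) )
order by fpath desc, fname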
```
3) an image was left untouched when sampling had minimal or negative effect
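
The selection rule in the query above can also be read as a small function. The following Python sketch mirrors the thresholds of the SQL for a single image; it is an illustration, not the pipeline actually used. Assumptions: `px` is the pixel count `iw*ih`, exiftool reports `ImageSize` as `WxH`, and the `^>` in the SQL is a cmd.exe-escaped `>` (shrink only), so the geometry is written here as `1920x1920>`. The names `parse_image_size` and `magick_options` are hypothetical.

```python
from typing import List, Tuple


def parse_image_size(image_size: str) -> Tuple[int, int]:
    """Split an exiftool ImageSize value such as '1920x1080' into (width, height)."""
    width, height = image_size.lower().split("x")
    return int(width), int(height)


def magick_options(iw: int, ih: int, filesize: int, jq: int) -> List[str]:
    """Return the ImageMagick options the query would emit, or [] to leave the image as is."""
    px = iw * ih                 # assumption: px in the SQL is the pixel count
    ratio = iw / ih              # aspect ratio, width over height
    bytes_per_pixel = filesize / px

    square = 0.8 <= ratio <= 1.2 and px > 4_000_000
    portrait = ratio < 0.8 and px > 5_000_000
    landscape = ratio > 1.2 and px > 6_000_000

    # WHERE clause: (very high quality or oversized) and heavy on disk
    candidate = 98 <= jq <= 100 or square or portrait or landscape
    heavy = (
        (filesize > 1_600_000 and jq > 84)
        or filesize > 4_000_000
        or bytes_per_pixel > 0.7
    )
    if not (candidate and heavy):
        return []

    options: List[str] = []
    if square:
        options += ["-resize", "1920x1920>"]
    elif portrait:
        options += ["-resize", "2480x2480>"]
    elif landscape:
        options += ["-resize", "2560x2560>"]
    if jq >= 98:
        options += ["-quality", "94"]
    if bytes_per_pixel > 0.7:
        options += ["-blur", "4"]
    return options
```

Returning the options as a list keeps them ready for a call without a shell, where the `>` needs no escaping.
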
.
### More about metadata
.
#### TBIB_E621_2022.tsv
.
FID - imageboard post ID (e621 when < 1000000, tbib when >= 5000000)
**for torrent content**
FPATH - folder / zip name
FNAME - file name
TORR_FSIZE - file size, bytes
TORR_ISIZE - image size WxH
TORR_JQ - JPEG quality
TORR_MD5 - checksum
**imageboard originated** if available
ORIG_DT - posting date
ORIG_RATE - Safe / Questionable
ORIG_ISIZE - WxH
ORIG_EXT - image type (extension)
ORIG_MD5 - checksum
**imagemagick:org** calculated
TENTR - entropy (complexity)
TSKEW - skewness (black/white balance)
TSTDDEV - standard deviation (black/white contrast)
TCOLORS - count of colors
**keras-craft text detector** calculated
TXSIZE - total text area
TXCNT - number of text pieces
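
As a rough illustration of how the columns above might be consumed, here is a minimal Python sketch that reads the file row by row and derives the source board from the FID ranges given above. It assumes the TSV starts with a header row using exactly these column names and that TORR_ISIZE is written as `WxH`; neither is confirmed here. `SAMPLE` is made-up data, and `source_of` / `read_metadata` are hypothetical names.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Union

# Made-up rows with only three of the columns, for illustration.
SAMPLE = (
    "FID\tFNAME\tTORR_ISIZE\n"
    "123456\texample_a.jpg\t800x600\n"
    "5123456\texample_b.jpg\t1920x1080\n"
)


def source_of(fid: int) -> str:
    """Map a post ID to its imageboard using the ranges stated for FID."""
    if fid < 1_000_000:
        return "e621"
    if fid >= 5_000_000:
        return "tbib"
    return "unknown"


def read_metadata(handle: TextIO) -> Iterator[Dict[str, Union[int, str]]]:
    """Yield one small record per TSV row (assumes a header row is present)."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        fid = int(row["FID"])
        width, height = (int(v) for v in row["TORR_ISIZE"].lower().split("x"))
        yield {
            "fid": fid,
            "source": source_of(fid),
            "fname": row["FNAME"],
            "width": width,
            "height": height,
        }


# With the real file this would be something like (encoding is an assumption):
# with open("TBIB_E621_2022.tsv", newline="", encoding="utf-8") as fh:
#     for record in read_metadata(fh):
#         ...
```
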
.
#### TBIB_E621_2022_TAGS.tsv
.
FID - imageboard post ID
TAG - string tag
TAG_CAT - tag category COPYRIGHT / CHARACTER / SPECIE / ARTIST / GENERAL or UNKNOWN
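
With over 8 million rows, the tag file is easier to stream than to open in a spreadsheet. The sketch below, in Python rather than gawk, collects the post IDs that carry every tag in a requested set while keeping only the matching rows in memory. It assumes a header row with the three column names above; the sample rows are made up, and `posts_with_all_tags` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Set, TextIO

# Made-up rows for illustration only.
SAMPLE_TAGS = (
    "FID\tTAG\tTAG_CAT\n"
    "5000001\tdragon\tSPECIE\n"
    "5000001\twings\tGENERAL\n"
    "5000002\tdragon\tSPECIE\n"
    "5000003\twings\tGENERAL\n"
)


def posts_with_all_tags(handle: TextIO, wanted: Iterable[str]) -> Set[int]:
    """Return the FIDs that have every tag in `wanted` (assumes a header row)."""
    wanted_set = set(wanted)
    seen: Dict[int, Set[str]] = defaultdict(set)
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Only rows with a requested tag are remembered, so memory stays small.
        if row["TAG"] in wanted_set:
            seen[int(row["FID"])].add(row["TAG"])
    return {fid for fid, tags in seen.items() if tags == wanted_set}
```

The resulting IDs can then be matched against FID in TBIB_E621_2022.tsv to find the corresponding FPATH and FNAME.
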