Style Coherent Fashion Recommender with GenAI

This fashion recommender system focuses not on individual fashion items but on complete outfits. It analyzes an input image of a person wearing an outfit and then provides recommendations in the form of images featuring other individuals dressed in coordinated outfits. This approach ensures that the recommendations are comprehensive ensembles rather than single pieces of clothing.

Motivation

After recent developments in GenAI, particularly in Vision-Language Models (VLMs), the project I started in December 2022 to develop a fashion recommender already seems outdated. The original project from 99flairs is outlined here. Driven by curiosity, I decided to revisit the project in a reduced form.

Objective

The objective of this project is to develop a fashion/outfit recommender system that enhances user interaction by providing personalized recommendations. The focus is on achieving style coherence, ensuring the system accurately understands and reflects the user's preferred fashion styles, and maintaining rich diversity, offering a broad range of clothing items within the recommendations. The aim is not to develop a complete recommender system, as that would require extensive input beyond the scope of this showcase. Instead, the project concentrates on the most challenging aspect that current fashion recommenders struggle with: balancing style coherence with a diverse selection of fashion items.

Approach

It is worth mentioning that the question of how a machine can learn a fashion style has been addressed before by Wei-Lin Hsiao and Kristen Grauman in their excellent research paper, Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images. The main problems are the subjective nature of style and the great obstacle of creating a labeled data set. At that time, pre-trained LLMs and VLMs were not as advanced or as readily available as they are today. Now I have the luxury of being able to integrate, within hours, a VLM that has been trained on millions of images, and that is what I have done here.

The system, hosted on AWS, begins by using a Lambda function to resize the input image. This image, along with a well-engineered prompt, is then sent to Anthropic's Claude 3 Sonnet model. The model generates a list of fashion-related vocabulary describing the style of the outfit. This list is then sent to OpenAI's text-embedding-ada-002 model, which returns a vector representation. Approximately 17,000 images (16,500 women, 500 men) have been processed this way and stored as vectors in Qdrant's vector database. To produce fashion recommendations, the system uses the vector of the uploaded image only as a query for a similarity search within Qdrant; it does not store the new vector.

For completeness, I should add that the catalog images are stored on AWS S3, and newly uploaded images are not stored. The dataset I am working with is available here. I have also built an interface: you can try out the Generative AI Fashion Recommender here.
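
The retrieval step described above can be illustrated with a small, self-contained sketch. It is not the deployed code: the style terms are hard-coded where the real system would call the VLM, `embed` is a toy bag-of-terms vector standing in for text-embedding-ada-002, and a plain dictionary stands in for Qdrant. All file names, terms, and function names are hypothetical. What it does show is the shape of the flow: style vocabulary is turned into a vector, the vector is compared against the stored catalog vectors, and the query vector itself is never written to the index.

```python
import math
from collections import Counter

# Hypothetical catalog: in the real system these style terms would be
# generated by the VLM from each catalog image. Here they are hard-coded.
CATALOG_TERMS = {
    "img_001.jpg": ["boho", "flowy", "floral", "maxi dress", "earthy tones"],
    "img_002.jpg": ["streetwear", "oversized", "sneakers", "hoodie"],
    "img_003.jpg": ["boho", "fringe", "earthy tones", "layered", "suede"],
    "img_004.jpg": ["minimalist", "tailored", "monochrome", "blazer"],
}


def build_vocabulary(catalog):
    """Assign a fixed vector index to every style term seen in the catalog."""
    terms = sorted({t for ts in catalog.values() for t in ts})
    return {term: i for i, term in enumerate(terms)}


def embed(terms, vocabulary):
    """Toy embedding (an assumption, not ada-002): unit-length bag of terms."""
    counts = Counter(t for t in terms if t in vocabulary)
    vec = [0.0] * len(vocabulary)
    for term, n in counts.items():
        vec[vocabulary[term]] = float(n)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity; both inputs are unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


def recommend(query_terms, index, vocabulary, k=2):
    """Return the k most similar catalog images as (name, score) pairs.

    The query vector is used only for the search and is not added to the index.
    """
    q = embed(query_terms, vocabulary)
    scored = [(cosine(q, v), name) for name, v in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [(name, round(score, 3)) for score, name in scored[:k]]


# Build the in-memory stand-in for the vector database once, up front.
VOCAB = build_vocabulary(CATALOG_TERMS)
INDEX = {name: embed(terms, VOCAB) for name, terms in CATALOG_TERMS.items()}

if __name__ == "__main__":
    # Terms a VLM might return for an uploaded photo (hypothetical).
    query = ["boho", "earthy tones", "floral", "sandals"]
    for name, score in recommend(query, INDEX, VOCAB):
        print(name, score)
```

In this toy setup, a query described as "boho", "earthy tones", and "floral" ranks the two bohemian catalog images first, even though they share only part of their vocabulary and show different garments. That is the intuition behind searching in a style-description space rather than matching individual items.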

Architecture and Results

More Results
