---
license: unlicense
datasets:
- poloclub/diffusiondb
language:
- en
metrics:
- wer
pipeline_tag: image-to-text
---
# Untitled7-colab_checkpoint

This model was lovingly named after the Google Colab notebook that made it. It is a finetune of Microsoft's [git-large-coco](https://huggingface.co/microsoft/git-large-coco) model on the 1k subset of [poloclub/diffusiondb](https://huggingface.co/datasets/poloclub/diffusiondb/viewer/2m_first_1k/train).

It is supposed to read an image and extract a Stable Diffusion prompt from it, but it might not do a good job. I wouldn't know; I haven't extensively tested it.

As the title suggests, this is a checkpoint: I originally intended to train it on the entire dataset, but I'm unsure whether I want to now...

This is my first public model, so please be nice!

## Intended use

Fun!

```python
# Option 1: load the processor and model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("SE6446/Untitled7-colab_checkpoint")
model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")

#################################################################

# Option 2: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-text", model="SE6446/Untitled7-colab_checkpoint")
```

## Out-of-scope use

Don't use this model to discriminate against, alienate, or in any other way harm or harass individuals. You guys know the drill...

## Bias, Risks, and Limitations

This model does not produce accurate prompts; it is merely a bit of fun (and a waste of funds). It may also inherit biases present in the original git-large-coco model.

## Training

*I.e. boring stuff*

- lr = 5e-5
- epochs = 150
- optim = adamw
- fp16

If you want to finetune it further, you should freeze the embedding and vision transformer layers.
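The freezing advice in the training notes can be sketched roughly as follows. This is a minimal sketch, not the code used to train this checkpoint: it assumes the model loads as transformers' `GitForCausalLM`, where the text embeddings live at `model.git.embeddings` and the vision transformer at `model.git.image_encoder`. The helper names `freeze_embeddings_and_vision` and `build_optimizer` are hypothetical.

```python
import torch


def freeze_embeddings_and_vision(model: torch.nn.Module) -> int:
    """Turn off gradients for the text embeddings and the vision encoder.

    Assumption: `model` follows the transformers GitForCausalLM layout, i.e.
    the backbone is `model.git`, with embeddings in `model.git.embeddings`
    and the vision transformer in `model.git.image_encoder`.
    Returns how many parameters are still trainable.
    """
    for frozen in (model.git.embeddings, model.git.image_encoder):
        for param in frozen.parameters():
            param.requires_grad = False
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def build_optimizer(model: torch.nn.Module, lr: float = 5e-5) -> torch.optim.AdamW:
    """Freeze the layers above, then build AdamW over the remaining parameters.

    lr defaults to the 5e-5 listed in the training notes.
    """
    freeze_embeddings_and_vision(model)
    # Only parameters that still require gradients go to the optimizer.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```

For example, load the checkpoint with `AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")` and pass it to `build_optimizer`. The fp16 setting from the list above is not covered by this sketch and would need to be handled separately in your training loop.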