{
    "config": {
        "query_token_id": "[unused0]",
        "doc_token_id": "[unused1]",
        "query_token": "[Q]",
        "doc_token": "[D]",
        "ncells": null,
        "centroid_score_threshold": null,
        "ndocs": null,
        "load_index_with_mmap": false,
        "index_path": "index",
        "nbits": 2,
        "kmeans_niters": 4,
        "resume": false,
        "similarity": "cosine",
        "bsize": 64,
        "accumsteps": 1,
        "lr": 3e-6,
        "maxsteps": 500000,
        "save_every": null,
        "warmup": null,
        "warmup_bert": null,
        "relu": false,
        "nway": 2,
        "use_ib_negatives": false,
        "reranker": false,
        "distillation_alpha": 1.0,
        "ignore_scores": false,
        "model_name": null,
        "query_maxlen": 32,
        "attend_to_mask_tokens": false,
        "interaction": "colbert",
        "dim": 128,
        "doc_maxlen": 512,
        "mask_punctuation": true,
        "checkpoint": "colbert-ir\/colbertv2.0",
        "triples": null,
        "collection": [
            "list with 47683 elements starting with...",
            [
                "This paper studies the correction of challenging authentic Finnish learner texts at beginner level (CEFR A1). Three state-of-the-art large language models are compared, and it is shown that GPT-4 outperforms GPT-3.5, which in turn outperforms Claude v1 on this task. Additionally, ensemble models based on classifiers combining outputs of multiple single models are evaluated. The highest accuracy for an ensemble model is 84.3{\\%}, whereas the best single model, which is a GPT-4 model, produces sentences that are fully correct 83.3{\\%} of the time. In general, the different models perform on a continuum, where grammatical correctness, fluency and coherence go hand in hand.",
                "In recent years, large pre-trained language models (PLMs) have achieved remarkable performance on many natural language processing benchmarks. Despite their success, prior studies have shown that PLMs are vulnerable to attacks from adversarial examples. In this work, we focus on the named entity recognition task and study context-aware adversarial attack methods to examine the model{'}s robustness. Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples and investigate different candidate replacement methods to generate natural and plausible adversarial examples. Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.",
                "This paper investigates effects of noisy source texts (containing spelling and grammar errors, informal words or expressions, etc.) on human and machine translations, namely whether the noisy phenomena are kept in the translations, corrected, or caused errors. The analysed data consists of English user reviews of Amazon products translated into Croatian, Russian and Finnish by professional translators, translation students, machine translation (MT) systems, and ChatGPT language model. The results show that overall, ChatGPT and professional translators mostly correct\/standardise those parts, while students are often keeping them. Furthermore, MT systems are most prone to errors while ChatGPT is more robust, but notably less robust than human translators. Finally, some of the phenomena are particularly challenging both for MT systems and for ChatGPT, especially spelling errors and informal constructions."
            ]
        ],
        "queries": null,
        "index_name": "index",
        "overwrite": false,
        "root": "\/coc\/pskynet6\/dheineman3\/4440\/colbert-acl\/src\/experiments",
        "experiment": "notebook",
        "index_root": null,
        "name": "2024-08\/11\/23.16.25",
        "rank": 0,
        "nranks": 2,
        "amp": true,
        "gpus": 2
    },
    "num_chunks": 2,
    "num_partitions": 32768,
    "num_embeddings_est": 8137376.50328064,
    "avg_doclen_est": 170.6557159423828
}
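This file appears to be the plan/metadata that ColBERT writes next to an index it builds. For reference, a run that produces metadata like this looks roughly like the sketch below, assuming the standard `colbert-ai` package (`colbert-ir/colbert`); the settings (`nbits`, `doc_maxlen`, `kmeans_niters`, `nranks`, the experiment and index names, the checkpoint) are copied from the config above, while `load_abstracts` is a hypothetical helper standing in for however the 47,683 abstracts are loaded.

```python
# Minimal sketch, assuming the colbert-ai package (github.com/stanford-futuredata/ColBERT).
from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    # The collection: the 47,683 abstracts shown (truncated) in the metadata above.
    collection = load_abstracts()  # hypothetical helper; a list of strings or a .tsv path works

    # nranks=2 and experiment="notebook" mirror the "nranks"/"gpus" and "experiment" fields above.
    with Run().context(RunConfig(nranks=2, experiment="notebook")):
        config = ColBERTConfig(
            nbits=2,          # bits per compressed residual dimension ("nbits")
            doc_maxlen=512,   # truncate documents to 512 tokens ("doc_maxlen")
            kmeans_niters=4,  # k-means iterations when training centroids ("kmeans_niters")
        )
        indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
        indexer.index(name="index", collection=collection)
```

With `nranks=2`, the indexer runs one worker per GPU, which is consistent with the `rank`, `nranks`, and `gpus` fields recorded in the config.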
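The summary fields at the bottom are internally consistent: `num_embeddings_est` is just the number of passages times `avg_doclen_est`, and `num_partitions` (the number of k-means centroids used for compression) matches the largest power of two not exceeding 16·√(num_embeddings). A quick check of that arithmetic, with the power-of-two heuristic stated as an assumption about ColBERT's indexer rather than a guarantee:

```python
import math

num_passages = 47683                # size of the collection above
avg_doclen_est = 170.6557159423828  # from the metadata

# Estimated total token embeddings: passages x average tokens per passage.
num_embeddings_est = num_passages * avg_doclen_est
print(f"{num_embeddings_est:.2f}")  # 8137376.50 -- matches "num_embeddings_est"

# Assumed centroid-count heuristic: largest power of two <= 16 * sqrt(#embeddings).
num_partitions = 2 ** math.floor(math.log2(16 * math.sqrt(num_embeddings_est)))
print(num_partitions)               # 32768 -- matches "num_partitions"
```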