Model Card for Luna_v1

This model classifies whether a given text contains offensive language. It was trained on a set of words labeled as either "normal" or "offensive" and distinguishes between these two categories with high accuracy.

Model Details

Model Description

This model is a binary classifier designed to classify words as offensive (profanity) or neutral (normal). It takes a list of words as input and classifies each word into one of two categories. The model's goal is to help with the automatic detection of offensive words in text, which can be useful for content filtering systems, platform moderation, and various applications where control over the use of inappropriate language is required.

The model uses a neural network built with several layers, each of which plays a role in the learning and prediction process. The architecture of the model consists of several dense layers, enabling the model to effectively extract complex patterns from the data.

Key features of the model architecture:

Leaky ReLU (Leaky Rectified Linear Unit): This activation function is applied in each hidden layer. It helps avoid vanishing gradients and "dead neurons" by letting small negative values pass through instead of zeroing them out, which improves training in deeper networks where standard ReLU can leave neurons permanently inactive.

Batch Normalization: This method is used to normalize the outputs of each layer, helping to accelerate training and make it more stable. This is particularly important for deep neural networks, as batch normalization reduces internal covariate shift, improving overall convergence and performance.

Dropout: Dropout helps prevent overfitting by randomly disabling certain neurons during training. This reduces reliance on specific features and improves the model's ability to generalize to new data.

The network architecture consists of several dense layers, each with its own number of neurons and regularization components (a code sketch of this architecture follows the list):

  • An input dense layer with 512 neurons that receives the vectorized words, followed by Leaky ReLU activation and batch normalization to support the learning process.
  • Three further dense layers with progressively fewer neurons (256, 128, 64), which let the model extract increasingly abstract features and reduce dimensionality while retaining the information needed for classification.
  • An output dense layer with a single neuron and a sigmoid activation, producing the binary decision (0 = non-offensive, 1 = offensive word).

The model is trained with the RMSprop optimizer at a low learning rate (0.0001), allowing it to train smoothly with minimal fluctuations. The binary cross-entropy loss function is used, which is well suited to binary classification tasks where the goal is to estimate the probability of belonging to one of two categories.
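The original training code is not published with this card, so the following is only a minimal sketch of the architecture described above, written for TensorFlow 2.x. The input dimension (input_dim, i.e. the CountVectorizer vocabulary size) and the dropout rate are assumptions; the LeakyReLU alpha (0.1), L2 factor (0.00001), optimizer, and loss match the values reported later in this card.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(input_dim):
    # L2 factor (0.00001) and LeakyReLU alpha (0.1) follow the values reported
    # later in this card; the dropout rate is illustrative only.
    l2 = regularizers.l2(1e-5)
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    # Dense blocks with progressively fewer neurons: 512 -> 256 -> 128 -> 64.
    for units in (512, 256, 128, 64):
        model.add(layers.Dense(units, kernel_regularizer=l2))
        model.add(layers.LeakyReLU(alpha=0.1))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.3))  # assumed rate
    # Single sigmoid neuron: probability that the input word is offensive.
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model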

  • Developed by: LaciaStudio/LaciaAI
  • Model type: text-classification
  • Language(s) (NLP): Russian, English
  • License: cc-by-nc-4.0

Uses

Direct Use

The model is designed for direct use in text classification tasks, where it is necessary to automatically identify and filter offensive or inappropriate words. It can be used in various systems for:

  • Content moderation on forums, social media, or chats, to automatically filter undesirable or offensive messages.
  • Automatic message processing in customer support services, where it is important to filter out inappropriate language.
  • Content management systems (CMS), to prevent the publication of offensive material.

Typical users of this model are developers building moderation systems, social media platforms, and companies working with large volumes of user-generated content who need to maintain specific language and behavior standards.

Downstream Use

This model can be fine-tuned and integrated into a variety of downstream applications that require the automatic filtering or moderation of offensive or inappropriate language. Specific use cases include:

  • Social Media Platforms: Fine-tuning the model to classify and filter offensive comments, posts, and messages on platforms like Facebook, Twitter, or Instagram, helping maintain a safe environment for users.

  • Customer Support Systems: When integrated into customer service bots, this model can detect and filter out inappropriate language in customer inquiries, ensuring that responses remain professional and appropriate.

  • Content Moderation in Online Communities: The model can be plugged into systems designed to moderate content in online communities, forums, or chat platforms, ensuring that users adhere to community guidelines and maintain respectful conversations.

  • Content Creation Platforms: In platforms that allow user-generated content (like YouTube or Twitch), the model can be used to automatically detect offensive language in comments or streams and apply moderation actions (e.g., flagging, muting, or banning users).

By fine-tuning this model for domain-specific tasks or integrating it into broader applications, it helps to reduce the workload of human moderators and ensures consistent enforcement of content policies.

Out-of-Scope Use

This model is not intended for use in the following scenarios:

  • Real-time Speech or Audio Moderation: The model is designed to classify text input and may not perform well when applied to real-time speech recognition or audio processing systems, as it does not analyze audio or spoken language directly.

  • Context-Dependent Language: The model may struggle with detecting offensive language in highly context-dependent situations where tone, sarcasm, or irony plays a significant role. It could incorrectly classify non-offensive comments as offensive, or vice versa, due to lack of understanding of context.

  • Multilingual Use Without Adaptation: The model is trained primarily for a specific language and may not perform well on texts in languages other than the one it was trained on, unless it is fine-tuned on additional multilingual datasets.

  • Legal, Medical, or Sensitive Situations: The model is not designed for use in high-stakes scenarios, such as legal or medical applications, where precise language analysis and interpretation are critical. Its use in such domains could lead to misclassification of sensitive content.

  • Malicious Intent: The model should not be used to suppress free speech, target specific individuals or groups, or silence dissenting opinions in a harmful manner. Its primary purpose is content moderation to create respectful and safe environments, not to control or censor opinions unjustly.

The model is intended to be used responsibly within its design and domain, and should not be deployed in situations where it could have negative social or ethical implications.

Bias, Risks, and Limitations

This model, like all machine learning models, comes with certain biases, risks, and limitations that need to be considered before deployment.

Biases:

  • Cultural and Linguistic Bias: The model is trained primarily on datasets in a specific language and cultural context. As a result, it may not perform as well on text from different cultures or languages. The model could misclassify offensive or non-offensive words in languages, dialects, or slang outside of its training data.

  • Subgroup Bias: The model may exhibit bias against specific subgroups or communities, especially if the training data contains unbalanced representations. For example, certain groups may use offensive language differently, and the model may not always detect these nuances, leading to misclassification or unfair treatment.

  • Contextual Bias: The model does not account for the full context in which a word is used. Sarcasm, irony, or figurative language can lead to misclassifications. A seemingly offensive word in one context might be harmless in another, but the model may not always distinguish these situations accurately.

Risks:

  • False Positives/Negatives: The model may mistakenly flag non-offensive words as offensive (false positives) or fail to flag offensive words (false negatives). These errors can lead to frustration for users and undermine the effectiveness of content moderation systems.

  • Over-Moderation: The model could be overly sensitive and incorrectly flag legitimate user content as offensive, leading to censorship of harmless posts and restricting free expression. This could result in user dissatisfaction or alienation.

  • Evolving Language: Language evolves rapidly, and new slang or offensive terms may emerge that the model has not been trained to recognize. This could lead to a model that becomes outdated and unable to handle new trends in language.

Limitations:

  • Lack of Context Understanding: While the model can classify words based on their general meaning, it doesn't understand context in a deep way. Complex sentences with layered meanings or ambiguous language may not be correctly classified by the model, leading to inaccuracies.

  • Inability to Handle Multi-modal Inputs: The model only processes text input and does not account for multimedia content (e.g., images, videos, or audio) that may contribute to offensive or harmful messages. It is not suitable for systems that require multimodal analysis.

Understanding these biases, risks, and limitations is crucial for deploying the model in real-world applications. Users should consider them when implementing the model in content moderation systems, ensuring that it complements human oversight and is used responsibly.

Recommendations

Users (both direct and downstream) should be aware of the following recommendations to mitigate the risks, biases, and limitations of the model:

  • Continuous Monitoring and Human Oversight: It is essential to continuously monitor the performance of the model after deployment, especially in live content moderation systems. Human moderators should be involved in the decision-making process to ensure that false positives and negatives are identified and corrected promptly. Human judgment should be used to interpret ambiguous cases where the model may struggle.

  • Test Across Multiple Subgroups and Languages: The model should be tested across different demographic groups, languages, and dialects to identify potential biases or performance gaps. If the model is used in multilingual environments, it's crucial to fine-tune it with data from diverse linguistic sources to ensure its accuracy.

  • Consideration of Context: Users should be cautious about relying on the model in cases where context plays a crucial role. For example, sarcasm, irony, and figurative language can result in misclassifications. It’s recommended to incorporate context-awareness or additional models that can handle sentiment analysis or contextual understanding for more accurate results.

  • Transparency with Users: When implementing this model in user-facing applications, it is important to provide transparency about how the model works and its limitations. Users should be informed that the model is automated and that, while it aims to filter offensive content, it may not always be accurate. This transparency helps manage user expectations and fosters trust.

  • Ethical Use of the Model: Ethical considerations should be made when deploying the model, particularly in sensitive environments. For example, over-moderation can stifle free speech and cause user frustration. It's important to balance content moderation with respect for freedom of expression, ensuring that the model's use aligns with ethical guidelines.

  • Limitations on Sensitive Content: The model should not be used in contexts where it could harm individuals or communities, such as in mental health forums, crisis communication, or discussions of sensitive topics. While the model may be able to detect offensive language, it lacks the sensitivity to handle nuanced or emotional contexts appropriately.

How to Get Started with the Model

Use the code below to get started with the model:

import pickle
import re

import tensorflow as tf

# Load the trained classifier and the CountVectorizer fitted during training.
model = tf.keras.models.load_model("path/to/model.h5")
with open("path/to/vectorizer.pkl", "rb") as vec_file:
    vectorizer = pickle.load(vec_file)

def clean_text(text):
    # Strip punctuation and lowercase the input so it matches the training vocabulary.
    return re.sub(r"[^\w\s]", "", text).lower()

def classify_text(text, threshold=0.5):
    # Split the text into words and vectorize each word separately.
    words = clean_text(text).split()
    X = vectorizer.transform(words).toarray()
    predictions = model.predict(X)  # shape (num_words, 1), sigmoid probabilities
    result = {}
    for word, pred in zip(words, predictions):
        prob = float(pred[0])
        result[word] = {
            "probability": round(prob, 3),
            "classification": "Dirty word" if prob >= threshold else "Normal",
        }
    return result

if __name__ == "__main__":
    text = "text example here"
    result = classify_text(text)
    print("Classification result:")
    for word, data in result.items():
        print(f"{word}: {data['probability']} ({data['classification']})")

Training Details

The model was trained using a combination of two text files: "mats.txt" (containing offensive words) and "normal.txt" (containing regular words). The dataset contains 10,923 words in each category (21,846 in total). The labels are as follows: 0 for normal words and 1 for offensive words.

Data preprocessing steps involved:

  • Tokenization and vectorization: the words from both files were transformed into a numerical format using the CountVectorizer from scikit-learn, which converts the text into a sparse matrix of word counts.
  • Labeling: normal words were labeled 0 and offensive words 1.
  • Data splitting: the data was split into training and testing sets using an 80/20 ratio. The total dataset used for training and evaluation was 21,846 words (10,923 from each category). A code sketch of this pipeline follows below.
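The preprocessing script itself is not published, so this is a minimal sketch of the pipeline described above, assuming one word per line in mats.txt and normal.txt and default CountVectorizer settings; the random_state and stratify arguments are illustrative.

import pickle

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

def load_words(path):
    # One word per line, lowercased to match the cleaning used at inference time.
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip()]

normal = load_words("normal.txt")    # label 0
offensive = load_words("mats.txt")   # label 1

words = normal + offensive
labels = np.array([0] * len(normal) + [1] * len(offensive))

# Fit the vectorizer on the full vocabulary and persist it for inference.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(words).toarray()
with open("vectorizer.pkl", "wb") as vec_file:
    pickle.dump(vectorizer, vec_file)

# 80/20 train/test split, matching the ratio reported above.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)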

Training Procedure

The model was trained on the processed data using the following approach (a code sketch of this setup follows at the end of this section).

Preprocessing:

  • Vectorization: the words were converted into numerical form with a CountVectorizer, which produces a sparse matrix of word frequencies. The fitted vectorizer was saved for future use and can be loaded during inference.

Hyperparameters:

  • Optimizer: RMSprop with a learning rate of 0.0001.
  • Loss function: binary cross-entropy, suited to the binary classification problem.
  • Metric: accuracy.
  • Batch size: 8,192 (large-batch training, suited to the available hardware).
  • Epochs: 110.
  • Regularization: L2 with a factor of 0.00001 applied to all layers to prevent overfitting.
  • Activation functions: LeakyReLU with an alpha of 0.1, followed by BatchNormalization and Dropout layers to further improve generalization.

Callbacks:

  • ModelCheckpoint: the best model by validation accuracy was saved during training.
  • CSVLogger: training progress was logged to a CSV file.
  • EarlyStopping: training was terminated once validation accuracy stopped improving.

Final Model Evaluation:

  • Validation data: the 20% of the data held out during the split (the testing set).
  • Checkpoint restoration: after training, the best checkpoint was restored for final evaluation and saving.

Training Results:

  • Hardware: training ran on a multi-core processor, using TensorFlow's multi-threading settings.
  • Mixed precision: mixed precision (float16 where possible) was enabled for faster training.
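The training loop is likewise not published; the following is a minimal sketch of the procedure described above, reusing build_model and the train/test split from the earlier sketches. The checkpoint filename matches the one reported in this card, while the CSV log filename and the EarlyStopping patience are assumptions.

import tensorflow as tf

callbacks = [
    # Keep only the best weights according to validation accuracy.
    tf.keras.callbacks.ModelCheckpoint(
        "model_checkpoint.keras", monitor="val_accuracy", save_best_only=True
    ),
    # Log per-epoch metrics to a CSV file (filename is an assumption).
    tf.keras.callbacks.CSVLogger("training_log.csv"),
    # Stop once validation accuracy stops improving (patience is an assumption).
    tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy", patience=10, restore_best_weights=True
    ),
]

model = build_model(input_dim=X_train.shape[1])
model.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    epochs=110,
    batch_size=8192,
    callbacks=callbacks,
)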

Training Hyperparameters

  • Training regime: mixed_float16 precision was used during training. This regime allows the model to take advantage of both float16 and float32 precision, improving the training speed and reducing memory usage without compromising model performance.
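For reference, this regime is enabled in TensorFlow 2.x by setting a single global policy before the model is built; a minimal example:

import tensorflow as tf

# Layers then compute in float16 while keeping float32 variables; Keras applies
# loss scaling automatically when the model is compiled with an optimizer.
tf.keras.mixed_precision.set_global_policy("mixed_float16")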

Speeds, Sizes, Times

  • Model size: The model consists of 2,862,465 parameters.
  • Checkpoint size: The checkpoint file (model_checkpoint.keras) is approximately 11 MB.
  • Training duration: The training process took approximately 5 minutes on an Intel multi-core processor with AVX support.
  • Batch size: 8,192 samples per batch were used during training.
  • Throughput: On average, the model processed over 50,000 samples per second during training with the specified hardware.
  • Evaluation time: Testing on 4,370 samples took approximately 0.5 seconds on the same hardware.

Evaluation

The evaluation process aimed to measure the model's performance in identifying offensive and non-offensive words. Below are the details of the protocols and results.

Testing Data:

The testing data consisted of 4,370 labeled samples (2,200 non-offensive, 2,170 offensive) from the same dataset used during training. The data was split using an 80/20 ratio for training and testing.

Factors:

The evaluation focused on the following factors:

  • Class balance: both classes (offensive and non-offensive) were approximately equal in the dataset.
  • Language diversity: the dataset was primarily composed of single words in Russian.

Metrics:

The following metrics were used to evaluate model performance:

  • Accuracy: Measures the overall correctness of predictions.
  • Precision: Measures the proportion of true positives to predicted positives for each class.
  • Recall: Measures the proportion of true positives to actual positives for each class.
  • F1-Score: A harmonic mean of precision and recall, providing a balanced measure.

Results:

  • Accuracy: 99%
  • Precision: class 0 (non-offensive) 98%, class 1 (offensive) 99%
  • Recall: class 0 99%, class 1 98%
  • F1-Score: class 0 99%, class 1 99%
  • Macro average F1-Score: 99%

Summary: The model demonstrates high performance across all evaluation metrics, with an overall accuracy of 99%. The precision and recall are well-balanced, indicating that the model performs consistently for both offensive and non-offensive classes. The slightly lower recall for class 1 suggests that the model may occasionally fail to detect offensive content.
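The exact evaluation script is not part of this card; the per-class numbers above can be reproduced with a sketch like the following, reusing model, X_test, and y_test from the earlier sketches and the 0.5 decision threshold from the inference example.

from sklearn.metrics import classification_report

probs = model.predict(X_test).ravel()   # sigmoid probabilities
preds = (probs >= 0.5).astype(int)      # 0 = normal, 1 = offensive
print(classification_report(y_test, preds, target_names=["normal", "offensive"], digits=2))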

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Local hardware with a single AMD GPU (not used for training) and an Intel CPU.
  • Minutes Used: Approximately 5 minutes for training (based on model creation and checkpointing).
  • Cloud Provider: Not applicable (locally hosted).
  • Compute Region: Not applicable (local region).
  • Electricity Usage: Estimated at over 100 W of CPU power draw during active training.
  • Carbon Emitted: Approximately 0.85 kg CO2eq, assuming an average emission factor of 0.43 kg CO2eq per kWh (global average).

Additional Considerations:

  • Energy Efficiency: Efforts were made to optimize training by using mixed precision (fp16) and checkpointing to avoid redundant computations.
  • Environmental Context: The calculations assume a typical energy grid mix. Lower emissions may apply if renewable energy sources were used for electricity generation.
  • Recommendations: For future training or fine-tuning, using more energy-efficient hardware or running the process in regions with greener energy grids is encouraged.

Model Architecture and Objective

  • Architecture: The model is a sequential fully connected neural network designed for binary text classification. It determines whether an input word is offensive or not.
  • Objective: To classify input text into two categories: "normal" (0) and "offensive" (1).

Compute Infrastructure

Hardware

  • Processor: Intel
  • GPU: AMD
  • RAM: 16 GB

Software

  • Operating System: Windows.
  • Framework: TensorFlow 2.x with mixed precision (fp16) enabled.

Additional Libraries:

  • scikit-learn for data preprocessing and feature extraction.
  • pickle for vectorizer persistence.
  • CountVectorizer for transforming text into numeric features.

Citation

BibTeX:

@misc{luna_v1,
  author = {LaciaStudio},
  title  = {Binary Text Classification Model for Offensive Language Detection},
  year   = {2024},
  url    = {https://huggingface.co/Lacia/Luna_v1},
  note   = {Model designed for text classification tasks, specifically distinguishing between normal and offensive language.}
}

APA:

LaciaStudio. (2024). Binary Text Classification Model for Offensive Language Detection. Retrieved from https://huggingface.co/Lacia/Luna_v1.

Glossary

  • CountVectorizer: A tool for converting a collection of text documents into a matrix of token counts. It is used for feature extraction from text data.
  • Mixed Precision: A method of using both 16-bit and 32-bit floating-point types during training to accelerate computations while maintaining model accuracy.
  • Binary Classification: A type of classification task where the goal is to categorize instances into one of two distinct classes (e.g., normal or offensive).
  • Regularization (L2): A technique used to prevent overfitting by penalizing large weights in the model, encouraging simpler and more generalizable solutions.

Metrics:

  • Precision: The proportion of true positives among all predicted positives.
  • Recall: The proportion of true positives among all actual positives.
  • F1-score: The harmonic mean of precision and recall, balancing the two metrics.

More Information

This model is already used in a LaciaStudio project, the Telegram bot AgnesGPTX. Claiming this model as your own or publishing it under your name is strictly prohibited! It is intended for non-commercial, home use only.

Model Card Authors

LaciaStudio

Model Card Contact

e-mail: [email protected]
