ManojINaik committed on
Commit 93cf301 · verified · 1 Parent(s): 23155fb

Upload 4 files

Files changed (4)
  1. Dockerfile +26 -0
  2. README.md +60 -12
  3. app.py +101 -0
  4. requirements.txt +9 -0
Dockerfile ADDED
@@ -0,0 +1,26 @@
+ FROM python:3.9-slim
+
+ WORKDIR /code
+
+ # Install system dependencies
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends \
+     build-essential \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the rest of the application
+ COPY . .
+
+ # Create cache directory
+ RUN mkdir -p ./model_cache
+
+ # Expose the port
+ EXPOSE 7860
+
+ # Command to run the application
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,12 +1,60 @@
- ---
- title: Manojapinew
- emoji: 🏢
- colorFrom: blue
- colorTo: yellow
- sdk: streamlit
- sdk_version: 1.40.1
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Fine-Tuned LLM API
+
+ This is a FastAPI-based API service for the fine-tuned model "ManojINaik/Strength_weakness". The model is served with 4-bit quantization for memory-efficient text generation.
+
+ ## API Endpoints
+
+ ### GET /
+ Health check endpoint that confirms the API is running.
+
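+ A quick way to verify the service is up (replace `your-space-name` with your Space's actual subdomain; a minimal sketch, not tied to a live deployment):
+
+ ```python
+ import requests
+
+ # The root endpoint returns a simple status message when the API is running
+ resp = requests.get("https://your-space-name.hf.space/")
+ resp.raise_for_status()
+ print(resp.json()["message"])
+ ```
+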
+ ### POST /generate/
+ Generate text based on a prompt with optional parameters.
+
+ #### Request Body
+ ```json
+ {
+   "prompt": "What are the strengths of Python?",
+   "history": [],
+   "system_prompt": "You are a very powerful AI assistant.",
+   "max_length": 200,
+   "temperature": 0.7
+ }
+ ```
+ `history` (optional) is a list of previous conversation turns as plain strings, `system_prompt` (optional) overrides the default system prompt, `max_length` (optional) caps the length of the generated text, and `temperature` (optional, 0.0 to 1.0) controls randomness.
+
+ #### Response
+ ```json
+ {
+   "response": "Generated text response..."
+ }
+ ```
+
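+ For multi-turn conversations, prior turns go in `history`; app.py prepends them to the prompt in order, so "Human:"/"Assistant:"-style strings fit naturally. A minimal sketch (the history contents here are illustrative):
+
+ ```python
+ import requests
+
+ url = "https://your-space-name.hf.space/generate/"
+ payload = {
+     "prompt": "And what are its main weaknesses?",
+     "history": [
+         "Human: What are the strengths of Python?",
+         "Assistant: Readability, a large ecosystem, and fast prototyping."
+     ],
+     "system_prompt": "You are a very powerful AI assistant.",
+     "max_length": 200,
+     "temperature": 0.7
+ }
+
+ response = requests.post(url, json=payload)
+ print(response.json()["response"])
+ ```
+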
31
+ ## Model Details
32
+ - Base Model: ManojINaik/Strength_weakness
33
+ - Quantization: 4-bit quantization using bitsandbytes
34
+ - Device: Automatically uses GPU if available, falls back to CPU
35
+ - Memory Efficient: Uses device mapping for optimal resource utilization
36
+
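+ For reference, a minimal sketch of how the model is loaded, mirroring the configuration in app.py:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization with float16 compute, as configured in app.py
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+     bnb_4bit_use_double_quant=False
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("ManojINaik/Strength_weakness")
+ model = AutoModelForCausalLM.from_pretrained(
+     "ManojINaik/Strength_weakness",
+     quantization_config=bnb_config,
+     device_map="auto",  # GPU if available, otherwise CPU
+     trust_remote_code=True
+ )
+ ```
+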
37
+ ## Technical Details
38
+ - Framework: FastAPI
39
+ - Python Version: 3.9+
40
+ - Key Dependencies:
41
+ - transformers
42
+ - torch
43
+ - bitsandbytes
44
+ - accelerate
45
+ - peft
46
+
47
+ ## Example Usage
48
+ ```python
49
+ import requests
50
+
51
+ url = "https://your-space-name.hf.space/generate"
52
+ payload = {
53
+ "prompt": "What are the strengths of Python?",
54
+ "temperature": 0.7,
55
+ "max_length": 200
56
+ }
57
+
58
+ response = requests.post(url, json=payload)
59
+ print(response.json()["response"])
60
+ ```
app.py ADDED
@@ -0,0 +1,101 @@
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
+ import torch
+ from typing import Optional, List
+
+ app = FastAPI(title="LLM API", description="API for interacting with LLaMA model")
+
+ # Model configuration
+ class ModelConfig:
+     model_name = "ManojINaik/Strength_weakness"  # Your fine-tuned model
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     max_length = 200
+     temperature = 0.7
+
+ # Request/Response models
+ class GenerateRequest(BaseModel):
+     prompt: str
+     history: Optional[List[str]] = []
+     system_prompt: Optional[str] = "You are a very powerful AI assistant."
+     max_length: Optional[int] = 200
+     temperature: Optional[float] = 0.7
+
+ class GenerateResponse(BaseModel):
+     response: str
+
+ # Global variables for model and tokenizer
+ model = None
+ tokenizer = None
+ generator = None
+
+ @app.on_event("startup")
+ async def load_model():
+     global model, tokenizer, generator
+     try:
+         print("Loading model and tokenizer...")
+
+         # Configure quantization
+         bnb_config = BitsAndBytesConfig(
+             load_in_4bit=True,
+             bnb_4bit_quant_type="nf4",
+             bnb_4bit_compute_dtype=torch.float16,
+             bnb_4bit_use_double_quant=False
+         )
+
+         tokenizer = AutoTokenizer.from_pretrained(ModelConfig.model_name)
+         # LLaMA-style tokenizers often lack a pad token; fall back to EOS so the
+         # pad_token_id passed to the pipeline below is never None
+         if tokenizer.pad_token is None:
+             tokenizer.pad_token = tokenizer.eos_token
+         model = AutoModelForCausalLM.from_pretrained(
+             ModelConfig.model_name,
+             quantization_config=bnb_config,
+             device_map="auto",
+             trust_remote_code=True
+         )
+
+         generator = pipeline(
+             "text-generation",
+             model=model,
+             tokenizer=tokenizer,
+             device_map="auto"
+         )
+         print("Model loaded successfully!")
+     except Exception as e:
+         print(f"Error loading model: {str(e)}")
+         raise e
+
+ @app.post("/generate/", response_model=GenerateResponse)
+ async def generate_text(request: GenerateRequest):
+     if generator is None:
+         raise HTTPException(status_code=500, detail="Model not loaded")
+
+     try:
+         # Format the prompt with system prompt and chat history
+         formatted_prompt = f"{request.system_prompt}\n\n"
+         for msg in request.history:
+             formatted_prompt += f"{msg}\n"
+         formatted_prompt += f"Human: {request.prompt}\nAssistant:"
+
+         # Generate response
+         outputs = generator(
+             formatted_prompt,
+             max_length=request.max_length,
+             temperature=request.temperature,
+             num_return_sequences=1,
+             do_sample=True,
+             pad_token_id=tokenizer.pad_token_id,
+             eos_token_id=tokenizer.eos_token_id
+         )
+
+         # Extract the generated text
+         generated_text = outputs[0]['generated_text']
+
+         # Remove the prompt from the response
+         response = generated_text.split("Assistant:")[-1].strip()
+
+         return {"response": response}
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Error generating text: {str(e)}")
+
+ @app.get("/")
+ def root():
+     return {"message": "LLM API is running. Use /generate endpoint for text generation."}
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ fastapi==0.104.1
+ uvicorn==0.24.0
+ huggingface-hub==0.19.4
+ pydantic==2.5.2
+ transformers==4.35.2
+ torch==2.1.1
+ accelerate==0.24.1
+ bitsandbytes==0.41.1
+ peft==0.6.0