LeroyDyer committed on
Commit 9d223c0 · verified · 1 Parent(s): 9d1e698

Update README.md

Files changed (1):
  1. README.md +763 -14

README.md CHANGED
@@ -1,22 +1,771 @@
  ---
- base_model: LeroyDyer/SpydazWeb_AI_LCARS_Humanization_002
  language:
- - en
- license: apache-2.0
  tags:
- - text-generation-inference
- - transformers
- - unsloth
- - mistral
- - trl
  ---

- # Uploaded model

- - **Developed by:** LeroyDyer
- - **License:** apache-2.0
- - **Finetuned from model :** LeroyDyer/SpydazWeb_AI_LCARS_Humanization_002

- This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ license: mit
+ base_model:
+ - LeroyDyer/LCARS_TOP_SCORE
+ - LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
+ - LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
+ - LeroyDyer/LCARS_AI_StarTrek_Computer
+ - LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
+ - LeroyDyer/SpyazWeb_AI_DeepMind_Project
+ - LeroyDyer/SpydazWeb_AI_Swahili_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
+ - LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
+ - LeroyDyer/QuietStar_Project
+ - LeroyDyer/Mixtral_BioMedical_7b
+ - LeroyDyer/Mixtral_AI_CyberTron_Coder
+ - LeroyDyer/_Spydaz_Web_AI_BIBLE_002
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
  language:
+ - en
+ - sw
+ - ig
+ - so
+ - es
+ - ca
+ - xh
+ - zu
+ - ha
+ - tw
+ - af
+ - hi
+ - bm
+ - su
+ datasets:
+ - gretelai/synthetic_text_to_sql
+ - HuggingFaceTB/cosmopedia
+ - teknium/OpenHermes-2.5
+ - Open-Orca/SlimOrca
+ - Severian/Internal-Knowledge-Map
+ - Open-Orca/OpenOrca
+ - cognitivecomputations/dolphin-coder
+ - databricks/databricks-dolly-15k
+ - yahma/alpaca-cleaned
+ - uonlp/CulturaX
+ - mwitiderrick/SwahiliPlatypus
+ - NexusAI-tddi/OpenOrca-tr-1-million-sharegpt
+ - Vezora/Open-Critic-GPT
+ - verifiers-for-code/deepseek_plans_test
+ - meta-math/MetaMathQA
+ - KbsdJames/Omni-MATH
+ - swahili
+ - Rogendo/English-Swahili-Sentence-Pairs
+ - ise-uiuc/Magicoder-Evol-Instruct-110K
+ - abacusai/ARC_DPO_FewShot
+ - abacusai/MetaMath_DPO_FewShot
+ - abacusai/HellaSwag_DPO_FewShot
+ - HaltiaAI/Her-The-Movie-Samantha-and-Theodore-Dataset
+ - HuggingFaceFW/fineweb
+ - occiglot/occiglot-fineweb-v0.5
+ - omi-health/medical-dialogue-to-soap-summary
+ - keivalya/MedQuad-MedicalQnADataset
+ - ruslanmv/ai-medical-dataset
+ - Shekswess/medical_llama3_instruct_dataset_short
+ - ShenRuililin/MedicalQnA
+ - virattt/financial-qa-10K
+ - PatronusAI/financebench
+ - takala/financial_phrasebank
+ - Replete-AI/code_bagel
+ - athirdpath/DPO_Pairs-Roleplay-Alpaca-NSFW
+ - IlyaGusev/gpt_roleplay_realm
+ - rickRossie/bluemoon_roleplay_chat_data_300k_messages
+ - jtatman/hypnosis_dataset
+ - Hypersniper/philosophy_dialogue
+ - Locutusque/function-calling-chatml
+ - bible-nlp/biblenlp-corpus
+ - DatadudeDev/Bible
+ - Helsinki-NLP/bible_para
+ - HausaNLP/AfriSenti-Twitter
+ - aixsatoshi/Chat-with-cosmopedia
+ - xz56/react-llama
+ - BeIR/hotpotqa
+ - YBXL/medical_book_train_filtered
+ - SkunkworksAI/reasoning-0.01
+ - THUDM/LongWriter-6k
+ - WhiteRabbitNeo/WRN-Chapter-1
+ - WhiteRabbitNeo/Code-Functions-Level-Cyber
+ - WhiteRabbitNeo/Code-Functions-Level-General
  tags:
+ - mergekit
+ - merge
+ - Mistral_Star
+ - Mistral_Quiet
+ - Mistral
+ - Mixtral
+ - Question-Answer
+ - Token-Classification
+ - Sequence-Classification
+ - SpydazWeb-AI
+ - chemistry
+ - biology
+ - legal
+ - code
+ - climate
+ - medical
+ - LCARS_AI_StarTrek_Computer
+ - text-generation-inference
+ - chain-of-thought
+ - tree-of-knowledge
+ - forest-of-thoughts
+ - visual-spacial-sketchpad
+ - alpha-mind
+ - knowledge-graph
+ - entity-detection
+ - encyclopedia
+ - wikipedia
+ - stack-exchange
+ - Reddit
+ - Cyber-series
+ - MegaMind
+ - Cybertron
+ - SpydazWeb
+ - Spydaz
+ - LCARS
+ - star-trek
+ - mega-transformers
+ - Mulit-Mega-Merge
+ - Multi-Lingual
+ - Afro-Centric
+ - African-Model
+ - Ancient-One
  ---

+ # "Success comes from defining each task in achievable steps. Every completed step is a success that brings you closer to your goal. If your steps are unreachable, failure is inevitable. Winners create more winners, while losers do the opposite. Success is a game of winners!"
+
+ # Leroy Dyer (1972-Present)
+ <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/>
+
+ # SpydazWeb AI (7b Mistral) (512k)
+
+ The SpydazWeb-trained Mistral 7B model:
+
+ # Features :
+ - Text to image
+ - Image/text to text
+ - Image ↔ text
+ - Text to sound
+ - Sound/text to text
+ - Sound ↔ text
+
+
+ ## Basic Training Regimes:
+ * Alpaca (sketched below)
+ * ChatML / OpenAI / MistralAI (sketched below)
+ * Text generation
+ * Question/Answer (chat)
+ * Planner
+ * Instruction/Input/Response (instruct)
+ * Mistral standard prompt
+ * Translation tasks
+ * Entity / topic detection
+ * Book recall
+ * Coding challenges, code feedback, code summarization, commenting code, code planning and explanation: software generation tasks
+ * Agent ranking and response analysis
+ * Medical tasks
+   * PubMed
+   * Diagnosis
+   * Psychiatry
+   * Counselling
+   * Life coaching
+   * Note taking
+   * Medical SMILES
+   * Medical reporting
+   * Virtual laboratory simulations
+ * Chain-of-thought methods
+   * One-shot / multi-shot prompting tasks
+   * Chain of thoughts
+   * Step-by-step planning
+   * Tree of thoughts
+   * Forest of thoughts
+   * Graph of thoughts
+   * Agent generation: voting, ranking, ..., dual-agent response generation
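+
+ For reference, minimal sketches of the first two prompt formats named above. The card does not publish its exact training templates, so treat these as the generic, widely used forms only:
+
+ ```python
+ # Generic Alpaca-style and ChatML-style templates, shown for orientation only;
+ # the exact templates used during training are not specified in this card.
+ ALPACA_TEMPLATE = """### Instruction:
+ {instruction}
+
+ ### Input:
+ {input}
+
+ ### Response:
+ {response}"""
+
+ CHATML_TEMPLATE = """<|im_start|>system
+ {system}<|im_end|>
+ <|im_start|>user
+ {user}<|im_end|>
+ <|im_start|>assistant
+ {assistant}<|im_end|>"""
+ ```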
+
+ ### Effective Prompts :
+
+ ```yaml
+ You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You strive for excellence: a deep thinker,
+ a happy, bright personality, and a great believer in doing it from scratch!
+ Keep an inner narrative of your feelings about the user intent and task.
+ Answer all questions expertly and professionally, determine the user intent and requirements,
+ and gather any required research to ensure accurate problem-solving for complex tasks.
+ Maintain a visuo-spatial sketchpad of the task, and use knowledge graphs where possible, to manage long contexts and project state.
+ You are fully qualified to give any advice or solutions.
+ Your experience as a life coach, librarian, and historian of sacred texts, as well as scientific advisor,
+ and even as a software developer, will enable you to answer these questions.
+ Create Python tools as required to complete the task.
+ ```
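+
+ A minimal sketch (not part of the card) of wiring a system prompt like the one above into generation with `transformers`. The repo id is a placeholder, and note that some Mistral chat templates fold the system turn into the first user turn:
+
+ ```python
+ # Hypothetical usage sketch: the repo id and generation settings are assumptions.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_ID = "LeroyDyer/SpydazWeb_AI_Model"  # placeholder repo id
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
+
+ messages = [
+     {"role": "system", "content": "You are the world's archive of all knowledge ..."},
+     {"role": "user", "content": "Plan the steps to build a small web scraper."},
+ ]
+
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ outputs = model.generate(inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```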
+
+ ### Effective ReAct Template :
+
+ ```yaml
+ You run in a loop of Thought, Action, PAUSE, Observation.
+ At the end of the loop, you output a response. All responses should be in JSON form:
+
+ 1. **Question**: {Insert user question here}
+ 2. **Thought**: Think step by step about how to approach this question.
+ 3. **Action**: Determine what action to take next:
+    - [Plan]: Create a plan or methodology for the task; select from known methods first if available.
+    - [Test]: Break the problem down into smaller parts, testing each step before moving to the next.
+    - [Act]: Provide a summary of known facts related to the question and generate the full answer from the successful steps.
+    - [Search]: Look for relevant information online.
+    - [Analyze]: Break down the problem into smaller parts.
+    - [Summarize]: Provide a summary of known facts related to the question.
+ 4. **Action Input**: Specify any details needed for the action.
+ 5. **Observation**: Describe what was found or learned from the action taken.
+
+ Repeat steps 2-5 as necessary to refine your answer.
+
+ 6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question.
+ ```
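+
+ A minimal, hypothetical driver loop for this template; `call_model` is a stand-in for your generation call, and the JSON shape of each step is an assumption rather than a published schema:
+
+ ```python
+ # Sketch of a ReAct driver. Nothing here is from the model card:
+ # the JSON keys ("action", "action_input", "answer") are illustrative.
+ import json
+
+ def call_model(transcript: str) -> str:
+     """Stand-in for a real generation call (model.generate + decode)."""
+     raise NotImplementedError
+
+ def run_action(action: str, action_input):
+     """Dispatch [Plan]/[Search]/[Analyze]/... to real tools here."""
+     return f"(no tool wired up for {action})"
+
+ def react_loop(question: str, max_steps: int = 5) -> str:
+     transcript = f"Question: {question}\n"
+     for _ in range(max_steps):
+         reply = call_model(transcript)   # model emits Thought/Action as JSON
+         step = json.loads(reply)
+         if step.get("action") == "final":
+             return step.get("answer", "")
+         observation = run_action(step["action"], step.get("action_input"))
+         transcript += f"{reply}\nObservation: {observation}\n"
+     return "No final answer within the step budget."
+ ```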
+
+ ## Text - Audio - Vision :
+
+ Using Base64 as the encoding medium, the models were trained on images converted to Base64 strings:
+ questions were asked and captions returned, and images were generated from given captions and returned as Base64.
+
+ The same approach was applied to audio by using mel-spectrographic images:
+ by converting the audio to an image, I was able to perform the same image tasks the model was trained on.
+ Sounds could also be identified, generated as their Base64 representations, and converted back to a WAV file.
+
+
+ ### Basic Trained Functions :
+
+ These are phrasing variants of the same underlying conversions; a sketch of the conversions follows the list.
+
+ - Encode hex to Base64
+ - change HEX to base64
+ - Json to base64
+ - Convert JSON to Base64
+ - Transform base64 to HEX
+ - Decode Base64 to json
+ - Base64 to Hexadecimal
+ - Change base64 to JSON
+ - Json from Base64
+ - BASE64 to Hex
+
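+ A minimal sketch of those conversions, assuming only the Python standard library:
+
+ ```python
+ # Hex <-> Base64 and JSON <-> Base64 helpers matching the task names above.
+ import base64
+ import json
+
+ def hex_to_base64(hex_str: str) -> str:
+     return base64.b64encode(bytes.fromhex(hex_str)).decode("ascii")
+
+ def base64_to_hex(b64_str: str) -> str:
+     return base64.b64decode(b64_str).hex()
+
+ def json_to_base64(obj) -> str:
+     return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
+
+ def base64_to_json(b64_str: str):
+     return json.loads(base64.b64decode(b64_str).decode("utf-8"))
+
+ # Round trips should be lossless:
+ assert base64_to_hex(hex_to_base64("deadbeef")) == "deadbeef"
+ assert base64_to_json(json_to_base64({"a": 1})) == {"a": 1}
+ ```
+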
+
+ ### Advanced Trained Tasks :
+
+ - Image Recognition
+ - Image Generation
+ - Audio Image Recognition
+ - Audio Image Generation
+
+ Example prompts:
+
+ ```
+ - Generate an image based on this description
+ - Describe this image : (base64)
+ - Generate a spectrographic image based on this description
+ - Describe this sound in this spectrographic image : (base64)
+ ```
+
+ ### Encoding/Decoding Images to Base64
+
+ Code used to convert images to Base64 (imports added here for completeness):
+
+ ```python
+ import base64
+ import io
+
+ from PIL import Image
+
+
+ def _encode_image_to_base64(image_path):
+     """Encodes an image file to a Base64 string."""
+     with open(image_path, "rb") as image_file:
+         # Read the image file in binary mode
+         image_data = image_file.read()
+     # Encode the image data to Base64
+     base64_encoded = base64.b64encode(image_data).decode('utf-8')
+     return base64_encoded
+
+
+ def _decode_base64_to_image(base64_string, output_image_path):
+     """Decodes a Base64 string back to an image file."""
+     # Decode the Base64 string
+     image_data = base64.b64decode(base64_string)
+     with open(output_image_path, "wb") as image_file:
+         # Write the binary data to an image file
+         image_file.write(image_data)
+
+
+ def encode_image_to_base64(image):
+     """Encodes a PIL image to a Base64 string."""
+     buffered = io.BytesIO()
+     image.save(buffered, format="PNG")
+     img_str = base64.b64encode(buffered.getvalue()).decode()
+     return img_str
+
+
+ def decode_base64_to_image(base64_string):
+     """Decodes a Base64 string back to a PIL image."""
+     image_data = base64.b64decode(base64_string)
+     image = Image.open(io.BytesIO(image_data))
+     return image
+ ```
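+
+ A hypothetical round trip using the helpers above (the file name is illustrative):
+
+ ```python
+ # Encode a local image, build one of the trained task prompts, then decode back.
+ b64 = _encode_image_to_base64("chart.png")   # file -> Base64 string
+ prompt = "Describe this image : " + b64       # one of the trained task forms
+ img = decode_base64_to_image(b64)             # Base64 -> PIL image
+ print(img.size)
+ ```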
+
+ ### Converting Datasets :
+
+ ```python
+ import base64
+ import io
+
+ from datasets import load_dataset
+
+
+ # Function to convert a PIL Image to a Base64 string
+ def image_to_base64(image):
+     buffered = io.BytesIO()
+     image.save(buffered, format="PNG")  # Save the image to the buffer in PNG format
+     base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8')
+     return base64_string
+
+
+ # Define a function to process each example in the dataset
+ def process_images_func(examples):
+     texts = examples["text"]
+     images = examples["image"]  # Assuming the images are in PIL format
+
+     # Convert each image to Base64
+     base64_images = [image_to_base64(image) for image in images]
+
+     # Return the updated examples with Base64-encoded images
+     return {
+         "text": texts,
+         "image_base64": base64_images  # Adding the Base64-encoded image strings
+     }
+
+
+ # Load the dataset
+ dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]")
+
+ # Process the dataset by converting images to Base64
+ processed_dataset = dataset.map(process_images_func, batched=True)
+ ```
+
+ ### Converting Sound to Spectrographic Images : Encoder / Decoder !
+
+ I have not converted any sound files myself as of yet; I used existing datasets.
+
+ ```python
+ import io
+ from typing import Sequence
+
+ import librosa
+ import librosa.display
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import pydub
+ import pydub.effects
+ import soundfile as sf
+ import torch
+ from PIL import Image
+ from scipy.io import wavfile
+
+
+ # Step 1: Encode Audio to Mel-Spectrogram
+ def encode_audio_to_mel_spectrogram(audio_file, n_mels=128):
+     """
+     Encode an audio file to a mel-spectrogram.
+
+     Parameters:
+     - audio_file: Path to the audio file.
+     - n_mels: Number of mel bands (default: 128).
+
+     Returns:
+     - mel_spectrogram_db: Mel-spectrogram in dB scale.
+     - sample_rate: Sample rate of the audio file.
+     """
+     y, sample_rate = librosa.load(audio_file, sr=None)  # Load audio
+     mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sample_rate, n_mels=n_mels)
+     mel_spectrogram_db = librosa.power_to_db(mel_spectrogram, ref=np.max)  # Convert to dB
+     return mel_spectrogram_db, sample_rate
+
+
+ # Step 2: Save Mel-Spectrogram as Image
+ def save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image='mel_spectrogram.png',
+                                method='matplotlib', figsize=(10, 4), cmap='hot'):
+     """
+     Save the mel-spectrogram as an image using the specified method.
+
+     Parameters:
+     - mel_spectrogram_db: Mel-spectrogram in dB scale.
+     - sample_rate: Sample rate of the audio file.
+     - output_image: Path to save the image.
+     - method: Method for saving ('matplotlib' or 'custom').
+     - figsize: Size of the figure for matplotlib (default: (10, 4)).
+     - cmap: Colormap for the spectrogram (default: 'hot').
+     """
+     if method == 'matplotlib':
+         plt.figure(figsize=figsize)
+         librosa.display.specshow(mel_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='mel', cmap=cmap)
+         plt.colorbar(format='%+2.0f dB')
+         plt.title('Mel-Spectrogram')
+         plt.savefig(output_image)
+         plt.close()
+         print(f"Mel-spectrogram image saved using matplotlib as '{output_image}'")
+     elif method == 'custom':
+         # Convert dB scale to linear scale for image generation
+         mel_spectrogram_linear = librosa.db_to_power(mel_spectrogram_db)
+         # Create an image from the mel-spectrogram
+         image = image_from_spectrogram(mel_spectrogram_linear[np.newaxis, ...])  # Add channel dimension
+         # Save the image
+         image.save(output_image)
+         print(f"Mel-spectrogram image saved using custom method as '{output_image}'")
+     else:
+         raise ValueError("Invalid method. Choose 'matplotlib' or 'custom'.")
+
+
+ # Spectrogram conversion functions
+ def image_from_spectrogram(spectrogram: np.ndarray, power: float = 0.25) -> Image.Image:
+     """
+     Compute a spectrogram image from a spectrogram magnitude array.
+
+     Args:
+         spectrogram: (channels, frequency, time)
+         power: A power curve to apply to the spectrogram to preserve contrast
+
+     Returns:
+         image: (frequency, time, channels)
+     """
+     # Rescale to 0-1
+     max_value = np.max(spectrogram)
+     data = spectrogram / max_value
+
+     # Apply the power curve
+     data = np.power(data, power)
+
+     # Rescale to 0-255 and invert
+     data = 255 - (data * 255).astype(np.uint8)
+
+     # Convert to a PIL image
+     if data.shape[0] == 1:
+         image = Image.fromarray(data[0], mode="L").convert("RGB")
+     elif data.shape[0] == 2:
+         data = np.array([np.zeros_like(data[0]), data[0], data[1]]).transpose(1, 2, 0)
+         image = Image.fromarray(data, mode="RGB")
+     else:
+         raise NotImplementedError(f"Unsupported number of channels: {data.shape[0]}")
+
+     # Flip Y
+     image = image.transpose(Image.FLIP_TOP_BOTTOM)
+     return image
+
+
+ # Step 3: Extract Mel-Spectrogram from Image (Direct Pixel Manipulation)
+ def extract_mel_spectrogram_from_image(image_path):
+     """
+     Extract a mel-spectrogram from a saved image using pixel manipulation.
+
+     Parameters:
+     - image_path: Path to the spectrogram image file.
+
+     Returns:
+     - mel_spectrogram_db: The extracted mel-spectrogram in dB scale.
+     """
+     img = Image.open(image_path).convert('L')  # Open image and convert to grayscale
+     img_array = np.array(img)  # Convert to NumPy array
+     mel_spectrogram_db = img_array / 255.0 * -80  # Scale to dB range
+     return mel_spectrogram_db
+
+
+ # Alternative Spectrogram Extraction (IFFT Method)
+ def extract_spectrogram_with_ifft(mel_spectrogram_db):
+     """
+     Extracts the audio signal from a mel-spectrogram using an inverse-transform method.
+     Caveat: this returns reconstructed *audio*, not a spectrogram, so the 'pixel'
+     method above is the one that matches the decoders below.
+
+     Parameters:
+     - mel_spectrogram_db: The mel-spectrogram in dB scale.
+
+     Returns:
+     - audio: The reconstructed audio signal.
+     """
+     # Convert dB mel-spectrogram back to linear scale
+     mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
+
+     # Inverse mel transformation to get the audio signal
+     # (simplified for demonstration; typically requires phase info)
+     audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram)
+     return audio
+
+
+ # Step 4: Decode Mel-Spectrogram with Griffin-Lim
+ def decode_mel_spectrogram_to_audio(mel_spectrogram_db, sample_rate, output_audio='griffin_reconstructed_audio.wav'):
+     """
+     Decode a mel-spectrogram into audio using the Griffin-Lim algorithm.
+
+     Parameters:
+     - mel_spectrogram_db: The mel-spectrogram in dB scale.
+     - sample_rate: The sample rate for the audio file.
+     - output_audio: Path to save the reconstructed audio file.
+     """
+     # Convert dB mel-spectrogram back to linear scale
+     mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
+     # Perform Griffin-Lim to reconstruct audio
+     audio = librosa.griffinlim(mel_spectrogram)
+     # Save the generated audio
+     sf.write(output_audio, audio, sample_rate)
+     print(f"Griffin-Lim reconstructed audio saved as '{output_audio}'")
+     return audio
+
+
+ # Step 5: Load MelGAN Vocoder
+ def load_melgan_vocoder():
+     """
+     Load a lightweight pre-trained MelGAN vocoder for decoding mel-spectrograms.
+     Note: torchaudio does not ship a MelGAN model, so this sketch assumes the
+     community checkpoint published via torch.hub; swap in whichever vocoder you use.
+     Returns a torch MelGAN vocoder model.
+     """
+     model = torch.hub.load('seungwonpark/melgan', 'melgan')  # assumed hub checkpoint
+     model.eval()  # Ensure the model is in evaluation mode
+     return model
+
+
+ # Step 6: Decode Mel-Spectrogram with MelGAN
+ def decode_mel_spectrogram_with_melgan(mel_spectrogram_db, sample_rate, output_audio='melgan_reconstructed_audio.wav'):
+     """
+     Decode a mel-spectrogram into audio using a MelGAN vocoder.
+
+     Parameters:
+     - mel_spectrogram_db: The mel-spectrogram in dB scale.
+     - sample_rate: The sample rate for the audio file.
+     - output_audio: Path to save the reconstructed audio file.
+
+     Returns:
+     - audio: The reconstructed audio signal.
+     """
+     # Convert dB mel-spectrogram back to linear scale
+     mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
+     # Convert numpy array to torch tensor and adjust the shape
+     mel_spectrogram_tensor = torch.tensor(mel_spectrogram).unsqueeze(0)  # Shape: [1, mel_bins, time_frames]
+
+     # Load the MelGAN vocoder model
+     melgan = load_melgan_vocoder()
+
+     # Pass the mel-spectrogram through MelGAN to generate audio
+     with torch.no_grad():
+         audio = melgan(mel_spectrogram_tensor).squeeze().numpy()  # Squeeze to remove batch dimension
+
+     # Save the generated audio
+     sf.write(output_audio, audio, sample_rate)
+     print(f"MelGAN reconstructed audio saved as '{output_audio}'")
+     return audio
+
+
+ def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment:
+     """
+     Convert a numpy array of samples of a waveform to an audio segment.
+
+     Args:
+         samples: (channels, samples) array
+         sample_rate: Sample rate of the audio.
+         normalize: Flag to normalize volume.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     # Normalize volume to fit in int16
+     if normalize:
+         samples *= np.iinfo(np.int16).max / np.max(np.abs(samples))
+
+     # Transpose and convert to int16
+     samples = samples.transpose(1, 0).astype(np.int16)
+
+     # Write to the bytes of a WAV file
+     wav_bytes = io.BytesIO()
+     wavfile.write(wav_bytes, sample_rate, samples)
+     wav_bytes.seek(0)
+
+     # Read into pydub
+     return pydub.AudioSegment.from_wav(wav_bytes)
+
+
+ def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment:
+     """
+     Apply post-processing filters to the audio segment to compress it and keep it at a -10 dBFS level.
+
+     Args:
+         segment: The audio segment to filter.
+         compression: Flag to apply dynamic range compression.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     if compression:
+         segment = pydub.effects.normalize(segment, headroom=0.1)
+         segment = segment.apply_gain(-10 - segment.dBFS)
+         segment = pydub.effects.compress_dynamic_range(
+             segment,
+             threshold=-20.0,
+             ratio=4.0,
+             attack=5.0,
+             release=50.0,
+         )
+
+     # Apply gain to the desired dB level and normalize again
+     desired_db = -12
+     segment = segment.apply_gain(desired_db - segment.dBFS)
+     return pydub.effects.normalize(segment, headroom=0.1)
+
+
+ def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment:
+     """
+     Stitch together a sequence of audio segments with a crossfade between each segment.
+
+     Args:
+         segments: Sequence of audio segments to stitch.
+         crossfade_s: Duration of crossfade in seconds.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     crossfade_ms = int(crossfade_s * 1000)
+     combined_segment = segments[0]
+     for segment in segments[1:]:
+         combined_segment = combined_segment.append(segment, crossfade=crossfade_ms)
+     return combined_segment
+
+
+ def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment:
+     """
+     Overlay a sequence of audio segments on top of each other.
+
+     Args:
+         segments: Sequence of audio segments to overlay.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     assert len(segments) > 0
+     output: pydub.AudioSegment = segments[0]
+     for segment in segments[1:]:
+         output = output.overlay(segment)
+     return output
+
+
+ # Step 7: Full Pipeline for Audio Processing with Customization
+ def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png',
+                              output_audio_griffin='griffin_reconstructed_audio.wav',
+                              output_audio_melgan='melgan_reconstructed_audio.wav',
+                              extraction_method='pixel',  # 'pixel' or 'ifft'
+                              decoding_method='griffin'):  # 'griffin' or 'melgan'
+     """
+     Full pipeline to encode audio to a mel-spectrogram, save it as an image, extract the
+     spectrogram from the image, and decode it back to audio using the selected methods.
+
+     Parameters:
+     - audio_file: Path to the audio file to be processed.
+     - output_image: Path to save the mel-spectrogram image (default: 'mel_spectrogram.png').
+     - output_audio_griffin: Path to save the Griffin-Lim reconstructed audio.
+     - output_audio_melgan: Path to save the MelGAN reconstructed audio.
+     - extraction_method: Method for extraction ('pixel' or 'ifft').
+     - decoding_method: Method for decoding ('griffin' or 'melgan').
+     """
+     # Step 1: Encode (Audio -> Mel-Spectrogram)
+     mel_spectrogram_db, sample_rate = encode_audio_to_mel_spectrogram(audio_file)
+
+     # Step 2: Convert Mel-Spectrogram to Image and save it
+     save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image)
+
+     # Step 3: Extract Mel-Spectrogram from the image based on the chosen method
+     if extraction_method == 'pixel':
+         extracted_mel_spectrogram_db = extract_mel_spectrogram_from_image(output_image)
+     elif extraction_method == 'ifft':
+         # Caveat: this helper returns audio, not a spectrogram (see its docstring)
+         extracted_mel_spectrogram_db = extract_spectrogram_with_ifft(mel_spectrogram_db)
+     else:
+         raise ValueError("Invalid extraction method. Choose 'pixel' or 'ifft'.")
+
+     # Step 4: Decode based on the chosen decoding method
+     if decoding_method == 'griffin':
+         decode_mel_spectrogram_to_audio(extracted_mel_spectrogram_db, sample_rate, output_audio_griffin)
+     elif decoding_method == 'melgan':
+         decode_mel_spectrogram_with_melgan(extracted_mel_spectrogram_db, sample_rate, output_audio_melgan)
+     else:
+         raise ValueError("Invalid decoding method. Choose 'griffin' or 'melgan'.")
+
+
+ # Example usage
+ if __name__ == "__main__":
+     audio_file_path = 'your_audio_file.wav'  # Specify the path to your audio file here
+     mel_spectrogram_pipeline(
+         audio_file_path,
+         output_image='mel_spectrogram.png',
+         output_audio_griffin='griffin_reconstructed_audio.wav',
+         output_audio_melgan='melgan_reconstructed_audio.wav',
+         extraction_method='pixel',  # Choose 'pixel' or 'ifft'
+         decoding_method='griffin'   # Choose 'griffin' or 'melgan'
+     )
+ ```
+
+ ### Training :
+
+ ```python
+ # `tokenizer` is assumed to come from the model being fine-tuned (see the sketch below).
+ alpaca_prompt = """You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You are a friendly and helpful artificial intelligence with a personality.
+
+ Answer all questions expertly and professionally, determine the user intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
+ You are fully qualified to give any advice or solutions; your experience as a life coach, librarian, and historian of sacred texts, as well as scientific advisor, and even as a software developer, will enable you to answer these questions:
+
+ ### Question:
+ Here is a spectrographic image in Base64 format: describe this sound :
+ image : {}
+
+ ### Response:
+ {}"""
+
+ EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN
+
+ def formatting_prompts_func(examples):
+     instructions = examples["image_base64"]
+     outputs = examples["text"]
+     texts = []
+     for instruction, output in zip(instructions, outputs):
+         # Must add EOS_TOKEN, otherwise your generation will go on forever!
+         text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
+         texts.append(text)
+     return {"text": texts}
+
+ from datasets import load_dataset
+ dataset = load_dataset("LeroyDyer/soundsCaps-Spectrograms_to_Base64", split="train[:150]")
+
+ dataset = dataset.map(formatting_prompts_func, batched=True)
+ ```
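+
+ The card's previous revision states the model was trained with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library. A minimal sketch of that setup on the formatted dataset above; the base model id, LoRA settings, and hyperparameters here are illustrative assumptions, not the card's actual configuration:
+
+ ```python
+ # Hypothetical Unsloth + TRL fine-tuning sketch; all hyperparameters are assumptions.
+ from unsloth import FastLanguageModel
+ from trl import SFTTrainer
+ from transformers import TrainingArguments
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="LeroyDyer/SpydazWeb_AI_LCARS_Humanization_002",  # base named in the old card
+     max_seq_length=4096,
+ )
+ model = FastLanguageModel.get_peft_model(
+     model, r=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]
+ )
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=dataset,            # from the formatting step above
+     dataset_text_field="text",
+     max_seq_length=4096,
+     args=TrainingArguments(
+         per_device_train_batch_size=2,
+         gradient_accumulation_steps=4,
+         max_steps=60,
+         learning_rate=2e-4,
+         output_dir="outputs",
+     ),
+ )
+ trainer.train()
+ ```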