Shivam Garg
Verified Expert in Engineering
Computer Vision Engineer and Developer
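Shivam is a senior AI engineer with more than four years of hands-on experience in deep learning and artificial intelligence. Proficient in deep learning frameworks such as TensorFlow, PyTorch, and Keras, he excels in generative AI, Stable Diffusion, and large language models (LLMs). Furthermore, Shivam stands out for his extensive expertise in classical computer vision and machine learning.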
Portfolio
Experience
Availability
Preferred Environment
Python, PyTorch, TensorFlow, Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Docker, LangChain, Large Language Models (LLMs), Machine Learning, Data Science, Image Generation, Chatbots, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Notion, APIs, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, 2D, JavaScript, Text to Speech (TTS), Generative AI
The most amazing...
...generative AI model I've delivered uses Stable Diffusion and LLMs to animate stories from news articles and helped secure Y Combinator funding.
Work Experience
Senior AI Consultant
Self-employed
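- Developed a Stable Diffusion model using ControlNet to convert sketches into realistic images conditioned on pose input. Adapted the cross-attention layers via LoRA to reduce the storage requirements of the trained models.
- Delivered a generative AI model using Stable Diffusion and LLMs, capable of generating animated stories from news articles, which secured Y Combinator fundraising for the client.
- Developed a unique method for converting animal images into animations through GAN training on unpaired animal images, leveraging the StyleGAN architecture and enhancing the output with CLIP and feature extractors.
- Built a system that converts 2D images of non-fungible tokens (NFTs) into 3D models using selective 3D inpainting, Stable Diffusion, and depth estimation.
- Developed a text-to-art system using techniques such as fine-tuning, autoencoders, and prompt engineering, successfully generating visually appealing art from text descriptions.
- Created a system that uses ML and natural language processing (NLP) to detect and classify fake news in India. Preprocessed text data, employed SetFit and long short-term memory (LSTM) models, and created an ensemble for precise identification.
- Built a tool using OpenAI ada model embeddings through LangChain and FAISS to search for similar patents in the United States Patent and Trademark Office (USPTO) database, improving the indexing and search of patent embeddings.
- Created an eCommerce product matching system by comparing visual embeddings from the CLIP model with OCR-derived text embeddings from an LLM (ada model), enhancing accuracy and efficiency.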
AI Engineer 3
Avatarin Inc
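- Created a system to assist human kanji writing through imitation learning and OpenCV, using kanji videos to generate kanji images and predict robotic arm poses.
- Automated health records and invoices for Yale University, using OCR and OpenCV to extract text from various health documents and convert it to digital format.
- Implemented a model using VideoMAE to detect suspicious activity at airports. It prioritized high accuracy, low latency, and efficient deployment on the client's Linux servers.
- Implemented shot detection for the World Table Tennis organization, using YOLOv5 and OpenCV for object detection and VideoMAE for shot recognition in table tennis matches.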
Senior AI Engineer
AlphaICs
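- Implemented a motion transfer system using a first-order motion model, achieving high-quality motion transfer between faces while preserving the identity and facial expressions of the target face.
- Built 4-bit and 8-bit quantization software development kits (SDKs) that enable efficient implementation and optimization of deep learning models on edge (CPU-based) hardware, which enhanced performance and capabilities.
- Benchmarked various computer vision and generative models using the custom quantization and optimization SDK targeting IoT and custom edge devices.
- Worked on brain image segmentation using deep learning, which included training neural networks to accurately identify and classify structures in brain images related to Alzheimer's disease, using segmentation and computer vision techniques.
- Launched a 3D object detection and tracking system for autonomous vehicles using LiDAR data and the VoxelNet algorithm, enhancing the vehicle's perception and tracking capabilities in 3D environments.
- Developed an infrared object detection system using the You Only Look Once (YOLO) architecture, achieving high-accuracy detection of objects in infrared images and providing reliable identification and tracking.
- Created a satellite image segmentation system for detecting farmland using a cascade of U-Net and Mask R-CNN models, improving agricultural analysis and decision-making.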
Machine Learning Engineer
UnrealAI
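- Developed and deployed real-time yoga pose estimation on the Android platform using OpenPifPaf, achieving accurate results for Indian yoga poses. Optimized inference speed and converted the model to TensorFlow Lite format for seamless integration.
- Created a topic model using LDA and NMF algorithms to extract latent topics from a text corpus, and applied clustering algorithms to group similar topics, providing better understanding and organization of text documents.
- Built a computer vision system that detects kitchen items with high accuracy and low latency. The system was optimized for real-time performance on mobile devices.
- Detected income tax fraud using an ensemble of supervised anomaly detection, unsupervised clustering, and rule-based backtracking.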
Experience
Legal Chatbot with RAG, Pinecone Integration, Streamlit UI, and GPT-4
Personalized Art Generation Bot
NFT Image to Immersive 3D
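Selective 3D inpainting is an advanced process that fills in missing or damaged regions of a 2D image, resulting in a complete and visually appealing 3D representation. This technique helps improve the overall quality and realism of the generated 3D models.
Depth estimation is another key component of the system, as it determines spatial depth information from the 2D image. This depth information is essential for creating a sense of depth and perspective in the generated 3D model.
By leveraging Stable Diffusion, the system ensures a stable and consistent generation process, delivering high-quality and accurate 3D representations from the NFTs' 2D counterparts. The resulting 3D models can significantly enrich users' viewing and interaction experiences across applications, from virtual galleries to augmented reality environments.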
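As a rough illustration of why depth information matters for the 2D-to-3D step, the sketch below lifts a depth map into 3D points. It is not the project's pipeline: it assumes a simple pinhole camera, and the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows) to 3D points.

    Assumes a pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); these parameters are illustrative.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:                     # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx          # horizontal offset scaled by depth
            y = (v - cy) * z / fy          # vertical offset scaled by depth
            points.append((x, y, z))
    return points
```

Pixels farther from the principal point and with larger depth land farther from the camera axis, which is what gives the resulting model its sense of perspective.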
News to Infographics
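The process begins with a news article, which is first summarized using GPT-3.5 Turbo and Davinci, facilitated by LangChain. Video generation then uses a fine-tuned Stable Diffusion 2.1 model, resulting in engaging and dynamic visual presentations of news stories.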
Yoga Pose Correction
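The carefully trained model was quantized and converted to TensorFlow Lite format to enhance usability and integration. This conversion simplified incorporating the model into an Android application, giving yoga enthusiasts a user-friendly tool to refine their practice and gain insight into different poses.
System and Method for Full-integer Quantization-aware Training at the Edge
I developed a pseudo-cross-entropy loss function and designed a quantization scheme for integer-only quantization-aware training. Additionally, I developed an SDK that enables the system to be used on low-power edge computing devices. The SDK has been successfully used to quantize models on Jetson and vendor-specific custom hardware.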
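The basic idea behind integer quantization can be sketched as follows. This is a minimal illustration that assumes symmetric, per-tensor quantization. The profile does not describe the actual quantization scheme or the pseudo-cross-entropy loss, so the `quantize` and `dequantize` helpers are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with one shared scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]
```

Each recovered value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error a quantization-aware training procedure has to tolerate.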
Fake News Classification
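The project involved preprocessing text data, employing the SetFit model and LSTM, and developing an ensemble of SetFit and LSTM to accurately identify fake news.
In addition, k-means clustering was used to group the types of fake news. The ultimate goal was to create a reliable tool to combat the spread of misinformation. The environment used for this project included Linux, TensorFlow, k-means clustering, scikit-learn, Python, and SetFit.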
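One common way to combine two classifiers such as SetFit and an LSTM is soft voting, sketched below. The profile does not say how the ensemble was actually built, so this is an assumption. The function `ensemble_predict`, the weight `w_setfit`, and the `threshold` are hypothetical, and each model is assumed to output the probability that an article is fake.

```python
def ensemble_predict(p_setfit, p_lstm, w_setfit=0.5, threshold=0.5):
    """Combine two fake-news probabilities by weighted averaging (soft voting)."""
    # Weighted mean of the two model probabilities.
    p_fake = w_setfit * p_setfit + (1.0 - w_setfit) * p_lstm
    label = "fake" if p_fake >= threshold else "real"
    return label, p_fake
```

For example, `ensemble_predict(0.9, 0.4)` averages a confident SetFit score with a less certain LSTM score and labels the article as fake at the default threshold.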
Text-to-video Generation for Mathematical Equations
Education
Bachelor of Technology Degree in Computer Science
School of Information, Communication and Technology - Dwarka, Delhi, India
Skills
Libraries/APIs
PyTorch, TensorFlow, Scikit-learn, SpaCy, OpenCV, Pandas, LSTM, Google Speech-to-Text API, Keras, Fast.ai
Tools
You Only Look Once (YOLO), Git, ChatGPT, Notion, Haystack, Azure Machine Learning, Whisper, Amazon SageMaker, Google Bard
Frameworks
Flask, LlamaIndex, Django, Streamlit
Languages
Python, C++, Falcon, JavaScript, Bash Script
Paradigms
Data Science, ETL, Azure DevOps, Continuous Development (CD), Continuous Integration (CI), Search Engine Optimization (SEO)
Platforms
Docker, AWS IoT, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2, HubSpot, iOS, Linux, Amazon Web Services (AWS), Civitai, Azure
Storage
MySQL, MongoDB, Databases
Other
Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Quantization, Models, TensorFlow Lite, Machine Learning, LangChain, Statistics, Depth Estimation, Time Series, Hugging Face, Detectron, Generative Pre-trained Transformers (GPT), GPT, Large Language Models (LLMs), Artificial Intelligence (AI), OCR, Convolutional Neural Networks (CNN), Image Processing, OpenAI GPT-4 API, OpenAI GPT-3 API, Text to Image, Diffusion Models, NLU, Deep Neural Networks, Language Models, Statistical Analysis, Data Analysis, Image Analysis, Image Generation, Chatbots, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Model Development, Video & Audio Processing, OpenAI, APIs, HubSpot CRM, Retrieval-augmented Generation (RAG), Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, BERT, Reinforcement Learning, PEFT, 2D, Speech to Text, Point Clouds, Point Cloud Data, Text to Speech (TTS), Generative AI, NVIDIA TensorRT, FastAPI, Pose Estimation, 3D Reconstruction, DreamBooth, LoRA, Generative Adversarial Networks (GANs), K-means Clustering, Edge AI, Open Neural Network Exchange (ONNX), Pruning, Benchmarking, Object Detection, Machine Learning Operations (MLOps), Product Matching, Prompt Engineering, ControlNet, Gradio, Videos, Conversational AI