Efficient Data Extraction for Freelancers

Unlock the potential of data extraction projects on top freelancing sites with Vollna. Experience tailored filters, instantaneous alerts, and insightful analytics to boost your success.
714 projects published in the past 72 hours.
Data scraper / data extraction specialist to build up a data directory
not specified 11 hours ago
Client Rank - Good

Payment method verified
$3 026 total spent
10 hires
37 jobs posted
27% hire rate, 4.35 of 4 reviews
HK Hong Kong
Good
This role primarily involves scraping data from various websites, followed by crucial data cleaning, updating, duplicate checking, and validation to ensure accuracy and usability.

We are looking for someone who can consistently deliver high-quality results and become a valuable extension of our team for this essential data work.

Responsibilities:

Perform web scraping to extract data from designated websites based on provided criteria.
Clean scraped data to remove inconsistencies, errors, and irrelevant information.
Update existing datasets with newly scraped or corrected information.
Conduct thorough duplicate checks to maintain data integrity.
Validate data against source websites or other reliable sources to ensure accuracy.
Organize and format data in a clear and usable manner, typically in a database.
Communicate progress and any data-related issues encountered.
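The cleaning, duplicate-checking, and validation steps above can be sketched in a few lines of Python. This is a minimal illustration only; the record fields are invented, not from the posting.

```python
import re

# Hypothetical scraped rows; field names are placeholders, not from the posting.
raw_rows = [
    {"name": "Acme  Ltd", "email": "info@acme.example", "city": "Hong Kong"},
    {"name": "Acme Ltd ", "email": "info@acme.example", "city": "Hong Kong"},  # duplicate
    {"name": "Beta Co", "email": "", "city": "Kowloon"},  # missing required field
]

def normalize(row):
    # Collapse internal whitespace and trim each field.
    return {k: re.sub(r"\s+", " ", v).strip() for k, v in row.items()}

def clean(rows, required=("name", "email")):
    seen, valid, flagged = set(), [], []
    for row in map(normalize, rows):
        key = tuple(row[f].lower() for f in ("name", "email"))
        if key in seen:
            continue  # duplicate check: skip repeats of the same (name, email)
        seen.add(key)
        if all(row[f] for f in required):
            valid.append(row)        # passed validation
        else:
            flagged.append(row)      # incomplete record, flag for manual review
    return valid, flagged

valid, flagged = clean(raw_rows)
```

In practice the same normalize/dedupe/validate pass would run on whatever the scraper emits before anything is loaded into the database.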

Required Skills:

Proven experience with web scraping techniques and tools.
Strong data cleaning and manipulation skills.
Excellent attention to detail and accuracy.
Ability to identify and resolve data inconsistencies.
Good communication skills and the ability to follow instructions precisely.
Reliable internet connection and a suitable work environment.

Preferred Qualifications:

Previous experience with similar data scraping and cleaning projects on Upwork.
Experience with data validation techniques.
Skills: API Integration, Data Scraping, Data Mining
Budget: not specified
11 hours ago
  • Data Science & Analytics, Data Extraction/ETL
CS2 demo to video
not specified 11 hours ago
Client Rank - Excellent

Payment method verified
$9 013 total spent
112 hires
138 jobs posted
81% hire rate, 4.99 of 80 reviews
FR France
Excellent
Hi there,

I am looking for someone who can write a script to generate a video for CS2 (Counter Strike 2) given:
- a demo file
- a timestamp to start
- a duration
- the ID/name of the player as POV for the video

The process should be fully automated from the CLI; I shouldn't have to do any manual steps. Start your reply with cs2 so I know you read this. I know this is possible because some websites already do it (see clutchkings dot gg).

I will only hire people who can show me they have a working solution (we will do a screenshare, I will give you the demo file + the details, and you will show me that your script can generate the video).

Thanks for taking the time to read,
John
Skills: Scripting, Automation, Data Extraction, Gaming, Video Production
Budget: not specified
11 hours ago
  • Web, Mobile & Software Dev, Scripts & Utilities
Twitter Scraper for Data - Urgent
not specified 10 hours ago
Client Rank - Excellent

Payment method verified
$5 459 total spent
66 hires
136 jobs posted
49% hire rate, 4.69 of 27 reviews
US United States
Excellent
Hey, I need someone who can build a scraper. I need approximately 5,000 rows of data and someone who can start right now and finish ASAP.
Skills: Data Scraping, Python, Data Mining, Data Extraction
Budget: not specified
10 hours ago
  • Data Science & Analytics, Data Mining & Management
Data Annotation / Labelling Expert – General Virtual Assistant Needed
50 USD 10 hours ago
Client Rank - Excellent

Payment method verified
$14 650 total spent
550 hires
404 jobs posted
100% hire rate, 4.99 of 511 reviews
US United States
Excellent
We are seeking a detail-oriented Data Annotation and Labelling Expert to join our team as a General Virtual Assistant. In this role, you will work on preparing and labelling data (images, videos, text, or audio) to support machine learning and AI projects. You will follow clear guidelines to tag, sort, and organize large datasets accurately.
Skills: Data Entry, Lead Generation, List Building, Online Research, Administrative Support, Accuracy Verification, Prospect List, Virtual Assistance, Data Extraction, Google Sheets, Data Collection, Email Campaign Setup, Social Media Management, Article Writing, Email Copywriting
Fixed budget: 50 USD
10 hours ago
  • Admin Support, Virtual Assistance
OCR Accuracy Improvement with Tesseract
~543 - 1,086 USD 10 hours ago
Client Rank - Excellent

Payment method verified
$42 440 total spent
28 hires, 1 active
1 open job
4.99 of 23 reviews
Registered: November 22, 2009
CA Canada
Excellent
I'm looking for a skilled developer to enhance our current Tesseract OCR setup in PHP. We're experiencing issues with accuracy, particularly in handling shaded or low-quality images.

Key requirements include:
- Improve text detection accuracy
- Optimize image preprocessing for better readability
- Refine regex patterns for more reliable data extraction

Familiarity with Tesseract and image processing is ideal. Open to exploring other tools if they yield better results.
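One common way to "refine regex patterns" for noisy Tesseract output is to accept the usual character confusions (O↔0, I/l↔1) in the pattern and normalize afterwards. A sketch in Python for illustration; the project itself is PHP, where `preg_*` equivalents would be analogous, and the invoice format here is invented.

```python
import re

# Noisy OCR output (invented); "O" for "0" and "l" for "1" are common misreads.
ocr_text = "Invoice N0: INV-2O24-0O7 issued 2024"

D = "[0-9OoIl]"  # a "digit" class that also accepts frequent OCR look-alikes
invoice_re = re.compile(rf"INV-({D}{{4}})-({D}{{3}})")

# Map the look-alikes back to real digits once the field is located.
FIX = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})

m = invoice_re.search(ocr_text)
invoice = f"INV-{m.group(1).translate(FIX)}-{m.group(2).translate(FIX)}" if m else None
```

Locating the field with a tolerant pattern and only then normalizing it tends to be more reliable than demanding exact digits from shaded or low-quality scans.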

Skills: PHP, OCR
Fixed budget: 750 - 1,500 CAD
10 hours ago
  • Websites, IT & Software, Design, Media & Architecture, OCR
Data extraction from a website
1,000 USD 10 hours ago
Client Rank - Excellent

Payment method verified
$54 232 total spent
117 hires
344 jobs posted
34% hire rate, 4.58 of 80 reviews
US United States
Excellent
Need someone who can write scripts for data extraction. Something that can bypass captchas as well as use rotating IP addresses.

We have a list of 120 million names that need to be used to match and find additional data.

Only apply if you have big time experience.
Skills: Python, Data Extraction, Data Mining
Fixed budget: 1,000 USD
10 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Document Creation from Data File
100 USD 9 hours ago
Client Rank - Excellent

Payment method verified
$32 962 total spent
237 hires
276 jobs posted
86% hire rate, 4.97 of 155 reviews
IE Ireland
Excellent
We are seeking a skilled freelancer to create 37 individual MS Word documents based on a larger data file. The ideal candidate will have experience in document formatting and data extraction, ensuring each document is accurately created and adheres to our specifications.

Each MS Word file should have the name of the player, the Waterford Team crest and their associated strengths and areas for improvement, all in one document for the player to review.

Attention to detail and organizational skills are essential to successfully complete this task. If you have a strong command of MS Word and can work efficiently, we would love to hear from you!
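The data-extraction half of this task amounts to grouping the spreadsheet rows per player, sketched below with the stdlib. The column names are assumptions, and rendering the actual .docx with the crest would typically use a library such as python-docx (not shown).

```python
import csv, io

# Hypothetical export of the strengths/improvements tables; the real column
# names will come from the client's spreadsheets.
table = """player,category,note
A. Walsh,Strength,Strong first touch
A. Walsh,Improvement,Track back on turnovers
B. Power,Strength,Reads the game well
"""

docs = {}
for row in csv.DictReader(io.StringIO(table)):
    # One dict per player, with a list of notes under each category.
    doc = docs.setdefault(row["player"], {"Strength": [], "Improvement": []})
    doc[row["category"]].append(row["note"])

# Each entry in `docs` now holds everything one player's document needs;
# a .docx writer would render it alongside the team crest.
```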

Overall documents:
Strength:
Word
https://docs.google.com/document/d/1NSdPXUqGGildx83yhnICOSy78X0lr58S/edit?usp=drive_link&ouid=115093592624587340842&rtpof=true&sd=true
Table excel
https://docs.google.com/spreadsheets/d/1tr8yjOihW4aw90BGMs5YZU5VpOmG0Bsf/edit?usp=drive_link&ouid=115093592624587340842&rtpof=true&sd=true

Improvement:
Word
https://docs.google.com/document/d/1BHcKR15_DVcRNMoIcMMNfXkzGex5N-Bz/edit?usp=sharing&ouid=115093592624587340842&rtpof=true&sd=true
Table excel
https://docs.google.com/spreadsheets/d/1Sgs8HSUYheuO96LyOy-SKtZk-kIDqLVx/edit?usp=drive_link&ouid=115093592624587340842&rtpof=true&sd=true
Skills: Data Entry, Microsoft Excel, Google Docs, Microsoft Word, PDF Conversion
Fixed budget: 100 USD
9 hours ago
  • Admin Support, Data Entry & Transcription Services
Extract Data from PDF Documents using AI System
not specified 9 hours ago
Client Rank - Excellent

Payment method verified
$149 267 total spent
56 hires
149 jobs posted
38% hire rate, 4.99 of 17 reviews
US United States
Excellent
I need to extract data from non-scanned PDF documents.

Each document contains multiple data records, with each data record having up to approximately 10 data fields (e.g., Brand Name, Property Name, Address, City, State, Zip, Country, Franchisee, Franchisee Contact, Franchisee Phone).

Although the data in each document is similar, not all documents follow the same format. There are approximately 11 different Document Formats.

The scope of this project is as follows.

1) Create a process for extracting data from these documents using an established, AI data extraction system (such as Unstract, Amazon Textract, or similar).

2) Process the first batch of approximately 115 documents and ensure that the project is functioning accurately.

3) Instruct the client how to execute the process for future batches of documents.

4) The client will create an account with the selected processor and pay all processing fees.

Processing Requirements

1) The structured data output should be in CSV format.

2) Preferably, each batch of processed documents will be compiled into a single output file. Alternatively, there can be 11 separate CSV files, with each file containing all records from documents having the same Document Formats.

3) The process should run on a cloud-based system.

4) The processing service should be pay-as-you-go. Typically, I process only about 115 documents annually, and all documents are processed at the same time, so a subscription is not needed for this project.
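The "single output file" requirement amounts to unioning the fields of all document formats into one CSV header. A sketch with the stdlib `csv` module, assuming the AI extraction service has already returned per-document records as dicts (field names taken from the posting's examples; the `Format` tag is an assumption).

```python
import csv, io

# Records as an extraction service might return them; two of the posting's
# example fields are shown, plus which Document Format produced each record.
records = [
    {"Brand Name": "StayCo", "City": "Austin", "Format": "Format-1"},
    {"Brand Name": "RestInn", "City": "Boise", "Format": "Format-7", "Franchisee": "J. Lee"},
]

# The union of all fields gives one header covering every format.
fields = sorted({k for r in records for k in r})

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, restval="")  # blank for absent fields
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()
```

Writing one row per record against the unioned header means documents from different formats land in the same file, with empty cells where a format lacks a field.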
Skills: Machine Learning, Data Extraction
Budget: not specified
9 hours ago
  • Data Science & Analytics, Data Extraction/ETL
I need someone with experience for data extraction (Characteristics, Baseline and Outcomes) for SRMA
50 USD 9 hours ago
Client Rank - Medium

Payment method verified
no reviews
UA Ukraine
Medium
I need an expert to do data extraction for 7 studies for an SRMA study
Skills: Data Extraction, Spreadsheet Software
Fixed budget: 50 USD
9 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Scraping Booking.com
not specified 9 hours ago
Client Rank - Risky

Payment method not verified
no reviews
FR France
Risky
I would like to obtain an Excel file that retrieves the following information from Booking.com:
• A list of all hotels available in Paris, regardless of the booking date
• The rating of each hotel (out of 10)
• The category of each hotel (e.g., 3 stars)
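A toy sketch of the extraction target using Python's stdlib HTML parser. The markup below is invented; real Booking.com pages are dynamically rendered and bot-protected, so production work would typically use a framework or browser-based tool (Scrapy, Playwright) instead.

```python
from html.parser import HTMLParser

# Simplified, invented markup standing in for real hotel cards.
SAMPLE = """
<div class="hotel"><span class="name">Hotel Lutece</span>
<span class="rating">8.7</span><span class="stars">3</span></div>
<div class="hotel"><span class="name">Le Marais Inn</span>
<span class="rating">9.1</span><span class="stars">4</span></div>
"""

class HotelParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self.field = [], None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "hotel":
            self.rows.append({})        # start a new hotel record
        elif cls in ("name", "rating", "stars"):
            self.field = cls            # next text node belongs to this field

    def handle_data(self, data):
        if self.field and self.rows:
            self.rows[-1][self.field] = data.strip()
            self.field = None

p = HotelParser()
p.feed(SAMPLE)
hotels = p.rows  # list of dicts, ready to dump to Excel/CSV
```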
Skills: Python, Data Mining, Data Scraping, Data Extraction, Scrapy, Data Science, Data Collection, Web Crawler, ChatGPT, OpenAI API, Web Scraping, Data Entry, Web Scraping Framework, Scraper Site
Budget: not specified
9 hours ago
  • Data Science & Analytics, Data Mining & Management
Extract Data from PDF Documents using AI, LLM (Unstract, Textract, Google Document AI, etc.)
350 USD 9 hours ago
Client Rank - Excellent

Payment method verified
$149 267 total spent
56 hires
149 jobs posted
38% hire rate, 4.99 of 17 reviews
US United States
Excellent
I need to extract data from non-scanned PDF documents.

Each document contains multiple data records, with each data record having up to approximately 10 data fields (e.g., Brand Name, Property Name, Address, City, State, Zip, Country, Franchisee, Franchisee Contact, Franchisee Phone).

Although the data in each document is similar, not all documents follow the same format. There are approximately 11 different Document Formats.

The scope of this project is as follows.

1) Create a process for extracting data from these documents using an established, AI data extraction system (such as Unstract, Amazon Textract, Google Document AI or similar).

2) Process the first batch of approximately 115 documents and ensure that the project is functioning accurately.

3) Instruct the client how to execute the process for future batches of documents.

4) The client will create an account with the selected processor and pay all processing fees.

Processing Requirements

1) The structured data output should be in CSV format.

2) Preferably, each batch of processed documents will be compiled into a single output file. Alternatively, there can be 11 separate CSV files, with each file containing all records from documents having the same Document Formats.

3) The process should run on a cloud-based system.

4) The processing service should be pay-as-you-go. Typically, I process only about 115 documents annually, and all documents are processed at the same time, so a subscription is not needed for this project.
Skills: Document AI, Machine Learning, Data Extraction, Artificial Intelligence
Fixed budget: 350 USD
9 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Create trading bot for Pocket Option binary trading
not specified 9 hours ago
Client Rank - Risky

Payment method not verified
no reviews
FI Finland
Risky
I am looking for an experienced Python developer to build a browser automation bot using Selenium that can trade on Pocket Option, a binary options platform.

✅ Key Tasks:
Manual log in to Pocket Option
Navigate to the trading interface
Select the trading pair (e.g., EUR/USD)
Check strategy conditions and pattern logic against the candles
Set trade duration (e.g., 10 seconds)
Enter trade amount
Click Buy (UP) or Sell (DOWN) based on strategy signals
Optionally read strategy logic from an external Python file or webhook
I will explain the logic once we actually start working together
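The posting's idea of reading strategy logic from an external Python file could look like the toy module below. The condition shown (two consecutive bearish or bullish candles) is purely illustrative, standing in for the client's undisclosed logic; the Selenium layer would import `signal` to decide whether to click Buy or Sell.

```python
from dataclasses import dataclass

@dataclass
class Candle:
    open: float
    close: float

def signal(candles):
    """Return "DOWN" after two consecutive bearish candles, "UP" after two
    bullish ones, else None (no trade). Placeholder logic only."""
    if len(candles) < 2:
        return None
    a, b = candles[-2], candles[-1]
    if a.close < a.open and b.close < b.open:
        return "DOWN"
    if a.close > a.open and b.close > b.open:
        return "UP"
    return None

decision = signal([Candle(1.0810, 1.0795), Candle(1.0795, 1.0782)])
```

Keeping the decision function free of any browser code means it can be swapped or unit-tested without touching the Selenium automation.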

🎯 Requirements:
Proficient in Python and Selenium WebDriver
Knowledge of XPath/CSS selectors
Experience with browser automation, especially for trading platforms
Ability to handle sessions, page loads, and interaction timing
Knowledge of technical indicators (RSI, MACD, etc.) is a plus

🛠 Deliverables:
Fully functional browser trading bot for Pocket Option
Instructions to run the bot (and modify logic)
Works reliably in demo mode (real mode later)

🔐 Security Note:
We do not share credentials. Developer must work with manual login or session cookies.

📅 Timeline:
What is the earliest you can deliver?
Skills: Bot Development, Web Scraping, Data Extraction, Data Scraping, Cryptocurrency Trading, Forex Trading, Trading Automation, Trading Strategy, Python, Selenium, Beautiful Soup, Browser Automation, Web Crawling, Scrapy, Web Scraping Software
Budget: not specified
9 hours ago
  • Data Science & Analytics, Data Extraction/ETL
N8n
15 - 47 USD / hr
8 hours ago
Client Rank - Excellent

Payment method verified
$16 733 total spent
21 hires
33 jobs posted
64% hire rate, 3.78 of 11 reviews
TR Turkey
Excellent
n8n Automation Expert Needed for AI Chatbot CRM Integration (Supabase + Zoho/Salesforce)

We are developing a SaaS-based AI chatbot platform for universities that automates the full student journey — from inquiry to application to registration.

Our chatbot (Mr. SIT) is already live, integrated with Supabase and OpenAI via n8n. We now need an experienced n8n expert to complete Phase 2 and 3 of our automation roadmap.

What’s Already Done
- Chatbot integrated with Supabase + Langchain + OpenAI
- OCR-based passport scanning and student data extraction
- Application submission flow (stored in Supabase)
- Status lookup by passport
- Vector search across knowledge base & programs

Your Tasks
- Create CRM automation flows (Zoho CRM or Salesforce)
- Push student data from chat into CRM
- Build logic for issuing acceptance letters and tracking deposit/payment stages
- Integrate chatbot and admin panel workflows using n8n
- Assist with chatbot training panel, feedback, and analytics automation

Tech Stack
- n8n (self-hosted on Coolify)
- Supabase (Postgres + vector DB)
- Langchain + OpenAI
- Zoho CRM / Salesforce API
- Frontend: React-based SaaS dashboard (being designed)

Requirements
- Proven n8n experience
- Strong API automation background (especially Zoho/Salesforce)
- Workflow structuring, conditional logic, webhook handling
- English communication skills
- Bonus: Langchain or AI agent familiarity

🎥 IMPORTANT: How to Apply
You must include a video (2–5 minutes) where you:

- Show your past n8n workflows
- Explain your automation logic or API usage
- Demonstrate confidence with complex workflows

Applications without a video will not be considered.

Project Info
- Start: ASAP
- Duration: 3–5 weeks, with long-term opportunity
- Budget: Competitive, based on experience

We’re excited to work with someone who can bring this automation to life. Looking forward to your proposals!
Skills: AI Agent Development, AI Development, LangChain, n8n, Zoho CRM, Supabase
Hourly rate: 15 - 47 USD
8 hours ago
  • Web, Mobile & Software Dev, Web Development
Web data extraction and cleaning
50 USD 8 hours ago
Client Rank - Excellent

Payment method verified
$3 148 total spent
27 hires
17 jobs posted
100% hire rate,
4.84 of 20 reviews
IE Ireland
Excellent
Create web scraping code in Python and clean the extracted data. The target website (property listings or another product) is TBC, based on allowances.
Skills: Data Scraping, Python
Fixed budget: 50 USD
8 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Google Sheets Database System Development
600 USD 8 hours ago
Client Rank - Good

Payment method verified
$3 538 total spent
8 hires
11 jobs posted
73% hire rate, 5.00 of 5 reviews
US United States
Good
📋 What You'll Build (Simple Version)
You'll create a Google Sheets system that works like a digital filing cabinet for a service business. The system will track clients, cases/projects, and finances.
Imagine it like this:

2 big binders (workbooks) with tabs and folders
Forms that fill out the binders automatically
Everything connects like a spider web

🎯 Job Overview
Position: Google Sheets System Builder
Time Estimate: 20-30 hours
Pay Rate: Competitive (depending on experience)
Purpose: Replace paper forms with easy digital system
✅ Skills You MUST Have

Google Sheets Expert ⭐⭐⭐⭐⭐

Know how to make formulas
Can create data connections
Understand drop-down lists


Google Forms Creator ⭐⭐⭐⭐

Make forms that save to Sheets
Know about question types


Basic Understanding of:

Client data organization
Simple financial tracking
Workflow automation



🔨 STEP-BY-STEP BUILD INSTRUCTIONS
STEP 1: Create First Workbook (Client Database)
✅ DO:

Start by creating a workbook called "Client Database"
Create these 3 tabs (sheets) inside:

"Client Management"
"Project Tracking"
"Service Checklists"



❌ DON'T:

Don't put financial data in this workbook
Don't mix client data with money data

STEP 2: Build Client Management Sheet
Data to Extract from Client Profile Form:

Client name (columns: Last, First, Middle)
Contact info: Address, Email, Phone
Project/Case number (format: [Prefix]-XXXXXXXX)
Service needed
Payment status
Important dates

✅ DO:

Create columns for each piece of data
Use data validation for drop-downs

Example: Status = "Active, Pending, Closed"


Add color coding for payment status

Green = Paid
Yellow = Partial
Red = Overdue



❌ DON'T:

Don't make text fields where drop-downs work better
Don't forget to freeze the header row

STEP 3: Build Project Tracking Sheet
Data to Extract from Milestone Tracker:

Project/Case number linking
5 milestone phases
Task descriptions by phase
Due dates and completion dates
Status indicators

✅ DO:

Create milestone tracking with:

Drop-down for status: "Not Started, In Progress, Complete"
Date fields that auto-calculate days remaining
Color-coded progress bars


Link to Client Management using project number
Show deadlines with countdown

❌ DON'T:

Don't make users type project numbers manually
Don't forget to highlight overdue items

STEP 4: Build Service Checklists Sheet
Data to Extract from Pre-Service Checklists:

Different service type checklists
Required document lists
Filing/submission requirements
Next steps

✅ DO:

Create separate checklist templates for each service
Use Google Sheets checkboxes
Show percentage complete
Auto-count checked items

❌ DON'T:

Don't make one giant list
Don't forget to show progress

STEP 5: Create Second Workbook (Financial)
✅ DO:

Create workbook called "Financial Tracking"
Create these 3 tabs:

"Expense Tracking"
"Invoice Tracking"
"Payment Records"



Data from Expense Tracking Template:

Date/Transaction date
Expense categories
Client ID connection
Billable status
Receipt number

STEP 6: Create Google Forms
Form 1: Client Intake Form
✅ DO:

Copy questions from Intake Interview Guide
Make form direct to "Client Management" sheet
Use sections to organize questions
Test submission before finishing

Form 2: Expense Submission Form
✅ DO:

Include all fields from expense template
Add file upload for receipts
Auto-fill staff name
Make it save to Financial workbook

STEP 7: Connect Everything
✅ DO:

Use VLOOKUP or XLOOKUP for connections
Test all dropdown links
Make formulas update in real-time
Create dashboard with summary stats

❌ DON'T:

Don't hardcode values
Don't break links accidentally

STEP 8: Set Permissions
✅ DO:

Client Database = Share with all staff
Financial Database = Limited access only
Make forms public with protection
Test access levels

🧪 Testing Checklist

Enter fake client data and track through system
Fill out form - verify it saves correctly
Check all formulas calculate correctly
Test all dropdown menus
Verify permissions work properly

📦 What to Deliver

2 complete Google Sheets workbooks
4 working Google Forms
Written instructions for basic use
Test data examples

📁 Files to Use
Main Source Files:

Client Profile Sheet.pdf
Case Milestone Tracker.pdf
Pre-Filing Checklists by Service Type.pdf
Expense Tracking Template.pdf
Client Intake Interview Guide.pdf

Addon Reference Files:

Phone Intake Scripts (for quick form)
Filing Status Tracking (for status lists)
Skills: Data Extraction, Google Sheets Automation, Google Workspace, Data Entry, Database Design, Google Sheets, Google Apps Script
Fixed budget: 600 USD
8 hours ago
  • Web, Mobile & Software Dev, Scripts & Utilities
Extract/Scrape the list of Psychiatry Residency programs
50 USD 8 hours ago
Client Rank - Risky

Payment method not verified
no reviews
US United States
Risky
I want to extract the list of all 343 Psychiatry Residency programs from FREIDA (https://freida.ama-assn.org/search/list?spec=43236), including Program Name, Program Director Name, Program Director's Telephone, Program Director's Email, and Location, as well as any other contacts for each program into an Excel file.
Skills: Data Scraping, Microsoft Excel, Data Extraction
Fixed budget: 50 USD
8 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Expert Data Extraction: Compile VS Battles Wiki Character Data from Text Dump to Excel
100 USD 8 hours ago
Client Rank - Risky

Payment method not verified
no reviews
US United States
Risky
Only freelancers located in the U.S. may apply.
Project Summary:
Seeking a skilled data specialist or programmer experienced in text parsing and data extraction to process a large text file containing compiled VS Battles Wiki character profiles. The goal is to accurately extract key data points (Character Name, Combat Tier, and Origin/Franchise) for each character and deliver the results in a clean, structured Excel spreadsheet.
Project Description:
I have compiled the raw text content from numerous individual character profiles on the VS Battles Wiki into a single, large text file. This file serves as the primary source for this data extraction project.
The final deliverable I require is a single Microsoft Excel file (.xlsx) that contains one row for each reliably identifiable unique character from the text dump. The output file should include the following columns:
* Character Name: The primary name for the character. Need logic to identify and select the most appropriate name, potentially handling common aliases or variations if present within a profile block.
* Tier: The character's combat or power Tier. This information is present in the profile text in various formats (e.g., "High 6-C", "Low 2-C | 1-C", "Varies from...", "Unknown"). The extraction needs to be robust to capture these variations accurately and handle cases where the Tier might be missing or explicitly marked as "Unknown".
* Origin: The franchise, series, or source material the character belongs to. This information is also present in the profile text in different formats (e.g., "Origin: [[Franchise Name]]", "Origin: Franchise Name", "Origin:" followed by text on the next line). The extraction should identify the specific franchise name and handle cases where the origin is missing, unclear, or listed generically (e.g., "Characters", "Video Game", "Female"). Prioritize specific franchise names over generic terms or "Unknown".
* (Optional but preferred) URL: If the URL of the character's profile page can be reliably extracted or constructed from the data within the text dump, include it in a separate column.
Input Files I Will Provide:
* Primary Source: A single, large text file (.txt) containing the combined raw text content of all character profiles from the VS Battles Wiki. This file is comprehensive and contains the data that needs to be parsed.
* Supplementary Files (For Reference): I also have two Excel files (.xlsx) that are results of previous partial extraction attempts focusing on different ways Tier and Origin information can be formatted in the profiles. These files can serve as helpful examples of the data variations you will encounter and demonstrate the kind of specific origin/tier values I am looking for. They are supplementary and not the primary source for extraction.
Key Requirements & Expectations:
* Develop and use a script (likely in Python with libraries like re for regex parsing, pandas for data handling) to read and parse the large text file.
* Implement robust parsing logic to extract Character Name, Tier, and Origin based on the diverse formats within the text.
* Apply logic to consolidate data for the same character if they appear multiple times or with slight name variations in the text dump (grouping similar names if necessary).
* Handle missing data or generic origins/tiers appropriately (e.g., mark as "Unknown").
* Ensure all identifiable characters from the text dump are included in the output (aiming for a number potentially over 31,000 unique characters).
* Output a clean, well-organized Excel (.xlsx) file with the specified columns.
* (Optional but preferred) Provide the source code of the extraction script used.
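A minimal sketch of the parsing logic the requirements describe, run on one invented profile block. It handles both the `Origin: [[X]]` and bare `Origin: X` forms and defaults missing fields to "Unknown"; consolidating name variants across 31,000+ characters would be layered on top of per-block parsing like this.

```python
import re

# One profile block from the text dump (contents invented for illustration).
block = """Goku
Tier: Low 2-C | 1-C
Origin: [[Dragon Ball]]
"""

def parse_profile(text):
    # First non-empty line is taken as the character name.
    name = text.strip().splitlines()[0].strip()
    tier = re.search(r"^Tier:\s*(.+?)\s*$", text, re.M)
    # Accept both "Origin: [[X]]" and bare "Origin: X" forms.
    origin = re.search(r"^Origin:\s*(?:\[\[)?([^\]\n]+?)(?:\]\])?\s*$", text, re.M)
    return {
        "Character Name": name,
        "Tier": tier.group(1) if tier else "Unknown",
        "Origin": origin.group(1) if origin else "Unknown",
    }

row = parse_profile(block)
```

With pandas, a list of such dicts becomes the deliverable spreadsheet via `pd.DataFrame(rows).to_excel(...)`.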
Skills Preferred:
* Data Extraction
* Text Parsing / Data Parsing
* Python
* Regular Expressions (Regex)
* Pandas (or similar data handling library)
* Excel
I will share the input text file and the supplementary files privately with freelancers who send promising proposals or whom you invite to interview.
Skills: Data Extraction, Python, Microsoft Excel
Fixed budget: 100 USD
8 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Web Scraper Needed for Naturopathic Physicians Data Extraction
not specified 7 hours ago
Client Rank - Risky

Payment method not verified
no reviews
US United States
Risky
We are seeking an experienced web scraper to help us gather a list of Naturopathic Physicians who specialize in a specific area of practice. The ideal candidate should have expertise in data extraction and web scraping techniques to ensure accurate and comprehensive results. You will be responsible for identifying the right online databases and platforms to extract the necessary information efficiently. If you have a proven track record in similar projects, we would love to hear from you!
Skills: Data Scraping, Data Mining, Data Extraction, Scrapy
Budget: not specified
7 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Data Entry Specialist for German Job Portals (Full-Time)
5 - 8 USD / hr
7 hours ago
Client Rank - Excellent

Payment method verified
$184 453 total spent
93 hires
92 jobs posted
100% hire rate, 4.89 of 71 reviews
DE Germany
Excellent
We are looking for a disciplined and detail-oriented Data Specialist to extract and transfer applicant data from German job boards into our recruiting system. This is a full-time remote role (8 hours/day, Monday-Friday, 9 AM - 6 PM German time). No German language skills required—just high focus, accuracy, and commitment to repetitive tasks.

Your Responsibilities:
Extract candidate data (names, contact details, work experience) from German job portals (no language skills needed—you’ll work with templates).

Accurately input data into our recruiting system without errors.

Follow strict formatting and organizational guidelines.

Maintain consistent daily output with high attention to detail.

Work independently with minimal supervision.

Requirements:
✔ Disciplined & reliable – Must be available 9 AM - 6 PM German time (CET/CEST), Monday-Friday.
✔ Sharp focus & accuracy – Zero tolerance for sloppy data entry.
✔ Comfort with repetitive tasks – This is a structured, process-driven role.
✔ Stable internet & distraction-free workspace – No multitasking; full focus required.
✔ Basic computer skills – Comfort with web browsers, Excel/Sheets, and data entry tools.

Nice to Have (Not Required):
Experience with ATS/CRM systems (e.g., Greenhouse, Lever).

Familiarity with simple automation (e.g., macros, Zapier).

We Offer:
Long-term remote work with consistent tasks.
Flexible breaks within the 9 AM - 6 PM window.
Competitive pay (fixed or hourly, negotiable based on speed/accuracy).
How to Apply:
Send your Upwork profile + a brief response:
Describe your experience with repetitive, detail-heavy tasks.
Confirm your availability for 9 AM - 6 PM German time.
Skills: Data Extraction, Data Entry
Hourly rate: 5 - 8 USD
7 hours ago
  • Data Science & Analytics, Data Extraction/ETL
AI Engineer Needed to Improve and Expand Demand Letter Generation System (Legal NLP)
60 - 120 USD / hr
7 hours ago
Client Rank - Good

Payment method verified
$1 820 total spent
14 hires
34 jobs posted
41% hire rate, 5.00 of 9 reviews
US United States
Good
We are a legal-tech company with a functioning AI-driven system that generates demand letters for personal injury law firms using structured case data. We're now looking for a highly capable AI/NLP engineer to help us refine, expand, and optimize this system — particularly in the areas of data extraction, dataset creation, template engineering, and model fine-tuning. Heavy focus on Anthropic (Claude).

This is not a project starting from scratch — we already have:
A working backend that ingests structured case data (accident summary, treatment, billing, injuries, etc.)
A process for generating letters using prompt-based LLMs
Hundreds of finalized demand letters from multiple law firms, which will serve as the foundation for dataset creation
A clear product direction and an internal team managing design and deployment

Now we need your help to level it up.

What We’re Looking For:

A machine learning / NLP engineer who can:
Help us create a higher-quality system to extract structured data from existing demand letters using NLP tools or rule-based approaches.
Create a high-quality dataset mapping structured case data to final letter output, forming the foundation for training or templating.
Build or refine dynamic templates that map structured data to high-quality, legally appropriate demand letters
Apply prompt engineering, embedding techniques, and potential LLM fine-tuning.
Collaborate with our internal team to handle law firm-specific variations in tone, letter structure, and phrasing.
Offer strategic input on the balance between template-driven generation and model-driven generation.
Help improve the consistency, clarity, and legal tone of the generated letters.

Your Skills Should Include:
Strong experience with OpenAI’s GPT API and prompt engineering techniques
Hands-on experience in legal or business document extraction and generation
Ability to work with large sets of unstructured documents and convert them into labeled, structured formats (JSON, CSV)
Understanding of template-based vs. generative approaches — and when to use each
Familiarity with LLM fine-tuning workflows (bonus if you’ve fine-tuned GPT-3.5/4 Turbo or open-source models)
Good communication and the ability to think like a product builder, not just a coder

Bonus Experience:
Experience working with legal tech or insurance-based document workflows
Familiarity with LangChain, RAG systems, and retrieval-augmented generation
Background working with law firms or understanding of U.S. personal injury legal language

What We Provide:
A large volume of real-world, finalized demand letters
A backend system already built to handle data input
A defined workflow for generating, editing, and reviewing letters
Access to internal support for legal accuracy, formatting, and tone

Project Scope:
Immediate:
Help generate law firm-specific templates
Assist the system in extracting structured data from existing documents and in building a usable dataset

Long-term: Assist with improving accuracy, tone control, and automation reliability across different types of personal injury cases

Engagement Details:
Hourly or Milestone-Based depending on your preference
10–20 hours/week to start, with potential for ongoing collaboration

Fully remote, must be fluent in English (US legal context)

To Apply: Please include:
A short overview of your experience with NLP, legal/business document automation, and dataset creation.
Examples of similar projects (especially using GPT, Claude, or open-source LLMs).
Your thoughts on how you’d approach building a structured dataset from our demand letters.
Skills: AI Development, Deep Learning, Artificial Intelligence, Natural Language Processing, Data Science, Machine Learning
Hourly rate: 60 - 120 USD
7 hours ago
  • Data Science & Analytics, AI & Machine Learning
AI Engineer Needed to Improve and Expand Demand Letter Generation System (Legal NLP)
not specified 6 hours ago
Client Rank - Good

Payment method verified
$1 820 total spent
14 hires
34 jobs posted
41% hire rate,
5.00 of 9 reviews
US United States
Good
Skills: AI Development, Deep Learning, Artificial Intelligence, Natural Language Processing, Data Science, Machine Learning
Budget: not specified
6 hours ago
  • Data Science & Analytics, AI & Machine Learning
Scraper to Extract Search Data from CSV Input
50 USD 6 hours ago
Client Rank - Good

Payment method verified
$3 316 total spent
4 hires
15 jobs posted
27% hire rate,
5.00 of 3 reviews
PL Poland
Good
I’m looking for an experienced scraper developer to create a simple, reliable web scraper/crawler for one of the popular email finder services.

✅ Key Requirements:
• The script should read search queries from a CSV file (e.g., a LinkedIn URL or other lookup value in a designated column).
• It should then search the email finder platform for each entry (pasting the query into the search input).
• Extract and parse the results (e.g., emails).
• Update the same row in the CSV file with the extracted data by adding additional columns.
• This tool will run locally with no need for cloud deployment.

⚙️ Additional Notes:
• You can use Selenium or Playwright
• Preferred language: Java or Node.js
• No need to handle login/sessions: the script will run locally against an already authorised session.
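Setting the browser automation aside, the CSV round trip described in the requirements might look like this in Python; `lookup_email` is a hypothetical stand-in for the Playwright/Selenium step that pastes each query into the platform's search box and reads back the result:

```python
import csv

def lookup_email(query: str) -> str:
    # Hypothetical stand-in for the browser-automation step.
    return f"found-for-{query}@example.com"

def enrich_csv(in_path: str, out_path: str) -> None:
    """Read queries from a CSV, append an 'email' column, write rows back."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["email"] = lookup_email(row["query"])  # one lookup per row
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a separate output path (rather than overwriting in place) keeps the source file intact if a run is interrupted mid-batch.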


If this MVP works well, we’ll extend it with support and potential integration into a larger data pipeline.
Skills: Selenium, Python, Node.js, JavaScript, Web Scraping, CSV, Data Extraction
Fixed budget: 50 USD
6 hours ago
  • Web, Mobile & Software Dev, Scripts & Utilities
YouTube Scraper
5 - 12 USD / hr
6 hours ago
Client Rank - Medium

Payment method verified
$6 total spent
1 hire
4 jobs posted
25% hire rate,
5.00 of 1 review
CA Canada
Medium
YouTube scraping: I need a large list of YouTube channels and their contact info for the niche and keywords I'll give you.
Skills: Data Scraping, YouTube Data API, Python, Data Mining, Data Extraction
Hourly rate: 5 - 12 USD
6 hours ago
  • Data Science & Analytics, Data Mining & Management
Meet | Teams | Zoom recording meets and transcribe
1,000 USD 4 hours ago
Client Rank - Excellent

Payment method verified
$34 301 total spent
68 hires
55 jobs posted
100% hire rate,
4.77 of 55 reviews
MX Mexico
Excellent
We’re looking for a skilled developer (or team) experienced in AI, NLP, and AWS to build a system similar to Fireflies.ai or Read.ai. The goal is to automatically analyze meetings, transcribe them, detect speakers, summarize content, and extract actionable insights — without requiring manual upload of recordings.

Core features required:
✅ Automatically fetch meeting recordings from platforms like Google Meet or Zoom (no manual upload)
✅ Automatic transcription (using AWS Transcribe, Whisper, or similar)
✅ Speaker diarization (identify and label who is speaking)
✅ AI-generated summaries and key takeaways
✅ Extract tasks, decisions, and insights mentioned in the conversation
✅ Use AWS services like S3, Lambda, Transcribe, and optionally DynamoDB
✅ Frontend not required for now — just a working API or JSON output is fine
✅ Option to use OpenAI/ChatGPT for advanced analysis

Preferred Stack & Technologies:
AWS Lambda / Python
ChatGPT / Whisper
Google Cloud for some services.
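As a sketch of the post-transcription step, the snippet below merges diarized segments into a readable transcript and flags likely action items with a keyword heuristic. The segment format and cue list are illustrative, not the output schema of any particular transcription service:

```python
# Illustrative post-processing: turn time-ordered (speaker, text)
# segments into a transcript plus a crude action-item list.
ACTION_CUES = ("will ", "needs to ", "action item", "follow up", "todo")

def build_transcript(segments):
    """segments: list of (speaker, text) tuples in time order."""
    lines, tasks = [], []
    for speaker, text in segments:
        lines.append(f"{speaker}: {text}")
        if any(cue in text.lower() for cue in ACTION_CUES):
            tasks.append({"speaker": speaker, "text": text})
    return {"transcript": "\n".join(lines), "tasks": tasks}

demo = [
    ("spk_0", "Thanks everyone for joining."),
    ("spk_1", "I will send the revised budget by Friday."),
]
result = build_transcript(demo)
```

In practice an LLM pass would replace the keyword heuristic for task and decision extraction; this shape is just the JSON output the posting asks for.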
Skills: API Integration, Automation, Data Extraction
Fixed budget: 1,000 USD
4 hours ago
  • Web, Mobile & Software Dev, Scripts & Utilities
Experienced Web Scraper Needed: Extract Company Info from Domains + Smart Name Parsing
not specified 4 hours ago
Client Rank - Risky

Payment method not verified
no reviews
US United States
Risky
Description:
We’re looking to hire a talented web scraper to help transform a list of raw domain names into actionable business data. This is part of a lead generation project targeting newly launched or small business websites.

Scope of Work:
You’ll receive a list of domain names such as:

bestconstruction.com

floridaplumbmasters.net

elitehvacpros.org

For each domain, we need you to:

Visit the website and extract:

Company Name — either from parsing the domain (e.g., "BestConstruction.com" ➝ "Best Construction") and/or from content on the site (like the title tag, H1, schema, or footer)

Phone Number — in a clean U.S. format

Email Address — visible on the site (mailto links, contact pages, etc.)

Website URL

Requirements:

Must be able to handle batch inputs (hundreds of thousands of domains)

Use or replicate a tool like the “Ninja Keyword Splitter” to cleanly separate domain words into real business names

Return clean data in a CSV format
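The keyword-splitting requirement can be sketched as a greedy longest-match segmentation over a word list. The tiny `WORDS` set here is for illustration; a real splitter would load a full dictionary and score alternative segmentations:

```python
# Sketch of the domain-to-name splitting step (in the spirit of the
# "Ninja Keyword Splitter" mentioned above). Demo word list only.
WORDS = {"best", "construction", "florida", "plumb", "masters",
         "elite", "hvac", "pros"}

def split_domain(domain: str) -> str:
    """Segment the label before the TLD into known words, longest match first."""
    label = domain.lower().split(".")[0]
    parts, i = [], 0
    while i < len(label):
        for j in range(len(label), i, -1):   # try longest candidate first
            if label[i:j] in WORDS:
                parts.append(label[i:j])
                i = j
                break
        else:                                 # no dictionary hit: take one char
            parts.append(label[i])
            i += 1
    return " ".join(p.capitalize() for p in parts)

print(split_domain("bestconstruction.com"))   # prints "Best Construction"
```

Greedy matching can mis-split ambiguous labels, so a production version would enumerate segmentations and rank them by word frequency.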

Bonus If You Can Also:

Pull WHOIS data when contact info is missing from the site

Tag email types (e.g., info@ = generic, john@ = direct)

Pull social links (Facebook, LinkedIn, Instagram) if listed on the website

Detect business category if possible (via keywords or meta data)

Preferred Stack:

Python (Scrapy, BeautifulSoup, Requests, Selenium, Regex)

Experience working with contact info scraping, text parsing, and data cleaning

Familiarity with WHOIS APIs is a plus

Deliverables:

A working scraper/script that takes a list of domains and outputs the needed fields

A test batch of 1,000 processed domains

Documentation or video walkthrough of how to run/update the tool

Optional: Deployment guidance if we decide to run this on a server

To Apply, Please Include:

How you’d approach the company name parsing from domains

Relevant scraping projects or tools you’ve built

Estimated turnaround time for a batch of 1,000 domains

Whether you’re available for ongoing scraping work (we have multiple verticals)

We’re looking to get started immediately. Top candidates will be invited to a quick call to clarify details and timeline.
Skills: Data Scraping, Data Mining, Python, Data Extraction, Lead Generation
Budget: not specified
4 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Experienced Data Engineer: Build API to BigQuery Pipeline (GCP, Python) - Segment 1
not specified 4 hours ago
Client Rank - Medium

Payment method verified
2 jobs posted
no reviews
US United States
Medium
Featured
Only freelancers located in the U.S. may apply.
Project Overview:

We are looking for a skilled Data Engineer to develop the first phase (Segment 1) of a data pipeline. This involves extracting data from a third-party cloud application's v1 REST API (used in the healthcare industry) and loading it into Google BigQuery for future analytics and reporting.

Crucially, this project involves handling sensitive Protected Health Information (PHI). Adherence to strict security protocols is paramount, and signing a HIPAA Business Associate Agreement (BAA) is a non-negotiable requirement before project commencement.

We will provide detailed API documentation (OpenAPI YAML spec for ~230+ endpoints) and access to a sandbox environment for development. This contract is specifically for Segment 1. Successful completion may lead to engagement for Segment 2 (more advanced data work) under a separate agreement.

Responsibilities (Segment 1):

-API Integration & Authentication:
--Develop secure Python code for OAuth2 Client Credentials authentication (including token refresh).
--Extract data from all necessary v1 API endpoints as defined in the documentation.
--Implement robust handling for API parameters (filter, responseFields) and pagination (lastId mechanism) to ensure complete data retrieval.
--Manage API technical rate limits gracefully (delays, backoff); be mindful of contractual volume limits (Client accepts potential overage fees).
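A minimal sketch of the lastId pagination loop with exponential backoff, assuming a hypothetical `fetch_page(last_id, limit)` wrapper around the authenticated API call:

```python
import time

def fetch_all(fetch_page, page_size=100, max_retries=5):
    """Drain a lastId-style paginated endpoint with exponential backoff.

    fetch_page(last_id, limit) is a hypothetical stand-in for the real
    API call: it returns a list of records (each a dict with an "id"
    key) and raises RuntimeError when rate-limited.
    """
    records, last_id = [], None
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch_page(last_id, page_size)
                break
            except RuntimeError:
                time.sleep(min(2 ** attempt, 30))  # back off, then retry
        else:
            raise RuntimeError("rate limit retries exhausted")
        if not page:
            return records
        records.extend(page)
        last_id = page[-1]["id"]  # resume after the last record seen
```

The same loop body could run inside a Cloud Function, with the nightly Cloud Scheduler trigger supplying the per-endpoint configuration.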

-Sandbox & Live Access:
--Conduct all initial development and testing in the sandbox.
--Support the process of gaining vendor approval for live API access based on successful sandbox work.

-BigQuery Loading & Data Segregation:
--Design appropriate BigQuery table schemas for the extracted API data.
--Output 1 (Primary Load): Set up a primary BigQuery dataset and load the extracted data into corresponding tables.
--Output 2 (Analytics Subset): Create a second, separate BigQuery dataset containing read-only views based on a subset of tables from the primary dataset (specific tables TBD by Client).
--Output 3 (Anonymized Subset): Create a third, separate BigQuery dataset containing read-only views based on the analytics subset views. These views must be anonymized by removing specific PHI fields (e.g., names, DoB, contact info, addresses) while retaining necessary identifiers (e.g., patient ID, chart number) for analysis.

-Automation:
--Automate the extraction and primary BigQuery loading process to run reliably nightly using GCP tools (e.g., Cloud Functions, Cloud Scheduler).

-Access Control Design:
--Design and document a GCP IAM strategy ensuring read-only access can be granted exclusively to the anonymized dataset (Output 3), preventing access to the datasets containing raw PHI.

-Documentation & Code Quality:
--Deliver clean, well-commented, maintainable Python code.
--Provide clear documentation (setup, configuration, schemas, IAM design).



Required Skills & Experience:
-Proven experience integrating with complex REST APIs (OAuth2, pagination, rate limits).
-Strong Python skills for data extraction/processing.
-Solid experience with Google Cloud Platform (GCP):
--BigQuery: Schema design, SQL (views), data loading.
--Cloud Functions & Cloud Scheduler (or similar GCP automation tools).
--IAM: Understanding roles/permissions for data security.
-Experience building ETL/ELT pipelines.
-Data warehousing and modeling concepts.
-Excellent communication and ability to work independently.

Essential: Experience handling sensitive data (e.g., PHI) and understanding data privacy/security best practices.

Important Notes:
HIPAA BAA Required: You must sign a HIPAA Business Associate Agreement. Please confirm your understanding and acceptance in your proposal.

Phased Project: This posting is for Segment 1 only.

To Apply:
Please submit your proposal detailing:
-Your relevant experience (API integration, Python, GCP, BigQuery, automation, sensitive data).
-Confirmation you understand and agree to sign a HIPAA BAA.
-Your proposed approach for Segment 1.
-Your estimated timeline for Segment 1.
-Your rate or fixed price bid for Segment 1.

We look forward to your application!
Skills: BigQuery, Data Analysis, Google Sheets, Looker Studio, SQL, REST API, RESTful API, ETL Pipeline, Data Warehousing & ETL Software, Python, Google Sheets Automation, Data Modeling, Automation
Budget: not specified
4 hours ago
  • Data Science & Analytics, Data Analysis & Testing
Email Scraping Expert Needed for Data Collection
100 USD 2 hours ago
Client Rank - Risky

Payment method verified
$898 total spent
5 hires
14 jobs posted
36% hire rate,
3.08 of 3 reviews
MX Mexico
Risky
We are seeking a skilled email scraping expert to help us gather targeted email lists for our marketing campaigns. The ideal candidate should have expertise with various scraping tools and techniques to extract data from websites effectively. You will be responsible for identifying relevant sources, ensuring data accuracy, and delivering results within deadlines. If you are detail-oriented, have a strong background in data collection, and also have expertise in email campaigns, we would love to hear from you! Experience with SEO and WordPress is a huge bonus.
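A minimal Python sketch of the extraction step, assuming the page text has already been fetched; real-world use also needs obfuscation handling ("name [at] domain") and list-hygiene/compliance checks:

```python
import re

# Pull email addresses out of page text and de-duplicate them
# case-insensitively, preserving first-seen order.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str) -> list:
    seen, out = set(), []
    for email in EMAIL_RE.findall(text):
        key = email.lower()
        if key not in seen:
            seen.add(key)
            out.append(email)
    return out
```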
Skills: Data Scraping, Data Mining, Data Entry, Microsoft Excel, Lead Generation, Data Extraction
Fixed budget: 100 USD
2 hours ago
  • Data Science & Analytics, Data Extraction/ETL
Web Scraper Needed for Hotel Listings in Singapore
15 USD 56 minutes ago
Client Rank - Excellent

Payment method verified
$1 793 total spent
36 hires
70 jobs posted
51% hire rate,
4.72 of 13 reviews
AU Australia
Excellent
We are seeking a skilled web scraper to compile a comprehensive list of all hotels in Singapore. The successful candidate will have experience in data extraction and will utilize efficient scraping techniques to ensure accurate and timely results. This initial project will set the stage for future scraping tasks in other cities. If you have a strong background in web scraping and data management, we would love to hear from you!
Skills: Data Scraping, Data Entry, Data Mining, Microsoft Excel
Fixed budget: 15 USD
56 minutes ago
  • Admin Support, Data Entry & Transcription Services
Web Researcher & Data
30 USD 28 minutes ago
Client Rank - Medium

Payment method verified
2 jobs posted
no reviews
US United States
Medium
I am seeking a professional researcher to identify and collect email contacts for corporate representatives who may be interested in balloon decoration services for their special events. The focus will be on companies located in the United States, with an emphasis on those based in California. This task will primarily involve using LinkedIn and other relevant platforms to source accurate contact information.

If you are available and qualified for this project, please submit a proposal for consideration. I look forward to reviewing your application.
Skills: Data Entry, PDF Conversion, Company Research, Virtual Assistance, Lead Generation, B2B Lead Generation, Sales Lead Lists, Data Extraction, Transaction Data Entry, Prospect List, Data Mining, LinkedIn Sales Navigator, CRM Software, Salesforce, LinkedIn
Fixed budget: 30 USD
28 minutes ago
  • Data Science & Analytics, Data Mining & Management
Reddit data scraper for story app
3 - 10 USD / hr
22 minutes ago
Client Rank - Good

Payment method verified
$1 988 total spent
7 hires
22 jobs posted
32% hire rate,
5.00 of 4 reviews
US United States
Good
We are a consumer app that wants to understand the top posts on Reddit.

We want to get the top posts on Reddit for a list of subreddits, and extract the titles and post bodies. We want to use this information to summarize the top posts and extract patterns from them.

You should be able to quickly extract information from subreddits that we specify, and organize them into a spreadsheet.
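A sketch of the extract-and-organize step, assuming Reddit's public JSON listing shape (`data.children[].data`); fetching, authentication/rate limits, and the summarization step are out of scope here:

```python
import csv

def posts_to_rows(listing: dict) -> list:
    """Flatten a Reddit-style listing JSON into subreddit/title/body rows."""
    rows = []
    for child in listing["data"]["children"]:
        post = child["data"]
        rows.append({
            "subreddit": post.get("subreddit", ""),
            "title": post.get("title", ""),
            "body": post.get("selftext", ""),   # empty for link posts
        })
    return rows

def write_sheet(rows: list, path: str) -> None:
    """Write the flattened rows to a CSV ready for spreadsheet import."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["subreddit", "title", "body"])
        writer.writeheader()
        writer.writerows(rows)
```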
Skills: Data Scraping, Data Extraction, Data Mining, Scripting, Scrapy, Growth Hacking
Hourly rate: 3 - 10 USD
22 minutes ago
  • Data Science & Analytics, Data Extraction/ETL
Build Google Sheets CRM for Land Wholesaling (Comps, Offers, Scoring)
not specified 6 minutes ago
Client Rank - Risky

Payment method not verified
no reviews
US United States
Risky
I’m looking for a Google Sheets (or Excel) expert to help build a custom CRM system for my land wholesaling business.

This will be a modular spreadsheet system to evaluate vacant land deals, track leads, and generate smart offers based on comps pulled from Zillow and Redfin.

I’ve already mapped out the full system and scoring logic — I just need a talented builder to put it together in a clean, fast, user-friendly format.
Skills: Data Entry, Microsoft Excel, Computer Skills, Copy & Paste, Underwriting, Real Estate Appraisal, Data Extraction, Real Estate, Data Scraping, Broker's Price Opinion, Real Estate Acquisition, Real Estate Financial Model, Real Estate Investment Assistance, Real Estate Virtual Assistance, Real Estate Lead Generation
Budget: not specified
6 minutes ago
  • Admin Support, Virtual Assistance
Call to action
Freelancing is a business
Make it more profitable with Vollna

Streamline your Upwork workflow and boost your earnings with our smart job search and filtering tools. Find better clients and land more contracts.