1. Freebase
Freebase is an open platform for data sharing. It covers a wide range of topics, from fictional
characters to Modest Mouse. You can even curate your data with its data plotting feature, which
plots your datasets on a timeline or a map.
2. UN Data
This database contains large datasets covering virtually all the public data collected by the
United Nations. To access the API you have to sign up (it only takes a couple of minutes).
3. WorldBank
Where else to look for financial data of the world but the WorldBank? You can get virtually any
country's financial and economic standings here. Some other topics included are:
4. Data.gov
Data.gov is leading the way in democratizing public sector data and driving innovation. This
movement has spread throughout cities, states, and countries. Here are 5 of its 50+ categories:
Agriculture
Arts, Recreation, and Travel
Banking, Finance, and Insurance
Births, Deaths, Marriages, and Divorces
Business
5. Infochimps
Infochimps offers paid and free datasets on just about anything. What's cool about Infochimps is
that you can download datasets in CSV format. What's more, you can fiddle with the API
to extract the data specific to your needs. Try Twitter as your search metric and you will see
what I mean.
7. Google Scholar
Google Scholar is a free search engine that indexes all kinds of academic literature. Citing
journal publishers, university research papers, and other scholarly materials makes your
content look not just smarter but also more trustworthy.
8. Data Market
Data Market contains in-house and third-party datasets. It's a good place to explore data related
to economics, healthcare, food and agriculture, and the automotive industry.
And here's a random collection of datasets.
Torrent downloads and uploads on Pirate Bay
Social media & networks from Stanford Uni
Human Emotions by We Feel Fine: to allow other artists to more easily make pieces that
explore these human emotions
LittleSis: profiles of who's who in the biggest organisations in the world
NY Times bestsellers
Trending Topics: serves hot Wikipedia topics daily. It gets you the top hits
on Wikipedia by search query.
Google Flu Trends
NY Times People: User data for com, including the user profiles, activities, news feeds, and
networks.
CrunchBase: Plenty of information about startups and large tech companies
Google Analytics
Social networks: Facebook/ Twitter/ Pinterest/ LinkedIn
Project management tools: Basecamp
Sales management tools: Salesforce
Survey tools: SurveyMonkey
Photo sharing tools: Flickr
Email marketing: MailChimp
You can also get a crazy amount of datasets and related material from Datamob.
DataWrangling is a place with a large volume of datasets from a wide range of fields. To make it
easier for you, we have scraped the list for you below. However, do note that the list may not be
up to date, as it was last updated in 2009. Even so, it's still a good place to start digging for data.
Tips on using this list: Each link comes with tags. You can search by keyword to find the
appropriate database for your use.
Happy data digging, people!
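If you have copied the list into a text file, the keyword search described in the tips can be sketched in a few lines of Python. This is only a rough illustration, not something the original list provides: the names `parse_entry`, `search_by_tag`, and `SAMPLE` are made up for this example, and it assumes each entry has been joined onto a single line in the form "Title (tags: tag1, tag2, ...)".

```python
import re

# Assumed entry shape: a title followed by "(tags: tag1, tag2, ...)" on one line.
ENTRY_RE = re.compile(r"^(?P<title>.*?)\s*\(tags:\s*(?P<tags>[^)]*)\)\s*$")


def parse_entry(line):
    """Split one entry into (title, [tags]); return None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    tags = [t.strip().lower() for t in match.group("tags").split(",") if t.strip()]
    return match.group("title"), tags


def search_by_tag(lines, keyword):
    """Return the titles of all entries carrying the given tag (case-insensitive)."""
    keyword = keyword.lower()
    hits = []
    for line in lines:
        parsed = parse_entry(line)
        if parsed is not None and keyword in parsed[1]:
            hits.append(parsed[0])
    return hits


# A few entries copied from the list below, each joined onto one line.
SAMPLE = [
    "UN General Assembly Voting Data (tags: un, voting, statistics, government)",
    "The arXiv.org API (tags: arxiv, api, open, paper, academic)",
    "MNIST handwritten digit database (tags: mnist)",
    "Linked Movie Data Base (tags: rdf, movies, movie, api)",
]

if __name__ == "__main__":
    for title in search_by_tag(SAMPLE, "api"):
        print(title)
```

Matching on whole tags rather than substrings keeps a search for "api" from also returning entries that merely mention the word in their title. Entries that were cut off in the list (no closing parenthesis) are skipped rather than guessed at.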
Announcing the Article Search API Open Blog NYTimes.com (tags: article, api, nytimes,
text, corpus, newspaper)
Twitter API Wiki / REST API Documentation: Social Graph Methods (tags: graph, network,
api, social, twitter)
Information Extraction: The RISE Repository of Information Sources (tags: information,
textmining, extraction, reviews, jobs)
Using the Wikipedia link dataset Henry Haselgrove (tags: graph, network, link, wikipedia,
pagerank)
Visualizing the Growth of Target, 1962-2008 | FlowingData (tags: visualization, retail,
finance, gis, map, location, store, via:magnetbox, target)
The Economy According To Mint (tags: finance, commercial, consumer, mint, spending)
Repositories (tags: links, textmining, books, rdf, ocr, documents)
Subsidyscope.com (tags: government, banking, csv, tarp, bailout)
Best Buy Remix Welcome to the Best Buy Remix Developer Network (tags: retail, data, api,
product, bestbuy)
twibs : find the businesses on twitter (tags: directory, businesses, twitter, companies)
True Marble Imagery Free Download (tags: gis, geo, map, mapping, images, satellite)
Massive Scrape of Twitter's Friend Graph blog.infochimps.org Organizing Huge
Information Sources (tags: textmining, twitter, network, socialnetwork, pagerank, graph,
queryminer)
Twitter Scrape (rough draft) get.theinfo | Google Groups (tags: twitter, socialnetwork,
graph)
API Documentation BackType (tags: api, blog, comments, textmining, stream, trends,
backtype, queryminer)
filtering,
opendata)
UC Berkeley. Sheldon Margen Public Health Library. Statistical/Data Resources (tags: health,
links, resources, publichealth, berkeley)
ICWSM 2009 International AAAI Conference on Weblogs and Social Media (tags: blog,
crawl, corpus, network, web, link)
BART For Developers (tags: urban, transportation, feeds, public, sanfrancisco, bart, api)
Tim Davis: UF Sparse Matrix Collection : sparse matrices from a wide range of applications
(tags: spare, matrix)
Others Online Behavioral Targeting, Analytics and Advertising Service for Publishers, Ad
Networks, Widgets, WiFi Networks (tags: analytics, audience, segmentation, toolbar,
commercial, sem, search, advertising)
HumanScan : BioID : Downloads : BioID Face Database (tags: face, detection, image)
Face Detection (tags: facerecognition, opencv, face, links)
Building a (fast) Wikipedia offline reader (tags: django, wikipedia, compressed, textmining,
howto)
gov: The Obama-Biden Transition Team | Join the Discussion: Healthcare (tags: textmining,
opinion, comment, topic, government, queryminer)
UN General Assembly Voting Data (tags: un, voting, statistics, government)
NORB Object Recognition Dataset, Fu Jie Huang, Yann LeCun, New York University (tags:
image, 3d)
Reddit's Secret API (tags: reddit, api, json)
Amazon Web Services Public Datasets Data Wrangling Blog (tags: amazon, ebs, ec2, s3,
publicdata, hadoop)
Amazon Web Services (AWS) Hosted Public Datasets (tags: amazon, ebs, publicdata)
Executive PayWatch Database (tags: ceo, compensation, pay, economics, business, labor)
Research Datasets :: CID Data :: Center for International Development at Harvard University
(CID) (tags: economics, international, development)
NACDA: Search Holdings (tags: aging, statistics, studies)
LIFE photo archive hosted by Google (tags: images, photo, pictures, search)
Main Task QA Data (tags: question, answering, trec, nlp, machinelearning)
ADL Gazetteer Development (tags: named_entity, location, placenames, geo, nlp)
The New York Times Annotated Corpus YooName named entity recognition (tags:
named_entity, nytimes, corpus, people, organizations, locations)
downloading flossmole Google Code How to get FLOSSmole data for your own use
(tags: opensource, project, activity, mysql, dump)
Google Flu Trends | How does this work? (tags: google, health, trends, search, prediction,
epidemiology, biodefence, queries, queryminer)
Multi-Domain Sentiment Dataset (tags: sentiment, review, product, amazon)
Chris Pound's Name Generation Page (tags: bizzare, scifi, phrase, name, word, generators,
random, perl)
TradingSolutions Data Sources (tags: trading, finance, s, api, list)
Announcing the New York Times Campaign Finance API Open Code New York Times
Blog (tags: nyt, api, campaign, donations, fec)
Beautiful Data WikiContent (tags: book, data, wiki, via:jhammerb)
public domain sounds | free sound library (tags: sound, publicdomain, audio)
Netflix API Welcome to the Netflix Developer Network (tags: netflix, api, movie, mashup,
netflixprize, ratings)
Data Catalog (tags: dc, government, feeds, transparency, opendata)
Open beats Closed: Best Buy's new APIs O'Reilly Radar (tags: retail, bestbuy, api)
Voter registration data; or, HERE IS YOUR HOPE, YOU FOOLS! The Edge of the American
West (tags: voter, registration, politics, 2008)
Tickermine (tags: custom, research, retail, finance, market, service, analyst)
Linked Movie Data Base (tags: rdf, movies, movie, api)
Big Huge Thesaurus API: Access 145,000 Words and Phrases (tags: webservice, api,
thesaurus, textmining, nlp, rest)
import/parse/fec.py at master from aaronsw's watchdog GitHub (tags: fec, python, parser,
government, campaign)
The Watchdog Project: volunteer (tags: government, transparency, parsing, election, python)
Dataset of the day: Where are the Obamacans? | Off the Map Official Blog of FortiusOne
(tags: obama, goverment, mashup, gis, geo, map, campaign, donations)
Activity Recognition: Datasets, Bibliography and others (tags: activity, recognition, intent)
Normalized Campaign Contribution Data (tags: cmu, politics, campaign, donations, fec,
via:jhammerb, government)
YouTube Dataset (tags: youtube, research, crawl, socialnetwork, network, graph, web)
CRAWDAD (tags: wireless, RF, radio, signal, dartmouth, network)
API Documentation Twitter Development Talk | Google Groups (tags: twitter, text, api)
Web FAQ collection | ILPS (tags: faq, question_answering, questions, web, crawl, corpus, xml,
textmining)
Yahoo! Music API YDN (tags: api, yahoo, music, artists)
Search Query Performance report Google AdWords Help Center (tags: adwords, ppc,
search, metrics, webanalytics, sem, query, queryminer)
Wordze Keyword Research Tool (tags: queryminer, keyword, tool, research, commercial,
search, adwords)
Frontal Face Databases (tags: facerecognition, face, image, recognition)
Searchable Catalogs of Data (tags: links, catalogs, social)
Download Database baseball1.com (tags: baseball, database, publicdata, statistics, sports)
radiohead Google Code (tags: lidar, visualization, radiohead, google, video)
80 Million Tiny Images (tags: images, words, english, search, visualization, imagemap)
Time Series Center | Harvard University (tags: timeseries, anomaly, detection, astronomical,
physics)
OpenVisuals Open Source Visualization Framework (tags: visualization, community, design,
processing)
BGN: Domestic Names State and Topical Gazetteer Download Files (tags: gis, usgs)
NGA: Country Files (tags: country, cities, geo)
Datasets (tags: benchmark, clustering, regression, machinelearning, list, statistics,
mathematics)
Isomap Datasets (tags: nonlinear, dimensionality, reduction, faces, digits, images, manifold)
Yahoo! Search Blog: BOSS The Next Step in our Open Search Ecosystem (tags: api, open,
search, yahoo, BOSS, queryminer)
Download the Database IP Address Lookup Community Geotarget IP Project (tags:
geocoding, geoip, internet, ip, ipaddress, mysql)
Airline Data Project (tags: airline, statistics, finance, revenue, location, travel)
predictionmarket)
Reuters Spotlight Article and Media API (tags: news, text, articles, api, content, media, xml,
images, publicdata)
DataSets Scikits Trac (tags: scipy, python, machinelearning, statistics, resource)
[Wikitech-l] page counters (tags: wikipedia, pageviews, trends, textmining, seo, topic)
Wikipedia article traffic statistics (tags: via:chl, wikipedia, web, analytics, seo, topic,
textmining, traffic)
Yahoo! Internet Location Platform YDN (tags: yahoo, geo, geocoding, location, landmarks,
gis)
How to find images on the internet Random knowledge (tags: images, links, lists, archive)
Yahoo offers geographic data to Web sites | Tech news blog CNET News.com (tags: gis,
webservice, yahoo, api, location, landmark)
Instructions for Obtaining Search Engine Transaction Logs (tags: query, search, log, excite,
altavista, alltheweb, transaction)
TechTC Technion Repository of Text Categorization Datasets
(tags: datamining,
Juiced Google Analytics Python API: Juice Analytics (tags: search, statistics, keywords,
analytics, api, python, web, seo, google, google_analytics, juice)
Country Name and ISO 3166 Code MySQL Import File (tags: mysql, states, countries,
isocode)
Semantic Search the US Library of Congress (tags: via:inkdroid, libraries, mashup, rdf,
semantic, search, semanticweb, books, api, webservice)
geocoded Hotels GeoNames Blog (tags: hotels, geonames)
GeoNames webservice and data download (tags: locations, cities, countries, gis)
Index of /download/worldcities (tags: cities, gis)
ualberta dependency based thesaurus and word count data (tags: corpus, text, similarity,
terms)
CommonCrawl About (tags: web, crawler, bot)
Datasets and corpus / corpora for biological literature and text mining , information
extraction and information retrival and document classification (tags: bioinformatics, text,
corpora, domainspecific, genomics, corpus)
Office of Defects Investigation (ODI), Flat File Downloads (tags: defect, recall, automobile,
fightclub, nhtsa, saefty)
p2psim kingdata : DNS server latency network distance matrices (tags: distance, matrix,
network, p2p, dns, latency, nmf, queryminer)
Sep Kamvar / Personalization / (tags: pagerank, web, matrix, matlab)
opentick.com (tags: opentick, trading, beta, feeds, finance)
WikiXMLDB: Querying Wikipedia with XQuery (tags: wikipedia, xml, ec2)
kiwitobes.com Blog Archive Walmart Growth Video (tags: walmart, visualization, video,
freebase, store, retail, locations, opening)
Open Cell Id dataset phone geolocation from GSM cellids (tags: gis, mobile, geolocation)
The Cornell Web Lab The Cornell Web Lab (tags: cornell, web, archive, hadoop, crawl)
im2gps: estimating geographic information from a single image (tags: imagerecognition,
via:csantos, gis, cmu, gps, imageprocessing, paper, hack, freaking_awesome)
Datasets: MUSCLE WP2 Evaluation, Integration and Standards (tags: image, video, audio,
currency, sports, imagerecognition)
Open Economics Store Index (tags: economics, list)
welcome @ omdb (tags: free, movie, database, netflixprize)
Cogblog Blog Archive Cogmap APIs (tags: api, cogmap, person, name, organization,
record_linkage)
Wal-Mart : Freebase The World's Database (tags: retail, locations, stores)
Cogmap: The Org Chart Wiki (tags: record_linkage, identity, name, organization, orgchart,
marketing)
German English Parallel Corpus de-news, Daily News 1996-2000
(tags: german,
2007 IEEE AVSS Detection and Tracking Algorithm Datasets (tags: tracking, video, detection,
image, recognition, vehicle, pedestrian)
Eigenvector Research, Inc. : Datasets Available to Download (tags: NIR, spectra, chemistry,
semiconductor, pharmaceutical, matlab)
OTCBVS (tags: image, recognition, detection, pedestrian, thermal, tracking, facerecognition,
illumination)
99 Wikipedia Sources Aiding the Semantic Web AI3:::Adaptive Information (tags: links,
directory, record_linkage, extraction, wikipeida, named_entity, recognition, textmining,
semanticweb, paper)
UNdata (tags: UN, publicdata, government, statistics)
AudioScrobbler Data (tags: audioscrobbler, recommendation, collaborative, filtering, music)
The Linking Open Data dataset cloud (tags: directory, rdf, semantic, data, soup, graph)
Free Economic Data | Economic, Financial, and Demographic Data (tags: finance, economics,
portal, links)
::MLSP 2008::: MLSP competition (tags: machinelearning, trading, competition, backtest,
matlab, code, finance, via:DeliciousRob)
Computer Vision Test Images (tags: computer, vision, image, ray, trace, fingerprint, stereo,
detection, via:chl)
The Dataverse Network Project | The Dataverse Network Project (tags: statistics, repository,
harvard)
DVN Home (tags: harvard, repository, social, science, research, portal, links)
Ohio voter registration data (tags: voter, voting, politics, government, name, address,
registration)
Voter List Data Files Election Department, Clark County, Nevada (tags: voting, voter,
registration, name, address, data, election, politics, government, nevada)
Temperature data (HadCRUT3 and CRUTEM3) (tags: climate, temperature, netcdf)
MNIST handwritten digit database, Yann LeCun and Corinna Cortes (tags: handwriting,
mnist, image, recognition)
LFW : Labelled Faces in the Wild (tags: facerecognition, face, recognition, umass, image)
Making random contacts (37signals) (tags: generator, names)
Test (Sample) Data Generators (tags: generator, tools, list, via:jd)
Compete Compete Developer Resources (tags: compete, api, web, statistics, traffic,
analytics, mashup)
Machine Learning (Theory) The Peekaboom Dataset (tags: peekaboom, vision, image, large,
human, computation, machinelearning, recognition)
Ocean Processes and Modeling: Ocean Data (tags: links, oceanography, satellite)
BlogoCenter datasets (tags: blog, ucla)
Tagged datasets for named entity recognition tasks (tags: nlp, corpus, tagged, named_entity,
recognition, list)
icio.us stats deli.ckoma (tags: del.icio.us)
The Financial Data Finder A G (tags: finance, links)
Freebase Wikipedia Extraction (WEX) (tags: wikipedia, xml, structured, corpus)
The arXiv.org API (tags: arxiv, api, open, paper, academic)
England Football Results Betting Odds | Premiership Results & Betting Odds (tags: gambling,
soccer, football, excel, statistics)
HughesData Main Hughes Lab (tags: rna, bioinformatics, microarray, expression, gene,
machinelearning)
Stanford MicroArray Database (tags: bioinformatics, microarray, expression, gene,
machinelearning, stanford)
ArrayExpress Home (tags: bioinformatics, microarray, expression, gene, machinelearning)
Gene Expression Omnibus (GEO) Main page (tags: bioinformatics, microarray, expression,
gene, machinelearning)
Index of /courts.gov (tags: corpus, text, legal, law, court, ruling, opensource, publicdata)
Welcome to Openvest (tags: python,
via:jolby)
Statistical Science Web: Datasets (tags: links, statistics)
Data Mining: Text Mining, Visualization and Social Media: TailRank, Spinn3r, TechMeme and
TechCrunch: New Attention (tags: crawler, blog, corpus)
Aleix Face Database (tags: facerecognition, machinelearning, face, image)
Data Repository Evaluation (tags: umd, links, statistics, government, sports, via:rickladd)
PMC FTP Service (tags: biology, medicine, articles, text, journal, authors)
uspop2002 data set (tags: music, similarity, machinelearning)
Internet Archive: Details: Amazon ASIN listing and similarity graph (tags: ASIN, amazon,
recommendation, collaborative, filtering, via:keyvowel)
European Climate Assessment Daily Weather Data (tags: weather, europe, ascii, netcdf)
CSE 250B Project 4, Fall 2006 (tags: subset, netflixprize, dimensionality, reduction)
G3DATA (tags: extract, from, graphs, hack, google, trends)
cwm a general purpose data processor for the semantic web (tags: python, processor,
semantic, web, rdf)
WebBase Project (tags: link, analysis, sturcture, web, crawler, stanford)
sam roweis : data (tags: machine, learning, matlab, python, hackers, image)
Index of /data/sequence/mnist (tags: mnist, xml, format)
MNIST handwritten digit database (tags: mnist)
Book-Crossing Dataset (tags: data, set, collaborative, filtering, datamining, books, movie)
allmovie (tags: movie, netflixprize, source)
Submissions Guidelines for the Collectorz.com Online Movie Database (tags: movie, source)
Cinema.com (tags: plot, synopsis, movie, netflixprize, prize)
LUMIERE (tags: netflixprize, prize, european, movie, revenue)
Data dumps Meta (tags: mediawiki, wikipedia, import, mysql, sql)
phone *** address * e-mail intitle:curriculum vitae Google Search (tags: resume,
google)
Ai Ching
Ching is the Chief Email Officer and dedicates her time to finding growth hacking ninja ways. Formerly with
P&G and an experimental psychologist, Ching is addicted to supporting new projects on Kickstarter and travelling.