Realizing Dream Jobs: A Search Engine For Discovering The Best Skills

Employment Innovation Fellowships sponsored by the Ford Foundation
Applicants
Lance Legel
Alumnus (Spring 2014)
lancelegel@gmail.com
954-740-0845
Megan Majd
Resident
meganmajd@gmail.com
310-706-5221
Subject
We propose to build a web platform to connect people to the most useful skills. This platform will be
based on a search engine that enables discovery of which skills are most in demand across jobs. Our
minimum viable product could significantly help youth who lack intuition about what skills
they should learn for the jobs they may wish to pursue. This could be a first step toward an integrated
platform for discovering skills, mastering skills, and connecting skills to employers.
Angle
Globalization and information technology (e.g. the World Wide Web) have disrupted virtually all
professions, and will increasingly do so in coming decades. In a recent review of labor economics [1], a
report to the National Bureau of Economic Research [2] concludes that, with the rise of artificial
intelligence and robotics that can do many jobs of knowledge and physical workers, the imbalance between
supply and demand of skills is likely to increase, potentially to a breaking point for the whole economy.
Amid this disruption, young people especially often end up empty-handed at various stages of
the discovery-to-education-to-employment pipeline:
1. Discovering skills most useful
2. Mastering skills in theory and practice
3. Connecting skills to employers
Fortunately, new solutions and initiatives are closing leaks in the pipeline. For example, LinkedIn is
now seeking to offer a systemic and scalable solution across the second and third parts of the pipeline.
Historically focused on connecting skills to employers, it recently acquired Lynda.com for $1.5 billion,
to help its 350 million users master skills they need to get jobs they seek. Meanwhile, initiatives by
broad coalitions (e.g. RTI, World Bank) are essential to long-term equitability, and so we value these
efforts no less. We propose to integrate with all of this by starting with a focus on the first problem in
the pipeline: what to learn?
Professionals have some intuition about what skills are most useful and most demanded in their chosen
line of work. But youth often lack the exposure to the market in the first place to identify what's best to
learn. So young people may especially benefit from a platform that helps them quickly extract intuition
about what skills are most needed for jobs they want. Meanwhile, even experienced professionals,
perhaps in transition from one job to the next, or perhaps seeking to refine their craft, would be able to
use our proposed platform to acquire a statistical intelligence about what they may want to learn next.

[1] Computerization, atomization, crowdsourcing and the new economics of employment (Feb. 2015)
[2] Robots Are Us: Some Economics of Human Replacement (Feb. 2015)
Why do we not already have such a platform? Historically, it was surely impossible: the data was
scattered across silos such as newspaper classifieds, while statistical algorithms for effective
data science (e.g. natural language processing) were crude and inaccessible. Only recently has all of
this become readily and widely available across the World Wide Web. Open source libraries of the
appropriate algorithms, and a virtually endless supply of cheap cloud computing resources for
processing all of the data as needed, have mostly emerged only within the past five years.
Structure
We aim to develop an online web platform for discovering important skills. To do this we will scan
thousands of job postings on the World Wide Web, extract the skills that are identified in these
postings, organize this information in databases, and then make this easily accessible through a user
interface similar to Google Search.
The online graphical user interface for the minimum viable product that we propose could be very
simple. Imagine that upon visiting a webpage www.[x].com the user is shown a minimalist set of
options, with no distractions, advertisements, or anything like that. There is only a search bar that a
user can type text into and press enter, just like at Google.com. There may be a simple question to
guide the search, such as "What's your dream job?" or "What do you want to learn today?". Notice
that these are two different searches: one about jobs, another about skills. Just like Google's
AutoComplete that happens while you type, we will build the job skills search engine to immediately
and automatically match what is the most likely job or skill the user has searched for, by referencing
two indexed dictionaries of each, which we will have stored in databases. If we identify that the user
has searched for a job, we will return as a result a statistical graphical visualization (e.g. see Figure 1)
on the skills that are most in demand for the job. If we identify that the user has searched for a skill,
we will return in the same fashion a visualization of the professions in which that skill is most in
demand. Both visualizations will be similar and built on the same data; they simply view the data
from different vantage points.
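As a rough illustration of the matching step described above, the following minimal sketch looks up a partially typed query in two sorted dictionaries, one of jobs and one of skills. The sample terms and the names `JOBS`, `SKILLS`, `prefix_matches`, and `suggest` are hypothetical and chosen only for this example; a production system would build the dictionaries from scanned job postings and would likely need fuzzier matching than a plain prefix lookup.

```python
from bisect import bisect_left

# Hypothetical sample dictionaries; the real ones would be derived from job postings.
JOBS = sorted(["data scientist", "graphic designer", "nurse", "web developer"])
SKILLS = sorted(["photoshop", "python", "sql", "statistics"])


def prefix_matches(sorted_terms, prefix):
    """Return every term in a sorted list that starts with the given prefix."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matches.append(term)
    return matches


def suggest(text):
    """Return candidate jobs and skills for the text typed so far."""
    prefix = text.strip().lower()
    if not prefix:
        return {"jobs": [], "skills": []}
    return {
        "jobs": prefix_matches(JOBS, prefix),
        "skills": prefix_matches(SKILLS, prefix),
    }


if __name__ == "__main__":
    print(suggest("py"))
    print(suggest("Data"))
```

If only the job list has matches, the query can be treated as a job search, and likewise for skills. When both lists match, the interface could show both kinds of suggestion and let the user choose.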
If the minimum viable product succeeds, our next steps will be to help direct users to learning
resources (e.g. rated courses like CourseRank.com) and to the actual job listings that we've used
to derive our data (e.g. Indeed.com). We do not focus on those steps in this application,
despite their awesome potential, because we recognize how challenging and significant it would be to
execute just the first task. Prior to developing these components for helping to close the
discovery-to-education-to-employment gaps, and only if we meet our metrics for success of the minimum
viable product identified in the Takeaway section, we would actively solicit feedback on the total user
experience to pursue, through a formal user-centered design process.
There is open source algorithmic precedent for our prototype idea. Recently, a curious data scientist
published a tutorial with code for the following [3]:
1. Scanning websites that list jobs
2. Extracting from the website text the skills identified as needed for those jobs
3. Computing statistics across all jobs for each skill
See Figure 1 for a visualization of some of the results. Note that in Figure 1, data is derived across the
United States, while in that project the developer also collected and visualized data that was local to
cities, e.g. New York City. We would be able to do the same, especially if that proved to be easier and
smarter for the initial launch of the minimum viable product. Ultimately, we would aim to design our
data scanning and organization so that users can easily filter their job skill searches by
geography.
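The extraction and statistics steps above can be sketched in a few lines. This is a minimal illustration and not the cited tutorial's code: it assumes the posting text has already been fetched, and the skill list, the sample postings, and the function names `extract_skills`, `skill_demand`, and `jobs_for_skill` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical skill dictionary and already-downloaded postings (title, text).
SKILLS = ["python", "sql", "excel", "r"]
POSTINGS = [
    ("data scientist", "We need Python and SQL experience. Python is a must."),
    ("data scientist", "Strong R or Python skills; Excel a plus."),
    ("financial analyst", "Advanced Excel and some SQL required."),
]


def extract_skills(text, skills):
    """Return the set of dictionary skills that appear as whole words in the text."""
    found = set()
    for skill in skills:
        # Lookarounds keep "r" from matching inside words such as "required".
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(skill)
    return found


def skill_demand(postings, skills):
    """Map each job title to a Counter of how many postings mention each skill."""
    by_job = defaultdict(Counter)
    for title, text in postings:
        by_job[title].update(extract_skills(text, skills))
    return by_job


def jobs_for_skill(by_job, skill):
    """Invert the view: job titles ranked by how many postings mention the skill."""
    counts = Counter({t: c[skill] for t, c in by_job.items() if c[skill]})
    return counts.most_common()


if __name__ == "__main__":
    demand = skill_demand(POSTINGS, SKILLS)
    print(demand["data scientist"].most_common())
    print(jobs_for_skill(demand, "sql"))
```

Each skill is counted once per posting, so the numbers read as "how many postings ask for this skill" and not as raw word frequency. The same table serves both searches: read by job for skills in demand, or inverted by skill for the jobs that demand it.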
[3] Web Scraping Indeed.com for Key Data Science Job Skills (Mar. 2014)
Figure 1: Visualizing the current demand for skills across data science jobs in the United States
(Jesse Steinweg-Woods)
Timeline
Here are key milestones of the project, with loosely estimated target deadlines:
May 15: All stakeholders have met and agreed on a vision
May 30: Computer architecture designed for scanning World Wide Web
June 15: Sources of job listings have been mapped
June 30: Dictionary of skill names has been defined
July 15: Parsing of job listings with extraction of skill names: skills in jobs indexed
July 30: Prototypes of visual analytics on interesting subset of jobs
August 15: Refining, debugging, organizing databases for production-ready website service
August 30: Prototype of real-time data retrieval via text query in a search bar at www.[x].com
September 15: Technical design of user interface architecture
September 30: Prototype of partially functional user interface with real data
October 15: Finishing functionality to user interface
October 30: Fixing general problems and polishing overall quality of user experience
November 15 and beyond: Launch and marketing of minimum viable product
Takeaway
Our initial minimum viable product will help people understand the job market better, so that they
can better educate themselves for it. The core goal is to provide a free search engine through which
anyone can identify the skills most in demand for the jobs that interest them most.
If the primary goal is achieved, then further opportunities open up further down the
education-to-employment pipeline. For example, once skills are mapped to jobs being searched, we would be
capable of introducing a useful mapping of the best massive open online courses (MOOCs) to each
skill. The user can then, over long periods of time, enjoy a virtuous cycle of discovering skills,
learning skills, and searching for jobs. So beyond showing courses for learning newly discovered
important skills, we would hope eventually to provide cohesive support for actually getting
jobs in the professions that users search for, as Indeed.com does. This could possibly be integrated
with existing leaders, e.g. LinkedIn.
We define multiple key metrics of success, with target benchmarks:
Key Performance Indicator    Minimum Goal
Skills Mapped                10,000
Jobs Mapped                  100,000
Users by Summer 2016         10,000
Courses Mapped               1 / Skill
Ultimately, we hope our platform will lead as many people as possible to realize their dream jobs.
Budget
The funds will be primarily, if not completely, dedicated to cloud computing through Amazon Web
Services. Because L.L. and his friendly colleagues are expert-level programmers on the tasks at hand,
there will be zero cost in terms of actual coding. The costs boil down to scaling up the number
of computers needed to process terabytes of data across the World Wide Web: it's likely that we will
scan many thousands of web pages, save databases with indexed information on millions of words and
phrases, and train models to automatically learn useful representations of this information, all of which
will likely take hundreds of computer hours on dozens of high-performance computer architectures
working in parallel. L.L. has implemented similar cloud computing projects on about 1/10th of this
scale with a budget on the order of $500. The algorithms are mostly the same as what he has worked
with in the past, and the costs should scale almost linearly: he estimates being able to achieve the web
scanning and database organizing components of the project with under $5,000. Remaining funds
should be sufficient for maintaining web servers for at least half of a year, during which users around
the world could freely search and learn. We allocate up to $2,000 for marketing via targeted methods
online such as Google AdWords and Facebook Ads, which enable us to target users typing in things
like "find a job". We would only pay for those users who actually click on our advertisement (typically
on the order of $0.50 per new user). Ideally, our platform would be desirable to share organically via
social media, especially among leaders tackling this problem, and among those most in need of its
solution.
Item                                                             Estimate
Mapping and processing job skill data across World Wide Web      $5,000
Servicing our pre-processed data for our users at www.[x].com    $3,000
Marketing and advertising of www.[x].com online                  $2,000
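The arithmetic behind these estimates can be checked with a short sketch. It uses only figures stated in this section (a prior project at about 1/10th scale costing on the order of $500, roughly linear cost scaling, and about $0.50 per new user from advertising); the function and variable names are hypothetical, and the results are rough planning numbers, not commitments.

```python
# Figures taken from the Budget section; all are estimates.
PRIOR_PROJECT_COST = 500   # USD, earlier project at about 1/10th of this scale
SCALE_FACTOR = 10          # this project relative to the earlier one
COST_PER_NEW_USER = 0.50   # USD per ad click, "on the order of"
USER_GOAL = 10_000         # minimum goal for users by Summer 2016

BUDGET = {
    "mapping and processing": 5_000,
    "serving data to users": 3_000,
    "marketing and advertising": 2_000,
}


def scaled_cost(prior_cost, scale_factor):
    """Estimate cost under the assumption that it grows linearly with scale."""
    return prior_cost * scale_factor


def users_from_ads(ad_budget, cost_per_user):
    """Estimate how many new users a pay-per-click budget could bring in."""
    return int(ad_budget / cost_per_user)


if __name__ == "__main__":
    print("Processing estimate:", scaled_cost(PRIOR_PROJECT_COST, SCALE_FACTOR))
    paid_users = users_from_ads(BUDGET["marketing and advertising"], COST_PER_NEW_USER)
    print("Users reachable through ads:", paid_users)
    print("Users needed from organic sharing:", USER_GOAL - paid_users)
    print("Total budget:", sum(BUDGET.values()))
```

Under these assumptions, paid advertising alone would cover only part of the user goal, which is why organic sharing through social media matters to the plan.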
