/*
Given a list of URLs, find the top 10 most visited URLs.
Some points this problem might cover:
*/
/*
1. MapReduce-related problem
http://stackoverflow.com/questions/17928158/find-top-10-most-frequent-visited-url-data-is-stored-accross-network#
The idea is to mimic the MapReduce process by hand.
Top-voted answer:
The question says you can't use MapReduce directly, which hints that the asker wants you to think through how MapReduce works, so we simply mimic its steps:
Step 1 (pre-processing): let R be the number of servers in the cluster, and give each server a unique id from 0, 1, 2, ..., R-1.
Step 2 (map): for each (string, id), send the tuple to the server whose id is hash(string) % R.
Step 3 (reduce): once step 2 is done (confirmed by simple control communication), each server produces the (string, count) pairs of its top 10 strings. Note that the tuples counted here are exactly the ones sent to this particular server in step 2.
Step 4 (map): each server sends its top 10 to a single server (say, server 0). This is cheap: there are only 10*R such records.
Step 5 (reduce): server 0 yields the top 10 across the network.
Why this is correct: hash partitioning sends every occurrence of a string to the same server, so each per-server count is already the exact global count. Any string in the global top 10 is therefore also in its own server's top 10 (given a consistent tie-break), so server 0 never misses it.
Notes:
Like most big-data algorithms that don't use a framework, this one struggles with failing servers; MapReduce takes care of that for you.
The algorithm above translates straightforwardly into a two-phase MapReduce job.
*/
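/*
A minimal Java sketch of the steps above, simulated in a single process rather than on a real cluster. Assumptions (not from the answer): each "server" is an in-memory list of URLs, String.hashCode stands in for hash(string), and ties in count are broken by URL so results are deterministic. The class and method names (TopUrlsMapReduce, topK, top10) are hypothetical.
*/

```java
import java.util.*;

// Hypothetical single-process simulation of the two-phase algorithm:
// each inner list stands in for the URLs stored on one server, and R = servers.size().
class TopUrlsMapReduce {

    // Ranking order: higher count first, then URL ascending so ties are deterministic.
    static final Comparator<Map.Entry<String, Long>> RANK =
            Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey());

    // Returns the k best entries under RANK, best first, using a size-k min-heap.
    static List<Map.Entry<String, Long>> topK(Map<String, Long> counts, int k) {
        // The reversed ranking puts the weakest candidate at the head of the heap.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(RANK.reversed());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // evict the weakest candidate
        }
        List<Map.Entry<String, Long>> best = new ArrayList<>(heap);
        best.sort(RANK);
        return best;
    }

    static List<String> top10(List<List<String>> servers) {
        int r = servers.size();

        // Step 2 (map): route each URL to partition hash(url) % R, so every
        // occurrence of a URL is counted on the same reducer.
        List<Map<String, Long>> partitions = new ArrayList<>();
        for (int i = 0; i < r; i++) partitions.add(new HashMap<>());
        for (List<String> server : servers) {
            for (String url : server) {
                int target = Math.floorMod(url.hashCode(), r); // hashCode may be negative
                partitions.get(target).merge(url, 1L, Long::sum);
            }
        }

        // Steps 3-4: each partition reduces to its local top 10 and ships it to
        // "server 0", which collects at most 10 * R candidate records.
        Map<String, Long> candidates = new HashMap<>();
        for (Map<String, Long> partition : partitions) {
            for (Map.Entry<String, Long> e : topK(partition, 10)) {
                candidates.put(e.getKey(), e.getValue());
            }
        }

        // Step 5 (reduce): server 0 picks the global top 10 from the candidates.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : topK(candidates, 10)) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> servers = List.of(
                List.of("a.com", "b.com", "a.com"),
                List.of("c.com", "a.com", "b.com"),
                List.of("d.com"));
        System.out.println(top10(servers)); // prints [a.com, b.com, c.com, d.com]
    }
}
```

/*
Keeping only a size-10 heap per partition is what bounds the traffic in step 4: server 0 receives at most 10*R records no matter how many distinct URLs exist.
*/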
/*
2. Top K words in a document:
This is an actual problem on LintCode.
http://www.zrzahid.com/top-k-or-k-most-frequent-words-in-a-document/
http://www.geeksforgeeks.org/find-the-k-most-frequent-words-from-a-file/
*/
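/*
A minimal Java sketch of the usual approach: count each word with a HashMap, then keep the k best words in a size-k min-heap, so the heap pass over the m distinct words costs O(m log k). Assumptions: the document is already split into a String[] of words, and ties in frequency are broken alphabetically; check the exact ordering the LintCode problem expects. The class and method names are hypothetical.
*/

```java
import java.util.*;

// Hypothetical helper: counts words with a HashMap, then keeps the k best in a size-k min-heap.
class TopKFrequentWords {

    // Returns the k most frequent words, most frequent first.
    // Assumption: ties are broken alphabetically.
    static List<String> topKFrequent(String[] words, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Ranking: higher count first, then alphabetical order.
        Comparator<String> rank = (a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        };

        // The reversed ranking keeps the weakest candidate at the head,
        // so each offer/poll costs O(log k).
        PriorityQueue<String> heap = new PriorityQueue<>(rank.reversed());
        for (String w : counts.keySet()) {
            heap.offer(w);
            if (heap.size() > k) heap.poll(); // drop the weakest candidate
        }

        List<String> result = new ArrayList<>(heap);
        result.sort(rank);
        return result;
    }

    public static void main(String[] args) {
        String[] words = "the cat saw the dog and the cat ran".split(" ");
        System.out.println(topKFrequent(words, 3)); // prints [the, cat, and]
    }
}
```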