博客
关于我
MIT6.824 lab1 提示思路
阅读量:242 次
发布时间:2019-03-01

本文共 3159 字,大约阅读时间需要 10 分钟。

  • The application Map and Reduce functions are loaded at run-time using the Go plugin package, from files whose names end in .so.
  • If you change anything in the mr/ directory, you will probably have to re-build any MapReduce plugins you use, with something like go build -buildmode=plugin ../mrapps/wc.go

 

这两条是对Map和Reduce函数的解释,通过.so在运行时加载。

 

  • This lab relies on the workers sharing a file system. That's straightforward when all workers run on the same machine, but would require a global filesystem like GFS if the workers run on different machines.

这些worker共享同一个文件系统,如果当worker跑在不同的机器上时,需要全局的文件系统。

 

  • A reasonable naming convention for intermediate files is mr-X-Y, where X is the Map task number, and Y is the reduce task number.

建议将中间文件名命名为mr-X-Y,X是Map任务数量,Y是reduce任务数

 

The worker's map task code will need a way to store intermediate key/value pairs in files in a way that can be correctly read back during reduce tasks. One possibility is to use Go's encoding/json package. To write key/value pairs to a JSON file:

 

worker的map任务需要存储时可以使用json

 

  • The map part of your worker can use the ihash(key) function (in worker.go) to pick the reduce task for a given key.

 

  • You can steal some code from mrsequential.go for reading Map input files, for sorting intermedate key/value pairs between the Map and Reduce, and for storing Reduce output in files.

可以从串行化的程序中参考到一些思路。

 

  • The master, as an RPC server, will be concurrent; don't forget to lock shared data.

对于RPC服务,可能会有并发问题。注意加锁。

 

 

  • Use Go's race detector, with go build -race and go run -race. test-mr.sh has a comment that shows you how to enable the race detector for the tests.

使用Go并发检测器

 

  • Workers will sometimes need to wait, e.g. reduces can't start until the last map has finished. One possibility is for workers to periodically ask the master for work, sleeping with time.Sleep() between each request. Another possibility is for the relevant RPC handler in the master to have a loop that waits, either with time.Sleep() or sync.Cond. Go runs the handler for each RPC in its own thread, so the fact that one handler is waiting won't prevent the master from processing other RPCs

 

  • The master can't reliably distinguish between crashed workers, workers that are alive but have stalled for some reason, and workers that are executing but too slowly to be useful. The best you can do is have the master wait for some amount of time, and then give up and re-issue the task to a different worker. For this lab, have the master wait for ten seconds; after that the master should assume the worker has died (of course, it might not have).

注意对worker就行心跳检测

 

  • To test crash recovery, you can use the mrapps/crash.go application plugin. It randomly exits in the Map and Reduce functions.

 

To ensure that nobody observes partially written files in the presence of crashes, the MapReduce paper mentions the trick of using a temporary file and atomically renaming it once it is completely written. You can use ioutil.TempFile to create a temporary file and os.Rename to atomically rename it.

 

  • test-mr.sh runs all the processes in the sub-directory mr-tmp, so if something goes wrong and you want to look at intermediate or output files, look there.

 

转载地址:http://xtqv.baihongyu.com/

你可能感兴趣的文章
Nginx配置好ssl,但$_SERVER[‘HTTPS‘]取不到值
查看>>
Nginx配置如何一键生成
查看>>
Nginx配置实例-负载均衡实例:平均访问多台服务器
查看>>
Nginx配置文件nginx.conf中文详解(总结)
查看>>
Nginx配置负载均衡到后台网关集群
查看>>
ngrok | 内网穿透,支持 HTTPS、国内访问、静态域名
查看>>
NHibernate学习[1]
查看>>
NHibernate异常:No persister for的解决办法
查看>>
NIFI1.21.0_Mysql到Mysql增量CDC同步中_日期类型_以及null数据同步处理补充---大数据之Nifi工作笔记0057
查看>>
NIFI1.21.0_NIFI和hadoop蹦了_200G集群磁盘又满了_Jps看不到进程了_Unable to write in /tmp. Aborting----大数据之Nifi工作笔记0052
查看>>
NIFI1.21.0通过Postgresql11的CDC逻辑复制槽实现_指定表多表增量同步_增删改数据分发及删除数据实时同步_通过分页解决变更记录过大问题_02----大数据之Nifi工作笔记0054
查看>>
NIFI1.23.2_最新版_性能优化通用_技巧积累_使用NIFI表达式过滤表_随时更新---大数据之Nifi工作笔记0063
查看>>
NIFI从MySql中增量同步数据_通过Mysql的binlog功能_实时同步mysql数据_根据binlog实现数据实时delete同步_实际操作04---大数据之Nifi工作笔记0043
查看>>
NIFI从MySql中增量同步数据_通过Mysql的binlog功能_实时同步mysql数据_配置binlog_使用处理器抓取binlog数据_实际操作01---大数据之Nifi工作笔记0040
查看>>
NIFI从MySql中增量同步数据_通过Mysql的binlog功能_实时同步mysql数据_配置数据路由_实现数据插入数据到目标数据库_实际操作03---大数据之Nifi工作笔记0042
查看>>
NIFI从MySql中离线读取数据再导入到MySql中_03_来吧用NIFI实现_数据分页获取功能---大数据之Nifi工作笔记0038
查看>>
NIFI从MySql中离线读取数据再导入到MySql中_无分页功能_02_转换数据_分割数据_提取JSON数据_替换拼接SQL_添加分页---大数据之Nifi工作笔记0037
查看>>
NIFI从PostGresql中离线读取数据再导入到MySql中_带有数据分页获取功能_不带分页不能用_NIFI资料太少了---大数据之Nifi工作笔记0039
查看>>
nifi使用过程-常见问题-以及入门总结---大数据之Nifi工作笔记0012
查看>>
NIFI分页获取Mysql数据_导入到Hbase中_并可通过phoenix客户端查询_含金量很高的一篇_搞了好久_实际操作05---大数据之Nifi工作笔记0045
查看>>