CS 7280 Special Topics in Database Management Spring 2024
Project 3: Big Data Analytics
1. Understand the Hadoop ecosystem and data analytics
2. Become familiar with MapReduce programming and Spark
3. Gain experience with research on big data and data analytics
This is a semester-long group project for teams of 2 students. Its main purpose is to
become familiar with Big Data platforms, including the Hadoop system,
MapReduce programming, and cloud-based big data solutions (e.g., Google BigQuery).
Follow the instructions below to conduct the project.
Phase 1 (15%): Selecting Data Set - Due: March 27, 2024 (Wed)
• Each student researches any data set of interest and collects
information about it.
• Identify the characteristics of the data you select, and describe why you are
interested in it
• If possible, prepare 3~4 data samples, which can be either real or manipulated data
• Prepare a 2~3 page PowerPoint file as a report
• Submit the PPT file to Canvas
o PPT, PPTX or PDF file format ONLY
Phase 2 (15%): Defining Problems – Due: April 3, 2024 (Wed)
• In this 2nd phase, you will research the following topics based on the
data you selected in Phase 1:
- What you can analyze with the selected data using Hadoop HDFS with
Spark, and using Google BigQuery on GCP.
o 1 analysis using Spark
o 1 analysis using Google BigQuery on GCP
- How you can collect at least 1GB of data. That means your data MUST be
uploaded to HDFS using the VM in Phase 4-5.
• Prepare a 2~3 page PowerPoint file as a report
• Submit the PPT file to Canvas
o PPT, PPTX or PDF file format ONLY
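The 1GB minimum from Phase 2 can be checked locally before anything is uploaded to HDFS. The following is only a minimal sketch, not part of the assignment: the function names are hypothetical, and it assumes "1GB" means 2**30 bytes (adjust the constant if the instructor intends 10**9 bytes).

```python
import os

# Assumption: "1GB" is read as 2**30 bytes; the handout does not say which unit is meant.
ONE_GB = 1024 ** 3


def total_size_bytes(path):
    """Return the size of a file, or the summed size of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def meets_minimum_size(path, minimum=ONE_GB):
    """True if the data at `path` is at least `minimum` bytes (hypothetical helper)."""
    return total_size_bytes(path) >= minimum
```

For example, `meets_minimum_size("data/")` would report whether a local `data/` directory (a placeholder path) already satisfies the size requirement.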
Phase 3 (20%): Preparing Proposal – Due: April 3, 2024 (Wed)
• Prepare a proposal using the MS Word template, which can be found
on Canvas
o DOC, DOCX or PDF file format ONLY
• Prepare and submit a 5~10 page PowerPoint file for the presentation
o PPT, PPTX or PDF file format ONLY
• Then, submit a 10-minute presentation video to Canvas
o Submit a link (e.g., YouTube), or record your presentation using Canvas
• In your proposal, you need to plan how to prepare the following final
deliverables:
1. Write-up
2. Source code
3. Data set
4. Poster
** Note that this is a plan for preparing items 1 ~ 4 above, NOT the implementation itself.
• Then, submit your proposal to Canvas
• Prepare a 5-minute presentation of your proposal (also submit the PPT file)
Phase 4 (25%): Implementation – Due: April 10, 2024 (Wed)
1. Prepare your data and upload it to HDFS. You can prepare your
data set in a variety of ways, including:
- Use an API provided by a website, such as the Facebook API, Twitter API, and
Flickr API
- Use benchmarking data sets, such as
o UCI data set: http://archive.ics.uci.edu/ml/datasets.html
o Wikipedia database: https://en.wikipedia.org/wiki/Database_testing
- Government database
o US Census data:
o NOAA weather data: https://www.ncdc.noaa.gov/cdo-web/
- Implement a data collection program using web queries
- Use a synthesized data set
- Search the web (e.g., with Google)
2. Your data set MUST have at least 100,000 instances (or rows)
3. Upload your data set into HDFS (VM)
4. Implement Spark or BigQuery
- You can use PySpark, or any Streaming with another programming language. Implement one of:
o 1 Spark, or
o 1 BigQuery
5. Submit your source code and a download link for your data set to Canvas
- All source files should be archived with TAR (e.g., tar cvf XXX.tar) on
the VM (JAR, TAR or ZIP file format ONLY)
- For the data set, you can upload it to Google Drive (or any online storage service) and then
send the link when you submit your source code
6. Then, submit a 10-minute demo video to Canvas
- Submit a link such as YouTube, or record your presentation using Canvas
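Two of the Phase 4 requirements, the 100,000-row minimum and the TAR archive of source files, can be checked and produced with a short script. The sketch below is illustrative only and uses the Python standard library instead of Spark. The function names are hypothetical. It assumes a line-oriented data file (for example CSV with one instance per line and no line breaks inside quoted fields).

```python
import tarfile
from pathlib import Path

# Minimum number of instances stated in the Phase 4 requirements.
MIN_ROWS = 100_000


def count_rows(data_path, has_header=True):
    """Count non-empty lines in a text file, optionally excluding one header line.

    Assumption: one instance per line (no embedded newlines inside fields).
    """
    with open(data_path, "rb") as f:
        rows = sum(1 for line in f if line.strip())
    if has_header and rows > 0:
        rows -= 1
    return rows


def pack_sources(source_dir, archive_path):
    """Bundle source_dir into an uncompressed TAR archive, similar to `tar cvf`.

    Keep archive_path outside source_dir so the archive does not include itself.
    """
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return archive_path
```

A typical use would be to confirm `count_rows("dataset.csv") >= MIN_ROWS` and then call `pack_sources("src", "project.tar")`; both file names here are placeholders.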
Phase 5 (25%): Presentation of Project – Due: April 17, 2024 (Wed) before class.
1. Write-up (at least 4 pages in IEEE format). You must use IEEE format.
o DOC, DOCX or PDF file format ONLY
2. Poster (36 x 24 inch PowerPoint file). You can use one of the templates provided
on Canvas.
o PPT, PPTX or PDF file format ONLY
3. Submit your paper and poster to Canvas
4. Prepare an 8 ~ 10 page PowerPoint file and submit it to Canvas
o PPT, PPTX or PDF file format ONLY
5. Then, prepare an 8-minute final presentation on April 27, 2022 (Wednesday)
You will submit your program using Canvas. If you have any trouble using Canvas,
you can contact the TA or instructor.
Grading:
Phase 1: 15
Phase 2: 15
Phase 3: 20
Phase 4: 25
Phase 5: 25
Bonus: +20 for a high-quality write-up that could be submitted as a conference
or journal paper.