Assignment 6
5/1/2024
10 Points Possible
Unlimited Attempts Allowed
4/22/2024
For this assignment, you will submit a README.md with your answers to the questions below, along with the code you used to
produce your answers (including all boto3 scripts necessary to reproduce your cloud infrastructure, where relevant). You should
commit your Assignment 6 file(s) to your private “a6” GitHub repository (click here (https://classroom.github.com/a/jXPdPm3s) to
accept the GitHub Classroom invitation to access this repository) and submit a link to your repository here on Canvas (clicking
the “Submit Assignment” button to make your submission). You must work alone on this assignment. Before submitting your
assignment, please review the tips that a previous TA for the course (Jinfei Zhu) compiled for writing a grader-friendly
README file and organizing your assignment GitHub repository (https://github.com/lsc4ss-a21/assignment-submission?template), if you have not already done so.
1. (6 Points Total) This first prompt builds on the survey submission pipeline you have been working on in Assignments 4 and 5. As
a final step in your survey submission pipeline, you will write a Python function that can be invoked on a survey participant’s
mobile device when they complete a survey to send their survey submission into an SQS queue, which should then trigger the
Lambda function you wrote in Assignments 4 and 5.
Note that each survey submission is initially saved as a JSON file (on the mobile device) when a participant completes a survey
via the mobile app (see example files here). For the purposes of this prompt, you do
not need to worry about the implementation of the mobile app or the creation of these JSON files. Your job is to write a Python
function that will send a string representation of this JSON data (an individual survey) into an AWS SQS queue (your function
will then be incorporated into the mobile app by another researcher). The SQS queue should then trigger your AWS Lambda
function from Assignment 5, which will take this survey submission data and perform necessary processing and storage
operations in the cloud. You should accomplish all of these tasks programmatically (using boto3) to ensure reproducibility of
your architecture. Specifically, you should complete the following tasks:
a. (1 Point) Write a Python function send_survey (which you can assume will be installed with the mobile app and will
automatically be invoked after a survey is saved as a JSON file on the device) that has the following signature:
def send_survey(survey_path, sqs_url):
    '''
    Input: survey_path (str): path to JSON survey data
               (e.g. './survey.json')
           sqs_url (str): URL for SQS queue
    Output: StatusCode (int): indicating whether the survey
        was successfully sent into the SQS queue (200) or not (400)
    '''
In the function body, you should use boto3 to send the data from a survey (a JSON file on the mobile device, converted into
a string representation) into an AWS SQS queue.
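A minimal sketch of one way to fill in `send_survey`, assuming the device has boto3 installed and default AWS credentials and region configured. The optional `sqs_client` argument is a hypothetical addition (not part of the required signature) that lets the function be tested with a stub; when it is omitted, the function behaves as specified. `send_message(QueueUrl=..., MessageBody=...)` is the standard boto3 SQS client call, and the body is the survey's JSON re-serialized as a string.

```python
import json


def send_survey(survey_path, sqs_url, sqs_client=None):
    '''
    Input: survey_path (str): path to JSON survey data
               (e.g. './survey.json')
           sqs_url (str): URL for SQS queue
           sqs_client: optional SQS client (hypothetical extra argument so
               the function can be tested with a stub; defaults to
               boto3.client('sqs'))
    Output: StatusCode (int): indicating whether the survey
        was successfully sent into the SQS queue (200) or not (400)
    '''
    try:
        # json.load validates the file; json.dumps produces the string
        # representation that becomes the SQS message body.
        with open(survey_path, encoding='utf-8') as f:
            message_body = json.dumps(json.load(f))
    except (OSError, ValueError):
        # Missing/unreadable file or malformed JSON
        # (json.JSONDecodeError is a subclass of ValueError).
        return 400

    try:
        if sqs_client is None:
            # Imported lazily so the function can be exercised with a stub
            # client where boto3 or AWS credentials are unavailable.
            import boto3
            sqs_client = boto3.client('sqs')
        response = sqs_client.send_message(QueueUrl=sqs_url,
                                           MessageBody=message_body)
    except Exception:
        # boto3/botocore raise ClientError, NoRegionError, etc. on permission,
        # configuration, or network problems; report any of them as 400.
        return 400

    status = response.get('ResponseMetadata', {}).get('HTTPStatusCode')
    return 200 if status == 200 else 400
```

Re-serializing with `json.dumps` instead of forwarding the raw file contents means a malformed survey is rejected on the device with 400 rather than failing later inside Lambda.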
b. (2 Points) Create an SQS queue and configure it to act as a trigger for your Lambda function from Assignment 5 (which will
process your data and write it to storage).
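A sketch of the queue setup and trigger, assuming the Assignment 5 Lambda function already exists and its execution role allows `sqs:ReceiveMessage`, `sqs:DeleteMessage`, and `sqs:GetQueueAttributes` (the AWS-managed `AWSLambdaSQSQueueExecutionRole` policy grants these). The queue and function names in the usage comment are hypothetical. With an SQS trigger the handler receives a batch of messages, so each survey arrives as a string in `event['Records'][i]['body']` and has to be parsed with `json.loads` before the Assignment 5 processing runs.

```python
def create_survey_queue(queue_name, function_name, sqs_client, lambda_client,
                        visibility_timeout=60, batch_size=10):
    '''Create an SQS queue and subscribe an existing Lambda function to it.
    Returns (queue_url, event_source_mapping_uuid).'''
    # The queue's visibility timeout must be at least the function's timeout,
    # otherwise Lambda rejects the event source mapping.
    queue_url = sqs_client.create_queue(
        QueueName=queue_name,
        Attributes={'VisibilityTimeout': str(visibility_timeout)},
    )['QueueUrl']

    # Event source mappings reference the queue by ARN, not by URL.
    queue_arn = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=['QueueArn'],
    )['Attributes']['QueueArn']

    # Lambda polls the queue and invokes the function with up to
    # batch_size messages per invocation.
    mapping = lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=batch_size,
        Enabled=True,
    )
    return queue_url, mapping['UUID']


# Example (hypothetical names):
# import boto3
# queue_url, uuid = create_survey_queue('survey-queue', 'survey-processor',
#                                       boto3.client('sqs'),
#                                       boto3.client('lambda'))
```

AWS recommends a visibility timeout of about six times the function timeout, so the default of 60 seconds here assumes a function timeout of 10 seconds or less; adjust it to match your Lambda configuration.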
Note that if you test your full survey submission pipeline using the example JSON files provided above (in a loop,
using time.sleep(10) in between survey submissions, as in Assignment 5), you should see the following keys in your S3
Bucket:
['0001092821120000.json', '0001092921120000.json', '0001093021120300.json',
'0002092821120000.json', '0003092821120001.json', '0004092821120002.json',
'0005092821122000.json']
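The test described above can be scripted roughly as follows. This is a sketch assuming the example surveys are saved locally as `*.json` files; the glob pattern, queue URL, and bucket name in the usage comment are hypothetical. `list_objects_v2` returns at most 1,000 keys per call, so the listing follows the paginator rather than a single request. The `sleep` argument exists only so the loop can be tested without waiting.

```python
import glob
import json
import time


def submit_examples(pattern, sqs_url, sqs_client, delay=10, sleep=time.sleep):
    '''Send every example survey matching `pattern` into SQS, pausing
    `delay` seconds between consecutive submissions.'''
    sent = []
    for i, path in enumerate(sorted(glob.glob(pattern))):
        if i > 0:
            sleep(delay)  # pause between submissions, as in Assignment 5
        with open(path, encoding='utf-8') as f:
            body = json.dumps(json.load(f))
        sqs_client.send_message(QueueUrl=sqs_url, MessageBody=body)
        sent.append(path)
    return sent


def list_bucket_keys(s3_client, bucket):
    '''Return all object keys in `bucket`, following list_objects_v2 pagination.'''
    keys = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is missing from a page when there are no objects
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return sorted(keys)


# Example (hypothetical paths/names):
# import boto3
# submit_examples('./example_surveys/*.json', queue_url, boto3.client('sqs'))
# print(list_bucket_keys(boto3.client('s3'), 'my-survey-bucket'))
```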
You should also see the following records if you query your DynamoDB table:
{'q1': Decimal('1'), 'q2': Decimal('1'), 'user_id': '0001',
'q3': Decimal('2'), 'q4': Decimal('2'), 'q5': Decimal('2'),
'num_submission': Decimal('3'),
'freetext': "I lost my car keys this afternoon at lunch, so I'm more stressed than normal"}
{'q1': Decimal('4'), 'q2': Decimal('1'), 'user_id': '0002',
'q3': Decimal('1'), 'q4': Decimal('1'), 'q5': Decimal('3'),
'num_submission': Decimal('1'),
'freetext': "I'm having a great day!"}
{'q1': Decimal('1'), 'q2': Decimal('3'), 'user_id': '0003',
'q3': Decimal('3'), 'q4': Decimal('1'), 'q5': Decimal('4'),
'num_submission': Decimal('1'),
'freetext': 'It was a beautiful, sunny day today.'}
{'q1': Decimal('1'), 'q2': Decimal('1'), 'user_id': '0004',
'q3': Decimal('1'), 'q4': Decimal('1'), 'q5': Decimal('1'),
'num_submission': Decimal('1'),
'freetext': 'I had a very bad day today...'}
{'q1': Decimal('3'), 'q2': Decimal('3'), 'user_id': '0005',
'q3': Decimal('3'), 'q4': Decimal('3'), 'q5': Decimal('3'),
'num_submission': Decimal('1'),
'freetext': "I'm feeling okay, but not spectacular"}
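A sketch for checking the table contents, assuming each item carries a `user_id` attribute as in the records above (the table name in the usage comment is hypothetical). A single Scan returns at most 1 MB of data, so the loop follows `LastEvaluatedKey`. The boto3 resource API returns DynamoDB numbers as `decimal.Decimal`, which is why the expected records show `Decimal('1')` rather than `1`; `to_plain` converts them for printing.

```python
from decimal import Decimal


def scan_all(table):
    '''Return every item in a boto3 DynamoDB Table resource, following
    LastEvaluatedKey pagination, sorted by user_id.'''
    response = table.scan()
    items = list(response.get('Items', []))
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response.get('Items', []))
    return sorted(items, key=lambda item: item.get('user_id', ''))


def to_plain(item):
    '''Convert Decimal values to int (whole numbers) or float.'''
    plain = {}
    for key, value in item.items():
        if isinstance(value, Decimal):
            value = int(value) if value == value.to_integral_value() else float(value)
        plain[key] = value
    return plain


# Example (hypothetical table name):
# import boto3
# table = boto3.resource('dynamodb').Table('survey-table')
# for item in scan_all(table):
#     print(to_plain(item))
```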
c. (3 Points) Your PI, who is overseeing this project, is worried that if all of the participants in the study (potentially thousands)
submit surveys at the same time of day, this might cause the system to crash and your lab might lose data (this
happened to your PI when they ran a similar digital survey via on-premise servers in the early 2000s). How would you
reassure your PI that your architecture is scalable and will be able to handle such spikes in demand? Your response should
be at least 200 words and discuss the scalability of each of the cloud services you used in your pipeline in detail.
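The written answer is the deliverable here, but it can point to concrete settings. A sketch, assuming a second SQS queue already exists to act as a dead-letter queue, that the event source mapping UUID is known, and using illustrative values (`max_receive_count=5`, `max_concurrency=50`). Unprocessed messages simply wait in the queue (retained for 4 days by default), a redrive policy moves repeatedly failing messages aside instead of discarding them, `ScalingConfig.MaximumConcurrency` (2 to 1000) caps how many concurrent invocations the queue can drive, and on-demand billing lets DynamoDB adapt throughput to traffic without provisioned capacity settings.

```python
import json


def harden_pipeline(sqs_client, lambda_client, dynamodb_client, queue_url,
                    dlq_url, mapping_uuid, table_name,
                    max_receive_count=5, max_concurrency=50):
    '''Apply optional settings that help the pipeline absorb bursts.'''
    # 1. Dead-letter queue: a message that fails processing max_receive_count
    #    times is moved to the DLQ instead of being lost.
    dlq_arn = sqs_client.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=['QueueArn'])['Attributes']['QueueArn']
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dlq_arn,
            'maxReceiveCount': str(max_receive_count)})})

    # 2. Cap concurrent Lambda invocations driven by this queue; messages
    #    beyond the cap stay queued until a worker is free.
    lambda_client.update_event_source_mapping(
        UUID=mapping_uuid,
        ScalingConfig={'MaximumConcurrency': max_concurrency})

    # 3. Switch the table to on-demand capacity if it is not already.
    #    (A missing BillingModeSummary is treated as provisioned.)
    current = (dynamodb_client.describe_table(TableName=table_name)['Table']
               .get('BillingModeSummary', {})
               .get('BillingMode', 'PROVISIONED'))
    if current != 'PAY_PER_REQUEST':
        dynamodb_client.update_table(TableName=table_name,
                                     BillingMode='PAY_PER_REQUEST')
```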
2. (4 points) For this prompt, we ask you to declare whether you will complete a Final Project or a Final Exam as your capstone
assignment for the course. You are welcome to meet with course staff and discuss your options and ideas with us before
making your election and submitting your answer to this prompt.
If you wish to complete a Final Project, you should additionally write a ~250-word proposal in your README for this
assignment, detailing your plan for the project (see expectations and sample projects on the Final Exam/Final Project
Assignment page). You should explain why your project
idea helps to solve a social science research problem using large-scale computing methods and outline a schedule for
completing the project by the deadline. If you are working in a group, you should also write down the names of your group
members and describe how you are going to split up the work amongst yourselves.
If you wish to take a Final Exam, you should instead write one question for possible inclusion in the Final Exam and submit it
in your README for this assignment. The better the question you submit, the higher the likelihood you will see the question
(or a closely related one) on the exam. We will additionally post the best questions to the Final Exam page on Canvas so that
you can use them as study material for the exam. Note: YOU WILL NEED TO PROVIDE THE SOLUTION FOR YOUR
QUESTIONS. A good question is one that goes beyond memorization and asks the student to apply a concept in a way that
is similar to what we do in our in-class activities and conceptual questions in assignments (we will not ask implementation
questions that involve writing code from scratch). Specifically, we plan to include questions of the following types (for
additional examples, see past exam questions on the Final Exam/Final Project assignment page):
Applied Conceptual Questions, such as:
You are conducting a large digital experiment, in which you have designed an online music sharing application and
recruited participants to use the platform over the course of a month. During the experiment, you will manipulate features
of the website in order to test your research hypotheses. In order to run the experiment, you need to be able to
collect/record thousands of data points per second, for instance by tracking the songs that participants download, the
treatments that they were exposed to (by you, the researcher), and everything that participants click on. When
the experiment is over, you would like to perform a statistical analysis on a subset of the data to identify experimental
interventions that caused participants to change their clicking/downloading behavior. Ultimately, when your work is
published, you would also like to have your (de-identified) data publicly accessible, so that future scholars can replicate
your statistical analysis.
What databases and/or storage solutions would you use to solve these problems (storing data while you run the
experiment, as well as afterwards) in the AWS cloud ecosystem? Why? How about if you scaled the experiment up by
several orders of magnitude to include millions of participants? Would this change your data storage/management
solution?
Code Interpretation Questions, such as:
Below is a serial version of a Monte Carlo simulation to estimate π that is written in Python. Identify parts of this code
that could be accelerated using a GPU, as well as those that would best be run on a CPU – attempting to accelerate the
estimation of π as much as possible. For each section of code, you should explain why your answers are the best
hardware options for optimal performance (e.g. thinking in terms of some of the key bottlenecks and hardware limitations
for CPUs vs. GPUs).
# NumPy Pi Estimation with Monte Carlo Simulation
import numpy as np
import time
t0 = time.time()
n_runs = 10 ** 8
# Simulate Random Coordinates in the Square [-1, 1] x [-1, 1]:
ran = np.random.uniform(low=-1, high=1, size=(2, n_runs))
# Identify Random Coordinates that fall within Unit Circle and count them
result = ran[0] ** 2 + ran[1] ** 2 <= 1
n_in_circle = np.sum(result)
# Estimate Pi
print("Pi Estimate: ", 4 * n_in_circle / n_runs)
print("Time Elapsed: ", time.time() - t0)
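One possible restructuring, offered as a sketch rather than a model answer: the original materializes all 2 × 10^8 coordinates at once (about 1.6 GB of float64 values before temporaries), so processing fixed-size batches bounds memory while keeping the element-wise, data-parallel step (the part a GPU array library such as CuPy could take over) separate from the scalar reduction at the end. `batch_size` and `seed` are illustrative parameters not in the original.

```python
import numpy as np


def estimate_pi(n_runs, batch_size=10 ** 7, seed=None):
    '''Monte Carlo estimate of pi, generating samples in batches.'''
    rng = np.random.default_rng(seed)
    n_in_circle = 0
    remaining = n_runs
    while remaining > 0:
        m = min(batch_size, remaining)
        # Data-parallel step: each sample is generated and tested independently
        xy = rng.uniform(low=-1.0, high=1.0, size=(2, m))
        n_in_circle += int(np.count_nonzero(xy[0] ** 2 + xy[1] ** 2 <= 1))
        remaining -= m
    # Serial step: a single scalar computation
    return 4 * n_in_circle / n_runs
```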
Troubleshooting Questions, such as:
You are training a linear regression model to predict the price of an AirBnB listing given a variety of text features derived
from the listing’s description on AirBnB (note that AirBnB publishes this data in CSV format for listings across the world
and the data is updated on a monthly basis).
You have written a machine learning workflow in PySpark that does the following on an AWS EMR cluster composed of 3
m5.xlarge EC2 instances (1 resource manager and 2 core instances), with 10 GB in EBS storage available on each
instance:
1. Cleans the description text data (e.g. drops stop words and punctuation) from all AirBnB listings around the world
from the past month (prior to the current month).
2. Engineers features based on the clean description data (such as categorical and binary features indicating whether
the description contains certain types of words).
3. Uses MLLib’s CrossValidator to identify the optimal hyperparameters for your linear regression model given a grid of
possible values used to tune the model (i.e., a grid search).
4. Trains the regression model using the optimal hyperparameters from (3) and makes predictions on the prices of AirBnB
listings from the current month.
Having successfully run this workflow on one previous month of data, you want to increase your training data size to
several years' worth of data. As you increase the amount of data entering your pipeline, however, you observe an
unexpected, nonlinear drop in performance (in terms of speed), and beyond a certain data size your job does not
complete at all; it keeps running indefinitely.
Describe at least two possible root causes of this slowdown (considering both hardware and software). Why would these
be concerns? Is it possible to remedy them? What would be your solutions?
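To make one software-side cost concrete: grid search multiplies the number of full training passes, so every increase in data size is paid many times over. A small arithmetic sketch, assuming MLlib's CrossValidator defaults (3 folds, then a refit of the best parameter combination on the full training set) and a hypothetical grid of 3 `regParam` values by 4 `elasticNetParam` values:

```python
from math import prod


def cross_validator_fits(values_per_param, num_folds=3):
    '''Model fits performed by a k-fold grid search: one fit per
    (fold, parameter combination), plus one final refit of the best model.'''
    grid_size = prod(values_per_param)  # grid = Cartesian product of values
    return num_folds * grid_size + 1
```

For the hypothetical grid above, `cross_validator_fits([3, 4])` gives 37 fits over the training data, and 61 with 5 folds.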
Some hints for writing good questions:
You shouldn’t make the question needlessly complicated or overly verbose.
Try to be clear about what you’re asking and what you’re looking for.
Try to cover multiple topics from the course; e.g., a question that touches on the memory hierarchy, GPU vs. CPU
parallelism, and Spark’s execution model would be better than one that is narrowly relevant to invoking a Lambda function.
Anything we’ve covered in the class is fair game (you’re welcome to continue submitting relevant questions through
Wednesday of Week 9 on material we cover after this assignment, but you will not receive additional credit for them).
Clear social science tie-ins are preferred.