Syllabus

This web page will serve as the syllabus for the Spring 2021 version of STAT 432. Please read it carefully. You should become familiar with these policies. To do so, you will likely need to return to the syllabus several times throughout the semester. After the start of the semester, this document may continue to be updated. Any such changes will be announced.

Course Name and Number

  • Main: STAT 432 - Basics of Statistical Learning
  • Cross-list: ASRM 451 - Basics of Statistical Learning
  • Section: 1UG, 1GR

For simplicity, the course staff will exclusively refer to the course as STAT 432.

Location and Time

This Spring 2021 version of the course is online.

  • Location: Wherever you are!
  • Time: Mostly whenever you’d like!

Course Staff

Please refer to the course staff by their given names. For example, your instructor is named David. If you refer to the staff as “Professor” or “TA,” we might refer to you as “student,” which seems odd.

Instructor

Teaching Assistants

Course Content

Course Description

STAT 432 provides a broad overview of machine learning, through the eyes of a statistician. As a first course in machine learning, core ideas are stressed, and specific details are de-emphasized. After completing the course, students should be able to train and evaluate statistical models. While we will not discuss an exhaustive list of methods, given the framework developed throughout the course, students should feel comfortable exploring new methods and models on their own. Previous experience with R programming is necessary for success in the course as students will be tested on their ability to use the methods discussed through the use of a statistical computing environment.

Topics

Tentative subjects include:

  • Basics: Supervised and Unsupervised Learning, Parametric vs Non-Parametric Methods, Bias-Variance Trade-Off, Cross-Validation, Model Selection and Evaluation
  • Regression: Linear Regression, Trees, KNN, Penalized Regression
  • Classification: Logistic Regression, Trees, KNN, LDA, QDA, Naive Bayes
  • Modern Methods: Regularization (Ridge, Lasso, Elastic Net), Ensemble Learning (Bagging, Boosting, Random Forests)
  • Unsupervised: PCA, K-Means Clustering, Hierarchical Clustering, Mixture Models, EM Algorithm

Learning Objectives

After this course, students are expected to be able to …

  • identify supervised (regression and classification) and unsupervised (clustering) learning problems.
  • understand some fundamental theory behind statistical learning methods.
  • implement learning methods using a statistical computing environment.
  • formulate practical, real-world, problems as statistical learning problems.
  • evaluate effectiveness of learning methods when used as a tool for data analysis.

Note: These objectives are similar to the objectives for the Society of Actuaries Exam PA: Predictive Analytics. (See details in their linked syllabus.) While STAT 432 was not specifically designed to prepare students for the SOA Exam PA, the coverage may be sufficient to sit for the exam, although some additional exam-specific study may be required.

Textbooks

The main text for the course will be BSL. Within BSL, readings from ISL may be assigned. If BSL and ISL provide conflicting guidance, we will defer to BSL in this course. When reading ISL, you do not need to read the sections dedicated to R. We will follow the R conventions only from BSL.

Prerequisites

A course which covers linear regression that uses R, such as STAT 420 or STAT 425. Basic knowledge of probability and linear algebra is also assumed. A working knowledge of the material from the following three texts would also be sufficient.

Course Communication

We will use several forms of communication for this course. The website will be the one-stop-shop for all course information. Course announcements will be sent via email. Be sure you are regularly checking your @illinois.edu email account.

If you would like to communicate with the course staff, our preferred methods of communication, in order, are:

  1. Office Hours
  2. Piazza
  3. Email

Office Hours

For Spring 2021, all office hours will be held online via Zoom.

| Staff and Link | Day | Time |
|---|---|---|
| Zoom with David | Monday | 8:00 PM - 9:00 PM |
| Zoom with Tianyi | Monday | 9:00 PM - 10:00 PM |
| Zoom with David | Thursday | 8:00 PM - 9:00 PM |
| Zoom with Tianyi | Thursday | 9:00 PM - 10:00 PM |

The office hour schedule is always subject to change. As such, the dates and times will be posted each week along with the course materials.

David’s office hours will transition from his other course, STAT 510, at 8:00 PM. You are welcome to stop by early during the 7:00 PM - 8:00 PM hour, but priority will be given to questions about STAT 510 during that time.

Office hours are by far our preferred forum for discussing individual, specific questions. In office hours, our response time will be literally instant. Also, since we are both present in the same physical location (or together on Zoom), follow-up is both expected and easy. Using electronic forms of communication such as Piazza or email will have a slower response time and a much lower communication bandwidth. In other words, please come to office hours!

Because our class is not gigantic, office hours will be a rather informal meeting. As such, if the instructor and a student are engaged in casual conversation not directly related to a pressing matter in STAT 432, like a homework question, please just jump into the conversation and interrupt! If office hours are “busy” the instructor may institute an informal queuing system, but the hope is to keep office hours more relaxed and informal.

If you would like to schedule a private meeting outside of regular office hours, please send an email suggesting two possible times, on two different days. (A total of four suggested times.) We have a preference for time-slots directly adjacent to current office hours. Please also indicate a brief agenda for the meeting. Requests to schedule a meeting at a time less than 24 hours in the future are unlikely to be granted.

Piazza

This course will use Piazza for some course communications.

Please register your account with your University email.

The course staff will attempt to check Piazza at least once a day during the week, so you can often expect a response within 24 hours, except on weekends. If you need a quicker response, you should consider office hours as an alternative.

The course staff would strongly prefer the use of Piazza to GroupMe or similar services not officially supported by the course. The course staff feels that a GroupMe may exclude members of the course, whereas all are welcome on Piazza.

Private posts have been disabled. Any private matters should be discussed over email where your identity is known and private. Some anonymous posting is disabled. (You may post anonymously to your classmates, but not the course staff.)

Additional Piazza policy can be found in a pinned post on Piazza.

Email Policy

STAT 432 will follow a strict email policy. Instead of email, consider Piazza! Any quick, non-private communication should take place there.

If you’d like to email the instructor or course staff, consider the following:

  • Is your question about course administration? If so, have you read the syllabus? If your question is easily answered in the syllabus, we will either refer you to the syllabus, or ignore your email.
  • Is your question about part of an assignment? First and foremost: You should ask it in office hours. After that, consider Piazza. As a last resort, use email, but there is a good chance you will be re-directed to Piazza.

If you choose to send an email, you must adhere to the following three rules. If you do not, your email will be considered less important than other emails which follow the rules and response time will be slower.

  • All email must originate from an @illinois.edu email address or appear as sent on behalf of an @illinois.edu address.
    • Depending on the situation, failure to follow this rule may make a response impossible.
  • Your subject line must begin with exactly the following: [STAT 432]
    • Failure to follow this step exactly may result in your email simply not being answered.
  • After the above, put a single space, followed by a useful but short description of your message.
## good
[STAT 432] Grade feedback question
## bad
## improper format
## non-descriptive subject
[stat432] hi
## bad
## improper format
[STAT432] Grade feedback question
## bad
## improper format
## subject too long
## information found in syllabus or website
[STAT 432]when is the first exam and what is covered on the exam?
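
As a rough illustration of the subject line rules above, the following Python sketch checks the prefix and spacing. The function name and the 60-character length limit are assumptions made for illustration; the syllabus only asks for a "useful but short" description, and a check like this cannot judge whether a description is actually useful.

```python
import re

# Hypothetical checker, not an official course tool.
# Rule: the subject starts with exactly "[STAT 432]", then a single space,
# then a short description.
SUBJECT_PATTERN = re.compile(r"^\[STAT 432\] (\S.*)$")

# Assumed limit for "short"; the syllabus does not give a number.
MAX_DESCRIPTION_LENGTH = 60


def subject_is_well_formed(subject: str) -> bool:
    """Return True if the subject line matches the required format."""
    match = SUBJECT_PATTERN.match(subject)
    if match is None:
        return False
    return len(match.group(1)) <= MAX_DESCRIPTION_LENGTH


print(subject_is_well_formed("[STAT 432] Grade feedback question"))
print(subject_is_well_formed("[STAT432] Grade feedback question"))
```

The first call uses the "good" example from above and passes the check; the second uses one of the "bad" examples and fails it because the prefix is missing a space.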

If your email is sent between 9:00 AM Monday and 11:59 PM Thursday, and you follow the above directions, we will try our best to respond within 24 hours. Questions about an assessment sent the same day the assessment is due will likely not receive a response before the assessment is due. Plan accordingly.

Course Staff Emails

Role Name Email
Instructor David Dalpiaz
Teaching Assistant Tianyi Qu

Code Discussion

If your question is technical in nature, there are several steps you can take to ensure a speedy response on Piazza or in email.

First and foremost, you should ask Google before you ask the course staff. Take the error message you obtained and search it with Google. The ability to solve problems this way is an extremely valuable skill, possibly one of the most important you should learn (but are not taught) during your academic career. Make a legitimate effort to solve the problem on your own. You won’t always be able to, and if you can’t, post on Piazza. (Or better yet, stop by office hours.)

If you need to ask the course staff, include the following in your Piazza post or email:

  • All code that is required to re-create the error.
    • Staff should be able to run your code, without any modification, and obtain the same error or output.
  • The exact error message received.

Do not use screenshots of code and error messages to communicate about them. Copy-paste them so that others can copy-paste them as well.

In this course, for everything except exams, we greatly prefer over-sharing to under-sharing code. We would rather everyone learn from others’ “mistakes” than have everyone experience the same issues over and over again.

Assessments

With the exception of exams, all course assignments are due at 11:59 PM, Central (Champaign) time, on the listed due date.

PrairieLearn Quizzes

Throughout the semester, quizzes (9 for undergraduates, 10 for graduate students) will be administered through the PrairieLearn system. These will be low-stakes, unlimited attempt quizzes. That is, there is no penalty for submitting incorrect answers, and your score can only go up, never down. These quizzes will serve as practice for exams. No quizzes will be dropped. Instead, there will be an opportunity to earn buffer points with each quiz. Buffer points will allow you to obtain over 100% for a particular assignment, but your percentage on quizzes overall cannot exceed 100%.

The buffer point and late submission details can be seen in the details of each quiz on PrairieLearn. As an example, consider Quiz 01:

  • 105% Credit: Monday, February 1, 11:59 PM
  • 100% Credit: Monday, February 8, 11:59 PM
  • 85% Credit: Monday, February 15, 11:59 PM

To obtain the 105% credit, you must achieve a score of 100% before the “due” date for 105% credit. (When we refer to a “due” date, we will generally mean the date to obtain 105% credit.)
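
To make the Quiz 01 credit tiers concrete, here is a simplified Python sketch. It assumes a single submission time, treats times as Central time, and assumes no credit after the last listed date, which the syllabus does not state. The function names are hypothetical, and PrairieLearn's own bookkeeping across multiple attempts may differ.

```python
from datetime import datetime

# Quiz 01 credit tiers from the syllabus (Central time, 11:59 PM deadlines).
QUIZ_01_TIERS = [
    (datetime(2021, 2, 1, 23, 59), 105),
    (datetime(2021, 2, 8, 23, 59), 100),
    (datetime(2021, 2, 15, 23, 59), 85),
]


def available_credit(submitted_at, tiers=QUIZ_01_TIERS):
    """Return the credit percentage available at the submission time."""
    for deadline, credit in tiers:
        if submitted_at <= deadline:
            return credit
    return 0  # Assumption: no credit after the final listed date.


def quiz_percentage(score, submitted_at, tiers=QUIZ_01_TIERS):
    """Simplified recorded percentage for one submission with the given score."""
    credit = available_credit(submitted_at, tiers)
    if credit > 100:
        # Buffer credit requires a score of 100% before the "due" date.
        return credit if score == 100 else score
    return score * credit / 100


print(quiz_percentage(100, datetime(2021, 1, 30, 12, 0)))
print(quiz_percentage(100, datetime(2021, 2, 10, 12, 0)))
```

Under these assumptions, a perfect score on January 30 earns the buffer credit, while a perfect score on February 10 is scaled to the 85% tier.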

PrairieLearn

Quizzes and exams will both use the PrairieLearn system. Use the link below to sign up and add STAT 432.

Exams

There will be two midterm exams proctored using CBTF Online. Details about the exams will be released on the course website as we approach the exams.

  • Exam 01: Monday, March 1
  • Exam 02: Monday, April 5

Data Analyses

There will be three data analyses (DA) throughout the semester. Specific policies and directions will be released alongside the analyses. Note that these will not be graded and weighted like projects. They will be graded mostly for completion to allow you to experiment with the techniques you have learned.

Deadlines

Except for the exams, all deadlines are at 11:59 PM, Champaign time, on the listed day.

| Assessment | Deadline |
|---|---|
| Quiz 01 | Monday, February 1 |
| Quiz 02 | Monday, February 8 |
| Quiz 03 | Monday, February 15 |
| Quiz 04 | Monday, February 22 |
| Exam 01 | Monday, March 1 |
| Quiz 05 | Monday, March 8 |
| Quiz 06 | Monday, March 15 |
| Quiz 07 | Monday, March 22 |
| Quiz 08 | Monday, March 29 |
| Exam 02 | Monday, April 5 |
| Quiz 09 | Monday, April 19 |
| Analysis 01 | Wednesday, May 5 |
| Analysis 02 | Wednesday, May 5 |
| Grad Quiz | Wednesday, May 5 |

Course Technology

Statistical Computing

R and RStudio are required software for this course. You will need access to a computer where you have the ability to install and update this software.

It is your responsibility to make sure you are using the most recent version of both R and RStudio. Failure to use the most recent version of R will result in an inability to complete the quizzes.

Learning Management

A mixture of Compass, Piazza, and PrairieLearn will be used for Learning Management.

Until the end of the semester when we transition to Compass for analysis submission, PrairieLearn will be used for all assignments and grades.

Grading

Assessment Weights

| Assessment | Percentage |
|---|---|
| Quizzes | 50 |
| Exam 01 | 20 |
| Exam 02 | 20 |
| Analyses | 10 |

The quiz sub-score will be the average of the 9 quizzes for undergraduates. (It will be the average of 10 quizzes for graduate students.) If your quiz sub-score is above 100 as a result of buffer points, it will be recorded as 100. Similarly, the sub-score for the analyses will be the average of the individual analyses.
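
The weighting described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and variable names are hypothetical, all scores are percentages, and the staff's actual gradebook calculation may differ in details such as rounding.

```python
# Assessment weights from the syllabus, in percent.
WEIGHTS = {"quizzes": 50, "exam_01": 20, "exam_02": 20, "analyses": 10}


def overall_percentage(quiz_scores, exam_01, exam_02, analysis_scores):
    """Combine sub-scores (all in percent) into an overall course percentage."""
    # Quiz sub-score: average of the quizzes, recorded as at most 100.
    quiz_sub = min(sum(quiz_scores) / len(quiz_scores), 100)
    # Analysis sub-score: average of the individual analyses.
    analysis_sub = sum(analysis_scores) / len(analysis_scores)
    sub_scores = {
        "quizzes": quiz_sub,
        "exam_01": exam_01,
        "exam_02": exam_02,
        "analyses": analysis_sub,
    }
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS) / 100


# Nine quizzes at 105% each are capped at a quiz sub-score of 100.
print(overall_percentage([105] * 9, 80, 90, [100, 100]))
```

Note how the cap applies to the quiz sub-score as a whole, so buffer points on one quiz can offset a lower score on another, but cannot push the quiz component above its 50% weight.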

Grading Scale

| | A | B | C | D |
|---|---|---|---|---|
| Plus | 99 | 87 | 77 | 67 |
| Neutral | 93 | 83 | 73 | 63 |
| Minus | 90 | 80 | 70 | 60 |

The instructor reserves the right to lower, but not raise, grade cutoffs. However, this policy should not create an expectation that this will happen. Asking for a change in cutoffs will make any change in cutoffs less likely.
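
As an illustration of how the published cutoffs map a percentage to a letter grade, consider the Python sketch below. Two things here are assumptions not stated in the syllabus: percentages are compared to the cutoffs without rounding, and anything below 60 is an F. The function name is hypothetical, and the instructor may lower the actual cutoffs.

```python
# Published cutoffs from the grading scale, highest first.
CUTOFFS = [
    (99, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(percentage):
    """Return the letter grade for an overall course percentage."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"  # Assumption: below the lowest listed cutoff is an F.


print(letter_grade(94.0))
print(letter_grade(89.99))
```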

Grading in the course is not competitive. There is nothing (other than some statistical realities) that would prevent the entire class from receiving a grade of A.

Grade Disputes

If you feel an assignment was graded incorrectly, you have one week from the date you received a grade to discuss it with the instructor. After one week, grading is final except for exceptional circumstances. You may not simply ask for a re-grade, but instead must justify to the instructor why the grading was done incorrectly. By disputing any grading, you agree to allow the instructor to review the entire assessment in question for other errors missed during grading. Requests must be sent via email. (Failure to follow the email policy will result in your request being denied.) Grade disputes over trivial points will likely be met with frustration. (A grade on a single assignment is not reflective of your overall grade in the course. The generous buffer points should more than make up for a single point deduction on a single assignment.)

All grade disputes must be discussed with the course instructor. Teaching Assistants and Course Assistants do not have authority to modify grades.

Academic Integrity

The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in this course. Any violations will be dealt with in a swift, fair, and strict manner. In short, do not cheat; it is not worth the risk. You are more likely to get caught than you believe. If you think you may be operating in a gray area, you most likely are.

Policies about specific assessment types will be released with directions for those assessments. Two heuristics to keep in mind:

  • Do not share files with other students. Do not copy-paste code from any source other than the course notes and website.
  • Use spoken language to exchange ideas, not code.

Under no circumstances should course materials be provided to Course Hero, Chegg, or any similar for-profit website. The course staff will seek the harshest possible academic integrity penalty for any students who do so.

Additional Information

Safety

The university values your safety. Please read this document or watch this video.

Disability Accommodations

To obtain disability-related academic adjustments or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 217-333-4603, e-mail disability@illinois.edu or go to the DRES website.

To ensure appropriate accommodation is provided in a timely manner, please provide your Letter of Accommodation during the first week of class. Letters received after a relevant assessment has been administered will likely cause logistical issues that could result in an inability to accommodate.

If you have accommodations identified by the Division of Rehabilitation-Education Services (DRES) for exams, please email your Letter of Accommodations to Carleen Sacris, CBTF Manager, at .

The Extended Syllabus

For some thoughts on teaching philosophy, some explanation of policies, and some general tips for success, please see The Extended Syllabus.

Changes

The instructor reserves the right to make any changes he considers academically advisable. Such changes, if any, will be announced. Please note that it is your responsibility to keep track of the course proceedings.