This web page will serve as the syllabus for the Spring 2021 version of STAT 432. Please read it carefully. You should become familiar with these policies. To do so, you will likely need to return to the syllabus several times throughout the semester. After the start of the semester, this document may continue to be updated. Any such changes will be announced.
- Main: STAT 432 - Basics of Statistical Learning
- Cross-list: ASRM 451 - Basics of Statistical Learning
- Section: 1UG, 1GR
For simplicity, the course staff will exclusively refer to the course as STAT 432.
This Spring 2021 version of the course is online.
- Location: Wherever you are!
- Time: Mostly whenever you’d like!
Please refer to the course staff by their given names. For example, your instructor is named David. If you refer to the staff as “Professor” or “TA,” we might refer to you as “student,” which seems odd.
STAT 432 provides a broad overview of machine learning, through the eyes of a statistician. As a first course in machine learning, core ideas are stressed, and specific details are de-emphasized. After completing the course, students should be able to train and evaluate statistical models. While we will not discuss an exhaustive list of methods, given the framework developed throughout the course, students should feel comfortable exploring new methods and models on their own. Previous experience with R programming is necessary for success in the course, as students will be tested on their ability to use the methods discussed through the use of a statistical computing environment.
Tentative subjects include:
- Basics: Supervised and Unsupervised Learning, Parametric vs Non-Parametric Methods, Bias-Variance Trade-Off, Cross-Validation, Model Selection and Evaluation
- Regression: Linear Regression, Trees, KNN, Penalized Regression
- Classification: Logistic Regression, Trees, KNN, LDA, QDA, Naive Bayes
- Modern Methods: Regularization (Ridge, Lasso, Elastic Net), Ensemble Learning (Bagging, Boosting, Random Forests)
- Unsupervised: PCA, K-Means Clustering, Hierarchical Clustering, Mixture Models, EM Algorithm
After this course, students are expected to be able to …
- identify supervised (regression and classification) and unsupervised (clustering) learning problems.
- understand some fundamental theory behind statistical learning methods.
- implement learning methods using a statistical computing environment.
- formulate practical, real-world, problems as statistical learning problems.
- evaluate effectiveness of learning methods when used as a tool for data analysis.
Note: These objectives are similar to the objectives for the Society of Actuaries Exam PA: Predictive Analytics. (See details in their linked syllabus.) While STAT 432 was not specifically designed to prepare students for the SOA Exam PA, the coverage may be sufficient to sit for the exam, although some additional exam-specific study may be required.
BSL - Basics of Statistical Learning
- David Dalpiaz
ISL - An Introduction to Statistical Learning with Applications in R
- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
The main text for the course will be BSL. Within BSL, readings from ISL may be assigned. If BSL and ISL provide conflicting guidance, we will defer to BSL in this course. When reading ISL, you do not need to read the sections dedicated to R. We will follow the R conventions only from BSL.
A course which covers linear regression that uses R, such as STAT 420 or STAT 425. Basic knowledge of probability and linear algebra is also assumed. A working knowledge of the material from the following three texts would also be sufficient.
We will use several forms of communication for this course. The website will be the one-stop-shop for all course information. Course announcements will be sent via email. Be sure you are regularly checking your @illinois.edu email account.
If you would like to communicate with the course staff, our preferred methods of communication, in order, are:
- Office Hours
For Spring 2021, all office hours will be held online via Zoom.
| Staff and Link | Day | Time |
| Zoom with David | Monday | 8:00 PM - 9:00 PM |
| Zoom with Tianyi | Monday | 9:00 PM - 10:00 PM |
| Zoom with David | Thursday | 8:00 PM - 9:00 PM |
| Zoom with Tianyi | Thursday | 9:00 PM - 10:00 PM |
The office hour schedule is always subject to change. As such, the dates and times will be posted each week along with the course materials.
David’s office hours will transition from his other course, STAT 510, at 8:00 PM. You are welcome to stop by early during the 7:00 PM - 8:00 PM hour, but priority will be given to questions about STAT 510 during that time.
Office hours are by far our preferred forum for discussing individual, specific questions. In office hours, our response time will be literally instant. Also, since we are both present in the same physical location (or together on Zoom), follow-up is both expected and easy. Using electronic forms of communication such as Piazza or email will have a slow response rate and a much lower communication bandwidth. In other words, please come to office hours!
Because our class is not gigantic, office hours will be a rather informal meeting. As such, if the instructor and a student are engaged in casual conversation not directly related to a pressing matter in STAT 432, like a homework question, please just jump into the conversation and interrupt! If office hours are “busy” the instructor may institute an informal queuing system, but the hope is to keep office hours more relaxed and informal.
If you would like to schedule a private meeting outside of regular office hours, please send an email suggesting two possible times, on two different days. (A total of four suggested times.) We have a preference for time-slots directly adjacent to current office hours. Please also indicate a brief agenda for the meeting. Requests to schedule a meeting at a time less than 24 hours in the future are unlikely to be granted.
This course will use Piazza for some course communications.
- Register: https://piazza.com/illinois/spring2021/stat4321ug1gr
- Access Code:
Please register your account with your University email.
The course staff will attempt to check Piazza at least once a day during the week, thus you can often expect a response within 24 hours, except for weekends. If you need a quicker response, you should consider office hours as an alternative.
The course staff would strongly prefer the use of Piazza to GroupMe or similar services not officially supported by the course. The course staff feels that a GroupMe may exclude members of the course, whereas all are welcome on Piazza.
Private posts have been disabled. Any private matters should be discussed over email where your identity is known and private. Some anonymous posting is disabled. (You may post anonymously to your classmates, but not the course staff.)
Additional Piazza policy can be found in a pinned post on Piazza.
STAT 432 will follow a strict email policy. Instead of email, consider Piazza! Any quick, non-private communication should take place there.
If you’d like to email the instructor or course staff, consider the following:
- Is your question about course administration? If so, have you read the syllabus? If your question is easily answered in the syllabus, we will either refer you to the syllabus, or ignore your email.
- Is your question about part of an assignment? First and foremost: You should ask it in office hours. After that, consider Piazza. As a last resort, use email, but there is a good chance you will be re-directed to Piazza.
If you choose to send an email, you must adhere to the following three rules. If you do not, your email will be considered less important than other emails which follow the rules, and response time will be slower.
- All email must originate from an @illinois.edu email address or appear as sent on behalf of an @illinois.edu email address.
- Depending on the situation, failure to follow this rule may make a response impossible.
- Your subject line must begin with exactly the following: [STAT 432]
- Failure to follow this step exactly may result in your email simply not being answered.
- After the above, put a single space, followed by a useful but short description of your message.
    ## good
    [STAT 432] Grade feedback question

    ## bad
    ## improper format
    ## non-descriptive subject
    [stat432] hi

    ## bad
    ## improper format
    [STAT432] Grade feedback question

    ## bad
    ## improper format
    ## subject too long
    ## information found in syllabus or website
    [STAT 432]when is the first exam and what is covered on the exam?
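If you would like to check a subject line before sending, the format rules above can be sketched in R. This is only an illustration, not a tool the course staff uses: the function name `is_valid_subject` is hypothetical, and it checks only the prefix and the single space, not whether the description is useful or short.

```r
# Hypothetical helper (not part of the course materials).
# Checks that a subject line begins with exactly "[STAT 432]",
# followed by a single space and then a non-space character.
is_valid_subject <- function(subject) {
  grepl("^\\[STAT 432\\] [^ ]", subject)
}

subjects <- c(
  "[STAT 432] Grade feedback question",
  "[stat432] hi",
  "[STAT432] Grade feedback question",
  "[STAT 432]when is the first exam and what is covered on the exam?"
)

# Only the first subject line follows the required format.
is_valid_subject(subjects)
```

Passing this check does not guarantee a response. A subject line can be correctly formatted and still be too long or ask something already answered in the syllabus.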
If your email is sent between 9:00 AM Monday and 11:59 PM Thursday, and you follow the above directions, we will try our best to respond within 24 hours. Questions about an assessment sent the same day the assessment is due will likely not receive a response before the assessment is due. Plan accordingly.
If your question is technical in nature, there are several steps you can take to ensure a speedy response on Piazza or in email.
First and foremost, you should ask Google before you ask the course staff. Take the error message you obtained and search it with Google. The ability to solve problems this way is an extremely valuable skill, possibly one of the most important you should learn (but are not taught) during your academic career. Make a legitimate effort to solve the problem on your own. You won’t always be able to, and if you can’t, post on Piazza. (Or better yet, stop by office hours.)
If you need to ask the course staff, include the following in your Piazza post or email:
- All code that is required to re-create the error.
- Staff should be able to run your code, without any modification, and obtain the same error or output.
- The exact error message received.
Do not use screenshots of code and error messages to communicate about them. Copy-paste them so that others can copy-paste them as well.
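As a rough illustration of what a self-contained, copy-pasteable report might look like, consider the sketch below. The model and the deliberate mistake are made up for this example. It uses only the `mtcars` data that ships with R, so staff can run it without any modification, and `tryCatch()` is used here only to capture the exact text of the error message.

```r
# Report which version of R produced the problem.
print(R.version.string)

# Self-contained: mtcars ships with R, so no extra files are needed.
fit <- lm(mpg ~ wt, data = mtcars)

# Deliberate mistake for illustration: the new data uses the column
# name "weight", but the model was fit with a predictor named "wt".
error_text <- tryCatch(
  predict(fit, newdata = data.frame(weight = 3)),
  error = function(e) conditionMessage(e)
)

# The exact error message, as text that can be pasted into a post.
print(error_text)
```

In an actual post you would typically paste the failing code and the error as R printed it, rather than wrapping the call in `tryCatch()`.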
In this course, for everything except exams, we greatly prefer over-sharing to under-sharing code. We would rather everyone learn from others’ “mistakes” than have everyone experience the same issues over and over again.
With the exception of exams, all course assignments are due at 11:59 PM, Central (Champaign) time, on the listed due date.
Throughout the semester, quizzes will be administered through the PrairieLearn system. (9 for undergraduates, 10 for graduate students.) These will be low-stakes, unlimited attempt quizzes. That is, there is no penalty for submitting incorrect answers, and your score can only go up, never down. These quizzes will serve as practice for exams. No quizzes will be dropped. Instead, there will be opportunity to earn buffer points with each quiz. Buffer points will allow you to obtain over 100% for a particular assignment, but your percentage on quizzes overall cannot exceed 100%.
The buffer point and late submission details can be seen in the details of each quiz on PrairieLearn. As an example, consider Quiz 01:
- 105% Credit: Monday, February 1, 11:59 PM
- 100% Credit: Monday, February 8, 11:59 PM
- 85% Credit: Monday, February 15, 11:59 PM
To obtain the 105% credit, you must achieve a score of 100% before the “due” date for 105% credit. (When we refer to “due” dates, we will generally mean the date to obtain 105% credit.)
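To make the Quiz 01 schedule concrete, here is a minimal sketch in R of how the available credit depends on when a score is earned. The function name `quiz01_credit` is hypothetical, and two details are assumptions rather than course policy: that each deadline runs through the end of the 11:59 PM minute, and that no credit is available after the last listed date. PrairieLearn is the authority on the actual rules.

```r
# Hypothetical helper using the Quiz 01 dates listed above.
quiz01_credit <- function(submitted) {
  # Deadlines in Central (Champaign) time.
  # Assumption: the deadline includes the whole 11:59 PM minute.
  deadlines <- as.POSIXct(
    c("2021-02-01 23:59:59", "2021-02-08 23:59:59", "2021-02-15 23:59:59"),
    tz = "America/Chicago"
  )
  credit <- c(105, 100, 85)

  # Find the earliest deadline that has not yet passed.
  still_open <- which(submitted <= deadlines)

  # Assumption: nothing is available after the final deadline.
  if (length(still_open) == 0) {
    return(0)
  }
  credit[still_open[1]]
}

# A score earned on February 5 falls in the 100% credit window.
quiz01_credit(as.POSIXct("2021-02-05 20:00:00", tz = "America/Chicago"))
```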
Quizzes and exams will both use the PrairieLearn system. Use the link below to sign-up and add STAT 432.
There will be two midterm exams proctored using CBTF Online. Details about the exams will be released on the course website as we approach the exams.
- Exam 01: Monday, March 1
- Exam 02: Monday, April 5
There will be three data analyses (DA) throughout the semester. Specific policies and directions will be released alongside the analyses. Note that these will not be graded and weighted like projects. They will be graded mostly for completion to allow you to experiment with the techniques you have learned.
Except for the exams, all deadlines are at 11:59 PM, Champaign time, on the listed day.
| Quiz 01 | Monday, February 1 |
| Quiz 02 | Monday, February 8 |
| Quiz 03 | Monday, February 15 |
| Quiz 04 | Monday, February 22 |
| Exam 01 | Monday, March 1 |
| Quiz 05 | Monday, March 8 |
| Quiz 06 | Monday, March 15 |
| Quiz 07 | Monday, March 22 |
| Quiz 08 | Monday, March 29 |
| Exam 02 | Monday, April 5 |
| Quiz 09 | Monday, April 12 |
| Analysis 01 | Monday, April 19 |
| Analysis 02 | Monday, April 26 |
| Analysis 03 | Monday, May 3 |
| Grad Quiz | Wednesday, May 5 |
R and RStudio are required software for this course. You will need access to a computer where you have the ability to install and update this software.
- R is a freely available language and environment for statistical computing and graphics.
- RStudio is a free and open-source integrated development environment (IDE) for R.
It is your responsibility to make sure you are using the most recent version of both R and RStudio. Failure to use the most recent version of R will result in an inability to complete the quizzes.
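One quick way to see which version of R you have installed is sketched below. The helper `is_r_current` and the version number `"4.0.0"` are placeholders for illustration only. The syllabus does not name a specific required version, so compare against whatever the current release is when you check.

```r
# Print the installed version of R.
print(R.version.string)

# Hypothetical helper: is the installed version at least `required`?
# The default "4.0.0" is a placeholder, not a course requirement.
is_r_current <- function(required = "4.0.0", installed = getRversion()) {
  installed >= package_version(required)
}

# Returns TRUE or FALSE depending on your installation.
is_r_current()
```

The RStudio version is shown under Help > About RStudio.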
The quiz sub-score will be the average of the 9 quizzes for undergraduates. (It will be the average of 10 quizzes for graduate students.) If your quiz sub-score is above 100 as a result of buffer points, it will be recorded as 100. Similarly, the sub-score for the analyses will be the average of the individual analyses.
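The quiz sub-score calculation described above can be sketched in R as follows. The function name `quiz_subscore` and the example scores are made up for illustration. The sketch simply averages the individual quiz percentages and caps the result at 100.

```r
# Hypothetical helper: average the quiz percentages, then cap at 100.
quiz_subscore <- function(quiz_scores) {
  min(mean(quiz_scores), 100)
}

# Made-up scores for an undergraduate (9 quizzes).
# Buffer points push the raw average above 100, so it is recorded as 100.
with_buffer <- c(105, 105, 100, 105, 90, 105, 105, 100, 105)
quiz_subscore(with_buffer)

# Made-up scores where buffer points offset a late submission.
mixed <- c(105, 100, 85, 90, 100, 100, 95, 100, 80)
quiz_subscore(mixed)
```

Because the cap applies to the average and not to each quiz, buffer points earned on one quiz can offset points lost on another.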
The instructor reserves the right to lower, but not raise, grade cutoffs. However, this policy should not create an expectation that this will happen. Asking for a change in cutoffs will make any change in cutoffs less likely.
Grading in the course is not competitive. There is nothing (other than some statistical realities) that would prevent the entire class from receiving a grade of A.
If you feel an assignment was graded incorrectly, you have one week from the date you received a grade to discuss it with the instructor. After one week, grading is final except for exceptional circumstances. You may not simply ask for a re-grade, but instead must justify to the instructor why the grading was done incorrectly. By disputing any grading, you agree to allow the instructor to review the entire assessment in question for other errors missed during grading. Requests must be sent via email. (Failure to follow the email policy will result in your request being denied.) Grade disputes over trivial points will likely be met with frustration. (A grade on a single assignment is not reflective of your overall grade in the course. The generous buffer points should more than make up for a single point deduction on a single assignment.)
All grade disputes must be discussed with the course instructor. Teaching Assistants and Course Assistants do not have authority to modify grades.
The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in this course. Any violations will be dealt with in a swift, fair, and strict manner. In short, do not cheat; it is not worth the risk. You are more likely to get caught than you believe. If you think you may be operating in a gray area, you most likely are.
Policies about specific assessment types will be released with directions for those assessments. Two heuristics to keep in mind:
- Do not share files with other students. Do not copy-paste code from any source other than the course notes and website.
- Use spoken language to exchange ideas, not code.
Under no circumstances should course materials be provided to Course Hero, Chegg, or any similar for-profit website. The course staff will seek the harshest possible academic integrity penalty for any students who do so.
To obtain disability-related academic adjustments or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 217-333-4603, e-mail firstname.lastname@example.org or go to the DRES website.
To ensure appropriate accommodation is provided in a timely manner, please provide your Letter of Accommodation during the first week of class. Letters received after a relevant assessment has been administered will likely cause logistical issues that could result in an inability to accommodate.
If you have accommodations identified by Disability Resources and Educational Services (DRES) for exams, please email your Letter of Accommodations to Carleen Sacris, CBTF Manager, at email@example.com.
For some thoughts on teaching philosophy, some explanation of policies, and some general tips for success, please see The Extended Syllabus.
The instructor reserves the right to make any changes he considers academically advisable. Such changes, if any, will be announced. Please note that it is your responsibility to keep track of the course proceedings.