For the project you will implement and write up a neural network model for a problem and data set of your choice. Deliverables for the project include:
- Your primary analysis write-up. This can be written in LaTeX or a word/google docs document.
- All Python code that went into your analysis on GitHub.
- A poster to be presented towards the end of the semester (exact date to be determined – likely a lunch time slot on a MWF during the last week).
Overview of Process/Deadlines
- Find some friends to work with. (5:00 PM Fri Feb. 28)
- Fill out this google form with your preference for group members. Groups should have 2 or 3 members. Only one person per proposed group needs to do this. If you need help finding group members fill out the form saying that. If you have a strong preference to work alone we can discuss, but I may force you to work with people anyways. I also reserve the right to rearrange your groups; there is no guarantee you will get your requested group, and I may add someone to your group.
- Pick an application or kind of model you think is really interesting! (5:00 PM Fri Mar. 6)
- Write up something brief to tell me what you’re thinking about working on. This can be 1 or 2 short paragraphs; it does not need to be formal or complete sentences or your final topic (you can still change your mind for another week). You should include:
- A brief statement of the general type of application and model you’re interested in
- A start at ideas for data, including specific links to actual data sets (or share a google drive or Box folder of your data with me).
- At least one reference for a relevant article, book, etc. with a related neural network model.
- Finalize your data set choice and start reading and summarizing related articles. (5:00 PM Fri Mar. 13)
- You should now have a definite data set that you will work with. Share a google drive or Box folder of your data with me. Write a description of your data set (How big? How are you processing it? Put a figurer or two illustrating the data in your report.)
- You should also have identified at least 2-3 relevant articles doing similar things. Put the citation information for the articles in a LaTeX document or word/google doc. Also put the pdfs of the articles in a google drive folder that is shared with me. Read at least two of these articles moderately thoroughly (you don’t have to completely understand them yet), and write a 1-2 paragraph summary of what they did. What was their application/data set? How was their model set up? Were there any limitations to their analysis? Do you have ideas for changes (hopefully, improvements) that could be made to their analysis/model? I emphasize that at this point it’s ok to still have questions about these articles.
- Continue reading and summarizing articles. Implement a baseline model. (5:00 PM Fri Mar 27)
- You should now have written up reasonably good summaries of at least 3-4 relevant articles.
- If you’re doing a supervised learning task, do a train/validation/test split of your data. Hide your test data away and pretend it doesn’t exist.
- Identify and start implementing a baseline model. It’s OK if it’s not completely functional yet.
- Continue reading and summarizing articles. Finish baseline model. (5:00 PM Fri Apr 3)
- Target at least 5 relevant articles read and summarized.
- Finish baseline model.
- Work on your ideas for modifications to the model.
- Implement your ideas for changes to the model. They probably won’t work at first. That’s OK, keep trying things.
- At least one attempted thing by 5:00 PM Fri Apr 10
- As necessary, at least a couple more attempted things by 5:00 PM Fri Apr 17
- Important: Don’t obsess about beating the state of the art. That would of course be great, but your success in this project is not dependent on that.
- Tell the people about what you’ve done. Poster session some time the week of Apr 20-24
- Write it all up! Final write up due by end of finals period
- Go do good work in the world and/or get paid the big bucks.
Possible Topics and Data Sets
- Lists
- Miscellaneous other topics:
- I have available some accelerometer data (think a fitbit recording acceleration 80 times per second) along with labels indicating the type of activity the subject was doing.
- Last year one of my independent study students made a recommendation system based on this paper: https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf
- https://gym.openai.com/ (Reinforcement learning is neat, but unfortunately I don’t know of any really accessible sources for learning it – this might be challenging, but you should do it if you’re excited about it.)
If you have a specific idea you’re excited about that doesn’t fit into the kinds of topics on this list, get in touch and we can discuss.
Guidelines for the primary analysis write-up
The project write-up should be written in clear, concise prose. No code should be shown unless it is explicitly needed to make a point.
Please follow the structure below:
- Title
- Abstract: a brief introduction to the problem you are addressing, brief description of the methods you consider, and summary of the results. Aim for 1 paragraph.
- Introduction: a little more detail about your problem set up and review of relevant literature.
- Data: a brief summary of key features of the dataset. Aim for about 2 pages.
- Methods: This should include a clear statement of any models fit and/or description of procedures used to fit the models. Also describe your use of train/test splits and validation to evaluate model performance, and your procedures for selecting hyperparameters. Aim for about 2 pages.
- Results: a presentation of your results, including at a minimum a comparison of the relative performance of the different models you considered. For some projects, other results may need to be presented as well.
- Discussion: summarize your work, its limitations, and possible future steps/improvements.
- References: cite all sources in a standard format. This includes related work in the introduction, your data source, any models used for transfer learning, and any software packages used (e.g. Keras).
Items one through 6 above should total 10 pages at most, including figures and tables. Fewer pages is also fine, but if your report is looking like it will be less than 5 pages please check with me to make sure you’re describing your work in enough detail. You should use 11 or 12 point font and no less than 1 inch margins all around.
Proper Citations
Any direct quotes from other sources should be minimal (a sentence or two, not multiple paragraphs!), should be between quotation marks, and should be cited at the point of quotation. It’s ok to include a figure from another source, but this should be cited in the figure caption.
I will deduct a very large amount of points if you take text from another source without citing it properly. If you have any questions on how to cite appropriately, ask!
Grading and Assessment Criteria
Your project grade makes up 25% of your final grade for the class. We’re doing this instead of a final. The project will be evaluated on the following points:
Novelty (15 points): I’m not looking for you to have a publishable article come out of this, but if I could do your project by copying and pasting stuff from a tutorial online, you get 0 points for novelty. If you essentially replicate a published paper, you’ll be fine if you tweak the model a little, apply it in a new setting, or do some exploration to determine why it works (or doesn’t work).
Understanding of methods (25 points): You should demonstrate that you understand the methods you are using. You should be able to explain everything that your model does, why it does that, and how it does that. You should understand all of your code.
Technical implementation (20 points): Your code should work.
Written Report (20 points): How effectively does the written report communicate the goals, procedures, and results of the study? Are the claims adequately supported? How well is the report structured and organized? Are all of the figures and tables numbered, captioned and appropriately referenced? Does the writing style enhance what the group is trying to communicate? How well is the report edited?
Poster (10 points): How effectively does the poster communicate what you did?
Checkpoint deadlines (10 points): To enforce consistent progress, I’m assigning points to doing the intermediate steps within a couple of days of the posted deadlines.